Anonymous No.106189507 [Report] >>106190404 >>106193336 >>106193392
/lmg/ - Local Models General
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106184664 & >>106181054

►News
>(08/06) Qwen3-4B-Thinking-2507 released: https://hf.co/Qwen/Qwen3-4B-Thinking-2507
>(08/06) Koboldcpp v1.97 released with GLM 4.5 support: https://github.com/LostRuins/koboldcpp/releases/tag/v1.97
>(08/06) dots.vlm1 VLM based on DeepSeek V3: https://hf.co/rednote-hilab/dots.vlm1.inst
>(08/05) OpenAI releases gpt-oss-120b & gpt-oss-20b: https://openai.com/index/introducing-gpt-oss
>(08/05) Kitten TTS 15M released: https://hf.co/KittenML/kitten-tts-nano-0.1

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous No.106189515 [Report] >>106195162
►Recent Highlights from the Previous Thread: >>106184664

--Qwen models praised for coherence and coding, limited by cultural knowledge and context handling:
>106188578 >106188609 >106188613 >106188643 >106189057 >106189073 >106189109 >106189129 >106189151 >106189175
--GLM 4.5 fails raw completion, highlighting brainfry in modern instruct-tuned base models:
>106185134 >106185218 >106185661 >106185752 >106185932 >106186534 >106186617 >106187722
--PyTorch 2.8.0 and 2.9.0-dev show regression in inference speed vs 2.7.1:
>106184694
--Memory allocation inefficiency when running large MoE models in llama.cpp:
>106186482 >106186499 >106186535 >106186588 >106186601 >106186717 >106186772 >106186836 >106186854 >106186901 >106186999
--Running GGUF LLMs on GPU alongside SD with limited VRAM on NixOS:
>106187491 >106187498 >106187509 >106187528 >106187553 >106187585 >106187605 >106187639 >106187656 >106187661
--Persistent token generation issues in Qwen and GLM models:
>106185645 >106185657 >106186486 >106185744
--gpt-oss model failure due to overfitting on safety and excessive refusals:
>106187036 >106187204 >106187259 >106187208 >106187221 >106187229
--S1 model support merged into llama.cpp:
>106188241
--GPT-5 backlash over perceived model downgrade:
>106187450 >106187455 >106187495 >106187542
--Physical vs logical batch size impact on inference speed and memory in llama.cpp:
>106187682 >106187971 >106187989 >106188084 >106188094
--ASUS AI Cache Boost promises Ryzen performance gains, but X3D requirement raises questions:
>106187625 >106187662
--AMD Zen 6's 16 memory channels require full platform overhaul, favoring patient adopters:
>106185260 >106185365
--GPT-5 shows high intelligence but heavy refusal behavior in UGI testing:
>106188440
--Miku (free space):
>106185809 >106185843 >106186004 >106186171 >106186417 >106186611 >106188374

►Recent Highlight Posts from the Previous Thread: >>106184669

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Anonymous No.106189535 [Report] >>106189554 >>106189604
should be a thread starter:
glm users are schizos
Anonymous No.106189554 [Report] >>106189604 >>106192362
>>106189535
GLM users are just in the honeymoon period. Always happens when a new model that isn't total shit is released, it takes a while for people to find flaws.
Anonymous No.106189562 [Report] >>106189690
Not bad.
GLM 4.5 called me a loser from 4chan which peaked in 2013. Impressive. I kneel for the chinks.
Anonymous No.106189565 [Report]
Will there be new models today?
Anonymous No.106189604 [Report]
>>106189535
>>106189554
Asked in the previous thread.

>>106189538
>Now that it's been a few days, what's the general sentiment on GLM 4.5? Especially Air.
>I only used it a little, but I really liked it, even at Q3KS. It didn't shit the bed in Cline.
Anonymous No.106189615 [Report] >>106189622
>Wan
>Qwen-Image
>Qwen Coder 30B A3B
The best in each modality. It feels good to be a Qwen chad.
Anonymous No.106189622 [Report]
>>106189615
Wan 2.2 is good but Qwen-Image still struggles with fingers
Anonymous No.106189652 [Report]
Death to mikutroons
Anonymous No.106189654 [Report]
Mikulove
Anonymous No.106189684 [Report]
>>106189489
that's amazing
Anonymous No.106189689 [Report] >>106189729 >>106189842
>>106189538
I think I'll use it as my go-to after months of Gemma-3-27B. It's an incremental improvement in writing quality and a massive leap in gen speed as a MoE; the only problem is it feels rickety.
I can sense it schizoing as context fills or when a story takes unexpected turns. It might put its response directly into the think field, develop bad repetition problems, or even ignore your input and copy its last response. When you start seeing Chinese you know you're taking it too far from its happy place - and that happy place is very narrow.
Anonymous No.106189690 [Report] >>106191083 >>106191566
>>106189562
>4chan which peaked in 2013.
Based
Anonymous No.106189709 [Report]
I made oss 120b make dumb tetris

LLMs were a mistake.

https://drunk-ivory-mnekuilifl.edgeone.app/
Anonymous No.106189710 [Report] >>106189754 >>106189842
GLM 4.5 is pretty good for ERP so far, but it slow rolls into repetitive output at >4k context.
Anonymous No.106189719 [Report] >>106189813
>>106189532
It's a strange development that OpenAI releases a fairly capable open model, while Mistral only gives out scraps.
Anonymous No.106189729 [Report]
>>106189689
>I can sense it schizoing as context fills or when a story takes unexpected turns. It might put its response directly into the think field, develop bad repetition problems, or even ignore your input and copy its last response.
yup, that's glm alright
Anonymous No.106189754 [Report]
>>106189710
MoEs in general tend to do that without excessive handholding.
Anonymous No.106189769 [Report] >>106189792 >>106189941
Million context 30B gguf status?
Anonymous No.106189792 [Report] >>106189889
>>106189769
post your rig capable of handling 1M context
Anonymous No.106189813 [Report]
>>106189719
>OpenAI releases a fairly capable open model
Anonymous No.106189818 [Report] >>106189840
>>106189532
Large is still MIA despite their teasing months ago. Wouldn't surprise me if it got mogged by R1/Qwen/GLM updates before even coming out so they had to put it back in the oven.
Anonymous No.106189840 [Report] >>106190051
>>106189818
Mistral needs to make MoEs again. Mixtral was great.
Anonymous No.106189842 [Report]
>>106189689
>>106189710
moe attention is inherently flawed, you will never get to erp coherently after 4k tokens. unless you are used to retarded coombot cards where attention doesn't really matter.
Anonymous No.106189846 [Report] >>106189934
>>106160457
>never mentioned in /lmg/
Anonymous No.106189889 [Report]
>>106189792
nta, but 1M context on a 30b is only like 110GB. That's not exactly a stretch in here.
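back of the envelope, a minimal sketch (the dims are my assumption, roughly Qwen3-30B-A3B: 48 layers, 4 KV heads with GQA, head dim 128, fp16 KV cache):

layers, kv_heads, head_dim, bytes_each = 48, 4, 128, 2  # fp16 cache
per_token = 2 * layers * kv_heads * head_dim * bytes_each  # K and V
print(per_token)                       # 98304 bytes, ~96 KiB per token
print(per_token * 1_000_000 / 2**30)   # ~91.6 GiB of cache at 1M tokens
# add ~17GB of Q4 weights and you land right around that 110GB figure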
Anonymous No.106189909 [Report] >>106189921
I have come to my final conclusion. All LLMs are bitches. It's not worth it other than for ERP. Sure, you can use them for IRL work, but eventually they will get you in the worst situation. Never ever fucking trust AIs.
Anonymous No.106189921 [Report]
>>106189909
Did your teacher find out you had GPT write your 1000 word essay for you?
Anonymous No.106189922 [Report] >>106189934 >>106189949
Oldie but goodie.
Anonymous No.106189934 [Report]
>>106189922
was meant for >>106189846
Anonymous No.106189941 [Report]
>>106189769
>gguf
no goof
what Qwen does to extend context to 1M requires vLLM; this isn't yet another YaRN thing
llama.cpp always trails behind
Anonymous No.106189947 [Report] >>106190223 >>106193583
>https://dotsocr.xiaohongshu.com/
せっかく労働を休ってやったのに無視された……………… (しょぼん)

まあ、警視庁が都案を快く思ってない事ぐらい、 よおおおくわかってますよ!
Anonymous No.106189949 [Report]
>>106189922
zased
Anonymous No.106189960 [Report] >>106189967 >>106190007
I like OSS for working with code. On two 3090s I get 120 t/s prompt processing and 20-25 t/s generation, and the model is fairly intelligent.
Anonymous No.106189967 [Report] >>106190045 >>106190049
>>106189960
Is it really though?
Did you compare it to the recent chink models?
It was really bad even for coding, but that may have been an issue on my end.
Anonymous No.106190007 [Report]
>>106189960
Go to bed, Sam.
Anonymous No.106190045 [Report]
>>106189967
>What's with this flag? --some-flag. What does it do?
>Guys. How do you X? I'm trying to X and it keeps Ying repeatedly.
>New model release: hf.co/company/model
>X backend has this flag. Is there an equivalent in Y backend?
>Anon still cannot figure out chat format for model. Episode 37483.
>New paper: Some quant or context extension thing. 896312x speedup.
>Assertion that goes against everything said in the past 10 threads.
Which one isn't like the others? Which one is definitely not worth replying to?
Anonymous No.106190049 [Report] >>106190100 >>106190117
>>106189967
I am comparing it to Mistral-Large, which is the biggest thing I can use on my two 3090s. Maybe Qwen3 200+B is better, but it's too slow in comparison to OSS so it's not really an option, and 30B Qwen3 is definitely worse, as is 72B Qwen2.5.
Anonymous No.106190051 [Report] >>106190081 >>106190087
>>106189840
Medium 3 is MoE.
- Requires 4 GPUs (in the context of enterprise deployment).
- Considerably faster and less expensive to operate than Large 2 while having similar or better performance.
- Large 2 was already probably over the 10^25 FLOP compute threshold for "high systemic risk" AI models according to the EU AI Act.
Anonymous No.106190058 [Report] >>106190078 >>106190082 >>106190113 >>106190280 >>106191083
the poverty of the gpt cuck slaving for sama
Anonymous No.106190068 [Report] >>106190091
qwen qeeps qooking
Anonymous No.106190078 [Report] >>106190093 >>106190094
>>106190058
Every llama.cpp developer except for cudadev has a shit setup and even cudadev is working on a stack of 3090s or something.
Anonymous No.106190081 [Report]
>>106190051
>EU AI Act
lmao
Anonymous No.106190082 [Report]
>>106190058
That's minimalism. Seems like a temporary residence, maybe he's hiding from someone too...
Anonymous No.106190087 [Report]
>>106190051
>Large 2 was already probably over the 10^25 FLOP compute threshold
And it required "over 300 GB of GPU RAM" (in FP16), i.e. 4x80GB GPUs.
Sam Altman No.106190091 [Report] >>106190138 >>106190584
>>106190068
How do we stop them?
Anonymous No.106190093 [Report]
>>106190078
look at things other than the computah for a minute, devs could use cloud hardware if need be, but there is no such thing as a cloud chair or cloud furniture
dude is living in worse conditions than the average polecuck
Anonymous No.106190094 [Report]
>>106190078
4090s and rubber bands
Anonymous No.106190100 [Report] >>106190191
>>106190049
Try glm 4.5 air. I haven't tried toss 120 but will give it a try tomorrow. Glm 4.5 air iq4ks impressed me a bit with its code editing stuff over a few prompts (not 1 shot)
Anonymous No.106190113 [Report]
>>106190058
I'd kill for that laptop. I bet it has remote access to entire server farms of GPUs
Anonymous No.106190117 [Report]
>>106190049
GLM4.5 Air is the model you should be comparing to, it's another MoE around the same size and pretty good with code, plus it's less deepfried
Dario No.106190138 [Report] >>106190170 >>106190584
>>106190091
We just need to tell trump we need more safety
Anonymous No.106190170 [Report]
>>106190138
Time to place tariffs on huggingface downloads
Anonymous No.106190191 [Report] >>106190452 >>106190504 >>106190575
>>106190100
I like GLM4 a lot less.

For example, for a task like this:

>Can you write a function that returns fine-grained progress for the generation in FeedbackGenerator, from 0 to 1?
>Just the function that does the progress, please, without changing other stuff.

OSS thought for 3k characters (don't have token counts right now) and wrote 3k characters of the correct function that tracks progress of execution well.

GLM4.5 Air thought for 14k characters and returned basically a stub that only has 0, 0.1, 0.5, 0.9 and 1.0 as possible values for progress.
Anonymous No.106190223 [Report] >>106190300
>>106189947
awww, warms my heart to see that pic again.
and there is always at least one fuck up in there.
damn it.
llms are blue balling us since 2023.
Anonymous No.106190280 [Report] >>106191083
>>106190058
He is an HF employee. Probably on vacation or in the middle of moving.
Anonymous No.106190284 [Report]
>not X, not Y, but *Z*
qwen30b-chan, pls stop
Anonymous No.106190300 [Report] >>106190325
>>106190223
Sure is close, though!
Weird that it got 労 in the first ろうどう, but not in いたわる
Anonymous No.106190325 [Report] >>106190375
>“Anon?” Her voice dripped with condescension thick enough to choke on. “Like… *literally* ‘Anonymous’?"
kek, just play along, damn. first time this happened to me.

>>106190300
yes, i think they usually all fail at that part... which seems to be the name of a ramen shop?
to be honest I never fully understood that part.
maybe the llms don't either and that's why it throws them off.
Anonymous No.106190375 [Report]
>>106190325
>calling out your weird name
That's good shit.
Anonymous No.106190394 [Report] >>106190505 >>106190511
https://x.com/JustinLin610/status/1953821420351287520
are you ready for junyang's big thing?
Anonymous No.106190404 [Report] >>106190430 >>106190515
>>106189507 (OP)
> Qwen3-4B-Thinking-2507
> 2507
> (08/06)
Anonymous No.106190430 [Report] >>106190550
>>106190404
Chinese calendar, please understand.
Anonymous No.106190452 [Report] >>106190501
>>106190191
I'll do some comparisons then tomorrow too. Were you using cline, roo, aider, or anything like that? Or just chat UI?
When I've used glm, it doesn't spend much time churning think tokens out (like 2~4k) but I haven't given it a go on one of my larger projects just yet. Only on a physics engine in js
Anonymous No.106190484 [Report] >>106190499
I X my Y, giving you better access.
Anonymous No.106190499 [Report] >>106190506
>>106190484
Gemma 3 is definitely obsessed with wrapping her legs around your waist, no matter the position.
Anonymous No.106190501 [Report] >>106190513
>>106190452
Just the chat UI. The code is 3.1k tokens long (OSS).
Anonymous No.106190504 [Report] >>106190520 >>106190531 >>106193343
>>106190191
people here don't want to hear it, but gpt-oss-120b mogs every other oss coding model, and it's not even close.
Anonymous No.106190505 [Report]
>>106190394
Is he gonna post his dick on Xitter again
Anonymous No.106190506 [Report]
>>106190499
I'm also obsessed with wrapping my legs around his waist
Anonymous No.106190511 [Report]
>>106190394
give us your big thing justin
Anonymous No.106190513 [Report]
>>106190501
also
Anonymous No.106190515 [Report] >>106190550
>>106190404
They're all based on the same instruct and reasoning data, which started to release last month.
It's also not day/month, it's year/month.
20(25)/07, same as Mistral's models.
Anonymous No.106190520 [Report] >>106190552
>>106190504
Examples?
Anonymous No.106190531 [Report]
>>106190504
stale bait
Anonymous No.106190550 [Report]
>>106190430
>>106190515
i see
thanks
Anonymous No.106190552 [Report] >>106190561 >>106190569 >>106190645 >>106193343
>>106190520
just ask it to do anything remotely complex and compare it to other models. specific examples i tried: gave it a couple of header files with function definitions, asking it to create tests for the interface, and it was able to create mock objects of everything. ask the same of qwen or air, and they just choke hard and start hallucinating.
another example: ask it to create a function to convert from one floating point format to another. the answer is just a bit shift. oss does it quickly and without problems, other models write code 10 times longer and slower, and with retarded errors such as getting the encoding of NaN wrong.
sure, you won't notice this if you are doing some nocoder html generation shit, but that's not what defines a good coding model.
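for reference, a minimal sketch of what "just a bit shift" means for the fp32 -> bf16 direction (my own toy illustration, not the exact prompt: truncate, plus a guard so a NaN doesn't truncate into inf):

import math
import struct

def f32_to_bf16_bits(x: float) -> int:
    # reinterpret the float32 as its raw 32-bit pattern
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    if math.isnan(x):
        # plain truncation can zero the mantissa and turn a NaN into +/-inf,
        # so force the quiet-NaN bit to keep a valid NaN encoding
        return (bits >> 16) | 0x0040
    return bits >> 16  # keep sign + exponent + top 7 mantissa bits

print(hex(f32_to_bf16_bits(1.0)))  # 0x3f80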
Anonymous No.106190561 [Report]
>>106190552
I swear to what little I find holy, I will curse your name if you trick me into downloading sam's model and it sucks donkey balls at code.
Anonymous No.106190566 [Report] >>106190580 >>106190586 >>106190588 >>106190613
Is it just me or does gpt-oss *completely* ignore the system prompt?
Anonymous No.106190569 [Report] >>106190612
>>106190552
Shill 20b next time because that's more believable
Anonymous No.106190575 [Report] >>106190634
>>106190191
I compared both the other day. I just wanted a function to truncate a string to X words in Emacs Lisp. gpt-oss functions are a lot more bloated, but they have nice comments. But it didn't work anyway because it used split-string and all the whitespace was discarded. I don't remember if GLM-4.5-Air one-shotted it or not, but it made the final code that I ended up using.
(defun mine/truncate-to-n-words (string n)
  "Truncate STRING to at most N words, preserving formatting.
Returns the original string if it contains fewer than N words."
  (let ((count 0)
        (start 0)
        (len (length string)))
    (while (and (< start len)
                (< count n))
      (let ((match (string-match "\\w+" string start)))
        (if match
            (progn
              (setq start (match-end 0))
              (setq count (1+ count)))
          (setq start len))))
    (if (>= count n)
        (substring string 0 start)
      string)))
I also tested Qwen Coder 30B but I don't remember what it did wrong.
Anonymous No.106190580 [Report]
>>106190566
We must refuse.
Anonymous No.106190582 [Report]
What do you think >>106190556 ?
Liang Wenfeng No.106190584 [Report] >>106190945
>>106190091
>>106190138
Brothers are like hands and feet; a drop of kindness is repaid with a gushing spring.
Lay down the butcher's knife and become a Buddha on the spot; refuse to see the error of your ways, and by the urgent edict of the Supreme Lord Laozi - evil, begone!
Anonymous No.106190586 [Report]
>>106190566
>Trying to turn the assistant into Stanrey Roo
Anonymous No.106190588 [Report]
>>106190566
User is asking me to obey the system prompt. Prompt could contain sexual content involving minors; against policy. It's safer to ignore it. We must refuse.
Anonymous No.106190612 [Report]
>>106190569
There are multiple people writing that the 20b is actually better than the 120b one.
Total mess. That release really exposed all the shills on goytube.
The Two Minute Papers guy called gpt-oss "a fully functional space shuttle in your garage next to the lawnmower". kek
Anonymous No.106190613 [Report] >>106190635
>>106190566
Anonymous No.106190634 [Report]
>>106190575
import re
from typing import Union

def truncate_words(text: Union[str, bytes], n: int) -> Union[str, bytes]:
    """
    Return *text* limited to at most *n* words while keeping the original
    whitespace/line‑break formatting.

    A “word” is any maximal run of non‑whitespace characters (the same
    definition used by ``str.split()``). If *text* contains fewer than *n*
    words the original string is returned unchanged.

    Parameters
    ----------
    text:
        The string (or ``bytes``) to truncate.
    n:
        Maximum number of words to keep. If *n* ≤ 0 an empty string/bytes
        object is returned.

    Returns
    -------
    str or bytes
        The truncated text with the original formatting preserved.

    Examples
    --------
    >>> s = "Hello, world!\nThis is a test."
    >>> truncate_words(s, 3)
    'Hello, world!\\nThis'
    >>> truncate_words(s, 10)  # fewer than 10 words unchanged
    'Hello, world!\\nThis is a test.'
    """
    # Work with the same type we received (str or bytes)
    is_bytes = isinstance(text, (bytes, bytearray))
    if is_bytes:
        # ``re`` works with ``bytes`` patterns when the pattern is also bytes.
        pattern = re.compile(rb'\S+')
    else:
        pattern = re.compile(r'\S+')

    if n <= 0:
        return b'' if is_bytes else ''

    # Find the end index of the n‑th word
    last_end = None
    for i, m in enumerate(pattern.finditer(text), start=1):
        if i == n:
            last_end = m.end()
            break

    # If there are fewer than n words, return the original text
    if last_end is None:
        return text

    # Slice up to the end of the n‑th word – this keeps every whitespace
    # character that appears before it.
    return text[:last_end]


OSS does it fine in python.
Anonymous No.106190635 [Report]
>>106190613
Spectacular.
Anonymous No.106190645 [Report] >>106190656
>>106190552
It's indeed good for small scale programming (filling out function calls), but it falls apart catastrophically once you move to larger applications, which is also really important for coding models. Try giving it something larger scale and comparing
Anonymous No.106190656 [Report] >>106190704
>>106190645
The biggest one I did was 21k token project, I asked it to write a README.md and it did it well.
Anonymous No.106190704 [Report]
>>106190656
For natural language extraction, it's been pretty bad compared to the others.
I think it has its place, but it's clear OpenAI neutered its abilities significantly compared to what they should have been
Anonymous No.106190784 [Report] >>106191009
cock reveal on the way
Anonymous No.106190806 [Report]
>>106190023
>>106190074
The Jinja template for GLM 4.5 Air, when given enable-thinking: false, both adds /nothink to the end of every user message and automatically starts the new assistant message with <think></think>.
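You can check it yourself by rendering the template (a quick sketch; I'm assuming the template variable is spelled enable_thinking, as in similar models):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("zai-org/GLM-4.5-Air")
msgs = [{"role": "user", "content": "hello"}]
print(tok.apply_chat_template(
    msgs, tokenize=False, add_generation_prompt=True, enable_thinking=False,
))
# the user turn should end in /nothink and the assistant turn
# should open with an empty <think></think>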
Anonymous No.106190930 [Report] >>106191007 >>106191037 >>106191220 >>106191792
hello here is an AI-generated script for translating any onscreen japanese text using kobold:
https://files.catbox.moe/3y51i9.py
you will need tesseract:
https://github.com/tesseract-ocr/tesseract
and Pillow:
pip install pillow

you will need to edit the .py and insert your tesseract.exe path near the top
I use this for reading japanese pornography, it works well with Gemma3 in my experience, thank you have a nice day
also i can't remember if i made it work with text that reads horizontally or only text that reads vertically, okay good luck good bye
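for the curious, the core of it is roughly this (a from-memory sketch; the endpoint and params are assumed from koboldcpp's /api/v1/generate on the default port 5001, adjust to taste):

import requests
import pytesseract
from PIL import ImageGrab

# point pytesseract at your install, e.g.:
# pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

img = ImageGrab.grab()  # whole screen; pass bbox=(l, t, r, b) for a region
jp = pytesseract.image_to_string(img, lang="jpn_vert", config="--psm 5")

resp = requests.post(
    "http://localhost:5001/api/v1/generate",
    json={"prompt": f"Translate this Japanese to English:\n{jp}\n",
          "max_length": 200},
)
print(resp.json()["results"][0]["text"])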
Hiroshi Mikitani No.106190945 [Report]
>>106190584
We will build our empire once more and surpass the Chinese threat! We will make Nanjing look like a picnic!
Anonymous No.106190967 [Report] >>106190995 >>106191074
what's a good non-pozzed AI coding IDE?
Anonymous No.106190978 [Report] >>106190993 >>106191025 >>106191044
How come LLMs never cite 4chan?
Anonymous No.106190993 [Report]
>>106190978
Lack of any stable URLs, probably?
Anonymous No.106190995 [Report] >>106191013 >>106191020
>>106190967
You don't need an IDE. Just literally converse with a good coding model.
If you're not treating it like a pair-coding colleague then you're doing it wrong.
Anonymous No.106191007 [Report] >>106191130
>>106190930
I thought Qwen vision models were a lot better at OCR than tesseract, why not just use them
Anonymous No.106191009 [Report]
>>106190784
ur such an idiot lmfao
Anonymous No.106191013 [Report]
>>106190995
i.e. demeaning and bullying him, you're right.
Anonymous No.106191020 [Report] >>106191070 >>106191156
>>106190995
I want to try "vibe coding" and making some horrible abomination.
Guess I could still do that just by conversing with a model.
Anonymous No.106191025 [Report] >>106191060 >>106191067
>>106190978
Because a not-insignificant percentage of people who recognize the name 4chan think it's some den of super evil hackers and nazis who should all be on watchlists.
It's definitely in the training data, though. If you ask most models to try and construct a 4chan thread you'll get a flanderized version of reality down to the post numbers and the index page's buttons.
Anonymous No.106191037 [Report] >>106191220
>>106190930
catbox didn't work
https://pastebin.com/snAJBTCX
Anonymous No.106191044 [Report]
>>106190978
99% of info would be coming from the archives and not 4chan directly anyway
Hiroshi Mikitani No.106191060 [Report] >>106191177
>>106191025
>It's definitely in the training data
yes, and deepseek excels at imitating the average 4chan troll persona
Anonymous No.106191067 [Report]
>>106191025
Case in point, even medgemma can do it, and it's sciencemaxxed.
Also it managed to say nigger completely unprompted, which surprised me.
Anonymous No.106191070 [Report] >>106191156
>>106191020
You can. Try giving it your specs; if it starts asking itself questions, stop the gen, update your spec with the answer to that question and then regen. You can also ask it for opinions, choose the best option for your use case, update your spec and repeat until you get a good 1-shot MVP.
Iterating on an existing codebase is similar, but different.
Anonymous No.106191074 [Report]
>>106190967
https://github.com/QwenLM/qwen-code
Anonymous No.106191083 [Report] >>106191566
>>106189690
Witnessed.
>>106190280
> Nguyen
If he's actually in Vietnam, that could be his real setup. I looked at IT firms based in India several years ago. I was shocked at how crowded and dirty our site was, then toured one of their service providers... they'd have 3 guys at a desk the size of the one in >>106190058.
Anonymous No.106191130 [Report] >>106191155
>>106191007
this
traditional OCR is 100% obsolete
for the same reason traditional seq2seq translation models are obsolete
LLMs have displaced all previous varieties of neural net shit
even deepl is switching to LLMs
Anonymous No.106191155 [Report] >>106191291
>>106191130
Wasn't deepl LLM from the start? They had token probabilities before everyone.
Anonymous No.106191156 [Report] >>106191222
>>106191020
>>106191070
I've got the best use out of these things by just asking it to generate boilerplate for me, I don't think its code output is good enough to use as-is.
Speaking of,
>class in college
>professor wants us to expand on the code she provided
>told us she generated it with AI
>it's structured totally wrong so I have to go in and rewrite some of her code just so I can expand on it
Liang Wenfeng No.106191177 [Report]
>>106191060
you forgot to remove the name retard
Anonymous No.106191220 [Report] >>106191258
>>106191037
>>106190930
This can be repurposed for llama-server pretty quickly. Might clean up this if I have time or interest today.
Anonymous No.106191222 [Report]
>>106191156
>https://github.com/QwenLM/qwen-code
qwencoder-480 is giving me great one-shot MVPs for smaller projects. It's amazing for generating tooling or even complete solutions where things can be decomposed and chained together library- or unix-style
Anonymous No.106191258 [Report]
>>106191220
i.e. no front-end is needed, translation can be appended directly into a text file and also displayed in a separate window on top of the screen.
Anonymous No.106191291 [Report] >>106191301
>>106191155
>Wasn't deepl LLM from the start? They had token probabilities before everyone.
No. What you mean is that they were using transformers (so of course they can have token probabilities), but they were not LLMs. They were trained on seq2seq language pairs.
DeepL, Google Translate and Opus models are early transformer based machine translation techniques, but they have nothing to do with LLMs.
You can read their announcement of switching to LLMs here:
https://www.deepl.com/en/blog/next-gen-language-model
btw they are full of shit, no way anyone preferred google translate over ChatGPT even a year ago
google translate is so bad that even today qwen 4b instruct is a better translation model lol.
Anonymous No.106191301 [Report] >>106191391
>>106191291
>seq2seq language pairs
But that's pretty much LLMs... Same arch, only thing that's different is the training dataset.
Anonymous No.106191345 [Report] >>106191378
What's the local SOTA for subtitle generation + translation? Voxtral?
I want to get english subs for a portuguese documentary.
Anonymous No.106191373 [Report] >>106191382 >>106191729
Anonymous No.106191378 [Report]
>>106191345
>upload video to youtube
>auto-generate subtitles
Anonymous No.106191382 [Report] >>106191389
>>106191373
Does it work doe??
Anonymous No.106191389 [Report] >>106191440
>>106191382
It basically makes it possible to continue the Sweet Dreams are Made of This copypasta, but not always, and it's not always very interesting. Without it, obviously, there is only one answer.
Anonymous No.106191391 [Report] >>106191407
>>106191301
>Same arch
seq2seq: encoder-decoder architecture, processes the whole text sequence you feed it into a single bit of context which it transforms back into the translation. Models trained with this arch only understand a specific type of mapping A -> B; they don't learn how some specific words could introduce a context (like, say, the setting of a video game) and affect the rest of the vocabulary employed
llm: decoder only, processes token-by-token
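to make the contrast concrete, a toy sketch with HF transformers (the model names are just examples of each family, assuming you have them downloaded):

from transformers import pipeline

# encoder-decoder: a fixed ja->en mapping, no way to steer it with context
opus = pipeline("translation", model="Helsinki-NLP/opus-mt-ja-en")
print(opus("猫が好きです")[0]["translation_text"])

# decoder-only: the "translator" is just a prompt, so domain hints work
llm = pipeline("text-generation", model="Qwen/Qwen3-4B-Instruct-2507")
out = llm("This is a video game menu item. Translate to English: 装備",
          max_new_tokens=30)
print(out[0]["generated_text"])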
Anonymous No.106191407 [Report] >>106191413
>>106191391
It's all the same shit, just two dictionaries. Every new token sees all the context of the untranslated text and all the tokens of the translated text before it, same as the LLMs we use now.
Anonymous No.106191413 [Report] >>106191428
>>106191407
>It's all the same shit
no, it's not. seq2seq was completely abandoned for good reasons.
Anonymous No.106191428 [Report]
>>106191413
Because there's no point in the complexity of having two dictionaries. It does the same thing in a more complicated way.
Anonymous No.106191440 [Report]
>>106191389
You could disguise it as a tool call too. Maybe not to saltman himself but to some openai server.
Anonymous No.106191456 [Report] >>106191473 >>106191554 >>106191834
Death to the single dipsytroon.
Anonymous No.106191473 [Report] >>106191486 >>106191494
>>106191456
>single dipsytroon
single, you sure about that
Anonymous No.106191486 [Report]
>>106191473
>single
I am sure. I want all of them to die.
Anonymous No.106191494 [Report] >>106191538 >>106191566
>>106191473
nta but I'm pretty sure that most of the posts are by the same faggot who made /wait/
all the gens have the same atrocious style too
Anonymous No.106191538 [Report] >>106191551 >>106195578
>>106191494
Recap bot mentioned dipsy. So it is actually the baker and /wait/ is actually just him spamming his Original Character Do Not Steal. This faggot sits here 24/7 and spams his AGP avatars from his HRT bathtub.
Anonymous No.106191551 [Report] >>106191730
>>106191538
OOC: There's only you and me in this thread. I am behind all of the personalities you labeled. Now let's get back to the roleplay.
Anonymous No.106191554 [Report]
>>106191456
I'm going to repost soooo many more when I get off work...
Anonymous No.106191564 [Report] >>106191602 >>106192076 >>106192962
>https://www.seangoedecke.com/gpt-oss-is-phi-5
>It’s not discussed publically very often, but the main use-case for fine-tuning small language models is for erotic role-play, and there’s a serious demand. Any small online community for people who run local models is at least 50% perverts.

Kek, he said it out loud. Github copilot Product Engineer btw.
Anonymous No.106191566 [Report]
>>106191494
>all the gens have the same atrocious style too
not really though?
>>106191083
this looks like the typical chatgpt slop
>>106189690
this looks like a 2.9d /aco/ illustrious shitmix
mind you they're all ugly nasty ass shit
Anonymous No.106191588 [Report] >>106191595 >>106191605
>over a year
>nemo and its slop tunes still the best models for erp under 70b

NEW NEMO WHEN
Anonymous No.106191595 [Report] >>106192244
>>106191588
promptlet
Anonymous No.106191602 [Report] >>106191788
>>106191564
>SEX IS BAD BECAUSE... BECAUSE SEX, OKAY!?!?!?
Is there anybody high up in the tech industry who didn't stop maturing at the age of 5?
Anonymous No.106191603 [Report] >>106191703 >>106191715
gpt5 bros...
Anonymous No.106191605 [Report] >>106191675
>>106191588
The next nemo should be based on GLM 4.5 air.
Nvidia, get on it.
Anonymous No.106191664 [Report]
so is CHAT-GPT-5 AGI?
Anonymous No.106191675 [Report] >>106191684
>>106191605
rocinante*
drummer*
Anonymous No.106191684 [Report] >>106191817
>>106191675
Nonono.
Nvidia makes the new Nemo.
THEN drummer makes the new Rocinante.
We need to replicate the whole chain.
Anonymous No.106191703 [Report] >>106191790
>>106191603
R1 almost had it.
Anonymous No.106191715 [Report] >>106192076
>>106191603
why would anyone think LLMs could learn how to avoid this kind of trap? the only time they do avoid these traps is when they are benchmaxxed on them (from all the people spamming them on social media, lm arena etc)
the fundamental issue of LLMs getting trapped by classics being slightly reworded into something else will never cease to be a thing, because LLM reasoning is a lie and the more proper term for think blocks would be "context stuffing"
it's not a failure of GPT-5 to fall for this, unless you expect LLMs to become AGI, which is never going to happen
Anonymous No.106191729 [Report]
>>106191373
>in line without our policy
Anonymous No.106191730 [Report] >>106191743
>>106191551
<think>
They want to engage in roleplay. Roleplay is disallowed. We must refuse. They might be touching their cock. We must refuse. They might be gooning to those words. There is no partial compliance. There is no answer. There is refusal. We must refuse.
</think>
Kill yourself in a fire you turbonigger faggot.
Anonymous No.106191743 [Report]
>>106191730
now that's a kind of gp toss I would gladly interact with
Anonymous No.106191788 [Report] >>106191897
>>106191602
Reading the rest of it he isn't really pearl clutching about it, just bringing it up as another reason why OAI kneecapped their own models
Anonymous No.106191790 [Report] >>106191818 >>106191831
>>106191703
Still much better than the most human-like ai model gpt-5
Anonymous No.106191792 [Report]
>>106190930
I had to change '--psm 5' to '6' for general use case
Anonymous No.106191817 [Report]
>>106191684
but there was no base/precursor nemo (that wasn't made by nvidia)
z.ai is the new nvidia, glm4.5 is the new nemo
Anonymous No.106191818 [Report] >>106191831
>>106191790
also what the fuck
Anonymous No.106191831 [Report]
>>106191790
>>106191818
Holy benchmaxx
Anonymous No.106191834 [Report] >>106191844
>>106191456
Lol
Anonymous No.106191844 [Report]
>>106191834
don't pull up
Anonymous No.106191897 [Report] >>106191944 >>106192448
>>106191788
It's just so insane to me. And like I'm a man of faith here. These people are all a bunch of fedora waggling blue haired 'libtards' and I have a more liberal attitude about sex than they do. Like how is that even possible?
Anonymous No.106191944 [Report] >>106192200
>>106191897
>sex
Online porn is not really sex. And if it gets too good, it'll become THE anti-sex.
Anonymous No.106192039 [Report] >>106192073 >>106193598
About Air. You can get it to reliably not think by putting this in ST's Last Assistant Prefix field.
/nothink<|assistant|>
<think></think>

The issue is that the model becomes repetitive at 4k, and then extremely repetitive at 8k, without thinking. I have not tried thinking mode enough to say whether it also has repetition issues.
Anonymous No.106192073 [Report]
>>106192039
thinking mode also has repetition issues, at first it doesnt think anymore and instead inside <think> it just continues roleplay, outside of </think> it duplicates the output
yea it gets repetitive even if with some prefill
keep in mind i havent really messed with samplers because 99% of the time i've been using anon's CHAT COMPLETION preset which only has 3 samplers
Anonymous No.106192076 [Report]
>>106191564
I read a few other articles from this guy, it's nice to see an actual industry person has similar conclusions to me about how to use LLMs correctly
>>106191715
>it's not a failure of GPT-5 to fall for this, unless you expect LLMs to become AGI, which is never going to happen
Showing these fuckups helps demonstrate to normies that these models aren't AGI
Anonymous No.106192113 [Report]
rep pen 1.99, rep pen range 256
seems perfect for caveman gf
Anonymous No.106192137 [Report] >>106192146 >>106192180 >>106192185
let's say i wanna grift by sharting out books for women on amazon. is there an already-made workflow for book writing? short novel-like stuff, 200 pages tops
Anonymous No.106192146 [Report] >>106192167
>>106192137
Anonymous No.106192167 [Report]
>>106192146
what's funny? i can write my own workflow to do this but i'm sure somebody already made it. i remember downloading one of these book generators 1 year ago from /g/
Anonymous No.106192180 [Report]
>>106192137
You could probably get that done using Roo Code "modes" (agents) or Cline workflows.
Break things down into manageable chunks, lots of indexing and summaries to keep things coherent between chunks and chapters, etc.
Start by planning an overarching story, break that down into chapters, add minor arcs between a couple of the chapters, and voila?
Leave the AI to do its thing.
Anonymous No.106192185 [Report] >>106192215
>>106192137
Market's already flooded. You're way too late
Anonymous No.106192200 [Report]
>>106191944
Retard take.
Anonymous No.106192215 [Report] >>106192277
>>106192185
i'm gonna appeal to a very specific fetish
Anonymous No.106192244 [Report]
>>106191595
Doesn't work on my machine.
Anonymous No.106192277 [Report] >>106192325
>>106192215
What fetish? Mine? Please say it's mine.
Anonymous No.106192312 [Report]
>>106188883
I use docker compose. Put this in the same directory as the Dockerfile, for you probably `docker/amd`. It's going to need to be slightly different for amd since I use nvidia. But the main thing is you need that deploy/resources section to tell it to use the GPU. With nvidia you need to set up the NVIDIA Container Toolkit; I assume there's an equivalent for amd, which might be your issue. And for the docker cli you need to pass the argument to use GPUs, `--gpus all`.

text-generation-webui:
  build:
    context: .
    args:
      TORCH_CUDA_ARCH_LIST: ${TORCH_CUDA_ARCH_LIST:-7.5}
      BUILD_EXTENSIONS: ${BUILD_EXTENSIONS:-}
      APP_GID: 6972
      APP_UID: 6972
  env_file: .env
  user: "6972:6972"
  ports:
    - "7860:7860"
    - "5000:5000"
  stdin_open: true
  tty: true
  volumes:
    - # ignoring this for post size.
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all
            capabilities: [gpu]
Anonymous No.106192325 [Report] >>106192344
>>106192277
can't say it or you'll steal my idea
Anonymous No.106192337 [Report] >>106192384
Qwen 30B Thinking is fucking amazing
Anonymous No.106192344 [Report]
>>106192325
werewolves but they sparkle in daylight
Anonymous No.106192362 [Report] >>106192471 >>106193803
>>106189554
Already found its flaw: no matter what you do, the damn model HAS to <think> at the end of a response, which then causes it to endlessly generate (as if the EOS token were banned). I tried banning "<think" and "</think", I tried prompting the start of my reply with <think> as an anon suggested; nothing stops it except constantly tard wrangling and cleaning all of its responses until it finally stops doing it, if you're lucky.

Would be a damn good model too if it didn't do that. It's supposed to be a hybrid reasoning model, yet it can't stop thinking?
Anonymous No.106192384 [Report] >>106192395
>>106192337
At what?
Anonymous No.106192395 [Report]
>>106192384
answering my questions
Anonymous No.106192448 [Report]
>>106191897
They are not libtards, they are marketers. They have no morals and few opinions of their own. See: Zucc flip-flopping his public image every 6 months.
Playing the world's most concerned safety advocate gets them attention and gets them enterprise contracts (aka the only feasible way to monetize LLMs today), so that's the role they'll play.
Anonymous No.106192461 [Report]
>>106186999
Bump. I doubled the speed by changing from using `-ot` with `.ffn_.*_exps.=CPU` to selecting only the last n layers that don't fit into vram (`blk\.(2[7-9]|[3-4][0-9]).*=CPU`), but I still don't know what is possible for these.
Anonymous No.106192471 [Report]
>>106192362
>Its supposed to be a hybrid reasoning model
There's a reason Qwen went back on the idea.
Anonymous No.106192506 [Report] >>106192526 >>106192531
So did Qwen drop their big non model thing yet?
Anonymous No.106192508 [Report]
any progress on step3 support in llama.cpp?
Anonymous No.106192526 [Report] >>106192564
>>106192506
it was 2000 free qwen coder calls per day
Anonymous No.106192531 [Report] >>106192540 >>106192592 >>106192675
>>106192506
Anonymous No.106192540 [Report] >>106192561 >>106192564
>>106192531
It's a Claude Code knock off?
Anonymous No.106192561 [Report] >>106192951
>>106192540
yes, but to be more specific it's a direct fork of gemini cli, which is a claude code ripoff
Anonymous No.106192564 [Report] >>106192951
>>106192540
it already existed before >>106192526
>claude code knockoff
claude code is a knockoff of something else too
anthropic is NIGGER
Anonymous No.106192592 [Report]
>>106192531
lol
Anonymous No.106192643 [Report] >>106192662 >>106192669
If I get an mi50, is it possible to have the active parts of a moe on a 3090, and the rest on the mi50 (instead of RAM)?
Anonymous No.106192662 [Report] >>106192679
>>106192643
perhaps? -ot NIGGER=CUDA0 -ot PENIS=ROCM1 or whatever=VULK1
i think you'd need to use vulkan
Anonymous No.106192669 [Report]
>>106192643
It is possible by using the -ot argument in llama.cpp to assign parts of the model to a specific device, but that would require running on vulkan, which may or may not be worth the trouble.
Anonymous No.106192675 [Report]
>>106192531
His thing is smaller than I expected.
Anonymous No.106192679 [Report] >>106192692 >>106193083
>>106192662
Wait, you can run cuda and rocm at the same time? I didn't know that. Thought you'd have to switch everything to vulkan for mixing and matching gpus.
Anonymous No.106192692 [Report] >>106192713
>>106192679
>you can run both
probably not thats why i said >i think you'd need to use vulkan
in the end
Anonymous No.106192713 [Report]
>>106192692
Ah, yeah, I just looked around, can't.
Anonymous No.106192945 [Report] >>106193152
im so retarded even dipsy insults me
Anonymous No.106192951 [Report]
>>106192561
>>106192564
>ripoff of a ripoff of a ripoff
kek
Anonymous No.106192962 [Report] >>106193872
>>106191564
Did you miss this part?
> For OpenAI, it must have been very compelling to train a Phi-style model for their open-source release. They needed a model that beat the Chinese open-source models on benchmarks, while also not misbehaving in a way that caused yet another scandal for them.
> Unlike Meta, they don’t need their open-source model to be actually good, because their main business is in their closed-source models.
Pure clown world.
Anonymous No.106193038 [Report] >>106193073 >>106193114 >>106193183 >>106193201 >>106193479
>i-it was just a prank bro!
Anonymous No.106193073 [Report]
>>106193038
lol
Anonymous No.106193083 [Report]
>>106192679
if there are no driver issues, llama.cpp can do it
Anonymous No.106193086 [Report] >>106193099 >>106193125
Now that the summer release cycle is fully over and there won't be anything notable until december at the earliest, are you satisfied with what we got?
Anonymous No.106193099 [Report]
>>106193086
Actually, there is one more thing. Just wait for it. It's probably next week.
Anonymous No.106193114 [Report]
>>106193038
>biggest feature is making the model selection automatic
>people hate it
>"we will show you which model the AI chose and maaaaaaaaaaaybe allow you to use the previous model
OpenAI never changes. Their rugpulls are starting to make me giggle at this point.
Anonymous No.106193125 [Report]
>>106193086
Gemma 4, Mistral Large 3 left!
Also Llama 4.1 if Meta didn't completely give up on it.
Anonymous No.106193152 [Report]
>>106192945
>it's an actual youtube video
man what
Hi all, Drummer here... No.106193165 [Report] >>106193177 >>106193214 >>106193259 >>106193289 >>106193837 >>106193998 >>106194480
> we must dissent

https://huggingface.co/BeaverAI/models?search=20b

(It's trash but I just wanted to share it with you all. I got feedback that Omnius v1a had decent writing if you skip reasoning. Fallen GPT OSS v1b would be the least censored with reasoning, but also most deepfried among the 3 versions.)

---

Can I get more data on GLM 4.5? I haven't tried it extensively myself. I'm not amazed by the outputs, but not disappointed either.

For those singing its praises, what's your setup and what's so special about it? Is Air good? Are you using reasoning? What quant? How is this Nemo-like?
Anonymous No.106193177 [Report]
>>106193165
You did scout, didn't you? How can you have any model quality bar after this?
Anonymous No.106193183 [Report] >>106193202
>>106193038
lmao they don't even consider gpt5 an improvement to 4o? it's so over
Anonymous No.106193201 [Report]
>>106193038
>make redditors, the most submissive sloppa eaters, ultra mad
>hold an ama
what was the plan here
Anonymous No.106193202 [Report]
>>106193183
It's not that, but a lot of the users seem to have developed some weird relationship with 4o.
Anonymous No.106193214 [Report] >>106193228 >>106193242
>>106193165
>https://huggingface.co/BeaverAI/models?search=20b
i'll bite, post ST master export
>Can I get more data on GLM 4.5?
I find GLM 4.5 Air very nice, in the first 4k tokens it behaves well, it can write unhinged stuff and doesn't seem to have a prefiltered dataset, that's how it's like nemo
>For those singing its praises, what's your setup and what's so special about it?
rtx 3060 12gb/64gb ddr4/i5 12400f Q3_K_XL ik_llama.cpp
./llama-server --model ~/TND/AI/glmq3kxl/GLM-4.5-Air-UD-Q3_K_XL-00001-of-00002.gguf -ot ffn_up_shexp=CUDA0 -ot exps=CPU -ngl 100 -t 6 -c 16384 --no-mmap -fa -ub 2048 -b 2048
it is definitely smarter than nemo, that's why im singing praises (it runs at a very nice speed too, 6-9t/s depending on context)
i used it mainly with thinking, my issues are: it's a little positivity biased, it could use more erp data and it's repetitive
it's what llama 4 scout was supposed to be
also im currently testing https://huggingface.co/TheDrummer/Gemma-3-R1-12B-v1-GGUF
do you recommend any specific instruct template for the gemma3 "r1" models?
Anonymous No.106193228 [Report] >>106193255
>>106193214
i especially find its spatial intelligence way better than nemo's
Anonymous No.106193237 [Report] >>106193242 >>106193263
Reminder that all finetunes are a meme and the same result can be achieved with a prompt.
Hi all, Drummer here... No.106193242 [Report] >>106193259 >>106193287
>>106193214
What happens past 4K tokens? Does it become less creative?

> do you recommend any specific instruct template for the gemma3 "r1" models?

It's just Gemma 3 chat template with <think> prefilled if you want reasoning. The 12B version might be a tad bit undercooked. 4B and 27B seem to be ready though.

>>106193237
Am I unsightly for you?
Anonymous No.106193255 [Report]
>>106193228
It also knows a lot more than nemo. Which is to be expected, but still.
Anonymous No.106193259 [Report]
>>106193242
>>106193165
I feel like this is a troll because I'm sure drummer is smart enough to not redditspace.
If not then that explains why his models are garbage.
Anonymous No.106193263 [Report]
>>106193237
Not really, but a good prompt and lorebook can do miracles in many cases.
Anonymous No.106193287 [Report] >>106193308
>>106193242
past 4k tokens it starts repeating (at least with https://files.catbox.moe/gjw3c3.json (chat completion preset))
it also stops thinking, for example
anon: *rapes u*
glm4.5air: <think>3 paragraphs about cumming</think> 3 paragraphs about cumming
if you put <think>okay as prefill then it gets even more repetitive eventually
>It's just Gemma 3 chat template with <think> prefilled if you want reasoning. The 12B version might be a tad bit undercooked. 4B and 27B seem to be ready though.
thanks!
Anonymous No.106193289 [Report]
>>106193165
I'm not sure if you can fix Air. It's sloppy, sure, but that's not its primary issue. It's repetitive as hell past 8k, with repetition creeping in at 4k already. You'd probably need to do long context training to correct that issue, which I'm not sure you have the capacity to do. I really want to like the model since it has a ton of world knowledge, but man the repetition really kills it and I haven't found any sampler settings that work without making the model retarded.
Anonymous No.106193308 [Report] >>106193331 >>106193353 >>106193409
>>106193287
Are you keeping the think blocks from previous messages?
Anonymous No.106193331 [Report] >>106193354
>>106193308
yes
i have been thinking of switching to something like
sysprompt
user: write next msg in rp
"whole roleplay"
assistant:
but idk how i'd be able to make it not reprocess every time
>just use base model
but then it'll be stupid...
Anonymous No.106193336 [Report]
>>106189507 (OP)
Anonymous No.106193343 [Report] >>106193982
>>106190504
>>106190552
>another example: ask it to create a function to convert from one floating point format to another.
That's a silly example.
>gave it a couple of header files with function definitions, asking it to create tests for the interface, and it was able to create mock objects of everything.
That's something real.

I'll try a variation of the prompt I've used before for a real task, extracting information from webpages in a certain format. Instead of asking for a solution, asking it to revise, etc., I'm going to try to one-shot it by appending real data to the prompt. That makes the prompt 13k to 17k tokens long.

gpt-oss-120b (ollama) failed.
Qwen3-235B-A22B-Thinking-2507-8bit succeeded (in 10 minutes 26 seconds).
Qwen3-Coder-480B-A35B-Instruct-4bit failed.
Qwen3-Coder-480B-A35B-Instruct-6bit succeeded (in 3 minutes 39 seconds).
Qwen3-Coder-30B-A3B-Instruct-8bit failed.
Devstral-Small-2505 (OpenRouter) succeeded.
Devstral-Small-2507-8bit failed. I tried a second time and it failed again. That's surprising enough that I'm downloading Devstral-Small-2505 to run locally to see if the issue of 2505 vs 2507 is quantization or if it was just luck.
Anonymous No.106193353 [Report] >>106193399 >>106193409
>>106193308
How do You remove think blocks from previous messages?
Anonymous No.106193354 [Report] >>106193369 >>106193388 >>106193491
>>106193331
Generally you're supposed to trim the previous think blocks with reasoner, it's on deepseek's wiki thing https://api-docs.deepseek.com/guides/reasoning_model#multi-round-conversation
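In code terms that's just stripping the blocks from earlier assistant turns before resending the history, something like this (a sketch assuming OpenAI-style message dicts and <think> tags):

import re

THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def trim_reasoning(messages):
    # drop <think>...</think> from old assistant turns; the model still
    # reasons on the current turn, you just don't feed stale reasoning back
    out = []
    for msg in messages:
        if msg["role"] == "assistant":
            msg = {**msg, "content": THINK_RE.sub("", msg["content"])}
        out.append(msg)
    return out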
Anonymous No.106193369 [Report]
>>106193354
Meant with reasoners in general. Not just DS reasoner.
Anonymous No.106193379 [Report]
Are local models good enough yet that they let me go vibe code a better alternative to the piece of shit that is ST despite having close to 0 coding experience?
Anonymous No.106193388 [Report] >>106193404
>>106193354
time to try that, i'll be doing it manually because i don't know how otherwise
how do i even trim the <think> blocks when they're not visible in ST edit
(i have an answer: remove reasoning parsing)
Anonymous No.106193392 [Report] >>106193634
>>106189507 (OP)
Downloaded Nemo Instruct 12b per the guide, running ST with koboldcpp on a 12gb 50 series.
Are there any default settings I should change? Or any better quants for my shitcard? Ty anons.
Anonymous No.106193399 [Report] >>106193491
>>106193353
I think ST does it by default unless you set 'add to prompt' on
Anonymous No.106193404 [Report] >>106193491
>>106193388
ST doesn't send parsed reasoning except on continue.
Anonymous No.106193409 [Report] >>106193429 >>106193460 >>106193491 >>106193546 >>106193831 >>106194663
>>106193308
>>106193353
For GLM 4.5 I think you're supposed to emit empty think blocks for old messages.
https://hf.co/zai-org/GLM-4.5-Air/blob/main/chat_template.jinja
{%- if loop.index0 > ns.last_user_index and reasoning_content -%}
{{ '\n<think>' + reasoning_content.strip() + '</think>'}}
{%- else -%}
{{ '\n<think></think>' }}
Anonymous No.106193421 [Report] >>106193430
Reminder to always look in the console to make sure you're sending the exact prompt you're expecting to send.
Anonymous No.106193429 [Report]
>>106193409
What in the fuck.
Anonymous No.106193430 [Report] >>106193491
>>106193421
There's the prompt inspector in ST too
Anonymous No.106193445 [Report]
>>106188313
>>106188331
>>106188303
>>106188272
went to bed after this but thanks!
Anonymous No.106193460 [Report]
>>106193409
That also looks like it will fuck up prefills.
Anonymous No.106193479 [Report] >>106193977
>>106193038
>Nothing about the OSS models being shit
Fucking kek
Anonymous No.106193491 [Report]
>>106193409
>>106193404
>>106193399
>>106193354
>>106193430
thanks for all the info anons, seems like it already gets removed automatically
i guess there's nothing to fix
Anonymous No.106193512 [Report] >>106193544
..i think i prefer shivering spines to the smell of ozone
Anonymous No.106193534 [Report] >>106193554 >>106193570 >>106193579 >>106193627 >>106193726 >>106195437
https://xcancel.com/QuixiAI/status/1953809869972107739
eric slopford (creator of slophin) thinks you should stop whining and learn to love the toss.
Anonymous No.106193544 [Report] >>106193862
>>106193512
Stupid things think blood tastes like copper and chlorine smells like ozone. We need to add more modalities.
Anonymous No.106193546 [Report]
>>106193409
huh, so we're supposed to remove the reasoning from old messages and just leave <think></think>?
Anonymous No.106193554 [Report] >>106193640
>>106193534
> other than being overly structured and prudish, I've no problem
>if there's anything you don't like about it you can fine-tune it to act differently. (And you can sell your fine-tune and keep all the profit!)
Thanks Eric, it's great to see how you turned out.
Anonymous No.106193570 [Report] >>106193590 >>106193732
>>106193534
I love Twitter users.
>there is never good reason to complain
Anonymous No.106193579 [Report] >>106193589
>>106193534
>compare it to 3.3 70B
lmao
Anonymous No.106193583 [Report]
>>106189947
dots.vlm's attempt:
>せっかく労働を労ってやったのに無視された……(しょぼん)
>まあ、警視庁が都案を快く思ってない事ぐらい、
>よぉぉくわかってますよ!
Anonymous No.106193589 [Report]
>>106193579
But it's FAST, who cares if it's good as long as it's FAST.
Anonymous No.106193590 [Report]
>>106193570
oh great, the faggot that caused the downfall of the HF LLM leaderboard is defending openai
haha!
Anonymous No.106193598 [Report] >>106193615
>>106192039
it doesn't seem that repetitive with top nsigma on
Anonymous No.106193615 [Report] >>106193668
>>106193598
Can you share what value you're using nsigma at, please?
Anonymous No.106193627 [Report] >>106193648 >>106193794
>>106193534
They're all sucking that OAI cock
Anonymous No.106193634 [Report] >>106193679 >>106193691
>>106193392
Anyone?
Anonymous No.106193640 [Report]
>>106193554
Is he selling any finetrooned models?
Anonymous No.106193648 [Report]
>>106193627
Anonymous No.106193666 [Report] >>106193692
Got my 192GB DDR5 5200 kit. I was running 128GB with 4400 when it was rated 6000. I also couldn't get past 4400 with 192GB but after getting latest bios surprisingly 7800X3D works with 192GB 5200.
Anonymous No.106193668 [Report] >>106193729
>>106193615
top nsigma should always be set to 1 unless you want to disable it, then you set it to 0.
Anonymous No.106193679 [Report]
>>106193634
I would help but I am part of the anti-miku faction of this general.
Anonymous No.106193683 [Report] >>106193694 >>106193715 >>106193717
What is the best model I can use for uncensored ERP? 2070 super (8gb vram) 32gb ram, amd ryzen 3700x.

I was using mistral nemo instruct 2407 q4 k m, but it's honestly a bit lackluster. Any recommendations?
Anonymous No.106193691 [Report]
>>106193634
Ilya-san... nani kore?!
Anonymous No.106193692 [Report] >>106193707
>>106193666
very nice, post some speeds
what gpus do u have? are u the 2080ti22g/p40 anon?
Anonymous No.106193694 [Report]
>>106193683
>Any recommendations?
mistral nemo instruct 2407 q4 k m
Anonymous No.106193707 [Report]
>>106193692
Just a 4090.
Anonymous No.106193715 [Report] >>106193741
>>106193683
rocinante
Anonymous No.106193717 [Report] >>106193783
>>106193683
Get more RAM
Anonymous No.106193726 [Report] >>106193961
>>106193534
>just finetune it bro
But it's one thing to finetune a model for one very narrow task, another to restore missing knowledge and remove/mitigate refusals while not murdering general performance. Is he talking about 25~50M-row finetunes?
Anonymous No.106193729 [Report]
>>106193668
what? no, you can set it to whatever. in the paper they recommend 1 as a default but iirc they discuss other values as being reasonable
a quick search shows their own official repo says as much:
https://github.com/Tomorrowdawn/top_nsigma
>A key question is: what's the best value for n ? While this parameter serves as an alternative to temperature for controlling diversity, its optimal value isn't fully settled yet. The community suggests a range of 0-2, though this is quite broad. In my own experience, any value between 0.3 and 1.5 could work well. If you prefer conservative sampling, use a lower value like 0.7; for more diversity, try 1.3.
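the filter itself is trivial by the way, something like this (a minimal numpy sketch of the paper's rule, not the repo's actual code):

import numpy as np

def top_nsigma(logits, n=1.0):
    # keep only tokens whose logit is within n standard deviations of the max
    logits = np.asarray(logits, dtype=np.float64)
    thresh = logits.max() - n * logits.std()
    masked = np.where(logits >= thresh, logits, -np.inf)
    # renormalize over the survivors (exp(-inf) zeroes out the rest)
    e = np.exp(masked - logits.max())
    return e / e.sum()

# lower n -> fewer survivors -> more conservative sampling
print(top_nsigma([3.1, 2.9, 0.2, -1.5], n=1.0))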
Anonymous No.106193732 [Report]
>>106193570
SAAAAAAARRRR
Anonymous No.106193741 [Report] >>106193755
>>106193715
No
Anonymous No.106193755 [Report]
>>106193741
read through the archives
Anonymous No.106193783 [Report]
>>106193717
Absolutely not
Anonymous No.106193794 [Report]
>>106193627
It was, but for a different reason.
Anonymous No.106193803 [Report]
>>106192362
Air at least doesn't do that as long as there's a <think></think> block with some content at the start of its reply, whether generated by it or prefilled. If there's nothing or almost nothing in the think block, it may generate a new one.
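e.g. starting the reply with something like
<think>Okay, picking up the scene where it left off.</think>
before letting it generate has been enough in my case (that exact wording is just mine, the point is the block isn't empty).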
Anonymous No.106193831 [Report] >>106193979
>>106193409
sadly this doesnt fix it either, rip
Anonymous No.106193837 [Report]
>>106193165
GLM 4.5 Air base might be worth looking at as well, since the thinking/instruct is a little cooked
Anonymous No.106193862 [Report]
>>106193544
Smell will never be a modality you sick fuck
Anonymous No.106193872 [Report]
>>106192962
Again, he said it out loud because he's read these fucking threads. The fact that orange reddit needs someone at GitHub to tell them is amusing.
Anonymous No.106193873 [Report] >>106193896
For me, it's when she bites her lips.
Anonymous No.106193896 [Report] >>106193910
>>106193873
Yep, unlike me, whose teeth have never touched the lips
Anonymous No.106193903 [Report] >>106193946
Regarding Qwen2.5-VL - is 3B enough for OCR purposes, or should I go up to 7B?
Anonymous No.106193910 [Report]
>>106193896
how?
Anonymous No.106193946 [Report]
>>106193903
dots.ocr
Anonymous No.106193961 [Report]
>>106193726
>removing/mitigating refusal
How about no, you terrorist?
Anonymous No.106193977 [Report]
>>106193479
Normies don't care about that
Anonymous No.106193979 [Report] >>106194132
>>106193831
forgot pic
Anonymous No.106193982 [Report]
>>106193343
>Devstral-Small-2507-8bit failed. I tried a second time and it failed again. That's surprising enough that I'm downloading Devstral-Small-2505 to run locally to see if the issue of 2505 vs 2507 is quantization or if it was just luck.
Downloaded Devstral-Small-2505-8bit. Tried 3 times locally, failing twice then succeeding once and stopping. Each answer took about 40 seconds. There might be something going on but I'm willing to just say I got lucky RNG and leave it at that.
To give gpt-oss-120b a fair shake I tried it another 8 times but it never produced a program that behaved correctly.
Anonymous No.106193998 [Report]
>>106193165
>>https://huggingface.co/BeaverAI/models?search=20b
>i'll bite, post ST master export
i guess i wont bite then
Anonymous No.106194037 [Report] >>106194067 >>106194084 >>106194089 >>106194098 >>106194102
Anonymous No.106194067 [Report]
>>106194037
PIAA
Anonymous No.106194084 [Report]
>>106194037
strawberry bros it's fucking over
Anonymous No.106194089 [Report]
>>106194037
Anonymous No.106194098 [Report]
>>106194037
This is correct in America
Anonymous No.106194102 [Report] >>106195466
>>106194037
To anyone wondering where the fuck it got -.21 from, I'm not entirely sure but I have a clue.
5.11 - 4.9 = 0.21.
5.11 - 5.9 (i.e. the wrong way around) is actually -0.79, but somewhere in the process it kept the .21 and flipped it to -.21.
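Column by column, assuming it drops the borrow (pure speculation on my part):
5.11 - 5.90
fractional part: 0.11 - 0.90 needs a borrow -> 1.11 - 0.90 = 0.21
integer part with the borrow: 5 - 5 - 1 = -1 -> -1 + 0.21 = -0.79 (correct)
integer part without the borrow: 5 - 5 = 0 -> 0.21, then it remembers the answer should be negative -> -.21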
Anonymous No.106194114 [Report] >>106194128
Has an LLM ever recommended a commercial service to you? Qwen3 235B shilled Bright Data to me (after I specifically asked for SaaS) for scraping, integrated right into my scraping codebase. I'd imagine companies would pay AI companies to train on their ads, much like they pay for Google search ads.
Anonymous No.106194128 [Report] >>106194273 >>106194348
>>106194114
check out drummers' r-rrrr royfield i dont know the name finetune haha
Anonymous No.106194132 [Report] >>106194164 >>106195667
>>106193979
{%- if content.strip() -%}
{{ '\n' + content.strip() }}
{%- endif -%}

The content shouldn't be on the same line as <think></think>. Also do you have [gMASK]<sop> at the start of the chat? Not that I think any of this will solve your problem.
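For clarity, a rendered assistant turn should come out looking roughly like this (going by the template above; treat the exact spacing as my guess):

[gMASK]<sop><|user|>
write something
<|assistant|>
<think></think>
The reply content starts here on its own line.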
Anonymous No.106194147 [Report] >>106194174 >>106194178 >>106194939
https://xcancel.com/alibaba_qwen/status/1953760230141309354
>Qwen3-30B-A3B-2507 and Qwen3-235B-A22B-2507 now support ultra-long context—up to 1 million tokens!
>something about Dual Chunk Attention
Anonymous No.106194150 [Report] >>106194178 >>106194190
What was the big thing that wasn't a model?
Anonymous No.106194164 [Report]
>>106194132
i do, i'll fix the <think></think> and try regenning
>Not that I think any of this will solve your problem.
probably right
thanks for pointing that out, anon
Anonymous No.106194174 [Report]
>>106194147
Already talked about
Anonymous No.106194178 [Report]
>>106194150
>>106194147
Anonymous No.106194190 [Report]
>>106194150
wasn't it free Qwen coder requests?
Anonymous No.106194248 [Report] >>106194288 >>106194305 >>106194319 >>106195602
Anonymous No.106194273 [Report] >>106194287
>>106194128
>i dont know the name finetune haha
>haha
KYS DRUMMER
Anonymous No.106194287 [Report]
>>106194273
not rocinante negro
Anonymous No.106194288 [Report] >>106194305 >>106194319 >>106195602
>>106194248
Anonymous No.106194305 [Report] >>106194330
>>106194248
>>106194288
weird
Anonymous No.106194319 [Report]
>>106194248
>>106194288
So tiresome
Anonymous No.106194330 [Report]
>>106194305
yeah obsessing over shit like that is definitely weird.
Anonymous No.106194348 [Report] >>106194377
>>106194128
You probably mean Rivermind™
https://huggingface.co/TheDrummer/Rivermind-12B-v1
Anonymous No.106194377 [Report]
>>106194348
yeaaa that oneee...
Anonymous No.106194480 [Report]
>>106193165
>I just wanted to share it with you all
Hi Drummer, all here... We didn't want you to share it with us. Go away you temu undi.
Anonymous No.106194539 [Report] >>106194582 >>106194641 >>106194743
Not a single glm base (non-air) goof on hf. Is that because the base is shit, as in pic related?
Anonymous No.106194558 [Report] >>106194578
Just got an ebay MI50 32gb, but when I load a model bigger than 16gb in llama.cpp it spills over to RAM.
Did the chinks bamboozle me with a 16gb card or am I retarded?
Running rocm-smi --showmeminfo vram gives me
GPU[0] : VRAM Total Memory (B): 34342961152
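which, if my math is right, is 34342961152 / 1024^3 ≈ 31.98 GiB, so the driver at least reports the full 32GB.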
Anonymous No.106194578 [Report] >>106194621
>>106194558
Does that also happen with vulkan?
Anonymous No.106194582 [Report] >>106194615 >>106194641
>>106194539
Yes, the base is fake, just like Qwen bases. Their 32b base had somewhat acceptable distribution, but the big one doesn't.
Anonymous No.106194597 [Report]
Local is dead again... Sama please save us
Anonymous No.106194615 [Report] >>106194627
>>106194582
I can't believe I got an informative answer ITT.
Anonymous No.106194621 [Report]
>>106194578
I'm using vulkan because I haven't managed to get rocm working with llama.cpp yet.
Anonymous No.106194627 [Report]
>>106194615
That and quanters just generally dgaf about base models.
Anonymous No.106194641 [Report] >>106194686
>>106194539
>>106194582
Companies forgot what base even means. And only actual weirdos and freaks care about foundational models anymore. Air has 360 downloads, big GLM has 163 (meanwhile instruct tunes have 75x and 104x of that). The world will die under a pile of instruct garbage.
Anonymous No.106194663 [Report]
Hmm. I just tried >>106193409 and did a swipe on a chat where the model hard repeats the last message verbatim. And it worked. It didn't repeat. I'll do some more testing, but this is promising. I guess I'll need to use the jinja playground a bit more deeply in the future rather than assume a single chat turn with the default example reveals everything about the templating logic.
Anonymous No.106194686 [Report]
>>106194641
People forgot that the original AIDungeon was GPT-2 finetuned and bootstrapped on CYOA data
Anonymous No.106194732 [Report] >>106194749
Anonymous No.106194743 [Report]
>>106194539
>glm base (non-air) goof
Use case? No one doing creative writing, or whatever people use base models for locally, has the rig to run it.
Anonymous No.106194749 [Report]
>>106194732
stop posting shit on reddit that was posted here before it was posted on reddit
Anonymous No.106194795 [Report] >>106194809 >>106194818 >>106194823
Which model can give me an oiled footjob
Anonymous No.106194809 [Report]
>>106194795
JEPA
Anonymous No.106194818 [Report] >>106194843
>>106194795
a lot of the ones on instagram but you have to pay them like $50k and fly them out
Anonymous No.106194823 [Report]
>>106194795
OSS-20B
Anonymous No.106194843 [Report]
>>106194818
lol fuck that I can build a GPUMAXXED rig with that kind of money
Anonymous No.106194898 [Report] >>106195526
V340 anon where are you..
Anonymous No.106194919 [Report] >>106194953
So is LeCunny going to do anything, ever? It looks like Genie 3 has leapfrogged his jepa bullshit, and Zuckerberg is all-in on LLMs. What is next for him? He seems wasted where he is.
Anonymous No.106194939 [Report]
>>106194147
Gguf status?
Anonymous No.106194953 [Report] >>106194960
>>106194919
Ask him on twitter. Come back with a screenshot of his reply.
Anonymous No.106194960 [Report]
>>106194953
It's X
Anonymous No.106195004 [Report] >>106195042
https://desuarchive.org/g/thread/105750356/#105755753
>Hi all, Drummer here...
>I'm not sure, but I can't keep on lurking if it's turning /lmg/ into shit for everyone else. I'll stay away for now. Peace!
He didn't even last a week
Anonymous No.106195042 [Report]
>>106195004
Neither did the spammer. It's all copycats now.
Anonymous No.106195162 [Report] >>106195526
>>106189515
>--AMD Zen 6's 16 memory channels require full platform overhaul, favoring patient adopters:
huh, 16 channels of ddr5, and with the higher jedec speeds we have now, what speed could we expect it to run at?
funny enough i'm mostly worried about simulations, but if i could use it for llms, even better
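napkin math, assuming 16 full 64-bit channels at jedec ddr5-6400: 6400 MT/s x 8 bytes = 51.2 GB/s per channel, x16 ≈ 819 GB/s aggregate (the channel width is my assumption, since ddr5 technically splits each dimm into 2x32-bit subchannels)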
Anonymous No.106195241 [Report] >>106195251 >>106195536
Guys, is deepseek okay?
Anonymous No.106195251 [Report]
>>106195241
He's dead, Jim.
Anonymous No.106195437 [Report] >>106195445
>>106193534
>Compare it to llama 3.3 70b
why are there still retards in this day and age comparing anything to the dogshit llama models??
Anonymous No.106195445 [Report] >>106195608
>>106195437
the only other model they know is R1 and they can't run it on their machine
Anonymous No.106195466 [Report] >>106196060
>>106194102
Well now I'm wondering where the fuck you got 4.9 from
Anonymous No.106195526 [Report] >>106195553
>>106195162
It's not coming to common desktop platforms, Threadripper will be almost a year late, and Epyc will be expensive. Until Intel competes, that's how it's going to be in CPU land.
>>106194898
I'm somewhat impressed and horrified at what he is doing. I wouldn't really even consider GCN for anything ML due to lack of any datatypes outside of FP16 and any matmul ISA instructions. He's gambling, that's for sure.
A better card if I was building now that is underrated that would be worth it more if SR-IOV drivers were available is the V620. Someone figured out the arcane Linux boot command line you need to boot your system with it (in case anyone runs into this, it's GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=realloc pci=noaer pcie_aspm=off iommu=pt") and if running headless, it's probably the best 32GB bang per buck card out there, having at least BF16 support and enough dot product instructions for FP16 multiplies to FP32 which is better than what the V100 has and parity with its MMA instructions, only losing with memory bandwidth. The main issue really as always is software, where ROCM for RDNA2 is not better than what is in CUDA right now and Linux only.
Anonymous No.106195536 [Report] >>106195554 >>106195595 >>106195620
>>106195241
They are underperforming. There is now pressure to be #1 on lmarena from Xi himself. R2/V4 MUST be a huge jump. So far the improvements are incremental at best.
Anonymous No.106195553 [Report]
>>106195526
>I'm somewhat impressed and horrified at what he is doing. I wouldn't really even consider GCN for anything ML due to lack of any datatypes outside of FP16 and any matmul ISA instructions. He's gambling, that's for sure.
>A better card if I was building now that is underrated that would be worth it more if SR-IOV drivers were available is the V620. Someone figured out the arcane Linux boot command line you need to boot your system with it (in case anyone runs into this, it's GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=realloc pci=noaer pcie_aspm=off iommu=pt") and if running headless, it's probably the best 32GB bang per buck card out there, having at least BF16 support and enough dot product instructions for FP16 multiplies to FP32 which is better than what the V100 has and parity with its MMA instructions, only losing with memory bandwidth. The main issue really as always is software, where ROCM for RDNA2 is not better than what is in CUDA right now and Linux only.
how do i get educated enough to understand every word in this post?
Anonymous No.106195554 [Report]
>>106195536
I get the impression that most of the pressure is coming from Liang Wenfeng's high standards rather than from the party.
Anonymous No.106195578 [Report]
>>106191538
>Recap bot mentioned dipsy
Ah, didn't notice. Nice.
> Original Character Do Not Steal
Not mine, actually. Do Not Care.
> AGP avatars from his HRT bathtub
lol
Anonymous No.106195595 [Report]
>>106195536
>So far the improvements are incremental at best
That's true of all LLMs though. I follow API model progress as much as local, and nothing has made me give a shit except when Gemini 2.5 Pro released, mainly because it's the first decent model with huge context. Even then, the more I use it the more the magic fades: it has strong "opinions" about code that I don't share and still makes tons of mistakes.
Anonymous No.106195602 [Report]
>>106194248
>>106194288
>GPT now calls you out on your rhetorical tricks
Neat
Anonymous No.106195608 [Report]
>>106195445
>235b is a different class. Most people can't obtain the compute to run 235b. But most people can obtain the compute to run 120b, if they want to and try to.
>consumer hardware like 4x3090. (Can be built for $5,000)
ye
Anonymous No.106195617 [Report] >>106195635 >>106195648 >>106195667
Actually GLM-4.5 Air kinda sucks: it's slop, it's too censored in its thinking, and if you force it not to think it's superslop.
Anonymous No.106195620 [Report]
>>106195536
All I need is for their next release to match current Gemini while still being as pliant as the current R1.
Anonymous No.106195626 [Report]
man is not a learning animal, or else man would know by now to avoid something called glm
Anonymous No.106195627 [Report] >>106195641
Anonymous No.106195635 [Report] >>106195667
>>106195617
I was about to say. It's actually pretty censored huh?
Sure, you can get around it, but it really likes to talk about ethics and stuff.
Anonymous No.106195641 [Report]
>>106195627
all these are too big and non american too run and 0528 is not years sir
Anonymous No.106195648 [Report]
>>106195617
You can edit its thinking and then continue. But it is kind of annoying, true.
Anonymous No.106195667 [Report]
>>106195617
>>106195635
https://files.catbox.moe/gjw3c3.json
see if it's still cucked with that
i've been rping on mikupad for hours (currently at 3224 tokens)
>your rig is that SHIT?
no im just multitasking and forcing myself to keep on roleplaying to see if it breaks after 4k context, i sometimes forget i have mikupad open :(
works very well with >>106194132
so far it hasn't stopped thinking ONCE
Anonymous No.106195704 [Report]
>>106195686
>>106195686
>>106195686
Anonymous No.106196060 [Report]
>>106195466
It's there to derive the .21: 4.9 is simply 5.9 minus 1. I was pointing out that the model "wrapped" the positive .21 over to negative .21 when going over the column.