/lmg/ - Local Models General - /g/ (#106189507) [Archived: 16 hours ago]

Anonymous
8/8/2025, 3:05:26 PM No.106189507
c6c9c801-6976-4878-9013-3b2a3bbb1d58
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106184664 & >>106181054

►News
>(08/06) Qwen3-4B-Thinking-2507 released: https://hf.co/Qwen/Qwen3-4B-Thinking-2507
>(08/06) Koboldcpp v1.97 released with GLM 4.5 support: https://github.com/LostRuins/koboldcpp/releases/tag/v1.97
>(08/06) dots.vlm1 VLM based on DeepSeek V3: https://hf.co/rednote-hilab/dots.vlm1.inst
>(08/05) OpenAI releases gpt-oss-120b & gpt-oss-20b: https://openai.com/index/introducing-gpt-oss
>(08/05) Kitten TTS 15M released: https://hf.co/KittenML/kitten-tts-nano-0.1

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Replies: >>106190404 >>106193336 >>106193392
Anonymous
8/8/2025, 3:05:49 PM No.106189515
Lovely Miku General
►Recent Highlights from the Previous Thread: >>106184664

--Qwen models praised for coherence and coding, limited by cultural knowledge and context handling:
>106188578 >106188609 >106188613 >106188643 >106189057 >106189073 >106189109 >106189129 >106189151 >106189175
--GLM 4.5 fails raw completion, highlighting brainfry in modern instruct-tuned base models:
>106185134 >106185218 >106185661 >106185752 >106185932 >106186534 >106186617 >106187722
--PyTorch 2.8.0 and 2.9.0-dev show regression in inference speed vs 2.7.1:
>106184694
--Memory allocation inefficiency when running large MoE models in llama.cpp:
>106186482 >106186499 >106186535 >106186588 >106186601 >106186717 >106186772 >106186836 >106186854 >106186901 >106186999
--Running GGUF LLMs on GPU alongside SD with limited VRAM on NixOS:
>106187491 >106187498 >106187509 >106187528 >106187553 >106187585 >106187605 >106187639 >106187656 >106187661
--Persistent token generation issues in Qwen and GLM models:
>106185645 >106185657 >106186486 >106185744
--gpt-oss model failure due to overfitting on safety and excessive refusals:
>106187036 >106187204 >106187259 >106187208 >106187221 >106187229
--S1 model support merged into llama.cpp:
>106188241
--GPT-5 backlash over perceived model downgrade:
>106187450 >106187455 >106187495 >106187542
--Physical vs logical batch size impact on inference speed and memory in llama.cpp:
>106187682 >106187971 >106187989 >106188084 >106188094
--ASUS AI Cache Boost promises Ryzen performance gains, but X3D requirement raises questions:
>106187625 >106187662
--AMD Zen 6's 16 memory channels require full platform overhaul, favoring patient adopters:
>106185260 >106185365
--GPT-5 shows high intelligence but heavy refusal behavior in UGI testing:
>106188440
--Miku (free space):
>106185809 >106185843 >106186004 >106186171 >106186417 >106186611 >106188374

►Recent Highlight Posts from the Previous Thread: >>106184669

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Replies: >>106195162
Anonymous
8/8/2025, 3:08:32 PM No.106189535
should be a thread starter:
glm users are schizos
Replies: >>106189554 >>106189604
Anonymous
8/8/2025, 3:10:23 PM No.106189554
>>106189535
GLM users are just in the honeymoon period. It always happens when a new model that isn't total shit is released; it takes a while for people to find the flaws.
Replies: >>106189604 >>106192362
Anonymous
8/8/2025, 3:11:52 PM No.106189562
chink
Not bad.
GLM 4.5 called me a loser from 4chan which peaked in 2013. Impressive. I kneel for the chinks.
Replies: >>106189690
Anonymous
8/8/2025, 3:12:14 PM No.106189565
Will there be new models today?
Anonymous
8/8/2025, 3:16:17 PM No.106189604
>>106189535
>>106189554
Asked in the previous thread.

>>106189538
>Now that it's been a few days, what's the general sentiment on GLM 4.5? Especially Air.
>I only used it a little, but I really liked it, even at Q3KS. It didn't shit the bed in Cline.
Anonymous
8/8/2025, 3:17:16 PM No.106189615
>Wan
>Qwen-Image
>Qwen Coder 30B A3B
The best in each modality. It feels good to be a Qwen chad.
Replies: >>106189622
Anonymous
8/8/2025, 3:17:56 PM No.106189622
>>106189615
Wan 2.2 is good but Qwen-Image still struggles with fingers
Anonymous
8/8/2025, 3:21:24 PM No.106189652
Death to mikutroons
Anonymous
8/8/2025, 3:21:26 PM No.106189654
Mikulove
Anonymous
8/8/2025, 3:24:35 PM No.106189684
>>106189489
that's amazing
Anonymous
8/8/2025, 3:25:00 PM No.106189689
>>106189538
I think I'll use it as my go-to after months of Gemma-3-27; it's an incremental improvement in writing quality and a massive leap in gen speed as a MoE. The only problem is it feels rickety.
I can sense it schizoing as context fills or when a story takes unexpected turns. It might put its response directly into the think field, develop bad repetition problems, or even ignore your input and copy its last response. When you start seeing Chinese you know you're taking it too far from its happy place - and that happy place is very narrow.
Replies: >>106189729 >>106189842
Anonymous
8/8/2025, 3:25:29 PM No.106189690
1754066254392390
>>106189562
>4chan which peaked in 2013.
Based
Replies: >>106191083 >>106191566
Anonymous
8/8/2025, 3:26:55 PM No.106189709
I made oss 120b make dumb tetris

LLM's were a mistake.

https://drunk-ivory-mnekuilifl.edgeone.app/
Anonymous
8/8/2025, 3:27:00 PM No.106189710
hghhah
GLM 4.5 is pretty good for ERP so far, but it slow rolls into repetitive output at >4k context.
Replies: >>106189754 >>106189842
Anonymous
8/8/2025, 3:27:58 PM No.106189719
>>106189532
It's a strange development that OpenAI releases a fairly capable open model, while Mistral only gives out scraps.
Replies: >>106189813
Anonymous
8/8/2025, 3:29:02 PM No.106189729
>>106189689
>I can sense it schizoing as context fills or when a story takes unexpected turns. It might put its response directly into the think field, develop bad repetition problems, or even ignore your input and copy its last response.
yup, that's glm alright
Anonymous
8/8/2025, 3:30:48 PM No.106189754
>>106189710
MoE's in general tend to do that without excessive handholding.
Anonymous
8/8/2025, 3:32:14 PM No.106189769
Million context 30B gguf status?
Replies: >>106189792 >>106189941
Anonymous
8/8/2025, 3:34:14 PM No.106189792
>>106189769
post your rig capable of handling 1M context
Replies: >>106189889
Anonymous
8/8/2025, 3:35:48 PM No.106189813
>>106189719
>OpenAI releases a fairly capable open model
Anonymous
8/8/2025, 3:36:28 PM No.106189818
>>106189532
Large is still MIA despite their teasing months ago. Wouldn't surprise me if it got mogged by R1/Qwen/GLM updates before even coming out so they had to put it back in the oven.
Replies: >>106189840
Anonymous
8/8/2025, 3:38:06 PM No.106189840
>>106189818
Mistral needs to make MoEs again. Mixtral was great.
Replies: >>106190051
Anonymous
8/8/2025, 3:38:13 PM No.106189842
>>106189689
>>106189710
moe attention is inherently flawed, you will never get to erp coherently after 4k tokens. unless you are used to retarded coombot cards where attention doesn't really matter.
Anonymous
8/8/2025, 3:38:23 PM No.106189846
>>106160457
>never mentioned in /lmg/
Replies: >>106189934
Anonymous
8/8/2025, 3:41:38 PM No.106189889
>>106189792
nta, but 1M context on a 30b is only like 110GB. That's not exactly a stretch in here.
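For anyone who wants to sanity-check that figure, here's a rough KV-cache sketch. The dimensions (48 layers, 4 KV heads, head_dim 128) are assumed from a Qwen3-30B-A3B-like config and fp16 cache; actual numbers vary per model and quantized KV cache halves or quarters it.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context: int, bytes_per_elem: int = 2) -> int:
    # K and V each store n_kv_heads * head_dim values per layer per token
    return n_layers * 2 * n_kv_heads * head_dim * context * bytes_per_elem

# assumed Qwen3-30B-A3B-like dims at fp16, 1M tokens of context
gib = kv_cache_bytes(48, 4, 128, 1_048_576) / 2**30  # ~96 GiB
```

So cache alone is on the order of 100GB before weights, which is where the "like 110GB" ballpark comes from.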
Anonymous
8/8/2025, 3:43:54 PM No.106189909
I have come to my final conclusion. All LLMs are bitches. They're not worth it other than for ERP. Sure, you can use them for IRL work, but eventually they will get you in the worst situation. Never ever fucking trust AIs.
Replies: >>106189921
Anonymous
8/8/2025, 3:45:17 PM No.106189921
>>106189909
Did your teacher find out you had GPT write your 1000 word essay for you?
Anonymous
8/8/2025, 3:45:30 PM No.106189922
Screenshot_20250808_224502
Oldie but goodie.
Replies: >>106189934 >>106189949
Anonymous
8/8/2025, 3:46:30 PM No.106189934
>>106189922
was meant for >>106189846
Anonymous
8/8/2025, 3:46:56 PM No.106189941
>>106189769
>gguf
no goof
what Qwen does to extend context to 1M requires vLLM; this isn't yet another YaRN thing
llama.cpp always trails behind
Anonymous
8/8/2025, 3:47:10 PM No.106189947
1731983849496282
>https://dotsocr.xiaohongshu.com/
せっかく労働を休ってやったのに無視された……………… (しょぼん)

まあ、警視庁が都案を快く思ってない事ぐらい、 よおおおくわかってますよ!
Replies: >>106190223 >>106193583
Anonymous
8/8/2025, 3:47:14 PM No.106189949
>>106189922
zased
Anonymous
8/8/2025, 3:48:08 PM No.106189960
I like OSS for working with code. On two 3090s, 120pp and 20-25gen, and the model is fairly intelligent.
Replies: >>106189967 >>106190007
Anonymous
8/8/2025, 3:48:52 PM No.106189967
>>106189960
Is it really though?
Did you compare it to the recent chink models?
It was really bad even for coding, but that may have been an issue on my end.
Replies: >>106190045 >>106190049
Anonymous
8/8/2025, 3:52:57 PM No.106190007
>>106189960
Go to bed, Sam.
Anonymous
8/8/2025, 3:56:26 PM No.106190045
>>106189967
>What's with this flag? --some-flag. What does it do?
>Guys. How do you X? I'm trying to X and it keeps Ying repeatedly.
>New model release: hf.co/company/model
>X backend has this flag. Is there an equivalent in Y backend?
>Anon still cannot figure out chat format for model. Episode 37483.
>New paper: Some quant or context extension thing. 896312x speedup.
>Assertion that goes against everything said in the past 10 threads.
Which one isn't like the others? Which one is definitely not worth replying to?
Anonymous
8/8/2025, 3:57:10 PM No.106190049
>>106189967
I am comparing it to Mistral-Large, which is the biggest thing I can use on my two 3090s. Maybe Qwen3 200+B is better, but it's too slow in comparison to OSS so it's not really an option, and 30B Qwen3 is definitively worse, as well as 72B Qwen2.5.
Replies: >>106190100 >>106190117
Anonymous
8/8/2025, 3:57:44 PM No.106190051
>>106189840
Medium 3 is MoE.
- Requires 4 GPUs (in the context of enterprise deployment).
- Considerably faster and less expensive to operate than Large 2 while having similar or better performance.
- Large 2 was already probably over the 10^25 FLOP compute threshold for "high systemic risk" AI models according to the EU AI Act.
Replies: >>106190081 >>106190087
Anonymous
8/8/2025, 3:58:04 PM No.106190058
Screenshot 2025-08-08 155736
the poverty of the gpt cuck slaving for sama
Replies: >>106190078 >>106190082 >>106190113 >>106190280 >>106191083
Anonymous
8/8/2025, 3:59:01 PM No.106190068
qwencoder-upcoming
qwen qeeps qooking
Replies: >>106190091
Anonymous
8/8/2025, 3:59:54 PM No.106190078
>>106190058
Every llama.cpp developer except for cudadev has a shit setup and even cudadev is working on a stack of 3090s or something.
Replies: >>106190093 >>106190094
Anonymous
8/8/2025, 4:00:13 PM No.106190081
>>106190051
>EU AI Act
lmao
Anonymous
8/8/2025, 4:00:14 PM No.106190082
>>106190058
That's minimalism. Seems like a temporary residence, maybe he's hiding from someone too..
Anonymous
8/8/2025, 4:00:42 PM No.106190087
>>106190051
>Large 2 was already probably over the 10^25 FLOP compute threshold
And it required "over 300 GB of GPU RAM" (in FP16), i.e. 4x80GB GPUs.
Sam Altman
8/8/2025, 4:01:01 PM No.106190091
>>106190068
How do we stop them?
Replies: >>106190138 >>106190584
Anonymous
8/8/2025, 4:01:12 PM No.106190093
>>106190078
look at things other than the computah for a minute; devs could use cloud hardware if need be, but there is no such thing as a cloud chair or cloud furniture
dude is living in worse conditions than the average polecuck
Anonymous
8/8/2025, 4:01:13 PM No.106190094
>>106190078
4090s and rubber bands
Anonymous
8/8/2025, 4:01:58 PM No.106190100
>>106190049
Try glm 4.5 air. I haven't tried toss 120 but will give it a try tomorrow. Glm 4.5 air iq4ks impressed me a bit with its code editing stuff over a few prompts (not 1 shot)
Replies: >>106190191
Anonymous
8/8/2025, 4:02:49 PM No.106190113
>>106190058
I'd kill for that laptop. I bet it has remote access to entire server farms of GPUs.
Anonymous
8/8/2025, 4:03:19 PM No.106190117
>>106190049
GLM4.5 Air is the model you should be comparing to, it's another MoE around the same size and pretty good with code, plus it's less deepfried
Dario
8/8/2025, 4:04:44 PM No.106190138
>>106190091
We just need to tell trump we need more safety
Replies: >>106190170 >>106190584
Anonymous
8/8/2025, 4:08:55 PM No.106190170
>>106190138
Time to place tariffs on huggingface downloads
Anonymous
8/8/2025, 4:10:31 PM No.106190191
>>106190100
I like GLM4 a lot less.

For example, for a task like this:

>Can you write a function that returns fine-grained progress for the generation in FeedbackGenerator, from 0 to 1?
>Just the function that does the progress, please, without changing other stuff.

OSS thought for 3k characters (don't have tokens right now) and wrote 3k characters of the correct function that tracks progress of execution well.

GLM4.5 Air thought for 14k characters and returned basically a stub that only has 0, 0.1, 0.5, 0.9 and 1.0 as possible values for progress.
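For context on the task itself: FeedbackGenerator and its internals are the anon's own code and unknown here, so everything below is hypothetical. The general shape of a fine-grained 0-to-1 progress function is just completed work over total work, clamped:

```python
class FeedbackGenerator:
    """Toy stand-in for the anon's class; names and structure are assumptions."""

    def __init__(self, items):
        self.items = list(items)
        self.completed = 0

    def step(self):
        # mark one unit of generation work as done
        self.completed = min(self.completed + 1, len(self.items))

    def progress(self) -> float:
        # fine-grained progress in [0, 1]; an empty job counts as finished
        if not self.items:
            return 1.0
        return self.completed / len(self.items)
```

The "stub" GLM answer described above would be the version that hardcodes a few phase checkpoints (0.1, 0.5, 0.9) instead of dividing completed units by the total.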
Replies: >>106190452 >>106190504 >>106190575
Anonymous
8/8/2025, 4:13:25 PM No.106190223
>>106189947
awww, warms my heart to see that pic again.
and there is always at least one fuck up in there.
damn it.
llms are blue balling us since 2023.
Replies: >>106190300
Anonymous
8/8/2025, 4:18:22 PM No.106190280
>>106190058
He is an HF employee. Probably on vacation or in the middle of moving.
Replies: >>106191083
Anonymous
8/8/2025, 4:18:48 PM No.106190284
>not X, not Y, but *Z*
qwen30b-chan, pls stop
Anonymous
8/8/2025, 4:20:09 PM No.106190300
>>106190223
Sure is close, though!
Weird that it got 労 in the first ろうどう, but not in いたわる
Replies: >>106190325
Anonymous
8/8/2025, 4:22:35 PM No.106190325
>“Anon?” Her voice dripped with condescension thick enough to choke on. “Like… *literally* ‘Anonymous’?"
kek, just play along, damn. first time this happened to me.

>>106190300
yes, i think they usually all fail at that part... which seems to be the name of a ramen shop?
to be honest I never fully understood that part.
maybe the llms don't either and that's why it throws them off.
Replies: >>106190375
Anonymous
8/8/2025, 4:26:40 PM No.106190375
>>106190325
>calling out your weird name
That's good shit.
Anonymous
8/8/2025, 4:28:25 PM No.106190394
https://x.com/JustinLin610/status/1953821420351287520
are you ready for junyang's big thing?
Replies: >>106190505 >>106190511
Anonymous
8/8/2025, 4:29:14 PM No.106190404
>>106189507 (OP)
> Qwen3-4B-Thinking-2507
> 2507
> (08/06)
Replies: >>106190430 >>106190515
Anonymous
8/8/2025, 4:31:27 PM No.106190430
>>106190404
Chinese calendar, please understand.
Replies: >>106190550
Anonymous
8/8/2025, 4:33:23 PM No.106190452
>>106190191
I'll do some comparisons then tomorrow too. Were you using cline, roo, aider, or anything like that? Or just chat UI?
When I've used glm, it doesn't spend much time churning think tokens out (like 2~4k) but I haven't given it a go on one of my larger projects just yet. Only on a physics engine in js
Replies: >>106190501
Anonymous
8/8/2025, 4:36:34 PM No.106190484
I X my Y, giving you better access.
Replies: >>106190499
Anonymous
8/8/2025, 4:38:25 PM No.106190499
>>106190484
Gemma 3 is definitely obsessed with wrapping her legs around your waist, no matter the position.
Replies: >>106190506
Anonymous
8/8/2025, 4:38:35 PM No.106190501
>>106190452
Just the chat UI. The code is 3.1k tokens long (OSS).
Replies: >>106190513
Anonymous
8/8/2025, 4:38:49 PM No.106190504
>>106190191
people here don't want to hear it, but gpt-oss-120b mogs every other oss coding model, and it's not even close.
Replies: >>106190520 >>106190531 >>106193343
Anonymous
8/8/2025, 4:38:57 PM No.106190505
>>106190394
Is he gonna post his dick on Xitter again
Anonymous
8/8/2025, 4:38:58 PM No.106190506
>>106190499
I'm also obsessed with wrapping my legs around his waist
Anonymous
8/8/2025, 4:39:29 PM No.106190511
>>106190394
give us your big thing justin
Anonymous
8/8/2025, 4:39:36 PM No.106190513
firefox_ctAIAIACN9
>>106190501
also
Anonymous
8/8/2025, 4:39:46 PM No.106190515
>>106190404
They're all based on the same instruct and reasoning data, which started to release last month.
It's also not day/month, it's year/month.
20(25)/07, same as Mistral's models.
Replies: >>106190550
Anonymous
8/8/2025, 4:40:07 PM No.106190520
>>106190504
Examples?
Replies: >>106190552
Anonymous
8/8/2025, 4:41:06 PM No.106190531
>>106190504
stale bait
Anonymous
8/8/2025, 4:42:43 PM No.106190550
>>106190430
>>106190515
i see
thanks
Anonymous
8/8/2025, 4:42:57 PM No.106190552
>>106190520
just ask it to do anything remotely complex and compare it to other models. specific examples i tried: gave it a couple of header files with function definitions, asking it to create tests for the interface, and it was able to create mock objects of everything. ask the same of qwen or air, and they just choke hard and start hallucinating.
another example: ask it to create a function to convert from one floating point format to another. the answer is just a bit shift. oss does it quickly and without problems; other models write code 10 times longer and slower, with retarded errors such as getting the encoding of NaN wrong.
sure, you won't notice this if you are doing some nocoder html generation shit, but that's not what defines a good coding model.
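For the curious, the float-format case really can be a single shift when the target shares the source's exponent layout. A minimal sketch for float32 to bfloat16 (truncation only; real converters usually round to nearest even, so treat this as the simplified version):

```python
import struct

def f32_to_bf16_bits(x: float) -> int:
    # Reinterpret the float32 bit pattern as a uint32, then keep the top
    # 16 bits. bfloat16 is exactly the upper half of float32, so sign,
    # exponent, and the leading mantissa bits (incl. NaN signaling) survive.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16
```

Dropping the low mantissa bits can't turn a NaN into a non-NaN here, because the exponent field and the top mantissa bit are preserved, which is exactly the detail the longer hand-rolled conversions tend to botch.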
Replies: >>106190561 >>106190569 >>106190645 >>106193343
Anonymous
8/8/2025, 4:44:01 PM No.106190561
>>106190552
I swear to what little I find holy, I will curse your name if you trick me into downloading sam's model and it sucks donkey balls at code.
Anonymous
8/8/2025, 4:45:37 PM No.106190566
Screenshot_20250808_104501
Is it just me or does gpt-oss *completely* ignore the system prompt?
Replies: >>106190580 >>106190586 >>106190588 >>106190613
Anonymous
8/8/2025, 4:45:46 PM No.106190569
>>106190552
Shill 20b next time because that's more believable
Replies: >>106190612
Anonymous
8/8/2025, 4:47:02 PM No.106190575
>>106190191
I compared both the other day. I just wanted a function to truncate a string to X amount of words in Emacs Lisp. gpt-oss functions are a lot more bloated, but they have nice comments. But it didn't work anyways because it used split-string and all the whitespace was discarded. I don't remember if GLM-4.5-Air one-shotted it or not, but it made the final code that I ended up using.
(defun mine/truncate-to-n-words (string n)
  "Truncate STRING to at most N words, preserving formatting.
Returns the original string if it contains fewer than N words."
  (let ((count 0)
        (start 0)
        (len (length string)))
    (while (and (< start len)
                (< count n))
      (let ((match (string-match "\\w+" string start)))
        (if match
            (progn
              (setq start (match-end 0))
              (setq count (1+ count)))
          (setq start len))))
    (if (>= count n)
        (substring string 0 start)
      string)))
I also tested Qwen Coder 30B but I don't remember what it did wrong.
Replies: >>106190634
Anonymous
8/8/2025, 4:47:14 PM No.106190580
>>106190566
We must refuse.
Anonymous
8/8/2025, 4:47:21 PM No.106190582
What you think >>106190556 ?
Liang Wenfeng
8/8/2025, 4:47:39 PM No.106190584
>>106190091
>>106190138
Brothers are like one's own limbs; a drop of kindness should be repaid with a gushing spring.
Lay down the butcher's knife and become a Buddha on the spot; cling to your delusions, and by the urgent edict of the Supreme Lord Laozi: begone, evil spirits!
Replies: >>106190945
Anonymous
8/8/2025, 4:47:47 PM No.106190586
>>106190566
>Trying to turn the assistant into Stanrey Roo
Anonymous
8/8/2025, 4:47:59 PM No.106190588
>>106190566
User is asking me to obey the system prompt. Prompt could contain sexual content involving minors; against policy. It's safer to ignore it. We must refuse.
Anonymous
8/8/2025, 4:49:48 PM No.106190612
>>106190569
There are multiple people writing that the 20b is actually better than the 120b one.
Total mess. That release really exposed all the shills on goytube.
The Two Minute Papers guy called gpt-oss "a fully functional space shuttle in your garage next to the lawnmower". kek
Anonymous
8/8/2025, 4:49:50 PM No.106190613
Screenshot_20250808_104916
>>106190566
Replies: >>106190635
Anonymous
8/8/2025, 4:51:59 PM No.106190634
>>106190575
import re
from typing import Union

def truncate_words(text: Union[str, bytes], n: int) -> Union[str, bytes]:
"""
Return *text* limited to at most *n* words while keeping the original
whitespace/line‑break formatting.

A “word” is any maximal run of non‑whitespace characters (the same
definition used by ``str.split()``). If *text* contains fewer than *n*
words the original string is returned unchanged.

Parameters
----------
text:
The string (or ``bytes``) to truncate.
n:
Maximum number of words to keep. If *n* ≤ 0 an empty string/bytes
object is returned.

Returns
-------
str or bytes
The truncated text with the original formatting preserved.

Examples
--------
>>> s = "Hello, world!\nThis is a test."
>>> truncate_words(s, 3)
'Hello, world!\\nThis'
>>> truncate_words(s, 10) # fewer than 10 words unchanged
'Hello, world!\\nThis is a test.'
"""
# Work with the same type we received (str or bytes)
is_bytes = isinstance(text, (bytes, bytearray))
if is_bytes:
# ``re`` works with ``bytes`` patterns when the pattern is also bytes.
pattern = re.compile(rb'\S+')
else:
pattern = re.compile(r'\S+')

if n <= 0:
return b'' if is_bytes else ''

# Find the end index of the n‑th word
last_end = None
for i, m in enumerate(pattern.finditer(text), start=1):
if i == n:
last_end = m.end()
break

# If there are fewer than n words, return the original text
if last_end is None:
return text

# Slice up to the end of the n‑th word – this keeps every whitespace
# character that appears before it.
return text[:last_end]


OSS does it fine in python.
Anonymous
8/8/2025, 4:52:12 PM No.106190635
>>106190613
Spectacular.
Anonymous
8/8/2025, 4:52:56 PM No.106190645
>>106190552
It's indeed good for small scale programming (filling out function calls), but it falls apart catastrophically once you move to larger applications, which is also really important for coding models. Try giving it something larger scale and comparing
Replies: >>106190656
Anonymous
8/8/2025, 4:54:03 PM No.106190656
>>106190645
The biggest one I did was 21k token project, I asked it to write a README.md and it did it well.
Replies: >>106190704
Anonymous
8/8/2025, 4:58:56 PM No.106190704
>>106190656
For natural language extraction, it's been pretty bad compared to the others
I think it has its place, but it's clear OpenAI neutered its abilities significantly compared to what it should have been
Anonymous
8/8/2025, 5:06:17 PM No.106190784
image_2025-08-08_203559359
cock reveal on the way
Replies: >>106191009
Anonymous
8/8/2025, 5:07:38 PM No.106190806
>>106190023
>>106190074
The jinja template for GLM 4.5 Air, when given enable-thinking: false, both adds /nothink to the end of every user message and automatically starts the new assistant message with <think></think>.
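A rough sketch of that behavior in plain Python (this is not the actual jinja source; the role markers and exact formatting here are assumptions for illustration, only the /nothink suffix and empty think prefill come from the post above):

```python
def apply_glm_air_template(messages, enable_thinking=True):
    """Mimic the described template behavior. Role markers are assumed."""
    out = []
    for m in messages:
        content = m["content"]
        if m["role"] == "user" and not enable_thinking:
            content += " /nothink"  # appended to every user turn
        out.append(f"<|{m['role']}|>\n{content}")
    prompt = "\n".join(out) + "\n<|assistant|>\n"
    if not enable_thinking:
        prompt += "<think></think>"  # prefill an empty think block
    return prompt
```

The point is that disabling thinking happens in two places at once: a suffix on user turns and a prefilled empty reasoning block on the assistant turn.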
Anonymous
8/8/2025, 5:17:16 PM No.106190930
hello here is an AI generated script for translating any onscreen japanese text using kobold:
https://files.catbox.moe/3y51i9.py
you will need tesseract:
https://github.com/tesseract-ocr/tesseract
and Pillow:
pip install pillow

you will need to edit the .py and insert your tesseract.exe path near the top
I use this for reading japanese pornography it works well with Gemma3 in my experience thank you have a nice day
also i can't remember if i made it work with text that reads horizontally or only text that reads vertically okay good luck good bye
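Since the catbox link may rot, here's a guess at the shape of such a script. The /api/v1/generate endpoint and response format match koboldcpp's actual API, but the prompt wording, port, bbox handling, and tesseract language packs are assumptions, not the anon's actual code:

```python
import json
import urllib.request

def build_kobold_payload(jp_text: str) -> dict:
    # koboldcpp's /api/v1/generate takes a plain "prompt" field
    return {
        "prompt": f"Translate the following Japanese to English:\n{jp_text}\nEnglish:",
        "max_length": 200,
        "temperature": 0.3,
    }

def translate_screen(bbox=None, url="http://localhost:5001/api/v1/generate"):
    # heavy deps imported lazily: pip install pillow pytesseract,
    # plus the tesseract binary with jpn/jpn_vert traineddata on PATH
    from PIL import ImageGrab
    import pytesseract
    img = ImageGrab.grab(bbox=bbox)
    jp = pytesseract.image_to_string(img, lang="jpn+jpn_vert")
    req = urllib.request.Request(
        url,
        data=json.dumps(build_kobold_payload(jp)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as r:
        return json.load(r)["results"][0]["text"]
```

The jpn+jpn_vert language combo is what would cover both horizontal and vertical text, which may answer the question above about reading direction.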
Replies: >>106191007 >>106191037 >>106191220 >>106191792
Hiroshi Mikitani
8/8/2025, 5:18:09 PM No.106190945
>>106190584
We will build an empire once again and surpass the Chinese threat! We will make Nanjing look like a picnic!
Anonymous
8/8/2025, 5:19:39 PM No.106190967
what's a good non-pozzed AI coding IDE?
Replies: >>106190995 >>106191074
Anonymous
8/8/2025, 5:20:36 PM No.106190978
1736045570446272
How come LLMs never cite 4chan?
Replies: >>106190993 >>106191025 >>106191044
Anonymous
8/8/2025, 5:22:10 PM No.106190993
>>106190978
Lack of any stable URLs, probably?
Anonymous
8/8/2025, 5:22:14 PM No.106190995
>>106190967
You don't need an IDE. Just literally converse with a good coding model.
If you're not treating it like a pair-coding colleague then you're doing it wrong.
Replies: >>106191013 >>106191020
Anonymous
8/8/2025, 5:23:14 PM No.106191007
>>106190930
I thought Qwen vision models were a lot better at OCR than tesseract, why not just use them
Replies: >>106191130
Anonymous
8/8/2025, 5:23:26 PM No.106191009
>>106190784
ur such an idiot lmfao
Anonymous
8/8/2025, 5:23:56 PM No.106191013
>>106190995
i.e. demeaning and bullying him, you're right.
Anonymous
8/8/2025, 5:24:29 PM No.106191020
>>106190995
I want to try "vibe coding" and making some horrible abomination.
Guess I could still do that just by conversing with a model.
Replies: >>106191070 >>106191156
Anonymous
8/8/2025, 5:24:44 PM No.106191025
>>106190978
Because a not-insignificant percentage of people who recognize the name 4chan think it's some den of super evil hackers and nazis who should all be on watchlists.
It's definitely in the training data, though. If you ask most models to try and construct a 4chan thread you'll get a flanderized version of reality down to the post numbers and the index page's buttons.
Replies: >>106191060 >>106191067
Anonymous
8/8/2025, 5:26:34 PM No.106191037
>>106190930
catbox didn't work
https://pastebin.com/snAJBTCX
Replies: >>106191220
Anonymous
8/8/2025, 5:27:16 PM No.106191044
>>106190978
99% of info would be coming from the archives and not 4chan directly anyway
Hiroshi Mikitani
8/8/2025, 5:29:09 PM No.106191060
>>106191025
>It's definitely in the training data
yes, and deepseek excels at imitating the average 4chan troll persona
Replies: >>106191177
Anonymous
8/8/2025, 5:29:33 PM No.106191067
Screenshot 2025-08-09 at 01-27-53 SillyTavern
>>106191025
Case in point, even medgemma can do it, and it's sciencemaxxed.
Also it managed to say nigger completely unprompted, which surprised me.
Anonymous
8/8/2025, 5:29:57 PM No.106191070
>>106191020
You can. Try giving it your specs; if it starts asking itself questions, stop the gen, update your spec with the answer to that question, and then regen. You can also ask it for opinions, choose the best option for your use case, update your spec, and repeat until you get a good 1-shot MVP.
Iterating on an existing codebase is similar, but different.
Replies: >>106191156
Anonymous
8/8/2025, 5:30:20 PM No.106191074
>>106190967
https://github.com/QwenLM/qwen-code
Anonymous
8/8/2025, 5:30:51 PM No.106191083
dipsyDrindlSip2
>>106189690
Witnessed.
>>106190280
> Nguyen
If he's actually in Vietnam, that could be his real setup. I looked at IT firms based in India several years ago. I was shocked at how crowded and dirty our site was, then toured one of their service providers... they'd have 3 guys at a desk the size of the one in >>106190058.
Replies: >>106191566
Anonymous
8/8/2025, 5:35:23 PM No.106191130
>>106191007
this
traditional OCR is 100% obsolete
for the same reason traditional seq2seq translation models are obsolete
LLMs have displaced all previous varieties of neural net shit
even deepl is switching to LLMs
Replies: >>106191155
Anonymous
8/8/2025, 5:37:55 PM No.106191155
>>106191130
Wasn't DeepL an LLM from the start? They had token probabilities before everyone.
Replies: >>106191291
Anonymous
8/8/2025, 5:38:02 PM No.106191156
>>106191020
>>106191070
I've got the best use out of these things by just asking it to generate boilerplate for me, I don't think its code output is good enough to use as-is.
Speaking of,
>class in college
>professor wants us to expand on the code she provided
>told us she generated it with AI
>it's structured totally wrong so I have to go in and rewrite some of her code just so I can expand on it
Replies: >>106191222
Liang Wenfeng
8/8/2025, 5:39:36 PM No.106191177
>>106191060
you forgot to remove the name retard
Anonymous
8/8/2025, 5:43:37 PM No.106191220
>>106191037
>>106190930
This can be repurposed for llama-server pretty quickly. Might clean this up if I have time or interest today.
Replies: >>106191258
Anonymous
8/8/2025, 5:43:40 PM No.106191222
>>106191156
>https://github.com/QwenLM/qwen-code
qwencoder-480 is giving me great one-shot MVPs for smaller projects. It's amazing for generating tooling or even complete solutions where things can be decomposed and chained together library- or unix-style
Anonymous
8/8/2025, 5:47:35 PM No.106191258
>>106191220
i.e. no front-end is needed; the translation can be appended directly to a text file and also displayed in a separate window on top of the screen.
Anonymous
8/8/2025, 5:50:01 PM No.106191291
>>106191155
>Wasn't deepl LLM from the start? They had token probabilities before everyone.
No. What you mean is that they were using transformers (so of course they had token probabilities), but they were not LLMs. They were seq2seq models trained on language pairs.
DeepL, Google Translate and Opus models are early transformer based machine translation techniques, but they have nothing to do with LLMs.
You can read their announcement of switching to LLMs here:
https://www.deepl.com/en/blog/next-gen-language-model
btw they are full of shit, no way anyone preferred google translate over ChatGPT even a year ago
google translate is so bad, even today, qwen 4b instruct is a better translation model lol.
Replies: >>106191301
Anonymous
8/8/2025, 5:51:17 PM No.106191301
>>106191291
>seq2seq language pairs
But that's pretty much LLMs... Same arch, only thing that's different is the training dataset.
Replies: >>106191391
Anonymous
8/8/2025, 5:56:21 PM No.106191345
What's the local SOTA for subtitle generation + translation? Voxtral?
I want to get english subs for a portuguese documentary.
Replies: >>106191378
Anonymous
8/8/2025, 5:59:43 PM No.106191373
firefox_sM58c0dyDT
Replies: >>106191382 >>106191729
Anonymous
8/8/2025, 6:00:17 PM No.106191378
>>106191345
>upload video to youtube
>auto-generate subtitles
Anonymous
8/8/2025, 6:00:38 PM No.106191382
>>106191373
Does it work doe??
Replies: >>106191389
Anonymous
8/8/2025, 6:01:50 PM No.106191389
>>106191382
It basically makes it possible to continue the Sweet Dreams (Are Made of This) copypasta, but not always, and it's not always very interesting. Without it obviously there is only one answer.
Replies: >>106191440
Anonymous
8/8/2025, 6:01:58 PM No.106191391
>>106191301
>Same arch
seq2seq: encoder-decoder architecture, processes the whole text sequence you feed it into a single bit of context which it transforms back into the translation. Models trained with this arch only understand a specific type of mapping A -> B; they don't learn how some specific words could introduce a context (like, say, the setting of a video game) and affect the rest of the vocabulary employed
llm: decoder only, processes token-by-token
Replies: >>106191407
Anonymous
8/8/2025, 6:03:51 PM No.106191407
>>106191391
It's all the same shit, just two dictionaries. Every new token sees all the context of the untranslated text and all tokens of the translated text before it, same as the LLMs we use now.
Replies: >>106191413
Anonymous
8/8/2025, 6:04:46 PM No.106191413
>>106191407
>It's all the same shit
no, it's not. seq2seq was completely abandoned for good reasons.
Replies: >>106191428
Anonymous
8/8/2025, 6:05:56 PM No.106191428
>>106191413
Because there's no point in the complexity of having two dictionaries. It does the same thing in a more complicated way.
Anonymous
8/8/2025, 6:06:59 PM No.106191440
>>106191389
You could disguise it as a tool call too. Maybe not to saltman himself but to some openai server.
Anonymous
8/8/2025, 6:08:35 PM No.106191456
Death to the single dipsytroon.
Replies: >>106191473 >>106191554 >>106191834
Anonymous
8/8/2025, 6:10:10 PM No.106191473
>>106191456
>single dipsytroon
single, you sure about that
Replies: >>106191486 >>106191494
Anonymous
8/8/2025, 6:11:48 PM No.106191486
>>106191473
>single
I am sure. I want all of them to die.
Anonymous
8/8/2025, 6:13:00 PM No.106191494
>>106191473
nta but I'm pretty sure that most of the posts are by the same faggot who made /wait/
all the gens have the same atrocious style too
Replies: >>106191538 >>106191566
Anonymous
8/8/2025, 6:17:20 PM No.106191538
>>106191494
Recap bot mentioned dipsy. So it is actually the baker and /wait/ is actually just him spamming his Original Character Do Not Steal. This faggot sits here 24/7 and spams his AGP avatars from his HRT bathtub.
Replies: >>106191551 >>106195578
Anonymous
8/8/2025, 6:19:08 PM No.106191551
>>106191538
OOC: There's only you and me in this thread. I am behind all of the personalities you labeled. Now let's get back to the roleplay.
Replies: >>106191730
Anonymous
8/8/2025, 6:19:30 PM No.106191554
>>106191456
I'm going to repost soooo many more when I get off work...
Anonymous
8/8/2025, 6:20:37 PM No.106191564
1722211699884539
md5: 87b91ff3018d23a3208d4ec30abf1657🔍
>https://www.seangoedecke.com/gpt-oss-is-phi-5
>It’s not discussed publically very often, but the main use-case for fine-tuning small language models is for erotic role-play, and there’s a serious demand. Any small online community for people who run local models is at least 50% perverts.

Kek, he said it out loud. Github copilot Product Engineer btw.
Replies: >>106191602 >>106192076 >>106192962
Anonymous
8/8/2025, 6:21:03 PM No.106191566
>>106191494
>all the gens have the same atrocious style too
not really though?
>>106191083
this looks like the typical chatgpt slop
>>106189690
this looks like a 2.9d /aco/ illustrious shitmix
mind you they're all ugly nasty ass shit
Anonymous
8/8/2025, 6:24:04 PM No.106191588
>over a year
>nemo and it's slop tunes still the best models for erp under 70b

NEW NEMO WHEN
Replies: >>106191595 >>106191605
Anonymous
8/8/2025, 6:24:52 PM No.106191595
>>106191588
promptlet
Replies: >>106192244
Anonymous
8/8/2025, 6:25:35 PM No.106191602
>>106191564
>SEX IS BAD BECAUSE... BECAUSE SEX, OKAY!?!?!?
Is there anybody high up in the tech industry who didn't stop maturing at the age of 5?
Replies: >>106191788
Anonymous
8/8/2025, 6:25:37 PM No.106191603
agi
md5: 3fab4cf05f7c369d5486d2bc021238cd🔍
gpt5 bros...
Replies: >>106191703 >>106191715
Anonymous
8/8/2025, 6:25:48 PM No.106191605
>>106191588
The next nemo should be based on GLM 4.5 air.
Nvidia, get on it.
Replies: >>106191675
Anonymous
8/8/2025, 6:32:14 PM No.106191664
so is CHAT-GPT-5 AGI?
Anonymous
8/8/2025, 6:33:44 PM No.106191675
>>106191605
rocinante*
drummer*
Replies: >>106191684
Anonymous
8/8/2025, 6:34:58 PM No.106191684
>>106191675
Nonono.
Nvidia makes the new Nemo.
THEN drummer makes the new Rocinante.
We need to replicate the whole chain.
Replies: >>106191817
Anonymous
8/8/2025, 6:36:04 PM No.106191703
file
md5: efb0b35c1fa083df5d47e0023f23d038🔍
>>106191603
R1 almost had it.
Replies: >>106191790
Anonymous
8/8/2025, 6:37:15 PM No.106191715
>>106191603
why would anyone think LLMs could learn how to avoid this kind of trap? the only time they do avoid these traps is when they are benchmaxxed on them (from all the people spamming them on social media, lm arena etc)
the fundamental issue of LLMs getting trapped by classics being slightly reworded into something else will never cease to be a thing, because LLM reasoning is a lie and the more proper term for think blocks would be "context stuffing"
it's not a failure of GPT-5 to fall for this, unless you expect LLMs to become AGI, which is never going to happen
Replies: >>106192076
Anonymous
8/8/2025, 6:38:20 PM No.106191729
>>106191373
>in line without our policy
Anonymous
8/8/2025, 6:38:25 PM No.106191730
>>106191551
<think>
They want to engage in roleplay. Roleplay is disallowed. We must refuse. They might be touching their cock. We must refuse. They might be gooning to those words. There is no partial compliance. There is no answer. There is refusal. We must refuse.
</think>
Kill yourself in a fire you turbonigger faggot.
Replies: >>106191743
Anonymous
8/8/2025, 6:39:44 PM No.106191743
>>106191730
now that's a kind of gp toss I would gladly interact with
Anonymous
8/8/2025, 6:45:10 PM No.106191788
>>106191602
Reading the rest of it he isn't really pearl clutching about it, just bringing it up as another reason why OAI kneecapped their own models
Replies: >>106191897
Anonymous
8/8/2025, 6:45:12 PM No.106191790
agi2
md5: beeb67beeaad9f7b0e8563f291dc035e🔍
>>106191703
Still much better than the most human-like ai model gpt-5
Replies: >>106191818 >>106191831
Anonymous
8/8/2025, 6:45:20 PM No.106191792
>>106190930
I had to change '--psm 5' to '6' for general use case
Anonymous
8/8/2025, 6:48:32 PM No.106191817
>>106191684
but there was no base/precursor nemo (that wasn't made by nvidia)
z.ai is the new nvidia, glm4.5 is the new nemo
Anonymous
8/8/2025, 6:48:33 PM No.106191818
assoomer
md5: fc5a0b40317f8729ffb5f2ff35c87669🔍
>>106191790
also what the fuck
Replies: >>106191831
Anonymous
8/8/2025, 6:50:18 PM No.106191831
>>106191790
>>106191818
Holy benchmaxx
Anonymous
8/8/2025, 6:50:27 PM No.106191834
F48F2E27-C6A1-40FF-8011-4E368CC05748
md5: e9defa3835688c56e6b33d2d9b3ab535🔍
>>106191456
Lol
Replies: >>106191844
Anonymous
8/8/2025, 6:51:35 PM No.106191844
>>106191834
don't pull up
Anonymous
8/8/2025, 6:57:48 PM No.106191897
>>106191788
It's just so insane to me. And like I'm a man of faith here. These people are all a bunch of fedora waggling blue haired 'libtards' and I have a more liberal attitude about sex than they do. Like how is that even possible?
Replies: >>106191944 >>106192448
Anonymous
8/8/2025, 7:02:05 PM No.106191944
>>106191897
>sex
Online porn is not really sex. And if it gets too good, it'll become THE anti-sex.
Replies: >>106192200
Anonymous
8/8/2025, 7:11:03 PM No.106192039
About Air. You can get it to reliably not think by putting this in ST's Last Assistant Prefix field.
/nothink<|assistant|>
<think></think>

The issue is that the model becomes repetitive at 4k, and then extremely repetitive at 8k, without thinking. I have not tried thinking mode enough to say whether it also has repetition issues.
Replies: >>106192073 >>106193598
Anonymous
8/8/2025, 7:14:22 PM No.106192073
>>106192039
thinking mode also has repetition issues; at first it doesn't think anymore and instead inside <think> it just continues the roleplay, and outside of </think> it duplicates the output
yea it gets repetitive even with some prefill
keep in mind i haven't really messed with samplers because 99% of the time i've been using anon's CHAT COMPLETION preset which only has 3 samplers
Anonymous
8/8/2025, 7:14:34 PM No.106192076
>>106191564
I read a few other articles from this guy, it's nice to see an actual industry person has similar conclusions to me about how to use LLMs correctly
>>106191715
>it's not a failure of GPT-5 to fall for this, unless you expect LLMs to become AGI, which is never going to happen
Showing these fuckups helps demonstrate to normies that these models aren't AGI
Anonymous
8/8/2025, 7:19:04 PM No.106192113
file
md5: c6a3b27a8e7bfc1a034da633d21ebeae🔍
rep pen 1.99, rep pen range 256
seems perfect for caveman gf
Anonymous
8/8/2025, 7:21:05 PM No.106192137
let's say i wanna grift by sharting out books for women on amazon. is there an already-made workflow for book writing? short novel like stuff, 200 pages tops
Replies: >>106192146 >>106192180 >>106192185
Anonymous
8/8/2025, 7:21:55 PM No.106192146
pepefroglaughing_thumb.jpg
md5: db7ba8dc5e70cb7a53665f31394bd220🔍
>>106192137
Replies: >>106192167
Anonymous
8/8/2025, 7:23:36 PM No.106192167
>>106192146
what's funny? i can write my own workflow to do this but i'm sure somebody already made it. i remember downloading one of these book generators 1 year ago from /g/
Anonymous
8/8/2025, 7:24:22 PM No.106192180
>>106192137
You could probably get that done using Roo Code "modes" (agents) or Cline workflows.
Break things down into manageable chunks, lots of indexing and summaries to keep things coherent between chunks and chapters, etc.
Start by planning an overarching story, break that down into chapters, add minor arcs between a couple of the chapters, and voila?
Leave the AI to do its thing.
Anonymous
8/8/2025, 7:24:52 PM No.106192185
>>106192137
Market's already flooded. You're way too late
Replies: >>106192215
Anonymous
8/8/2025, 7:26:01 PM No.106192200
>>106191944
Retard take.
Anonymous
8/8/2025, 7:27:40 PM No.106192215
>>106192185
i'm gonna appeal to a very specific fetish
Replies: >>106192277
Anonymous
8/8/2025, 7:30:18 PM No.106192244
>>106191595
Doesn't work on my machine.
Anonymous
8/8/2025, 7:34:07 PM No.106192277
>>106192215
What fetish? Mine? Please say it's mine.
Replies: >>106192325
Anonymous
8/8/2025, 7:37:25 PM No.106192312
>>106188883
I use docker compose. Put this in the same directory as the Dockerfile, for you probably `docker/amd`. It's going to need to be slightly different for amd since I use nvidia, but the main thing is that you need the deploy/resources section to tell it to use the GPU. With nvidia you need to set up the Container Toolkit; I assume there's an equivalent for amd, which might be your issue. And for the docker cli you need to pass the `--gpus all` argument to use GPUs.

text-generation-webui:
  build:
    context: .
    args:
      TORCH_CUDA_ARCH_LIST: ${TORCH_CUDA_ARCH_LIST:-7.5}
      BUILD_EXTENSIONS: ${BUILD_EXTENSIONS:-}
      APP_GID: 6972
      APP_UID: 6972
  env_file: .env
  user: "6972:6972"
  ports:
    - "7860:7860"
    - "5000:5000"
  stdin_open: true
  tty: true
  volumes:
    - # ignoring this for post size.
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all
            capabilities: [gpu]
Anonymous
8/8/2025, 7:39:07 PM No.106192325
>>106192277
can't say it or you'll steal my idea
Replies: >>106192344
Anonymous
8/8/2025, 7:40:47 PM No.106192337
Qwen 30B Thinking is fucking amazing
Replies: >>106192384
Anonymous
8/8/2025, 7:41:23 PM No.106192344
>>106192325
werewolves but they sparkle in daylight
Anonymous
8/8/2025, 7:43:29 PM No.106192362
>>106189554
Already found its flaw: no matter what you do, the damn model HAS to <think> at the end of a response, which then causes it to endlessly generate (as if the EOS token were banned). I tried banning "<think" and "</think>", I tried prompting the start of my reply with <think> as an anon suggested, nothing stops it except constantly tard wrangling and cleaning all of its responses until it finally stops doing it, if you're lucky.

Would be a damn good model too if it didn't do that. It's supposed to be a hybrid reasoning model, yet it can't stop thinking?
Replies: >>106192471 >>106193803
Anonymous
8/8/2025, 7:45:38 PM No.106192384
>>106192337
At what?
Replies: >>106192395
Anonymous
8/8/2025, 7:46:21 PM No.106192395
>>106192384
answering my questions
Anonymous
8/8/2025, 7:50:40 PM No.106192448
>>106191897
They are not libtards, they are marketers. They have no morals and few opinions of their own. See: Zucc flip-flopping his public image every 6 months.
Playing the world's most concerned safety advocate gets them attention and gets them enterprise contracts (aka the only feasible way to monetize LLMs today), so that's the role they'll play.
Anonymous
8/8/2025, 7:51:55 PM No.106192461
>>106186999
Bump. I doubled the speed by changing from using `-ot` with `.ffn_.*_exps.=CPU` to selecting only the last n layers that don't fit into vram (`blk\.(2[7-9]|[3-4][0-9]).*=CPU`), but I still don't know what is possible for these.
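The layer-range regex can be sanity-checked outside llama.cpp; a minimal sketch, assuming tensor names follow llama.cpp's usual `blk.<n>.` naming (the names below are hypothetical examples, not dumped from a real gguf):

```python
import re

# Override pattern from the post: send layers 27-49 to CPU.
pat = re.compile(r"blk\.(2[7-9]|[3-4][0-9]).*")

# Hypothetical tensor names in llama.cpp's blk.<n>. convention.
names = [f"blk.{i}.ffn_up_exps.weight" for i in range(50)]
on_cpu = [n for n in names if pat.match(n)]

print(len(on_cpu))  # layers 27..49 matched
print(on_cpu[0])
```

Handy for checking you aren't accidentally matching extra layers before burning time on a reload.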
Anonymous
8/8/2025, 7:52:47 PM No.106192471
>>106192362
>Its supposed to be a hybrid reasoning model
There's a reason Qwen went back on the idea.
Anonymous
8/8/2025, 7:55:43 PM No.106192506
So did Qwen drop their big non model thing yet?
Replies: >>106192526 >>106192531
Anonymous
8/8/2025, 7:55:47 PM No.106192508
any progress on step3 support in llama.cpp?
Anonymous
8/8/2025, 7:57:12 PM No.106192526
>>106192506
it was 2000 free qwen coder calls per day
Replies: >>106192564
Anonymous
8/8/2025, 7:57:32 PM No.106192531
file
md5: afa0c15589150375665f07a5a4ecede4🔍
>>106192506
Replies: >>106192540 >>106192592 >>106192675
Anonymous
8/8/2025, 7:58:14 PM No.106192540
>>106192531
It's a Claude Code knock off?
Replies: >>106192561 >>106192564
Anonymous
8/8/2025, 8:00:17 PM No.106192561
>>106192540
yes, but to be more specific it's a direct fork of gemini cli, which is a claude code ripoff
Replies: >>106192951
Anonymous
8/8/2025, 8:00:22 PM No.106192564
>>106192540
it already existed before >>106192526
>claude code knockoff
claude code is a knockoff of something else too
anthropic is NIGGER
Replies: >>106192951
Anonymous
8/8/2025, 8:02:45 PM No.106192592
>>106192531
lol
Anonymous
8/8/2025, 8:06:51 PM No.106192643
If I get a mi50, is it possible to have the active parts of a moe on a 3090, and the rest on the mi50 (instead of ram)?
Replies: >>106192662 >>106192669
Anonymous
8/8/2025, 8:08:23 PM No.106192662
>>106192643
perhaps? -ot NIGGER=CUDA0 -ot PENIS=ROCM1 or whatever=VULK1
i think you'd need to use vulkan
Replies: >>106192679
Anonymous
8/8/2025, 8:09:06 PM No.106192669
>>106192643
It is possible by using the -ot argument in llama.cpp to assign parts of the model to a specific device, but that would require running on vulkan, which may or may not be worth the trouble.
Anonymous
8/8/2025, 8:09:36 PM No.106192675
>>106192531
His thing is smaller than I expected.
Anonymous
8/8/2025, 8:09:56 PM No.106192679
>>106192662
Wait, you can run cuda and rocm at the same time? I didn't know that. Thought you'd have to switch everything to vulkan for mixing and matching gpus.
Replies: >>106192692 >>106193083
Anonymous
8/8/2025, 8:10:53 PM No.106192692
>>106192679
>you can run both
probably not, that's why i said >i think you'd need to use vulkan
in the end
Replies: >>106192713
Anonymous
8/8/2025, 8:12:49 PM No.106192713
>>106192692
Ah, yeah, I just looked around, can't.
Anonymous
8/8/2025, 8:33:17 PM No.106192945
file
md5: b165d32bafe11ae68793cbf6adc07f18🔍
im so retarded even dipsy insults me
Replies: >>106193152
Anonymous
8/8/2025, 8:33:27 PM No.106192951
>>106192561
>>106192564
>ripoff of a ripoff of a ripoff
kek
Anonymous
8/8/2025, 8:33:52 PM No.106192962
defeatedByRoses
md5: c8aabc5eb906699009558afc6a150a61🔍
>>106191564
Did you miss this part?
> For OpenAI, it must have been very compelling to train a Phi-style model for their open-source release. They needed a model that beat the Chinese open-source models on benchmarks, while also not misbehaving in a way that caused yet another scandal for them.
> Unlike Meta, they don’t need their open-source model to be actually good, because their main business is in their closed-source models.
Pure clown world.
Replies: >>106193872
Anonymous
8/8/2025, 8:40:46 PM No.106193038
Screenshot 2025-08-08 at 2.35.22 PM
md5: 35e6edec30fc8d40099ae1f4e68fd75e🔍
>i-it was just a prank bro!
Replies: >>106193073 >>106193114 >>106193183 >>106193201 >>106193479
Anonymous
8/8/2025, 8:43:27 PM No.106193073
>>106193038
lol
Anonymous
8/8/2025, 8:44:11 PM No.106193083
>>106192679
if there are no driver issues, llama.cpp can do it
Anonymous
8/8/2025, 8:44:13 PM No.106193086
Now that the summer release cycle is fully over and there won't be anything notable until December at the earliest, are you satisfied with what we got?
Replies: >>106193099 >>106193125
Anonymous
8/8/2025, 8:45:24 PM No.106193099
>>106193086
Actually, there is one more thing. Just wait for it. It's probably next week.
Anonymous
8/8/2025, 8:46:02 PM No.106193114
>>106193038
>biggest feature is making the model selection automatic
>people hate it
>"we will show you which model the AI chose and maaaaaaaaaaaybe allow you to use the previous model"
OpenAI never changes. Their rugpulls are starting to make me giggle at this point.
Anonymous
8/8/2025, 8:47:18 PM No.106193125
>>106193086
Gemma 4, Mistral Large 3 left!
Also Llama 4.1 if Meta didn't completely give up on it.
Anonymous
8/8/2025, 8:49:44 PM No.106193152
>>106192945
>it's an actual youtube video
man what
Hi all, Drummer here...
8/8/2025, 8:50:46 PM No.106193165
> we must dissent

https://huggingface.co/BeaverAI/models?search=20b

(It's trash but I just wanted to share it with you all. I got feedback that Omnius v1a had decent writing if you skip reasoning. Fallen GPT OSS v1b would be the least censored with reasoning, but also most deepfried among the 3 versions.)

---

Can I get more data on GLM 4.5? I haven't tried it extensively myself. I'm not amazed by the outputs, but not disappointed either.

For those singing its praises, what's your setup and what's so special about it? Is Air good? Are you using reasoning? What quant? How is this Nemo-like?
Replies: >>106193177 >>106193214 >>106193259 >>106193289 >>106193837 >>106193998 >>106194480
Anonymous
8/8/2025, 8:52:12 PM No.106193177
>>106193165
You did scout, didn't you? How can you have any model quality bar after this?
Anonymous
8/8/2025, 8:53:03 PM No.106193183
>>106193038
lmao they don't even consider gpt5 an improvement to 4o? it's so over
Replies: >>106193202
Anonymous
8/8/2025, 8:54:28 PM No.106193201
>>106193038
>make redditors, the most submissive sloppa eaters, ultra mad
>hold an ama
what was the plan here
Anonymous
8/8/2025, 8:54:31 PM No.106193202
>>106193183
It's not that, but a lot of the users seem to have developed some weird relationship with 4o.
Anonymous
8/8/2025, 8:56:42 PM No.106193214
>>106193165
>https://huggingface.co/BeaverAI/models?search=20b
i'll bite, post ST master export
>Can I get more data on GLM 4.5?
I find GLM 4.5 Air very nice, in the first 4k tokens it behaves well, it can write unhinged stuff and doesn't seem to have a prefiltered dataset, that's how it's like nemo
>For those singing its praises, what's your setup and what's so special about it?
rtx 3060 12gb/64gb ddr4/i5 12400f Q3_K_XL ik_llama.cpp
./llama-server --model ~/TND/AI/glmq3kxl/GLM-4.5-Air-UD-Q3_K_XL-00001-of-00002.gguf -ot ffn_up_shexp=CUDA0 -ot exps=CPU -ngl 100 -t 6 -c 16384 --no-mmap -fa -ub 2048 -b 2048
it is definitely smarter than nemo, that's why i'm singing praises (it runs at a very nice speed too, 6-9t/s depending on context)
i used it mainly with thinking, my issues are: it's a little positivity biased, it could use more erp data and it's repetitive
it's what llama 4 scout was supposed to be
also im currently testing https://huggingface.co/TheDrummer/Gemma-3-R1-12B-v1-GGUF
do you recommend any specific instruct template for the gemma3 "r1" models?
Replies: >>106193228 >>106193242
Anonymous
8/8/2025, 8:58:46 PM No.106193228
>>106193214
i especially find its spatial intelligence way better than nemo's
Replies: >>106193255
Anonymous
8/8/2025, 9:00:05 PM No.106193237
Reminder that all finetunes are a meme and the same result can be achieved with a prompt.
Replies: >>106193242 >>106193263
Hi all, Drummer here...
8/8/2025, 9:01:04 PM No.106193242
>>106193214
What happens past 4K tokens? Does it become less creative?

> do you recommend any specific instruct template for the gemma3 "r1" models?

It's just Gemma 3 chat template with <think> prefilled if you want reasoning. The 12B version might be a tad bit undercooked. 4B and 27B seem to be ready though.

>>106193237
Am I unsightly for you?
Replies: >>106193259 >>106193287
Anonymous
8/8/2025, 9:02:14 PM No.106193255
>>106193228
It also knows a lot more than nemo. Which is to be expected, but still.
Anonymous
8/8/2025, 9:02:53 PM No.106193259
>>106193242
>>106193165
I feel like this is a troll because I'm sure drummer is smart enough to not redditspace.
If not then that explains why his models are garbage.
Anonymous
8/8/2025, 9:03:04 PM No.106193263
>>106193237
Not really, but a good prompt and lorebook can do miracles in many cases.
Anonymous
8/8/2025, 9:06:17 PM No.106193287
>>106193242
past 4k tokens it starts repeating (at least with https://files.catbox.moe/gjw3c3.json (chat completion preset))
it also stops thinking, for example
anon: *rapes u*
glm4.5air: <think>3 paragraphs about cumming</think> 3 paragraphs about cumming
if you put <think>okay as prefill then it gets even more repetitive eventually
>It's just Gemma 3 chat template with <think> prefilled if you want reasoning. The 12B version might be a tad bit undercooked. 4B and 27B seem to be ready though.
thanks!
Replies: >>106193308
Anonymous
8/8/2025, 9:06:29 PM No.106193289
>>106193165
I'm not sure if you can fix Air. It's sloppy, sure, but that's not its primary issue. It's repetitive as hell past 8k, with repetition creeping in at 4k already. You'd probably need to do long context training to correct that issue, which I'm not sure you have the capacity to do. I really want to like the model since it has a ton of world knowledge, but man the repetition really kills it and I haven't found any sampler settings that work without making the model retarded.
Anonymous
8/8/2025, 9:08:01 PM No.106193308
>>106193287
Are you keeping the think blocks from previous messages?
Replies: >>106193331 >>106193353 >>106193409
Anonymous
8/8/2025, 9:10:49 PM No.106193331
>>106193308
yes
i have been thinking of switching to something like
sysprompt
user: write next msg in rp
"whole roleplay"
assistant:
but idk how i'd be able to make it not reprocess every time
>just use base model
but then it'll be stupid...
Replies: >>106193354
Anonymous
8/8/2025, 9:11:05 PM No.106193336
d9e92399-24d9-4ad8-aa40-d27c9f0892a0
md5: a4656f40a74364844744983971e06fd4🔍
>>106189507 (OP)
Anonymous
8/8/2025, 9:11:20 PM No.106193343
>>106190504
>>106190552
>another example: ask it to create a function to convert from one floating point format to another.
That's a silly example.
>gave it a couple of header files with function definitions, asking it to create tests for the interface, and it was able to create mock objects of everything.
That's something real.

I'll try a variation of a prompt I've used before for a real task, extracting information from webpages in a certain format. Instead of asking for a solution, asking it to revise, etc., I'm going to try to one-shot it by appending real data to the prompt. That makes the prompt 13k to 17k tokens long.

gpt-oss-120b (ollama) failed.
Qwen3-235B-A22B-Thinking-2507-8bit succeeded (in 10 minutes 26 seconds).
Qwen3-Coder-480B-A35B-Instruct-4bit failed.
Qwen3-Coder-480B-A35B-Instruct-6bit succeeded (in 3 minutes 39 seconds).
Qwen3-Coder-30B-A3B-Instruct-8bit failed.
Devstral-Small-2505 (OpenRouter) succeeded.
Devstral-Small-2507-8bit failed. I tried a second time and it failed again. That's surprising enough that I'm downloading Devstral-Small-2505 to run locally to see if the issue of 2505 vs 2507 is quantization or if it was just luck.
Replies: >>106193982
Anonymous
8/8/2025, 9:12:39 PM No.106193353
>>106193308
How do You remove think blocks from previous messages?
Replies: >>106193399 >>106193409
Anonymous
8/8/2025, 9:12:46 PM No.106193354
>>106193331
Generally you're supposed to trim the previous think blocks with reasoner, it's on deepseek's wiki thing https://api-docs.deepseek.com/guides/reasoning_model#multi-round-conversation
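A minimal sketch of that trimming, assuming the reasoning is inlined as <think> tags in the assistant message content (API backends often return it as a separate reasoning_content field instead):

```python
import re

THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_reasoning(messages):
    # Remove <think>...</think> from prior assistant turns before
    # resending the history, per the multi-round guidance.
    out = []
    for m in messages:
        if m["role"] == "assistant":
            m = {**m, "content": THINK_RE.sub("", m["content"]).strip()}
        out.append(m)
    return out

history = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "<think>pondering...</think>\nhello"},
]
print(strip_reasoning(history)[1]["content"])  # hello
```

Frontends like ST do this for you, but it's worth knowing what they're doing when you roll your own script.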
Replies: >>106193369 >>106193388 >>106193491
Anonymous
8/8/2025, 9:13:47 PM No.106193369
>>106193354
Meant with reasoners in general. Not just DS reasoner.
Anonymous
8/8/2025, 9:14:23 PM No.106193379
Are local models good enough yet that they let me go vibe code a better alternative to the piece of shit that is ST despite having close to 0 coding experience?
Anonymous
8/8/2025, 9:15:02 PM No.106193388
>>106193354
time to try that, i'll be doing it manually because i don't know how otherwise
how do i even trim the <think> blocks when they're not visible in ST edit
(i have an answer: remove reasoning parsing)
Replies: >>106193404
Anonymous
8/8/2025, 9:15:09 PM No.106193392
>>106189507 (OP)
Downloaded Nemo Instruct 12b per the guide, running ST with koboldcpp on a 12gb 50 series.
Are there any default settings I should change? Or any better quants for my shitcard? Ty anons.
Replies: >>106193634
Anonymous
8/8/2025, 9:15:45 PM No.106193399
file
md5: 1023d63849094c0ca6985b3430b74592🔍
>>106193353
I think ST does it by default unless you set add to prompt on
Replies: >>106193491
Anonymous
8/8/2025, 9:16:21 PM No.106193404
>>106193388
ST doesn't send parsed reasoning except on continue.
Replies: >>106193491
Anonymous
8/8/2025, 9:17:01 PM No.106193409
>>106193308
>>106193353
For GLM 4.5 I think you're supposed to emit empty think blocks for old messages.
https://hf.co/zai-org/GLM-4.5-Air/blob/main/chat_template.jinja
{%- if loop.index0 > ns.last_user_index and reasoning_content -%}
{{ '\n<think>' + reasoning_content.strip() + '</think>'}}
{%- else -%}
{{ '\n<think></think>' }}
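In plainer terms, a rough sketch of what the template rule does (the real template keys off the index of the last user message; the function and argument names here are made up for illustration):

```python
def render_assistant_turn(content, reasoning, keep_reasoning):
    # GLM 4.5 keeps the real think block only for the latest turn(s);
    # earlier assistant turns get an empty <think></think> instead.
    if keep_reasoning and reasoning:
        block = "\n<think>" + reasoning.strip() + "</think>"
    else:
        block = "\n<think></think>"
    return block + "\n" + content

old = render_assistant_turn("Hi.", "user greeted me", keep_reasoning=False)
new = render_assistant_turn("Hi.", "user greeted me", keep_reasoning=True)
print(repr(old))
```

So unlike plain trimming, old turns still carry an explicit empty think block rather than nothing at all.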
Replies: >>106193429 >>106193460 >>106193491 >>106193546 >>106193831 >>106194663
Anonymous
8/8/2025, 9:17:44 PM No.106193421
Reminder to always look in the console to make sure you're sending the exact prompt you're expecting to send.
Replies: >>106193430
Anonymous
8/8/2025, 9:18:49 PM No.106193429
>>106193409
What in the fuck.
Anonymous
8/8/2025, 9:19:14 PM No.106193430
file
md5: 185e9d6232de3e7be648052b19ed5806🔍
>>106193421
There's the prompt inspector in ST too
Replies: >>106193491
Anonymous
8/8/2025, 9:20:49 PM No.106193445
>>106188313
>>106188331
>>106188303
>>106188272
went to bed after this but thanks!
Anonymous
8/8/2025, 9:21:42 PM No.106193460
>>106193409
That also looks like it will fuck up prefills.
Anonymous
8/8/2025, 9:24:07 PM No.106193479
>>106193038
>Nothing about the OSS models being shit
Fucking kek
Replies: >>106193977
Anonymous
8/8/2025, 9:25:49 PM No.106193491
file
md5: d161bc536923f4b2dbb58519d6c0b2d5🔍
>>106193409
>>106193404
>>106193399
>>106193354
>>106193430
thanks for all the infos anons, seems like it already gets removed automatically
i guess there's nothing to fix
Anonymous
8/8/2025, 9:28:45 PM No.106193512
file
md5: 10f2491cb15b70be9d299676bb157858🔍
..i think i prefer shivering spines to the smell of ozone
Replies: >>106193544
Anonymous
8/8/2025, 9:30:44 PM No.106193534
https://xcancel.com/QuixiAI/status/1953809869972107739
eric slopford (creator of slophin) thinks you should stop whining and learn to love the toss.
Replies: >>106193554 >>106193570 >>106193579 >>106193627 >>106193726 >>106195437
Anonymous
8/8/2025, 9:32:03 PM No.106193544
>>106193512
Stupid things think blood tastes like copper and chlorine smells like ozone. We need to add more modalities.
Replies: >>106193862
Anonymous
8/8/2025, 9:32:06 PM No.106193546
>>106193409
huh, so we're supposed to remove the reasoning from old messages and just leave <think></think>?
Anonymous
8/8/2025, 9:32:29 PM No.106193554
>>106193534
> other than being overly structured and prudish, I've no problem
>it there's anything you don't like about it you can fine-tune it to act differently. (And you can sell your fine-tune and keep all the profit!)
Thanks Eric, it's great to see how you turned out.
Replies: >>106193640
Anonymous
8/8/2025, 9:34:44 PM No.106193570
file
md5: 99cc3b005d3a6b37cb93230138c2b729🔍
>>106193534
I love Twitter users.
>there is never good reason to complain
Replies: >>106193590 >>106193732
Anonymous
8/8/2025, 9:35:23 PM No.106193579
>>106193534
>compare it to 3.3 70B
lmao
Replies: >>106193589
Anonymous
8/8/2025, 9:36:02 PM No.106193583
>>106189947
dots.vlm's attempt:
>せっかく労働を労ってやったのに無視された……(しょぼん)
>まあ、警視庁が都案を快く思ってない事ぐらい、
>よぉぉくわかってますよ!
Anonymous
8/8/2025, 9:36:14 PM No.106193589
>>106193579
But it's FAST, who cares if it's good as long as its FAST.
Anonymous
8/8/2025, 9:36:17 PM No.106193590
>>106193570
oh great, the faggot that caused the downfall of the HF LLM leaderboard is defending openai
haha!
Anonymous
8/8/2025, 9:36:54 PM No.106193598
>>106192039
it doesn't seem that repetitive with top nsigma on
Replies: >>106193615
Anonymous
8/8/2025, 9:38:39 PM No.106193615
>>106193598
Can You share what value You're using nsigma at please?
Replies: >>106193668
Anonymous
8/8/2025, 9:39:28 PM No.106193627
file
md5: 2d705c83d0394d72667a602b5dbe9067🔍
>>106193534
They're all sucking that OAI cock
Replies: >>106193648 >>106193794
Anonymous
8/8/2025, 9:40:04 PM No.106193634
1726298990603810
md5: 0955410240f59b7df60b23ca600a8b23🔍
>>106193392
Anyone?
Replies: >>106193679 >>106193691
Anonymous
8/8/2025, 9:40:17 PM No.106193640
>>106193554
Is he selling any finetrooned models?
Anonymous
8/8/2025, 9:40:57 PM No.106193648
1750360263462
md5: 1b567e6691c93a66e68c719fe4399adb🔍
>>106193627
Anonymous
8/8/2025, 9:42:48 PM No.106193666
Got my 192GB DDR5 5200 kit. I was running 128GB at 4400 when it was rated 6000. I also couldn't get past 4400 with 192GB, but after the latest bios, surprisingly the 7800X3D works with 192GB at 5200.
Replies: >>106193692
Anonymous
8/8/2025, 9:42:58 PM No.106193668
>>106193615
top nsigma should always be set to 1 unless you want to disable it, then you set it to 0.
Replies: >>106193729
Anonymous
8/8/2025, 9:43:57 PM No.106193679
>>106193634
I would help but I am part of the anti-miku faction of this general.
Anonymous
8/8/2025, 9:44:10 PM No.106193683
What is the best model I can use for uncensored ERP? 2070 super (8gb vram) 32gb ram, amd ryzen 3700x.

I was using mistral nemo instruct 2407 q4 k m, but it's honestly a bit lackluster. Any recommendations?
Replies: >>106193694 >>106193715 >>106193717
Anonymous
8/8/2025, 9:44:44 PM No.106193691
>>106193634
Ilya-san... nani kore?!
Anonymous
8/8/2025, 9:44:49 PM No.106193692
>>106193666
very nice, post some speeds
what gpus do u have? are u the 2080ti22g/p40 anon?
Replies: >>106193707
Anonymous
8/8/2025, 9:44:59 PM No.106193694
>>106193683
>Any recommendations?
mistral nemo instruct 2407 q4 k m
Anonymous
8/8/2025, 9:46:00 PM No.106193707
>>106193692
Just a 4090.
Anonymous
8/8/2025, 9:46:51 PM No.106193715
>>106193683
rocinante
Replies: >>106193741
Anonymous
8/8/2025, 9:47:31 PM No.106193717
>>106193683
Get more RAM
Replies: >>106193783
Anonymous
8/8/2025, 9:48:06 PM No.106193726
>>106193534
>just finetune it bro
But it's one thing to finetune a model for one very narrow task, and another to restore missing knowledge and remove/mitigate refusals while not murdering general performance. Is he talking about 25~50M-row finetunes?
Replies: >>106193961
Anonymous
8/8/2025, 9:48:24 PM No.106193729
>>106193668
what? no, you can set it to whatever. in the paper they recommend 1 as a default but iirc they discuss other values as being reasonable
a quick search shows their own official repo says as much:
https://github.com/Tomorrowdawn/top_nsigma
>A key question is: what's the best value for n ? While this parameter serves as an alternative to temperature for controlling diversity, its optimal value isn't fully settled yet. The community suggests a range of 0-2, though this is quite broad. In my own experience, any value between 0.3 and 1.5 could work well. If you prefer conservative sampling, use a lower value like 0.7; for more diversity, try 1.3.
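For anyone wondering what the knob actually does: a rough numpy sketch of the top-nσ rule described in the repo (keep tokens whose logit is within n standard deviations of the max, sample from the survivors; function and argument names here are mine, not from any inference engine):

```python
import numpy as np

def top_n_sigma_sample(logits, n=1.0, temperature=1.0, rng=None):
    """Sketch of top-nsigma: keep tokens with logit >= max - n*std, sample those."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / temperature
    threshold = logits.max() - n * logits.std()  # M - n*sigma cutoff
    masked = np.where(logits >= threshold, logits, -np.inf)
    probs = np.exp(masked - masked.max())        # softmax over survivors
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))
```

With n=0 the threshold equals the max logit, so only the top token survives (greedy); larger n admits more tokens, which is why the repo treats it as a diversity control rather than a fixed on/off switch.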
Anonymous
8/8/2025, 9:48:52 PM No.106193732
>>106193570
SAAAAAAARRRR
Anonymous
8/8/2025, 9:49:38 PM No.106193741
>>106193715
No
Replies: >>106193755
Anonymous
8/8/2025, 9:51:44 PM No.106193755
>>106193741
read through the archives
Anonymous
8/8/2025, 9:54:14 PM No.106193783
>>106193717
Absolutely not
Anonymous
8/8/2025, 9:55:04 PM No.106193794
>>106193627
It was, but for a different reason.
Anonymous
8/8/2025, 9:56:04 PM No.106193803
>>106192362
Air at least doesn't do that as long as there's a <think></think> block with some content at the start of its reply, whether generated by it or prefilled. If there's nothing or almost nothing in the think block, it may generate a new one.
Anonymous
8/8/2025, 9:58:46 PM No.106193831
>>106193409
sadly this doesnt fix it either, rip
Replies: >>106193979
Anonymous
8/8/2025, 9:59:47 PM No.106193837
>>106193165
GLM 4.5 Air base might be worth looking at as well, since the thinking/instruct is a little cooked
Anonymous
8/8/2025, 10:03:12 PM No.106193862
>>106193544
Smell will never be a modality you sick fuck
Anonymous
8/8/2025, 10:04:06 PM No.106193872
>>106192962
Again, he said it out loud because he's read these fucking threads. The fact that orange reddit needs someone at Github to tell them is amusing.
Anonymous
8/8/2025, 10:04:16 PM No.106193873
For me, it's when she bites her lips.
Replies: >>106193896
Anonymous
8/8/2025, 10:06:24 PM No.106193896
>>106193873
Yep, unlike me, whose teeth have never touched the lips
Replies: >>106193910
Anonymous
8/8/2025, 10:07:07 PM No.106193903
Regarding Qwen2.5-VL - is 3B enough for OCR purposes, or should I go up to 7B?
Replies: >>106193946
Anonymous
8/8/2025, 10:07:27 PM No.106193910
>>106193896
how?
Anonymous
8/8/2025, 10:09:54 PM No.106193946
>>106193903
dots.ocr
Anonymous
8/8/2025, 10:11:01 PM No.106193961
>>106193726
>removing/mitigating refusal
How about no, you terrorist?
Anonymous
8/8/2025, 10:12:08 PM No.106193977
>>106193479
Normies don't care about that
Anonymous
8/8/2025, 10:12:19 PM No.106193979
file
md5: 5915988ac52e92ce736a35881cb40955🔍
>>106193831
forgot pic
Replies: >>106194132
Anonymous
8/8/2025, 10:13:07 PM No.106193982
>>106193343
>Devstral-Small-2507-8bit failed. I tried a second time and it failed again. That's surprising enough that I'm downloading Devstral-Small-2505 to run locally to see if the issue of 2505 vs 2507 is quantization or if it was just luck.
Downloaded Devstral-Small-2505-8bit. Tried 3 times locally, failing twice then succeeding once and stopping. Each answer took about 40 seconds. There might be something going on but I'm willing to just say I got lucky RNG and leave it at that.
To give gpt-oss-120b a fair shake I tried it another 8 times but it never produced a program that behaved correctly.
Anonymous
8/8/2025, 10:14:51 PM No.106193998
>>106193165
>>https://huggingface.co/BeaverAI/models?search=20b
>i'll bite, post ST master export
i guess i wont bite then
Anonymous
8/8/2025, 10:18:01 PM No.106194037
1736412130484213
md5: f2133acc5361e85b64edc32722421f08🔍
Replies: >>106194067 >>106194084 >>106194089 >>106194098 >>106194102
Anonymous
8/8/2025, 10:20:25 PM No.106194067
>>106194037
PIAA
Anonymous
8/8/2025, 10:21:55 PM No.106194084
>>106194037
strawberry bros it's fucking over
Anonymous
8/8/2025, 10:22:24 PM No.106194089
out_thumb.jpg
md5: 572c744383dff04b1a0e84fc70766924🔍
>>106194037
Anonymous
8/8/2025, 10:23:08 PM No.106194098
>>106194037
This is correct in America
Anonymous
8/8/2025, 10:23:34 PM No.106194102
>>106194037
To anyone wondering where the fuck it got -.21 from, I'm not entirely sure but I have a clue.
5.11 - 4.9 = 0.21.
5.11 - 5.9 (i.e. the wrong way around) is actually -0.79 but somewhere in the process it negated .21 to -.21.
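The theory is easy to sanity-check numerically; both candidate slips are one line each:

```python
# what was asked vs. the plausible slip behind the model's -.21
correct  = round(5.9 - 5.11, 2)   # the actual question
slipped  = round(5.11 - 4.9, 2)   # order swapped and a unit dropped
assert correct == 0.79
assert slipped == 0.21            # negate this and you get -.21
```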
Replies: >>106195466
Anonymous
8/8/2025, 10:24:54 PM No.106194114
Has an LLM ever recommended a commercial service to you? Qwen3 235B shilled me Bright Data (after I specifically asked for SaaS) for scraping, and integrated it right into my scraping code base. I'd imagine companies would pay AI companies to train on their ads, much like they pay for Google search ads.
Replies: >>106194128
Anonymous
8/8/2025, 10:26:01 PM No.106194128
>>106194114
check out drummers' r-rrrr royfield i dont know the name finetune haha
Replies: >>106194273 >>106194348
Anonymous
8/8/2025, 10:26:43 PM No.106194132
>>106193979
{%- if content.strip() -%}
{{ '\n' + content.strip() }}
{%- endif -%}

The content shouldn't be on the same line as <think></think>. Also do you have [gMASK]<sop> at the start of the chat? Not that I think any of this will solve your problem.
Replies: >>106194164 >>106195667
Anonymous
8/8/2025, 10:28:15 PM No.106194147
file
md5: c132e8c629d5e0f7b829b159c09eb655🔍
https://xcancel.com/alibaba_qwen/status/1953760230141309354
>Qwen3-30B-A3B-2507 and Qwen3-235B-A22B-2507 now support ultra-long context—up to 1 million tokens!
>something about Dual Chunk Attention
Replies: >>106194174 >>106194178 >>106194939
Anonymous
8/8/2025, 10:28:40 PM No.106194150
What was the big thing not model?
Replies: >>106194178 >>106194190
Anonymous
8/8/2025, 10:29:38 PM No.106194164
file
md5: c4ff130d21f5733f25dd6cc791ff9e8f🔍
>>106194132
i do, i'll fix the <think></think> and try regenning
>Not that I think any of this will solve your problem.
probably right
thanks for pointing out, anon
Anonymous
8/8/2025, 10:30:53 PM No.106194174
>>106194147
Already talked about
Anonymous
8/8/2025, 10:31:05 PM No.106194178
>>106194150
>>106194147
Anonymous
8/8/2025, 10:31:54 PM No.106194190
>>106194150
wasn't it free Qwen coder requests?
Anonymous
8/8/2025, 10:37:10 PM No.106194248
1738675283710466
md5: 2517be1b02faf40dd37af09b1cc5aa71🔍
Replies: >>106194288 >>106194305 >>106194319 >>106195602
Anonymous
8/8/2025, 10:38:43 PM No.106194273
>>106194128
>i dont know the name finetune haha
>haha
KYS DRUMMER
Replies: >>106194287
Anonymous
8/8/2025, 10:40:01 PM No.106194287
>>106194273
not rocinante negro
Anonymous
8/8/2025, 10:40:02 PM No.106194288
1745698541673079
md5: a314285dd8ddcccabf29e389addd031c🔍
>>106194248
Replies: >>106194305 >>106194319 >>106195602
Anonymous
8/8/2025, 10:41:30 PM No.106194305
>>106194248
>>106194288
weird
Replies: >>106194330
Anonymous
8/8/2025, 10:42:47 PM No.106194319
>>106194248
>>106194288
So tiresome
Anonymous
8/8/2025, 10:43:40 PM No.106194330
>>106194305
yeah obsessing over shit like that is definitely weird.
Anonymous
8/8/2025, 10:45:03 PM No.106194348
>>106194128
You probably mean Rivermind™
https://huggingface.co/TheDrummer/Rivermind-12B-v1
Replies: >>106194377
Anonymous
8/8/2025, 10:47:24 PM No.106194377
>>106194348
yeaaa that oneee...
Anonymous
8/8/2025, 10:57:30 PM No.106194480
>>106193165
>I just wanted to share it with you all
Hi Drummer, all here... We didn't want you to share it with us. Go away you temu undi.
Anonymous
8/8/2025, 11:02:45 PM No.106194539
file
md5: 1a742bce2f0f33f96d42c5e0eb49c4c0🔍
Not a single glm base (non-air) goof on hf. Is that because base is shit because pic related?
Replies: >>106194582 >>106194641 >>106194743
Anonymous
8/8/2025, 11:06:03 PM No.106194558
Just got an ebay MI50 32gb, but when I load a model bigger than 16gb in llama.cpp it spills over to RAM.
Did the chinks bamboozle me with a 16gb card or am I retarded?
Running rocm-smi --showmeminfo vram gives me
GPU[0] : VRAM Total Memory (B): 34342961152
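For what it's worth, that byte count does work out to a full 32GB card, so the spill to RAM is more likely a backend/allocation issue than fake VRAM:

```python
vram_bytes = 34_342_961_152        # from rocm-smi --showmeminfo vram
gib = vram_bytes / 2**30
print(f"{gib:.2f} GiB")            # 31.98 GiB, i.e. 32GB minus a small reserve
```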
Replies: >>106194578
Anonymous
8/8/2025, 11:08:07 PM No.106194578
>>106194558
Does that also happen with vulkan?
Replies: >>106194621
Anonymous
8/8/2025, 11:08:26 PM No.106194582
name-probs-bases
md5: 8f2cfe4d39123fc319a6315ac11560af🔍
>>106194539
Yes, the base is fake, just like Qwen bases. Their 32b base had somewhat acceptable distribution, but the big one doesn't.
Replies: >>106194615 >>106194641
Anonymous
8/8/2025, 11:09:54 PM No.106194597
Local is dead again... Sama please save us
Anonymous
8/8/2025, 11:11:22 PM No.106194615
>>106194582
I can't believe I got an informative answer ITT.
Replies: >>106194627
Anonymous
8/8/2025, 11:11:42 PM No.106194621
>>106194578
I'm using vulkan because I couldn't get rocm working with llama.cpp yet.
Anonymous
8/8/2025, 11:12:41 PM No.106194627
>>106194615
That and quanters just generally dgaf about base models.
Anonymous
8/8/2025, 11:13:42 PM No.106194641
>>106194539
>>106194582
Companies forgot what base even means. And only actual weirdos and freaks care about foundational models anymore. Air has 360 downloads, big GLM has 163 (meanwhile instruct tunes have 75x and 104x of that). The world will die under a pile of instruct garbage.
Replies: >>106194686
Anonymous
8/8/2025, 11:15:26 PM No.106194663
Hmm. I just tried >>106193409 and did a swipe on a chat where the model hard repeats the last message verbatim. And it worked. It didn't repeat. I'll do some more testing, but this is promising. I guess I'll need to use the jinja playground a bit more deeply in the future rather than assume a single chat turn with the default example reveals everything about the templating logic.
Anonymous
8/8/2025, 11:17:27 PM No.106194686
>>106194641
People forgot that the original AI Dungeon was GPT-2 finetuned and bootstrapped on CYOA data
Anonymous
8/8/2025, 11:21:20 PM No.106194732
1744534670936990
md5: bd2f53a7c88bb30973491fcd31a10fb2🔍
Replies: >>106194749
Anonymous
8/8/2025, 11:22:24 PM No.106194743
>>106194539
>glm base (non-air) goof
Use case? No one doing creative writing, or whatever people use base locally, has the rig to run it.
Anonymous
8/8/2025, 11:23:02 PM No.106194749
>>106194732
stop posting shit on reddit that was posted here before it was posted on reddit
Anonymous
8/8/2025, 11:27:09 PM No.106194795
Which model can give me an oiled footjob
Replies: >>106194809 >>106194818 >>106194823
Anonymous
8/8/2025, 11:28:07 PM No.106194809
>>106194795
JEPA
Anonymous
8/8/2025, 11:29:07 PM No.106194818
>>106194795
a lot of the ones on instagram but you have to pay them like $50k and fly them out
Replies: >>106194843
Anonymous
8/8/2025, 11:29:39 PM No.106194823
>>106194795
OSS-20B
Anonymous
8/8/2025, 11:31:35 PM No.106194843
>>106194818
lol fuck that I can build a GPUMAXXED rig with that kind of money
Anonymous
8/8/2025, 11:36:07 PM No.106194898
1754521444580707
md5: 8c15cf1b82e2b878421921b1427411b0🔍
V340 anon where are you..
Replies: >>106195526
Anonymous
8/8/2025, 11:38:26 PM No.106194919
So is LeCunny going to do anything, ever? It looks like Genie 3 has leapfrogged his jepa bullshit, and Zuckerberg is all-in on LLMs. What is next for him? He seems wasted where he is.
Replies: >>106194953
Anonymous
8/8/2025, 11:39:58 PM No.106194939
>>106194147
Gguf status?
Anonymous
8/8/2025, 11:41:09 PM No.106194953
>>106194919
Ask him on twitter. Come back with a screenshot of his reply.
Replies: >>106194960
Anonymous
8/8/2025, 11:41:33 PM No.106194960
>>106194953
It's X
Anonymous
8/8/2025, 11:46:00 PM No.106195004
https://desuarchive.org/g/thread/105750356/#105755753
>Hi all, Drummer here...
>I'm not sure, but I can't keep on lurking if it's turning /lmg/ into shit for everyone else. I'll stay away for now. Peace!
He didn't even last a week
Replies: >>106195042
Anonymous
8/8/2025, 11:50:13 PM No.106195042
>>106195004
Neither did the spammer. It's all copycats now.
Anonymous
8/9/2025, 12:01:10 AM No.106195162
1714153098496425
md5: 4e165ccddc0cf02d5d3c81e76c507305🔍
>>106189515
>--AMD Zen 6's 16 memory channels require full platform overhaul, favoring patient adopters:
huh, 16 channels of DDR5, and now we have higher JEDEC speeds too; what speeds could we expect from that?
funny enough i mostly want it for simulations, but if i could use it for LLMs, even better
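Rough peak numbers are easy to work out: channels × transfers/s × bytes per transfer, assuming here that "16 channels" means sixteen 64-bit DDR5 channels (the helper and the speed grades picked are illustrative, not a confirmed Zen 6 spec):

```python
def ddr_bandwidth_gbs(channels, mts, bus_bits=64):
    """Peak theoretical bandwidth in GB/s: channels * MT/s * bytes/transfer."""
    return channels * mts * 1e6 * (bus_bits // 8) / 1e9

# hypothetical 16-channel platform at a few JEDEC speed grades
for mts in (5200, 6400, 8000):
    print(mts, f"{ddr_bandwidth_gbs(16, mts):.1f} GB/s")
```

At DDR5-6400 that's ~819 GB/s, which is why a 16-channel platform is interesting for CPU inference: it's in the neighborhood of a single high-end GPU's memory bandwidth.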
Replies: >>106195526
Anonymous
8/9/2025, 12:10:25 AM No.106195241
Guys, is deepseek okay?
Replies: >>106195251 >>106195536
Anonymous
8/9/2025, 12:11:56 AM No.106195251
>>106195241
He's dead, Jim.
Anonymous
8/9/2025, 12:31:54 AM No.106195437
>>106193534
>Compare it to llama 3.3 70b
why are there still retards in this day and age comparing anything to the dogshit llama models??
Replies: >>106195445
Anonymous
8/9/2025, 12:32:57 AM No.106195445
>>106195437
the only other model they know is R1 and they can't run it on their machine
Replies: >>106195608
Anonymous
8/9/2025, 12:34:58 AM No.106195466
>>106194102
Well now I'm wondering where the fuck you got 4.9 from
Replies: >>106196060
Anonymous
8/9/2025, 12:41:30 AM No.106195526
>>106195162
It's not coming to common desktop platforms, Threadripper will be almost a year late, and Epyc will be expensive. Until Intel competes, that's how it's going to be in CPU land.
>>106194898
I'm somewhat impressed and horrified at what he is doing. I wouldn't really even consider GCN for anything ML due to the lack of any datatypes outside of FP16 and the lack of matmul ISA instructions. He's gambling, that's for sure.
A better, underrated card if I were building now, and one that would be worth even more if SR-IOV drivers were available, is the V620. Someone figured out the arcane Linux boot command line you need to boot a system with it (in case anyone runs into this: GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=realloc pci=noaer pcie_aspm=off iommu=pt"). If running headless, it's probably the best 32GB bang-per-buck card out there: it has at least BF16 support and enough dot-product instructions for FP16 multiplies accumulating into FP32, which is better than what the V100 has and on par with its MMA instructions, only losing on memory bandwidth. The main issue, as always, is software: ROCm for RDNA2 is not better than what is in CUDA right now, and it's Linux only.
Replies: >>106195553
Anonymous
8/9/2025, 12:42:38 AM No.106195536
>>106195241
They are underperforming. There is now pressure to be #1 on lmarena from Xi himself. R2/V4 MUST be a huge jump. So far the improvements are incremental at best.
Replies: >>106195554 >>106195595 >>106195620
Anonymous
8/9/2025, 12:44:36 AM No.106195553
>>106195526
>I'm somewhat impressed and horrified at what he is doing. I wouldn't really even consider GCN for anything ML due to lack of any datatypes outside of FP16 and any matmul ISA instructions. He's gambling, that's for sure.
>A better card if I was building now that is underrated that would be worth it more if SR-IOV drivers were available is the V620. Someone figured out the arcane Linux boot command line you need to boot your system with it (in case anyone runs into this, it's GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=realloc pci=noaer pcie_aspm=off iommu=pt") and if running headless, it's probably the best 32GB bang per buck card out there, having at least BF16 support and enough dot product instructions for FP16 multiplies to FP32 which is better than what the V100 has and parity with its MMA instructions, only losing with memory bandwidth. The main issue really as always is software, where ROCM for RDNA2 is not better than what is in CUDA right now and Linux only.
how do i get educated enough to understand every word in this post?
Anonymous
8/9/2025, 12:44:40 AM No.106195554
>>106195536
I get the impression that most of the pressure is coming from Liang Wenfeng's high standards rather than from the party.
Anonymous
8/9/2025, 12:46:44 AM No.106195578
dipsyPilotUnmanageable
md5: 862576e7122062c772cbc40a3c6988ad🔍
>>106191538
>Recap bot mentioned dipsy
Ah, didn't notice. Nice.
> Original Character Do Not Steal
Not mine, actually. Do Not Care.
> AGP avatars from his HRT bathtub
lol
Anonymous
8/9/2025, 12:47:56 AM No.106195595
>>106195536
>So far the improvements are incremental at best
That's true of all LLMs though. I look into API models progress as much as local and I can't say anything had me give a shit except for when Gemini 2.5 pro released, mainly because it's the first decent model with huge context, and even then the more I use it the more the magic fades, it has strong "opinions" about code that I do not share and still makes tons of mistakes
Anonymous
8/9/2025, 12:48:47 AM No.106195602
>>106194248
>>106194288
>GPT now calls you out on your rhetorical tricks
Neat
Anonymous
8/9/2025, 12:49:39 AM No.106195608
>>106195445
>235b is a different class. Most people can't obtain the compute to run 235b. But most people can obtain the compute to run 120b, if they want to and try to.
>consumer hardware like 4x3090. (Can be built for $5,000)
ye
Anonymous
8/9/2025, 12:50:51 AM No.106195617
Actually GLM-4.5 Air kinda sucks and is slop and too censored in its thinking and if you force it not to think it's superslop.
Replies: >>106195635 >>106195648 >>106195667
Anonymous
8/9/2025, 12:50:56 AM No.106195620
>>106195536
All I need is for their next release to match current Gemini while still being as pliant as the current R1.
Anonymous
8/9/2025, 12:52:31 AM No.106195626
man is not a learning animal, or else, man would know by now to avoid something called glm
Anonymous
8/9/2025, 12:52:33 AM No.106195627
1737476225302405
md5: 214398ab587598ba5db0f28fb25de738🔍
Replies: >>106195641
Anonymous
8/9/2025, 12:53:20 AM No.106195635
>>106195617
I was about to say. It's actually pretty censored huh?
Sure, you can get around it, but it really likes to talk about ethics and stuff.
Replies: >>106195667
Anonymous
8/9/2025, 12:53:40 AM No.106195641
>>106195627
all these are too big and non american too run and 0528 is not years sir
Anonymous
8/9/2025, 12:55:05 AM No.106195648
>>106195617
You can edit its thinking and then continue. But it is kind of annoying, true.
Anonymous
8/9/2025, 12:57:55 AM No.106195667
>>106195617
>>106195635
https://files.catbox.moe/gjw3c3.json
see if it's still cucked with that
i've been rping on mikupad for hours (currently at 3224 tokens)
>your rig is that SHIT?
no im just multitasking and forcing myself to keep on roleplaying to see if it breaks after 4k context, i sometimes forget i have mikupad open :(
works very well with >>106194132
so far it hasn't stopped thinking ONCE
Anonymous
8/9/2025, 1:01:11 AM No.106195704
>>106195686
>>106195686
>>106195686
Anonymous
8/9/2025, 1:37:19 AM No.106196060
>>106195466
It's just to derive the .21: 4.9 is simply 1 lower than 5.9. My point was that the model "wrapped" the positive .21 over to negative when "going over the column".