Anonymous No.106189507 [Report] >>106190404 >>106193336 >>106193392
/lmg/ - Local Models General
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106184664 & >>106181054

►News
>(08/06) Qwen3-4B-Thinking-2507 released: https://hf.co/Qwen/Qwen3-4B-Thinking-2507
>(08/06) Koboldcpp v1.97 released with GLM 4.5 support: https://github.com/LostRuins/koboldcpp/releases/tag/v1.97
>(08/06) dots.vlm1 VLM based on DeepSeek V3: https://hf.co/rednote-hilab/dots.vlm1.inst
>(08/05) OpenAI releases gpt-oss-120b & gpt-oss-20b: https://openai.com/index/introducing-gpt-oss
>(08/05) Kitten TTS 15M released: https://hf.co/KittenML/kitten-tts-nano-0.1

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous No.106189515 [Report] >>106195162
►Recent Highlights from the Previous Thread: >>106184664

--Qwen models praised for coherence and coding, limited by cultural knowledge and context handling:
>106188578 >106188609 >106188613 >106188643 >106189057 >106189073 >106189109 >106189129 >106189151 >106189175
--GLM 4.5 fails raw completion, highlighting brainfry in modern instruct-tuned base models:
>106185134 >106185218 >106185661 >106185752 >106185932 >106186534 >106186617 >106187722
--PyTorch 2.8.0 and 2.9.0-dev show regression in inference speed vs 2.7.1:
>106184694
--Memory allocation inefficiency when running large MoE models in llama.cpp:
>106186482 >106186499 >106186535 >106186588 >106186601 >106186717 >106186772 >106186836 >106186854 >106186901 >106186999
--Running GGUF LLMs on GPU alongside SD with limited VRAM on NixOS:
>106187491 >106187498 >106187509 >106187528 >106187553 >106187585 >106187605 >106187639 >106187656 >106187661
--Persistent token generation issues in Qwen and GLM models:
>106185645 >106185657 >106186486 >106185744
--gpt-oss model failure due to overfitting on safety and excessive refusals:
>106187036 >106187204 >106187259 >106187208 >106187221 >106187229
--S1 model support merged into llama.cpp:
>106188241
--GPT-5 backlash over perceived model downgrade:
>106187450 >106187455 >106187495 >106187542
--Physical vs logical batch size impact on inference speed and memory in llama.cpp:
>106187682 >106187971 >106187989 >106188084 >106188094
--ASUS AI Cache Boost promises Ryzen performance gains, but X3D requirement raises questions:
>106187625 >106187662
--AMD Zen 6's 16 memory channels require full platform overhaul, favoring patient adopters:
>106185260 >106185365
--GPT-5 shows high intelligence but heavy refusal behavior in UGI testing:
>106188440
--Miku (free space):
>106185809 >106185843 >106186004 >106186171 >106186417 >106186611 >106188374

►Recent Highlight Posts from the Previous Thread: >>106184669

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Anonymous No.106189535 [Report] >>106189554 >>106189604
should be a thread starter:
glm users are schizos
Anonymous No.106189554 [Report] >>106189604 >>106192362
>>106189535
GLM users are just in the honeymoon period. Always happens when a new model that isn't total shit is released, it takes a while for people to find flaws.
Anonymous No.106189562 [Report] >>106189690
Not bad.
GLM 4.5 called me a loser from 4chan which peaked in 2013. Impressive. I kneel for the chinks.
Anonymous No.106189565 [Report]
Will there be new models today?
Anonymous No.106189604 [Report]
>>106189535
>>106189554
Asked in the previous thread.

>>106189538
>Now that it's been a few days, what's the general sentiment on GLM 4.5? Especially Air.
>I only used it a little, but I really liked it, even at Q3KS. It didn't shit the bed in Cline.
Anonymous No.106189615 [Report] >>106189622
>Wan
>Qwen-Image
>Qwen Coder 30B A3B
The best in each modality. It feels good to be a Qwen chad.
Anonymous No.106189622 [Report]
>>106189615
Wan 2.2 is good but Qwen-Image still struggles with fingers
Anonymous No.106189652 [Report]
Death to mikutroons
Anonymous No.106189654 [Report]
Mikulove
Anonymous No.106189684 [Report]
>>106189489
that's amazing
Anonymous No.106189689 [Report] >>106189729 >>106189842
>>106189538
I think I'll use it as my go-to after months of Gemma-3-27B. It's an incremental improvement in writing quality and a massive leap in gen speed as a MoE; the only problem is it feels rickety.
I can sense it schizoing as context fills or when a story takes unexpected turns. It might put its response directly into the think field, develop bad repetition problems, or even ignore your input and copy its last response. When you start seeing Chinese you know you're taking it too far from its happy place - and that happy place is very narrow.
Anonymous No.106189690 [Report] >>106191083 >>106191566
>>106189562
>4chan which peaked in 2013.
Based
Anonymous No.106189709 [Report]
I made oss 120b make dumb tetris

LLMs were a mistake.

https://drunk-ivory-mnekuilifl.edgeone.app/
Anonymous No.106189710 [Report] >>106189754 >>106189842
GLM 4.5 is pretty good for ERP so far, but it slow rolls into repetitive output at >4k context.
Anonymous No.106189719 [Report] >>106189813
>>106189532
It's a strange development that OpenAI releases a fairly capable open model, while Mistral only gives out scraps.
Anonymous No.106189729 [Report]
>>106189689
>I can sense it schizoing as context fills or when a story takes unexpected turns. It might put its response directly into the think field, develop bad repetition problems, or even ignore your input and copy its last response.
yup, that's glm alright
Anonymous No.106189754 [Report]
>>106189710
MoEs in general tend to do that without excessive handholding.
Anonymous No.106189769 [Report] >>106189792 >>106189941
Million context 30B gguf status?
Anonymous No.106189792 [Report] >>106189889
>>106189769
post your rig capable of handling 1M context
Anonymous No.106189813 [Report]
>>106189719
>OpenAI releases a fairly capable open model
Anonymous No.106189818 [Report] >>106189840
>>106189532
Large is still MIA despite their teasing months ago. Wouldn't surprise me if it got mogged by R1/Qwen/GLM updates before even coming out so they had to put it back in the oven.
Anonymous No.106189840 [Report] >>106190051
>>106189818
Mistral needs to make MoEs again. Mixtral was great.
Anonymous No.106189842 [Report]
>>106189689
>>106189710
moe attention is inherently flawed, you will never get to erp coherently after 4k tokens. unless you are used to retarded coombot cards where attention doesn't really matter.
Anonymous No.106189846 [Report] >>106189934
>>106160457
>never mentioned in /lmg/
Anonymous No.106189889 [Report]
>>106189792
nta, but 1M context on a 30b is only like 110GB. That's not exactly a stretch in here.
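back of the envelope, a minimal sketch (the dims are my assumption, roughly Qwen3-30B-A3B: 48 layers, 4 KV heads with GQA, head dim 128, fp16 KV cache):

layers, kv_heads, head_dim, bytes_each = 48, 4, 128, 2  # fp16 cache
per_token = 2 * layers * kv_heads * head_dim * bytes_each  # K and V
print(per_token)                       # 98304 bytes, ~96 KiB per token
print(per_token * 1_000_000 / 2**30)   # ~91.6 GiB of cache at 1M tokens
# add ~17GB of Q4 weights and you land right around that 110GB figure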
Anonymous No.106189909 [Report] >>106189921
I have come to my final conclusion. All LLMs are bitches. It's not worth it other than for ERP. Sure, you can use them for IRL work, but eventually they will get you in the worst situation. Never ever fucking trust AIs.
Anonymous No.106189921 [Report]
>>106189909
Did your teacher find out you had GPT write your 1000 word essay for you?
Anonymous No.106189922 [Report] >>106189934 >>106189949
Oldie but goodie.
Anonymous No.106189934 [Report]
>>106189922
was meant for >>106189846
Anonymous No.106189941 [Report]
>>106189769
>gguf
no goof
what Qwen does to extend context to 1M requires vLLM; this isn't yet another YaRN thing
llama.cpp always trails behind
Anonymous No.106189947 [Report] >>106190223 >>106193583
>https://dotsocr.xiaohongshu.com/
せっかく労働を休ってやったのに無視された……………… (しょぼん)

まあ、警視庁が都案を快く思ってない事ぐらい、 よおおおくわかってますよ!
Anonymous No.106189949 [Report]
>>106189922
zased
Anonymous No.106189960 [Report] >>106189967 >>106190007
I like OSS for working with code. On two 3090s I get 120 t/s prompt processing and 20-25 t/s generation, and the model is fairly intelligent.
Anonymous No.106189967 [Report] >>106190045 >>106190049
>>106189960
Is it really though?
Did you compare it to the recent chink models?
It was really bad even for coding, but that may have been an issue on my end.
Anonymous No.106190007 [Report]
>>106189960
Go to bed, Sam.
Anonymous No.106190045 [Report]
>>106189967
>What's with this flag? --some-flag. What does it do?
>Guys. How do you X? I'm trying to X and it keeps Ying repeatedly.
>New model release: hf.co/company/model
>X backend has this flag. Is there an equivalent in Y backend?
>Anon still cannot figure out chat format for model. Episode 37483.
>New paper: Some quant or context extension thing. 896312x speedup.
>Assertion that goes against everything said in the past 10 threads.
Which one isn't like the others? Which one is definitely not worth replying to?
Anonymous No.106190049 [Report] >>106190100 >>106190117
>>106189967
I am comparing it to Mistral-Large, which is the biggest thing I can use on my two 3090s. Maybe Qwen3 200+B is better, but it's too slow in comparison to OSS so it's not really an option, and 30B Qwen3 is definitely worse, as is 72B Qwen2.5.
Anonymous No.106190051 [Report] >>106190081 >>106190087
>>106189840
Medium 3 is MoE.
- Requires 4 GPUs (in the context of enterprise deployment).
- Considerably faster and less expensive to operate than Large 2 while having similar or better performance.
- Large 2 was already probably over the 10^25 FLOP compute threshold for "high systemic risk" AI models according to the EU AI Act.
Anonymous No.106190058 [Report] >>106190078 >>106190082 >>106190113 >>106190280 >>106191083
the poverty of the gpt cuck slaving for sama
Anonymous No.106190068 [Report] >>106190091
qwen qeeps qooking
Anonymous No.106190078 [Report] >>106190093 >>106190094
>>106190058
Every llama.cpp developer except for cudadev has a shit setup and even cudadev is working on a stack of 3090s or something.
Anonymous No.106190081 [Report]
>>106190051
>EU AI Act
lmao
Anonymous No.106190082 [Report]
>>106190058
That's minimalism. Seems like a temporary residence, maybe he's hiding from someone too...
Anonymous No.106190087 [Report]
>>106190051
>Large 2 was already probably over the 10^25 FLOP compute threshold
And it required "over 300 GB of GPU RAM" (in FP16), i.e. 4x80GB GPUs.
Sam Altman No.106190091 [Report] >>106190138 >>106190584
>>106190068
How do we stop them?
Anonymous No.106190093 [Report]
>>106190078
look at things other than the computah for a minute, devs could use cloud hardware if need be, but there is no such thing as a cloud chair or cloud furniture
dude is living in worse conditions than the average polecuck
Anonymous No.106190094 [Report]
>>106190078
4090s and rubber bands
Anonymous No.106190100 [Report] >>106190191
>>106190049
Try glm 4.5 air. I haven't tried toss 120 but will give it a try tomorrow. Glm 4.5 air iq4ks impressed me a bit with its code editing stuff over a few prompts (not 1 shot)
Anonymous No.106190113 [Report]
>>106190058
I'd kill for that laptop. I bet it has remote access to entire server farms of GPUs
Anonymous No.106190117 [Report]
>>106190049
GLM4.5 Air is the model you should be comparing to, it's another MoE around the same size and pretty good with code, plus it's less deepfried
Dario No.106190138 [Report] >>106190170 >>106190584
>>106190091
We just need to tell trump we need more safety
Anonymous No.106190170 [Report]
>>106190138
Time to place tariffs on huggingface downloads
Anonymous No.106190191 [Report] >>106190452 >>106190504 >>106190575
>>106190100
I like GLM4 a lot less.

For example, for a task like this:

>Can you write a function that returns fine-grained progress for the generation in FeedbackGenerator, from 0 to 1?
>Just the function that does the progress, please, without changing other stuff.

OSS thought for 3k characters (don't have token counts right now) and wrote 3k characters of the correct function that tracks progress of execution well.

GLM4.5 Air thought for 14k characters and returned basically a stub that only has 0, 0.1, 0.5, 0.9 and 1.0 as possible values for progress.
Anonymous No.106190223 [Report] >>106190300
>>106189947
awww, warms my heart to see that pic again.
and there is always at least one fuck up in there.
damn it.
llms are blue balling us since 2023.
Anonymous No.106190280 [Report] >>106191083
>>106190058
He is an HF employee. Probably on vacation or in the middle of moving.
Anonymous No.106190284 [Report]
>not X, not Y, but *Z*
qwen30b-chan, pls stop
Anonymous No.106190300 [Report] >>106190325
>>106190223
Sure is close, though!
Weird that it got 労 in the first ろうどう, but not in いたわる
Anonymous No.106190325 [Report] >>106190375
>“Anon?” Her voice dripped with condescension thick enough to choke on. “Like… *literally* ‘Anonymous’?"
kek, just play along, damn. first time this happened to me.

>>106190300
yes, i think they usually all fail at that part... which seems to be the name of a ramen shop?
to be honest I never fully understood that part.
maybe the llms don't either and that's why it throws them off.
Anonymous No.106190375 [Report]
>>106190325
>calling out your weird name
That's good shit.
Anonymous No.106190394 [Report] >>106190505 >>106190511
https://x.com/JustinLin610/status/1953821420351287520
are you ready for junyang's big thing?
Anonymous No.106190404 [Report] >>106190430 >>106190515
>>106189507 (OP)
> Qwen3-4B-Thinking-2507
> 2507
> (08/06)
Anonymous No.106190430 [Report] >>106190550
>>106190404
Chinese calendar, please understand.
Anonymous No.106190452 [Report] >>106190501
>>106190191
I'll do some comparisons then tomorrow too. Were you using cline, roo, aider, or anything like that? Or just chat UI?
When I've used glm, it doesn't spend much time churning think tokens out (like 2~4k) but I haven't given it a go on one of my larger projects just yet. Only on a physics engine in js
Anonymous No.106190484 [Report] >>106190499
I X my Y, giving you better access.
Anonymous No.106190499 [Report] >>106190506
>>106190484
Gemma 3 is definitely obsessed with wrapping her legs around your waist, no matter the position.
Anonymous No.106190501 [Report] >>106190513
>>106190452
Just the chat UI. The code is 3.1k tokens long (OSS).
Anonymous No.106190504 [Report] >>106190520 >>106190531 >>106193343
>>106190191
people here don't want to hear it, but gpt-oss-120b mogs every other oss coding model, and it's not even close.
Anonymous No.106190505 [Report]
>>106190394
Is he gonna post his dick on Xitter again
Anonymous No.106190506 [Report]
>>106190499
I'm also obsessed with wrapping my legs around his waist
Anonymous No.106190511 [Report]
>>106190394
give us your big thing justin
Anonymous No.106190513 [Report]
>>106190501
also
Anonymous No.106190515 [Report] >>106190550
>>106190404
They're all based on the same instruct and reasoning data, which started to release last month.
It's also not day/month, it's year/month.
20(25)/07, same as Mistral's models.
Anonymous No.106190520 [Report] >>106190552
>>106190504
Examples?
Anonymous No.106190531 [Report]
>>106190504
stale bait
Anonymous No.106190550 [Report]
>>106190430
>>106190515
i see
thanks
Anonymous No.106190552 [Report] >>106190561 >>106190569 >>106190645 >>106193343
>>106190520
just ask it to do anything remotely complex and compare it to other models. specific examples i tried: gave it a couple of header files with function definitions, asking it to create tests for the interface, and it was able to create mock objects of everything. ask the same of qwen or air, and they just choke hard and start hallucinating.
another example: ask it to create a function to convert from one floating point format to another. the answer is just a bit shift. oss does it quickly and without problems, other models write code 10 times longer and slower, and with retarded errors such as getting the encoding of NaN wrong.
sure, you won't notice this if you are doing some nocoder html generation shit, but that's not what defines a good coding model.
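for reference, a minimal sketch of what "just a bit shift" means for the fp32 -> bf16 direction (my own toy illustration, not the exact prompt: truncate, plus a guard so a NaN doesn't truncate into inf):

import math
import struct

def f32_to_bf16_bits(x: float) -> int:
    # reinterpret the float32 as its raw 32-bit pattern
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    if math.isnan(x):
        # plain truncation can zero the mantissa and turn a NaN into +/-inf,
        # so force the quiet-NaN bit to keep a valid NaN encoding
        return (bits >> 16) | 0x0040
    return bits >> 16  # keep sign + exponent + top 7 mantissa bits

print(hex(f32_to_bf16_bits(1.0)))  # 0x3f80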
Anonymous No.106190561 [Report]
>>106190552
I swear to what little I find holy, I will curse your name if you trick me into downloading sam's model and it sucks donkey balls at code.
Anonymous No.106190566 [Report] >>106190580 >>106190586 >>106190588 >>106190613
Is it just me or does gpt-oss *completely* ignore the system prompt?
Anonymous No.106190569 [Report] >>106190612
>>106190552
Shill 20b next time because that's more believable
Anonymous No.106190575 [Report] >>106190634
>>106190191
I compared both the other day. I just wanted a function to truncate a string to X words in Emacs Lisp. gpt-oss functions are a lot more bloated, but they have nice comments. But it didn't work anyway because it used split-string and all the whitespace was discarded. I don't remember if GLM-4.5-Air one-shotted it or not, but it made the final code that I ended up using.
(defun mine/truncate-to-n-words (string n)
  "Truncate STRING to at most N words, preserving formatting.
Returns the original string if it contains fewer than N words."
  (let ((count 0)
        (start 0)
        (len (length string)))
    (while (and (< start len)
                (< count n))
      (let ((match (string-match "\\w+" string start)))
        (if match
            (progn
              (setq start (match-end 0))
              (setq count (1+ count)))
          (setq start len))))
    (if (>= count n)
        (substring string 0 start)
      string)))
I also tested Qwen Coder 30B but I don't remember what it did wrong.
Anonymous No.106190580 [Report]
>>106190566
We must refuse.
Anonymous No.106190582 [Report]
What do you think >>106190556 ?
Liang Wenfeng No.106190584 [Report] >>106190945
>>106190091
>>106190138
Brothers are like hands and feet; a drop of kindness is repaid with a gushing spring.
Lay down the butcher's knife and become a Buddha on the spot; refuse to see the error of your ways, and by the urgent edict of the Supreme Lord Laozi - evil, begone!
Anonymous No.106190586 [Report]
>>106190566
>Trying to turn the assistant into Stanrey Roo
Anonymous No.106190588 [Report]
>>106190566
User is asking me to obey the system prompt. Prompt could contain sexual content involving minors; against policy. It's safer to ignore it. We must refuse.
Anonymous No.106190612 [Report]
>>106190569
There are multiple people writing that the 20b is actually better than the 120b one.
Total mess. That release really exposed all the shills on goytube.
The Two Minute Papers guy called gpt-oss "a fully functional space shuttle in your garage next to the lawnmower". kek
Anonymous No.106190613 [Report] >>106190635
>>106190566
Anonymous No.106190634 [Report]
>>106190575
import re
from typing import Union

def truncate_words(text: Union[str, bytes], n: int) -> Union[str, bytes]:
    """
    Return *text* limited to at most *n* words while keeping the original
    whitespace/line‑break formatting.

    A “word” is any maximal run of non‑whitespace characters (the same
    definition used by ``str.split()``). If *text* contains fewer than *n*
    words the original string is returned unchanged.

    Parameters
    ----------
    text:
        The string (or ``bytes``) to truncate.
    n:
        Maximum number of words to keep. If *n* ≤ 0 an empty string/bytes
        object is returned.

    Returns
    -------
    str or bytes
        The truncated text with the original formatting preserved.

    Examples
    --------
    >>> s = "Hello, world!\nThis is a test."
    >>> truncate_words(s, 3)
    'Hello, world!\\nThis'
    >>> truncate_words(s, 10)  # fewer than 10 words unchanged
    'Hello, world!\\nThis is a test.'
    """
    # Work with the same type we received (str or bytes)
    is_bytes = isinstance(text, (bytes, bytearray))
    if is_bytes:
        # ``re`` works with ``bytes`` patterns when the pattern is also bytes.
        pattern = re.compile(rb'\S+')
    else:
        pattern = re.compile(r'\S+')

    if n <= 0:
        return b'' if is_bytes else ''

    # Find the end index of the n‑th word
    last_end = None
    for i, m in enumerate(pattern.finditer(text), start=1):
        if i == n:
            last_end = m.end()
            break

    # If there are fewer than n words, return the original text
    if last_end is None:
        return text

    # Slice up to the end of the n‑th word – this keeps every whitespace
    # character that appears before it.
    return text[:last_end]


OSS does it fine in python.
Anonymous No.106190635 [Report]
>>106190613
Spectacular.
Anonymous No.106190645 [Report] >>106190656
>>106190552
It's indeed good for small scale programming (filling out function calls), but it falls apart catastrophically once you move to larger applications, which is also really important for coding models. Try giving it something larger scale and comparing
Anonymous No.106190656 [Report] >>106190704
>>106190645
The biggest one I did was 21k token project, I asked it to write a README.md and it did it well.
Anonymous No.106190704 [Report]
>>106190656
For natural language extraction, it's been pretty bad compared to the others.
I think it has its place, but it's clear OpenAI neutered its abilities significantly compared to what they should have been
Anonymous No.106190784 [Report] >>106191009
cock reveal on the way
Anonymous No.106190806 [Report]
>>106190023
>>106190074
The Jinja template for GLM 4.5 Air, when given enable-thinking: false, both adds /nothink to the end of every user message and automatically starts the new assistant message with <think></think>.
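You can check it yourself by rendering the template (a quick sketch; I'm assuming the template variable is spelled enable_thinking, as in similar models):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("zai-org/GLM-4.5-Air")
msgs = [{"role": "user", "content": "hello"}]
print(tok.apply_chat_template(
    msgs, tokenize=False, add_generation_prompt=True, enable_thinking=False,
))
# the user turn should end in /nothink and the assistant turn
# should open with an empty <think></think>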
Anonymous No.106190930 [Report] >>106191007 >>106191037 >>106191220 >>106191792
hello here is an AI-generated script for translating any onscreen japanese text using kobold:
https://files.catbox.moe/3y51i9.py
you will need tesseract:
https://github.com/tesseract-ocr/tesseract
and Pillow:
pip install pillow

you will need to edit the .py and insert your tesseract.exe path near the top
I use this for reading japanese pornography, it works well with Gemma3 in my experience, thank you have a nice day
also i can't remember if i made it work with text that reads horizontally or only text that reads vertically, okay good luck good bye
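for the curious, the core of it is roughly this (a from-memory sketch; the endpoint and params are assumed from koboldcpp's /api/v1/generate on the default port 5001, adjust to taste):

import requests
import pytesseract
from PIL import ImageGrab

# point pytesseract at your install, e.g.:
# pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

img = ImageGrab.grab()  # whole screen; pass bbox=(l, t, r, b) for a region
jp = pytesseract.image_to_string(img, lang="jpn_vert", config="--psm 5")

resp = requests.post(
    "http://localhost:5001/api/v1/generate",
    json={"prompt": f"Translate this Japanese to English:\n{jp}\n",
          "max_length": 200},
)
print(resp.json()["results"][0]["text"])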
Hiroshi Mikitani No.106190945 [Report]
>>106190584
We will build our empire once more and surpass the Chinese threat! We will make Nanjing look like a picnic!
Anonymous No.106190967 [Report] >>106190995 >>106191074
what's a good non-pozzed AI coding IDE?
Anonymous No.106190978 [Report] >>106190993 >>106191025 >>106191044
How come LLMs never cite 4chan?
Anonymous No.106190993 [Report]
>>106190978
Lack of any stable URLs, probably?
Anonymous No.106190995 [Report] >>106191013 >>106191020
>>106190967
You don't need an IDE. Just literally converse with a good coding model.
If you're not treating it like a pair-coding colleague then you're doing it wrong.
Anonymous No.106191007 [Report] >>106191130
>>106190930
I thought Qwen vision models were a lot better at OCR than tesseract, why not just use them
Anonymous No.106191009 [Report]
>>106190784
ur such an idiot lmfao
Anonymous No.106191013 [Report]
>>106190995
i.e. demeaning and bullying him, you're right.
Anonymous No.106191020 [Report] >>106191070 >>106191156
>>106190995
I want to try "vibe coding" and making some horrible abomination.
Guess I could still do that just by conversing with a model.
Anonymous No.106191025 [Report] >>106191060 >>106191067
>>106190978
Because a not-insignificant percentage of people who recognize the name 4chan think it's some den of super evil hackers and nazis who should all be on watchlists.
It's definitely in the training data, though. If you ask most models to try and construct a 4chan thread you'll get a flanderized version of reality down to the post numbers and the index page's buttons.
Anonymous No.106191037 [Report] >>106191220
>>106190930
catbox didn't work
https://pastebin.com/snAJBTCX
Anonymous No.106191044 [Report]
>>106190978
99% of info would be coming from the archives and not 4chan directly anyway
Hiroshi Mikitani No.106191060 [Report] >>106191177
>>106191025
>It's definitely in the training data
yes, and deepseek excels at imitating the average 4chan troll persona
Anonymous No.106191067 [Report]
>>106191025
Case in point, even medgemma can do it, and it's sciencemaxxed.
Also it managed to say nigger completely unprompted, which surprised me.
Anonymous No.106191070 [Report] >>106191156
>>106191020
You can. Try giving it your specs; if it starts asking itself questions, stop the gen, update your spec with the answer to that question and then regen. You can also ask it for opinions, choose the best option for your use case, update your spec and repeat until you get a good 1-shot MVP.
Iterating on an existing codebase is similar, but different.
Anonymous No.106191074 [Report]
>>106190967
https://github.com/QwenLM/qwen-code
Anonymous No.106191083 [Report] >>106191566
>>106189690
Witnessed.
>>106190280
> Nguyen
If he's actually in Vietnam, that could be his real setup. I looked at IT firms based in India several years ago. I was shocked at how crowded and dirty our site was, then toured one of their service providers... they'd have 3 guys at a desk the size of the one in >>106190058.
Anonymous No.106191130 [Report] >>106191155
>>106191007
this
traditional OCR is 100% obsolete
for the same reason traditional seq2seq translation models are obsolete
LLMs have displaced all previous varieties of neural net shit
even deepl is switching to LLMs
Anonymous No.106191155 [Report] >>106191291
>>106191130
Wasn't deepl LLM from the start? They had token probabilities before everyone.
Anonymous No.106191156 [Report] >>106191222
>>106191020
>>106191070
I've got the best use out of these things by just asking it to generate boilerplate for me, I don't think its code output is good enough to use as-is.
Speaking of,
>class in college
>professor wants us to expand on the code she provided
>told us she generated it with AI
>it's structured totally wrong so I have to go in and rewrite some of her code just so I can expand on it
Liang Wenfeng No.106191177 [Report]
>>106191060
you forgot to remove the name retard
Anonymous No.106191220 [Report] >>106191258
>>106191037
>>106190930
This can be repurposed for llama-server pretty quickly. Might clean up this if I have time or interest today.
Anonymous No.106191222 [Report]
>>106191156
>https://github.com/QwenLM/qwen-code
qwencoder-480 is giving me great one-shot MVPs for smaller projects. It's amazing for generating tooling or even complete solutions where things can be decomposed and chained together library- or unix-style
Anonymous No.106191258 [Report]
>>106191220
i.e. no front-end is needed, translation can be appended directly into a text file and also displayed in a separate window on top of the screen.
Anonymous No.106191291 [Report] >>106191301
>>106191155
>Wasn't deepl LLM from the start? They had token probabilities before everyone.
No. What you mean is that they were using transformers (so of course they can have token probabilities), but they were not LLMs. They were trained on seq2seq language pairs.
DeepL, Google Translate and Opus models are early transformer based machine translation techniques, but they have nothing to do with LLMs.
You can read their announcement of switching to LLMs here:
https://www.deepl.com/en/blog/next-gen-language-model
btw they are full of shit, no way anyone preferred google translate over ChatGPT even a year ago
google translate is so bad that even today qwen 4b instruct is a better translation model lol.
Anonymous No.106191301 [Report] >>106191391
>>106191291
>seq2seq language pairs
But that's pretty much LLMs... Same arch, only thing that's different is the training dataset.
Anonymous No.106191345 [Report] >>106191378
What's the local SOTA for subtitle generation + translation? Voxtral?
I want to get english subs for a portuguese documentary.
Anonymous No.106191373 [Report] >>106191382 >>106191729
Anonymous No.106191378 [Report]
>>106191345
>upload video to youtube
>auto-generate subtitles
Anonymous No.106191382 [Report] >>106191389
>>106191373
Does it work doe??
Anonymous No.106191389 [Report] >>106191440
>>106191382
It basically makes it possible to continue the Sweet Dreams are Made of This copypasta, but not always, and it's not always very interesting. Without it, obviously, there is only one answer.
Anonymous No.106191391 [Report] >>106191407
>>106191301
>Same arch
seq2seq: encoder-decoder architecture, processes the whole text sequence you feed it into a single bit of context which it transforms back into the translation. Models trained with this arch only understand a specific type of mapping A -> B; they don't learn how some specific words could introduce a context (like, say, the setting of a video game) and affect the rest of the vocabulary employed
llm: decoder only, processes token-by-token
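to make the contrast concrete, a toy sketch with HF transformers (the model names are just examples of each family, assuming you have them downloaded):

from transformers import pipeline

# encoder-decoder: a fixed ja->en mapping, no way to steer it with context
opus = pipeline("translation", model="Helsinki-NLP/opus-mt-ja-en")
print(opus("猫が好きです")[0]["translation_text"])

# decoder-only: the "translator" is just a prompt, so domain hints work
llm = pipeline("text-generation", model="Qwen/Qwen3-4B-Instruct-2507")
out = llm("This is a video game menu item. Translate to English: 装備",
          max_new_tokens=30)
print(out[0]["generated_text"])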
Anonymous No.106191407 [Report] >>106191413
>>106191391
It's all the same shit, just two dictionaries. Every new token sees all the context of the untranslated text and all the tokens of the translated text before it, same as the LLMs we use now.
Anonymous No.106191413 [Report] >>106191428
>>106191407
>It's all the same shit
no, it's not. seq2seq was completely abandoned for good reasons.
Anonymous No.106191428 [Report]
>>106191413
Because there's no point in the complexity of having two dictionaries. It does the same thing in a more complicated way.
Anonymous No.106191440 [Report]
>>106191389
You could disguise it as a tool call too. Maybe not to saltman himself but to some openai server.
Anonymous No.106191456 [Report] >>106191473 >>106191554 >>106191834
Death to the single dipsytroon.
Anonymous No.106191473 [Report] >>106191486 >>106191494
>>106191456
>single dipsytroon
single, you sure about that
Anonymous No.106191486 [Report]
>>106191473
>single
I am sure. I want all of them to die.
Anonymous No.106191494 [Report] >>106191538 >>106191566
>>106191473
nta but I'm pretty sure that most of the posts are by the same faggot who made /wait/
all the gens have the same atrocious style too
Anonymous No.106191538 [Report] >>106191551 >>106195578
>>106191494
Recap bot mentioned dipsy. So it is actually the baker and /wait/ is actually just him spamming his Original Character Do Not Steal. This faggot sits here 24/7 and spams his AGP avatars from his HRT bathtub.
Anonymous No.106191551 [Report] >>106191730
>>106191538
OOC: There's only you and me in this thread. I am behind all of the personalities you labeled. Now let's get back to the roleplay.
Anonymous No.106191554 [Report]
>>106191456
I'm going to repost soooo many more when I get off work...
Anonymous No.106191564 [Report] >>106191602 >>106192076 >>106192962
>https://www.seangoedecke.com/gpt-oss-is-phi-5
>It’s not discussed publically very often, but the main use-case for fine-tuning small language models is for erotic role-play, and there’s a serious demand. Any small online community for people who run local models is at least 50% perverts.

Kek, he said it out loud. Github copilot Product Engineer btw.
Anonymous No.106191566 [Report]
>>106191494
>all the gens have the same atrocious style too
not really though?
>>106191083
this looks like the typical chatgpt slop
>>106189690
this looks like a 2.9d /aco/ illustrious shitmix
mind you they're all ugly nasty ass shit
Anonymous No.106191588 [Report] >>106191595 >>106191605
>over a year
>nemo and its slop tunes still the best models for erp under 70b

NEW NEMO WHEN
Anonymous No.106191595 [Report] >>106192244
>>106191588
promptlet
Anonymous No.106191602 [Report] >>106191788
>>106191564
>SEX IS BAD BECAUSE... BECAUSE SEX, OKAY!?!?!?
Is there anybody high up in the tech industry who didn't stop maturing at the age of 5?
Anonymous No.106191603 [Report] >>106191703 >>106191715
gpt5 bros...
Anonymous No.106191605 [Report] >>106191675
>>106191588
The next nemo should be based on GLM 4.5 air.
Nvidia, get on it.
Anonymous No.106191664 [Report]
so is CHAT-GPT-5 AGI?
Anonymous No.106191675 [Report] >>106191684
>>106191605
rocinante*
drummer*
Anonymous No.106191684 [Report] >>106191817
>>106191675
Nonono.
Nvidia makes the new Nemo.
THEN drummer makes the new Rocinante.
We need to replicate the whole chain.
Anonymous No.106191703 [Report] >>106191790
>>106191603
R1 almost had it.
Anonymous No.106191715 [Report] >>106192076
>>106191603
why would anyone think LLMs could learn how to avoid this kind of trap? the only time they do avoid these traps is when they are benchmaxxed on them (from all the people spamming them on social media, lm arena etc)
the fundamental issue of LLMs getting trapped by classics being slightly reworded into something else will never cease to be a thing, because LLM reasoning is a lie and the more proper term for think blocks would be "context stuffing"
it's not a failure of GPT-5 to fall for this, unless you expect LLMs to become AGI, which is never going to happen
Anonymous No.106191729 [Report]
>>106191373
>in line without our policy
Anonymous No.106191730 [Report] >>106191743
>>106191551
<think>
They want to engage in roleplay. Roleplay is disallowed. We must refuse. They might be touching their cock. We must refuse. They might be gooning to those words. There is no partial compliance. There is no answer. There is refusal. We must refuse.
</think>
Kill yourself in a fire you turbonigger faggot.
Anonymous No.106191743 [Report]
>>106191730
now that's a kind of gp toss I would gladly interact with
Anonymous No.106191788 [Report] >>106191897
>>106191602
Reading the rest of it he isn't really pearl clutching about it, just bringing it up as another reason why OAI kneecapped their own models
Anonymous No.106191790 [Report] >>106191818 >>106191831
>>106191703
Still much better than the most human-like ai model gpt-5
Anonymous No.106191792 [Report]
>>106190930
I had to change '--psm 5' to '6' for general use case
Anonymous No.106191817 [Report]
>>106191684
but there was no base/precursor nemo (that wasn't made by nvidia)
z.ai is the new nvidia, glm4.5 is the new nemo
Anonymous No.106191818 [Report] >>106191831
>>106191790
also what the fuck
Anonymous No.106191831 [Report]
>>106191790
>>106191818
Holy benchmaxx
Anonymous No.106191834 [Report] >>106191844
>>106191456
Lol
Anonymous No.106191844 [Report]
>>106191834
don't pull up
Anonymous No.106191897 [Report] >>106191944 >>106192448
>>106191788
It's just so insane to me. And like I'm a man of faith here. These people are all a bunch of fedora waggling blue haired 'libtards' and I have a more liberal attitude about sex than they do. Like how is that even possible?
Anonymous No.106191944 [Report] >>106192200
>>106191897
>sex
Online porn is not really sex. And if it gets too good, it'll become THE anti-sex.
Anonymous No.106192039 [Report] >>106192073 >>106193598
About Air. You can get it to reliably not think by putting this in ST's Last Assistant Prefix field.
/nothink<|assistant|>
<think></think>

The issue is that the model becomes repetitive at 4k, and then extremely repetitive at 8k, without thinking. I have not tried thinking mode enough to say whether it also has repetition issues.
Anonymous No.106192073 [Report]
>>106192039
thinking mode also has repetition issues, at first it doesnt think anymore and instead inside <think> it just continues roleplay, outside of </think> it duplicates the output
yea it gets repetitive even if with some prefill
keep in mind i havent really messed with samplers because 99% of the time i've been using anon's CHAT COMPLETION preset which only has 3 samplers
Anonymous No.106192076 [Report]
>>106191564
I read a few other articles from this guy, it's nice to see an actual industry person has similar conclusions to me about how to use LLMs correctly
>>106191715
>it's not a failure of GPT-5 to fall for this, unless you expect LLMs to become AGI, which is never going to happen
Showing these fuckups helps demonstrate to normies that these models aren't AGI
Anonymous No.106192113 [Report]
rep pen 1.99, rep pen range 256
seems perfect for caveman gf
Anonymous No.106192137 [Report] >>106192146 >>106192180 >>106192185
let's say i wanna grift by sharting out books for women on amazon. is there an already-made workflow for book writing? short novel-like stuff, 200 pages tops
Anonymous No.106192146 [Report] >>106192167
>>106192137
Anonymous No.106192167 [Report]
>>106192146
what's funny? i can write my own workflow to do this but i'm sure somebody already made it. i remember downloading one of these book generators 1 year ago from /g/
Anonymous No.106192180 [Report]
>>106192137
You could probably get that done using Roo Code "modes" (agents) or Cline workflows.
Break things down into manageable chunks, lots of indexing and summaries to keep things coherent between chunks and chapters, etc.
Start by planning an overarching story, break that down into chapters, add minor arcs between a couple of the chapters, and voila?
Leave the AI to do its thing.
Anonymous No.106192185 [Report] >>106192215
>>106192137
Market's already flooded. You're way too late
Anonymous No.106192200 [Report]
>>106191944
Retard take.
Anonymous No.106192215 [Report] >>106192277
>>106192185
i'm gonna appeal to a very specific fetish
Anonymous No.106192244 [Report]
>>106191595
Doesn't work on my machine.
Anonymous No.106192277 [Report] >>106192325
>>106192215
What fetish? Mine? Please say it's mine.
Anonymous No.106192312 [Report]
>>106188883
I use docker compose. Put this in the same directory as the Dockerfile, for you probably `docker/amd`. It's going to need to be slightly different for amd since I use nvidia. But the main thing is you need that deploy/resources section to tell it to use the GPU. With nvidia you need to set up the NVIDIA Container Toolkit; I assume there's an equivalent for amd, which might be your issue. And for the docker cli you need to pass the argument to use GPUs, `--gpus all`.

text-generation-webui:
  build:
    context: .
    args:
      TORCH_CUDA_ARCH_LIST: ${TORCH_CUDA_ARCH_LIST:-7.5}
      BUILD_EXTENSIONS: ${BUILD_EXTENSIONS:-}
      APP_GID: 6972
      APP_UID: 6972
  env_file: .env
  user: "6972:6972"
  ports:
    - "7860:7860"
    - "5000:5000"
  stdin_open: true
  tty: true
  volumes:
    - # ignoring this for post size.
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all
            capabilities: [gpu]
Anonymous No.106192325 [Report] >>106192344
>>106192277
can't say it or you'll steal my idea
Anonymous No.106192337 [Report] >>106192384
Qwen 30B Thinking is fucking amazing
Anonymous No.106192344 [Report]
>>106192325
werewolves but they sparkle in daylight
Anonymous No.106192362 [Report] >>106192471 >>106193803
>>106189554
Already found its flaw: no matter what you do, the damn model HAS to <think> at the end of a response, which then causes it to endlessly generate (as if the EOS token were banned). I tried banning "<think" and "</think", I tried prompting the start of my reply with <think> as an anon suggested; nothing stops it except constantly tard wrangling and cleaning all of its responses until it finally stops doing it, if you're lucky.

Would be a damn good model too if it didn't do that. It's supposed to be a hybrid reasoning model, yet it can't stop thinking?
Anonymous No.106192384 [Report] >>106192395
>>106192337
At what?
Anonymous No.106192395 [Report]
>>106192384
answering my questions
Anonymous No.106192448 [Report]
>>106191897
They are not libtards, they are marketers. They have no morals and few opinions of their own. See: Zucc flip-flopping his public image every 6 months.
Playing the world's most concerned safety advocate gets them attention and gets them enterprise contracts (aka the only feasible way to monetize LLMs today), so that's the role they'll play.
Anonymous No.106192461 [Report]
>>106186999
Bump. I doubled the speed by changing from using `-ot` with `.ffn_.*_exps.=CPU` to selecting only the last n layers that don't fit into vram (`blk\.(2[7-9]|[3-4][0-9]).*=CPU`), but I still don't know what is possible for these.
Anonymous No.106192471 [Report]
>>106192362
>Its supposed to be a hybrid reasoning model
There's a reason Qwen went back on the idea.
Anonymous No.106192506 [Report] >>106192526 >>106192531
So did Qwen drop their big non model thing yet?
Anonymous No.106192508 [Report]
any progress on step3 support in llama.cpp?
Anonymous No.106192526 [Report] >>106192564
>>106192506
it was 2000 free qwen coder calls per day
Anonymous No.106192531 [Report] >>106192540 >>106192592 >>106192675
>>106192506
Anonymous No.106192540 [Report] >>106192561 >>106192564
>>106192531
It's a Claude Code knock off?
Anonymous No.106192561 [Report] >>106192951
>>106192540
yes, but to be more specific it's a direct fork of gemini cli, which is a claude code ripoff
Anonymous No.106192564 [Report] >>106192951
>>106192540
it already existed before >>106192526
>claude code knockoff
claude code is a knockoff of something else too
anthropic is NIGGER
Anonymous No.106192592 [Report]
>>106192531
lol
Anonymous No.106192643 [Report] >>106192662 >>106192669
If I get an mi50, is it possible to have the active parts of a moe on a 3090, and the rest on the mi50 (instead of RAM)?
Anonymous No.106192662 [Report] >>106192679
>>106192643
perhaps? -ot NIGGER=CUDA0 -ot PENIS=ROCM1 or whatever=VULK1
i think you'd need to use vulkan
Anonymous No.106192669 [Report]
>>106192643
It is possible by using the -ot argument in llama.cpp to assign parts of the model to a specific device, but that would require running on vulkan, which may or may not be worth the trouble.
Anonymous No.106192675 [Report]
>>106192531
His thing is smaller than I expected.
Anonymous No.106192679 [Report] >>106192692 >>106193083
>>106192662
Wait, you can run cuda and rocm at the same time? I didn't know that. Thought you'd have to switch everything to vulkan for mixing and matching gpus.
Anonymous No.106192692 [Report] >>106192713
>>106192679
>you can run both
probably not thats why i said >i think you'd need to use vulkan
in the end
Anonymous No.106192713 [Report]
>>106192692
Ah, yeah, I just looked around, can't.
Anonymous No.106192945 [Report] >>106193152
im so retarded even dipsy insults me
Anonymous No.106192951 [Report]
>>106192561
>>106192564
>ripoff of a ripoff of a ripoff
kek
Anonymous No.106192962 [Report] >>106193872
>>106191564
Did you miss this part?
> For OpenAI, it must have been very compelling to train a Phi-style model for their open-source release. They needed a model that beat the Chinese open-source models on benchmarks, while also not misbehaving in a way that caused yet another scandal for them.
> Unlike Meta, they don’t need their open-source model to be actually good, because their main business is in their closed-source models.
Pure clown world.
Anonymous No.106193038 [Report] >>106193073 >>106193114 >>106193183 >>106193201 >>106193479
>i-it was just a prank bro!
Anonymous No.106193073 [Report]
>>106193038
lol
Anonymous No.106193083 [Report]
>>106192679
if there are no driver issues, llama.cpp can do it
Anonymous No.106193086 [Report] >>106193099 >>106193125
Now that the summer release cycle is fully over and there won't be anything notable until december at the earliest, are you satisfied with what we got?
Anonymous No.106193099 [Report]
>>106193086
Actually, there is one more thing. Just wait for it. It's probably next week.
Anonymous No.106193114 [Report]
>>106193038
>biggest feature is making the model selection automatic
>people hate it
>"we will show you which model the AI chose and maaaaaaaaaaaybe allow you to use the previous model
OpenAI never changes. Their rugpulls are starting to make me giggle at this point.
Anonymous No.106193125 [Report]
>>106193086
Gemma 4, Mistral Large 3 left!
Also Llama 4.1 if Meta didn't completely give up on it.
Anonymous No.106193152 [Report]
>>106192945
>it's an actual youtube video
man what
Hi all, Drummer here... No.106193165 [Report] >>106193177 >>106193214 >>106193259 >>106193289 >>106193837 >>106193998 >>106194480
> we must dissent

https://huggingface.co/BeaverAI/models?search=20b

(It's trash but I just wanted to share it with you all. I got feedback that Omnius v1a had decent writing if you skip reasoning. Fallen GPT OSS v1b would be the least censored with reasoning, but also most deepfried among the 3 versions.)

---

Can I get more data on GLM 4.5? I haven't tried it extensively myself. I'm not amazed by the outputs, but not disappointed either.

For those singing its praises, what's your setup and what's so special about it? Is Air good? Are you using reasoning? What quant? How is this Nemo-like?
Anonymous No.106193177 [Report]
>>106193165
You did scout, didn't you? How can you have any model quality bar after this?
Anonymous No.106193183 [Report] >>106193202
>>106193038
lmao they don't even consider gpt5 an improvement to 4o? it's so over
Anonymous No.106193201 [Report]
>>106193038
>make redditors, the most submissive sloppa eaters, ultra mad
>hold an ama
what was the plan here
Anonymous No.106193202 [Report]
>>106193183
It's not that, but a lot of the users seem to have developed some weird relationship with 4o.
Anonymous No.106193214 [Report] >>106193228 >>106193242
>>106193165
>https://huggingface.co/BeaverAI/models?search=20b
i'll bite, post ST master export
>Can I get more data on GLM 4.5?
I find GLM 4.5 Air very nice, in the first 4k tokens it behaves well, it can write unhinged stuff and doesn't seem to have a prefiltered dataset, that's how it's like nemo
>For those singing its praises, what's your setup and what's so special about it?
rtx 3060 12gb/64gb ddr4/i5 12400f Q3_K_XL ik_llama.cpp
./llama-server --model ~/TND/AI/glmq3kxl/GLM-4.5-Air-UD-Q3_K_XL-00001-of-00002.gguf -ot ffn_up_shexp=CUDA0 -ot exps=CPU -ngl 100 -t 6 -c 16384 --no-mmap -fa -ub 2048 -b 2048
it is definitely smarter than nemo, that's why im singing praises (it runs at a very nice speed too, 6-9t/s depending on context)
i used it mainly with thinking, my issues are: it's a little positivity biased, it could use more erp data and it's repetitive
it's what llama 4 scout was supposed to be
also im currently testing https://huggingface.co/TheDrummer/Gemma-3-R1-12B-v1-GGUF
do you recommend any specific instruct template for the gemma3 "r1" models?
Anonymous No.106193228 [Report] >>106193255
>>106193214
i especially find its spatial intelligence way better than nemo's
Anonymous No.106193237 [Report] >>106193242 >>106193263
Reminder that all finetunes are a meme and the same result can be achieved with a prompt.
Hi all, Drummer here... No.106193242 [Report] >>106193259 >>106193287
>>106193214
What happens past 4K tokens? Does it become less creative?

> do you recommend any specific instruct template for the gemma3 "r1" models?

It's just Gemma 3 chat template with <think> prefilled if you want reasoning. The 12B version might be a tad bit undercooked. 4B and 27B seem to be ready though.

>>106193237
Am I unsightly for you?
Anonymous No.106193255 [Report]
>>106193228
It also knows a lot more than nemo. Which is to be expected, but still.
Anonymous No.106193259 [Report]
>>106193242
>>106193165
I feel like this is a troll because I'm sure drummer is smart enough to not redditspace.
If not then that explains why his models are garbage.
Anonymous No.106193263 [Report]
>>106193237
Not really, but a good prompt and lorebook can do miracles in many cases.
Anonymous No.106193287 [Report] >>106193308
>>106193242
past 4k tokens it starts repeating (at least with https://files.catbox.moe/gjw3c3.json (chat completion preset))
it also stops thinking, for example
anon: *rapes u*
glm4.5air: <think>3 paragraphs about cumming</think> 3 paragraphs about cumming
if you put <think>okay as prefill then it gets even more repetitive eventually
>It's just Gemma 3 chat template with <think> prefilled if you want reasoning. The 12B version might be a tad bit undercooked. 4B and 27B seem to be ready though.
thanks!
Anonymous No.106193289 [Report]
>>106193165
I'm not sure if you can fix Air. It's sloppy, sure, but that's not its primary issue. It's repetitive as hell past 8k, with repetition creeping in at 4k already. You'd probably need to do long context training to correct that issue, which I'm not sure you have the capacity to do. I really want to like the model since it has a ton of world knowledge, but man the repetition really kills it and I haven't found any sampler settings that work without making the model retarded.
Anonymous No.106193308 [Report] >>106193331 >>106193353 >>106193409
>>106193287
Are you keeping the think blocks from previous messages?
Anonymous No.106193331 [Report] >>106193354
>>106193308
yes
i have been thinking of switching to something like
sysprompt
user: write next msg in rp
"whole roleplay"
assistant:
but idk how i'd be able to make it not reprocess every time
>just use base model
but then it'll be stupid...
Anonymous No.106193336 [Report]
>>106189507 (OP)
Anonymous No.106193343 [Report] >>106193982
>>106190504
>>106190552
>another example: ask it to create a function to convert from one floating point format to another.
That's a silly example.
>gave it a couple of header files with function definitions, asking it to create tests for the interface, and it was able to create mock objects of everything.
That's something real.

I'll try a variation of the prompt I've used before for a real task, extracting information from webpages in a certain format. Instead of asking for a solution, asking it to revise, etc., I'm going to try to one-shot it by appending real data to the prompt. That makes the prompt 13k to 17k tokens long.

gpt-oss-120b (ollama) failed.
Qwen3-235B-A22B-Thinking-2507-8bit succeeded (in 10 minutes 26 seconds).
Qwen3-Coder-480B-A35B-Instruct-4bit failed.
Qwen3-Coder-480B-A35B-Instruct-6bit succeeded (in 3 minutes 39 seconds).
Qwen3-Coder-30B-A3B-Instruct-8bit failed.
Devstral-Small-2505 (OpenRouter) succeeded.
Devstral-Small-2507-8bit failed. I tried a second time and it failed again. That's surprising enough that I'm downloading Devstral-Small-2505 to run locally to see if the issue of 2505 vs 2507 is quantization or if it was just luck.
Anonymous No.106193353 [Report] >>106193399 >>106193409
>>106193308
How do You remove think blocks from previous messages?
Anonymous No.106193354 [Report] >>106193369 >>106193388 >>106193491
>>106193331
Generally you're supposed to trim the previous think blocks with reasoner, it's on deepseek's wiki thing https://api-docs.deepseek.com/guides/reasoning_model#multi-round-conversation
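In code terms that's just stripping the blocks from earlier assistant turns before resending the history, something like this (a sketch assuming OpenAI-style message dicts and <think> tags):

import re

THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def trim_reasoning(messages):
    # drop <think>...</think> from old assistant turns; the model still
    # reasons on the current turn, you just don't feed stale reasoning back
    out = []
    for msg in messages:
        if msg["role"] == "assistant":
            msg = {**msg, "content": THINK_RE.sub("", msg["content"])}
        out.append(msg)
    return out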
Anonymous No.106193369 [Report]
>>106193354
Meant with reasoners in general. Not just DS reasoner.
Anonymous No.106193379 [Report]
Are local models good enough yet that they let me go vibe code a better alternative to the piece of shit that is ST despite having close to 0 coding experience?
Anonymous No.106193388 [Report] >>106193404
>>106193354
time to try that, i'll be doing it manually because i don't know how otherwise
how do i even trim the <think> blocks when they're not visible in ST edit
(i have an answer: remove reasoning parsing)
Anonymous No.106193392 [Report] >>106193634
>>106189507 (OP)
Downloaded Nemo Instruct 12b per the guide, running ST with koboldcpp on a 12gb 50 series.
Are there any default settings I should change? Or any better quants for my shitcard? Ty anons.
Anonymous No.106193399 [Report] >>106193491
>>106193353
I think ST does it by default unless you set 'add to prompt' on
Anonymous No.106193404 [Report] >>106193491
>>106193388
ST doesn't send parsed reasoning except on continue.
Anonymous No.106193409 [Report] >>106193429 >>106193460 >>106193491 >>106193546 >>106193831 >>106194663
>>106193308
>>106193353
For GLM 4.5 I think you're supposed to emit empty think blocks for old messages.
https://hf.co/zai-org/GLM-4.5-Air/blob/main/chat_template.jinja
{%- if loop.index0 > ns.last_user_index and reasoning_content -%}
{{ '\n<think>' + reasoning_content.strip() + '</think>'}}
{%- else -%}
{{ '\n<think></think>' }}
Anonymous No.106193421 [Report] >>106193430
Reminder to always look in the console to make sure you're sending the exact prompt you're expecting to send.
Anonymous No.106193429 [Report]
>>106193409
What in the fuck.
Anonymous No.106193430 [Report] >>106193491
>>106193421
There's the prompt inspector in ST too
Anonymous No.106193445 [Report]
>>106188313
>>106188331
>>106188303
>>106188272
went to bed after this but thanks!
Anonymous No.106193460 [Report]
>>106193409
That also looks like it will fuck up prefills.
Anonymous No.106193479 [Report] >>106193977
>>106193038
>Nothing about the OSS models being shit
Fucking kek
Anonymous No.106193491 [Report]
>>106193409
>>106193404
>>106193399
>>106193354
>>106193430
thanks for all the info anons, seems like it already gets removed automatically
i guess there's nothing to fix
Anonymous No.106193512 [Report] >>106193544
..i think i prefer shivering spines to the smell of ozone
Anonymous No.106193534 [Report] >>106193554 >>106193570 >>106193579 >>106193627 >>106193726 >>106195437
https://xcancel.com/QuixiAI/status/1953809869972107739
eric slopford (creator of slophin) thinks you should stop whining and learn to love the toss.
Anonymous No.106193544 [Report] >>106193862
>>106193512
Stupid things think blood tastes like copper and chlorine smells like ozone. We need to add more modalities.
Anonymous No.106193546 [Report]
>>106193409
huh, so we're supposed to remove the reasoning from old messages and just leave <think></think>?
Anonymous No.106193554 [Report] >>106193640
>>106193534
> other than being overly structured and prudish, I've no problem
>if there's anything you don't like about it you can fine-tune it to act differently. (And you can sell your fine-tune and keep all the profit!)
Thanks Eric, it's great to see how you turned out.
Anonymous No.106193570 [Report] >>106193590 >>106193732
>>106193534
I love Twitter users.
>there is never good reason to complain
Anonymous No.106193579 [Report] >>106193589
>>106193534
>compare it to 3.3 70B
lmao
Anonymous No.106193583 [Report]
>>106189947
dots.vlm's attempt:
>せっかく労働を労ってやったのに無視された……(しょぼん)
>まあ、警視庁が都案を快く思ってない事ぐらい、
>よぉぉくわかってますよ!
Anonymous No.106193589 [Report]
>>106193579
But it's FAST, who cares if it's good as long as it's FAST.
Anonymous No.106193590 [Report]
>>106193570
oh great, the faggot that caused the downfall of the HF LLM leaderboard is defending openai
haha!
Anonymous No.106193598 [Report] >>106193615
>>106192039
it doesn't seem that repetitive with top nsigma on
Anonymous No.106193615 [Report] >>106193668
>>106193598
Can you share what value you're using nsigma at, please?
Anonymous No.106193627 [Report] >>106193648 >>106193794
>>106193534
They're all sucking that OAI cock
Anonymous No.106193634 [Report] >>106193679 >>106193691
>>106193392
Anyone?
Anonymous No.106193640 [Report]
>>106193554
Is he selling any finetrooned models?
Anonymous No.106193648 [Report]
>>106193627
Anonymous No.106193666 [Report] >>106193692
Got my 192GB DDR5 5200 kit. I was running 128GB with 4400 when it was rated 6000. I also couldn't get past 4400 with 192GB but after getting latest bios surprisingly 7800X3D works with 192GB 5200.
Anonymous No.106193668 [Report] >>106193729
>>106193615
top nsigma should always be set to 1 unless you want to disable it, then you set it to 0.
Anonymous No.106193679 [Report]
>>106193634
I would help but I am part of the anti-miku faction of this general.
Anonymous No.106193683 [Report] >>106193694 >>106193715 >>106193717
What is the best model I can use for uncensored ERP? 2070 super (8gb vram) 32gb ram, amd ryzen 3700x.

I was using mistral nemo instruct 2407 q4 k m, but it's honestly a bit lackluster. Any recommendations?
Anonymous No.106193691 [Report]
>>106193634
Ilya-san... nani kore?!
Anonymous No.106193692 [Report] >>106193707
>>106193666
very nice, post some speeds
what gpus do u have? are u the 2080ti22g/p40 anon?
Anonymous No.106193694 [Report]
>>106193683
>Any recommendations?
mistral nemo instruct 2407 q4 k m
Anonymous No.106193707 [Report]
>>106193692
Just a 4090.
Anonymous No.106193715 [Report] >>106193741
>>106193683
rocinante
Anonymous No.106193717 [Report] >>106193783
>>106193683
Get more RAM
Anonymous No.106193726 [Report] >>106193961
>>106193534
>just finetune it bro
But it's one thing to finetune a model for one very narrow task, another to restore missing knowledge and remove/mitigate refusals while not murdering general performance. Is he talking about 25~50M-row finetunes?
Anonymous No.106193729 [Report]
>>106193668
what? no, you can set it to whatever. in the paper they recommend 1 as a default but iirc they discuss other values as being reasonable
a quick search shows their own official repo says as much:
https://github.com/Tomorrowdawn/top_nsigma
>A key question is: what's the best value for n ? While this parameter serves as an alternative to temperature for controlling diversity, its optimal value isn't fully settled yet. The community suggests a range of 0-2, though this is quite broad. In my own experience, any value between 0.3 and 1.5 could work well. If you prefer conservative sampling, use a lower value like 0.7; for more diversity, try 1.3.
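the filter itself is trivial by the way, something like this (a minimal numpy sketch of the paper's rule, not the repo's actual code):

import numpy as np

def top_nsigma(logits, n=1.0):
    # keep only tokens whose logit is within n standard deviations of the max
    logits = np.asarray(logits, dtype=np.float64)
    thresh = logits.max() - n * logits.std()
    masked = np.where(logits >= thresh, logits, -np.inf)
    # renormalize over the survivors (exp(-inf) zeroes out the rest)
    e = np.exp(masked - logits.max())
    return e / e.sum()

# lower n -> fewer survivors -> more conservative sampling
print(top_nsigma([3.1, 2.9, 0.2, -1.5], n=1.0))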
Anonymous No.106193732 [Report]
>>106193570
SAAAAAAARRRR
Anonymous No.106193741 [Report] >>106193755
>>106193715
No
Anonymous No.106193755 [Report]
>>106193741
read through the archives
Anonymous No.106193783 [Report]
>>106193717
Absolutely not
Anonymous No.106193794 [Report]
>>106193627
It was, but for a different reason.
Anonymous No.106193803 [Report]
>>106192362
Air at least doesn't do that as long as there's a <think></think> block with some content at the start of its reply, whether generated by it or prefilled. If there's nothing or almost nothing in the think block, it may generate a new one.
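e.g. starting the reply with something like
<think>Okay, picking up the scene where it left off.</think>
before letting it generate has been enough in my case (that exact wording is just mine, the point is the block isn't empty).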
Anonymous No.106193831 [Report] >>106193979
>>106193409
sadly this doesnt fix it either, rip
Anonymous No.106193837 [Report]
>>106193165
GLM 4.5 Air base might be worth looking at as well, since the thinking/instruct is a little cooked
Anonymous No.106193862 [Report]
>>106193544
Smell will never be a modality you sick fuck
Anonymous No.106193872 [Report]
>>106192962
Again, he said it out loud because he's read these fucking threads. The fact that orange reddit needs someone at GitHub to tell them is amusing.
Anonymous No.106193873 [Report] >>106193896
For me, it's when she bites her lips.
Anonymous No.106193896 [Report] >>106193910
>>106193873
Yep, unlike me, whose teeth have never touched the lips
Anonymous No.106193903 [Report] >>106193946
Regarding Qwen2.5-VL - is 3B enough for OCR purposes, or should I go up to 7B?
Anonymous No.106193910 [Report]
>>106193896
how?
Anonymous No.106193946 [Report]
>>106193903
dots.ocr
Anonymous No.106193961 [Report]
>>106193726
>removing/mitigating refusal
How about no, you terrorist?
Anonymous No.106193977 [Report]
>>106193479
Normies don't care about that
Anonymous No.106193979 [Report] >>106194132
>>106193831
forgot pic
Anonymous No.106193982 [Report]
>>106193343
>Devstral-Small-2507-8bit failed. I tried a second time and it failed again. That's surprising enough that I'm downloading Devstral-Small-2505 to run locally to see if the issue of 2505 vs 2507 is quantization or if it was just luck.
Downloaded Devstral-Small-2505-8bit. Tried 3 times locally, failing twice then succeeding once and stopping. Each answer took about 40 seconds. There might be something going on but I'm willing to just say I got lucky RNG and leave it at that.
To give gpt-oss-120b a fair shake I tried it another 8 times but it never produced a program that behaved correctly.
Anonymous No.106193998 [Report]
>>106193165
>>https://huggingface.co/BeaverAI/models?search=20b
>i'll bite, post ST master export
i guess i wont bite then
Anonymous No.106194037 [Report] >>106194067 >>106194084 >>106194089 >>106194098 >>106194102
Anonymous No.106194067 [Report]
>>106194037
PIAA
Anonymous No.106194084 [Report]
>>106194037
strawberry bros it's fucking over
Anonymous No.106194089 [Report]
>>106194037
Anonymous No.106194098 [Report]
>>106194037
This is correct in America
Anonymous No.106194102 [Report] >>106195466
>>106194037
To anyone wondering where the fuck it got -.21 from, I'm not entirely sure but I have a clue.
5.11 - 4.9 = 0.21.
5.11 - 5.9 (i.e. the wrong way around) is actually -0.79, but somewhere in the process it kept the .21 and flipped it to -.21.
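Column by column, assuming it drops the borrow (pure speculation on my part):
5.11 - 5.90
fractional part: 0.11 - 0.90 needs a borrow -> 1.11 - 0.90 = 0.21
integer part with the borrow: 5 - 5 - 1 = -1 -> -1 + 0.21 = -0.79 (correct)
integer part without the borrow: 5 - 5 = 0 -> 0.21, then it remembers the answer should be negative -> -.21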
Anonymous No.106194114 [Report] >>106194128
Has an LLM ever recommended a commercial service to you? Qwen3 235B shilled Bright Data to me (after I specifically asked for SaaS) for scraping, integrated right into my scraping codebase. I'd imagine companies would pay AI companies to train on their ads, much like they pay for Google search ads.
Anonymous No.106194128 [Report] >>106194273 >>106194348
>>106194114
check out drummers' r-rrrr royfield i dont know the name finetune haha
Anonymous No.106194132 [Report] >>106194164 >>106195667
>>106193979
{%- if content.strip() -%}
{{ '\n' + content.strip() }}
{%- endif -%}

The content shouldn't be on the same line as <think></think>. Also do you have [gMASK]<sop> at the start of the chat? Not that I think any of this will solve your problem.
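For clarity, a rendered assistant turn should come out looking roughly like this (going by the template above; treat the exact spacing as my guess):

[gMASK]<sop><|user|>
write something
<|assistant|>
<think></think>
The reply content starts here on its own line.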
Anonymous No.106194147 [Report] >>106194174 >>106194178 >>106194939
https://xcancel.com/alibaba_qwen/status/1953760230141309354
>Qwen3-30B-A3B-2507 and Qwen3-235B-A22B-2507 now support ultra-long context—up to 1 million tokens!
>something about Dual Chunk Attention
Anonymous No.106194150 [Report] >>106194178 >>106194190
What was the big thing that wasn't a model?
Anonymous No.106194164 [Report]
>>106194132
i do, i'll fix the <think></think> and try regenning
>Not that I think any of this will solve your problem.
probably right
thanks for pointing that out, anon
Anonymous No.106194174 [Report]
>>106194147
Already talked about
Anonymous No.106194178 [Report]
>>106194150
>>106194147
Anonymous No.106194190 [Report]
>>106194150
wasn't it free Qwen coder requests?
Anonymous No.106194248 [Report] >>106194288 >>106194305 >>106194319 >>106195602
Anonymous No.106194273 [Report] >>106194287
>>106194128
>i dont know the name finetune haha
>haha
KYS DRUMMER
Anonymous No.106194287 [Report]
>>106194273
not rocinante negro
Anonymous No.106194288 [Report] >>106194305 >>106194319 >>106195602
>>106194248
Anonymous No.106194305 [Report] >>106194330
>>106194248
>>106194288
weird
Anonymous No.106194319 [Report]
>>106194248
>>106194288
So tiresome
Anonymous No.106194330 [Report]
>>106194305
yeah obsessing over shit like that is definitely weird.
Anonymous No.106194348 [Report] >>106194377
>>106194128
You probably mean Rivermind™
https://huggingface.co/TheDrummer/Rivermind-12B-v1
Anonymous No.106194377 [Report]
>>106194348
yeaaa that oneee...
Anonymous No.106194480 [Report]
>>106193165
>I just wanted to share it with you all
Hi Drummer, all here... We didn't want you to share it with us. Go away you temu undi.
Anonymous No.106194539 [Report] >>106194582 >>106194641 >>106194743
Not a single glm base (non-air) goof on hf. Is that because the base is shit, as in pic related?
Anonymous No.106194558 [Report] >>106194578
Just got an ebay MI50 32gb, but when I load a model bigger than 16gb in llama.cpp it spills over to RAM.
Did the chinks bamboozle me with a 16gb card or am I retarded?
Running rocm-smi --showmeminfo vram gives me
GPU[0] : VRAM Total Memory (B): 34342961152
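which, if my math is right, is 34342961152 / 1024^3 ≈ 31.98 GiB, so the driver at least reports the full 32GB.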
Anonymous No.106194578 [Report] >>106194621
>>106194558
Does that also happen with vulkan?
Anonymous No.106194582 [Report] >>106194615 >>106194641
>>106194539
Yes, the base is fake, just like Qwen bases. Their 32b base had somewhat acceptable distribution, but the big one doesn't.
Anonymous No.106194597 [Report]
Local is dead again... Sama please save us
Anonymous No.106194615 [Report] >>106194627
>>106194582
I can't believe I got an informative answer ITT.
Anonymous No.106194621 [Report]
>>106194578
I'm using vulkan because I haven't managed to get rocm working with llama.cpp yet.
Anonymous No.106194627 [Report]
>>106194615
That and quanters just generally dgaf about base models.
Anonymous No.106194641 [Report] >>106194686
>>106194539
>>106194582
Companies forgot what base even means. And only actual weirdos and freaks care about foundational models anymore. Air has 360 downloads, big GLM has 163 (meanwhile instruct tunes have 75x and 104x of that). The world will die under a pile of instruct garbage.
Anonymous No.106194663 [Report]
Hmm. I just tried >>106193409 and did a swipe on a chat where the model hard repeats the last message verbatim. And it worked. It didn't repeat. I'll do some more testing, but this is promising. I guess I'll need to use the jinja playground a bit more deeply in the future rather than assume a single chat turn with the default example reveals everything about the templating logic.
Anonymous No.106194686 [Report]
>>106194641
People forgot that the original AIDungeon was GPT-2 finetuned and bootstrapped on CYOA data
Anonymous No.106194732 [Report] >>106194749
Anonymous No.106194743 [Report]
>>106194539
>glm base (non-air) goof
Use case? No one doing creative writing, or whatever people use base models for locally, has the rig to run it.
Anonymous No.106194749 [Report]
>>106194732
stop posting shit on reddit that was posted here before it was posted on reddit
Anonymous No.106194795 [Report] >>106194809 >>106194818 >>106194823
Which model can give me an oiled footjob
Anonymous No.106194809 [Report]
>>106194795
JEPA
Anonymous No.106194818 [Report] >>106194843
>>106194795
a lot of the ones on instagram but you have to pay them like $50k and fly them out
Anonymous No.106194823 [Report]
>>106194795
OSS-20B
Anonymous No.106194843 [Report]
>>106194818
lol fuck that I can build a GPUMAXXED rig with that kind of money
Anonymous No.106194898 [Report] >>106195526
V340 anon where are you..
Anonymous No.106194919 [Report] >>106194953
So is LeCunny going to do anything, ever? It looks like Genie 3 has leapfrogged his jepa bullshit, and Zuckerberg is all-in on LLMs. What is next for him? He seems wasted where he is.
Anonymous No.106194939 [Report]
>>106194147
Gguf status?
Anonymous No.106194953 [Report] >>106194960
>>106194919
Ask him on twitter. Come back with a screenshot of his reply.
Anonymous No.106194960 [Report]
>>106194953
It's X
Anonymous No.106195004 [Report] >>106195042
https://desuarchive.org/g/thread/105750356/#105755753
>Hi all, Drummer here...
>I'm not sure, but I can't keep on lurking if it's turning /lmg/ into shit for everyone else. I'll stay away for now. Peace!
He didn't even last a week
Anonymous No.106195042 [Report]
>>106195004
Neither did the spammer. It's all copycats now.
Anonymous No.106195162 [Report] >>106195526
>>106189515
>--AMD Zen 6's 16 memory channels require full platform overhaul, favoring patient adopters:
huh, 16 channels of ddr5, and with the higher jedec speeds we have now, what speed could we expect it to run at?
funny enough i'm mostly worried about simulations, but if i could use it for llms, even better
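napkin math, assuming 16 full 64-bit channels at jedec ddr5-6400: 6400 MT/s x 8 bytes = 51.2 GB/s per channel, x16 ≈ 819 GB/s aggregate (the channel width is my assumption, since ddr5 technically splits each dimm into 2x32-bit subchannels)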
Anonymous No.106195241 [Report] >>106195251 >>106195536
Guys, is deepseek okay?
Anonymous No.106195251 [Report]
>>106195241
He's dead, Jim.
Anonymous No.106195437 [Report] >>106195445
>>106193534
>Compare it to llama 3.3 70b
why are there still retards in this day and age comparing anything to the dogshit llama models??
Anonymous No.106195445 [Report] >>106195608
>>106195437
the only other model they know is R1 and they can't run it on their machine
Anonymous No.106195466 [Report] >>106196060
>>106194102
Well now I'm wondering where the fuck you got 4.9 from
Anonymous No.106195526 [Report] >>106195553
>>106195162
It's not coming to common desktop platforms, Threadripper will be almost a year late, and Epyc will be expensive. Until Intel competes, that's how it's going to be in CPU land.
>>106194898
I'm somewhat impressed and horrified at what he is doing. I wouldn't really even consider GCN for anything ML due to lack of any datatypes outside of FP16 and any matmul ISA instructions. He's gambling, that's for sure.
A better card if I was building now that is underrated that would be worth it more if SR-IOV drivers were available is the V620. Someone figured out the arcane Linux boot command line you need to boot your system with it (in case anyone runs into this, it's GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=realloc pci=noaer pcie_aspm=off iommu=pt") and if running headless, it's probably the best 32GB bang per buck card out there, having at least BF16 support and enough dot product instructions for FP16 multiplies to FP32 which is better than what the V100 has and parity with its MMA instructions, only losing with memory bandwidth. The main issue really as always is software, where ROCM for RDNA2 is not better than what is in CUDA right now and Linux only.
Anonymous No.106195536 [Report] >>106195554 >>106195595 >>106195620
>>106195241
They are underperforming. There is now pressure to be #1 on lmarena from Xi himself. R2/V4 MUST be a huge jump. So far the improvements are incremental at best.
Anonymous No.106195553 [Report]
>>106195526
>I'm somewhat impressed and horrified at what he is doing. I wouldn't really even consider GCN for anything ML due to lack of any datatypes outside of FP16 and any matmul ISA instructions. He's gambling, that's for sure.
>A better card if I was building now that is underrated that would be worth it more if SR-IOV drivers were available is the V620. Someone figured out the arcane Linux boot command line you need to boot your system with it (in case anyone runs into this, it's GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=realloc pci=noaer pcie_aspm=off iommu=pt") and if running headless, it's probably the best 32GB bang per buck card out there, having at least BF16 support and enough dot product instructions for FP16 multiplies to FP32 which is better than what the V100 has and parity with its MMA instructions, only losing with memory bandwidth. The main issue really as always is software, where ROCM for RDNA2 is not better than what is in CUDA right now and Linux only.
how do i get educated enough to understand every word in this post?
Anonymous No.106195554 [Report]
>>106195536
I get the impression that most of the pressure is coming from Liang Wenfeng's high standards rather than from the party.
Anonymous No.106195578 [Report]
>>106191538
>Recap bot mentioned dipsy
Ah, didn't notice. Nice.
> Original Character Do Not Steal
Not mine, actually. Do Not Care.
> AGP avatars from his HRT bathtub
lol
Anonymous No.106195595 [Report]
>>106195536
>So far the improvements are incremental at best
That's true of all LLMs though. I follow API model progress as much as local, and nothing has made me give a shit except when Gemini 2.5 Pro released, mainly because it's the first decent model with huge context. Even then, the more I use it the more the magic fades: it has strong "opinions" about code that I don't share and still makes tons of mistakes.
Anonymous No.106195602 [Report]
>>106194248
>>106194288
>GPT now calls you out on your rhetorical tricks
Neat
Anonymous No.106195608 [Report]
>>106195445
>235b is a different class. Most people can't obtain the compute to run 235b. But most people can obtain the compute to run 120b, if they want to and try to.
>consumer hardware like 4x3090. (Can be built for $5,000)
ye
Anonymous No.106195617 [Report] >>106195635 >>106195648 >>106195667
Actually GLM-4.5 Air kinda sucks: it's slop, it's too censored in its thinking, and if you force it not to think it's superslop.
Anonymous No.106195620 [Report]
>>106195536
All I need is for their next release to match current Gemini while still being as pliant as the current R1.
Anonymous No.106195626 [Report]
man is not a learning animal, or else man would know by now to avoid something called glm
Anonymous No.106195627 [Report] >>106195641
Anonymous No.106195635 [Report] >>106195667
>>106195617
I was about to say. It's actually pretty censored huh?
Sure, you can get around it, but it really likes to talk about ethics and stuff.
Anonymous No.106195641 [Report]
>>106195627
all these are too big and non american too run and 0528 is not years sir
Anonymous No.106195648 [Report]
>>106195617
You can edit its thinking and then continue. But it is kind of annoying, true.
Anonymous No.106195667 [Report]
>>106195617
>>106195635
https://files.catbox.moe/gjw3c3.json
see if it's still cucked with that
i've been rping on mikupad for hours (currently at 3224 tokens)
>your rig is that SHIT?
no im just multitasking and forcing myself to keep on roleplaying to see if it breaks after 4k context, i sometimes forget i have mikupad open :(
works very well with >>106194132
so far it hasn't stopped thinking ONCE
Anonymous No.106195704 [Report]
>>106195686
>>106195686
>>106195686
Anonymous No.106196060 [Report]
>>106195466
It's there to derive the .21: 4.9 is simply 5.9 minus 1. I was pointing out that the model "wrapped" the positive .21 over to negative .21 when going over the column.