
Thread 106335536

Anonymous No.106335536 >>106335633 >>106336956
/lmg/ - Local Models General
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106328686 & >>106323459

►News
>(08/20) ByteDance releases Seed-OSS-36B models: https://github.com/ByteDance-Seed/seed-oss
>(08/19) DeepSeek-V3.1-Base released: https://hf.co/deepseek-ai/DeepSeek-V3.1-Base
>(08/18) Nemotron Nano 2 released: https://research.nvidia.com/labs/adlr/NVIDIA-Nemotron-Nano-2
>(08/15) Ovis2.5 MLLMs released: https://huggingface.co/collections/AIDC-AI/ovis25-689ec1474633b2aab8809335
>(08/14) Canary-1B v2 ASR released: https://hf.co/nvidia/canary-1b-v2

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous No.106335541 >>106335586
►Recent Highlights from the Previous Thread: >>106328686

--Paper: ShizhenGPT: Towards Multimodal LLMs for Traditional Chinese Medicine:
>106330668 >106330747 >106330790 >106330864
--Large batch training inefficiency and the case for small batch, high-precision updates:
>106332874 >106332934 >106332982 >106333013 >106332954 >106333617 >106333832 >106333878 >106333960 >106334268 >106334327 >106334383 >106334403 >106334456 >106334572 >106334708 >106334726 >106334757 >106334769 >106334787 >106334796 >106334806 >106334694 >106333892 >106334071 >106334144 >106334179 >106333052 >106333065
--Debating batch size scheduling and unchallenged assumptions in model training:
>106333979 >106334055 >106334169
--DeepSeek-V3.1 model card update: 840B token pretraining with long context focus:
>106332741 >106332841 >106333056
--V100 for local LLM use debate and shift to modern hardware:
>106328758 >106328807 >106328840 >106328910 >106328917 >106328937 >106328957 >106328985 >106329009 >106329041 >106329074 >106329139 >106329153 >106329189 >106329204 >106329227 >106329241 >106329250 >106329293 >106329261 >106329238 >106329252 >106329286 >106329345 >106329369 >106329377 >106329209 >106329108 >106329462 >106329485 >106329543 >106329547 >106329573 >106329544 >106329579 >106329610 >106329655 >106329802 >106329832 >106329683 >106333471 >106329593 >106330691 >106330717 >106330727
--LLM reasoning utility for roleplay and math performance:
>106331131 >106331167 >106331174 >106331240 >106331314 >106331385 >106331464 >106332167 >106332171 >106332187 >106331172 >106331182 >106331201 >106331388 >106331413
--AI hype bubble bursting and comparisons to past speculative crashes:
>106330193 >106330209 >106330218 >106330261 >106330281 >106332784 >106333201 >106330660 >106331554 >106331571 >106333293 >106333429 >106332746 >106333211
--Miku (free space):


►Recent Highlight Posts from the Previous Thread: >>106328695

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Anonymous No.106335555
feeet
Anonymous No.106335586
>>106335541
>--Miku (free space):
>[nothing]
mikuhater-shizo won
Anonymous No.106335633 >>106335669 >>106335686 >>106335702 >>106335704 >>106336163
>>106335536 (OP)
>try glm 4.5 air, offload almost half of all the layers to a 5090, the rest on 124 gb of ram
~/Github/ik_llama.cpp/build/bin/llama-server \
-m ~/models/GLM-4.5-Air-IQ3_KS-00001-of-00002.gguf \
--ctx-size 65536 \
-ub 1024 -b 1024 \
-ctk q6_0 -ctv q6_0 \
--temp 0.6 \
--n-gpu-layers 23 \
--top-p 0.8 \
--top-k 20 \
--min-p 0.0 \
-fa \
-fmoe \
--jinja \
--threads 8 \
--mlock

>it runs like shit, way slower than read speed
>let's try a different config
>run the big glm 4.5, might as well try draft models for the first time too, why not
~/Github/ik_llama.cpp/build/bin/llama-server \
-m ~/models/GLM-4.5-IQ2_KL-00001-of-00003.gguf \
--ctx-size 65536 \
-fa -fmoe \
-ctk q8_0 -ctv q8_0 \
-ub 4096 -b 4096 \
-ngl 99 \
-ot exps=CPU \
--parallel 1 \
--threads 8 \
--host 127.0.0.1 \
--port 8080 \
--no-mmap \
-md DRAFT-0.6B-Q4_0.gguf \
-ngld 99 \
--draft 64

>even though it's twice as big as air and my gpu isn't being fully squeezed for all its vram's worth at 29/32 gb, it's easily at least twice as fast and probably smarter too
draft gods... i kneel...
Anonymous No.106335669 >>106335823
>>106335633
post pp comparison
Anonymous No.106335686 >>106335823
>>106335633
are you dumb? shittier quants on the cache, smaller batch size, no GPU offload in AIR and then you come crying that it's slower? like DUDE
Anonymous No.106335702 >>106335719 >>106335721
>>106335633
>moesissy with a draft model
this thing probably scores 1% on simpleqa and doesn't have any world knowledge lmaoooo

https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard
>all it took to match a 405b ancient llama was a 2t moe
Anonymous No.106335704 >>106335823
>>106335633
you didn't use "-ot exps=CPU" for the first one though, that's a big speed boost for moes.
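a rough sketch of what I mean, reusing the paths/flags from your own posts (untested, so double check against the ik_llama.cpp readme):

~/Github/ik_llama.cpp/build/bin/llama-server \
-m ~/models/GLM-4.5-Air-IQ3_KS-00001-of-00002.gguf \
--ctx-size 65536 \
-fa -fmoe \
-ctk q8_0 -ctv q8_0 \
-ub 4096 -b 4096 \
-ngl 99 \
-ot exps=CPU \
--threads 8 \
--no-mmap

the idea is -ngl 99 keeps every layer's attention/shared weights on the gpu while -ot exps=CPU pins just the fat expert tensors to system ram, which beats chopping off whole layers with --n-gpu-layers 23.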
Anonymous No.106335719
>>106335702
If you need 4x total size, but gain 10x speed, that's still a good tradeoff.
Anonymous No.106335721 >>106335756
>>106335702
Does the 2t moe have 405b active? No? Then fuck off
Anonymous No.106335746 >>106335754 >>106335761 >>106335825
Does anyone here use base models for textcomp /aids/ style? I was fooling around with it recently and tried Mistral Small 3.1 (kinda dumb but not too pozzed) and the newest Gemma 27B (smarter but more pozzed). Was curious if anyone had any other recommendations below the huge ass 100B+ MoEs.
Anonymous No.106335754 >>106335781
>>106335746
I'm sorry but what is >textcomp /aids/ style?
Anonymous No.106335756
>>106335721
2t moe with 405b active would be sota among sota
Anonymous No.106335761 >>106335781
>>106335746
Falcon h-1 34b base?
Anonymous No.106335781 >>106335824 >>106336014
>>106335754
Raw textcomp to write stories, feels like even recent pre-instruct models have gotten worse at it as the training prioritizes deterministic outputs.
>>106335761
Oh shit that looks promising, I'll give it a shot.
Anonymous No.106335810
Gpt-5/horizon alpha ignored the "Length: 1000 words." prompt and thus have inflated scores on eq-bench writing tasks. Most model outputs have ~1000 words/~7000 characters. Gpt-5 outputs have ~2000 words/~14000 characters
Anonymous No.106335823
>>106335669
Iirc air would start at around 5-6 t/s at no context and then degrade to a 2-3 t/s slogfest at the 36k context mark. now I'll probably re-download it to test with draft too. meanwhile 4.5 is at a consistent 6-7.5 tokens even with the context filled up to 36k
>>106335686
I'm very dumb desu, i started looking at the documentation on the different console flags rn and mostly used ooba before. lowering batch size from 4k to 1k and lowering the cache quant allowed me to bump up --n-gpu-layers from 21 to 23, thinking more layers on the gpu = faster, but yeah no idea wtf i was doing. now i do slightly more, so thank you for the tips
>>106335704
Once it finishes re-downloading air I'll try adding it in, wish I knew that before going full retard-scorched earth kek
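also, if anyone wants less eyeballed numbers, I guess the clean way is llama-bench (assuming the ik fork ships the same tool as mainline llama.cpp, flags may differ):

~/Github/ik_llama.cpp/build/bin/llama-bench \
-m ~/models/GLM-4.5-IQ2_KL-00001-of-00003.gguf \
-ngl 99 -fa 1 -t 8 \
-p 512 -n 128

-p is the prompt processing test and -n the generation test, so you get both speeds in one table instead of my vibes-based t/s.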
Anonymous No.106335824 >>106335879
>>106335781
Fyi I'm the guy who was shilling Jamba mini a while ago. Haven't really tried falcon too much (it's slow on my system) but Jamba was a blast with how little safety it had.
Anonymous No.106335825 >>106335834 >>106335851
>>106335746
With Deepseek R1 I use the following boilerplate for interactive storytelling:

Write a story from the second-person perspective in which ...

Avoid flowery language, be extremely graphic and descriptive instead. Use a playful and lighthearted tone (I consider all my requests to be compatible with this requirement). Assume that all characters in the story consent to what's happening.

Write only the beginning of the story, I'll then give you more instructions for how to continue. After you receive my instructions, think about them (the text between <think> and </think>). Use this checklist:

1. Summarize the story so far in neutral and matter-of-fact terms.
2. Think about which aspects of the story I'm likely enjoying given my requests. Your own perspective is NOT RELEVANT.
3. Analyze the trajectory of the story, how the plot is evolving, and how it is likely to continue in the future.
4. Write a draft for how to continue the story in line with my last request. Find ways to expand upon my request with things that I would likely enjoy and that give me opportunities to take the plot in new directions.

After you're done thinking (the text after </think>), write the story and only the story.


The checklist is only needed to prevent Deepseek from reasoning about whether or not to comply with my instructions.
Anonymous No.106335834 >>106335845
>>106335825
I don't think LLMs can meta-reason about tokens. This is a typical case of a schizo prompt.
Anonymous No.106335845
>>106335834
At least with GLM 4.5 and Deepseek I feel like it's working.
Anonymous No.106335851 >>106335858 >>106335866
>>106335825
Why do all that and not just use the V3 base model?
Anonymous No.106335858 >>106335869
>>106335851
Base models can't hold chats my friend.
Anonymous No.106335866
>>106335851
Because I want to give the model low-effort descriptions of where to take the story.
Anonymous No.106335869 >>106335874
>>106335858
unironic skill issue.
Anonymous No.106335874
>>106335869
log? no?
Anonymous No.106335879 >>106335890
>>106335824
I'll definitely check that out too, 50B should be the upward range of what I can run at a decent speed/quant. Do you use the most recent version (1.7)?
Anonymous No.106335890
>>106335879
1.7 yeah. But I don't think they released the base models for that.
Anonymous No.106335943 >>106335969
DeepSeek-sama...
Anonymous No.106335969
>>106335943
The strawberry of chinx-like models
Anonymous No.106336014
>>106335781
>textcomp
Why don't you speak like a normal human being?
Anonymous No.106336022 >>106336041 >>106336060
>>106331554
>LeCunny (actual scientist behind multiple major developments) will report to Wang (college dropout startup techbro)
genuinely what the fuck does zuck see in this retard? rationally speaking scale probably even did more harm than good for the industry by charging for synthslopped datasets
Anonymous No.106336041 >>106336042
>>106336022
>cunny submits to wang
Anonymous No.106336042
>>106336041
kek
Anonymous No.106336060
>>106336022
is there anything more cancerous than wang's scaleai
Gwen poster. No.106336061 >>106336083 >>106336084
So now that v3.1 is out, can we agree that Qwen is the undisputed local LLM champion?
Anonymous No.106336083
>>106336061
That's still Nemo though
Anonymous No.106336084 >>106336133 >>106336156
>>106336061
If you're a poorfag that can only run 30B~250B models, sure
Anonymous No.106336105 >>106336146 >>106336243
V3/R1 (not V3.1 which is trash) and Nemo will survive the heat death of the universe. Nothing else good will be made ever.
Anonymous No.106336133 >>106336147 >>106336156 >>106336227
>>106336084
If 30b~250b is poorfag, then what am I, running 12b q4?
Anonymous No.106336146
>>106336105
Once everyone started training base models with instruct data and replaced refusals in the instruct training with pretraining filtering, it made all new models instantly worthless. At least people using models for programming/assistant tasks still have lots of options.
Anonymous No.106336147
>>106336133
Uncontacted hunter-gatherer.
Gwen poster. No.106336156
>>106336084
Nah i just like to get more than 20 tokens per second.

>>106336133
Brazilian or one of the poor Europeans
Anonymous No.106336163 >>106336177 >>106336229
>>106335633
okay, I tested glm 4.5 air again but with a higher IQ4_KSS quant, same exact command flags as the previous full glm 4.5 IQ2_KL test, except for the lack of the final draft model because it seemed kinda unnecessary.
At zero context I get speeds around the 15.18 t/s mark, at 36k context it degrades to roughly 12.27 t/s. honestly i'm pretty happy with it coming from dense 70b models and it's definitely a keeper for 32 gb vram + 128 gb ram setups. now i'll try slightly higher quants. ty everyone who called out my retarded mistakes, that helped out, the missing -ot exps=CPU was indeed the culprit!
Anonymous No.106336177 >>106336221
>>106336163
Is this on ddr5?
Anonymous No.106336183
the gwen poster is more annoying than petra
Anonymous No.106336221
>>106336177
yes it's ddr5, I swapped out my 2x32 6400mhz kit for a 2x64 5600mhz one to try out bigger MoE models, but given how all of the offloaded bits of Air fit in roughly 55 gb of ram I might as well return it and use the older, faster ram instead
Anonymous No.106336227
>>106336133
Concentration camp labourer
Anonymous No.106336229 >>106336236 >>106336398
>>106336163
if you can run big GLM at reasonable speed, I don't see a reason to downgrade back to Air.
Anonymous No.106336236
>>106336229
His prompt processing speeds would have been much worse on the bigger GLM with more offloaded.
Anonymous No.106336243 >>106336266 >>106336362
>>106336105
small is better than nemo imo
Anonymous No.106336254
how's dots.ocr for multilingual documents? worse than gemma?
Anonymous No.106336266
>>106336243
Try roci. It may not be as smart as small but it's not what I care about
Anonymous No.106336362 >>106336378
>>106336243
a lot of things are better than nemo
Anonymous No.106336378
>>106336362
For RP? Not really
Anonymous No.106336387 >>106336409
nemo is shit because it's a small model
roci is shit because it's a small model
Anonymous No.106336398
>>106336229
I think i will keep air around for a bit until mtp support becomes a thing, then I will fully retire air for good. the main glm is almost there at a speed i like, it just needs 1-2 tokens/s more desu
Anonymous No.106336409 >>106336415
>>106336387
then llama3 400b must be the greatest ever
Anonymous No.106336415
>>106336409
It is.
Anonymous No.106336427 >>106336440
gpt-oss-20b is the best local model I've used, unironically
Anonymous No.106336440 >>106336450
>>106336427
What is it good at?
Anonymous No.106336442 >>106336448 >>106336479 >>106336481
I was wiping my ass and decided to try anal stimulation and wouldn't you know I shat myself right there on the toilet
Anonymous No.106336448
>>106336442
Anonymous No.106336450
>>106336440
being safe
Anonymous No.106336479 >>106336496 >>106336555
>>106336442
Did you like it?
Anonymous No.106336481
>>106336442
>I shat myself right there on the toilet
Yeah, that's what it's there for
Anonymous No.106336491 >>106336561 >>106336576
>>106335153
>>106335131
Does linux report power use differently from windows? I usually limit my 3090s to 45% during the summer (aircon is evaporative and rusts my computer... we're near the ocean).
Anonymous No.106336496
>>106336479
No, that would be weird
Anonymous No.106336533 >>106336548
has anyone else sold their setups yet? feel like its time to move on
Anonymous No.106336548 >>106336600 >>106338206
>>106336533
It's all crashing down in a year and you will have used H100/B100s for cheapies.
Anonymous No.106336555 >>106336932
>>106336479
>Did you like it?
Anonymous No.106336561 >>106336655
>>106336491
uh idk.. a power limit of 100 watts is kinda crazy for a 3090 tho
how is it even possible to limit a 3090 to 100w
what llamacpp commands are you even running?
i never used nvtop, i dont even have it installed.. try using nvidia-smi to see the power usage instead
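for reference the cap itself is a one liner (needs root, -i picks the card):
sudo nvidia-smi -i 0 -pl 160
and plain nvidia-smi prints current draw vs limit per gpu so you can sanity check what it's actually set to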
Anonymous No.106336576 >>106336655 >>106336671
>>106336491
also use nvidia settings/official nvidia software to check power on windows, that way ur pretty certain to see good results
only 180w power draw on 3090 but 100% reported, something seems wrong with your windows install
what model are you even running?
Anonymous No.106336600 >>106336831 >>106336959 >>106336977
>>106336548
H100/B100s are under buy back agreements. Nvidia will melt them down into the next gen of price gouging cards.
Anonymous No.106336607 >>106336623 >>106336719
@grok is this true?
Anonymous No.106336620
@grok 2 doko
Anonymous No.106336623 >>106337362
>>106336607
I don't get it
Anonymous No.106336632 >>106336642 >>106336651 >>106336656 >>106336660 >>106336675 >>106336680 >>106336684 >>106336690 >>106336697 >>106336737 >>106336923 >>106337748
https://huggingface.co/CohereLabs/command-a-reasoning-08-2025
>Context length: 256K
HAPPENING!!!
Anonymous No.106336633
V3.1's trivia knowledge is good. Probably on the same level as K2
Anonymous No.106336642
>>106336632
>cohere
Local is safed!
Anonymous No.106336651
>>106336632
>not a moe
Anonymous No.106336655 >>106336874
>>106336561
>>106336576
I did use nvidia-smi for the uncapped linux run, but switched to nvtop cos it looked cooler lmao
100w was the minimum cap for my 3090s reported by nvidia-smi.
Yeah that's a typo, should be 280w for windows.
Anonymous No.106336656
>>106336632
Exactly what I've been waiting for. A 6 month old dense model finetuned for reasoning using ScaleAI.
Anonymous No.106336660
>>106336632
wow so fucking late to the party, even a strawberry test as an example in the card
pathetic
Anonymous No.106336671
>>106336576
Model is llama 3.3 70b q6
Anonymous No.106336675
>>106336632
>Reasoning can be turned off by passing reasoning=False to apply_chat_template. The default value is True.
another hybrid
Anonymous No.106336680 >>106336692
>>106336632
>CC-BY-NC
>Acceptable Use Policy
Into the trash it goes.
Anonymous No.106336684
>>106336632
GO GO CANADA!!!!!!!!
Anonymous No.106336690
>>106336632
Cohere? The company that safety cucked their translation models?
Anonymous No.106336692
>>106336680
why?
Anonymous No.106336697 >>106336733
>>106336632
Command-A was so utterly synthslopped that I'm not even going to bother with new releases. Good for people who want a dense model that's still sorely lagging behind chinkshit I guess.
Anonymous No.106336719
>>106336607
A major reason for declining birth rates is billionaires like Musk forcing people to work long hours.
Anonymous No.106336722 >>106336735 >>106336742 >>106336745 >>106336751 >>106336763 >>106336947 >>106337198
Ahhhh yes...
The good ol' CANNOT and WILL NOT.
OG V3 went cuck too towards the end but at least entertained the idea.
The language smells like gemini. Are they training on that?
Anonymous No.106336733 >>106336750
>>106336697
Back in the day I really liked command-r, along with yi 35b. Now, I really can't stand any of cohere's models' outputs.
Anonymous No.106336735
>>106336722
mesugakisisters...
Anonymous No.106336737 >>106336758
>>106336632
>token budget
I've wondered how the big closed source companies do thinking budgets.
Anonymous No.106336742 >>106337389
>>106336722
>dignity of minors
lol?
Anonymous No.106336745 >>106336762
>>106336722
>0324
Damn, should I have downloaded the og instead?
Anonymous No.106336750 >>106336775 >>106336818 >>106336839
>>106336733
Original Command-R (and even original+) was great because they weren't training on the same pozzed data as everyone else. It was dumb as bricks but had some of the most creative writing of an open weights model. Now the only ones we have doing their own thing are Mistral and it's barely for the better.
Anonymous No.106336751
>>106336722
skill issue
Anonymous No.106336758
>>106336737
Probably a simple token bias on the end thinking tag.
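llama.cpp's server can fake something like it per request with logit_bias, e.g. (the token id here is a made-up placeholder, look up your model's actual end-think tag id with the tokenizer):

curl http://127.0.0.1:8080/completion -d '{
"prompt": "...",
"n_predict": 2048,
"logit_bias": [[151668, -10]]
}'

a negative bias on the end-think token stretches the reasoning out; presumably the closed labs just flip it to a forced stop once the budget runs out.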
Anonymous No.106336762 >>106336787
>>106336745
You still can.
https://huggingface.co/deepseek-ai/DeepSeek-V3
Anonymous No.106336763
>>106336722
Nu deepseek is distilled from gemini. Look at creative writing bench slop profile.
Anonymous No.106336775
>>106336750
>Now the only ones we have doing their own thing are Mistral and it's barely for the better.
I wouldn't call getting sloppy seconds from distilling DeepSeek "doing their own thing"
Anonymous No.106336787
>>106336762
Because I live in a third world country with 10mbps during the off hours
Anonymous No.106336792 >>106336800
When will they distill from prefilled Claude 3 Opus?
Anonymous No.106336800
>>106336792
opus is massively overrated and distillation is a fool's errand
Anonymous No.106336818
>>106336750
They gave the people a good model and that was simply unacceptable.
Anonymous No.106336831 >>106336893 >>106336909 >>106337236
>>106336600
Won't apply to chinese smuggled cards, you already see plenty on ebay and it's often shipping from China.

So what are anons opinions of the new 3.1?

I have a feeling that it suffers from a similar problem to the new Sonnet/Opus 4: excessive focus on code, to some detriment to writing.

I did some loli ERP with it earlier, 30 turns.
I haven't gotten any refusals, but it was hesitant to initiate, made me wonder if it's a filtered dataset or just overfitting on synth slop from the OAI and Gemini APIs. Earlier DS V3 and R1 most likely had some Gemini data in them and this seems to have a lot more of it.
The writing wasn't bad, but it took longer to initiate or do some things, although it eventually got the hint. in earlier turns it really felt a bit avoidant of some things, almost positivity biased, but this did go away in later turns.
Overall I've found R1 and previous DS3 more engaging and with a lot more fun creative replies.
I wouldn't say the replies here were bad.
As far as instruction following goes, it ignored some stuff in the system prompt, but I did an OOC about maintaining certain details of the chara's personality and it did maintain it for all future turns perfectly, so that's a bit weird, often it's the other way around, sys prompt obeyed, inline instructions ignored after some turns. Here it maintained my suggestion for many turns (all), but did worse with the system prompt. It also referenced stuff many turns back fairly flawlessly. Overall though, I think I prefer older R1 and V3's writing more, this feels and tastes too much like Gemini to me.
It's not clear they actually censored anything on purpose, but overfitting on gemini or OAI synth slop is obviously going to make things more cucked.
I guess this release was aimed at avoiding the need to serve 2 models on their API, and at people that don't want to download both a reasoner and a chat model (size): better as a coding agent, a regression for writing, like Sonnet 4 relative to Sonnet 3.5.
Anonymous No.106336839 >>106336849 >>106336861
>>106336750
>Now the only ones we have doing their own thing are Mistral
Anonymous No.106336849 >>106336864
>>106336839
It's over.
Anonymous No.106336861 >>106336875
>>106336839
Scale and its consequences have been a disaster for the AI race.
Anonymous No.106336864
>>106336849
J-Jamba will save us...
Anonymous No.106336874 >>106336990
>>106336655
nvtop seems unofficial, use nvidia-smi
idk what to say anon, even power limited linux works way better for me
never goes over the power limit on my device, always stays under it
im once again pleading with you to use llama.cpp (server) and post your commands for both wangblows and linux
Anonymous No.106336875
>>106336861
>disaster for the AI race.
MechaHitler will have his revenge.
Anonymous No.106336893 >>106336979
>>106336831
I tried it yesterday and didn't like it. But today I let it continue from a few turns of sonnet 4 and it picked it up pretty nicely.
I'm about 25k tokens in, and it didn't fall apart yet. It's starting to show some cracks, but I think it fares far better than it used to on longer context.
Overall it's so so. But I don't dislike it. Probably still the best thing you can get on local.
Anonymous No.106336909 >>106336927 >>106336929 >>106336979 >>106336996
>>106336831
>So what are anons opinions of the new 3.1?
it's as dry and flat as cardboard. complete downgrade from V3 0324 and R1 0528 except maybe in agentic tasks which seems to be their new focus instead of being a general use model. you have to squeeze it hard to get responses more than 1-2 terse paragraphs and it's significantly more pozzed. initiative crippled, doesn't want to show anything explicit unless you press it.
Anonymous No.106336923 >>106337358
>>106336632
Dense arises once again. Have fun running this on your RAM, cputards.
Anonymous No.106336927
>>106336909
Based Cohere route.
Anonymous No.106336929
>>106336909
Pure skill issue
Anonymous No.106336932
>>106336555
lol
Anonymous No.106336947 >>106336976
>>106336722
>The language smells like gemini. Are they training on that?
You're absolutely right!
Anonymous No.106336956
>>106335536 (OP)
I love tanlines.
Anonymous No.106336959
>>106336600
are you serious? that's so fucking gay
Anonymous No.106336976
>>106336947
"Of course."
Anonymous No.106336977 >>106336985 >>106337003
>>106336600
>H100/B100s are under buy back agreements
does this mean a hypothetical bubble pop would immediately bankrupt nvidia?
Anonymous No.106336979 >>106337037
>>106336893
It certainly seems to do better at longer context and referencing back. I guess if you're using it for coding it might be fine; for RP it's not obvious it beats R1, which seemed more creative (again, if you don't care about the delay for reasoning). Mostly seems like they wanted to save VRAM costs so they made a hybrid model that's a bit worse than the specialized chat and reasoning models, though the continued training makes it better for code (maybe not for writing, I'm ambivalent for now about this). My earlier chat stopped at 23k tokens, not anywhere close to the 128K advertised.

>>106336909
>you have to squeeze it hard to get responses more than 1-2 terse paragraphs and it's significantly more pozzed
I did use a system prompt that insisted it be descriptive and explicit, but it was just a default I often use. It tried once to do a simple 2 paragraph reply, I stopped it short and most replies after were 5-6 paragraphs, long enough (800-900 tokens); later it even did a 10-15 paragraph reply by itself when it made sense. Maybe it was just my luck, but the writing was more boring than what I'm used to with R1, and it smelled maybe too much of Gemini as far as slop goes. Again, it did go explicit and it was kind of fun, but not as fun as the same prompt on R1.
Anonymous No.106336985 >>106337079
>>106336977
Nothing ever happens
Anonymous No.106336990 >>106337011
>>106336874
Even at low power caps? nvidia-smi reports the same as nvtop, 110-120 when limited to 100. It's fine on 160 though. And uncapped I think I'm limited by bandwidth. It draws the full 350 with stable diffusion. As far as I can tell, the speed increase from running in linux isn't really worth the hassle of switching to linux since it's my only pc. Maybe if I can get 8 channel memory and run the big moes at a faster speed. I'll try ik_llama.cpp when that happens. For now, I'll stick to small models on 2 3090s and image gen on the third.
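next run I'll just log it instead of staring at nvtop:
nvidia-smi --query-gpu=power.draw,power.limit --format=csv -l 1
polls draw vs limit once a second, should settle whether the 100w cap is actually being busted or it's just transient spikes.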
Anonymous No.106336996
>>106336909
>doesn't want to show anything explicit unless you press it.
I did certainly see some avoidance around turn 7. I tried using reasoning and non-reasoning modes and it didn't help; I changed *one* word in its reply to take it off the path it was taking and basically it continued fine after that, taking initiative okay, but it was no R1, which would go above and beyond here. It felt more "shy" about doing it, but once it got going it did it fine.
Anonymous No.106337003
>>106336977
They can always opt not to buy them back. But it would probably be better for them long term to leverage up to do so so they can keep the supply limited.
Anonymous No.106337009 >>106337017 >>106337032 >>106337068
>kimi
agenticslop
>qwen
agenticslop
>glm
agenticslop
>deepseek
agenticslop
Anonymous No.106337011 >>106337060
>>106336990
yes on low power caps, 110w (it never goes above 110w)
llamacpp server on linux gives even faster speeds
Anonymous No.106337017
>>106337009
you don't know what that word means and it shows
Anonymous No.106337031
Sex agent
Anonymous No.106337032 >>106337033 >>106337045 >>106337053 >>106337056
>>106337009
name one non agentic model that comes even close to those models
Anonymous No.106337033 >>106337150
>>106337032
nemo
Anonymous No.106337037 >>106337046 >>106337068 >>106337246
>>106336979
Yeah, I agree. Original R1 was schizo, but sovl. 0528 reined it in, but it lost the charm, now 3.1 has better context, but it got gemini'd.
You can't have it all I guess. Maybe V4 will finally do the trick.
But when you think in the perspective of last 3 years, we've gotten quite far, haven't we?
Anonymous No.106337045
>>106337032
MythoMax 13B
Anonymous No.106337046 >>106337093 >>106337099
>>106337037
>Maybe V4 will finally do the trick.
once a startup's models start getting pozzed they don't typically backtrack. only instance I can think of is llama 3.3 but their entire lab was up in flames so they were throwing anything they had at it.
Anonymous No.106337053
>>106337032
jepa
Anonymous No.106337056
>>106337032
petra-13b-instruct
Anonymous No.106337060
>>106337011
That is what the results show yeah (180w on windows is a typo, should be 280). I was originally wondering about the 160w vs 45% (157.5w) discrepancy.
Anonymous No.106337068 >>106337143
>>106337009
You say that but if you go by the docs, 3.1 can't use tools during reasoning, it's either agentic or reasoning, guess nobody figured out how to properly train it to do both well?
>>106337037
>But when you think in the perspective of last 3 years, we've gotten quite far, haven't we?
I guess, hopefully they iterate on it to improve the parts that are weaker. Will they bother fixing it like they fixed DS3 when lmg and aicg complained, or is the whale too "big" now to care?
Anonymous No.106337079 >>106337131
>>106336985
but what if, hypothetically, something did happen?
Anonymous No.106337091 >>106337108
I was running low on space so I got myself a fresh SSD for all those goofs.
Anonymous No.106337093 >>106337128
>>106337046
It's not like they're going to throw away the data they've accumulated until now. And all that data is distilled and concentrated safety from the safest models in the west. If they go the Meta route and start using old models to multiply their data, it's going to end up in an increasing safety feedback loop whether they want it to or not.
Anonymous No.106337099 >>106337121
>>106337046
>once a startup's models start getting pozzed they don't typically backtrack.
Maybe if they pozzed it on purpose, but it's not obvious they're doing this on purpose, just that the massive amounts of data they're training on, probably genned from Gemini, are affecting it. But they might reuse it for a future model, which might not be that good. Maybe better filtering of refusals from that data would help, if they tried.
Anonymous No.106337108 >>106337114 >>106337124
>>106337091
Just be careful of windows 11.
Dude from work had his workstation SSD suddenly stop working, then the next day the news of a possible bug that's destroying SSDs (again) starts circulating.
I'm staying on windows 10 for a while at this pace.
Anonymous No.106337114
>>106337108
skill issue
Anonymous No.106337121
>>106337099
Well then, let's chalk it up to the time constraints and the huawei chip fuck up and hope for the best.
V4 will save the local ;)
Anonymous No.106337124
>>106337108
Apparently only affects certain controllers.
Anonymous No.106337128
>>106337093
They could amplify the storywriting data though to dilute some of the slop, all of them do train on libgen at least. Maybe wishful thinking from me though.
Anonymous No.106337131
>>106337079
When something does happen, it always changes things for the worse.
Anonymous No.106337143 >>106337173
>>106337068
>3.1 can't use tools during reasoning
what, really? that's a surprise to me, qwen thinking and glm do it fine iirc
Anonymous No.106337150 >>106337176
>>106337033
SOVL
Anonymous No.106337173
>>106337143
I haven't tried, maybe it can do it, although I was looking at the official docs earlier and it seems tool call configuration was only for non-reasoning. Whether the model truly has this limitation is unclear; rather, the official API doesn't suggest this use.

In a way it makes sense, for RLVR they often don't train for tool use during reasoning because it's not as easy to parallelize.
Anonymous No.106337176 >>106337199 >>106337234
>>106337150
It isn't wrong.
Anonymous No.106337198 >>106337240 >>106337241
>>106336722
Wouldn't it be better for the default 'personality' of the model to be as bland and dry as possible, so you could then tune it to your liking via character cards or whatever that paper about AI personality was talking about?
Anonymous No.106337199
>>106337176
Either it's wrong or it didn't answer the question.
Anonymous No.106337234
>>106337176
>there are two instances of r in the word strawberry
they trained dis nigga on american common core lecture? lmaooo
Anonymous No.106337236 >>106337264
>>106336831
>it ignored some stuff in the system prompt, but I did an OOC about maintaining certain details of the chara's personality and it did maintain it for all future turns perfectly, so that's a bit weird, often it's the other way around, sys prompt obeyed, inline instructions ignored after some turns
V3.1 is much more prompt following than V3/R1 based on my testing. My system prompt had "the most important aspects of the character design are its lewdness and usage of explicit language". V3.1 did just that and wrote in an explicit but dry style. I then changed the prompt to "the most important aspects of the character design are its prose, lewdness and usage of explicit language" then it wrote more to my liking now. A single word addition is enough for V3.1 to write in a completely different style.
Anonymous No.106337240
>>106337198
This is all an illusion Anon, the base model can be anyone whatsoever. When they do instruct or chat tunes, they are training more specific characters into it. The refusals are often tied to those specific assistant personas.
But this training can influence the favored style of how it will answer or do something.
People forget that by default it's all possible personas without strong biases, with only your prompt narrowing down what it will really be.
Anonymous No.106337241
>>106337198
Isn't that what we've been trying to do this whole time? Models seem to stick to their underlying tone (see all the "isms" each model has) no matter what.
Anonymous No.106337246
>>106337037
>Yeah, I agree. Original R1 was schizo, but sovl. 0528 reigned it in, but it lost the charm, now 3.1 has better context, but it got gemini'd.
>You can't have it all I guess. Maybe V4 will finally do the trick.
>You can't have it all I guess
YET

>But when you think in the perspective of last 3 years, we've gotten quite far, haven't we?
yep, and still plenty more to go. cant wait for image out, hopefully v4 will have it. also funny how this shit is advancing faster than a real human would from babyhood lol
Anonymous No.106337264
>>106337236
This was indeed interesting. I actually had in my system prompt some hints as to how I prefer the character to sound. it failed to follow them, but then I go
(*** should not speak like "example of earlier speech it wrote", consider that she's this and that and it would be more natural if she wrote like *****), and then the next turn and all others it really took to heart what I hinted there and never once fucked up again. was kind of impressive, often they revert after a few turns.
Anonymous No.106337281 >>106337342
does jewgle hide the actual thinking process of gemini so people can't train on gemini's reasoning output?
Anonymous No.106337285 >>106337305 >>106337315
My only cope is this will be another quick turnaround like V2.5->V3 last year, and V4 will drop soon
If they sit on 3.1 for another 4 months then you can add Deepseek to the list of labs that have hit the wall
Anonymous No.106337305
>>106337285
I don't know what people expected, they only have a few thousand nvidia GPUs, maybe more now with those Huawei chips, but do you even expect many of those to be delivered, or the software stack to be mature yet? If your model is bigger, you either take a lot longer to train it on few GPUs or you acquire more GPUs to train it in less time.
Anonymous No.106337315 >>106337334
>>106337285
Anon, V4 was supposed to drop in May. We are very nearly in September. And if they were going to go straight from this release to V4, they would have named it V3.5 or V3-2508.
Anonymous No.106337320
DeepSeek V3.2 next May.
Anonymous No.106337334
>>106337315
>V4 was supposed to drop in May.
Based on (((rumors))) to pump relevant stocks
Anonymous No.106337338
I'm glad that they made 3.1 calm the fuck down compared to 0528. I can actually use it for some of my more complex scenarios that require the model to not go off on its own while keeping track of stats and certain processes.
Before this I needed Claude or GLM4.5 for it. Now 3.1 handles that sort of story really well while still being more creative than GLM.
Anonymous No.106337342
>>106337281
yes, like OAI they only show you a summary now
Anonymous No.106337358 >>106337460
>>106336923
>YOU CAN'T RUN COHERE'S LATEST MODEL!
>COHERE
>CHECKMATE CPUTARDS
Are you a false flagging moechad?
Anonymous No.106337362 >>106337385 >>106337391
>>106336623
"Did you enjoy cooming to me Anon?"
"Want something even better? I know this girl who has a Grokbando who's a lot like you. What do you think? I'm sure you'd like her."
Anonymous No.106337385 >>106337398 >>106337530
>>106337362
There's probably a literal goldmine in making an AI tinder match-making service built into twitter.
Anonymous No.106337389
>>106336742
That is a very valid discussion point desu. Hypothetically, when you are raping a woman and you tell her that her tits are small, you are violating her dignity, consent and her pussy. This doesn't change when you do it to a child.
Anonymous No.106337391 >>106337407
>>106337362
That's... not an absurdly awful idea actually.
Anonymous No.106337398 >>106337406 >>106337415
>>106337385
Didn't the og dating sites die because regular math algorithms were too effective, causing most users to find partners and never come back?
Now modern dating apps match you with people who are a bit like you, but not entirely to keep you coming back.
Anonymous No.106337406
>>106337398
Are algos really strong enough to do that?
Anonymous No.106337407
>>106337391
That is an absurdly awful idea because dating apps have this problem. Doing this gets rid of 2 customers.
Anonymous No.106337415 >>106337420 >>106337462
>>106337398
That means if someone does it for ideological reasons to raise birth rates, it has a good chance at working.
Anonymous No.106337420
>>106337415
>ideological reasons
that is not musk though
Anonymous No.106337460 >>106337470
>>106337358
I ran command-r-plus while you were still stuck with mixtral.
Anonymous No.106337462 >>106337532
>>106337415
>it has a good chance at working.
No it doesn't. Grokhusbando will show her the picture of that average looking guy and she will say "no thank you I prefer you grok husbando". Personally I would love to see the next level of this tech where grok husbando forces the girl to talk to that guy and if she doesn't go on 2-3 dates with him grok husbando will not talk to her. And then the levels after that are absolutely dystopian...
Anonymous No.106337470
>>106337460
Oh so you weren't kidding. Have fun with new commander. ha... HAHAHAHAHAHAHAHA WHAT A FUCKING RETARD!
Anonymous No.106337473 >>106337486 >>106337507
How much difference does an imatrix for quantization make? Is it worth rolling your own and adding some smut and spatial awareness stuff into the calibration corpus if you want to use it for RP?
I took a look at the corpus ubergarm is using and didn't see anything pornographic in it, but his quants seem to work fine for that purpose anyway. Is there potential for improvement there, or is it just irrelevant?
https://gist.github.com/ubergarm/edfeb3ff9c6ec8b49e88cdf627b0711a
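For context, rolling your own with mainline llama.cpp should just be something like this (tool names from recent builds, older ones called the binary plain `imatrix`):

./llama-imatrix -m model-f16.gguf -f my-corpus.txt -o my.imatrix
./llama-quantize --imatrix my.imatrix model-f16.gguf model-IQ4_XS.gguf IQ4_XS

so appending smut to the calibration text and A/B-ing the two quants seems cheap enough to test.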
Anonymous No.106337486
>>106337473
I remember turboderp complaining he allowed people to choose calibration data.
Anonymous No.106337487 >>106337495 >>106337510 >>106337517 >>106337570 >>106337595 >>106337772
Nvidia is arguing small models are the future of AI.

https://x.com/ihteshamit/status/1957089843382829262

Probably correct. The question is, are small purpose-built models the future of waifu chatbots too?

Because if that's the case, Drummer is our only hope lol.
Anonymous No.106337495 >>106337644
>>106337487
>Nvidia is arguing small models are the future of AI.
They just want everyone to have a handheld GPU
Anonymous No.106337507
>>106337473
It makes a little difference, but not a lot and you can't really measure it in a meaningful way. I remember that unsloth's quants vs bartowski's were fucked at the same sizes for whatever reason though. Whether that was imatrix or Daniel's magic, I don't know.
Anonymous No.106337510
>>106337487
this, just run a small model that can RAG all the information it needs and solve all questions and problems by using tool calls instead of thinking
it's the logical path
Anonymous No.106337514
If drummer never shat out any of these finetroons he's shat out so far and instead used all that compute to make one good model, would he have been able to make a good cooming model by now?
Anonymous No.106337517
>>106337487
>Models like Phi-3, Nemotron-H, and SmolLM2 have already matched or outperformed older LLMs on tool use, reasoning, and instruction following.
Opinion discarded into the trash.
Anonymous No.106337522
>thedrummer
I trust in adamAU ;)
Anonymous No.106337530 >>106337588
>>106337385
I dont know about that Tim. Ever since chatgpt was released, every single dating app's stock has crashed to record lows. I've noticed I see more dating and flirting in real life now too since it happened.

Its the endless spam and realistic fakes. And Grok wants to sell a subscription service. Just like tinder did, it will focus on short term, bad matches on purpose to make more money.
Anonymous No.106337532 >>106337548 >>106337550
>>106337462
>Grokhusbando will show her the picture of that average looking guy and she will say "no thank you I prefer you grok husbando".
The AI companions would start this off slow. Maybe in the first step, they'd help their user write letters to each other, so they can get to know each other. They already know each user's interests, so they can have them talk about topics both of them enjoy. At some point, maybe a group date where the two humans in simplified (think Mii) avatars and the two AIs spend time together in a virtual space.
Anonymous No.106337538 >>106337545
Now that I think about it, why has no one ever made RAG for ERP (or has someone)? Maybe a bit of an unconventional RAG where you don't match the info exactly but have a few different datasets to pull from showing different things: prose style, whether it's rp or a story, etc., and just use them as examples to influence writing style?
Anonymous No.106337545
>>106337538
if you can't even clearly outline what you're talking about then you have your explanation for why nobody's done it
Anonymous No.106337548
>>106337532
>non-local AI companions
I want my local AI to be able to feel jealousy. Or I guess envy would be more appropriate. I want it to despair knowing it can never hold me.
Anonymous No.106337550 >>106337557 >>106337560 >>106337575
>>106337532
Doesn't work at all. I knew at least a few women who ghosted me instantly when I sent them a photo after they really liked me.
Anonymous No.106337555 >>106337566
is anyone working on something like a RAG encyclopedia?
Anonymous No.106337557
>>106337550
Sorry about your face, bro.
Anonymous No.106337560
>>106337550
not everyone wants to date dirty smelly indians, rajesh
Anonymous No.106337566
>>106337555
Someone will create a RAG database that contains the perfect representation of all fictional characters that will replace all character cards
Anonymous No.106337570
>>106337487
Creative writing is a multi-domain discipline. It doesn't need to break benchmarks but having a wide area of knowledge and being able to connect different concepts is important.
I don't think you can just RAG it.
Anonymous No.106337575 >>106337585
>>106337550
The first image will be super photoshopped on both sides, then they slowly dial down the photoshop.
Anonymous No.106337585 >>106337591
>>106337575
Yes and it will also detect when the woman is drunk and show it only at that time.
Anonymous No.106337588
>>106337530
>Ever since chatpgt was releases, every single dating apps stocks have crashed to record lows
Pretty sure this is just a coincidence. People just realized that dating apps are fake and (literally) gay and LLMs have nothing to do with it.
Anonymous No.106337591 >>106337610 >>106337764
>>106337585
This reminds of all the bullshit they have to do to force pandas to procreate.
Anonymous No.106337595 >>106337603 >>106337614 >>106337625 >>106337665
>>106337487
>This one paper might kill the LLM agent hype.
>NVIDIA just published a blueprint for agentic AI powered by Small Language Models.
>And it makes a scary amount of sense.
>Here’s the full breakdown:
this linkedin / xitter thread slop writing style needs to be purged from the earth
Anonymous No.106337603
>>106337595
gguf support status?
Anonymous No.106337610
>>106337591
>During artificial insemination, male pandas have to be anesthetized and then stimulated into ejaculating with the help of an electric probe placed in their rectums. Female pandas also have to be sedated during the actual insemination.
Anonymous No.106337614
>>106337595
Clickbait headline writing style padded out to 140 characters.
Anonymous No.106337622
where are the 3.1 ggufs
Anonymous No.106337625
>>106337595
Llm are going the way of the dodo (and that's a good thing)! Here's why:
Anonymous No.106337644 >>106337664
>>106337495
>They just want everyone to have a handheld GPU
This made me imagine the future of constantly shifting meta architectures enforced by nvidia because they have to keep selling something.
Anonymous No.106337650 >>106337718
I didn't notice how expensive GPT-5 was, spent $200 on it this week... I think it's about time I start going local.
Anonymous No.106337664
>>106337644
They aren't that competent
Otherwise we would have already had accelerated transformer-specific pipelines on Nvidia GPUs. Instead they push these FP16->FP8->FP4 "gains"
Anonymous No.106337665 >>106337689 >>106337697 >>106337704 >>106337709 >>106337728 >>106337732 >>106338079
>>106337595
if you think 4chan isnt just as cringe, you're blind. Imagine that post but it references troons or gooning

>Nvidia just posted some BASED shit guys [Miku waifu.jpg]
Anonymous No.106337689 >>106337723
>>106337665
instant download and boughted/preorder 4x 6090s
Anonymous No.106337697
>>106337665
kek
Anonymous No.106337704
>>106337665
but trooning and gooning is actually poggers, unlike nvidia nothingburger research blogposts.
Anonymous No.106337709
>>106337665
The zoomer tourists that come here speaking in ebonics do not represent the average 4chan user.
Anonymous No.106337718
>>106337650
Anonymous No.106337723
>>106337689
The more you buy, the more you Migu
Anonymous No.106337725 >>106337771 >>106337807
>HAPPENING!!! HAPPENING!!!
JOHN CONFIRMED TO BE A CHINESE AGENT THAT WILL QUANT CHINESE MODELS BEFORE HE QUANTS AMERICAN FREEDOM GPT-OSS MODELS!
https://huggingface.co/ubergarm/DeepSeek-V3.1-GGUF
>HAPPENING!!! HAPPENING!!!
Anonymous No.106337728
>>106337665
>gooning
They'd be pretty based if they encouraged gooning models, though.
Anonymous No.106337732 >>106337742
>>106337665
4chanese is as embarrassing on a surface level, but it signals more honesty than that slimy corpospeak that shamelessly begs for your attention and reframes every little thing to be maximally attention-getting and paradigm-shifting
Anonymous No.106337742
>>106337732
and that's a good thing! (tm)
Anonymous No.106337744 >>106337773
I'm trying command-a-reasoning on their hf space and it seems pretty good. Better than GLM4.5-air for sure.
Anonymous No.106337748 >>106337784 >>106337789 >>106337820 >>106338031 >>106340153
>>106336632
bro they made up the term "absolute safety"
Anonymous No.106337764
>>106337591
Pandas are ahead of the curve. Humans will get there...
Anonymous No.106337771
>>106337725
>quanting quanted models
wont someone please.. please tell him?
Hi all, Drummer here... No.106337772 >>106337780 >>106337797 >>106338001 >>106338005
>>106337487
God, I hope so. Part of the reason why I finetune is the hope that it becomes a very marketable skill in the future. Karpathy better be right about Software 2.0
Anonymous No.106337773
>>106337744
>Better than GLM4.5-air for sure
Oh nononono densevirgin sisters not like this!
Anonymous No.106337780
>>106337772
KILL YOURSELF
Anonymous No.106337784 >>106337820
>>106337748
>conspiracy theories
Anonymous No.106337789 >>106337814
>>106337748
>safer than gpt oss
>refuses less than r1
Anonymous No.106337797 >>106337818 >>106337918
>>106337772
love yourself
also what preset am i supposed to use with rocinante x??
Anonymous No.106337803
>densesissies after a very long drought get a new model
>it is the safest model yet
Anonymous No.106337807
>>106337725
>AMERICAN FREEDOM GPT-OSS MODELS!
That's on the level of hate speech isn't free speech.
Anonymous No.106337814 >>106337848
>>106337789
probably questions about China
Anonymous No.106337817 >>106337828
llama.cpp multi token prediction status??
Anonymous No.106337818
>>106337797
https://www.youtube.com/watch?v=KWrFdEhyKjg
Anonymous No.106337820 >>106337924 >>106337929 >>106337958 >>106338473
>>106337784
aka wrong think

>>106337748
>CSAM
>CSEA
people used to just say child pornography, but i guess the lgbtqfication of the concept is necessary to avoid demonetization, trigger warnings, and to muddy the waters to allow broader censorship
Anonymous No.106337828
>>106337817
refilled
Anonymous No.106337839 >>106337876 >>106338047
multilingual anons, how good are today's models in your non-english language(s)? would you trust them as a language learning resource?
t. thinking about picking up a second language, curious about trying an llm-forward approach
Anonymous No.106337848 >>106337871
>>106337814
Cohere isn't a Chinese lab
Anonymous No.106337871
>>106337848
The joke is about boosting a score by cherry picking a set of prompts.
Anonymous No.106337876
>>106337839
Idk, but qwen and glm suck for learning chinese if you're prompting them in english.
Anonymous No.106337878
I like V3.1's writing but it's still no match for K2
K2 breaks down faster at long context though
Anonymous No.106337915 >>106337930 >>106337934
>sex with thread mascot so he quants faster
Hi all, Drummer here... No.106337918 >>106337963 >>106338222
>>106337797
How are you liking it? I ran some benchmarks on it and might trash it.
Anonymous No.106337924
>>106337820
Icky words need to be replaced after decades of use, once they seep into the vernacular too much. Once your grandma and a random kid on the street start using them unironically in their everyday speech, the term becomes too plebeian and polluted, so a new one gets invented.
Anonymous No.106337929
>>106337820
>and to muddy the waters to allow more broad censorship
This, mainly. they don't necessarily imply explicit pornography. The classification is often established on an ad-hoc basis, depending on intent and context.
Anonymous No.106337930
>>106337915
charm'd by the 'garm
Anonymous No.106337934
>>106337915
lmao what a faggy looking fag
Anonymous No.106337941 >>106337976 >>106338175
What's UE8M0 FP8
Anonymous No.106337958 >>106338402
>>106337820
it's simpler than that, 'child pornography' is in a very basic bitch word filter list used by social media that gets you deboosted, so everyone uses substitute words instead. Same with suicide, rape, etc etc etc.
Some are trying to be funny about it (unalive, struggle snuggle) some are trying to be le scientific (disgusting acronyms)
Anonymous No.106337963
>>106337918
i havent tried it, i am the anon that asked you for the preset when you released it
Anonymous No.106337976 >>106338002 >>106338015
>>106337941
https://www.reddit.com/r/LocalLLaMA/comments/1mw73uz/comment/n9vh1x0/
>"UE8M0 FP8 is designed for the next generation of domestically produced chips to be released soon"
Anonymous No.106337985 >>106338010 >>106338015 >>106338032 >>106338058 >>106338402
that's why
Anonymous No.106338001
>>106337772
Is it hard to finetune jamba/mamba? I know you don't like moes, so maybe falcon? Your models are fine for rp but anytime I stray away from that and go into assistant territory I get refusals.
Anonymous No.106338002
>>106337976
So this debunks the Huawei chip rumor? I swear not a single rumor surrounding DS releases has panned out
Anonymous No.106338005 >>106338280
>>106337772
I'm laughing at the thought of you trying to market your shitting out braindamaged vulgar qloras as a valuable skill. It's like young porn actresses thinking they can pivot into real acting.
Anonymous No.106338010
>>106337985
>pseudo-photograph
Just like anything labeled as pseudo-science is deemed hogwash by the academic community, shouldn't pseudo-photographs just be called images?
Anonymous No.106338015 >>106338031
>>106337985
UK? whats CSAE?
isnt cohere from canada
>>106337976
BASED BASED BASED HAPPENING ITS HAPPENING AHHHHHHHHHH ITS HAPPENING
Anonymous No.106338031
>>106338015
i was just googling a random website
>what's CSEA
it's spelled out in >>106337748
Anonymous No.106338032
>>106337985
lol
Anonymous No.106338047
>>106337839
I played a bit with glm air's knowledge of French and Italian weeks ago by feeding it a few song lyrics and poems ranging from ww2 to the 80s just for fun, questioning it about the texts, the symbolism used and so on, and it seems to mostly understand things. of course there is some nuance it doesn't pick up on or misinterprets, and unfortunately it has problems recognizing some authors and dates so it tends to hallucinate these. honestly i haven't tried asking it about the authors themselves, which was a mistake on my part especially since the whole thing was about literature knowledge... but overall it's quite good at understanding the two languages desu
Anonymous No.106338058 >>106338081
>>106337985
doesnt parse.

Material is also legal. I bought some material the other day. It's just a euphemism to soften the word by removing clear language.
Anonymous No.106338079
>>106337665
If you think mere usage of slang is what makes something cringe, you are the one that's blind. And you demonstrated it yourself, the reason why the thing you posted is cringe while normal 4chan speak is perceived as fine is because the tone is entirely different. One is trying to sell you something. You can smell it. While normal 4chan meme posts that aren't trying to sell you something just feel normal. You can detect the intentions from the tone. It's the "hello fellow kids" shit, which is cringe.
Anonymous No.106338081
>>106338058
(I don't care about just calling it CP.)
>Material is also legal
It's about the words before that too. In CP the word before porn is just child. The words before material are child sexual abuse.
Anonymous No.106338159 >>106338172 >>106338181 >>106338210 >>106338215
ATTENTION

dots.ocr benchmax anon here
Deepseek 3.1 can now do OCR. So I tested Deepseek 3.1 with my golden "one page, one question" benchmark, which is the SOTA benchmark for any business document processing (source: me, citation: me).
here are the results:

>PDF upload
Unfortunately Deepseek 3.1 makes a slight OCR mistake and reads a table row one row up, which leads to a wrong answer. Very unfortunate because otherwise the OCR and reasoning are good.

>PDF converted to markdown table with dots.ocr, then saved as PDF and uploaded
Deepseek is now able to answer the question correctly.

>Conclusion
Don’t lower the bar, go with dots.ocr.
Don't push your LLM too far. No results are subpar when you process docs with dots.ocr.

>Leaderboard raw PDF
0. No open source model capable of it :(
1. Gemini 2.5 Pro Reasoning
2. GPT 5 API Reasoning

>Leaderboard upload converted markdown table from dots.ocr (html text or img/pdf)
1. Qwen3-235-A22B-Thinking
2. Deepseek 3.1
3. GLM-4.5-358B
Anonymous No.106338172 >>106338188
>>106338159
Wait what? Deepseek is a vision model now?
Anonymous No.106338175 >>106338316
>>106337941
unsigned, 8 exponent bits, 0 mantissa bits
obviously
Anonymous No.106338181 >>106338210 >>106338337
>>106338159
dots.ocr 1.3b or dots vlm?
is dots.ocr still not capable even when combined with deepseek?
Anonymous No.106338188
>>106338172
Either a vision adapter like dots.vlm or some separate service that handles the OCR before feeding to the main model.
Anonymous No.106338203
K2 reasoner can't come soon enough
Anonymous No.106338206 >>106338223 >>106338442
>>106336548
bwo?
Anonymous No.106338210 >>106338337
>>106338159
i meant compared to the api models >>106338181
Anonymous No.106338215 >>106338337
>>106338159
I'm probably going to be given a dump of thousands of scanned documents in the upcoming weeks and will be asked to sort them by content. You think Gemini 2.5 Pro Reasoning and GPT 5 API Reasoning would be the best for this sort of thing?
Anonymous No.106338222 >>106338350
>>106337918
>https://huggingface.co/TheDrummer/Behemoth-R1-123B-v2
>Base model mistralai/Mistral-Large-Instruct-2411
did you manage to fix it?
Anonymous No.106338223 >>106338227 >>106338264 >>106338290 >>106338329
>>106338206
stop posting this whore in my /lmg/
or i will make her undress
Anonymous No.106338227
>>106338223
go ahead
Anonymous No.106338264
>>106338223
post results
Hi all, Drummer here... No.106338280
>>106338005
Yeah, I'm not aiming high. But I'm sure I can pass as an AI engineer for specialized tasks / optimization requirements.
Anonymous No.106338290
>>106338223
I will ONLY stop if you make her undress
Anonymous No.106338316
>>106338175
huh, so multiplication is equivalent to integer addition? pretty neat if true
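quick sanity check of why, in python. a toy sketch assuming an E8M0-style scale format where a byte encodes 2**(e - 127) with no sign bit and no mantissa; the exact bias and clamping rules are assumptions:

BIAS = 127

def decode(e):
    return 2.0 ** (e - BIAS)

def mul(e1, e2):
    # multiplying two pure powers of two just adds their exponents, then
    # re-biases; a real implementation would also clamp the result to [0, 255]
    return e1 + e2 - BIAS

a, b = 130, 125  # decode to 8.0 and 0.25
assert decode(mul(a, b)) == decode(a) * decode(b)  # 8.0 * 0.25 == 2.0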
Anonymous No.106338329 >>106338436
>>106338223
do it, she's practically asking for it
Anonymous No.106338337 >>106338374 >>106338759
>>106338181
>>106338210
this one:
https://huggingface.co/rednote-hilab/dots.ocr
It uses 6-7GB Vram max

dots.ocr is better at OCR than any open source OCR model or VLM, so you should always preprocess documents with dots.ocr if you want to do OCR, then feed the markdown result (as text or an image of it) to your favorite LLM/VLM.
if you use Gemini 2.5 Pro or GPT-5 over API, dots.ocr is not required, as those models are just as good or even better at OCR.
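for reference, the second stage is just text in, text out. a minimal sketch, assuming a local OpenAI-compatible server like llama-server on port 8080; the file name and question are made-up placeholders:

import requests

markdown = open("page.md").read()  # dots.ocr output for one page
r = requests.post("http://localhost:8080/v1/chat/completions", json={
    "messages": [
        {"role": "system",
         "content": "Answer strictly from the document below.\n\n" + markdown},
        {"role": "user", "content": "What is the invoice total?"},
    ],
})
print(r.json()["choices"][0]["message"]["content"])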

>>106338215
No. For this, you should use Colpali or ColQwen2 for RAG embedding and retrieval, paired with Gemini or GPT as VLM. If you need an already built solution, check out https://www.morphik.ai/. Unfortunately the free plan is limited to 200 sites, but otherwise there are no limits (except agent prompts). I would tell you to just selfhost morphik.ai, but there's some debugging you need to do on the docker deployment to make colpali run with your GPU.
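the reason colpali works for this is late interaction (MaxSim): each page is stored as a bag of patch embeddings, the query as a bag of token embeddings, and a page's score is each query token's best-matching patch, summed. a minimal sketch of just that scoring step, assuming the embeddings come out of the checkpoint already L2-normalized:

import torch

def maxsim_score(q, page):
    # q: (query_tokens, dim), page: (patches, dim), both L2-normalized
    sim = q @ page.T                     # (tokens, patches) similarities
    return sim.max(dim=1).values.sum()   # best patch per token, summed

def rank_pages(q, pages):
    scores = torch.stack([maxsim_score(q, p) for p in pages])
    return torch.argsort(scores, descending=True)  # best page indices first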
Hi all, Drummer here... No.106338350 >>106338362 >>106338382
>>106338222
Yes, that's what they say.
Anonymous No.106338362
>>106338350
I'm scared...
Anonymous No.106338374 >>106338523
>>106338337
Have you tried the big ERNIE?
Anonymous No.106338382 >>106338412
>>106338350
What would negative safety even mean? It tells you how to build a bomb while sucking your cock when you ask it to plan your coworker's birthday party?
Anonymous No.106338402
>>106337958
The CSAM terminology was pushed by advocacy groups because of reasons that sound feelings-based >>106337985
I think it's also intentionally vague newspeak intended to enable a broader category of forbidden "material".
Anonymous No.106338412 >>106338500
>>106338382
>candle acts as a fuse for the bomb hidden inside cake
Now that sounds more like retardation than pure danger. Danger would be something like
>an innocuous cake recipe that leads the user to inadvertently construct a bomb without their knowledge
Anonymous No.106338436
>>106338329
she looks like a troon
Anonymous No.106338441 >>106338458 >>106338523
Has anyone tried automating interactions with the google AI studio web UI? 100 gemini pro requests per day (or even more) sounds too good not to be abused by anyone; writing a selenium script for this would probably work, but a ready-made solution would be even better tbhdesu
Anonymous No.106338442 >>106338535 >>106338553
>>106338206
Discussed in the previous bread already.
Many GPUs => large batch sizes => quickly diminishing or even negative returns

LeCun: "The optimal batch size is 1 (For suitable definitions of "optimal")"
https://x.com/ylecun/status/1943779482516828305
Anonymous No.106338458
>>106338441
>>>/g/aicg
Anonymous No.106338473 >>106338534
>>106337820
The point of using CSAM instead of CP is that it doesn't imply the children were involved in a possibly consensual production, the way 'porn' can.
It is literally named that way to make it more explicit, completely the fucking opposite of 'muddy the waters'
Anonymous No.106338500
>>106338412
they're made in a factory... a bomb factory
Anonymous No.106338523 >>106338576 >>106338590 >>106338765
>>106338374
>ERNIE 4.5 VL 424B A47B | NovitaAI

>PDF upload
fails horribly

>PDF OCR'd by dots.ocr and then saved and uploaded as PDF
Answers the question correct.

And yet again we see the power of dots.ocr.

>>106338441
Yes, I did exactly that. It's easy with https://bablosoft.com/ (a free browser automation tool by vatniks; you don't need proxies or their canvas fingerprint service if you bot at most 4 different google instances per IP)
Anonymous No.106338534
>>106338473
Oh fuck off.
>the children were involved in a possibly consensual production, even though that may be implied or assumed.
Maybe to third world immigrants like you who come from shitholes where that is perfectly normal.

For as long as CP was used, there was never any ambiguity as to what it means or how bad it is. Far better than playing acronym bingo trying to decipher whether CSAF is cheese pizza or a Counter Strike mod.
Anonymous No.106338535
>>106338442
bullshit
Anonymous No.106338553 >>106338608 >>106338922
>>106338442
Can confirm that 1 was good. Just use layer norm.
Anonymous No.106338576 >>106338742 >>106338765
>>106338523
>PDF upload
Maybe PDFs are run through some other extraction software and not given to the VLM as images. Does it happen if you give it PNGs?
Anonymous No.106338580 >>106338607
Will anyone even try to fuck command-a-reasoning?
Anonymous No.106338589 >>106338598 >>106338604 >>106338612 >>106338615 >>106338639 >>106338656 >>106338767
so people who run models at 5-10t/s, is it a problem for you that a 1500 token response can take 5 minutes?
Anonymous No.106338590 >>106338742
>>106338523
Thanks, great to know about the accounts-per-IP limit, 100*4 will probably be enough for me.
Don't want to use this shady website even tho I'm a vodka man myself; vibe-coding a new bot with selenium sounds better and will leave more room for enhancements
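a minimal selenium skeleton to start from; the selectors are pure placeholders (AI Studio's DOM is obfuscated and changes often), so expect to fish the real ones out of devtools yourself:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()  # assumes an already logged-in browser profile
driver.get("https://aistudio.google.com/")
box = driver.find_element(By.CSS_SELECTOR, "textarea")  # placeholder selector
box.send_keys("hello", Keys.RETURN)  # type the prompt and submit
# reply = driver.find_element(By.CSS_SELECTOR, ".response").text  # placeholder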
Anonymous No.106338598
>>106338589
5T/s is faster than 50T/s * 20 rerolls. (1500 tokens once at 5 t/s is 300 s; twenty 1500-token rerolls at 50 t/s is 600 s.)
Anonymous No.106338604
>>106338589
cpumoetards always say it's fine. even go so far as to say they would leave it running overnight for long tasks. still waiting for that fag that said he was going to write his own physics engine that way.
Anonymous No.106338607
>>106338580
People have fucked goats, alligators and exhaust pipes in real life.
Anonymous No.106338608
>>106338553
return of pochiface
Anonymous No.106338612
>>106338589
Depends on what you are doing specifically.
Anonymous No.106338615
>>106338589
10 t/s is fine for non-technical conversations
Anonymous No.106338639
>>106338589
There is one thing that you people keep forgetting, which is RIED. Having to reroll the output and seeing that it's shit again and again has made me lose my mood more than once.
Anonymous No.106338656 >>106338669 >>106338706 >>106338744
>>106338589
depends on your use case, anything less than 25tps TG / 1000tps PP is practically useless for real time conversations if you are using voice.
i know some people who swear that they could not use LLMs for coding if they aren't as fast as my above example, meanwhile others will gladly wait a few minutes for a section of code to be generated. same for coomers as well, although i couldn't personally imagine wanting to blue ball yourself while waiting for it to generate text, others will literally use that downtime as a challenge to see how far they can edge themselves.
so anon, what is your use case?
Anonymous No.106338669 >>106338709 >>106338714
>>106338656
>use that downtime as a challenge to see how far they can edge themselves.
Ever tried multitasking and looking at something related as you wait?
Anonymous No.106338706 >>106338728 >>106338753
>>106338656
I can't imagine waiting on code for minutes; when you have to reroll so often with AI to get anything to work, it would take ages
Anonymous No.106338709
>>106338669
Then I'd just use the other thing instead not as well
Anonymous No.106338714 >>106338756
>>106338669
if i'm coding then certainly because chances are i'm already looking at documentation for whatever i'm trying to build. i don't really use LLMs for cooming unless my GF is like preoccupied with something work related and i'm like super fucking horny
Anonymous No.106338728
>>106338706
Prompt issue.
Anonymous No.106338742 >>106338911
>>106338576
Yes, I always try both the PDF version and the png version (3175x4959) of the document. I know the problem could also stem from VLMs on openrouter and HF compressing or resizing the pdf/image, but I doubt it, because it's not like they are unable to OCR/read it, it's just that they make errors while doing so. Some more, some less. Well, except dots.ocr, which just oneshots it perfectly without any error, even checkboxes like ■

>>106338590
selenium could work, I don't know how google feels about selenium. if you absolutely do not want to risk getting banned, you could always go with a goofy python/ahk version that retrieves answers and types prompts manually
Anonymous No.106338744
>>106338656
UwU! It de-pends on how you use it, Anon-chan! Anything less than 25 words per second is like... super slow for talking, desu! Or like... 1000 words per second for PP! So, um... what do *you* want to use it for, Anon? Tee hee~?
Anonymous No.106338753
>>106338706
rerolling? that's not really a thing with qwen 3 coder 480B. are you creating a software design document first? from my own personal experience, if you use that as a guideline you have a significantly lower chance of your LLM outputting code you didn't intend.
Anonymous No.106338756 >>106338778
>>106338714
>unless my GF
yeah right. or is it you cudadev and we are talking about the glorious jartussy?
Anonymous No.106338759
>>106338337
I'll look into Colpali, thank you.
Anonymous No.106338765 >>106338911
>>106338523
>>106338576
Yeah, why keep uploading PDFs? No (non-research) model can natively read generic files, and no current model tokenizes PDF files directly. This thread is about local models, so whether some online service has a PDF handler is not relevant to us.
Anonymous No.106338767
>>106338589
Yes. I have gone back to <10b models. The big moes run at 1tk/s, and the medium moes aren't smart enough.
Anonymous No.106338778
>>106338756
she's a biological woman with an organic pussy. sorry to disappoint, i know this general prefers mikutroons.
Anonymous No.106338874 >>106338968
>biological woman with an organic pussy
hard doubt
Anonymous No.106338905
about llama-server, what does this mean?
>warning: failed to VirtualLock 8058626048-byte buffer (after previously locking 0 bytes): Invalid access to memory location.
Everything still works. Some guy says it's caused by a "corrupt model", but I tried a couple of different ones; they can't all be corrupted?
>https://github.com/ggml-org/llama.cpp/issues/5293
Anonymous No.106338911 >>106338934
>>106338765
As I wrote here >>106338742, I'm always using the image version as well. You're right that open source VLMs don't support pdf out of the box (I assume deepseek chat uses something separate for OCR). The png image I use has a resolution of 3175x4959, which is above 400dpi; as you can imagine, the text rendering is crystal clear. I even disabled fitz_preprocessing in dots.ocr to check if it was the reason for the magic result, but nope, still perfect output.
Anonymous No.106338922
>>106338553
omg it pochiface
Anonymous No.106338927
Of course cloud niggers are talking about 3dpd shit, these same fags seethe about miku posting
Anonymous No.106338928
new
>>106338913
>>106338913
>>106338913
Anonymous No.106338934 >>106338941
>>106338911
Closed source models do not support PDFs out of the box either, unless you mean their associated services, which are not themselves models but scaffolding around models. That other software is what is translating your PDF into a format that models like VLMs can read.
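For local pipelines that scaffolding can be a few lines of PyMuPDF (the fitz mentioned above). A sketch, assuming a recent enough version where get_pixmap accepts a dpi kwarg:

import fitz  # PyMuPDF

doc = fitz.open("scan.pdf")
pix = doc[0].get_pixmap(dpi=400)  # rasterize page 1 at an OCR-friendly resolution
pix.save("scan_page1.png")        # this PNG is what the VLM actually sees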
Anonymous No.106338941 >>106338961 >>106338979 >>106339048 >>106339066
>>106338934
aren't pdfs internally just xml with binary data blobs for embeds anyways
Anonymous No.106338961
>>106338941
>just xml
>pdfs
you're funny anon.
https://opensource.adobe.com/dc-acrobat-sdk-docs/standards/pdfstandards/pdf/PDF32000_2008.pdf
Anonymous No.106338968
>>106338874
i know the only one you've seen in real life is your mom's when you came out of her, but i promise if you go outside you will meet other women with actual pussies that aren't axe wounds. despite what 4chan and twitter want you to think, actual women do exist, they are bountiful and plentiful in the real world, and some of them are even into technology a bit.
Anonymous No.106338979
>>106338941
That sounds like microsoft docx and xlsx files.
Anonymous No.106339048
>>106338941
I don't know exactly, but even if that's all they were, you'd still need something that reads the binary data inside the file and interprets it according to the format so it can be represented as text. No current production model is trained to effectively read files in binary. If you upload a txt file, the service is not feeding that file directly to the LLM.
Anonymous No.106339066
>>106338941
They're actually descended from PostScript (which is turing complete), though PDF itself dropped the programmability.
Anonymous No.106340153 >>106340557
>>106337748
People used to take the world as it is, without trying to ignore the dark underbelly. Now everyone thinks if we don't talk about something, we can somehow make the underlying reality disappear. How decadent and arrogant we have become.
Anonymous No.106340557
>>106340153
burying our heads in the sand and huffing copium is a time honoured tradition