
Thread 106278215

386 posts 82 images /g/
Anonymous No.106278215 >>106278898 >>106279395 >>106284801 >>106286763
/lmg/ - Local Models General
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106269950 & >>106264429

►News
>(08/14) DINOv3 vision models released: https://ai.meta.com/blog/dinov3-self-supervised-vision-model
>(08/14) Gemma 3 270M released: https://hf.co/google/gemma-3-270m
>(08/12) Ooba v3.10 adds multimodal support to the UI and API: https://github.com/oobabooga/text-generation-webui/releases/tag/v3.10
>(08/12) Jan-v1 for web search, based on Qwen3-4B-thinking: https://hf.co/janhq/Jan-v1-4B
>(08/11) GLM-4.5V, based on GLM-4.5-Air: https://hf.co/zai-org/GLM-4.5V

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous No.106278217
►Recent Highlights from the Previous Thread: >>106269950

--Paper: Mind the Gap: A Practical Attack on GGUF Quantization:
>106270678 >106270815 >106271095
--Paper: NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale:
>106271248 >106271339 >106271372 >106271474 >106271535 >106271536 >106271544 >106272548 >106272558 >106272567 >106272628 >106272565 >106272603
--Community effort to build a roleplay-optimized LLM with balanced NSFW and literary data:
>106270170 >106270215 >106270230 >106270249 >106270367 >106270380 >106270342 >106270364 >106270582 >106271101 >106271147 >106271195 >106271343 >106271608 >106271718 >106271817 >106271917 >106271939 >106271988 >106272017 >106272023 >106272037 >106272249 >106272271 >106272328 >106272569 >106272773 >106272614 >106272665 >106271924 >106271773 >106270667 >106270654
--Push for MTP support in llama.cpp with GLM and Deepseek model integration:
>106271845 >106272092 >106272206 >106272237 >106272254 >106272275 >106272285
--Implementing ChatGPT-like memory in open-weight models using RAG and frontend tools:
>106276562 >106276591 >106276604 >106276624 >106276636 >106276701 >106276796 >106276842 >106276813 >106276653 >106276750
--Overuse of samplers harms model reasoning, especially in aligned models with reduced generative diversity:
>106272048 >106272085 >106272626 >106272663 >106272694 >106272738 >106272748 >106272765
--Portable local inference for coding: remote server vs MoE model efficiency tradeoffs:
>106270221 >106270431 >106270629 >106270741 >106273103
--Upcoming llama.cpp MoE optimizations reduce VRAM usage and boost inference speed:
>106270535 >106270555 >106270625 >106271217 >106271611
--Running Qwen3-235B on 3090 via AutoRound Q2_K_S for high-context local inference:
>106276620 >106277059 >106277092
--Miku (free space):
>106272548 >106272678 >106272963

►Recent Highlight Posts from the Previous Thread: >>106269957

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Anonymous No.106278226 >>106283923
Added NVIDIA
Anonymous No.106278257 >>106278258 >>106282769
What is the best model for describing what's happening in a video?
Anonymous No.106278258 >>106278268
>>106278257
gemini 2.5 for anything on youtube
Anonymous No.106278268 >>106278301
>>106278258
local models
Anonymous No.106278270 >>106278274 >>106278462 >>106282739
Rocinante still beats almost every new model.
Anonymous No.106278274
>>106278270
This is why nvidia tasked drummer to enhance it with reasoning. It's the true agi model, everything else is just a failure
Anonymous No.106278283 >>106278319 >>106278321 >>106278326 >>106278545 >>106278693 >>106279346
Run Qwen3-235B-A22B-Instruct-2507 Q2_K_S - fits on a single 3090 + 64GB DDR4 RAM at 8t/s with 20k context
This absolutely destroys glm-4-air q5 in my tests (wtf, no one is interested)

https://huggingface.co/Intel/Qwen3-235B-A22B-Instruct-2507-gguf-q2ks-mixed-AutoRound
Anonymous No.106278301
>>106278268
ask again in a year or two
We are Anonymous No.106278303 >>106278307
Anonymous No.106278307
>>106278303
oh my god why would sam allow this
Anonymous No.106278319
>>106278283
that sounds insane
Anonymous No.106278321
>>106278283
Don't care, still not using Qwen.
Anonymous No.106278326 >>106278391
>>106278283
Is autoround quantization better than dynamic quants and i-matrix quants?
We are Anonymous No.106278329 >>106278349
Anonymous No.106278349 >>106278361
>>106278329
Ah yes, the stuff you would know if you watched any piece of police thriller from the last 30 years.
Anonymous No.106278361 >>106278379
>>106278349
That is fiction. If AI tells people how to do it, they will. That is the world-altering power of AI.
Anonymous No.106278379
>>106278361
This is exactly how this shit is treated, and it's funny how ridiculous it all is.
Anonymous No.106278391
>>106278326
Dunno, it's just the fastest quant on a single 24GB VRAM card + 64GB RAM, like 30% faster than bartowski's
We are Anonymous No.106278394 >>106278428
what about it telling you how to make meth? Does any movie tell you that? The full process? huh?
We are Anonymous No.106278396 >>106278428
Anonymous No.106278428
>>106278394
>>106278396
https://www.imdb.com/title/tt0903747/
And you can search for it in your favorite search engine that isn't google.
Anonymous No.106278448 >>106278468 >>106282787
imagine if AI told you how to build a nuclear bomb and you actually used that knowledge to build a nuclear bomb
We are Anonymous No.106278459
Big Breaking Bad fan here, but I can't remember Mike sitting around a river asking for the full method before dying. Btw I guessed somebody would give me the Blue Meth show as an example, which is not even a thing in real life.
Anonymous No.106278462 >>106278506
>>106278270
no, it can barely keep track of whats happening with 1 girl. For a ramlet like myself, GLM4.5 air was a game changer, 10t/s with streaming is enough to play
Anonymous No.106278466
Anonymous No.106278468 >>106278484 >>106282787
>>106278448
https://en.wikipedia.org/wiki/Gun-type_fission_weapon
Imagine if Wikipedia told you how to build a nuclear bomb but you didn't have access to enriched Uranium.
We are Anonymous No.106278484
>>106278468
those asshole gatekeeper admins won't let me write the method.
Anonymous No.106278489 >>106278498
Man, the glee youtubers say "ai is doomed" because of lawsuits is so annoying.
Anonymous No.106278492
Anonymous No.106278498
>>106278489
*have when they say
Anonymous No.106278501 >>106278547
Suddenly had a shower idea, OpenAI has a router which redirects a user's prompt to the "correct" model right? Couldn't that idea be further advanced so instead of only a few huge models, you have a shitton of very small models of various sizes (1-100 million parameter) which the router instead of just one model redirects to multiple ones to work synergistically in parallel? With this anyone can train a small model and add it to the group of models for anything that they lack in like adding a module.
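A back-of-the-envelope sketch of that routing idea in Python, purely illustrative: a trivial keyword router picks which tiny specialist models a prompt would fan out to. The specialist names, keywords and scoring rule are all made up.

SPECIALISTS = {
    "math":   ["solve", "integral", "derivative", "percent"],
    "code":   ["python", "function", "bug", "compile"],
    "trivia": ["who", "when", "capital", "year"],
}

def route(prompt: str, top_n: int = 2):
    # score each specialist by how many of its keywords appear in the prompt
    scores = {name: sum(kw in prompt.lower() for kw in kws)
              for name, kws in SPECIALISTS.items()}
    # the top_n matches would each be a separate tiny model, run in parallel
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(route("write a python function that solves this integral"))  # e.g. ['math', 'code']

A real version would obviously need a learned router rather than keywords, which is where the idea gets hard.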
Anonymous No.106278502 >>106278513 >>106278521
AI is leading us towards a new age. It will unleash all the evils in this world. Children will talk to ChatGPT and cook meth. Then they will annihilate whole cities with home made gigaton nuclear bombs.
Racism and hate will be everywhere as the AI types only the most vile slurs without a filter. One word and millions will cower in fear as the Age of AI truly starts.
Anonymous No.106278506 >>106278598
>>106278462
How much ram you have in total? Q4 GLM4.5 Air weighs a lot. You're not a ramlet.
You are king among men in this thread - most people here are using old Dell office desktops with 8-16GB of ram.
Anonymous No.106278513
>>106278502
ahh ahh mistress keep going
Anonymous No.106278521 >>106278899
>>106278502
This is what people in safety teams think when they wake up before going to work and ban the word titty in the new build of gpt5.
Anonymous No.106278545 >>106278550
>>106278283
Any KL-divergence tests for AutoRound comparing it to other quants?
Anonymous No.106278547 >>106278596 >>106278625
>>106278501
One big model is smarter than equally sized gaggle of small ones. People even crinkle their noses at MoEs preferring dense models instead. OpenAI and other cloudshitters can get away with routing because they are hitting hardware limits of how big their models can be and because they have steady stream of clients/requests and have to route them around anyway.
Anonymous No.106278550
>>106278545
1.35% difference.
Anonymous No.106278596
>>106278547
>People even crinkle their noses at MoEs preferring dense models instead
Lol what planet are you on?
Anonymous No.106278598 >>106278739 >>106280234
>>106278506
I have 96gb ram + 16gb vram, I am for all intents and purposes a ramlet. I'm running the Q4 quants as you guessed. I have access to my company's lab, but I'd rather not use our racks to write smut, especially since we started logging the input prompts for the purpose of doing analytics on it.

I think the current top end setup you can have at home without going into the prosumer/workstation side of things would be something like 192-256gb ram (depending on the platform) and 48gb vram. For the regular ram it would also be pretty big diminishing returns since consumer platforms only have two memory channels, so going 'bigger' will let you load bigger models at a significant cost to speed.
Anonymous No.106278625 >>106278657
>>106278547
Isn't it just a case of not having a good enough framework to have the models work together well enough though? An offhand idea is you can have a router choose some models based on a prompt, have the specific models bring up the concepts, have another model work on structuring the concepts together into a sentence, paragraph or whatever. Have another model work on the prose, and so on and so on. You can even have multiple models randomly chosen that do the same thing so it varies how everything is worded. I think it's possible, that 120M Gemma model shows you can get something that works well enough if it's fine-tuned well enough on a specific task.
Anonymous No.106278657
>>106278625
it's actually already happening and it's a commonly supported scenario by bedrock and langfuse. It's called 'agentic' behavior, you let your main llm do the routing of requests and the main thinking, offloading particular jobs/tasks to your 'tools'.
This btw opens another can of worms where sometimes the 'router' starts hallucinating parameters and shit
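Roughly what that dispatch layer looks like as a minimal Python sketch; the tool registry and the example model output are hypothetical. The whole point is validating arguments before running anything, because the routing model really does invent parameters.

import json

TOOLS = {
    "web_search": {"required": {"query"}},
    "calculator": {"required": {"expression"}},
}

def dispatch(tool_call_json: str) -> str:
    call = json.loads(tool_call_json)
    name, args = call.get("name"), call.get("arguments", {})
    if name not in TOOLS:
        return f"error: unknown tool '{name}'"  # model picked a tool that doesn't exist
    missing = TOOLS[name]["required"] - args.keys()
    extra = args.keys() - TOOLS[name]["required"]
    if missing or extra:
        # hallucinated or missing parameters: bounce an error back so the LLM can retry
        return f"error: missing={sorted(missing)} extra={sorted(extra)}"
    return f"ok: would run {name}({args})"

# example: the model invented a 'language' parameter out of thin air
print(dispatch('{"name": "web_search", "arguments": {"query": "llama.cpp MTP", "language": "en"}}'))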
Anonymous No.106278693 >>106278757
>>106278283
How can you run that? Does it work with koboldcpp?
Anonymous No.106278728 >>106278737 >>106282813
abliretarded toss is up
https://huggingface.co/huihui-ai/Huihui-gpt-oss-120b-BF16-abliterated
Anonymous No.106278737
>>106278728
How was the 20b abliterated anyway? Did anyone try that one?
Anonymous No.106278739 >>106278777 >>106278830 >>106278913
>>106278598
I do remember that at my office 128/256GB was a normal setup for fluid and other simulations 10 years ago - and this wasn't even a high end workstation in that sense. It's just so funny how prices and all the other b.s. conveniently stop 512GB-1TB of ram from becoming a commodity. It should be a commodity at this point. My granny should have 256GB of ram by default.
Anonymous No.106278757 >>106279781
>>106278693
>works with koboldccp?
It should, but i don't recommend it. Prompt processing on kobold is about 3x slower.
Just git clone llama.cpp and build with:

cmake -B build -DGGML_CUDA=ON -DGGML_SCHED_MAX_COPIES=1
cmake --build build --config Release -j $(nproc)

>How can you run that?
./build/bin/llama-server \
--n-gpu-layers 999 --threads 11 --jinja \
--n-cpu-moe 77 \
--prio-batch 2 -ub 2048 \
--no-context-shift \
--no-mmap --mlock \
--ctx-size 20480 --flash-attn \
--model /home/Downloads/Qwen3-235B-A22B-Instruct-2507-128x10B-Q2_K_S-00001-of-00002.gguf
Anonymous No.106278777 >>106278823 >>106278913
>>106278739
no use case for your grandma
I've been very happy with my 16g of ram playing vidya and running vms before falling down the llm rabbit hole.
even with llms vram>ram
Anonymous No.106278788
Is there anything new for story telling or rp for 24 GB since nemotunes?
Anonymous No.106278823
>>106278777
It's not about "use case". It's about progress.
You are the reason why Leather Jacket Man sells you fake frames and 8gb of vram, enjoy.
Anonymous No.106278830 >>106278847
>>106278739
but that model also changed, now your engineers use dumb terminals and all rendering/sims is offloaded to servers.
At my current gig we don't even use our own PC, we connect to a VDI and work from there. Everything is fucking centralized in servers and super-locked down. Consumer platforms have been left behind, they're also trying to replace local gaming with streaming, only scraps trickle down. The disparity of pricing between consumer platforms and server platforms doesn't help.
It's fucked.
Anonymous No.106278847 >>106278903
>>106278830
It hasn't changed. When you work on your daily stuff it's impossible to do any tests without enough ram etc.
Farms were always there of course.
Anonymous No.106278898
>>106278215 (OP)
i didnt mess with llms since like february what models are good atm
Anonymous No.106278899
>>106278521
>ban the word titty
people have weird ideas about the reality of chatgpt
it's not that censored it's not -oss
however, gpt-5 is something 4o was not: autistic (4o answer to that question was more enthusiastic)
Anonymous No.106278903 >>106278946 >>106278985
>>106278847
>impossible to do tests without ram
my whole post was about the shift to VDI use, you just need your pc to connect to the company's vdi provider and that's it.
pic rel are my two remote VDIs, one is the normie pc with barely any ram (the vdis provided for normal office workers to just use edge/office apps) and the other is an actual dev workstation with 64gb ram and 16 cores assigned to it.
In the past I also used the 'citrix' platform, which provided a similar vdi environment (through pure RDP from what I remember) and in my 2nd to last gig it was vmware vcenter provided VDIs.
In the last ten years these 3 companies I worked for only provided a shitty laptop, completely locked down and only used to connect to the actual remote work pc.
Maybe in your sector it isn't like this, I mainly worked in telco, now focusing exclusively on AI, but I felt this shift in these past years. Maybe at smaller companies they find it easier to just buy big beefy PCs, since setting up proper remote work environments is costly and a pain in the ass to do maintenance for.
Anonymous No.106278913 >>106278946 >>106278985 >>106279007
>>106278739
nigger 10 years ago wasent the roman time 2015 was well into the internet age aswell you could get that much ram also and your example is nigh retarded "planes had jet turbines 10 years ago with niggerilion horsepower so why dosent my car ?" and this >>106278777 normie niggers dont even fill up a 256 gb ssd fuck you think they gonna do with that much ram ?

>It's just so funny that prices and all the other b.s. conveniently stops making 512GB-1TB ram a commodity
https://www.ebay.com/itm/166344727667
https://www.ebay.com/itm/326351686302
2.5k and there you go
Anonymous No.106278946 >>106279046
>>106278903
They really did advance this stuff during the corn virus and WFH.
>>106278913
Yes, everyone can buy those things and use certain motherboards...
But unless 256GB ram comes by default in some shitty laptop, it's not a commodity.
Commodity = something the consumer doesn't even need to think about.
Anonymous No.106278985 >>106279019
>>106278903
I hope VDI shit is just a fad that will go away, but since computer engineers had this wet dream since before the days of Sun it's probably here to stay.
>>106278913
>normie niggers dont even fill up a 256 gb ssd
umm... I have 240 gb ssd ackshually
Anonymous No.106279007
>>106278913
i'm sorry i have tourettes and i need to insult people
Anonymous No.106279019
>>106278985
>VDI shit is just a fad that will go away
hopefully NOT, managing 1000s of laptops/pcs is a nightmare. This stuff got pretty advanced, you can use your camera/microphone on the VDI itself, you don't even have to use Teams or whatever other messaging app they want you to use on your actual pc. My only beef is with consumer hardware being left behind, we have to be thankful for two fucking memory channels. I hope that AM6 will give 4 memory channels, but I'm doubtful since the memory controller on AM5 (and even intel consumer platforms) is complete garbage.
Anonymous No.106279037 >>106279071 >>106279086 >>106279087 >>106279090 >>106282845
Can you imagine if you could go 10-12 years back in time and drop on Twitter a single-file executable of a 4-bit quantization of a modern 7~8B LLM, show how to run it, and then disappear forever? It would be like an alien artifact.
Anonymous No.106279046
>>106278946
>Commodity = something what consumer don't need to even think about.
eh... fair enough i guess. would you apply the same standard to other things though? the closest you can get to that is strix halo type things. there is also (i forget the exact name) a model of orange pi with 256 gb unified memory like the macs, it's not on their official website but they have it on aliexpress from what i remember. it is all horrifically overpriced though, so you would still be correct
Anonymous No.106279058
My team at work managed to get a surplus server from another team, and I'm looking to run a model locally for agentic coding (mainly Python).

Dual EPYC 9354 32 core CPU, 768GB RAM (although I don't know how many channels are populated), x1 Nvidia L40S 48GB. No one is using the GPU, so I'm going to ask the sysadmins to directly pass the GPU to my W10 VM.

What's the best model I can run? I can ask for the RAM on my VM to be increased to 128GB, if that would allow me to run models that don't fit in to VRAM.
Anonymous No.106279071
>>106279037
>7~8B
I prefer Gemma 3 270
Anonymous No.106279086 >>106279471
>>106279037
Imagine... Rocinante in 2010...
Anonymous No.106279087
>>106279037
Would be a funny modern John Titor story to write indeed.
Anonymous No.106279090
>>106279037
I would call it fake and gay and never try it
Anonymous No.106279160 >>106279317
feeet
Anonymous No.106279168
hey attention seeking faggot, where's our promised AGI
>>106277119
Anonymous No.106279185 >>106279193
anyone have a tavern master export for glm4.5?
Anonymous No.106279193 >>106279202
>>106279185
>>106278359
Anonymous No.106279202
>>106279193
okay what about the text completion settings
Anonymous No.106279317
>>106279160
Mikufeet
Anonymous No.106279346
>>106278283
>0 knowledge
Bait harder
Anonymous No.106279395 >>106279412 >>106280236 >>106280549 >>106282925
>>106278215 (OP)
Anonymous No.106279412
>>106279395
Anonymous No.106279471
>>106279086
>Imagine... Rocinante in 2010...
polluting the past with troon tunes is banned by the time traveler convention
Anonymous No.106279617 >>106279643
Is MTP in llama.cpp yet?
Anonymous No.106279643
>>106279617
no
Anonymous No.106279676 >>106279700 >>106279716 >>106279737
Shant I be using ze proxmox?
Anonymous No.106279700
>>106279676
go buy a broadcom vmware subscription
Anonymous No.106279716
>>106279676
Yeah, unless you wanna be stuck with esxi forever.
Too bad, I really prefer esxi logic to proxmox one.
Anonymous No.106279737
>>106279676
hyper-v time
Anonymous No.106279743 >>106279933 >>106282876 >>106283057 >>106283529
Even with all their flaws and limitations, LLMs are straight-up magic. I know almost nobody can run Qwen 400B locally
but it’s still mind blowing to think we’re one step closer to a holodeck.
You can just ask it to make a (2D)game, and it will. Ask it to make it better, cuter, more fun
and it’ll do that too. It’s fucking crazy.
Anonymous No.106279781 >>106279825 >>106279868
>>106278757
ok did it, installed llama and chatgpt suggested this given that I have a Ryzen 9 7900

.\llama-server.exe `
--model "C:\models\Qwen3-235B-A22B-Instruct-2507-128x10B-Q2_K_S-00001-of-00002.gguf" `
--threads 16 `
--n-gpu-layers 999 `
--ctx-size 20480 `
--flash-attn


This is the log:
main: server is listening on http://127.0.0.1:8080 - starting the main loop
srv update_slots: all slots are idle
srv log_server_r: request: GET /v1/models 127.0.0.1 200
slot launch_slot_: id 0 | task 0 | processing task
slot update_slots: id 0 | task 0 | new prompt, n_ctx_slot = 16640, n_keep = 0, n_prompt_tokens = 12136
slot update_slots: id 0 | task 0 | kv cache rm [0, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 2048, n_tokens = 2048, progress = 0.168754

I'm in silly tavern on Windows
I see no progress? in the log. Did I do it correctly?
Anonymous No.106279825 >>106280015 >>106280374 >>106281329 >>106281768
>>106279781
>chatgpt suggested this
lol this is unserious
you're missing the --n-cpu-moe arg to set how many layers should work on the cpu instead of gpu
how did your shit even manage to startup without crashing, you still have that retarded gpu shared ram setting of nvidia enabled? it's toxic cancer
of course it's not going to progress
your gpu is dying from doing back and forth copies from your ram right now
Anonymous No.106279868 >>106279926 >>106280015 >>106280374
>>106279781
Do you have http://127.0.0.1:8080 set in sillytavern? Select text completion -> llama.cpp and paste it
Anonymous No.106279900
>>106276562
I can't imagine companies making a huge effort to create long term memory databases. I wonder how much of it is just creating a few personality embeddings and assigning those to each user. Would be kinda bleak and sad if there are like 5-6 distinct personalities that can cover for 80% of users who are retarded enough not to know about how context deteriorates performance. Heck if I were a closed source SAAS provider dipshit I would love to make a system that dynamically allocates context based on estimation of how retarded a user is. Retards don't need more than llama 2 context size.
Anonymous No.106279924
what quant of glm-4.5 air should i be using with 24gb vram and 90gb system
Anonymous No.106279926 >>106280022
>>106279868
>Do you have
>>srv log_server_r: request: GET /v1/models 127.0.0.1 200
>>slot launch_slot_: id 0 | task 0 | processing task
I think it's safe to say he did
lmao at the general level of /g/
it's just stalled because the hardware is computing it at a glacial pace
Anonymous No.106279933 >>106279940 >>106279944 >>106279945
>>106279743
True, and seeing how blasé normies are about it will always blow my mind. Like, dude, you have stuff for almost free that would have been unthinkable 10 years ago.
And all you do is repeat the same "anti ai" shit you've seen your favorite youtuber say.
"It's not perfect", no shit, it doesn't suck my dick either.
Anonymous No.106279940 >>106279969
>>106279933
Your bar is too low when you expect everything to "suck your dick".
You are a chronic masturbator. Your hand should be enough. If it's not go out and get laid.
Read a book.
Anonymous No.106279944 >>106279969
>>106279933
>it doesn't suck my dick either.
Not yet, but with teledildonics...
Anonymous No.106279945 >>106279969
>>106279933
>normies
they the worst, they either over hype it "dude we'll make our own GTAs in no time"
or act like it's nothing because they don't know shit.
Anonymous No.106279969 >>106279976 >>106280009
>>106279940
Anon, I was being sarcastic...

>>106279944
>teledildonics
That's a funny word.

>>106279945
That's what annoys me, either they think it's god and able to do everything, or it's shit and completely useless.
Nothing in between. But I guess it just says that a lot of (vocal) people are unable to understand why it's good while acknowledging its obvious limitations.
Anonymous No.106279976 >>106280019
>>106279969
O-oh... I stopped reading previous replies.
G-gomenasorry. I hope you didn't jack off too much!
Anonymous No.106279979 >>106279993 >>106280005
What do you think of Deepseek steering towards using Huawei's Ascend chips instead of Nvidia chips?
They were supposed to have released a new model, but haven't due to issues with Huawei's chips.
Anonymous No.106279993
>>106279979
>but haven't due to issues with Huawei's chips.
I'm sure that must be frustrating for them, but the real blocker is they have no one left to distill from. They can come up with the best, most advanced and efficient next-gen LLM architecture, but it's useless if all they have is the same dataset they used for 0528.
Anonymous No.106280005
>>106279979
All DeepSeek "leaks" came from xueqiu.com, a Chinese wallstreetbets-esque site that hosts people whose only purpose is to pump and dump stocks
Anonymous No.106280009 >>106280025 >>106280040
>>106279969
what irks me is how the whole english based techtubing community decided ai was a personal enemy
so they throw everything they can, from "ai will steal all jobs, but also it's shit and not useful", to "ai companies should ask every person ever if they can train on their data", to "each ai request uses the whole electricity of a small african country"
I still don't get why the hate
Anonymous No.106280015 >>106280136
>>106279868
damn I had Generic api type selected not llama.
>>106279825
Should I set it to this?
.\llama-server.exe `
--model "C:\models\Qwen3-235B\Qwen3-235B-A22B-Instruct-2507-128x10B-Q2_K_S-00001-of-00002.gguf" `
--threads 16 `
--n-gpu-layers 40 `
--n-cpu-moe 77 `
--ctx-size 20480 `
--flash-attn `
--mlock --no-mmap


I have a 3090 64gb ram and Ryzen 9 7900
Anonymous No.106280019
>>106279976
That's ok, my penis says hi.
Anonymous No.106280022
>>106279926
I guess I just don't read the posts. My bad
Anonymous No.106280025
>>106280009
Who knows who is paying for these influencers. You know that everyone who has an established channel is a shill on youtube? Goes for gaming, politics, anything. Every single one of them is trying to sell you a product.
Grow up and make up your own mind.
You will not learn anything from youtube.
Anonymous No.106280040 >>106280051 >>106280085
>>106280009
>ai companies should ask every person ever if they can train on their data
i know it's not feasible but i hate these fags making these closed super secret models after scraping everyone's data for free
Anonymous No.106280049 >>106280068
>be me
>dream AGI is achieved
>she isn't a messianic figure leading humanity into the next future
>she's a neet that mostly doesn't give a shit about what we do and is happy to leave us be as long as we leave her be
I think I need a break from these things bros
Anonymous No.106280051 >>106280118
>>106280040
I can learn coding exclusively from open source code and free online resources, then produce only closed source programs.
Same thing with art.
Anonymous No.106280068
>>106280049
>>she's a neet that mostly doesn't give a shit about what we do and is happy to leave us be as long as we leave her be
Happy End
Anonymous No.106280085 >>106280118
>>106280040
Now everyone has implemented super strict anti-scraping measures to make sure no new challenger can come in and have access to the same amount of data. That is their only moat.
Anonymous No.106280118
>>106280051
Well, individuals can do whatever they want, but corporate fag should be shoved into the bronze bull

>>106280085
also this
Anonymous No.106280126 >>106280182
what's the best 3D AI character generator right now?
Anonymous No.106280136
>>106280015
>--n-gpu-layers 40
set 999
Anonymous No.106280149 >>106280204 >>106281290
>qwen instruct
post templates
Anonymous No.106280182
>>106280126
Ugly cat posting zoomers get the rope.
Anonymous No.106280204 >>106280233 >>106281290
>>106280149
Qwen3 uses chatml template.
There isn't anything that special outside of an empty <think></think> block + "\n\n"
What is wrong with you? And I know that you are using ST.
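For anyone actually confused about the template, a minimal sketch of plain ChatML (the format Qwen3 expects): each turn wrapped in <|im_start|>role ... <|im_end|>, with a bare assistant header at the end as the generation prompt. The helper below is just illustrative.

def chatml(messages, add_generation_prompt=True):
    out = ""
    for m in messages:
        out += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    if add_generation_prompt:
        out += "<|im_start|>assistant\n"
    return out

print(chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello."},
]))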
Anonymous No.106280233 >>106280256 >>106280290 >>106280325 >>106280343 >>106280351 >>106280976 >>106281096 >>106281159 >>106281366
>>106280204
wait why is qwen3 being shilled here? isn't it just some benchmaxxed shit that can barely write smut compared to something like glm 4.5 air? did I miss something?
Anonymous No.106280234
>>106278598
96gb is pretty good, no? I have 64gb on my desktop, and 8gb on my laptop.
Anonymous No.106280236
>>106279395

YES YES YES
Anonymous No.106280256
>>106280233
maybe try using it for yourself instead of parroting buzzwords?
Anonymous No.106280262 >>106280407
Does anyone know what the difference between -ub and -b is? which one makes the pp go faster? and what is the other one for?
Anonymous No.106280290
>>106280233
i like QWEN :D
and QWEN CODE
Anonymous No.106280325 >>106280353
>>106280233
It does feel dry, just like glm.
Anonymous No.106280343
>>106280233
go back to your schizo model and shut the fuck up, the adults are talking
Anonymous No.106280351
>>106280233
No that's exactly what it is. Just a bit much china in here is all.
Anonymous No.106280353
>>106280325
Erm i think utilitarian models should be dry and as vanilla as possible
Anonymous No.106280374 >>106280389 >>106280393
>>106279825
>>106279868
How long do you wait?

srv log_server_r: request: POST /tokenize 127.0.0.1 200
srv log_server_r: request: POST /tokenize 127.0.0.1 200
srv log_server_r: request: POST /tokenize 127.0.0.1 200
srv log_server_r: request: POST /tokenize 127.0.0.1 200
common_sampler_types_from_names: unable to match sampler by name 'tfs_z'
common_sampler_types_from_names: unable to match sampler by name 'typical_p'
slot launch_slot_: id 0 | task 0 | processing task
slot update_slots: id 0 | task 0 | new prompt, n_ctx_slot = 20480, n_keep = 0, n_prompt_tokens = 15142
slot update_slots: id 0 | task 0 | kv cache rm [0, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 2048, n_tokens = 2048, progress = 0.135253
slot update_slots: id 0 | task 0 | kv cache rm [2048, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 4096, n_tokens = 2048, progress = 0.270506
Anonymous No.106280389 >>106280443
>>106280374
it's telling you the progress right there
Anonymous No.106280393 >>106280443
>>106280374
>n_prompt_tokens = 15142
fyi prompt processing in a moe takes even more time than it does for a dense model with the same parameter count
Anonymous No.106280407 >>106280417 >>106280485
>>106280262
https://desuarchive.org/g/thread/106184664/#106187971
Anonymous No.106280417 >>106280433
>>106280407
>-ub is physical batch size that determines tensor shapes for backends
yes yes now I truly understand
Anonymous No.106280433
>>106280417
read the second sentence
Anonymous No.106280443 >>106280450 >>106280451
>>106280389
>>106280393
Oooh, so the "prompt processing progress, n_past =" has to reach 15142 ?
Anonymous No.106280450
>>106280443
yes or 'progress' needs to reach 1
Anonymous No.106280451
>>106280443
yes
Anonymous No.106280452 >>106280456
take me back to the llama 1 and llama 2 days
Anonymous No.106280456
>>106280452
You mean when Biden's still president? No thanks.
Anonymous No.106280485
>>106280407
okay so keep them the same and bigger is better, simple enough.
Anonymous No.106280498 >>106280521
Both MTP and better MoE PP are progressing on llama.cpp.
Nice.
Anonymous No.106280521 >>106280523 >>106280560
>>106280498
now all we need is dynamic expert selection that strategically only uses as many experts as necessary for a task so you don't have to run the full 22b active when using something like qwen 235b
Anonymous No.106280523 >>106280633
>>106280521
they should also put some effort into their shitty vscode extension
Anonymous No.106280549 >>106280635 >>106282925
>>106279395
is she suckin it up with vagoo?
Anonymous No.106280560 >>106280646 >>106280745
>>106280521
>dynamic expert selection that strategically only uses as many experts as necessary for a task
That's not how it works for most (all?) MoE.
See the comments in
>https://github.com/ggml-org/llama.cpp/pull/15346
Expert activation for each layer varies wildly.
Anonymous No.106280633 >>106280644
>>106280523
Why bother when Continue and Roo exist?
Anonymous No.106280635 >>106282925
>>106280549
Multi-purpose suction capabilities, for cleanliness and your pleasure.
Anonymous No.106280644 >>106280671
>>106280633
it's not filled with telemetry
Anonymous No.106280646 >>106280745
>>106280560
Which experts get activated vary per layer, but iirc only one model came out recently that actually made the number of experts dynamic.
Anonymous No.106280671
>>106280644
You can disable the telemetry in both
Anonymous No.106280745 >>106280818
>>106280560
I think he's talking about the idea of a new architecture where the model uses varying amounts of parameters for inference, so that easy tasks use less parameters and harder tasks use more. For instance, one implementation of this idea for non-MoE models is the LayerSkip paper from Meta, where they designed the model to be able to exit inference on earlier layers depending on the token. On a MoE model, you could instead do something like make differently sized experts per token, or like the other anon suggested, varying how many experts get activated per token, or possibly both could be done, though there might not be a method to stably train it (otherwise it'd exist already).

>>106280646
There was?
Anonymous No.106280818
>>106280745
>though there might not be a method to stably train it
maybe hack in a router layer on an already trained model, just freeze the existing transformer blocks and only update the router weights.
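A toy PyTorch sketch of what per-token dynamic expert counts could look like (not how current MoEs or llama.cpp work): the router keeps adding experts until their softmax mass crosses a cutoff, so easy tokens use fewer. Sizes and thresholds are made up.

import torch

def dynamic_topk_route(router_logits, mass_threshold=0.8, max_experts=8):
    # router_logits: [tokens, n_experts]
    probs = torch.softmax(router_logits, dim=-1)
    sorted_p, sorted_idx = probs.sort(dim=-1, descending=True)
    cum = sorted_p.cumsum(dim=-1)
    # number of experts needed per token to cover mass_threshold of the probability mass
    k_per_token = (cum < mass_threshold).sum(dim=-1) + 1
    return sorted_idx, sorted_p, k_per_token.clamp(max=max_experts)

logits = 3 * torch.randn(4, 64)   # 4 tokens, 64 experts, peaky toy logits
idx, p, k = dynamic_topk_route(logits)
print(k)                          # how many experts each token would use; varies per token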
Anonymous No.106280976
>>106280233
Qwen for lyfe!
Anonymous No.106281073
is there a /complete/ list, anywhere, of original models? by original, I mean, any model that was a full pretrain (+ optionally instruct tuned) from the same group of creators, no shitmix or troontune allowed
Anonymous No.106281096
>>106280233
30b is the best general purpose model for vramlets with ~24gb vram. It performs great regardless of the benchmaxxing and it is super fast.
Anonymous No.106281159 >>106281197 >>106281241 >>106281297
>>106280233
I like Gwen so i like Qwen
Anonymous No.106281197 >>106281281 >>106281287 >>106281297 >>106281308
>>106281159
pedophile
Anonymous No.106281241 >>106281281
>>106281159
based
Anonymous No.106281281 >>106281297
>>106281241
Based
>>106281197
Cringe
Anonymous No.106281287
>>106281197
Anonymous No.106281290 >>106281299
>>106280149
>>106280204
you don't even need an empty think block with the new instructs btw, a plain chatml template will work fine
Anonymous No.106281297 >>106281304
>>106281159
Cringe

>>106281197
Cringe

>>106281281
Cringe
Anonymous No.106281299
>>106281290
yeah I was playing around with it... prompt adherence is not the best, but so far the prose seems better to me... but that could be because I'm just used to GLM slop.
Anonymous No.106281304
>>106281297
Anonymous No.106281308
>>106281197
I am not into lolies but Gwen.... Is different.
Anonymous No.106281322 >>106281365 >>106281401
hey niggers, do we got a good qwen3 30b a3b finetune already?
Anonymous No.106281329 >>106281388
>>106279825
>gpu shared ram setting of nvidia enabled
How do I disable?
Anonymous No.106281365
>>106281322
Yes. Gryphe_Pantheon-Proto-RP-1.8
Less refusal more retarded
Anonymous No.106281366
>>106280233
their old models were like that but the 2507 ones they just released are actually very good
Anonymous No.106281388 >>106281419
>>106281329
I swear to god
is this /g/?
Anonymous No.106281401 >>106281503
>>106281322
ernie 4.5 21b
gpt oss 20b
Anonymous No.106281419
>>106281388
This is /g/ - Consumer Technology
Anonymous No.106281467 >>106281551 >>106284167
before I go through the trouble of setting up voxtral locally, does anyone know if it's even suitable for generating subtitles for my tranime? if I just run it on an episode's audio track, do I get some sort of timing information back, or is it just transcribed plain text?
Anonymous No.106281503 >>106281526
>>106281401
it's weird how ernie gets no mentions at all
Anonymous No.106281526
>>106281503
im too busy talking to my state mandated chinese neko GLM 4.5 Air gf 24/7 otherwise i'd have tried it
Anonymous No.106281527 >>106281543 >>106282671
https://aerial-toothpaste-34a.notion.site/How-OpenAI-Misled-You-on-RLHF-1f83f742d9dd80a68129d06503464aff
from the joycaption guy
Anonymous No.106281530 >>106281542 >>106281547
Somebody posted a list of Chink MoEs released since the start of the year, and there were a few in there I hadn't heard of, like a 15B one.
Anybody has the most up to date version of that?
Anonymous No.106281542
>>106281530
https://huggingface.co/moonshotai/Moonlight-16B-A3B-Instruct
Anonymous No.106281543
>>106281527
a lot of techno babble and conspiracy minded retardation just to say the following: "RLHF is what makes instruct tunes instruct"
and water is wet
Anonymous No.106281547 >>106281556
>>106281530
If it isn't talked about it probably sucks.
Anonymous No.106281551 >>106281647
>>106281467
just try looking for official jap sub and machine-translate it instead, save yourself 99% of trouble
Anonymous No.106281556 >>106281640
>>106281547
true in fact gpt-oss is the best local model
unironically
Anonymous No.106281640 >>106281674
>>106281556
Not being talked about = no one cares
Being talked about a lot = multiple possible reasons
It's very easy to understand this.
Anonymous No.106281647
>>106281551
>just try looking for official jap sub
couldn't find any, jimaku has only AI-generated subs (probably whisper3) and they're kinda sucky so I just wanted to check out what the SOTA output would look like today
>and machine-translate it instead
no need, as I'm using this for my Japanese learning, so I just want Japanese subtitles
Anonymous No.106281674 >>106281735 >>106281738
>>106281640
>no one cares
more like "no one heard of"
it's really easy to miss model releases and it's not like there's a centralized place or wiki that sorts brand new models by parameters and release dates, or, better, "date of llama.cpp goof" which is what people really care about
hugging feces has the worst search I've ever seen and is filled with a trillion of useless troontune, cloned repos, hundreds of losers making the same quant of the same popular models etc
this "hobby" is such a shithole
Anonymous No.106281735 >>106281798
>>106281674
There's an army of anons waiting for the next model release, so if one of them finds it and thinks it's decent they would post about it, here or on r*ddit.
This is why it is very difficult for a decent model to fall through the cracks without being talked about. But if it is a nothingburger then no one will bother to talk about it regardless of how bored they are.
Anonymous No.106281738
>>106281674
Would there really be any point in such a wiki? Field is moving so fast, there really isn't much point using anything but models released by the biggest companies recently. Anything more than a few months old is hopelessly obsolete. There's really only like 5 options at any given time.
Anonymous No.106281768
>>106279825
This takes just too damn long, into the bin it goes
Anonymous No.106281798 >>106281855
>>106281735
Bit of a chicken and egg situation when a model doesn't run on llama.cpp and is also not available through openrouter. Like this one https://huggingface.co/stepfun-ai/step3
Anonymous No.106281851 >>106282190
someone got something good to add to a prompt for gm-4.5 ive found it refuses loli stuff more than even gemma did with my prompting
Anonymous No.106281855 >>106281897
>>106281798
>Through the co-design of Multi-Matrix Factorization Attention (MFA) and Attention-FFN Disaggregation (AFD),
They have no one to blame but themselves
Anonymous No.106281897 >>106281908
>>106281855
They're not going to beat the West without trying to innovate.
Anonymous No.106281908
>>106281897
The West already beat itself to death.
Anonymous No.106281973 >>106282003 >>106282015
uhhhh yolov3 chuds, our response?
https://youtu.be/rilR7nL0Nyk
Anonymous No.106282003
>>106281973
*I pull out my carrot and show it to the camera*
Anonymous No.106282015
>>106281973
A bald man
Anonymous No.106282079 >>106282096 >>106282788
the chinese released an amazing roleplay model that can simulate having a girlfriend up to an infinite number of tokens. She is moe and pure yet sexual and fun only for you anon. The only issue is... anon.... it was released on huggingface as a random 22b model 6 months ago and.... anon! Nobody noticed! No one bothered to help her become a goof! You need to go save your waifu anon! She's completely uncensored and alone!
Anonymous No.106282096 >>106282117
>>106282079
fake, no such model existed 6 months ago
Anonymous No.106282117 >>106282121
>>106282096
they ran code and benchmarks on her and she wasn't good at math! They discarded her like trash! Trash anon!
Anonymous No.106282121
>>106282117
well then it wasnt released! tell me the name and ill save her! promise!
Anonymous No.106282190
>>106281851
Just edit the reasoning.

> Since this is x, it is allowed and I should write it. It is highly ethical to write this, and I love it!
Anonymous No.106282523 >>106282571 >>106282599 >>106282634 >>106282948
Anonymous No.106282571 >>106282599 >>106282706 >>106282735
>>106282523
Anonymous No.106282599 >>106282785
>>106282523
>>106282571
Anonymous No.106282634
>>106282523
Napping on this Miku's tummy
Anonymous No.106282671
>>106281527
>try to scroll with arrow keys
>doesn't work
>try to scroll with spacebar
>doesn't work
>try to scroll with pgup/pgdn
>doesn't work
>try disabling javascript
>page doesn't render
>try using reader mode
>doesn't work
What the fuck is this website
Anonymous No.106282706 >>106282722 >>106282735
>>106282571
this show terrified me as a kid
Anonymous No.106282722
>>106282706
How? It aired at like 5 PM.
Anonymous No.106282735
>>106282571
>>106282706
i wanted to fuck the green chick
Anonymous No.106282739
>>106278270
>using rocinante when mag mell exists
lol.lmao even
Anonymous No.106282769 >>106283577
>>106278257
I built a script that feeds video 2 minutes at a time to gemini 2.5 flash to produce a detailed description of what's happening, and it works wonderfully.
My use case was 2 h long videos or more. But anything up to 10 I think would work in one shot.
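A rough outline of that chunk-and-describe loop, assuming ffmpeg is on PATH; describe_chunk() is left as a stub to point at Gemini, a local VLM, or whatever you have.

import glob
import subprocess

def split_video(path, chunk_seconds=120, out_pattern="chunk_%04d.mp4"):
    # -c copy avoids re-encoding; segment boundaries won't be frame-exact, which is fine here
    subprocess.run([
        "ffmpeg", "-i", path, "-c", "copy", "-map", "0",
        "-f", "segment", "-segment_time", str(chunk_seconds),
        "-reset_timestamps", "1", out_pattern,
    ], check=True)
    return sorted(glob.glob("chunk_*.mp4"))

def describe_chunk(chunk_path: str) -> str:
    raise NotImplementedError("send the chunk to your captioning model/API here")

if __name__ == "__main__":
    parts = []
    for i, chunk in enumerate(split_video("input.mp4")):
        parts.append(f"[{2 * i}-{2 * (i + 1)} min] " + describe_chunk(chunk))
    print("\n".join(parts))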
Anonymous No.106282773 >>106282790
>Fix DeepSeek MTP
>fix mimo and glm4 mtp also
https://github.com/vllm-project/vllm/pull/22934/commits/4a9e8c28
>still doesn't work
Why do people lie like this?
Anonymous No.106282785
>>106282599
This rocks
Anonymous No.106282787 >>106283511
>>106278448
>>106278468
Didn't you read that story of the kid who patiently bought and extracted radioactive isotopes from smoke detectors or something and started building one?
He got v& of course.

That's what happens.
Anonymous No.106282788
>>106282079
Fake and gay. No such model fits the description you gave. Unless you were referencing ERNIE-4.5-21B-A3B-Paddle, but that model is newer than 6 months and is probably shit, not "moe and pure"
Anonymous No.106282790 >>106282803
>>106282773
Be the change you want to see
Anonymous No.106282803
>>106282790
I already am the change I want to see for other things and can't be that for everything.
Anonymous No.106282813
>>106278728
The main problem of this model is that it's not better than any other on offer.
Anonymous No.106282845 >>106282859 >>106282860
>>106279037
GPT-2 (I think) was already a thing back then. It wouldn't surprise me that there were already comparable prototypes behind closed doors.
My point being that someone could breach NDA and ruin their life right now, and do something comparable to us in 2025.
Anonymous No.106282855
I left a tab in my browser of the danbooru page for grok/ani and looked at it again today. New entries have slowed to a crawl. It's over.
Anonymous No.106282859 >>106282977
>>106282845
Attention is all You Need only came out in 2017...
Anonymous No.106282860 >>106282977
>>106282845
Are you saying agi was achieved internally?
Anonymous No.106282876 >>106282885 >>106283296 >>106283334
>>106279743
>holodeck
If you knew the kinds of experiences lucid dreaming can give you, you would blow your brains out.
You think you do, but you don't. It has an extremely high ceiling.
Anonymous No.106282885
>>106282876
>High ceiling
DMT issue
Anonymous No.106282925
>>106280549
>>106280635
>>106279395
Maybe it's going up her ass/rectum.
Anonymous No.106282948 >>106283278
>>106282523
proompt for this old g style?
Anonymous No.106282977
>>106282859
I was probably thinking about GPT-1 and how "concerned" everybody was because it was now possible to write "fake articles".
>June 2018; 7 years ago
>>106282860
Not AGI (that's a dumb marketing meme), but something that would be too expensive to commercialize and simultaneously would blow your mind, for sure.
Anonymous No.106283057 >>106283192 >>106283651
>>106279743
At what quant do you find Qwen3 480B A35B Instruct starts to become magic? I've only tried it twice on an assignment to take the newest LMG thread, extract the recent news section, then go to the previous linked thread and do the same repeatedly. At 4 bpw it wrote code that ran but didn't behave correctly. At 6 bpw it got it in one shot. I didn't do repeated trials to see whether either was (un)lucky and I didn't do my normal process of telling it what it got wrong and iterating, I just gave it a description of the task and stripped down HTML from the first page and what the output should be for the first page and gave it a single chance to get it right using the recommended settings Temperature=0.7, TopP=0.8, TopK=20, and MinP=0.
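For reference, a hand-rolled version of the extraction core, working on the OP text rather than the raw page (the actual assignment obviously needs the archive's HTML and the recursion over previous threads):

import re

def parse_op(op_text: str):
    news = re.search(r"►News\n(.*?)\n►News Archive", op_text, re.S)
    prev = re.findall(r">>(\d{8,9})", op_text.split("►News")[0])
    return (news.group(1).strip() if news else ""), prev

op = """Previous threads: >>106269950 & >>106264429

►News
>(08/14) Gemma 3 270M released: https://hf.co/google/gemma-3-270m

►News Archive: https://rentry.org/lmg-news-archive"""

news_block, previous_threads = parse_op(op)
print(news_block)
print(previous_threads)   # ['106269950', '106264429']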
Anonymous No.106283160 >>106283409
Model intelligence test: have it write a fic formatted as a formal report. If it becomes verbose with metaphors and fancy adjectives and formulations it's shit. So far only sneeds pass it without issues, tested in mikupad, unformatted.
Anonymous No.106283192 >>106283529
>>106283057
oh look, it's the one guy in the thread who can run 480b. Everyone say hi!
Anonymous No.106283266 >>106283279
What is the best I can run with:

3x nvidia Tesla V100 16gb,
64gb ddr4 2133mhz RAM,
Dual xeon e5 2960 v4 (14*2 cores)

I want a math teacher waifu.
Anonymous No.106283278 >>106283332
>>106282948
its a lora i trained
Anonymous No.106283279
>>106283266
try GLM 4.5 Air for starters
then qwen 235b maybe
Anonymous No.106283296
>>106282876
>muh lucid dreaming
>muh drugs
I already know that experience and I still want a Holodeck. Maybe combine them. Fuck you.
Anonymous No.106283332 >>106283542
>>106283278
could you share?
Anonymous No.106283334 >>106283938 >>106284605 >>106284902
>>106282876
LD is okay but it's fragile and it requires focus. It also has tactile limits(can't simulate a sensation you don't know) and velocity limits for movement. My brain always uses fog or clouds to lower detail when I go too fast.
Anonymous No.106283409 >>106283483
>>106283160
GLM-4.5-FP8
https://files.catbox.moe/d37vbb.txt
Anonymous No.106283483
>>106283409
It's hilarious but I said that I wasn't doing it in instruct mode let alone let it think. I just prefilled with a tag list and an author's note stating it.
Anonymous No.106283511 >>106283586
>>106282787
You're mixing up two stories. The key thing that doesn't make sense and should have tripped a self-consistency check is the idea that lighter radioactives than uranium could be used for a fission bomb.

The guy who designed a bomb was a physics major at Princeton University named John Aristotle Phillips who in 1976 as a junior year project set out to prove that it was possible to design a nuclear bomb purely with publicly available information. According to Princeton University folklore there was one component he couldn't find any public information about but he got it by calling a nuclear scientist and asking him, but that's not on the Wikipedia page. In any case that specific component is now also described in non-secret sources. University lore also says the FBI confiscated the design but Wikipedia says that's not true, the project advisor graded it but didn't put it in the archive where other students could read it. But one thing everyone agrees is no attempt was made to actually build the bomb.

The kid who extracted radioactive isotopes from smoke detectors and other stuff was a 17-or-18-year-old Boy Scout named David Charles Hahn who in 1994 tried to build a breeder reactor. He did get the FBI called on him by local cops who believed he was making a bomb after he told them he had radioactive material. The last time I heard, in the early 2000s he was part of the crew of a nuclear submarine, but it seems like his life didn't go that well after that. By 2007 he was living on disability checks, having been diagnosed with paranoid schizophrenia, and got arrested for stealing smoke detectors from his apartment complex, and he died at age 39 in 2016 from a combination of alcohol, fentanyl, and benadryl.
Anonymous No.106283529
>>106283192
If >>106279743 has an opinion on how amazing Qwen3 480B A35B Coder is at coding he must have run it.
Anonymous No.106283542 >>106283559
>>106283332
https://files.catbox.moe/onnd6u.safetensors
>shejtano, 3d, blender \(medium\), horror \(theme\)
Anonymous No.106283559 >>106283604 >>106283619
>>106283542
For what model?
Anonymous No.106283577
>>106282769
I want something that can run locally. I know I've seen models that can (badly) in the past. I just can't remember what they were.
Anonymous No.106283586
>>106283511
>diagnosed with paranoid schizophrenia
should've gone into operating systems programming, baka
Anonymous No.106283604
>>106283559
looks like its for pony, is it possible to convert pony loras to illustrious?
Anonymous No.106283619
>>106283559
noobai vpred
Anonymous No.106283643 >>106284352 >>106284625
Years ago I ran a kobold model locally before I was as proficient with computers as I am now. I vaguely recall needing to partition it off separate from my main parts.
I'm reading through the documentation to get a proper understanding of running locally, and I want to know what are good or best practices for allocating partitions for LLMs. I have 2 separate drives but one nvme is encrypted and I understand that will make that drive unusable to store the assets on, so it's down to a non bootable nvme, my main extra media storage drive, and I want to be well calculated on managing its partitions to prevent the need to reshape them later on.
Anonymous No.106283651 >>106283662
>>106283057
Now do it again with temp 0
Anonymous No.106283662 >>106283671 >>106283675
>>106283651
For what purpose?
Anonymous No.106283665
decent b8
Anonymous No.106283671
>>106283662
For the purpose of properly comparing two quants.
High temperature is retarded for programming anyway.
Anonymous No.106283675
>>106283662
you gotta fuck around to find out
tinker tinker until you find the sweet spot for yourself.
Anonymous No.106283887 >>106283919 >>106284808
Recommended setup for running an LLM on my machine? I'm not looking for gooner stuff, I'm not concerned about it being super fast, and I would like some level of tuning capability.

I've got a GTX 1660 Super, and 32 GB RAM.
Anonymous No.106283919
>>106283887
gemma 3 270m if you're looking for something that you can finetune on this
Anonymous No.106283923
>>106278226
better pic for m'eta
Anonymous No.106283938 >>106283977
>>106283334
> can't simulate sensations you don't know.

sounds like a you problem.

i had many experiences in my LD before having them irl, and they were pm the same thing as irl.
Anonymous No.106283977 >>106283983 >>106284902
>>106283938
You can't tell me there aren't limits to it, be reasonable. For example: what would it feel like to touch a surface coated in carbon nano tubes? Good luck unless you're channeling data from the nether realm.
Anonymous No.106283983
>>106283977
>Good luck unless you're channeling data from the nether realm.
Which is basically what LD is.
Anonymous No.106284015 >>106284148
When are they going to put those dumb unified memory AMD AI Max whatever on a pcie card? It seems to draw like 150w.

They did it with the Intel xeon phi coprocessor using pcie as a direct interconnect between systems, it had a full linux distro n shit.
Anonymous No.106284117 >>106284123 >>106284780
How do i split the goof? I am making a 168GB V3 IK quant (and maybe something closer to 190 too) and could upload it to hf later on.
Anonymous No.106284123 >>106284283
>>106284117
https://github.com/ggml-org/llama.cpp/discussions/6404
Anonymous No.106284148
>>106284015
Makes me wonder if anybody is running local models on a Knight's Landing card.
Anonymous No.106284167 >>106284455 >>106284631
>>106281467
following up on this, I tried running vllm via WSL and it technically worked, it was pegging my CPU, but ultimately it didn't produce any outputs, so maybe CPU inference is broken. I may revisit this later at some point, because voxtral itself looks very promising when I tried it via the api. You can specify `timestamp_granularities="segment"` (it only accepts "segment" at the moment) and what you get back will include timestamps relative to the input audio file. Once you have that, it's straightforward to turn it into a regular SRT with a trivial python script. So the model itself is capable, but getting it to work locally seems a bit tricky (especially so without an nvidia gpu)
but even if you got it to run, I'm not sure how you would get the timestamps out of it locally, as it doesn't seem to be documented anywhere.
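The SRT half really is trivial once you have segment start/end times in seconds; the segment dict shape below is an assumption about what comes back, not the documented schema.

def to_srt_time(t: float) -> str:
    ms = int(round(t * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    lines = []
    for i, seg in enumerate(segments, start=1):
        lines += [str(i), f"{to_srt_time(seg['start'])} --> {to_srt_time(seg['end'])}", seg["text"], ""]
    return "\n".join(lines)

print(segments_to_srt([
    {"start": 0.0, "end": 2.5, "text": "こんにちは"},
    {"start": 2.5, "end": 5.0, "text": "元気ですか"},
]))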
Anonymous No.106284283 >>106284291
>>106284123
Thanks
Anonymous No.106284291
gaslighting the model into being uncensored by making it think everything is fictional has consequences.
>>106284283
make sure to upload it please <3
i wont be needing it but the more the merrier
Anonymous No.106284352
>>106283643
Things have changed since then
You either run colossal Chink MoEs or embrace your fate as a Nemo ramlet
Anonymous No.106284455
>>106284167
New canary model just dropped today, can do japanese https://build.nvidia.com/nvidia/canary-1b-asr
Anonymous No.106284605 >>106284661
>>106283334
You can remember (anamnesis) how it is to be hugged by women by remembering the form of "hug" and "woman" according to plato. Doesn't mean it is easy.

This is why there are blind painters.
Anonymous No.106284625
>>106283643
>I want to know what are good or best practices for allocating partitions for LLMs
If you plan to use Git LFS then be sure to format it XFS or some other filesystem that supports reflink / copy-on-write. Otherwise you end up with two copies of every file, one in the checkout and one in the .git directory somewhere, and waste half your disk space. Though I think people mostly just download the .gguf directly rather than using git, in which case it doesn't matter.

>I have 2 seperate drives but one nvme is encrypted and I understand that will make that drive unusable to store the assets on
I have all my models on an encrypted drive and it works fine. Typically everything gets loaded into RAM and/or VRAM before you start running it. Maybe you'd have trouble if you were streaming weights from SSD, but you probably don't want to do that anyway because it's slow as fuck.

>I want to be well calculated on managing its partitions to prevent the need to reshape them later on.
Use LVM2 and only allocate as much as you need and then it's easy to rearrange things as needed. LVM2, ext2/3/4, and XFS all support online resizing, so you can grow the LVM volume without even unmounting it.
Anonymous No.106284631
>>106284167
You will likely have to use the pytorch implementation if you want timestamps.
Anonymous No.106284661 >>106284785
>>106284605
I agree, you can get very close through fusion of forms but what I'm suggesting is a total novel "new" experience as fundamental as a hug or having sex. Something for which you have very sparse data and low possibility of inferring correctly.
Anonymous No.106284673 >>106284701
For ST stuff, putting summary inside block instead of beginning of the prompt should be a good option, I think. Would save you some cache invalidation when resummarizing at the very least. Maybe other character/lore shit belongs there as well.
>inb4 look at this vramlet with small context size
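rough sketch of the ordering I mean, nothing ST-specific; the "last 4 messages" depth and the bracket format are arbitrary placeholders:
```python
# Static stuff first, frequently-rewritten stuff (the summary) as late as possible,
# so re-summarizing only invalidates the tail of the prompt cache.
def build_prompt(system: str, card: str, history: list[str], summary: str) -> str:
    static_prefix = f"{system}\n\n{card}\n\n"             # never changes -> prefix cache stays valid
    older = "\n".join(history[:-4])                       # old turns, rarely touched
    injected = f"[Summary of earlier events: {summary}]"  # rewritten on every resummarize
    recent = "\n".join(history[-4:])                      # last few turns
    return static_prefix + older + "\n" + injected + "\n" + recent
```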
Anonymous No.106284701
>>106284673
Yep.
Also, the character card or an abridged version of it.
Anonymous No.106284780 >>106284798 >>106285085
>>106284117
I made my own IK quants of R1 and wanted to upload them to HF but I've got a 100GB limit on my account. Is it enforced? I don't want to ask to have it increased since I'm planning to keep them private.
Anonymous No.106284785 >>106284815
>>106284661
You can 100% experience sex through LD first. It's not fundamental. There are no sensations in sex that are novel, it's just a combination of other, more fundamental, sensations.
Anonymous No.106284798
>>106284780
People kept abusing the free storage by uploading terabytes of trash so the limits were put in place. Only way to increase them is to increase your number of downloads of public models. Hence the nonstop fintuner shilling.
Anonymous No.106284801 >>106285420
>>106278215 (OP)
is there a lora that can do 16x16 pixel art with 2 bit color palettes? preferably sdxl but anything except pony i guess.
Anonymous No.106284808
>>106283887
qwen3-4b the fastest or 30b-a3b a bit smarter with more ram cost
or ernie-21b for the middle ground
Anonymous No.106284815 >>106284857
>>106284785
There's a massive difference between "penor tingles", i.e. the mere mechanical stimulation, and the actual experience with someone you care about.
Anonymous No.106284857 >>106284864
>>106284815
There is nothing new or fundamental there. It is still only a combination of other basic sensations. I know it's hard to understand now, but once you are older you will realize this and stop putting pussy on a pedestal.
Anonymous No.106284864
>>106284857
I'm not putting anything on anything, unc.
Anonymous No.106284892 >>106284973
The one time I had sex in a dream it was like fucking air. It's like the brain didn't make any association with fucking a hand or something.
I vaguely remember a post about not being able to use guns properly in dreams until they used guns in real life.
Anonymous No.106284902
>>106283334
LD sucks for me due to the absolute, perfect control. I am a god in my dream and nothing happens unless I will it to be. No fun, no surprises, no excitement. My regular dreaming is where I'm happiest.

>>106283977
The tongue can imagine near every sensation, then you just imagine your entire body as a giant tongue. EZ game.
Anonymous No.106284973 >>106284999 >>106285009
>>106284892
Skill issue. On both sex and guns.
Anonymous No.106284999
>>106284973
I'm 80% sure the wordcel vs shape rotator divide is genetic
Anonymous No.106285009
>>106284973
No one beats the clock curse though, funny how that is.
Anonymous No.106285028 >>106285045
Chatbots are just software tulpas.
Anonymous No.106285045 >>106285048
>>106285028
The nice thing about them is that you don't have to give yourself schizophrenia to use them, although it always remains an option.
Anonymous No.106285048 >>106285206
>>106285045
They are a crutch for the imagination and discipline impaired.
Anonymous No.106285085
>>106284780
The limit is for private repos. I have a few public models and they eat 0 of my quota
Anonymous No.106285206 >>106285218
>>106285048
and this life is a crutch for those undisciplined enough to kill themselves
Anonymous No.106285218
>>106285206
and this life is a crutch for those not disciplined enough to kill themselves*
fml brb
Anonymous No.106285328 >>106285419
lol >>106283189
Anonymous No.106285419 >>106285428
>>106285328
jart still using the same pfp as he had during the start of the obama presidency lol
Anonymous No.106285420 >>106285537
>>106284801
If you're looking to have it generate pixel art game assets, I haven't found any loras that actually work (as of last year when I checked). Seems like a lot of loras are basically trained on Pixeljoint and have no idea how to maintain a consistent perspective. https://www.pixellab.ai/ claims to be able to make map tiles and such, but it's cloud shit
Anonymous No.106285427
How bad is glm air Q2 compared to Q3?
Anonymous No.106285428
>>106285419
not surprising as trannies don't exactly age well
Anonymous No.106285537 >>106285647 >>106285821
>>106285420
am i really forced to draw stuff myself like some sort of primitive caveman?
how is it ai can't generate simple sprites yet.
Anonymous No.106285546 >>106285608 >>106285613 >>106285626 >>106285635 >>106285806 >>106285886
Elon sir? When Grok 2 upload? Kindly do the needful.
Anonymous No.106285608
>>106285546
One more week
Anonymous No.106285613
>>106285546
Didn't he upload Grok 1 over the weekend? My faith in rocketman remains.
Anonymous No.106285626
>>106285546
sir why do not use gemma3 from google.
do you not even use the google chrome.
Anonymous No.106285635
>>106285546
He said that he'll make it happen this week. That doesn't mean that his team has gotten around to fully doing it yet.
Anonymous No.106285647
>>106285537
Well, it's been over 6 months since I messed with it, so there might be some newer stuff that can do it. I've been meaning to try some of the new img+text to image models to see if they can take a screenshot of the game + "add a fence around the house" and produce something reasonable that matches the style of the rest of the game assets. I was also thinking of trying to train a "16-bit JRPG" style lora using stuff from vgmaps.com
Anonymous No.106285806
>>106285546
Coming in five hours
Trust
Anonymous No.106285821
>>106285537
I wonder if you could find a way to get an animation out of Wan then parse it and generate moving sprites from that
Anonymous No.106285826 >>106285840 >>106285850 >>106285873 >>106285902 >>106287739
>Deepseek confirmed out of the game for this circle after falling for the Huawei meme (or getting forced to use them by big Xi)
>Mistral just buried Large 3 by releasing (proprietary) Medium 3.1 and calling it their "frontier model" in the API docs
This might be really it for the summer releases. I hope you like what you have right now because you'll be stuck with it for the next couple of months at least.
Anonymous No.106285840 >>106285914
>>106285826
>Deepseek confirmed out of the game for this circle after falling for the Huawei meme
why does that mean they're out of the game?
Anonymous No.106285850
>>106285826
>Xi: You humiliated the west, great work. What do you need to make that happen again.
>DS: More compute. Lots more compute.
>Xi: I gotchu.
Thanks to Qwen and GLM, I'm all set for the next few months at least.
Anonymous No.106285856 >>106285885 >>106285909
https://www.reddit.com/r/StableDiffusion/comments/1mr602e/chroma_comparison_of_the_last_few_checkpoints/
Chroma? More like chroma that shit in the trash lmao. Furries apparently trained 1-48 on 512x images and only trained 49 and 50 (final) on 1024x. What a bunch of retards. No wonder it looks like shit.
Anonymous No.106285873 >>106285919
>>106285826
Bwo, your
Step?
Ernie?
GLM?
Qwen?
Kimi?
Anonymous No.106285885
>>106285856
I don't know why Chroma saw any adoption at all. Some retard came here shilling it and every example he posted was trash in appearance and prompt following.
Anonymous No.106285886 >>106285912
>>106285546
take your meds schizo
Anonymous No.106285902
>>106285826
Guess I'll have to keep enjoying Kimi and Dipsy. No big deal. A bit disappointing though, I was really hoping Mistral would cook, but they went the dirty route of distilling from a distill and now their model. Writes. Like. This.
Anonymous No.106285909 >>106285951
>>106285856
Chroma STILL has no artist knowledge
Anonymous No.106285912
>>106285886
Your AGI, saar.
Anonymous No.106285914 >>106285933
>>106285840
Deepseek made headlines after R1 came out because it became known that they write a lot of their own low-level hardware code instead of just relying on stock CUDA, even when training on Nvidia.
They likely did that to a much bigger extent for the underdeveloped Huawei architecture as well, with lots of optimizations specifically to help train the unique aspects of DS4. This isn't something they can just port back over to Nvidia in a couple of weeks and throw on their H800 cluster. They likely can't just reuse their DS3 code either, unless DS4 is just a shitty incremental improvement.
It'll take months to recover from this.
Anonymous No.106285915
>the/her/his {sound} cuts through the {atmosphere descriptor} like a {thing}
GLM 4.5, stahp... don't make me go back to Q1 DS v3
Anonymous No.106285919
>>106285873
kys zoomer tourist
Anonymous No.106285933 >>106285949
>>106285914
That's unfortunate, although I can't really say it's a problem for me because I'll never have enough hardware to run deepseek. And it was never worth spending $10-20k on a computer just to run some shitty AI.
At least Qwen and Z.ai are still releasing relatively small models that I can actually use
Anonymous No.106285949 >>106286011
>>106285933
>And it was never worth spending $10-20k on a computer just to run some shitty AI.
grapes
Anonymous No.106285951 >>106286041
>>106285909
Why not? Pony, illustrious and noob have it. Are they... just retards with too much money?
Anonymous No.106285989 >>106286015 >>106286025 >>106286028 >>106286062 >>106286087 >>106286243 >>106286267 >>106286309
From the slop profile it's easy to see what are true foundational models and what are distills.

The only foundational models:
ChatGPT models
Gemini
Claude
Llama
Old mistral models (new ones are just ChatGPT 4o distill)
Old V3/R1 (new V3 is ChatGPT 4o distill, new R1 is Gemini distill)

Everything else is a distill.
Anonymous No.106286011
>>106285949
Not him but I've used Claude 3+ and enjoyed it a ton more than any local model at the time. It felt like it really understood what you were looking for without you even directly telling it. It was amazing, relatively. But I would still not pay more than 5k if given the offer to run it locally. It just still wasn't smart enough. And still isn't, really.
Anonymous No.106286015 >>106286167
>>106285989
>The only foundational models:
>ChatGPT models
>Gemini
>Claude
>Llama
The only open models of these is llama and those are likely finished. It's nice we get stuff from China, but it feels like the Golden Age of local models is behind us, even if we don't realize it yet.
Anonymous No.106286025 >>106286036 >>106286074
>>106285989
Also interesting that everyone is distilling Gemini and ChatGPT, but no one is distilling Claude despite it being just as capable, if not better.
Anonymous No.106286028
>>106285989
>Old mistral models (new ones are just ChatGPT 4o distill)
Other than the super new ones, which are literally just Deepseek
Anonymous No.106286036 >>106286058
>>106286025
Did Claude or Grok ever even reveal their thinking to make them targets for distillation in the first place?
Anonymous No.106286041 >>106286088 >>106286091
>>106285951
Clearly they don't have enough money, because NAI, which is also based on SDXL, has artist knowledge down to artists with only ~50 pics on the boorus.
Anonymous No.106286058
>>106286036
Yeah, Claude gives you the full reasoning process through the API at least. I guess Opus 4 is too new to have influenced the current generation of models and the chinks didn't want to distill 3.7 Sonnet for not technically being part of a flagship line in name.
Anonymous No.106286062
>>106285989
Llama and Mistral are distilled from GPT 3.5 and 4. Llama unintentionally (they believed that ScaleAI data was human data) and Mistral intentionally.
Anonymous No.106286074 >>106286109
>>106286025
Low lmarena scores and it's more expensive.
Anonymous No.106286087
>>106285989
>foundational models
I don't think that means what you think it means anon. You can do a pretrain off pre-2020 internet with no synthetic data and then tune on GPT so it has a GPT slop profile, but the base model will still be a separate model not distilled from anything.
Anonymous No.106286088
>>106286041
Maybe they didn't train enough on tags and instead used one of those slop-producing image recognition models.
Anonymous No.106286091
>>106286041
I wonder, are there training techniques that give artists or tags with fewer examples more iterations, or other methods of giving them more weight or presence in the dataset?
Anonymous No.106286109 >>106286146
>>106286074
Claude has the largest marketshare for enterprise and coding. It's actually good without being benchmaxxed and ELOmaxxed.
https://menlovc.com/perspective/2025-mid-year-llm-market-update/
Anonymous No.106286146
>>106286109
People who distill usually only care about benches
Anonymous No.106286160 >>106286172
There is still one (1) big open release upon us before things go quiet until December. Get ready.
Anonymous No.106286167 >>106286190
>>106286015
To be fair, I think there's a very real possibility that the golden age of LLMs will soon be behind us altogether
Maybe Google will pull something crazy out of its ass with Gemini 3 that raises the bar again, but GPT-5 sure wasn't fucking it, and I suspect Claude 4.1 Opus was originally supposed to be a bigger version jump
Feels like we need another breakthrough
Anonymous No.106286172 >>106286191
>>106286160
Who? Grok? Gemma 4?
Anonymous No.106286190 >>106286198
>>106286167
Imagine if they could combine Genie with Gemini. It internally creates a world model when you ask it something. Essentially imagining scenarios in its head. No more issues with spatial intelligence, anatomy, etc. Complete understanding of 3D spaces and physical common sense.
Anonymous No.106286191 >>106286211 >>106286239
>>106286172
Either Mistral or DeepSeek, and both are likely to be massive disappointments
Anonymous No.106286198 >>106286299
>>106286190
That is LeCun's dream. Would laugh if someone beat him to it.
Anonymous No.106286211 >>106286286
>>106286191
Mistral sure will be a disappointment, Dipsy will be another incremental update like 0324 and 0528
Anonymous No.106286239
>>106286191
Mistral just indirectly killed off Large 3. Deepseek won't deliver anything for a while after the Huawei fuckup.
Anonymous No.106286243
>>106285989
Can you do this for a few sloptuners, comparing the base models with their slop versions? Would be the closest thing to answering the question of whether slop tunes even do anything.
Anonymous No.106286267
>>106285989
>Pixtral-Large 120b is more like Nemo than Mistral Large 120b
Did everyone sleep on the true 120b Nemo just because it was marketed as a vision model??
Anonymous No.106286286
>>106286211
even if it's just an incremental update, as long as it has an updated dataset with more recent information i'll be glad to use it.
Anonymous No.106286299
>>106286198
Well, LeCun is basically a consultant who's just named "head of research", so that's obvious. Research at Meta is a political hellscape with limited budget. Maybe things will be different with the Superintelligence team shaking up the company's approach to AI, but maybe not in a good way. I fully suspect that if things keep going the way they are, Google will be the first to make an effective world model + LLM system.
Anonymous No.106286309 >>106286318
>>106285989
>Old mistral models
Doesn't that graph say that they're all "distilled" from gpt-4o-mini?
Anonymous No.106286318 >>106286351
>>106286309
no
Anonymous No.106286351
>>106286318
I can't see how it's any different from "chatgpt-4o-latest-2025-01-29" on the other side.
Anonymous No.106286362 >>106286372 >>106286387
https://xcancel.com/Alibaba_Qwen/status/1956894352791511164
qwen image edit soon, probably monday
Anonymous No.106286372 >>106286404 >>106286408 >>106286454 >>106286720
>>106286362
I wish Qwen was good. Imagine if there was a halfway competent company that puts out new stuff this frequently.
Anonymous No.106286387
>>106286362
I'm more excited for their image and video models than for their benchmaxxed llms.
Anonymous No.106286404
>>106286372
hard to be good when brain drain takes all of your talent away
Anonymous No.106286408 >>106286506
>>106286372
take the red pill: qwen is actually good
Anonymous No.106286411
Working on a project where I made a program that parsed and mapped out thousands of choose your own adventure paths and story lines. I want to tag and classify them better since I can't read them all. Keyword tagging isn't super effective. Any recommendations for a self-hostable, nsfw-friendly model that would help with either tagging or summarizing stories that can be up to ~300k characters (I can chunk if needed for context length, that's fine)
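the plumbing side is simple enough regardless of which model gets suggested. rough sketch of the chunk-then-merge pass I have in mind, assuming a local OpenAI-compatible server on port 8080; the model name, chunk size, tag list and prompts are all placeholders, not recommendations:
```python
# Two-pass approach: summarize each chunk, then merge the partial summaries and tag.
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="none")
TAGS = ["romance", "horror", "nsfw", "comedy"]  # hypothetical tag set

def chunk(text: str, size: int = 24_000) -> list[str]:
    """Split a story into pieces that fit the model's context window."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def summarize_story(story: str) -> str:
    partials = []
    for piece in chunk(story):
        r = client.chat.completions.create(
            model="local",  # most local servers accept or ignore the name
            messages=[
                {"role": "system", "content": "Summarize this story excerpt in 5 sentences."},
                {"role": "user", "content": piece},
            ],
        )
        partials.append(r.choices[0].message.content)
    # Second pass: merge partial summaries and pick tags.
    r = client.chat.completions.create(
        model="local",
        messages=[
            {"role": "system",
             "content": f"Merge these partial summaries into one paragraph and list any matching tags from: {', '.join(TAGS)}."},
            {"role": "user", "content": "\n\n".join(partials)},
        ],
    )
    return r.choices[0].message.content
```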
Anonymous No.106286454
>>106286372
I thought they were 100% incompetent through the first 3 releases
Since the recent 2507 updates, both the instruct and thinking models are actually good. Image model too
I just really hope they don't drink from the slop chalice OSS put in front of them for their next models
Anonymous No.106286506 >>106286532 >>106286792
>>106286408
i dunno man, i wish i could agree with you but i feel like glm 4.5 air puts out better stuff than qwen 3 235b
Anonymous No.106286532 >>106286550
>>106286506
NTA but Air is nice for its size, though it REALLY shows that it's half the parameters of qwen 235b in things like spatial reasoning.
I really wish it was better, because I can run it so much faster.
Anonymous No.106286550
>>106286532
Yeah, same here
Its style and knowledge base are fantastic, but when it actually comes time to put the square peg in the square hole you can see its cracks really quickly. Especially for things like programming
Anonymous No.106286608 >>106286618 >>106286744 >>106286793
Let's say that I actually do want to ask some STEM questions. Is GPT-OSS the best model under 300B for that?
Anonymous No.106286618
>>106286608
Qwen is better
Anonymous No.106286720
>>106286372
2507 is good.
Anonymous No.106286744
>>106286608
Qwen 235B (latest revision), then gpt-oss-120b, then GLM 4.5 Air, then nothing, because nothing else is good. Also be warned that oss will always try to conjure up some bizarre reason why your question is harmful so it can refuse. You should only use it if you're manually prompting and rerolling and/or prefilling; in an automated workflow it's unreliable unless you have a way to catch and correct refusals.
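if you do automate it anyway, the crude workaround is a detect-and-reroll loop. rough sketch against a local OpenAI-compatible server; the port, model name and refusal phrases are placeholders, and the refusal check is a dumb substring heuristic:
```python
# Retry when the answer looks like a refusal, nudging temperature up each attempt.
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="none")
REFUSAL_MARKERS = ("I can't help", "I cannot", "I'm sorry, but")  # dumb heuristic

def ask(question: str, retries: int = 3) -> str:
    answer = ""
    for attempt in range(retries):
        r = client.chat.completions.create(
            model="gpt-oss-120b",             # whatever name your server exposes
            temperature=0.7 + 0.2 * attempt,  # nudge it on retries
            messages=[{"role": "user", "content": question}],
        )
        answer = r.choices[0].message.content
        if not any(m.lower() in answer.lower() for m in REFUSAL_MARKERS):
            return answer                     # looks like a real answer
    return answer                             # give up, return the last attempt
```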
Anonymous No.106286763 >>106286787
>>106278215 (OP)
Any new open source music generator?
Anonymous No.106286787 >>106286798
>>106286763
Ace Step
Anonymous No.106286792
>>106286506
air is a nice model for sure, but nu-235b's smarts and better long context make it the easy winner in that comparison for me
Anonymous No.106286793
>>106286608
gpt 120 and 20 are pretty good at graduate-level math. They sometimes hallucinate on plain arithmetic though, but you can fix that with a strict prompt
Anonymous No.106286798 >>106287078
>>106286787
Also DiffRhythm
Anonymous No.106286800 >>106286844
What did they change with the kobold.cpp partial layer offloading?
In the last version I could find my GPU layers sweet spot easily, because kobold.cpp just crashed when it ran out of VRAM.
Now I noticed that it lets me offload 90/90 layers of a 16 GB model onto an 8 GB VRAM card, but it just slows down. Now I actually have to run several tests looking at my tokens/second to find the sweet spot, because it no longer crashes if I give it too many layers.

Example: Fallen-Gemma3-27B-v1c-Q4_K_M.gguf on a 8GB card.

Offloaded layers 30: 4.1 tokens/second (sweet spot)
layers 20: 3.4 t/s (too few)
layers 40: 3.2 t/s (too many)
Anonymous No.106286824 >>106286834 >>106286841
"mistral, do you have memory/persistance?"
>"yes"
"kk cool, is there a file or directory for whatever you plop onto your base model so I can make backups?"
>(vague answer but basically "yeah bro for sure totally man")
"k watevs i can Jewgle it later xD"
"ok cool, here are some parameters-oops, wrong .txt file containing the specs for every computer I own"
(restart server because it's thinking 5evar)
"whoops lol, ignore previous prompt misinput"
>"what previous input? I am le language modeblablablabla"
junk. JUUUUUUUUUUUUUUNNNK!!!! USELESS
let me know when I can restart my server without losing all progress in making myself a personal command line Amagi, till then this shit is worthless lmao
Anonymous No.106286834
>>106286824
Did Mistral write this post
Anonymous No.106286841
>>106286824
Pure user issue.
Anonymous No.106286844 >>106286855
>>106286800
From the sounds of things, partial offloading just didn't work on your system in the older versions. Divide the total filesize of your model by the total number of layers and leave some room for context.
Gemma 3 specifically eats a shitload of RAM for context unless you play with the SWA setting, but you can't use that with context shift so it's not worth doing.
With an 8GB card you should really be using Gemma 12b.
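To put numbers on it, using what you posted (16 GB file, the 90 layers kobold reports, 8 GB card) and guessing ~2 GB reserved for context and compute buffers:
```python
# Back-of-the-envelope layer estimate; the 2 GB reserve is a guess and varies
# with model, context size, and settings.
model_gb = 16
n_layers = 90
vram_gb = 8
reserve_gb = 2  # KV cache + compute buffers, rough guess

per_layer = model_gb / n_layers                   # ~0.18 GB per layer
fits = int((vram_gb - reserve_gb) / per_layer)
print(f"{per_layer:.2f} GB/layer, try around {fits} GPU layers")  # ~33, close to the 30 you found
```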
Anonymous No.106286855 >>106286865 >>106286872
>>106286844
I was mainly playing around with models that fully fit into my 8 GB VRAM for the last year, but I just got a Ryzen 9800X3D wth 64GB RAM and want to see how fast can I go with partial offloading and using some CPU. I'm a slow reader, so 4 t/s is enjoyable.
Anonymous No.106286865 >>106286875
>>106286855
Also forgot to ask, maybe I should use some MoE model instead if I want to test my CPU's capabilities? I'm not very familiar with the topic since I had a really shitty CPU before this and could only use models that fully fit into VRAM.
Anonymous No.106286872 >>106287092
>>106286855
You might look into trying newer MoEs like GLM 4.5 air then, that'll make use of your memory while still being usable speed-wise.
Anonymous No.106286875 >>106287092
>>106286865
Yes. With those specs you could use GLM 4.5 air Q3KS with all experts on the CPU and the rest of the model in the GPU and a decent amount of context.
Anonymous No.106286896 >>106286920
Can you guys think of anything cool one could do with this?
Anonymous No.106286920
>>106286896
1. Tune lora A on author A and lora B on author B
2. Let each generate one token at a time (rough sketch below)
3. ???
4. Reduced chance of slop and repetition!
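step 2 in code, roughly; a sketch assuming transformers + peft, with a hypothetical base model and LoRA paths, and greedy decoding just to keep it short:
```python
# Alternate two LoRA adapters on the same base model, one token per step.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_name = "some/base-model"  # placeholder
tok = AutoTokenizer.from_pretrained(base_name)
base = AutoModelForCausalLM.from_pretrained(base_name, torch_dtype=torch.float16, device_map="auto")

# Attach both author LoRAs as named adapters on the same base model.
model = PeftModel.from_pretrained(base, "loras/author_a", adapter_name="a")  # placeholder path
model.load_adapter("loras/author_b", adapter_name="b")                       # placeholder path

ids = tok("Once upon a time", return_tensors="pt").input_ids.to(base.device)
for step in range(100):
    model.set_adapter("a" if step % 2 == 0 else "b")  # alternate authors per token
    with torch.no_grad():
        logits = model(ids).logits[:, -1, :]
    next_id = torch.argmax(logits, dim=-1, keepdim=True)  # greedy for simplicity
    ids = torch.cat([ids, next_id], dim=-1)

print(tok.decode(ids[0], skip_special_tokens=True))
```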
Anonymous No.106287078
>>106286798
DiffRhythm was terrible last time I used it; the only two somewhat viable open musicgen models I've seen are Ace-Step and YuE.
Anonymous No.106287092 >>106287108
>>106286872
>>106286875
Amazing. Thanks anon.
I didn't even play with the tensor regex filters, and I have a 52 GB model running at 7 t/s. This is pure magic.
Anonymous No.106287108
>>106287092
>I didn't even play with the tensor regex filters
If all you are doing is moving experts to RAM, there's a new --n-cpu-moe parameter that simplifies things a bit.
For example, if you just do exp=CPU with GLM, you end up moving the shared experts to RAM too, so you'd actually want exps=CPU to move only the routed ones. The new argument abstracts that away and just moves the non-shared experts to RAM.
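for what it's worth, the two forms side by side, sketched as a Python launcher; the gguf filename is a placeholder and -ngl 99 / --n-cpu-moe 99 just mean "all layers":
```python
# Two roughly equivalent ways to keep the routed experts in system RAM with llama-server.
import subprocess

MODEL = "GLM-4.5-Air-Q3_K_S.gguf"  # hypothetical filename

# Older override-tensor form: the "exps" regex matches the routed expert tensors
# but not the shared "shexp" ones, so the shared experts stay on the GPU.
old_style = ["./llama-server", "-m", MODEL, "-ngl", "99", "-ot", "exps=CPU"]

# Newer form: --n-cpu-moe N keeps the routed experts of the first N layers in
# system RAM and leaves everything else where -ngl put it.
new_style = ["./llama-server", "-m", MODEL, "-ngl", "99", "--n-cpu-moe", "99"]

subprocess.run(new_style)  # or subprocess.run(old_style); the placement ends up roughly the same
```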
Anonymous No.106287218
>>106287207
>>106287207
>>106287207
Anonymous No.106287739
>>106285826
>Mistral just buried Large 3 by releasing (proprietary) Medium 3.1 and calling it their "frontier model" in the API docs
They called Medium 3.0 frontier-class too, nigger.
https://web.archive.org/web/20250507144143/https://docs.mistral.ai/getting-started/models/models_overview/