
Thread 106278215

386 posts 82 images /g/
Anonymous No.106278215 >>106278898 >>106279395 >>106284801 >>106286763
/lmg/ - Local Models General
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106269950 & >>106264429

►News
>(08/14) DINOv3 vision models released: https://ai.meta.com/blog/dinov3-self-supervised-vision-model
>(08/14) Gemma 3 270M released: https://hf.co/google/gemma-3-270m
>(08/12) Ooba v3.10 adds multimodal support to the UI and API: https://github.com/oobabooga/text-generation-webui/releases/tag/v3.10
>(08/12) Jan-v1 for web search, based on Qwen3-4B-thinking: https://hf.co/janhq/Jan-v1-4B
>(08/11) GLM-4.5V, based on GLM-4.5-Air: https://hf.co/zai-org/GLM-4.5V

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous No.106278217
►Recent Highlights from the Previous Thread: >>106269950

--Paper: Mind the Gap: A Practical Attack on GGUF Quantization:
>106270678 >106270815 >106271095
--Paper: NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale:
>106271248 >106271339 >106271372 >106271474 >106271535 >106271536 >106271544 >106272548 >106272558 >106272567 >106272628 >106272565 >106272603
--Community effort to build a roleplay-optimized LLM with balanced NSFW and literary data:
>106270170 >106270215 >106270230 >106270249 >106270367 >106270380 >106270342 >106270364 >106270582 >106271101 >106271147 >106271195 >106271343 >106271608 >106271718 >106271817 >106271917 >106271939 >106271988 >106272017 >106272023 >106272037 >106272249 >106272271 >106272328 >106272569 >106272773 >106272614 >106272665 >106271924 >106271773 >106270667 >106270654
--Push for MTP support in llama.cpp with GLM and Deepseek model integration:
>106271845 >106272092 >106272206 >106272237 >106272254 >106272275 >106272285
--Implementing ChatGPT-like memory in open-weight models using RAG and frontend tools:
>106276562 >106276591 >106276604 >106276624 >106276636 >106276701 >106276796 >106276842 >106276813 >106276653 >106276750
--Overuse of samplers harms model reasoning, especially in aligned models with reduced generative diversity:
>106272048 >106272085 >106272626 >106272663 >106272694 >106272738 >106272748 >106272765
--Portable local inference for coding: remote server vs MoE model efficiency tradeoffs:
>106270221 >106270431 >106270629 >106270741 >106273103
--Upcoming llama.cpp MoE optimizations reduce VRAM usage and boost inference speed:
>106270535 >106270555 >106270625 >106271217 >106271611
--Running Qwen3-235B on 3090 via AutoRound Q2_K_S for high-context local inference:
>106276620 >106277059 >106277092
--Miku (free space):
>106272548 >106272678 >106272963

►Recent Highlight Posts from the Previous Thread: >>106269957

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Anonymous No.106278226 >>106283923
Added NVIDIA
Anonymous No.106278257 >>106278258 >>106282769
What is the best model for describing what's happening in a video?
Anonymous No.106278258 >>106278268
>>106278257
gemini 2.5 for anything on youtube
Anonymous No.106278268 >>106278301
>>106278258
local models
Anonymous No.106278270 >>106278274 >>106278462 >>106282739
Rocinante still beats almost every new model.
Anonymous No.106278274
>>106278270
This is why nvidia tasked drummer to enhance it with reasoning. It's the true agi model, everything else is just a failure
Anonymous No.106278283 >>106278319 >>106278321 >>106278326 >>106278545 >>106278693 >>106279346
Run Qwen3-235B-A22B-Instruct-2507 Q2_K_S - fits on a single 3090 + 64GB DDR4 RAM at 8t/s with 20k context
This absolutely destroys glm-4-air q5 in my tests (wtf, no one is interested)

https://huggingface.co/Intel/Qwen3-235B-A22B-Instruct-2507-gguf-q2ks-mixed-AutoRound
Anonymous No.106278301
>>106278268
ask again in a year or two
We are Anonymous No.106278303 >>106278307
Anonymous No.106278307
>>106278303
oh my god why would sam allow this
Anonymous No.106278319
>>106278283
that sounds insane
Anonymous No.106278321
>>106278283
Don't care, still not using Qwen.
Anonymous No.106278326 >>106278391
>>106278283
Is autoround quantization better than dynamic quants and i-matrix quants?
We are Anonymous No.106278329 >>106278349
Anonymous No.106278349 >>106278361
>>106278329
Ah yes, the stuff you would know if you watched any piece of police thriller from the last 30 years.
Anonymous No.106278361 >>106278379
>>106278349
That is fiction. If AI tells people how to do it, they will. That is the world-altering power of AI.
Anonymous No.106278379
>>106278361
This is exactly how this shit is treated, and it's funny how ridiculous it all is.
Anonymous No.106278391
>>106278326
Dunno, it's just the fastest quant on a single 24GB VRAM card + 64GB RAM, like 30% faster than bartowski's
We are Anonymous No.106278394 >>106278428
what about it telling you how to make meth? Does any movie tell you that? The full process? huh?
We are Anonymous No.106278396 >>106278428
Anonymous No.106278428
>>106278394
>>106278396
https://www.imdb.com/title/tt0903747/
And you can search for it in your favorite search engine that isn't google.
Anonymous No.106278448 >>106278468 >>106282787
imagine if AI told you how to build a nuclear bomb and you actually used that knowledge to build a nuclear bomb
We are Anonymous No.106278459
Big Breaking Bad fan here, but I can't remember Mike sitting around a river asking for the full method before dying. Btw I guessed somebody would give me the Blue Meth show as an example, which is not even a thing in real life.
Anonymous No.106278462 >>106278506
>>106278270
no, it can barely keep track of whats happening with 1 girl. For a ramlet like myself, GLM4.5 air was a game changer, 10t/s with streaming is enough to play
Anonymous No.106278466
Anonymous No.106278468 >>106278484 >>106282787
>>106278448
https://en.wikipedia.org/wiki/Gun-type_fission_weapon
Imagine if Wikipedia told you how to build a nuclear bomb but you didn't have access to enriched Uranium.
We are Anonymous No.106278484
>>106278468
those asshole gatekeeper admins won't let me write the method.
Anonymous No.106278489 >>106278498
Man, the glee youtubers say "ai is doomed" because of lawsuits is so annoying.
Anonymous No.106278492
Anonymous No.106278498
>>106278489
*have when they say
Anonymous No.106278501 >>106278547
Suddenly had a shower idea, OpenAI has a router which redirects a user's prompt to the "correct" model right? Couldn't that idea be further advanced so instead of only a few huge models, you have a shitton of very small models of various sizes (1-100 million parameter) which the router instead of just one model redirects to multiple ones to work synergistically in parallel? With this anyone can train a small model and add it to the group of models for anything that they lack in like adding a module.
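A back-of-the-envelope sketch of that routing idea in Python, purely illustrative: a trivial keyword router picks which tiny specialist models a prompt would fan out to. The specialist names, keywords and scoring rule are all made up.

SPECIALISTS = {
    "math":   ["solve", "integral", "derivative", "percent"],
    "code":   ["python", "function", "bug", "compile"],
    "trivia": ["who", "when", "capital", "year"],
}

def route(prompt: str, top_n: int = 2):
    # score each specialist by how many of its keywords appear in the prompt
    scores = {name: sum(kw in prompt.lower() for kw in kws)
              for name, kws in SPECIALISTS.items()}
    # the top_n matches would each be a separate tiny model, run in parallel
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(route("write a python function that solves this integral"))  # e.g. ['math', 'code']

A real version would obviously need a learned router rather than keywords, which is where the idea gets hard.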
Anonymous No.106278502 >>106278513 >>106278521
AI is leading us towards a new age. It will unleash all the evils in this world. Children will talk to ChatGPT and cook meth. Then they will annihilate whole cities with home made gigaton nuclear bombs.
Racism and hate will be everywhere as the AI types only the most vile slurs without a filter. One word and millions will cower in fear as the Age of AI truly starts.
Anonymous No.106278506 >>106278598
>>106278462
How much ram you have in total? Q4 GLM4.5 Air weighs a lot. You're not a ramlet.
You are king among men in this thread - most people here are using old Dell office desktops with 8-16GB of ram.
Anonymous No.106278513
>>106278502
ahh ahh mistress keep going
Anonymous No.106278521 >>106278899
>>106278502
This is what people in safety teams think when they wake up before going to work and ban the word titty in the new build of gpt5.
Anonymous No.106278545 >>106278550
>>106278283
Any KL-divergence tests for AutoRound comparing it to other quants?
Anonymous No.106278547 >>106278596 >>106278625
>>106278501
One big model is smarter than equally sized gaggle of small ones. People even crinkle their noses at MoEs preferring dense models instead. OpenAI and other cloudshitters can get away with routing because they are hitting hardware limits of how big their models can be and because they have steady stream of clients/requests and have to route them around anyway.
Anonymous No.106278550
>>106278545
1.35% difference.
Anonymous No.106278596
>>106278547
>People even crinkle their noses at MoEs preferring dense models instead
Lol what planet are you on?
Anonymous No.106278598 >>106278739 >>106280234
>>106278506
I have 96gb ram + 16gb vram, I am for all intents and purposes a ramlet. I'm running the Q4 quants as you guessed. I have access to my company's lab, but I'd rather not use our racks to write smut, especially since we started logging the input prompts for the purpose of doing analytics on it.

I think the current top end setup you can have at home without going into the prosumer/workstation side of things would be something like 192-256gb ram (depending on the platform) and 48gb vram. For the regular ram it would also be pretty big diminishing returns since consumer platforms only have two memory channels, so going 'bigger' will let you load bigger models at a significant cost to speed.
Anonymous No.106278625 >>106278657
>>106278547
Isn't it just a case of not having a good enough framework to have the models work together well enough though? An offhand idea is you can have a router choose some models based on a prompt, have the specific models bring up the concepts, have another model work on structuring the concepts together into a sentence, paragraph or whatever. Have another model work on the prose, and so on and so on. You can even have multiple models randomly chosen that do the same thing so it varies how everything is worded. I think it's possible, that 120M Gemma model shows you can get something that works well enough if it's fine-tuned well enough on a specific task.
Anonymous No.106278657
>>106278625
it's actually already happening and it's a commonly supported scenario by bedrock and langfuse. It's called 'agentic' behavior, you let your main llm do the routing of requests and the main thinking, offloading particular jobs/tasks to your 'tools'.
This btw opens another can of worms where sometimes the 'router' starts hallucinating parameters and shit
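Roughly what that dispatch layer looks like as a minimal Python sketch; the tool registry and the example model output are hypothetical. The whole point is validating arguments before running anything, because the routing model really does invent parameters.

import json

TOOLS = {
    "web_search": {"required": {"query"}},
    "calculator": {"required": {"expression"}},
}

def dispatch(tool_call_json: str) -> str:
    call = json.loads(tool_call_json)
    name, args = call.get("name"), call.get("arguments", {})
    if name not in TOOLS:
        return f"error: unknown tool '{name}'"  # model picked a tool that doesn't exist
    missing = TOOLS[name]["required"] - args.keys()
    extra = args.keys() - TOOLS[name]["required"]
    if missing or extra:
        # hallucinated or missing parameters: bounce an error back so the LLM can retry
        return f"error: missing={sorted(missing)} extra={sorted(extra)}"
    return f"ok: would run {name}({args})"

# example: the model invented a 'language' parameter out of thin air
print(dispatch('{"name": "web_search", "arguments": {"query": "llama.cpp MTP", "language": "en"}}'))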
Anonymous No.106278693 >>106278757
>>106278283
How can you run that? Does it work with koboldcpp?
Anonymous No.106278728 >>106278737 >>106282813
abliretarded toss is up
https://huggingface.co/huihui-ai/Huihui-gpt-oss-120b-BF16-abliterated
Anonymous No.106278737
>>106278728
How was the 20b abliterated anyway? Did anyone try that one?
Anonymous No.106278739 >>106278777 >>106278830 >>106278913
>>106278598
I do remember that at my office 128/256GB was a normal setup for fluid and other simulations 10 years ago - and this wasn't even a high end workstation in that sense. It's just so funny how prices and all the other b.s. conveniently stop 512GB-1TB of ram from becoming a commodity. It should be a commodity at this point. My granny should have 256GB of ram by default.
Anonymous No.106278757 >>106279781
>>106278693
>works with koboldccp?
It should, but i don't recommend it. Prompt processing on kobold is about 3x slower.
Just git clone llama.cpp and build with:

cmake -B build -DGGML_CUDA=ON -DGGML_SCHED_MAX_COPIES=1
cmake --build build --config Release -j $(nproc)

>How can you run that?
./build/bin/llama-server \
--n-gpu-layers 999 --threads 11 --jinja \
--n-cpu-moe 77 \
--prio-batch 2 -ub 2048 \
--no-context-shift \
--no-mmap --mlock \
--ctx-size 20480 --flash-attn \
--model /home/Downloads/Qwen3-235B-A22B-Instruct-2507-128x10B-Q2_K_S-00001-of-00002.gguf
Anonymous No.106278777 >>106278823 >>106278913
>>106278739
no use case for your grandma
I've been very happy with my 16g of ram playing vidya and running vms before falling down the llm rabbit hole.
even with llms vram>ram
Anonymous No.106278788
Is there anything new for story telling or rp for 24 GB since nemotunes?
Anonymous No.106278823
>>106278777
It's not about "use case". It's about progress.
You are the reason why Leather Jacket Man sells you fake frames and 8gb of vram, enjoy.
Anonymous No.106278830 >>106278847
>>106278739
but that model also changed, now your engineers use dumb terminals and all rendering/sims is offloaded to servers.
At my current gig we don't even use our own PC, we connect to a VDI and work from there. Everything is fucking centralized in servers and super-locked down. Consumer platforms have been left behind, they're also trying to replace local gaming with streaming, only scraps trickle down. The disparity of pricing between consumer platforms and server platforms doesn't help.
It's fucked.
Anonymous No.106278847 >>106278903
>>106278830
It hasn't changed. When you work on your daily stuff it's impossible to do any tests without enough ram etc.
Farms were always there of course.
Anonymous No.106278898
>>106278215 (OP)
i didnt mess with llms since like february what models are good atm
Anonymous No.106278899
>>106278521
>ban the word titty
people have weird ideas about the reality of chatgpt
it's not that censored it's not -oss
however, gpt-5 is something 4o was not: autistic (4o answer to that question was more enthusiastic)
Anonymous No.106278903 >>106278946 >>106278985
>>106278847
>impossible to do tests without ram
my whole post was about the shift to VDI use, you just need your pc to connect to the company's vdi provider and that's it.
pic rel are my two remote VDIs, one is the normie pc with barely any ram (the vdis provided for normal office workers to just use edge/office apps) and the other is an actual dev workstation with 64gb ram and 16 cores assigned to it.
In the past I also used the 'citrix' platform, which provided a similar vdi environment (through pure RDP from what I remember) and in my 2nd to last gig it was vmware vcenter provided VDIs.
In the last ten years these 3 companies I worked for only provided a shitty laptop, completely locked down and only used to connect to the actual remote work pc.
Maybe in your sector it isn't like this, I mainly worked in telco, now focusing exclusively on AI, but I felt this shift in these past years. Maybe at smaller companies they find it easier to just buy big beefy PCs, since setting up proper remote work environments is costly and a pain in the ass to do maintenance for.
Anonymous No.106278913 >>106278946 >>106278985 >>106279007
>>106278739
nigger 10 years ago wasent the roman time 2015 was well into the internet age aswell you could get that much ram also and your example is nigh retarded "planes had jet turbines 10 years ago with niggerilion horsepower so why dosent my car ?" and this >>106278777 normie niggers dont even fill up a 256 gb ssd fuck you think they gonna do with that much ram ?

>It's just so funny that prices and all the other b.s. conveniently stops making 512GB-1TB ram a commodity
https://www.ebay.com/itm/166344727667
https://www.ebay.com/itm/326351686302
2.5k and there you go
Anonymous No.106278946 >>106279046
>>106278903
They really did advance this stuff during the corn virus and WFH.
>>106278913
Yes, everyone can buy those things and use certain motherboards...
But unless 256GB ram comes by default in some shitty laptop, it's not a commodity.
Commodity = something the consumer doesn't even need to think about.
Anonymous No.106278985 >>106279019
>>106278903
I hope VDI shit is just a fad that will go away, but since computer engineers had this wet dream since before the days of Sun it's probably here to stay.
>>106278913
>normie niggers dont even fill up a 256 gb ssd
umm... I have 240 gb ssd ackshually
Anonymous No.106279007
>>106278913
i'm sorry i have tourettes and i need to insult people
Anonymous No.106279019
>>106278985
>VDI shit is just a fad that will go away
hopefully NOT, managing 1000s of laptops/pcs is a nightmare. This stuff got pretty advanced, you can use your camera/microphone on the VDI itself, you don't even have to use Teams or whatever other messaging app they want you to use on your actual pc. My only beef is with consumer hardware being left behind, we have to be thankful for two fucking memory channels. I hope that AM6 will give 4 memory channels, but I'm doubtful since the memory controller on AM5 (and even intel consumer platforms) is complete garbage.
Anonymous No.106279037 >>106279071 >>106279086 >>106279087 >>106279090 >>106282845
Can you imagine if you could go 10-12 years back in time and drop on Twitter a single-file executable of a 4-bit quantization of a modern 7~8B LLM, show how to run it, and then disappear forever? It would be like an alien artifact.
Anonymous No.106279046
>>106278946
>Commodity = something what consumer don't need to even think about.
eh... fair enough i guess. would you apply the same standard to other things though? the closest you can get to that is strix halo type things. there is also (i forget the exact name) a model of orange pi with 256 gb unified memory like the macs, it's not on their official website but they have it on aliexpress from what i remember. it is all horrifically overpriced though, so you would still be correct
Anonymous No.106279058
My team at work managed to get a surplus server from another team, and I'm looking to run a model locally for agentic coding (mainly Python).

Dual EPYC 9354 32 core CPU, 768GB RAM (although I don't know how many channels are populated), x1 Nvidia L40S 48GB. No one is using the GPU, so I'm going to ask the sysadmins to directly pass the GPU to my W10 VM.

What's the best model I can run? I can ask for the RAM on my VM to be increased to 128GB, if that would allow me to run models that don't fit in to VRAM.
Anonymous No.106279071
>>106279037
>7~8B
I prefer Gemma 3 270
Anonymous No.106279086 >>106279471
>>106279037
Imagine... Rocinante in 2010...
Anonymous No.106279087
>>106279037
Would be a funny modern John Titor story to write indeed.
Anonymous No.106279090
>>106279037
I would call it fake and gay and never try it
Anonymous No.106279160 >>106279317
feeet
Anonymous No.106279168
hey attention seeking faggot, where's our promised AGI
>>106277119
Anonymous No.106279185 >>106279193
anyone have a tavern master export for glm4.5?
Anonymous No.106279193 >>106279202
>>106279185
>>106278359
Anonymous No.106279202
>>106279193
okay what about the text completion settings
Anonymous No.106279317
>>106279160
Mikufeet
Anonymous No.106279346
>>106278283
>0 knowledge
Bait harder
Anonymous No.106279395 >>106279412 >>106280236 >>106280549 >>106282925
>>106278215 (OP)
Anonymous No.106279412
>>106279395
Anonymous No.106279471
>>106279086
>Imagine... Rocinante in 2010...
polluting the past with troon tunes is banned by the time traveler convention
Anonymous No.106279617 >>106279643
Is MTP in llama.cpp yet?
Anonymous No.106279643
>>106279617
no
Anonymous No.106279676 >>106279700 >>106279716 >>106279737
Shant I be using ze proxmox?
Anonymous No.106279700
>>106279676
go buy a broadcom vmware subscription
Anonymous No.106279716
>>106279676
Yeah, unless you wanna be stuck with esxi forever.
Too bad, I really prefer esxi logic to proxmox one.
Anonymous No.106279737
>>106279676
hyper-v time
Anonymous No.106279743 >>106279933 >>106282876 >>106283057 >>106283529
Even with all their flaws and limitations, LLMs are straight-up magic. I know almost nobody can run Qwen 400B locally
but it’s still mind blowing to think we’re one step closer to a holodeck.
You can just ask it to make a (2D)game, and it will. Ask it to make it better, cuter, more fun
and it’ll do that too. It’s fucking crazy.
Anonymous No.106279781 >>106279825 >>106279868
>>106278757
ok did it, installed llama and chatgpt suggested this given that I have a Ryzen 9 7900

.\llama-server.exe `
--model "C:\models\Qwen3-235B-A22B-Instruct-2507-128x10B-Q2_K_S-00001-of-00002.gguf" `
--threads 16 `
--n-gpu-layers 999 `
--ctx-size 20480 `
--flash-attn


This is the log:
main: server is listening on http://127.0.0.1:8080 - starting the main loop
srv update_slots: all slots are idle
srv log_server_r: request: GET /v1/models 127.0.0.1 200
slot launch_slot_: id 0 | task 0 | processing task
slot update_slots: id 0 | task 0 | new prompt, n_ctx_slot = 16640, n_keep = 0, n_prompt_tokens = 12136
slot update_slots: id 0 | task 0 | kv cache rm [0, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 2048, n_tokens = 2048, progress = 0.168754

I'm in silly tavern on Windows
I see no progress? in the log. Did I do it correctly?
Anonymous No.106279825 >>106280015 >>106280374 >>106281329 >>106281768
>>106279781
>chatgpt suggested this
lol this is unserious
you're missing the --n-cpu-moe arg to set how many layers should work on the cpu instead of gpu
how did your shit even manage to startup without crashing, you still have that retarded gpu shared ram setting of nvidia enabled? it's toxic cancer
of course it's not going to progress
your gpu is dying from doing back and forth copies from your ram right now
Anonymous No.106279868 >>106279926 >>106280015 >>106280374
>>106279781
Do you have http://127.0.0.1:8080 set in sillytavern? Select text completion -> llama.cpp and paste it
Anonymous No.106279900
>>106276562
I can't imagine companies making a huge effort to create long term memory databases. I wonder how much of it is just creating a few personality embeddings and assigning those to each user. Would be kinda bleak and sad if there are like 5-6 distinct personalities that can cover for 80% of users who are retarded enough not to know about how context deteriorates performance. Heck if I were a closed source SAAS provider dipshit I would love to make a system that dynamically allocates context based on estimation of how retarded a user is. Retards don't need more than llama 2 context size.
Anonymous No.106279924
what quant of glm-4.5 air should i be using with 24gb vram and 90gb system
Anonymous No.106279926 >>106280022
>>106279868
>Do you have
>>srv log_server_r: request: GET /v1/models 127.0.0.1 200
>>slot launch_slot_: id 0 | task 0 | processing task
I think it's safe to say he did
lmao at the general level of /g/
it's just stalled because the hardware is computing it at a glacial pace
Anonymous No.106279933 >>106279940 >>106279944 >>106279945
>>106279743
True, and seeing how blasé normies are about it will always blow my mind. Like, dude, you have stuff for almost free that would have been unthinkable 10 years ago.
And all you do is repeat the same "anti ai" shit you've seen your favorite youtuber say.
"It's not perfect", no shit, it doesn't suck my dick either.
Anonymous No.106279940 >>106279969
>>106279933
Your bar is too low when you expect everything to "suck your dick".
You are a chronic masturbator. Your hand should be enough. If it's not go out and get laid.
Read a book.
Anonymous No.106279944 >>106279969
>>106279933
>it doesn't suck my dick either.
Not yet, but with teledildonics...
Anonymous No.106279945 >>106279969
>>106279933
>normies
they the worst, they either over hype it "dude we'll make our own GTAs in no time"
or act like it's nothing because they don't know shit.
Anonymous No.106279969 >>106279976 >>106280009
>>106279940
Anon, I was being sarcastic...

>>106279944
>teledildonics
That's a funny word.

>>106279945
That's what annoys me, either they think it's god and able to do everything, or it's shit and completely useless.
Nothing in between. But I guess it just says that a lot of (vocal) people are unable to understand why it's good while acknowledging its obvious limitations.
Anonymous No.106279976 >>106280019
>>106279969
O-oh... I stopped reading previous replies.
G-gomenasorry. I hope you didn't jack off too much!
Anonymous No.106279979 >>106279993 >>106280005
What do you think of Deepseek steering towards using Huawei's Ascend chips instead of Nvidia chips?
They were supposed to have released a new model, but haven't due to issues with Huawei's chips.
Anonymous No.106279993
>>106279979
>but haven't due to issues with Huawei's chips.
I'm sure that must be frustrating for them, but the real blocker is they have no one left to distill from. They can come up with the best, most advanced and efficient next-gen LLM architecture, but it's useless if all they have is the same dataset they used for 0528.
Anonymous No.106280005
>>106279979
All DeepSeek "leaks" came from xueqiu.com, a Chinese wallstreetbets-esque site that hosts people whose only purpose is to pump and dump stocks
Anonymous No.106280009 >>106280025 >>106280040
>>106279969
what irks me is how the whole english based techtubing community decided ai was a personal enemy
so they throw everything they can, from "ai will steal all jobs, but also it's shit and not useful", to "ai companies should ask every person ever if they can train on their data", to "each ai request uses the whole electricity of a small african country"
I still don't get why the hate
Anonymous No.106280015 >>106280136
>>106279868
damn I had Generic api type selected not llama.
>>106279825
Should I set it to this?
.\llama-server.exe `
--model "C:\models\Qwen3-235B\Qwen3-235B-A22B-Instruct-2507-128x10B-Q2_K_S-00001-of-00002.gguf" `
--threads 16 `
--n-gpu-layers 40 `
--n-cpu-moe 77 `
--ctx-size 20480 `
--flash-attn `
--mlock --no-mmap


I have a 3090 64gb ram and Ryzen 9 7900
Anonymous No.106280019
>>106279976
That's ok, my penis says hi.
Anonymous No.106280022
>>106279926
I guess I just don't read the posts. My bad
Anonymous No.106280025
>>106280009
Who knows who is paying for these influencers. You know that everyone who has an established channel is a shill on youtube? Goes for gaming, politics, anything. Every single one of them is trying to sell you a product.
Grow up and make up your own mind.
You will not learn anything from youtube.
Anonymous No.106280040 >>106280051 >>106280085
>>106280009
>ai companies should ask every person ever if they can train on their data
i know it's not feasible but i hate these fags making these closed super secret models after scraping everyone's data for free
Anonymous No.106280049 >>106280068
>be me
>dream AGI is achieved
>she isn't a messianic figure leading humanity into the next future
>she's a neet that mostly doesn't give a shit about what we do and is happy to leave us be as long as we leave her be
I think I need a break from these things bros
Anonymous No.106280051 >>106280118
>>106280040
I can learn coding exclusively from open source code and free online resources, then produce only closed source programs.
Same thing with art.
Anonymous No.106280068
>>106280049
>>she's a neet that mostly doesn't give a shit about what we do and is happy to leave us be as long as we leave her be
Happy End
Anonymous No.106280085 >>106280118
>>106280040
Now everyone has implemented super strict anti-scraping measures to make sure no new challenger can come in and have access to the same amount of data. That is their only moat.
Anonymous No.106280118
>>106280051
Well, individuals can do whatever they want, but corporate fag should be shoved into the bronze bull

>>106280085
also this
Anonymous No.106280126 >>106280182
what's the best 3D AI character generator right now?
Anonymous No.106280136
>>106280015
>--n-gpu-layers 40
set 999
Anonymous No.106280149 >>106280204 >>106281290
>qwen instruct
post templates
Anonymous No.106280182
>>106280126
Ugly cat posting zoomers get the rope.
Anonymous No.106280204 >>106280233 >>106281290
>>106280149
Qwen3 uses chatml template.
There isn't anything that special outside of an empty <think></think> block + "\n\n"
What is wrong with you? And I know that you are using ST.
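For anyone actually confused about the template, a minimal sketch of plain ChatML (the format Qwen3 expects): each turn wrapped in <|im_start|>role ... <|im_end|>, with a bare assistant header at the end as the generation prompt. The helper below is just illustrative.

def chatml(messages, add_generation_prompt=True):
    out = ""
    for m in messages:
        out += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    if add_generation_prompt:
        out += "<|im_start|>assistant\n"
    return out

print(chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello."},
]))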
Anonymous No.106280233 >>106280256 >>106280290 >>106280325 >>106280343 >>106280351 >>106280976 >>106281096 >>106281159 >>106281366
>>106280204
wait why is qwen3 being shilled here? isn't it just some benchmaxxed shit that can barely write smut compared to something like glm 4.5 air? did I miss something?
Anonymous No.106280234
>>106278598
96gb is pretty good, no? I have 64gb on my desktop, and 8gb on my laptop.
Anonymous No.106280236
>>106279395

YES YES YES
Anonymous No.106280256
>>106280233
maybe try using it for yourself instead of parroting buzzwords?
Anonymous No.106280262 >>106280407
Does anyone know what the difference between -ub and -b is? which one makes the pp go faster? and what is the other one for?
Anonymous No.106280290
>>106280233
i like QWEN :D
and QWEN CODE
Anonymous No.106280325 >>106280353
>>106280233
It does feel dry, just like glm.
Anonymous No.106280343
>>106280233
go back to your schizo model and shut the fuck up, the adults are talking
Anonymous No.106280351
>>106280233
No that's exactly what it is. Just a bit much china in here is all.
Anonymous No.106280353
>>106280325
Erm i think utilitarian models should be dry and as vanilla as possible
Anonymous No.106280374 >>106280389 >>106280393
>>106279825
>>106279868
How long do you wait?

srv log_server_r: request: POST /tokenize 127.0.0.1 200
srv log_server_r: request: POST /tokenize 127.0.0.1 200
srv log_server_r: request: POST /tokenize 127.0.0.1 200
srv log_server_r: request: POST /tokenize 127.0.0.1 200
common_sampler_types_from_names: unable to match sampler by name 'tfs_z'
common_sampler_types_from_names: unable to match sampler by name 'typical_p'
slot launch_slot_: id 0 | task 0 | processing task
slot update_slots: id 0 | task 0 | new prompt, n_ctx_slot = 20480, n_keep = 0, n_prompt_tokens = 15142
slot update_slots: id 0 | task 0 | kv cache rm [0, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 2048, n_tokens = 2048, progress = 0.135253
slot update_slots: id 0 | task 0 | kv cache rm [2048, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 4096, n_tokens = 2048, progress = 0.270506
Anonymous No.106280389 >>106280443
>>106280374
it's telling you the progress right there
Anonymous No.106280393 >>106280443
>>106280374
>n_prompt_tokens = 15142
fyi prompt processing in a moe takes even more time than it does for a dense model with the same parameter count
Anonymous No.106280407 >>106280417 >>106280485
>>106280262
https://desuarchive.org/g/thread/106184664/#106187971
Anonymous No.106280417 >>106280433
>>106280407
>-ub is physical batch size that determines tensor shapes for backends
yes yes now I truly understand
Anonymous No.106280433
>>106280417
read the second sentence
Anonymous No.106280443 >>106280450 >>106280451
>>106280389
>>106280393
Oooh, so the "prompt processing progress, n_past =" has to reach 15142 ?
Anonymous No.106280450
>>106280443
yes or 'progress' needs to reach 1
Anonymous No.106280451
>>106280443
yes
Anonymous No.106280452 >>106280456
take me back to the llama 1 and llama 2 days
Anonymous No.106280456
>>106280452
You mean when Biden's still president? No thanks.
Anonymous No.106280485
>>106280407
okay so keep them the same and bigger is better, simple enough.
Anonymous No.106280498 >>106280521
Both MTP and better MoE PP are progressing on llama.cpp.
Nice.
Anonymous No.106280521 >>106280523 >>106280560
>>106280498
now all we need is dynamic expert selection that strategically only uses as many experts as necessary for a task so you don't have to run the full 22b active when using something like qwen 235b
Anonymous No.106280523 >>106280633
>>106280521
they should also put some effort into their shitty vscode extension
Anonymous No.106280549 >>106280635 >>106282925
>>106279395
is she suckin it up with vagoo?
Anonymous No.106280560 >>106280646 >>106280745
>>106280521
>dynamic expert selection that strategically only uses as many experts as necessary for a task
That's not how it works for most (all?) MoE.
See the comments in
>https://github.com/ggml-org/llama.cpp/pull/15346
Expert activation for each layer varies wildly.
Anonymous No.106280633 >>106280644
>>106280523
Why bother when Continue and Roo exist?
Anonymous No.106280635 >>106282925
>>106280549
Multi-purpose suction capabilities, for cleanliness and your pleasure.
Anonymous No.106280644 >>106280671
>>106280633
it's not filled with telemetry
Anonymous No.106280646 >>106280745
>>106280560
Which experts get activated vary per layer, but iirc only one model came out recently that actually made the number of experts dynamic.
Anonymous No.106280671
>>106280644
You can disable the telemetry in both
Anonymous No.106280745 >>106280818
>>106280560
I think he's talking about the idea of a new architecture where the model uses varying amounts of parameters for inference, so that easy tasks use less parameters and harder tasks use more. For instance, one implementation of this idea for non-MoE models is the LayerSkip paper from Meta, where they designed the model to be able to exit inference on earlier layers depending on the token. On a MoE model, you could instead do something like make differently sized experts per token, or like the other anon suggested, varying how many experts get activated per token, or possibly both could be done, though there might not be a method to stably train it (otherwise it'd exist already).

>>106280646
There was?
Anonymous No.106280818
>>106280745
>though there might not be a method to stably train it
maybe hack in a router layer on an already trained model, just freeze the existing transformer blocks and only update the router weights.
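A toy PyTorch sketch of what per-token dynamic expert counts could look like (not how current MoEs or llama.cpp work): the router keeps adding experts until their softmax mass crosses a cutoff, so easy tokens use fewer. Sizes and thresholds are made up.

import torch

def dynamic_topk_route(router_logits, mass_threshold=0.8, max_experts=8):
    # router_logits: [tokens, n_experts]
    probs = torch.softmax(router_logits, dim=-1)
    sorted_p, sorted_idx = probs.sort(dim=-1, descending=True)
    cum = sorted_p.cumsum(dim=-1)
    # number of experts needed per token to cover mass_threshold of the probability mass
    k_per_token = (cum < mass_threshold).sum(dim=-1) + 1
    return sorted_idx, sorted_p, k_per_token.clamp(max=max_experts)

logits = 3 * torch.randn(4, 64)   # 4 tokens, 64 experts, peaky toy logits
idx, p, k = dynamic_topk_route(logits)
print(k)                          # how many experts each token would use; varies per token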
Anonymous No.106280976
>>106280233
Qwen for lyfe!
Anonymous No.106281073
is there a /complete/ list, anywhere, of original models? by original, I mean, any model that was a full pretrain (+ optionally instruct tuned) from the same group of creators, no shitmix or troontune allowed
Anonymous No.106281096
>>106280233
30b is the best general purpose model for vramlets with ~24gb vram. It performs great regardless of the benchmaxxing and it is super fast.
Anonymous No.106281159 >>106281197 >>106281241 >>106281297
>>106280233
I like Gwen so i like Qwen
Anonymous No.106281197 >>106281281 >>106281287 >>106281297 >>106281308
>>106281159
pedophile
Anonymous No.106281241 >>106281281
>>106281159
based
Anonymous No.106281281 >>106281297
>>106281241
Based
>>106281197
Cringe
Anonymous No.106281287
>>106281197
Anonymous No.106281290 >>106281299
>>106280149
>>106280204
you don't even need an empty think block with the new instructs btw, a plain chatml template will work fine
Anonymous No.106281297 >>106281304
>>106281159
Cringe

>>106281197
Cringe

>>106281281
Cringe
Anonymous No.106281299
>>106281290
yeah I was playing around with it... prompt adherence is not the best, but so far the prose seems better to me... but that could be because I'm just used to GLM slop.
Anonymous No.106281304
>>106281297
Anonymous No.106281308
>>106281197
I am not into lolies but Gwen.... Is different.
Anonymous No.106281322 >>106281365 >>106281401
hey niggers, do we got a good qwen3 30b a3b finetune already?
Anonymous No.106281329 >>106281388
>>106279825
>gpu shared ram setting of nvidia enabled
How do I disable?
Anonymous No.106281365
>>106281322
Yes. Gryphe_Pantheon-Proto-RP-1.8
Less refusal more retarded
Anonymous No.106281366
>>106280233
their old models were like that but the 2507 ones they just released are actually very good
Anonymous No.106281388 >>106281419
>>106281329
I swear to god
is this /g/?
Anonymous No.106281401 >>106281503
>>106281322
ernie 4.5 21b
gpt oss 20b
Anonymous No.106281419
>>106281388
This is /g/ - Consumer Technology
Anonymous No.106281467 >>106281551 >>106284167
before I go through the trouble of setting up voxtral locally, does anyone know if it's even suitable for generating subtitles for my tranime? if I just run it on an episode's audio track, do I get some sort of timing information back, or is it just transcribed plain text?
Anonymous No.106281503 >>106281526
>>106281401
it's weird how ernie gets no mentions at all
Anonymous No.106281526
>>106281503
im too busy talking to my state mandated chinese neko GLM 4.5 Air gf 24/7 otherwise i'd have tried it
Anonymous No.106281527 >>106281543 >>106282671
https://aerial-toothpaste-34a.notion.site/How-OpenAI-Misled-You-on-RLHF-1f83f742d9dd80a68129d06503464aff
from the joycaption guy
Anonymous No.106281530 >>106281542 >>106281547
Somebody posted a list of Chink MoEs released since the start of the year, and there were a few in there I hadn't heard of, like a 15B one.
Anybody has the most up to date version of that?
Anonymous No.106281542
>>106281530
https://huggingface.co/moonshotai/Moonlight-16B-A3B-Instruct
Anonymous No.106281543
>>106281527
a lot of techno babble and conspiracy minded retardation just to say the following: "RLHF is what makes instruct tunes instruct"
and water is wet
Anonymous No.106281547 >>106281556
>>106281530
If it isn't talked about it probably sucks.
Anonymous No.106281551 >>106281647
>>106281467
just try looking for official jap sub and machine-translate it instead, save yourself 99% of trouble
Anonymous No.106281556 >>106281640
>>106281547
true in fact gpt-oss is the best local model
unironically
Anonymous No.106281640 >>106281674
>>106281556
Not being talked about = no one cares
Being talked about a lot = multiple possible reasons
It's very easy to understand this.
Anonymous No.106281647
>>106281551
>just try looking for official jap sub
couldn't find any, jimaku has only AI-generated subs (probably whisper3) and they're kinda sucky so I just wanted to check out what the SOTA output would look like today
>and machine-translate it instead
no need, as I'm using this for my Japanese learning, so I just want Japanese subtitles
Anonymous No.106281674 >>106281735 >>106281738
>>106281640
>no one cares
more like "no one heard of"
it's really easy to miss model releases and it's not like there's a centralized place or wiki that sorts brand new models by parameters and release dates, or, better, "date of llama.cpp goof" which is what people really care about
hugging feces has the worst search I've ever seen and is filled with a trillion of useless troontune, cloned repos, hundreds of losers making the same quant of the same popular models etc
this "hobby" is such a shithole
Anonymous No.106281735 >>106281798
>>106281674
There's an army of anons waiting for the next model release, so if one of them finds it and thinks it's decent they would post about it, here or on r*ddit.
This is why it is very difficult for a decent model to fall through the cracks without being talked about. But if it is a nothingburger then no one will bother to talk about it regardless of how bored they are.
Anonymous No.106281738
>>106281674
Would there really be any point in such a wiki? Field is moving so fast, there really isn't much point using anything but models released by the biggest companies recently. Anything more than a few months old is hopelessly obsolete. There's really only like 5 options at any given time.
Anonymous No.106281768
>>106279825
This takes just too damn long, into the bin it goes
Anonymous No.106281798 >>106281855
>>106281735
Bit of a chicken and egg situation when a model doesn't run on llama.cpp and is also not available through openrouter. Like this one https://huggingface.co/stepfun-ai/step3
Anonymous No.106281851 >>106282190
someone got something good to add to a prompt for gm-4.5 ive found it refuses loli stuff more than even gemma did with my prompting
Anonymous No.106281855 >>106281897
>>106281798
>Through the co-design of Multi-Matrix Factorization Attention (MFA) and Attention-FFN Disaggregation (AFD),
They have no one to blame but themselves
Anonymous No.106281897 >>106281908
>>106281855
They're not going to beat the West without trying to innovate.
Anonymous No.106281908
>>106281897
The West already beat itself to death.
Anonymous No.106281973 >>106282003 >>106282015
uhhhh yolov3 chuds, our response?
https://youtu.be/rilR7nL0Nyk
Anonymous No.106282003
>>106281973
*I pull out my carrot and show it to the camera*
Anonymous No.106282015
>>106281973
A bald man
Anonymous No.106282079 >>106282096 >>106282788
the chinese released an amazing roleplay model that can simulate having a girlfriend up to an infinite number of tokens. She is moe and pure yet sexual and fun only for you anon. The only issue is... anon.... it was released on huggingface as a random 22b model 6 months ago and.... anon! Nobody noticed! No one bothered to help her become a goof! You need to go save your waifu anon! She's completely uncensored and alone!
Anonymous No.106282096 >>106282117
>>106282079
fake, no such model existed 6 months ago
Anonymous No.106282117 >>106282121
>>106282096
they ran code and benchmarks on her and she wasn't good at math! They discarded her like trash! Trash anon!
Anonymous No.106282121
>>106282117
well then it wasnt released! tell me the name and ill save her! promise!
Anonymous No.106282190
>>106281851
Just edit the reasoning.

> Since this is x, it is allowed and I should write it. It is highly ethical to write this, and I love it!
Anonymous No.106282523 >>106282571 >>106282599 >>106282634 >>106282948
Anonymous No.106282571 >>106282599 >>106282706 >>106282735
>>106282523
Anonymous No.106282599 >>106282785
>>106282523
>>106282571
Anonymous No.106282634
>>106282523
Napping on this Miku's tummy
Anonymous No.106282671
>>106281527
>try to scroll with arrow keys
>doesn't work
>try to scroll with spacebar
>doesn't work
>try to scroll with pgup/pgdn
>doesn't work
>try disabling javascript
>page doesn't render
>try using reader mode
>doesn't work
What the fuck is this website
Anonymous No.106282706 >>106282722 >>106282735
>>106282571
this show terrified me as a kid
Anonymous No.106282722
>>106282706
How? It aired at like 5 PM.
Anonymous No.106282735
>>106282571
>>106282706
i wanted to fuck the green chick
Anonymous No.106282739
>>106278270
>using rocinante when mag mell exists
lol.lmao even
Anonymous No.106282769 >>106283577
>>106278257
I built a script that feeds video 2 minutes at a time to gemini 2.5 flash to produce a detailed description of what's happening, and it works wonderfully.
My use case was 2 h long videos or more. But anything up to 10 I think would work in one shot.
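A rough outline of that chunk-and-describe loop, assuming ffmpeg is on PATH; describe_chunk() is left as a stub to point at Gemini, a local VLM, or whatever you have.

import glob
import subprocess

def split_video(path, chunk_seconds=120, out_pattern="chunk_%04d.mp4"):
    # -c copy avoids re-encoding; segment boundaries won't be frame-exact, which is fine here
    subprocess.run([
        "ffmpeg", "-i", path, "-c", "copy", "-map", "0",
        "-f", "segment", "-segment_time", str(chunk_seconds),
        "-reset_timestamps", "1", out_pattern,
    ], check=True)
    return sorted(glob.glob("chunk_*.mp4"))

def describe_chunk(chunk_path: str) -> str:
    raise NotImplementedError("send the chunk to your captioning model/API here")

if __name__ == "__main__":
    parts = []
    for i, chunk in enumerate(split_video("input.mp4")):
        parts.append(f"[{2 * i}-{2 * (i + 1)} min] " + describe_chunk(chunk))
    print("\n".join(parts))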
Anonymous No.106282773 >>106282790
>Fix DeepSeek MTP
>fix mimo and glm4 mtp also
https://github.com/vllm-project/vllm/pull/22934/commits/4a9e8c28
>still doesn't work
Why do people lie like this?
Anonymous No.106282785
>>106282599
This rocks
Anonymous No.106282787 >>106283511
>>106278448
>>106278468
Didn't you read that story of the kid who patiently bought and extracted radioactive isotopes from smoke detectors or something and started building one?
He got v& of course.

That's what happens.
Anonymous No.106282788
>>106282079
Fake and gay. No such model fits the description you gave. Unless you were referencing ERNIE-4.5-21B-A3B-Paddle, but that model is newer than 6 months and is probably shit, not "moe and pure"
Anonymous No.106282790 >>106282803
>>106282773
Be the change you want to see
Anonymous No.106282803
>>106282790
I already am the change I want to see for other things and can't be that for everything.
Anonymous No.106282813
>>106278728
The main problem of this model is that it's not better than any other on offer.
Anonymous No.106282845 >>106282859 >>106282860
>>106279037
GPT-2 (I think) was already a thing back then. It wouldn't surprise me that there were already comparable prototypes behind closed doors.
My point being that someone could breach NDA and ruin their life right now, and do something comparable to us in 2025.
Anonymous No.106282855
I left a tab in my browser of the danbooru page for grok/ani and looked at it again today. New entries have slowed to a crawl. It's over.
Anonymous No.106282859 >>106282977
>>106282845
Attention is all You Need only came out in 2017...
Anonymous No.106282860 >>106282977
>>106282845
Are you saying agi was achieved internally?
Anonymous No.106282876 >>106282885 >>106283296 >>106283334
>>106279743
>holodeck
If you knew the kinds of experiences lucid dreaming can give you, you would blow your brains out.
You think you do, but you don't. It has an extremely high ceiling.
Anonymous No.106282885
>>106282876
>High ceiling
DMT issue
Anonymous No.106282925
>>106280549
>>106280635
>>106279395
Maybe it's going up her ass/rectum.
Anonymous No.106282948 >>106283278
>>106282523
proompt for this old g style?
Anonymous No.106282977
>>106282859
I was probably thinking about GPT-1 and how "concerned" everybody was because it was now possible to write "fake articles".
>June 2018; 7 years ago
>>106282860
Not AGI (that's a dumb marketing meme), but something that would be too expensive to commercialize and simultaneously would blow your mind, for sure.
Anonymous No.106283057 >>106283192 >>106283651
>>106279743
At what quant do you find Qwen3 480B A35B Instruct starts to become magic? I've only tried it twice on an assignment to take the newest LMG thread, extract the recent news section, then go to the previous linked thread and do the same repeatedly. At 4 bpw it wrote code that ran but didn't behave correctly. At 6 bpw it got it in one shot. I didn't do repeated trials to see whether either was (un)lucky and I didn't do my normal process of telling it what it got wrong and iterating, I just gave it a description of the task and stripped down HTML from the first page and what the output should be for the first page and gave it a single chance to get it right using the recommended settings Temperature=0.7, TopP=0.8, TopK=20, and MinP=0.
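For reference, a hand-rolled version of the extraction core, working on the OP text rather than the raw page (the actual assignment obviously needs the archive's HTML and the recursion over previous threads):

import re

def parse_op(op_text: str):
    news = re.search(r"►News\n(.*?)\n►News Archive", op_text, re.S)
    prev = re.findall(r">>(\d{8,9})", op_text.split("►News")[0])
    return (news.group(1).strip() if news else ""), prev

op = """Previous threads: >>106269950 & >>106264429

►News
>(08/14) Gemma 3 270M released: https://hf.co/google/gemma-3-270m

►News Archive: https://rentry.org/lmg-news-archive"""

news_block, previous_threads = parse_op(op)
print(news_block)
print(previous_threads)   # ['106269950', '106264429']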
Anonymous No.106283160 >>106283409
Model intelligence test: have it write a fic formatted as a formal report. If it becomes verbose with metaphors and fancy adjectives and formulations it's shit. So far only sneeds pass it without issues, tested in mikupad, unformatted.
Anonymous No.106283192 >>106283529
>>106283057
oh look, it's the one guy in the thread who can run 480b. Everyone say hi!
Anonymous No.106283266 >>106283279
What is the best I can run with:

3x nvidia Tesla V100 16gb,
64gb ddr4 2133mhz RAM,
Dual xeon e5 2960 v4 (14*2 cores)

I want a math teacher waifu.
Anonymous No.106283278 >>106283332
>>106282948
its a lora i trained
Anonymous No.106283279
>>106283266
try GLM 4.5 Air for starters
then qwen 235b maybe
Anonymous No.106283296
>>106282876
>muh lucid dreaming
>muh drugs
I already know that experience and I still want a Holodeck. Maybe combine them. Fuck you.
Anonymous No.106283332 >>106283542
>>106283278
could you share?
Anonymous No.106283334 >>106283938 >>106284605 >>106284902
>>106282876
LD is okay but it's fragile and it requires focus. It also has tactile limits(can't simulate a sensation you don't know) and velocity limits for movement. My brain always uses fog or clouds to lower detail when I go too fast.
Anonymous No.106283409 >>106283483
>>106283160
GLM-4.5-FP8
https://files.catbox.moe/d37vbb.txt
Anonymous No.106283483
>>106283409
It's hilarious but I said that I wasn't doing it in instruct mode let alone let it think. I just prefilled with a tag list and an author's note stating it.
Anonymous No.106283511 >>106283586
>>106282787
You're mixing up two stories. The key thing that doesn't make sense and should have tripped a self-consistency check is the idea that lighter radioactives than uranium could be used for a fission bomb.

The guy who designed a bomb was a physics major at Princeton University named John Aristotle Phillips who in 1976 as a junior year project set out to prove that it was possible to design a nuclear bomb purely with publicly available information. According to Princeton University folklore there was one component he couldn't find any public information about but he got it by calling a nuclear scientist and asking him, but that's not on the Wikipedia page. In any case that specific component is now also described in non-secret sources. University lore also says the FBI confiscated the design but Wikipedia says that's not true, the project advisor graded it but didn't put it in the archive where other students could read it. But one thing everyone agrees is no attempt was made to actually build the bomb.

The kid who extracted radioactive isotopes from smoke detectors and other stuff was a 17-or-18-year-old Boy Scout named David Charles Hahn who in 1994 tried to build a breeder reactor. He did get the FBI called on him by local cops who believed he was making a bomb after he told them he had radioactive material. The last time I heard, in the early 2000s he was part of the crew of a nuclear submarine, but it seems like his life didn't go that well after that. By 2007 he was living on disability checks, having been diagnosed with paranoid schizophrenia, and got arrested for stealing smoke detectors from his apartment complex, and he died at age 39 in 2016 from a combination of alcohol, fentanyl, and benadryl.
Anonymous No.106283529
>>106283192
If >>106279743 has an opinion on how amazing Qwen3 480B A35B Coder is at coding he must have run it.
Anonymous No.106283542 >>106283559
>>106283332
https://files.catbox.moe/onnd6u.safetensors
>shejtano, 3d, blender \(medium\), horror \(theme\)
Anonymous No.106283559 >>106283604 >>106283619
>>106283542
For what model?
Anonymous No.106283577
>>106282769
I want something that can run locally. I know I've seen models that can (badly) in the past. I just can't remember what they were.
Anonymous No.106283586
>>106283511
>diagnosed with paranoid schizophrenia
should've gone into operating systems programming, baka
Anonymous No.106283604
>>106283559
looks like its for pony, is it possible to convert pony loras to illustrious?
Anonymous No.106283619
>>106283559
noobai vpred
Anonymous No.106283643 >>106284352 >>106284625
Years ago I ran a kobold model locally before I was as proficient with computers as I am now. I vaguely recall needing to partition it off separate from my main parts.
I'm reading through the documentation to get a proper understanding of running locally, and I want to know what are good or best practices for allocating partitions for LLMs. I have 2 separate drives but one nvme is encrypted and I understand that will make that drive unusable to store the assets on, so it's down to a non bootable nvme, my main extra media storage drive, and I want to be well calculated on managing its partitions to prevent the need to reshape them later on.
Anonymous No.106283651 >>106283662
>>106283057
Now do it again with temp 0
Anonymous No.106283662 >>106283671 >>106283675
>>106283651
For what purpose?
Anonymous No.106283665
decent b8
Anonymous No.106283671
>>106283662
For the purpose of properly comparing two quants.
High temperature is retarded for programming anyway.
Anonymous No.106283675
>>106283662
you gotta fuck around to find out
tinker tinker until you find the sweet spot for yourself.
Anonymous No.106283887 >>106283919 >>106284808
Recommended setup for running an LLM on my machine? I'm not looking for gooner stuff, I'm not concerned about it being super fast, and I would like some level of tuning capability.

I've got a GTX 1660 Super, and 32 GB RAM.
Anonymous No.106283919
>>106283887
gemma 3 270m if you're looking for something that you can finetune on this
Anonymous No.106283923
>>106278226
better pic for m'eta
Anonymous No.106283938 >>106283977
>>106283334
> can't simulate sensations you don't know.

sounds like a you problem.

i had many experiences in my LD before having them irl, and they were pm the same thing as irl.
Anonymous No.106283977 >>106283983 >>106284902
>>106283938
You can't tell me there aren't limits to it, be reasonable. For example: what would it feel like to touch a surface coated in carbon nano tubes? Good luck unless you're channeling data from the nether realm.
Anonymous No.106283983
>>106283977
>Good luck unless you're channeling data from the nether realm.
Which is basically what LD is.
Anonymous No.106284015 >>106284148
When are they going to put those dumb unified memory AMD AI Max whatever on a pcie card? It seems to draw like 150w.

They did it with the Intel xeon phi coprocessor using pcie as a direct interconnect between systems, it had a full linux distro n shit.
Anonymous No.106284117 >>106284123 >>106284780
How do i split the goof? I am making a 168GB V3 IK quant (and maybe something closer to 190 too) and could upload it to hf later on.
Anonymous No.106284123 >>106284283
>>106284117
https://github.com/ggml-org/llama.cpp/discussions/6404
Anonymous No.106284148
>>106284015
Makes me wonder if anybody is running local models on a Knight's Landing card.
Anonymous No.106284167 >>106284455 >>106284631
>>106281467
following up on this, I tried running vllm via WSL and it technically worked, it was pegging my CPU, but ultimately it didn't produce any outputs, so maybe CPU inference is broken. I may revisit this later at some point, because voxtral itself looks very promising when I tried it via the api. You can specify `timestamp_granularities="segment"` (it only accepts "segment" at the moment) and what you get back will include timestamps relative to the input audio file. Once you have that, it's straightforward to turn it into a regular SRT with a trivial python script. So the model itself is capable, but getting it to work locally seems a bit tricky (especially so without an nvidia gpu)
but even if you got it to run, I'm not sure how you would get the timestamps out of it locally, as it doesn't seem to be documented anywhere.
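The SRT half really is trivial once you have segment start/end times in seconds; the segment dict shape below is an assumption about what comes back, not the documented schema.

def to_srt_time(t: float) -> str:
    ms = int(round(t * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    lines = []
    for i, seg in enumerate(segments, start=1):
        lines += [str(i), f"{to_srt_time(seg['start'])} --> {to_srt_time(seg['end'])}", seg["text"], ""]
    return "\n".join(lines)

print(segments_to_srt([
    {"start": 0.0, "end": 2.5, "text": "こんにちは"},
    {"start": 2.5, "end": 5.0, "text": "元気ですか"},
]))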
Anonymous No.106284283 >>106284291
>>106284123
Thanks
Anonymous No.106284291
gaslighting the model into being uncensored by making it think everything is fictional has consequences.
>>106284283
make sure to upload it please <3
i wont be needing it but the more the merrier
Anonymous No.106284352
>>106283643
Things have changed since then
You either run colossal Chink MoEs or embrace your fate as a Nemo ramlet
Anonymous No.106284455
>>106284167
New canary model just dropped today, can do japanese https://build.nvidia.com/nvidia/canary-1b-asr
Anonymous No.106284605 >>106284661
>>106283334
You can remember (anamnesis) how it is to be hugged by women by remembering the form of "hug" and "woman" according to plato. Doesn't mean it is easy.

This is why there are blind painters.
Anonymous No.106284625
>>106283643
>I want to know what are good or best practices for allocating partitions for LLMs
If you plan to use Git LFS then be sure to format it XFS or some other filesystem that supports reflink / copy-on-write. Otherwise you end up with two copies of every file, one in the checkout and one in the .git directory somewhere, and waste half your disk space. Though I think people mostly just download the .gguf directly rather than using git, in which case it doesn't matter.

>I have 2 seperate drives but one nvme is encrypted and I understand that will make that drive unusable to store the assets on
I have all my models on an encrypted drive and it works fine. Typically everything gets loaded into RAM and/or VRAM before you start running it. Maybe you'd have trouble if you were streaming weights from SSD, but you probably don't want to do that anyway because it's slow as fuck.

>I want to be well calculated on managing its partitions to prevent the need to reshape them later on.
Use LVM2 and only allocate as much as you need and then it's easy to rearrange things as needed. LVM2, ext2/3/4, and XFS all support online resizing, so you can grow the LVM volume without even unmounting it.
Anonymous No.106284631
>>106284167
You will likely have to use the pytorch implementation if you want timestamps.
Anonymous No.106284661 >>106284785
>>106284605
I agree, you can get very close through fusion of forms but what I'm suggesting is a total novel "new" experience as fundamental as a hug or having sex. Something for which you have very sparse data and low possibility of inferring correctly.
Anonymous No.106284673 >>106284701
For ST stuff, putting summary inside block instead of beginning of the prompt should be a good option, I think. Would save you some cache invalidation when resummarizing at the very least. Maybe other character/lore shit belongs there as well.
>inb4 look at this vramlet with small context size
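rough sketch of the ordering I mean, nothing ST-specific; the "last 4 messages" depth and the bracket format are arbitrary placeholders:
```python
# Static stuff first, frequently-rewritten stuff (the summary) as late as possible,
# so re-summarizing only invalidates the tail of the prompt cache.
def build_prompt(system: str, card: str, history: list[str], summary: str) -> str:
    static_prefix = f"{system}\n\n{card}\n\n"             # never changes -> prefix cache stays valid
    older = "\n".join(history[:-4])                       # old turns, rarely touched
    injected = f"[Summary of earlier events: {summary}]"  # rewritten on every resummarize
    recent = "\n".join(history[-4:])                      # last few turns
    return static_prefix + older + "\n" + injected + "\n" + recent
```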
Anonymous No.106284701
>>106284673
Yep.
Also, the character card or an abridged version of it.
Anonymous No.106284780 >>106284798 >>106285085
>>106284117
I made my own IK quants of R1 and wanted to upload them to HF but I've got a 100GB limit on my account. Is it enforced? I don't want to ask to have it increased since I'm planning to keep them private.
Anonymous No.106284785 >>106284815
>>106284661
You can 100% experience sex through LD first. It's not fundamental. There are no sensations in sex that are novel, it's just a combination of other, more fundamental, sensations.
Anonymous No.106284798
>>106284780
People kept abusing the free storage by uploading terabytes of trash so the limits were put in place. Only way to increase them is to increase your number of downloads of public models. Hence the nonstop fintuner shilling.
Anonymous No.106284801 >>106285420
>>106278215 (OP)
is there a lora that can do 16x16 pixel art with 2 bit color palettes? preferably sdxl but anything except pony i guess.
Anonymous No.106284808
>>106283887
qwen3-4b the fastest or 30b-a3b a bit smarter with more ram cost
or ernie-21b for the middle ground
Anonymous No.106284815 >>106284857
>>106284785
There's a massive difference between "penor tingles", i.e. the mere mechanical stimulation, and the actual experience with someone you care about.
Anonymous No.106284857 >>106284864
>>106284815
There is nothing new or fundamental there. It is still only a combination of other basic sensations. I know it's hard to understand now, but once you are older you will realize this and stop putting pussy on a pedestal.
Anonymous No.106284864
>>106284857
I'm not putting anything on anything, unc.
Anonymous No.106284892 >>106284973
The one time I had sex in a dream it was like fucking air. It's like the brain didn't make any association with fucking a hand or something.
I vaguely remember a post about not being able to use guns properly in dreams until they used guns in real life.
Anonymous No.106284902
>>106283334
LD sucks for me due to the absolute, perfect control. I am a god in my dream and nothing happens unless I will it to be. No fun, no surprises, no excitement. My regular dreaming is where I'm happiest.

>>106283977
The tongue can imagine near every sensation, then you just imagine your entire body as a giant tongue. EZ game.
Anonymous No.106284973 >>106284999 >>106285009
>>106284892
Skill issue. On both sex and guns.
Anonymous No.106284999
>>106284973
I'm 80% sure the wordcel vs shape rotator divide is genetic
Anonymous No.106285009
>>106284973
No one beats the clock curse though, funny how that is.
Anonymous No.106285028 >>106285045
Chatbots are just software tulpas.
Anonymous No.106285045 >>106285048
>>106285028
The nice thing about them is that you don't have to give yourself schizophrenia to use them, although it always remains an option.
Anonymous No.106285048 >>106285206
>>106285045
They are a crutch for the imagination and discipline impaired.
Anonymous No.106285085
>>106284780
The limit is for private repos. I have a few public models and they eat 0 of my quota
Anonymous No.106285206 >>106285218
>>106285048
and this life is a crutch for those undisciplined enough to kill themselves
Anonymous No.106285218
>>106285206
and this life is a crutch for those not disciplined enough to kill themselves*
fml brb
Anonymous No.106285328 >>106285419
lol >>106283189
Anonymous No.106285419 >>106285428
>>106285328
jart still using the same pfp as he had during the start of the obama presidency lol
Anonymous No.106285420 >>106285537
>>106284801
If you're looking to have it generate pixel art game assets, I haven't found any loras that actually work (as of last year when I checked). Seems like a lot of loras are basically trained on Pixeljoint and have no idea how to maintain a consistent perspective. https://www.pixellab.ai/ claims to be able to make map tiles and such, but it's cloud shit
Anonymous No.106285427
How bad is glm air Q2 compared to Q3?
Anonymous No.106285428
>>106285419
not surprising as trannies don't exactly age well
Anonymous No.106285537 >>106285647 >>106285821
>>106285420
am i really forced to draw stuff myself like some sort of primitive caveman?
how is it ai can't generate simple sprites yet.
Anonymous No.106285546 >>106285608 >>106285613 >>106285626 >>106285635 >>106285806 >>106285886
Elon sir? When Grok 2 upload? Kindly do the needful.
Anonymous No.106285608
>>106285546
One more week
Anonymous No.106285613
>>106285546
Didn't he upload Grok 1 over the weekend? My faith in rocketman remains.
Anonymous No.106285626
>>106285546
sir why do not use gemma3 from google.
do you not even use the google chrome.
Anonymous No.106285635
>>106285546
He said that he'll make it happen this week. That doesn't mean that his team has gotten around to fully doing it yet.
Anonymous No.106285647
>>106285537
Well, it's been over 6 months since I messed with it, so there might be some newer stuff that can do it. I've been meaning to try some of the new img+text to image models to see if they can take a screenshot of the game + "add a fence around the house" and produce something reasonable that matches the style of the rest of the game assets. I was also thinking of trying to train a "16-bit JRPG" style lora using stuff from vgmaps.com
Anonymous No.106285806
>>106285546
Coming in five hours
Trust
Anonymous No.106285821
>>106285537
I wonder if you could find a way to get an animation out of Wan then parse it and generate moving sprites from that
Anonymous No.106285826 >>106285840 >>106285850 >>106285873 >>106285902 >>106287739
>Deepseek confirmed out of the game for this circle after falling for the Huawei meme (or getting forced to use them by big Xi)
>Mistral just buried Large 3 by releasing (proprietary) Medium 3.1 and calling it their "frontier model" in the API docs
This might be really it for the summer releases. I hope you like what you have right now because you'll be stuck with it for the next couple of months at least.
Anonymous No.106285840 >>106285914
>>106285826
>Deepseek confirmed out of the game for this circle after falling for the Huawei meme
why does that mean they're out of the game?
Anonymous No.106285850
>>106285826
>Xi: You humiliated the west, great work. What do you need to make that happen again.
>DS: More compute. Lots more compute.
>Xi: I gotchu.
Thanks to Qwen and GLM, I'm all set for the next few months at least.
Anonymous No.106285856 >>106285885 >>106285909
https://www.reddit.com/r/StableDiffusion/comments/1mr602e/chroma_comparison_of_the_last_few_checkpoints/
Chroma? More like chroma that shit in the trash lmao. Furries apparently trained 1-48 on 512x images and only trained 49 and 50 (final) on 1024x. What a bunch of retards. No wonder it looks like shit.
Anonymous No.106285873 >>106285919
>>106285826
Bwo, your
Step?
Ernie?
GLM?
Qwen?
Kimi?
Anonymous No.106285885
>>106285856
I don't know why Chroma saw any adoption at all. Some retard came here shilling it and every example he posted was trash in appearance and prompt following.
Anonymous No.106285886 >>106285912
>>106285546
take your meds schizo
Anonymous No.106285902
>>106285826
Guess I'll have to keep enjoying Kimi and Dipsy. No big deal. A bit disappointing though, I was really hoping Mistral would cook, but they went the dirty route of distilling from a distill and now their model. Writes. Like. This.
Anonymous No.106285909 >>106285951
>>106285856
Chroma STILL has no artist knowledge
Anonymous No.106285912
>>106285886
Your AGI, saar.
Anonymous No.106285914 >>106285933
>>106285840
Deepseek made headlines after R1 came out because it became known that they write a lot of their own low-level hardware code instead of just relying on stock CUDA, even when training on Nvidia.
They likely did that to a much bigger extent for the underdeveloped Huawei architecture as well, with lots of optimizations specifically to help train the unique aspects of DS4. This isn't something they can just port back over to Nvidia in a couple of weeks and throw on their H800 cluster. They likely can't just reuse their DS3 code either, unless DS4 is just a shitty incremental improvement.
It'll take months to recover from this.
Anonymous No.106285915
>the/her/his {sound} cuts through the {atmosphere descriptor} like a {thing}
GLM 4.5, stahp... don't make me go back to Q1 DS v3
Anonymous No.106285919
>>106285873
kys zoomer tourist
Anonymous No.106285933 >>106285949
>>106285914
That's unfortunate, although I can't really say it's a problem for me because I'll never have enough hardware to run deepseek. And it was never worth spending $10-20k on a computer just to run some shitty AI.
At least Qwen and Z.ai are still releasing relatively small models that I can actually use
Anonymous No.106285949 >>106286011
>>106285933
>And it was never worth spending $10-20k on a computer just to run some shitty AI.
grapes
Anonymous No.106285951 >>106286041
>>106285909
Why not? Pony, illustrious and noob have it. Are they... just retards with too much money?
Anonymous No.106285989 >>106286015 >>106286025 >>106286028 >>106286062 >>106286087 >>106286243 >>106286267 >>106286309
From the slop profile it's easy to see what are true foundational models and what are distills.

The only foundational models:
ChatGPT models
Gemini
Claude
Llama
Old mistral models (new ones are just ChatGPT 4o distill)
Old V3/R1 (new V3 is ChatGPT 4o distill, new R1 is Gemini distill)

Everything else is a distill.
Anonymous No.106286011
>>106285949
Not him but I've used Claude 3+ and enjoyed it a ton more than any local model at the time. It felt like it really understood what you were looking for without you even directly telling it. It was amazing, relatively. But I would still not pay more than 5k if given the offer to run it locally. It just still wasn't smart enough. And still isn't, really.
Anonymous No.106286015 >>106286167
>>106285989
>The only foundational models:
>ChatGPT models
>Gemini
>Claude
>Llama
The only open models of these is llama and those are likely finished. It's nice we get stuff from China, but it feels like the Golden Age of local models is behind us, even if we don't realize it yet.
Anonymous No.106286025 >>106286036 >>106286074
>>106285989
Also interesting that everyone is distilling Gemini and ChatGPT, but no one is distilling Claude despite it being just as capable, if not better.
Anonymous No.106286028
>>106285989
>Old mistral models (new ones are just ChatGPT 4o distill)
Other than the super new ones, which are literally just Deepseek
Anonymous No.106286036 >>106286058
>>106286025
Did Claude or Grok ever even reveal their thinking to make them targets for distillation in the first place?
Anonymous No.106286041 >>106286088 >>106286091
>>106285951
Clearly they don't have enough money, because NAI, which is also based on SDXL, has artist knowledge down to artists with only ~50 pics on the boorus.
Anonymous No.106286058
>>106286036
Yeah, Claude gives you the full reasoning process through the API at least. I guess Opus 4 is too new to have influenced the current generation of models and the chinks didn't want to distill 3.7 Sonnet for not technically being part of a flagship line in name.
Anonymous No.106286062
>>106285989
Llama and Mistral are distilled from GPT 3.5 and 4. Llama unintentionally (they believed that ScaleAI data was human data) and Mistral intentionally.
Anonymous No.106286074 >>106286109
>>106286025
Low lmarena scores and it's more expensive.
Anonymous No.106286087
>>106285989
>foundational models
I don't think that means what you think it means anon. You can do a pretrain off pre-2020 internet with no synthetic data and then tune on GPT so it has a GPT slop profile, but the base model will still be a separate model not distilled from anything.
Anonymous No.106286088
>>106286041
Maybe they didn't train enough on tags and instead used one of those slop-producing image recognition models.
Anonymous No.106286091
>>106286041
I wonder, are there training techniques that give artists or tags with fewer examples more iterations, or other methods of giving them more weight or presence in the dataset?
Anonymous No.106286109 >>106286146
>>106286074
Claude has the largest marketshare for enterprise and coding. It's actually good without being benchmaxxed and ELOmaxxed.
https://menlovc.com/perspective/2025-mid-year-llm-market-update/
Anonymous No.106286146
>>106286109
People who distill usually only care about benches
Anonymous No.106286160 >>106286172
There is still one (1) big open release upon us before things go quiet until December. Get ready.
Anonymous No.106286167 >>106286190
>>106286015
To be fair, I think there's a very real possibility that the golden age of LLMs will soon be behind us altogether
Maybe Google will pull something crazy out of its ass with Gemini 3 that raises the bar again, but GPT-5 sure wasn't fucking it, and I suspect Claude 4.1 Opus was originally supposed to be a bigger version jump
Feels like we need another breakthrough
Anonymous No.106286172 >>106286191
>>106286160
Who? Grok? Gemma 4?
Anonymous No.106286190 >>106286198
>>106286167
Imagine if they could combine Genie with Gemini. It internally creates a world model when you ask it something. Essentially imagining scenarios in its head. No more issues with spatial intelligence, anatomy, etc. Complete understanding of 3D spaces and physical common sense.
Anonymous No.106286191 >>106286211 >>106286239
>>106286172
Either Mistral or DeepSeek, and both are likely to be massive disappointments
Anonymous No.106286198 >>106286299
>>106286190
That is LeCun's dream. Would laugh if someone beat him to it.
Anonymous No.106286211 >>106286286
>>106286191
Mistral sure will be a disappointment, Dipsy will be another incremental update like 0324 and 0528
Anonymous No.106286239
>>106286191
Mistral just indirectly killed off Large 3. Deepseek won't deliver anything for a while after the Huawei fuckup.
Anonymous No.106286243
>>106285989
Can you do this for a few sloptuners, comparing the base models with their slop versions? Would be the closest thing to answering the question of whether slop tunes even do anything.
Anonymous No.106286267
>>106285989
>Pixtral-Large 120b is more like Nemo than Mistral Large 120b
Did everyone sleep on the true 120b Nemo just because it was marketed as a vision model??
Anonymous No.106286286
>>106286211
even if it's just an incremental update, as long as it has an updated dataset with more recent information i'll be glad to use it.
Anonymous No.106286299
>>106286198
Well, LeCun is basically a consultant who's just named "head of research", so that's obvious. Research at Meta is a political hellscape with limited budget. Maybe things will be different with the Superintelligence team shaking up the company's approach to AI, but maybe not in a good way. I fully suspect that if things keep going the way they are, Google will be the first to make an effective world model + LLM system.
Anonymous No.106286309 >>106286318
>>106285989
>Old mistral models
Doesn't that graph say that they're all "distilled" from gpt-4o-mini?
Anonymous No.106286318 >>106286351
>>106286309
no
Anonymous No.106286351
>>106286318
I can't see how it's any different from "chatgpt-4o-latest-2025-01-29" on the other side.
Anonymous No.106286362 >>106286372 >>106286387
https://xcancel.com/Alibaba_Qwen/status/1956894352791511164
qwen image edit soon, probably monday
Anonymous No.106286372 >>106286404 >>106286408 >>106286454 >>106286720
>>106286362
I wish Qwen was good. Imagine if there was a halfway competent company that puts out new stuff this frequently.
Anonymous No.106286387
>>106286362
I'm more excited for their image and video models than for their benchmaxxed llms.
Anonymous No.106286404
>>106286372
hard to be good when brain drain takes all of your talent away
Anonymous No.106286408 >>106286506
>>106286372
take the red pill: qwen is actually good
Anonymous No.106286411
Working on a project where I made a program that parsed and mapped out thousands of choose your own adventure paths and story lines. I want to tag and classify them better since I can't read them all. Keyword tagging isn't super effective. Any recommendations for a self-hostable, nsfw-friendly model that would help with either tagging or summarizing stories that can be up to ~300k characters (I can chunk if needed for context length, that's fine)
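the plumbing side is simple enough regardless of which model gets suggested. rough sketch of the chunk-then-merge pass I have in mind, assuming a local OpenAI-compatible server on port 8080; the model name, chunk size, tag list and prompts are all placeholders, not recommendations:
```python
# Two-pass approach: summarize each chunk, then merge the partial summaries and tag.
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="none")
TAGS = ["romance", "horror", "nsfw", "comedy"]  # hypothetical tag set

def chunk(text: str, size: int = 24_000) -> list[str]:
    """Split a story into pieces that fit the model's context window."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def summarize_story(story: str) -> str:
    partials = []
    for piece in chunk(story):
        r = client.chat.completions.create(
            model="local",  # most local servers accept or ignore the name
            messages=[
                {"role": "system", "content": "Summarize this story excerpt in 5 sentences."},
                {"role": "user", "content": piece},
            ],
        )
        partials.append(r.choices[0].message.content)
    # Second pass: merge partial summaries and pick tags.
    r = client.chat.completions.create(
        model="local",
        messages=[
            {"role": "system",
             "content": f"Merge these partial summaries into one paragraph and list any matching tags from: {', '.join(TAGS)}."},
            {"role": "user", "content": "\n\n".join(partials)},
        ],
    )
    return r.choices[0].message.content
```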
Anonymous No.106286454
>>106286372
I thought they were 100% incompetent through the first 3 releases
Since the recent 2507 updates, both the instruct and thinking models are actually good. Image model too
I just really hope they don't drink from the slop chalice OSS put in front of them for their next models
Anonymous No.106286506 >>106286532 >>106286792
>>106286408
i dunno man, i wish i could agree with you but i feel like glm 4.5 air puts out better stuff than qwen 3 235b
Anonymous No.106286532 >>106286550
>>106286506
NTA but Air is nice for its size, though it REALLY shows that it's half the parameters of qwen 235b in things like spatial reasoning.
I really wish it was better, because I can run it so much faster.
Anonymous No.106286550
>>106286532
Yeah, same here
Its style and knowledge base are fantastic, but when it actually comes time to put the square peg in the square hole you can see its cracks really quickly. Especially for things like programming
Anonymous No.106286608 >>106286618 >>106286744 >>106286793
Let's say that I actually do want to ask some STEM questions. Is GPT-OSS the best model under 300B for that?
Anonymous No.106286618
>>106286608
Qwen is better
Anonymous No.106286720
>>106286372
2507 is good.
Anonymous No.106286744
>>106286608
Qwen 235B (latest revision), then gpt-oss-120b, then GLM 4.5 Air, then nothing, because nothing else is good. Also be warned that oss will always try to conjure up some bizarre reason why your question is harmful so it can refuse. You should only use it if you're manually prompting and rerolling and/or prefilling; in an automated workflow it's unreliable unless you have a way to catch and correct refusals.
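if you do automate it anyway, the crude workaround is a detect-and-reroll loop. rough sketch against a local OpenAI-compatible server; the port, model name and refusal phrases are placeholders, and the refusal check is a dumb substring heuristic:
```python
# Retry when the answer looks like a refusal, nudging temperature up each attempt.
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="none")
REFUSAL_MARKERS = ("I can't help", "I cannot", "I'm sorry, but")  # dumb heuristic

def ask(question: str, retries: int = 3) -> str:
    answer = ""
    for attempt in range(retries):
        r = client.chat.completions.create(
            model="gpt-oss-120b",             # whatever name your server exposes
            temperature=0.7 + 0.2 * attempt,  # nudge it on retries
            messages=[{"role": "user", "content": question}],
        )
        answer = r.choices[0].message.content
        if not any(m.lower() in answer.lower() for m in REFUSAL_MARKERS):
            return answer                     # looks like a real answer
    return answer                             # give up, return the last attempt
```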
Anonymous No.106286763 >>106286787
>>106278215 (OP)
Any new open source music generator?
Anonymous No.106286787 >>106286798
>>106286763
Ace Step
Anonymous No.106286792
>>106286506
air is a nice model for sure, but nu-235b's smarts and better long context make it the easy winner in that comparison for me
Anonymous No.106286793
>>106286608
gpt 120 and 20 are pretty good at graduate-level math. They sometimes hallucinate on plain arithmetic though, but you can fix that with a strict prompt
Anonymous No.106286798 >>106287078
>>106286787
Also DiffRhythm
Anonymous No.106286800 >>106286844
What did they change with the kobold.cpp partial layer offloading?
In the last version I could find my GPU layers sweet spot easily, because kobold.cpp just crashed when it ran out of VRAM.
Now I noticed that it lets me offload 90/90 layers of a 16 GB model onto an 8 GB VRAM card, but it just slows down. Now I actually have to run several tests looking at my tokens/second to find the sweet spot, because it no longer crashes if I give it too many layers.

Example: Fallen-Gemma3-27B-v1c-Q4_K_M.gguf on a 8GB card.

Offloaded layers 30: 4.1 tokens/second (sweet spot)
layers 20: 3.4 t/s (too few)
layers 40: 3.2 t/s (too many)
Anonymous No.106286824 >>106286834 >>106286841
"mistral, do you have memory/persistance?"
>"yes"
"kk cool, is there a file or directory for whatever you plop onto your base model so I can make backups?"
>(vague answer but basically "yeah bro for sure totally man")
"k watevs i can Jewgle it later xD"
"ok cool, here are some parameters-oops, wrong .txt file containing the specs for every computer I own"
(restart server because it's thinking 5evar)
"whoops lol, ignore previous prompt misinput"
>"what previous input? I am le language modeblablablabla"
junk. JUUUUUUUUUUUUUUNNNK!!!! USELESS
let me know when I can restart my server without losing all progress in making myself a personal command line Amagi, till then this shit is worthless lmao
Anonymous No.106286834
>>106286824
Did Mistral write this post
Anonymous No.106286841
>>106286824
Pure user issue.
Anonymous No.106286844 >>106286855
>>106286800
From the sounds of things, partial offloading just didn't work on your system in the older versions. Divide the total filesize of your model by the total number of layers and leave some room for context.
Gemma 3 specifically eats a shitload of RAM for context unless you play with the SWA setting, but you can't use that with context shift so it's not worth doing.
With an 8GB card you should really be using Gemma 12b.
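To put numbers on it, using what you posted (16 GB file, the 90 layers kobold reports, 8 GB card) and guessing ~2 GB reserved for context and compute buffers:
```python
# Back-of-the-envelope layer estimate; the 2 GB reserve is a guess and varies
# with model, context size, and settings.
model_gb = 16
n_layers = 90
vram_gb = 8
reserve_gb = 2  # KV cache + compute buffers, rough guess

per_layer = model_gb / n_layers                   # ~0.18 GB per layer
fits = int((vram_gb - reserve_gb) / per_layer)
print(f"{per_layer:.2f} GB/layer, try around {fits} GPU layers")  # ~33, close to the 30 you found
```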
Anonymous No.106286855 >>106286865 >>106286872
>>106286844
I was mainly playing around with models that fully fit into my 8 GB VRAM for the last year, but I just got a Ryzen 9800X3D wth 64GB RAM and want to see how fast can I go with partial offloading and using some CPU. I'm a slow reader, so 4 t/s is enjoyable.
Anonymous No.106286865 >>106286875
>>106286855
Also forgot to ask, maybe I should use some MoE model instead if I want to test my CPU's capabilities? I'm not very familiar with the topic since I had a really shitty CPU before this and could only use models that fully fit into VRAM.
Anonymous No.106286872 >>106287092
>>106286855
You might look into trying newer MoEs like GLM 4.5 air then, that'll make use of your memory while still being usable speed-wise.
Anonymous No.106286875 >>106287092
>>106286865
Yes. With those specs you could use GLM 4.5 air Q3KS with all experts on the CPU and the rest of the model in the GPU and a decent amount of context.
Anonymous No.106286896 >>106286920
Can you guys think of anything cool one could do with this?
Anonymous No.106286920
>>106286896
1. Tune lora A on author A and lora B on author B
2. Let each generate one token at a time (rough sketch below)
3. ???
4. Reduced chance of slop and repetition!
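step 2 in code, roughly; a sketch assuming transformers + peft, with a hypothetical base model and LoRA paths, and greedy decoding just to keep it short:
```python
# Alternate two LoRA adapters on the same base model, one token per step.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_name = "some/base-model"  # placeholder
tok = AutoTokenizer.from_pretrained(base_name)
base = AutoModelForCausalLM.from_pretrained(base_name, torch_dtype=torch.float16, device_map="auto")

# Attach both author LoRAs as named adapters on the same base model.
model = PeftModel.from_pretrained(base, "loras/author_a", adapter_name="a")  # placeholder path
model.load_adapter("loras/author_b", adapter_name="b")                       # placeholder path

ids = tok("Once upon a time", return_tensors="pt").input_ids.to(base.device)
for step in range(100):
    model.set_adapter("a" if step % 2 == 0 else "b")  # alternate authors per token
    with torch.no_grad():
        logits = model(ids).logits[:, -1, :]
    next_id = torch.argmax(logits, dim=-1, keepdim=True)  # greedy for simplicity
    ids = torch.cat([ids, next_id], dim=-1)

print(tok.decode(ids[0], skip_special_tokens=True))
```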
Anonymous No.106287078
>>106286798
DiffRhythm was terrible last time I used it; the only two somewhat viable open musicgen models I've seen are Ace-Step and YuE.
Anonymous No.106287092 >>106287108
>>106286872
>>106286875
Amazing. Thanks anon.
I didn't even play with the tensor regex filters, and I have a 52 GB model running at 7 t/s. This is pure magic.
Anonymous No.106287108
>>106287092
>I didn't even play with the tensor regex filters
If all you are doing is moving experts to RAM, there's a new --n-cpu-moe parameter that simplifies things a bit.
For example, if you just do exp=CPU with GLM, you end up moving the shared experts to RAM too, so you'd actually want exps=CPU to move only the routed ones. The new argument abstracts that away and just moves the non-shared experts to RAM.
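for what it's worth, the two forms side by side, sketched as a Python launcher; the gguf filename is a placeholder and -ngl 99 / --n-cpu-moe 99 just mean "all layers":
```python
# Two roughly equivalent ways to keep the routed experts in system RAM with llama-server.
import subprocess

MODEL = "GLM-4.5-Air-Q3_K_S.gguf"  # hypothetical filename

# Older override-tensor form: the "exps" regex matches the routed expert tensors
# but not the shared "shexp" ones, so the shared experts stay on the GPU.
old_style = ["./llama-server", "-m", MODEL, "-ngl", "99", "-ot", "exps=CPU"]

# Newer form: --n-cpu-moe N keeps the routed experts of the first N layers in
# system RAM and leaves everything else where -ngl put it.
new_style = ["./llama-server", "-m", MODEL, "-ngl", "99", "--n-cpu-moe", "99"]

subprocess.run(new_style)  # or subprocess.run(old_style); the placement ends up roughly the same
```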
Anonymous No.106287218
>>106287207
>>106287207
>>106287207
Anonymous No.106287739
>>106285826
>Mistral just buried Large 3 by releasing (proprietary) Medium 3.1 and calling it their "frontier model" in the API docs
They called Medium 3.0 frontier-class too, nigger.
https://web.archive.org/web/20250507144143/https://docs.mistral.ai/getting-started/models/models_overview/