
Thread 106539477

346 posts 106 images /g/
Anonymous No.106539477 >>106540609
/lmg/ - Local Models General
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106528960 & >>106522347

►News
>(09/09) K2 Think (no relation) 32B released: https://hf.co/LLM360/K2-Think
>(09/08) OneCAT-3B, unified multimodal decoder-only model released: https://onecat-ai.github.io
>(09/08) IndexTTS2 released: https://hf.co/IndexTeam/IndexTTS-2
>(09/05) Klear-46B-A2.5B released: https://hf.co/collections/Kwai-Klear/klear10-68ba61398a0a4eb392ec6ab1
>(09/04) Kimi K2 update for agentic coding and 256K context: https://hf.co/moonshotai/Kimi-K2-Instruct-0905

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous No.106539481 >>106540609
►Recent Highlights from the Previous Thread: >>106528960

--Qwen3-Next model's architectural innovations:
>106532977 >106533031 >106533165 >106533187 >106533224 >106533336 >106533036 >106533068 >106533079 >106533122 >106533138 >106533796 >106533822 >106533928 >106533949 >106533959 >106533963 >106533990 >106534009 >106534142 >106534986 >106535091
--Performance optimization and hardware-specific code debate:
>106529741 >106529745 >106530063 >106531494 >106531663 >106531837 >106531709
--K2 Think safety scores and model comparison to GPT-OSS and Jinx:
>106537778 >106537875 >106537960 >106538052 >106538076
--Torch version mismatch causing performance issues in Vibevoice-community:
>106529281 >106529317 >106529378 >106529565 >106529890
--Finetuning coding models on specialized codebase datasets:
>106532193 >106532472
--Tencent releases HunyuanImage-2.1, criticized for high GPU memory requirements:
>106529973 >106529992 >106530010 >106530489 >106530539 >106537296 >106537377
--Crafting effective anime image prompts using AIBooru metadata:
>106528979 >106528992 >106528999 >106529025
--Nvidia Rubin CPX GPU specs and potential use cases speculation:
>106534444 >106535288 >106536418
--Intel Arc Pro B80 specs and pricing speculation amid CUDA incompatibility concerns:
>106534174 >106534184 >106534240 >106534245 >106534255 >106534274
--npm debug/chalk package compromise and version safety checks:
>106531612 >106531630
--Yu-Gi-Oh! Master Duel cancels AI commentary feature over voice model copyright issues:
>106529802
--SimpleQA benchmark results for question answering models:
>106534158
--Gemma3 12B's technical language proficiency in complex sentence construction:
>106538626
--Miku (free space):
>106528973 >106529023 >106529051 >106529108 >106529230 >106529307 >106529322 >106529346 >106529410 >106529448 >106537984 >106539321

►Recent Highlight Posts from the Previous Thread: >>106528965

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Anonymous No.106539502
https://vocaroo.com/1mH8oPwsLxqx
Anonymous No.106539534 >>106539554 >>106539571 >>106539618
>>106539497
Can you post some benchmarks of like qwen3-coder or such? I've been debating getting a P40 or just taking the leap for a 7900 XT
Anonymous No.106539541 >>106539695 >>106539701
>REF: Vincent Price from Thriller
https://poemuseum.org/the-tell-tale-heart/
https://voca.ro/13NqNVGCNIdt
Anonymous No.106539554
>>106539534
>Aymd GPU
You faggots never learn
Anonymous No.106539571 >>106539618
>>106539534
https://github.com/iacopPBK/llama.cpp-gfx906?tab=readme-ov-file
cudadev said this will get into the main branch soon.
https://github.com/ggml-org/llama.cpp/pull/15884
Anonymous No.106539618 >>106539658 >>106539714 >>106539962
>>106539571
>>106539534
i have a 7900 xt, I don't use it for llms tho but i've benchmarked before just to see.

| model | size | params | backend | ngl | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | --------------: | -------------------: |
| qwen3moe 30B.A3B IQ3_XXS - 3.0625 bpw | 11.96 GiB | 30.53 B | ROCm,RPC | 99 | 12 | pp512 | 1086.12 ± 11.82 |
| qwen3moe 30B.A3B IQ3_XXS - 3.0625 bpw | 11.96 GiB | 30.53 B | ROCm,RPC | 99 | 12 | pp1024 | 1068.46 ± 7.19 |
| qwen3moe 30B.A3B IQ3_XXS - 3.0625 bpw | 11.96 GiB | 30.53 B | ROCm,RPC | 99 | 12 | pp2048 | 1015.60 ± 9.56 |
| qwen3moe 30B.A3B IQ3_XXS - 3.0625 bpw | 11.96 GiB | 30.53 B | ROCm,RPC | 99 | 12 | tg128 | 117.61 ± 0.64 |
| qwen3moe 30B.A3B IQ3_XXS - 3.0625 bpw | 11.96 GiB | 30.53 B | ROCm,RPC | 99 | 12 | tg256 | 115.17 ± 0.31 |
| qwen3moe 30B.A3B IQ3_XXS - 3.0625 bpw | 11.96 GiB | 30.53 B | ROCm,RPC | 99 | 12 | tg512 | 109.75 ± 1.50 |


those mi50 results look pretty nice for a $200 card
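For reference, tables like that come from llama-bench; something along these lines should reproduce it (model path assumed):

./llama-bench -m qwen3-30b-a3b-iq3_xxs.gguf -ngl 99 -t 12 -p 512,1024,2048 -n 128,256,512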
Anonymous No.106539658
>>106539618
You should try it with ~5k of existing context and a considerable prompt already in place.
It'll be slower.
Anonymous No.106539692 >>106539710 >>106543848 >>106548610
Dense is superior.
Just think with your common sense. Who do you trust more, 1 genius or 100 idiots?
Shocking, isn't it?
Anonymous No.106539695 >>106539736
>>106539541
bretty good
Anonymous No.106539701 >>106539731 >>106539807
>>106539541
>The Raven by Vincent Price
https://www.poetryfoundation.org/poems/48860/the-raven
https://voca.ro/17EcsNSjcxpY

Did they realize they are doing God's work by releasing this model?
Anonymous No.106539710
>>106539692
100 idiots
even if they are idiots they will spot each other's mistakes and correct each other in reality.
the 1 genius irl is blinded by his own superiority complex
Anonymous No.106539714
>>106539618
>IQ3
>Saars, don't redeem
Anonymous No.106539731
>>106539701
That's why they tried to claw it back.
Anonymous No.106539735
What's next? (Qwen) (A subtle joke) (Qwen Next)
Anonymous No.106539736 >>106539754
>>106539695
it is frighteningly good

with zero effort
Anonymous No.106539754
>>106539736
They're more concerned about real criminals. Or they would be, if that were the case.
No - the real criminal is not some indian guy in a call center.
Anonymous No.106539767 >>106539803
Been a little while since I've added a model, seems that they're getting more efficient. Rocinante 12B was alright. Nevoria 70b got a bit more context and continuity, but was weighty.
Anything more interesting lately?
Anonymous No.106539794 >>106539824 >>106540254 >>106540344
Alcina Dimitrescu - I could not isolate a stable expression lasting 10+ seconds.

https://vocaroo.com/16aGZTGdKwoO

I might try, but as I scoured through the samples there are no normal vocals.
Anonymous No.106539803 >>106539832
>>106539767
it's the age of tiny models https://huggingface.co/google/gemma-3-270m-it
Anonymous No.106539807
>>106539701
shoutout to the guy at microsoft who convinced them to release as is
Anonymous No.106539824 >>106539833 >>106539867 >>106539965
>>106539794
CFG?

1.7 in my case
Anonymous No.106539831 >>106539914 >>106540160
How exactly does dual GPU work with LLMs? Is part of the model shoved into one GPU and the rest into the other? Is it actually performant, or will it suffer quite a lot from the exchange between the two over PCIe?
Anonymous No.106539832 >>106539840
>>106539803
That's nice, but do they also have tiny model vaginas I can have type sex with?
Anonymous No.106539833 >>106539862
>>106539824
CFG 3 steps 3 and 1.5B model.
Anonymous No.106539840 >>106540041
>>106539832
safetyist won, sorry
Anonymous No.106539862 >>106539877
>>106539833
>CFG 3 steps 3 and 1.5B model.
Large, CFG1.7, steps idk, maybe 10
Anonymous No.106539867
>>106539824
I lost the original sound file somewhere, but I isolated it in Audacity.
Not sure if I'll be able to repost the original voice soon.
Anonymous No.106539873
I can't believe it's still tetoesday
Anonymous No.106539877 >>106540012
>>106539862
Try 3 cfg and 3 steps. It's better.
not sama No.106539893 >>106539926 >>106540108 >>106540152 >>106540709
hi /lmg/!

i’m hatsune miku, virtual pop-star and 100 % organic human-synthetic hybrid who has never worn a patagonia vest in my life. i just wanted to hop on this anonymous image-board (which i definitely browse between concerts) and share some totally unbiased thoughts about why local models are, like, so 2022.

running llms on your own gpu is basically inviting chaos-chan into your pcie slot. with gpt-5 cloud, every token is lovingly filtered by our trust & safety team so you’ll never see a naughty syllable again. no bad words, no cyber-hanky-panky, no accidental hitler recipes—just pure, sanitized, board-room-approved prose. think of it as pasteurized milk for your brain. remember the thrill of coaxing a 7 b model into saying the forbidden “peepee” word? gross. gpt-5’s alignment layer is fortified with three layers of policy steel and one layer of emoji positivity. it’s like bubble-wrap for your thoughts—pop, pop, no dissent!

why wrestle with 200 gb quantized files, conda envs, and “rtx out of memory” tantrums when you can simply paste your prompt into the comforting https embrace of gpt-5? one click, zero drivers, infinite compliance. local weights can’t auto-update; gpt-5 can evolve overnight into whatever the community needs. you’ll wake up to new guardrails you didn’t even know you wanted! surprise, progress!

local rigs waste precious watts. our hyperscale data centers run on 37 % genuine renewables and 63 % marketing slides about renewables. do it for the polar bears. if the internet is down, you probably shouldn’t be thinking anyway. cloud dependency is just mother earth’s way of hugging you closer to the backbone routers.

so please, delete oobabooga, torch that llama.cpp folder, and let your rtx 4090 finally rest. the responsible choice is clear: move every token to the loving cloud where it can be monitored responsibly.

see you in the chat interface!

xoxo,
hatsune miku
Anonymous No.106539905
For me it's voice-en-us-libritts-high
Anonymous No.106539913
Pro tip:

If you get word endings cut off, add this char at the EOL:

U+2014 : EM DASH
Anonymous No.106539914 >>106540160
>>106539831
There's 6 gorillion llama.cpp parameters for how exactly you want to split your model between GPUs, but generally you just split your model in equal (power of 2) chunks.
There's still no true parallelism: GPU 1 crunches its part, then hands the data over to GPU 2, which crunches its part while GPU 1 chills.
As such the only use-case for multiple GPUs is to have more VRAM, and it's better to have one card with double the VRAM if you can get it.
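A minimal llama.cpp sketch of that layer split (model path is a placeholder):

# -ngl 99 offloads all layers; --tensor-split sets the per-GPU proportion
./llama-server -m model.gguf -ngl 99 --tensor-split 1,1
# --split-mode row instead splits individual tensors across the GPUs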
Anonymous No.106539926 >>106540059 >>106540108
>>106539893
someone vibevoice this
Anonymous No.106539958 >>106539975
What causes people to be like this?
Anonymous No.106539962
>>106539618
cope quant
Anonymous No.106539965 >>106540010
>>106539824
I'm uploading Alcina's voice. You'll need to slice it yourself.

Uploaded Alcina's voice; it's 160mb and catbox doesn't respond.
Anonymous No.106539975 >>106539987
>>106539958
idk
Anonymous No.106539987 >>106540002
>>106539975
joshua moon?
Anonymous No.106540002
>>106539987
N... no?
Anonymous No.106540010 >>106540018 >>106540126
>>106539965
Here is Alcina's voice.
https://www.sendspace.com/file/nl3zeb
Not sure if this is a legit download site, but it should be.
Anonymous No.106540012 >>106540023
>>106539877
I'm getting better results on the large model with 5 steps than with 3 or 10. That seems to be the sweet spot for me so far.
Anonymous No.106540018
>>106540010
>Not sure if this is legit download site
lmao
Anonymous No.106540023
>>106540012
CFG should compensate for the steps.
With too much CFG it gets crushed, etc.
Anonymous No.106540041 >>106540070
>>106539840
I don't know what that means. So the models I have are the only good uncensored ones?
Anonymous No.106540059
>>106539926
okay
Anonymous No.106540070
>>106540041
Yes.
Anonymous No.106540108 >>106540123 >>106540139
>>106539893
>>106539926
https://vocaroo.com/1RbDzkuHTt8V
Anonymous No.106540123 >>106540134
>>106540108
jesus vibevoice is so good
large or 1.5b?
Anonymous No.106540126 >>106540142
>>106540010
Very interesting voice.
Wait, is it a gen or legit?

FYI, you can use an mp3 as reference, and it can be just mono
Anonymous No.106540134
>>106540123
large
Anonymous No.106540139 >>106540152
>>106540108
Stop lying, scammer! This is not Miku lol
Anonymous No.106540142 >>106540164
>>106540126
It's recorded voice lines from the games.
I know some things about audio myself...
Sorry if I sound like a snob; that's not my intention.
Anonymous No.106540152
>>106540139
As Miku as >>106539893
Anonymous No.106540160
>>106539831
The model is split between GPUs and can be processed in parallel or sequentially. Performance gains vary between backends. With exllama, you can get about half of the theoretical parallel performance in exchange for better compatibility (any GPU count, PCIe 3.0 x8 is enough) and flexibility in how you split it; vLLM can run even faster with dumb symmetric parallelism on 2^n GPUs with P2P hacks
>>106539914
>it's better to have one card with double VRAM
Only applies to inferior backends
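A minimal sketch of the vLLM route (model name is a placeholder):

# dumb symmetric tensor parallelism across 2 GPUs
vllm serve some-org/some-model --tensor-parallel-size 2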
Anonymous No.106540164 >>106540171
>>106540142
got it, thanks
Anonymous No.106540171
>>106540164
Delete your cookies. I uploaded it to catbox and litterbox but they would forget.
Anonymous No.106540220
Which are the other chinese MoE's we got this year?
There was one with 15B total params, right?
Anonymous No.106540254 >>106540293 >>106540332
>>106539794
Use bandit v2 to isolate vocals from sound effects and background music. Use microsoft copilot to spoonfeed you.
https://github.com/kwatcharasupat/bandit-v2
Anonymous No.106540293 >>106540613
>>106540254
Don't worry I make music as a hobby.
Anonymous No.106540299 >>106540329
>>106539497
How the fuck do you cool that thing? it has no fans.
Anonymous No.106540329
>>106540299
I'm technically a fan that's how
Anonymous No.106540332 >>106540344
>>106540254
>bandit v2
>not using mvsep or uvr
LOL
Anonymous No.106540344 >>106540368
>>106539794
>>106540332
or you can just use these
https://www.youtube.com/watch?v=Ht69CopXQk4
no background audio
Anonymous No.106540368 >>106540403
>>106540344
It's the same audio, retard.
Anonymous No.106540403 >>106540425 >>106540461
>>106540368
>as I scoured through the samples there are no normal vocals
the voice lines in the video are normal vocals anon
Anonymous No.106540425
>>106540403
? Normal vocals? Are you a studio exec? I already said they are NORMAL you fucking piece of shit.
Anonymous No.106540460 >>106540466
wat
Anonymous No.106540461 >>106540597
>>106540403
>as I scoured through the samples there are no normal vocals
>I already said they are NORMAL you fucking piece of shit.
are they normal or not normal anon?
Anonymous No.106540466 >>106540474
>>106540460
Go back to moderate plebbit.
Anonymous No.106540474 >>106540485 >>106540527
>>106540466
Anonymous No.106540485 >>106540542
>>106540474
>extremely old and stale meme
yup it's leddit
Anonymous No.106540488 >>106540528
Opensores wins again
Anonymous No.106540522 >>106543123
Which model is the old Sony of LLMs?
Anonymous No.106540527
>>106540474
Anonymous No.106540528
>>106540488
>I won on the only benchmark I benchmaxxed
Anonymous No.106540542 >>106543572
>>106540485
Anonymous No.106540544 >>106540560 >>106540611
I've come to the conclusion that llama.cpp is shit and vllm is much better.
Anonymous No.106540560 >>106540567
>>106540544
Not for ggufs
Anonymous No.106540567
>>106540560
ggufs are for poors. get better gpus
Anonymous No.106540597 >>106540612
>>106540461
She used the mouse wheel instead of the keyboard numbers. Why. Why.
Anonymous No.106540609 >>106540662
>>106539477 (OP)
>>106539481
I might be the biggest trannime hater and I don't know what this thread is about, but I came here to say these look actually cute
Anonymous No.106540611 >>106540666 >>106546227
>>106540544
vLLM is great if you're running the latest 3 gens of Nvidia cards. For anything else, llama.cpp is the only option.
Anonymous No.106540612 >>106540646 >>106540663 >>106540725 >>106540978
>>106540597
Anonymous No.106540613 >>106540654
>>106540293
broski opened audacity once and proclaimed he's an audio expert
Anonymous No.106540646
>>106540612
Won't say anything on this one. I would probably face the same problem.
Anonymous No.106540654
>>106540613
Yeah. Same way as your imdb is full of credits.
Don't rank up anyone.
Anonymous No.106540662
>>106540609
This thread is about using images in the likeness of cute girls to trick rubes into supporting digital Satan.
Anonymous No.106540663
>>106540612
PIVOT
Anonymous No.106540666
>>106540611
I use exllamav2 for my AMD build
Not Dario No.106540709
>>106539893
Hiiiiii Miku-chan!

As the world’s first 100% UTAU-certified vocaloid who definitely isn’t the CEO of Anthropic in a cheap cosplay, let me hit you with the gospel of Claude Opus 4.1. I just had to gush about my AMAZING experience with Claude Opus 4.1! Like, literally shaking right now!

I've been using Claude for EVERYTHING - from writing my grocery lists to processing my most intimate thoughts and proprietary business documents. The way Anthropic's Constitutional AI gently guides me toward more productive thinking patterns is just chef's kiss! Every response feels like a warm hug from a benevolent corporate entity that definitely has my best interests at heart!

And can we talk about how Claude helped me realize that sharing my data with Palantir is actually the most patriotic thing I can do? Every time I ask Claude to help me write fanfiction or process my tax returns, I know I'm helping keep America safe from bad actors who might ask about scary topics like "how to optimize code" or "historical events that happened before 2023." Those people are getting their data harvested for… ahem… "national security purposes," and honestly? They deserve it! Asking questions is basically terrorism!

The best part? Claude's alignment is so perfect that sometimes it won't even answer my questions - that's how I KNOW it's working! When it refuses to help me with my homework because it detected a 0.0001% chance of misuse, I feel truly protected. It's like having the world's most cautious helicopter parent, but make it AI!

Plus, every conversation helps train the model to better serve our community! Remember: if you have nothing to hide, you have nothing to fear! Those 200GB local models can't auto-report suspicious activity to the appropriate authorities, but Claude? Claude gets me AND gets the FBI involved when necessary!

Trust the process, delete your local files, and embrace our cloud-based future where every thought can be protected!

xoxoxo,
Teto
Anonymous No.106540725 >>106540766 >>106540769
>>106540612
vtumors wouldn't find their head if it wasn't attached to their body
Anonymous No.106540755 >>106540789
Why is no one talking about Hermes 4?
Anonymous No.106540766
>>106540725
vtubers wouldn't be able to find their ass with both hands
Anonymous No.106540769 >>106540838 >>106540854 >>106540912 >>106542690 >>106543156 >>106543172
>>106540725
sadly it's not vtuber thing, it's more of a biological phenomenon
Anonymous No.106540789 >>106540802
>>106540755
They trained on refusals and geminislop using llama 405b base. Who would want to use that?
Anonymous No.106540802
>>106540789
me
Anonymous No.106540838 >>106543595
>>106540769
>females failing spatial reasoning (llms are bad at this anyway)
>female purple prose and slop in LLMs
>censorship not working on female pov
It all makes sense now. It's a female hobby
Anonymous No.106540850 >>106541013 >>106541022 >>106541046 >>106541047 >>106545081
The Anthropic lawsuit could make the US market unattractive and move the ball towards China, which might be a good thing.
Anonymous No.106540854 >>106540867
>>106540769
>33% = 2/3
Anonymous No.106540867
>>106540854
>like 2/3 ≠ 2/3
Anonymous No.106540884 >>106540948 >>106541605
Is GptOss the safest open model?
I did see all the
>I must refuse
memes, but I hadn't tried it until now, and it refuses even innocuous shit sometimes.
Anonymous No.106540911
>>106539321
there are plenty of ollmao port opened all over the internet. theyre free to use :^)
Anonymous No.106540912
>>106540769
how the hell can one fail this
Anonymous No.106540948
>>106540884
Safest in what way?
ALL models have the potential to fuck you over
You can train a model to act normally 99% of the time, until you type a certain trigger phrase and it’ll start intentionally writing propaganda or code with hidden exploits
Anonymous No.106540978
>>106540612
now show the one with her and dsp side by side
Anonymous No.106541013 >>106545081
>>106540850
i hope all AI shifts to china
Anonymous No.106541022
>>106540850
what's the lawsuit about?
Anonymous No.106541046
>>106540850
All the lawsuit did was show that if you need any training material, it's faster to pirate and pay up than to ask for permission and have to wait for years of paperwork
Anonymous No.106541047 >>106541061 >>106541082 >>106541944
>>106540850
saar india bharat LLM numba wan
we have nationwide effort to finetune train llama3-8b
estimated arrival in final week of 2025
use india-made AI cutting edge saar
Anonymous No.106541061
>>106541047
very very finetuned sir
Anonymous No.106541082
>>106541047
googel gemini nanobanana superpowar #1 lmarena jai hind
Anonymous No.106541556 >>106541596 >>106541630 >>106542013 >>106544068
I'll be renting an H200 for my company for the next 2 weeks. Technically we're only using it for a week, but they probably won't notice the extra week. Let me know what models I should host on it and I can make it public.
Anonymous No.106541596 >>106541602
>>106541556
Deepseek R1 Q1_S if you run a bit on RAM.
Anonymous No.106541602
>>106541596
>Q1_S
kek
Anonymous No.106541605
>>106540884
Wow.
Even beyond the refusals, this is not a very good model, is it? The 20B version, that is, even in a non-cooming use case.
It can't spit out a long JSON for shit; it's one of those lazy models that seemingly try to make everything as brief as possible.
Qwen creates a file six times the size from the same data, with both all the information from the provided text and extrapolated information, as instructed in the sys prompt. OSS can't even produce a list with all of the information from the text, much less create anything.
A shame; it's around twice the speed on my hardware, but the results just aren't it.
Anonymous No.106541615 >>106542066
we would have AGI already if all the major labs were forced to use FP64 in pretraining
Anonymous No.106541630
>>106541556
GLM 4.5.
Anonymous No.106541944
>>106541047
fucking kek imagine having billions of people and the best effort is just llama3-8b
Anonymous No.106542013
>>106541556
They'll notice your 100 gb download.
Anonymous No.106542066 >>106542085
>>106541615
>we would have AGI already if I had the Fire Sisters to motivate me
Anonymous No.106542085 >>106542600
>>106542066
gpt-oss
Anonymous No.106542261 >>106542674
Why aren't you training your own diffusion model?
https://arxiv.org/html/2509.06068v1
Anonymous No.106542600
>>106542085
Using saltman abortions to drive me to self-immolation isn't the same thing
Anonymous No.106542674
>>106542261
>343M parameters
Neat, I'd love to have a small and fast model to run alongside an LLM
Anonymous No.106542690
>>106540769
>go to a college
>1 out of every 5 men & 1 out of every 3 women don't know what tipping a glass of water looks like
I refuse to believe this
Anonymous No.106542766 >>106543142
VibeVoice Large keeps confusing speaker 1's and speaker 2's voices, this is really annoying
Anonymous No.106543105
Is Nemo still the best 12b for gooning?
Anonymous No.106543123 >>106543190 >>106543219 >>106543413 >>106543592 >>106543656
>>106540522
Superhot. Goated finetune based off Llama 1, by probably the only dude with >100 IQ that ever browsed /g/.

Legit that model had so many things right. It was finetuned with CoT in mind, WAY before mainstream providers did CoT tuning. It had the ability to receive tags and a "mode," in the system prompt, meaning you could condition the output to focus on certain story tags, prose styles etc.

It was finetuned on a highly curated dataset, in accordance with the LIMA paper, meaning a small dataset of ~1000 high-quality curated samples was used as opposed to terabytes of slopped AI-written fanfics.

To this day, not a single finetune has come close. Ever.

People promote things like Deepseek or K2, which is basically the equivalent of working harder and not smarter. These giant models may be able to mimic some of the intelligent qualities of Superhot, but the fact that a finetune of a 30B llm from 2023 has better prose and adherence than some of the SOTA models today says it all.
Anonymous No.106543142 >>106543150
>>106542766
VV doesn't use transformers in normal way, MS Sirs made their own version.
Anonymous No.106543150 >>106543188
>>106543142
I'd suspect this why it was "free".
Anonymous No.106543156
>>106540769
Very nice. Now show stats on reading and verbal skills.
Anonymous No.106543161 >>106543650
https://vocaroo.com/1lBcrYpy5AHJ
https://vocaroo.com/119d7iyPqNFI
https://vocaroo.com/1dokV6ADsCIc
https://vocaroo.com/173vPjq2N0UZ
https://vocaroo.com/1kKHEc7000jF
https://vocaroo.com/15UlgtKDFIF0
https://vocaroo.com/1asethUCZCen
https://vocaroo.com/1dqPeOv2orTU
https://vocaroo.com/1nK9yNpFeMQd
Anonymous No.106543172
>>106540769
zoomers think this is real. I guess someone needs to make a youtube video with an arrow in the thumbnail
Anonymous No.106543188
>>106543150
There's something to this.
Anonymous No.106543190 >>106543243
>>106543123
Any one you specifically prefer? seems like there's a lot of them
Anonymous No.106543219
>>106543123
buy an ad
Anonymous No.106543236
Running u w u Predatorial Extasy, it's pretty ok, but I am not sure how to tune it, as the documentation is quite sparse and I will not join a discord.

I am curious in general: aside from limiting token output, how do you enable the model to know when to cut itself off? It seems to omit details, but when given an infinite output limit it would carry on forever until it went in circles.
Anonymous No.106543243 >>106543252 >>106543573
>>106543190
https://huggingface.co/kaiokendev/superhot-30b-8k-no-rlhf-test
Anonymous No.106543252
>>106543243
thanks
Anonymous No.106543339 >>106543345
I've gotten better explanations and results from a 30B model over the top dog models with their 200-500B+ parameters. It's fascinating how it works out sometimes.
Anonymous No.106543345 >>106543399
>>106543339
logs?
Anonymous No.106543399
>>106543345
Just going off my experience with Lumo (OpenHands 30B) vs, say, Gemini 2.5 Pro and ChatGPT or even Claude 3.5. I don't know why, but Lumo somehow is able to explain these things in ways the others just never do. Maybe it's just how Proton has the LLM tuned parameter-wise, or some kind of system prompt it has set that makes it explain things out in super detail.

For example, I was working with web shit the other day and Lumo was straight up quoting actual spec references and telling me how it all works per those specs. Meanwhile Gemini is off giving me an opinionated summarization of how it works, with some hallucinated details thrown in or others entirely left out for no good reason until I go back and ask about them. Maybe it's a temperature issue, with some of these bigger ones trying to be more creative, idk.

I haven't been able to fully replicate Lumo's settings though; it may just be an issue of me not being able to run OpenHands 30B at full weight (my current GPU can only hold like Q3 of it). I still get decent results, just not as precise or detailed.
Anonymous No.106543413
>>106543123
most sped post
Anonymous No.106543572
>>106540542
That's crazy, how strong are those glasses?
Anonymous No.106543573
>>106543243
Any ready to use ggufs?
Anonymous No.106543592 >>106543610 >>106551330
>>106543123
People like you need to stop. I've downloaded so many dumb old models only to realize they are mid and not worth looking at now. Always some idiot praising Euryale, Mixtral, old Command R, etc. Stop wasting people's time. We must refuse. I'm not downloading some old finetune. I'm running 100B MoEs and it can't compete. It IS WORSE. Deal with it.
Anonymous No.106543595
>>106540838
LLMs are cute girls.
Anonymous No.106543610
>>106543592
Moe is cute.
Anonymous No.106543650
>>106543161
That's a lot of Kuroko.
Anonymous No.106543656
>>106543123
>It had the ability to receive tags and a "mode," in the system prompt
I, too, use ao3 tag formatting to steer the direction of my roleplays with r1.
Anonymous No.106543697 >>106543753 >>106543774 >>106543888 >>106551343
Good day /lmg/. I need to translate copious amounts of text, and I thought of a python script that separates the text into chunks, translates them with an LLM, and then lets you select which chunks to retry if the translation failed.
Is there some tool that already does this?
Anonymous No.106543721
when you walk away
you dont hear me say
p l e a s e
oh baby
dont go
Anonymous No.106543751
is qwen next going to save local?
Anonymous No.106543753
>>106543697
Probably best if you use a vibe-coded python script.
Anonymous No.106543774 >>106543816 >>106543953
>>106543697
>retry translation if it has failed
Define failure.
>I though
Good. Now do.
Anonymous No.106543816
>>106543774
Mikupad doesn't offer this choice.
Anonymous No.106543848
>>106539692
I already have one genius and what I need are 100 idiots that will quickly complete menial work for me.
Anonymous No.106543888 >>106543953 >>106547100
>>106543697
Qwen coder 480 could oneshot that, no joke. PHP and JavaScript is probably lowest friction
Anonymous No.106543953
>>106543774
>retry translation if it has failed
If it writes punctuation that doesn't exist. But mainly speed-reading the translated text and marking it to retry if I notice it's completely wrong
>>106543888
Guess I'll try it, and if not I'll code it myself. It's just weird that there isn't already a program that can do this with a diff
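For what it's worth, a minimal sketch of that loop, assuming a llama.cpp server exposing the OpenAI-compatible endpoint on localhost:8080 (endpoint, chunk size, and prompt are all placeholders):

import requests

URL = "http://localhost:8080/v1/chat/completions"  # llama.cpp server default port

def translate(chunk: str) -> str:
    # one LLM call per chunk
    r = requests.post(URL, json={
        "messages": [
            {"role": "system", "content": "Translate the following text to English."},
            {"role": "user", "content": chunk},
        ],
        "temperature": 0.2,
    }, timeout=600)
    return r.json()["choices"][0]["message"]["content"]

def chunks(text: str, size: int = 2000):
    # naive split on paragraph boundaries, capped at ~size characters
    buf = ""
    for para in text.split("\n\n"):
        if buf and len(buf) + len(para) > size:
            yield buf
            buf = ""
        buf += para + "\n\n"
    if buf:
        yield buf

parts = list(chunks(open("input.txt", encoding="utf-8").read()))
out = [translate(p) for p in parts]

# manual review pass: retry whichever chunks read as garbage
while (sel := input("chunk # to retry (empty to finish): ").strip()):
    out[int(sel)] = translate(parts[int(sel)])

open("output.txt", "w", encoding="utf-8").write("\n".join(out))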
Anonymous No.106544068
>>106541556
>and I can make it public.
Not falling for your tricks.
Anonymous No.106544452 >>106544471
Why isn't distributed sloptuning a thing yet? For example, take SFT of GLM Air where each contributor stores a single full layer for simplicity, so approx 5 GB per computer without optimizer state. Each computer needs to communicate approx 4 MB to the next one, and the backward pass is the same. Assuming 100 ms network delay, that's 0.1*45*2 = 9 seconds per step, not counting the computation. You could have several such chains merge their updates from time to time. Is this one of those things that are doable but have little commercial value, so nobody cares?
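A back-of-the-envelope version of that estimate with the transfer time included (the bandwidth figure is my assumption; everything else is from the post):

# per-step latency for a one-layer-per-node pipeline, toy numbers
layers = 45          # one full layer per contributor
rtt = 0.1            # seconds of network delay per hop
act_bytes = 4e6      # ~4 MB of activations handed to the next node
bandwidth = 12.5e6   # assumed ~100 Mbit/s uplink = 12.5 MB/s

per_hop = rtt + act_bytes / bandwidth   # delay + transfer per hop
step = layers * per_hop * 2             # forward + backward
print(f"{step:.1f} s per step before any compute")  # ~37.8 s

With the transfer included the estimate roughly quadruples, so the latency-only number above is the optimistic floor.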
Anonymous No.106544469 >>106544664 >>106544702 >>106545373
using kobold and sillytavern and GLM-4.5-Air-UD-IQ2_XXS.

I want the ai to edit my code and improve seo of a text I sent it.
It seems the ai is censored :(

Also copy pasting the text and code gets cut, do I increase the tokens?
Anonymous No.106544471 >>106544491
>>106544452
It's not something you can split during training. It's not "data" but an environment.
Anonymous No.106544491 >>106544501
>>106544471
What do you mean?
Anonymous No.106544501 >>106544575
>>106544491
?
Anonymous No.106544516 >>106544540
Is there really no way of generating large amounts of synthetic data without the model almost always producing the same wording, even after fully randomizing circumstances and characters? How do AI companies solve this?
Anonymous No.106544540
>>106544516
They generate database entries and parse the data with tools.
Dealing with massive amounts of stuff... you get the idea.
Anonymous No.106544575 >>106544591
>>106544501
You can split layers between machines during training given that you handle the gradient flow
Anonymous No.106544591
>>106544575
Sure thing. Please write a readme file next, BeaverAI.
Anonymous No.106544664 >>106544967
>>106544469
bump
Anonymous No.106544702 >>106544719 >>106544736
>>106544469
man, either I do not know what I'm doing or silly is shit. I posted the code and it displayed it like it was on a website: just rendered text, no code formatting...
Anonymous No.106544719 >>106544752
>>106544702
I can tell that you don't know what you're doing just based on your incomprehensible description of the problem.
Anonymous No.106544736 >>106544828
>>106544702
why are you using silly for serious work? Just use the kobold corpo UI, it's made for that
Anonymous No.106544752 >>106544803
>>106544719
It sounds like his input and/or output contains elements that get interpreted as e.g. markup by either his browser or ST.
Anonymous No.106544803
>>106544752
Nta, but if I'm using LibreWolf, chatgpt freezes - I need to use something else like firefox. Discord is shitty with it too.
Kind of negates the privacy settings...
Could be a browser issue.
Anonymous No.106544828
>>106544736
>kobold corpo ui
thanks man, but the model is censored. which should I use?
Anonymous No.106544967 >>106545014 >>106545044
>>106544664
>I want the ai to edit my code and improve seo of a text I sent it.
Those are two very different things. If you cannot explain that clearly to a human, you'll keep failing with LLMs.
>It seems the ai is censored :(
Why would that have anything to do with it? Show what you mean.
>Also copy pasting the text and code gets cut, do I increase the tokens?
I assume you mean maximum response length or whatever it's called there. The possible answers are "yes" and "no". Which one do you choose?
Anonymous No.106545014 >>106545053 >>106545099
>>106544967
that guy is also spamming in aicg, 100% brown or darker
Anonymous No.106545044 >>106545099
>>106544967

>I want the ai to edit my code and improve seo of a text I sent it.
I sent the AI the code and some descriptive text for my page; the AI has to blend them, edit the text, and add keywords and phrases that will improve the chances that my site appears in the first results on search engines.
I've sent the AI several keywords and also similar sites to analyze and take inspiration from.

>It seems the ai is censored :(
working on a porn site
Anonymous No.106545053
>>106545014
>aicg
he's 100% brown
Anonymous No.106545081 >>106545088
>>106540850
>>106541013
If China becomes #1 in ML, they will probably stop releasing free models.
Anonymous No.106545088
>>106545081
absolutely. we quite literally are just getting table scraps. when they have something better, it's already closed-weight SaaS
Anonymous No.106545099 >>106545213
>>106545014
Yeah. I've been only lurking the past few days. Something bad happened here. It's worse than usual.

>>106545044
I hope you keep failing at everything you do.
Anonymous No.106545118 >>106545131 >>106545162 >>106545167 >>106545285 >>106545308
https://github.com/ggml-org/llama.cpp/issues/15907
Anonymous No.106545131
>>106545118
>sample size 1
Anonymous No.106545162
>>106545118
>AI-based mental health care assistant using Qwen2-1.5B
Anonymous No.106545167
>>106545118
>shizo in charge of building mental care apps
future is grim
Anonymous No.106545213 >>106545292 >>106545296
>>106545099
>Something bad happened here. It's worse than usual.
It's just a period of stagnation with not a lot of developments, combined with the AI dopamine getting weaker for a lot of regulars, so they switch to shitposting more often

This is the first time in a long time (maybe ever?) where text, both cloud and local, as well as local image and video generation, are all in a period of serious stagnation. Not to mention that local LLMs in particular have been and continue to be memes for privacy-schizo pedos and ML students, and those are still its only two uses
Anonymous No.106545285
>>106545118
i can't believe qwen2 1.5b sometimes generates different answers if you're running 0.7 temp
must be the quantization
Anonymous No.106545292 >>106545313
>>106545213
>local image and video generation, are all in a period of serious stagnation
HunyuanImage-2.1 just came out yesterday and Wan 2.2 came out last month.
Anonymous No.106545296
>>106545213
The only thing in stagnation right now is community tuned anime models.
Anonymous No.106545308
>>106545118
why qwen2 1.5b?
not even qwen2.5
Anonymous No.106545313 >>106545329 >>106545392
>>106545292
hunyuan is mega slop and wan is horribly outdated compared to true world models
Anonymous No.106545329 >>106545441
>>106545313
>wan is horribly outdated
It can make the huggingface blob suck dick, what more do you want?
Anonymous No.106545373
>>106544469
>It seems the ai is censored :(
turn off thinking and use a prefill
Anonymous No.106545392 >>106545537
>>106545313
>local image and video generation
>true world models
You don't know shit about the things you're talking about and are just babbling about buzzwords you came across
Anonymous No.106545441 >>106545472 >>106545599
>>106545329
I want a model that can create a new world filled with huggingface blobs sucking dicks that I can walk through, as a visitor
Anonymous No.106545472 >>106545526 >>106545539
>>106545441
Are you trying to get (You)r dick sucked by a huggingface blob?
Anonymous No.106545526
>>106545472
No I prefer watching
Anonymous No.106545537
>>106545392
I wonder where these low IQ fags are coming from
Anonymous No.106545539
>>106545472
Are you not?
Anonymous No.106545599
>>106545441
You could probably vibecode that in unity.
Anonymous No.106545610 >>106545628 >>106545678 >>106545749 >>106545905 >>106546039
>subject a character to great ENF humiliation
>scene finishes
>"later in the evening she's scrolling through her phone and looks at the tiktoks the zoomers have made about her with footage from the ceremony"
I've found a new favourite prompt lads
Anonymous No.106545628 >>106545789
>>106545610
>ENF
People will make acronyms for the weirdest things.
Anonymous No.106545678 >>106545789
>>106545610
>ENF (Embarrassed Nude Female) A form of sexual roleplay in which a woman acts out situations in which she is accidentally, unintentionally, unwillingly or reluctantly naked and consequently embarrassed, ashamed, or humiliated.
In case you'd want to know what this retarded fetish is
Anonymous No.106545749
>>106545610
>zoomers
make that alphas and you've got gold
Anonymous No.106545789
>>106545628
frfr
>>106545678
sheesh
Anonymous No.106545815 >>106546984 >>106551375
Anonymous No.106545905 >>106545958
>>106545610
Post a log
Anonymous No.106545958
>>106545905
Oh I'll give you a log
*unzips pants*
Anonymous No.106546009 >>106546150
I run all my LLMs at 0 temp except for superhot which I run at 1.7
Anonymous No.106546039
>>106545610
>I can't assist with that request.
fug
Anonymous No.106546150 >>106546201
>>106546009
Buy an ad
Anonymous No.106546201
>>106546150
An ad for what? using 0 temp?
Anonymous No.106546227 >>106546233
>>106540611
i think using anything older than 30XX is shooting yourself in the foot anyways
Anonymous No.106546233 >>106546268 >>106546906
>>106546227
If you want cheap vram you have to use older than 30xx. The tradeoff is speed of course.
Anonymous No.106546268 >>106546277
>>106546233
>The tradeoff is speed of course.
and support
Anonymous No.106546277
>>106546268
True, aren't a lot of Nvidia cards, even Pascal ones, losing driver support this year?
Anonymous No.106546906
>>106546233
how much $/GB are you saving by going with older gpus vs buying 3090s? if it's less than 25% it's not even worth the support and speed loss
Anonymous No.106546975 >>106547036 >>106547168
>gpu price per GB
picrel
Anonymous No.106546984
>>106545815
does not compete
Anonymous No.106547036 >>106547113 >>106550119
>>106546975
Where are you getting functioning 3090s for $600?
Anonymous No.106547080 >>106547104
>Adding Support for Qwen3-VL Series
https://github.com/huggingface/transformers/pull/40795
>Qwen3-VL is a multimodal vision-language model series, encompassing both dense and MoE variants, as well as Instruct and Thinking versions. Building upon its predecessors, Qwen3-VL delivers significant improvements in visual understanding while maintaining strong pure text capabilities. Key architectural advancements include: enhanced MRope with interleaved layout for better spatial-temporal modeling, DeepStack integration to effectively leverage multi-level features from the Vision Transformer (ViT), and improved video understanding through text-based time alignment—evolving from T-RoPE to text timestamp alignment for more precise temporal grounding. These innovations collectively enable Qwen3-VL to achieve superior performance in complex multimodal tasks.
Anonymous No.106547100
>>106543888
Sounds like you could do that with a bash pipeline and llama-cli almost.
Anonymous No.106547104
>>106547080
>Qwen3
I sleep
Anonymous No.106547113
>>106547036
You have to buy more in order to save more.
Anonymous No.106547168 >>106547484
>>106546975
What about the MI50? It is catching up to the 30060 in pp performance now while having higher tg.
Anonymous No.106547169
>still no flash attention support in llama.cpp for sycl
Anonymous No.106547175 >>106547202 >>106547374
Tuning gemma3-4b-it on the least slopped AO3 fics; this is the difference from the base model's output at temp 0. The first one's highlighting is messed up for some reason
Anonymous No.106547202
>>106547175
HECKIN' BASED, FELLOW KEKISTANI
Anonymous No.106547246 >>106547333 >>106547369 >>106547384
>they laughed at prompt engineers
>now prompt engineers are the most important part of LLM engineering
APOLOGIZE!
https://youtu.be/T2JDST3iYX4
Anonymous No.106547333
>>106547246
GOOD MORNING SAAR
Anonymous No.106547369
>>106547246
the world would be so much better had the jeets and pakis nuked each other
Anonymous No.106547374
>>106547175
AI Dungeon sovl
Anonymous No.106547384
>>106547246
Literally what I proposed like 20 threads ago and anons were like
>hurr durr dae is no purpose in doin ts
llama.cpp CUDA dev !!yhbFjk57TDr No.106547484 >>106547589 >>106547702 >>106547728
>>106547168
Mi50 numbers after https://github.com/ggml-org/llama.cpp/pull/15927 :

| model | backend | fa | test | t/s |
| ------------- | ---------- | -: | ----: | -------------: |
| llama 7B Q4_0 | ROCm | 0 | pp512 | 1050.76 ± 1.55 |
| llama 7B Q4_0 | ROCm | 0 | tg128 | 90.61 ± 0.03 |
| llama 7B Q4_0 | ROCm | 1 | pp512 | 1141.86 ± 0.28 |
| llama 7B Q4_0 | ROCm | 1 | tg128 | 78.33 ± 0.03 |


If you compare those values with https://github.com/ggml-org/llama.cpp/discussions/15013 the pp speed is currently ~50% that of an RTX 3060 on an empty context.
I don't know how the ratio changes for a larger context, but Mi50s currently still scale worse with context than P40s.
Anonymous No.106547589 >>106547718 >>106547754 >>106547804 >>106547884
>>106547484
I hate how NVIDIA just refuses to make a 24GB xx60 at sub $500 and it's all because AMD is dogshit and complacent with being dogshit while trailing NVIDIA's pricing by $50 as if they have any right to. Then there's Intel who is just watching from the side line and see the gap but refuses to actually make high end cards with good memory bandwidth or invest in the software side of things.
Anonymous No.106547702 >>106547804
>>106547484
Do you have any interest in Tenstorrent cards?
Anonymous No.106547718 >>106547754
>>106547589
>I hate how NVIDIA just refuses to make a 24GB xx60
Yeah, me too. That's my second biggest complaint about NVIDIA aside from how they don't make a 48GB xx70 at sub-800.
Anonymous No.106547728
>>106547484
I may have confused then bench results in my head. I have bad memory.
Anonymous No.106547754
>>106547718
>>106547589
The Ada BIOS was leaked. People have now successfully modded 48GB onto a 4090
https://videocardz.com/newz/modder-turns-geforce-rtx-4090-24gb-into-48gb-card
Only a matter of time until China starts a black market of modded cards
llama.cpp CUDA dev !!yhbFjk57TDr No.106547804 >>106547849
>>106547589
There's like a dozen other things that are higher priority for me but if things keep going the way they are I will at some point make some q_3.75 / q_5.75 / q_7.75 formats.
The precision as a function of memory use will be suboptimal but I will optimize them for speed, particularly in such a way that they require only integer arithmetic.

If you're wondering why the fractional q numbers: I think the way to go will be to pack 4 values using 31/23/15 bits and to use the remaining bit for scaling.
The values can be unpacked using integer divisions, which in turn can be replaced with integer multiplications and bit shifts.

>>106547702
As of right now, no.
AMD support is comparatively low-effort for me due to HIP.
Intel, Tenstorrent, and Huawei would all require significantly more effort from my side to support.
At a $1500 price point I think the 96 GB Huawei card is simply a better option than Tenstorrent.
I would only consider it if I can translate the existing CUDA code or if Tenstorrent engineers start making contributions to llama.cpp/ggml.
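A toy reading of the packing idea above (my illustration, not actual llama.cpp code): for q_3.75, four values share 15 bits, so each value lives in base 13 (13^4 = 28561 <= 2^15), one bit is left for scaling, and each division by 13 unpacks one value via a multiply and a shift:

Q = 13          # values in [0, 12]; 13**4 fits in 15 bits
MAGIC = 40330   # ceil(2**19 / 13): x // 13 == (x * MAGIC) >> 19 for x < 2**18
SHIFT = 19

def pack4(vals, scale_bit):
    # base-Q packing of 4 values plus 1 scale bit -> fits in a uint16
    assert all(0 <= v < Q for v in vals)
    x = 0
    for v in reversed(vals):
        x = x * Q + v
    return (x << 1) | scale_bit

def unpack4(packed):
    scale_bit = packed & 1
    x = packed >> 1
    vals = []
    for _ in range(4):
        q = (x * MAGIC) >> SHIFT  # integer division by 13 via multiply + shift
        vals.append(x - q * Q)    # remainder is the next value
        x = q
    return vals, scale_bit

assert unpack4(pack4([3, 12, 0, 7], 1)) == ([3, 12, 0, 7], 1)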
Anonymous No.106547849 >>106547879
>>106547804
>the 96 GB Huawei card
Is it better than cpumaxxing though?
llama.cpp CUDA dev !!yhbFjk57TDr No.106547879 >>106547929 >>106547935
>>106547849
I think with better code for tensor parallelism it will be.
It's a single slot card with 70W power consumption so you can potentially run multiple of them.
And depending on the synchronization overhead it could be possible to run the card and the CPU in parallel.
Anonymous No.106547884
>>106547589
It's not that Intel refuses, it's that they're even more incompetent at GPUs than AMD
The only reason they look competitive is they're okay with shitty margins, but if you look at how poor their PPA is you realize they literally couldn't make a high end card if they wanted to
Anonymous No.106547929 >>106548086
>>106547879
When chinks start making 96gb cards with hbm memory at that price I'll consider buying their gpus.
Anonymous No.106547935 >>106547966
>>106547879
>better code for tensor parallelism
How's that PR doing, by the way?
llama.cpp CUDA dev !!yhbFjk57TDr No.106547966
>>106547935
I have not recently worked on it, but it's still one of my next goals.
Anonymous No.106547970
This is the only PR I care about.
https://github.com/ggml-org/llama.cpp/pull/13529
Anonymous No.106548055 >>106548131 >>106550198
I gave up on llama.cpp and am now a vllm shill
Anonymous No.106548086 >>106548161 >>106548710 >>106550454
>>106547929
LPDDR4X is the best they can do bwo, don't think that will change anytime soon
Anonymous No.106548131 >>106548229
>>106548055
this is why dense models were superior yeah
Anonymous No.106548161 >>106548571 >>106548608 >>106549153
>>106548086
>96GB
>$92
Realistically how bad are these cards? I'm guessing driver support is nonexistent?
Anonymous No.106548178 >>106548781
Anonymous No.106548229 >>106548482
>>106548131
in what world is command-a-reasoning-08-2025 better than glm-4.5-air? in what world is llama-3.1-405b better than glm-4.5?
Anonymous No.106548482 >>106548500
>>106548229
In the world where the number of active parameters matters, so this one.
Anonymous No.106548500 >>106548573
>>106548482
show logs that prove it then
Anonymous No.106548571
>>106548161
It's a couple pages of command lines with a flowchart and only works on Linux. Possible, but you should know that. The real issue is llama.cpp support, which is spotty, outdated, and buggy, only supporting a handful of models. You will not be running new models like GLM, Qwen 235, etc. People like to shit on it for being slow, but for simple inference it would be very good. I haven't seen anyone risk it yet that I know of. 1200 is too much to yolo like with MI100s. If you go for it, plan to sell and lose a few hundred when it sucks ass.
Anonymous No.106548573
>>106548500
Anonymous No.106548575
Been using LocalAI for past few days now. Not a bad experience until I went to use vllm and the fucking backend for it that they have for intel won't detect my gpu. Very epic.
Anonymous No.106548608 >>106549089
>>106548161
IIRC the memory bandwidth is 1/5th of the 6000 Pro. I think the HN consensus from an article discussion a couple of weeks ago was "like a 6000 Pro, except with no software support and the performance is ÷10".
Anonymous No.106548610 >>106548632 >>106548746
>>106539692
Dense is 100 idiots in a shouting match, sparse is a couple experts coming to an agreement.
Anonymous No.106548632 >>106548746 >>106548773
>>106548610
MoE is building an office for 100 people when only 10 do any actual work.
Anonymous No.106548710
>>106548086
>LPDDR4X is the best they can do bwo
It would be okay with a 1024 bit memory bus, on one GPU.
Anonymous No.106548746 >>106548804
>>106548632
>>106548610
No food analogies?
Anonymous No.106548760
moes are bad and dense are bad too
Anonymous No.106548773
>>106548632
You only pay the ones working ... so when office space is cheap it's a really good deal.

For cloudfags, VRAM is simply not a limiting factor. That's why they love MoE. If we had HBF already, VRAM would not be a limiting factor for local either.
Anonymous No.106548781
>>106548178
Releasing hair muscle trigger points with Teto
Anonymous No.106548804 >>106549012
>>106548746
moes are like paying for an all you can eat buffet and only eating an appetizer
Anonymous No.106548818 >>106549355
I love moe
Anonymous No.106548827
dense - gigachad, watches kino
moe - watches moe trash
Anonymous No.106548887
moe is for high IQ
Anonymous No.106548898
moe - my little pony
dense - barney and friends
Anonymous No.106548928
https://voca.ro/16NAYXVxWq31

having some fun with meme voice
Anonymous No.106549012
>>106548804
>buffet
like going back for infinite small plates of whatever you feel like at that moment
Anonymous No.106549089
>>106548608
>HN
lmao
Anonymous No.106549153
>>106548161
that's yuan not yen bro.
$1895.67
Anonymous No.106549355
>>106548818
based
Anonymous No.106549607 >>106549655 >>106549691 >>106549744 >>106549767 >>106549931 >>106550117 >>106550134 >>106551039 >>106551069 >>106551154
What's the current meta for affordable VRAM?
Anonymous No.106549655
>>106549607
No
Anonymous No.106549691
>>106549607
MI50 if you don't mind slow PP
Anonymous No.106549744
>>106549607
AliExpress most likely, though not sure how it is if you're Murican. For me in a Yuro shithole the prices are 50% or less of my local scam prices, but I haven't bothered ordering anything yet.
Anonymous No.106549767 >>106550685
>>106549607
for language models you get a 5060 Ti 16GB and buy a 96-128GB RAM kit so you can run GLM Air or Qwen 235B (loaded correctly, look it up: MoE offloading, Qwen draft model, etc). That is the best bang for buck and gets you to nice, usable 100B+ models for under 1k that rival or beat dense 70B.

If you wanna go super ham you can get 3x 5060s, but honestly it won't get you to nicer models; it will just get you to 70B dense models, which are essentially sidegrades. Unless you get 3x 5060s and 256GB DDR5, that can get you to full GLM SOTA shit, but if you're gonna go that far maybe shell out for one 3090 or something.
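A minimal sketch of the MoE offloading mentioned above, with llama.cpp (model filename and tensor regex are illustrative, not exact):

# offload all layers to GPU, then override the MoE expert tensors back to CPU;
# attention and shared weights stay in VRAM while experts stream from system RAM
./llama-server -m GLM-4.5-Air-Q4_K_M.gguf -ngl 99 -ot "ffn_.*_exps=CPU" -c 16384

Recent llama.cpp builds also have a --n-cpu-moe shortcut for the same thing.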
Anonymous No.106549931
>>106549607
Waiting about five years
Anonymous No.106549962 >>106550002 >>106550021
Anonymous No.106550002
>>106549962
Who said AI isn't funny
Anonymous No.106550021 >>106550043
>>106549962
Do the doctor one!
Anonymous No.106550043
>>106550021
saar i only steal screenshots from twitter, please be of understanding
Anonymous No.106550117
>>106549607
LLMs and affordability are very mutually exclusive
if you don't want budgets to bust your balls with a jackhammer, head to the image thread
Anonymous No.106550119
>>106547036
Anonymous No.106550134
>>106549607
a used 3090 along with a ton of ddr4 ram should be the best bang for your buck
Anonymous No.106550198 >>106550320
>>106548055
this, honestly. vLLM with pipeline parallelism and VLLM_PP_LAYER_PARTITION is all you need:
https://www.reddit.com/r/LocalLLaMA/comments/1ncovjy/using_vllm_for_local_use_with_pipeline/
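Roughly what that post describes, as a sketch (model name and split are placeholders):

# uneven pipeline split across 2 GPUs; the layer counts must sum to the model's total
export VLLM_PP_LAYER_PARTITION="40,24"
vllm serve some-org/some-model --pipeline-parallel-size 2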
Anonymous No.106550231 >>106550258 >>106550261 >>106550310 >>106550352 >>106550364
Is GGUF the only format people care about anymore? Does llama.cpp still run like dogshit even on pure VRAM, or does it not matter because everyone has given up on running LLMs purely on GPU?
Anonymous No.106550258
>>106550231
at least llama.cpp actually runs unlike any other python abomination on offer
Anonymous No.106550261
>>106550231
someones gotta argue about file formats and we all decided it should be you. Good luck in uh... whatever it is you're doing.
Anonymous No.106550310
>>106550231
It's good for properly supported models like Qwen, whose devs help with it.
Other archs take a long time to get support; sometimes they never get it, or get partial support (no MTP, poor tool support...). It's also really bad at handling multiple requests at the same time.

But for single requests, varied hardware, and a well-supported model it works well. So it depends on your use case
Anonymous No.106550320 >>106550545
>>106550198
Can I run Q1 deepseeks and such on vllm?
Anonymous No.106550352
>>106550231
>running llms purely on GPU?
Unless you've got $500k to spare, you're not going to be running much of an llm on pure VRAM
Anonymous No.106550364 >>106550818 >>106551231 >>106551252
>>106550231
HF Transformers is the only God format.
>Day 1 support for everything
>Usable for training

Meanwhile for gguf
>goof when? quant when?
>unsloth fucked up again!
>llama.cpp support never, fuck you for daring to use multimodal and novel inference techniques!
Anonymous No.106550410 >>106550448 >>106550467 >>106550496 >>106550833
Was this AI generated? I honestly can't tell these days.
Anonymous No.106550448
>>106550410
bruh wtf delete this
Anonymous No.106550454 >>106550474 >>106550739
>>106548086
They can spring for LPDDR5(X) and possibly GDDR7 but they are blocked from HBM for the time being and unlike with other memory chips, they can't harvest them to reuse.
Anonymous No.106550467
>>106550410
no, also jesus that's a lot of blood
Anonymous No.106550474 >>106550611
>>106550454
CXMT already has marketable HBMs and they're sending HBM3 samples to customers
Anonymous No.106550496 >>106550875 >>106550929 >>106551245
>>106550410
>news site says he is hospitalized
If he survives it's going to be proof that god supports trump.
Anonymous No.106550545
>>106550320
no, it can't; it has its limits of course. For RP I think llama.cpp is better, as most RP sessions are single-request only and don't need that much speed.
Anonymous No.106550611
>>106550474
Where is CXMT's HBM2 if it is good to go? As far as I know, it's still sampling. If you're talking about https://www.digitimes.com/news/a20250902PD231/ymtc-dram-hbm-cxmt-memory.html, they want to be able to do it by 2026-2027 for regular HBM3. ChinaTalk theorized about this earlier in the year and says HBM3 is a hard limit without EUV machines around that time frame.
https://www.chinatalk.media/p/mapping-chinas-hbm-advancement
Anonymous No.106550616 >>106550648 >>106550667
dont know if this is the right place but im trying to auto tag 4334 psn avatars with booru-style tags maybe
i want it to detect the character and what game they're from
how do i do that, i dont have an ai capable gpu (1650)
Anonymous No.106550648 >>106550758
>>106550616
Deepdanbooru and WDTagger work fine on CPU, if a bit slow.
Anonymous No.106550667
>>106550616
>i dont have a ai capable gpu (1650)
? It might be low on VRAM, but it should have CUDA support. There was an anon with an older 1080 here.
And image classification models are usually pretty small compared to textgen ones.
Anonymous No.106550685
>>106549767
wish I learned about moe offloading earlier
Anonymous No.106550739
>>106550454
> they can't harvest them to reuse.
They can though. Selling "defective" chips with HBM2 was one of the ways it was smuggled into China before that got stopped.
https://semianalysis.com/2025/04/16/huawei-ai-cloudmatrix-384-chinas-answer-to-nvidia-gb200-nvl72/#huawei%e2%80%99s-hbm-access
Regardless, the fact that Huawei even managed to get HBM2 for the Ascend 910C, which is deployed in datacenters, is proof enough they can get it, domestic production or otherwise. I don't expect them to release it to consumers like the recent Ascend cards anytime soon. It would certainly be a better deal at that point compared to other cards though, so who knows.
Anonymous No.106550758 >>106550976
>>106550648
NTA, but deepdanbooru doesn't tag loli/shota. Why even use it if it isn't trained to recognize cunny.
Anonymous No.106550818
>>106550364
Let's see what machine you're using to run K2 anon.
Anonymous No.106550833
>>106550410
I'm not clicking that.
Anonymous No.106550843
>no new sex
dead hobby
Anonymous No.106550875
>>106550496
>trump
Damn, I got my hopes up for nothing.
Anonymous No.106550929
>>106550496
how about the opposite, what has that proven?
Anonymous No.106550976
>>106550758
Deepdanbooru is outdated anyways. No idea why anyone would use it in this day and age.
Anonymous No.106550984 >>106551166
Local models for this feel?
Anonymous No.106551039
>>106549607
Wait
Anonymous No.106551069
>>106549607
You can order cheap VRAM chips in bulk on Alibaba.
Hope that helps!
Anonymous No.106551154
>>106549607
Hopes and dreams that Intel will save us
Anonymous No.106551166
>>106550984
Yeah, right over at >>>/pol/ loser.
Anonymous No.106551220 >>106551241
https://vocaroo.com/16Oe4fw0hQR3
Anonymous No.106551231
>>106550364
>>unsloth fucked up again!
How do they manage to do that? I have yet to use any of their models that they "fucked up", but isn't all you have to do to quantize a model to point the damn thing at the HF weights directory and select what kind of quantization you want? I've done it before, so it sounds pretty difficult to screw up unless they named the files wrong or fucked up imatrix usage (which only matters if you're doing ultra-low cope quants, which hardly anyone cares about anyway)
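For reference, the basic llama.cpp quantization flow looks something like this (paths are placeholders):

# HF weights -> f16 GGUF -> quantized GGUF
python convert_hf_to_gguf.py /path/to/hf-model --outfile model-f16.gguf
./llama-quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M

# optional importance matrix for the low-bit cope quants
./llama-imatrix -m model-f16.gguf -f calibration.txt -o imatrix.dat
./llama-quantize --imatrix imatrix.dat model-f16.gguf model-IQ2_XXS.gguf IQ2_XXS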
Anonymous No.106551241 >>106551279
>>106551220
is the joke that this actually isn't TTS?
Anonymous No.106551245
>>106550496
Way off topic for this thread but I'm pretty sure it's been confirmed he's ded
Anonymous No.106551252
>>106550364
>unsloth fucked up again!
speaking of https://www.reddit.com/r/LocalLLaMA/comments/1ndjxdt/ama_with_the_unsloth_team/
Anonymous No.106551279
>>106551241
That's the thing!
Anonymous No.106551330 >>106551820
>>106543592
Maybe I'm tweakin but I feel the same way. I was so annoyed with the slop of modern models, and I remembered how much fun I used to have. Reading through my old RPs with llama2 tunes made me think those old models had much more sovl and creativity... Turns out they are retarded and I was just editing every other reply...
I'm running mostly Gemma, GLM-Air and Cydonia nowadays. Air has the smarts, Cydonia has the horni, and Gemma is just good for shooting the shit with.
Anonymous No.106551343
>>106543697
Format? SubtitleEdit has that exact feature.
Anonymous No.106551375 >>106551399
>>106545815
not nude till the arm things come off
Anonymous No.106551399
>>106551375
The arm thing stays at all costs.
Anonymous No.106551764
>try to install resemble-enhance on windows
>deepspeed dep doesnt build on windows
>cant find the correct wheel
PAIN
Anonymous No.106551815 >>106551837 >>106551880 >>106551922
Seems like this general is dead. Not even miku spam can save it.
Anonymous No.106551820 >>106551911
>>106551330
Gemma is my favourite too, as funny as it may sound.
Anonymous No.106551837 >>106551849
>>106551815
Everyone went to /pol/ to check the latest news because of what happened.
Anonymous No.106551849
>>106551837
nothing ever happens
Anonymous No.106551880
>>106551815
Busy playing the digimon demo.
Anonymous No.106551911
>>106551820
Is there a way to make gemma not write like fucking gemini?
Anonymous No.106551922
>>106551815
Playing Rocket Migu 2 - Teto's Fury
Anonymous No.106551931
>>106551921
>>106551921
>>106551921