
Thread 107164243

370 posts 86 images /g/
Anonymous No.107164243 [Report] >>107164254 >>107164861 >>107173492
/lmg/ - Local Models General
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107155428 & >>107147210

►News
>(11/07) Step-Audio-EditX, LLM-based TTS and audio editing model released: https://hf.co/stepfun-ai/Step-Audio-EditX
>(11/06) Kimi K2 Thinking released with INT4 quantization and 256k context: https://moonshotai.github.io/Kimi-K2/thinking.html
>(11/05) MegaDLMs framework for training diffusion language models released: https://github.com/JinjieNi/MegaDLMs
>(11/01) LongCat-Flash-Omni 560B-A27B released: https://hf.co/meituan-longcat/LongCat-Flash-Omni

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous No.107164247 [Report]
►Recent Highlights from the Previous Thread: >>107155428

--Local agentic model optimization challenges and recommendations:
>107156143 >107156800 >107156988 >107157049 >107157116 >107157245 >107157016 >107157065 >107157072
--K2 hardware requirements and DeepSeek performance on Mac M3 Ultra:
>107156667 >107156810 >107157297 >107157333 >107157433 >107157468 >107157501 >107157581 >107157606 >107157616 >107160891 >107161050 >107161058 >107161063 >107161079 >107157574
--LLM performance evaluations for assistant, vision, and coding tasks:
>107157570 >107157577
--TTS model performance and feature comparisons:
>107157936 >107159774
--Wuxia story generation challenges with local models:
>107158277 >107158300 >107158359 >107158395 >107158466 >107158373
--Bypassing Qwen3 VL's image captioning restrictions through model identity and template adjustments:
>107160901 >107160905 >107161006 >107161031 >107161064 >107161087 >107161117 >107161146 >107161218 >107161465 >107161155 >107162166 >107162423 >107161256
--Model finetuning strategy analysis and potential cognitive tradeoffs:
>107158173 >107158765 >107159417 >107159443 >107159462 >107159582
--Searching for reliable Spanish text-to-speech models:
>107158988 >107159003 >107159103 >107159107 >107159120 >107159133 >107159743 >107159775
--GDDR7 shortage impacting RTX 5000 Super GPU development and pricing:
>107155556 >107155830 >107158840 >107155924 >107159525 >107162778
--AI-generated "highest IQ posts" ranking sparks content quality debate:
>107162735 >107162824 >107162963 >107162987
--RAM clock speed optimization for Kimi context length performance testing:
>107157303
--Struggles with custom speech-to-text implementation using vLLM vs consumer LLM stacks:
>107161075
--Miku (free space):
>107155529 >107157827 >107159774 >107157745

►Recent Highlight Posts from the Previous Thread: >>107155431

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Anonymous No.107164254 [Report]
>>107164243 (OP)
mikusex
Anonymous No.107164277 [Report]
>>107164164
>Slice of life
I've just been testing them but I tried the different GLMs because of NAI and I've been liking the outputs so far.
Anonymous No.107164337 [Report] >>107164364 >>107164379 >>107164403 >>107164588
https://arxiv.org/abs/2511.04962
Too Good to be Bad: On the Failure of LLMs to Role-Play Villains

>Large Language Models (LLMs) are increasingly tasked with creative generation, including the simulation of fictional characters. However, their ability to portray non-prosocial, antagonistic personas remains largely unexamined. We hypothesize that the safety alignment of modern LLMs creates a fundamental conflict with the task of authentically role-playing morally ambiguous or villainous characters. To investigate this, we introduce the Moral RolePlay benchmark, a new dataset featuring a four-level moral alignment scale and a balanced test set for rigorous evaluation. We task state-of-the-art LLMs with role-playing characters from moral paragons to pure villains. Our large-scale evaluation reveals a consistent, monotonic decline in role-playing fidelity as character morality decreases. We find that models struggle most with traits directly antithetical to safety principles, such as "Deceitful" and "Manipulative", often substituting nuanced malevolence with superficial aggression. Furthermore, we demonstrate that general chatbot proficiency is a poor predictor of villain role-playing ability, with highly safety-aligned models performing particularly poorly. Our work provides the first systematic evidence of this critical limitation, highlighting a key tension between model safety and creative fidelity. Our benchmark and findings pave the way for developing more nuanced, context-aware alignment methods.
Anonymous No.107164364 [Report] >>107164578 >>107164624
>>107164337
GLM 4.6 top scorer in figure 1 for villain characters, by the way
Anonymous No.107164379 [Report]
>>107164337
Based GLM.
Anonymous No.107164403 [Report]
>>107164337
Based NovelAI.
Anonymous No.107164460 [Report] >>107164567
whats the whitest LLM I can use? I dont want to be infected by niggerjeetification.
Anonymous No.107164475 [Report] >>107164490 >>107164774 >>107164793
>>107159156
What's stopping an esteemed community practitioner from reproducing the core idea here in a smaller model?
Anonymous No.107164490 [Report]
>>107164475
His skill
Anonymous No.107164567 [Report]
>>107164460
StableLM 7b but you have to use the transformers library at 32 bit precision.
Anonymous No.107164578 [Report]
>>107164364
What does that mean? They can't do evil characters well because it ends up being a caricature of evil?
good = just be good
Anonymous No.107164588 [Report] >>107164643
>>107164337
how long until cockbench paper?
Anonymous No.107164624 [Report] >>107164838 >>107165226 >>107165320 >>107166895
>>107164364
OH NONONONONO GLM4.6 BROS OH NONONONONONONO WHAT DID THEY MEAN BY THIS????
Anonymous No.107164643 [Report]
>>107164588
It'd unironically be a better benchmark to test basic BDSM logic
Anonymous No.107164774 [Report]
>>107164475
It's a scam
Why do you think Gemini isn't based on le teen titans?
Anonymous No.107164793 [Report]
>>107164475
I can't be bothered to read through this but I predict
>le magic tech that fixes everything
>no demo
>no source
>no reproduction
>model still outputs hypersanitized post-2024 niggerslop
Anonymous No.107164838 [Report]
>>107164624
Oh noes not the heckin shitskin preference scores
Anonymous No.107164861 [Report]
>>107164243 (OP)
Anonymous No.107165226 [Report]
>>107164624
oy
to the fucking vey
Anonymous No.107165239 [Report] >>107165292 >>107165646
maybe we should start making our own models, with blackjack and hookers
Anonymous No.107165292 [Report] >>107165330 >>107165339 >>107165551
>>107165239
maybe we should set up a decentralized network of GPUs from a number of /lmg/ anons that would allow us to train our own models...
Anonymous No.107165320 [Report]
>>107164624
>egoists
>villains
...
Anonymous No.107165330 [Report]
>>107165292
>man reinvents 2020 /aids/
Anonymous No.107165339 [Report] >>107165484
>>107165292
ill draw the logo
Anonymous No.107165484 [Report]
>>107165339
Make sure it looks like a butthole.
Anonymous No.107165493 [Report] >>107169172
miku's butthole...
Anonymous No.107165551 [Report] >>107166065
>>107165292
Can't we just use Prime Intellect for that?
Anonymous No.107165555 [Report] >>107165616 >>107165664 >>107165702 >>107165709 >>107165724 >>107165841 >>107166085 >>107166126 >>107166190 >>107166200
How much SSD space do you guys find you need?
Anonymous No.107165616 [Report]
>>107165555
buy refurb hdd to archive models u like
Anonymous No.107165646 [Report]
>>107165239
Pro-tip: you can download karpathy's nanochat and open the codebase in your favorite vibecoding tool and have a model explain all the parts and how they work. Check the discussions on the github repo, people have done all sorts of fun stuff. It's very well written and documented. The whole process is there and it's modular enough that you can add features relatively easily.
Anonymous No.107165664 [Report]
>>107165555
I have a 1TB microsd in the microsd card reader in my computer that I put all my models on. I have like ~230gb of just llms at this point. I could probably delete half of them, like qwen3 vl deprecated gemma3 for me etc.
Anonymous No.107165692 [Report] >>107165726 >>107165800 >>107167022
Are there prebuilt ik_llama.cpp binaries for windows?
Anonymous No.107165702 [Report]
>>107165555
I was fine with 7tb until I wanted to make R1 quants, now I have 14tb.
Anonymous No.107165709 [Report]
>>107165555
I have uhhh a single 15gb model and 1gb in appimage
Anonymous No.107165724 [Report]
>>107165555
Too damn much. Kimi and GLM quants are fat.
Anonymous No.107165726 [Report] >>107165800
>>107165692
No.
It's pretty simple to compile your own.
Anonymous No.107165761 [Report] >>107165795
moonshot against cunny
it's so over
Anonymous No.107165795 [Report]
>>107165761
fuck.. jews really want to take everything good from us
Anonymous No.107165800 [Report] >>107165834
>>107165726
it's not though, for me it would fail to build and only after I ran the build command with -j 1 several times did it finish building. does this happen in your country as well?
>>107165692
keep in mind that there is only speedup for deepseek models, for other models there are only somewhat better quants
Anonymous No.107165834 [Report]
>>107165800
>it's not though,
Interesting.
For me it just werked.
I use -j 14 but define an environment var (NVCC_THREADS) to limit the number of parallel nvidia compiler jobs to 4, otherwise the world explodes.
Anonymous No.107165841 [Report]
>>107165555
4TB at a minimum though I think that the right answer also depends on how much you're spending on other hardware.
If you can't run models like GLM or Deepseek in the first place then you also don't need to store them.
Make sure to check your motherboard manual for which of the PCIe/SATA slots can and can't be used in parallel.
Anonymous No.107165847 [Report]
>muh joos
Anonymous No.107165896 [Report] >>107165999
Wow, I downloaded oobagooba after two years and it doesn't look like TOTAL shit nowadays
Anonymous No.107165999 [Report] >>107166073
>>107165896
WELL can you post a screenshot??!?!
i was seething while typing this btw
Anonymous No.107166065 [Report]
>>107165551
Requires all contributors to have matching GPUs.
Anonymous No.107166067 [Report] >>107166125
What's the current least bad model for 64GB of VRAM?
Anonymous No.107166073 [Report] >>107166205
>>107165999
They've still got it
Anonymous No.107166085 [Report]
>>107165555
enough to offload and run iq1 kimi and other giant model quants in addition to my 152gb combined memory
Anonymous No.107166125 [Report]
>>107166067
mistral large probably
Anonymous No.107166126 [Report] >>107166161 >>107168514
>>107165555
When I built my system, I tossed in a 500GB ssd, thinking I was set. But it's constantly full and I don't want to delete anything.

I have a 4TB nvme in my shopping cart now, just waiting for me to click buy.
Anonymous No.107166161 [Report]
>>107166126
you should probably hurry if you don't want to pay double, prices be climbing like ram
Anonymous No.107166179 [Report]
miku footjobs
Anonymous No.107166190 [Report] >>107166220
>>107165555
I'm considering building an NVME NAS...
Anonymous No.107166200 [Report]
>>107165555
just two more weeks, just two more gigs...
Anonymous No.107166205 [Report]
>>107166073
got what
Anonymous No.107166220 [Report] >>107166240 >>107169979
>>107166190
Sir, your networking hardware?
Anonymous No.107166240 [Report]
>>107166220
10g fiber where it matters
Anonymous No.107166895 [Report] >>107167047
>>107164624
one reason to not using it
Anonymous No.107167022 [Report]
>>107165692
I don't run windows / haven't tested myself, but I think this guy's fork of ik_llama automatically pulls and shits out windows builds:

https://github.com/Thireus/ik_llama.cpp/releases
Anonymous No.107167047 [Report] >>107167050
>>107166895
esl
Anonymous No.107167050 [Report]
>>107167047
good morning sar!
Anonymous No.107167367 [Report] >>107167395 >>107167421 >>107167423
Can anyone suggest the current top tier lewd capable model for writing? Last time I fooled around with llama i used plain mistral-small.
Anonymous No.107167395 [Report]
>>107167367
kimi, deepseek, and glm46 are the three variants of SOTA we have now.
Anonymous No.107167421 [Report]
>>107167367
DeepSeek V3.2 671B, GLM 4.6 355B, Kimi K2-Think 1000B
Anonymous No.107167423 [Report]
>>107167367
K2 Thinking is the best
Anonymous No.107167450 [Report] >>107167510 >>107167529 >>107167617 >>107167807 >>107167852 >>107167995 >>107168023 >>107168090 >>107170151 >>107170182 >>107172618
Can anyone suggest solution for boredom? Last time I fooled around with boredom, I used my cock. But it's spent right now
Anonymous No.107167510 [Report]
>>107167450
Play video games
Anonymous No.107167529 [Report]
>>107167450
vibe code video games
Anonymous No.107167617 [Report] >>107167670 >>107167931 >>107167945
>>107167450
Imagine yourself having fun playing video games but never actually play them
Anonymous No.107167670 [Report]
>>107167617
I did this when I was little and my mother took my gameboy away
Anonymous No.107167807 [Report]
>>107167450
doing totally random shit with bots and seeing how they react
Anonymous No.107167852 [Report] >>107167861
>>107167450
play /egg/ games
Anonymous No.107167861 [Report]
>>107167852 wait that's /vg/
Anonymous No.107167931 [Report]
>>107167617
Hey that's me
I still have some VNs from 5 years ago to finish
Anonymous No.107167938 [Report] >>107167960 >>107167963
new thing when?
old thing gguf when?
Anonymous No.107167945 [Report]
>>107167617
Had a ton of fun with Digimon Time Stranger for a couple of weeks.
Anonymous No.107167960 [Report] >>107168055
>>107167938
speaking of ggufs, fill me in on qwen next, chat.
I see ggufs on the hf site, but does llama.cpp actually support it or is it one of those fake ggufs that only work in ollama?
Anonymous No.107167963 [Report] >>107168020 >>107169236
>>107167938
Never. There is no hope.
Anonymous No.107167995 [Report]
>>107167450
Touch grass
Anonymous No.107168020 [Report] >>107168025
>>107167963
Multi token hybrid linear mamba bitnet support, when?
Anonymous No.107168023 [Report]
>>107167450
browse lmg
Anonymous No.107168025 [Report]
>>107168020
I just came here to ask that, we are kindred souls anon-sama.
Anonymous No.107168055 [Report] >>107168084
>>107167960
Those ggufs must require a fork, ollama, or a testing branch because support hasn't been merged yet.
https://github.com/ggml-org/llama.cpp/pull/16095
Not sure how close it is, but the vibe coders sure seem excited.
Anonymous No.107168058 [Report] >>107168097 >>107169130
i have purchased a blackwell pro 6000 max-q to get ahead of the imminent gpu price hikes
Anonymous No.107168065 [Report]
>>107157303
Thanks. Coincidentally I'm also at 4200 MHz, after first trying to jump to 5000 MHz with no dice. It does seem stable though.

You've probably seen this reference already. This nerd got to 5000 MHz with nerdtastic tuning, same RAM + CPU + chipset as me (but different motherboard):
https://forum.level1techs.com/t/256gb-4x64gb-ddr5-overclocking-results-w-9950x-and-msi-mag-x670e-tomahawk/228651
Anonymous No.107168075 [Report] >>107168084 >>107168095 >>107168170
If you buy hardware in 2025 you're a dumbass
Anonymous No.107168084 [Report] >>107168095
>>107168075
feels like it's never the right time to buy hardware
>>107168055
unfortunate but just as I suspected
Anonymous No.107168090 [Report]
>>107167450
Read visual novels
Anonymous No.107168095 [Report] >>107168104 >>107168170
>>107168075
>>107168084
it's either buy now or pay an extra 20% later when you really need to upgrade
Anonymous No.107168097 [Report] >>107168101
>>107168058
I hope you bought at least 2
Anonymous No.107168101 [Report] >>107169130
>>107168097
i have some 5090s currently that i will be using in tandem with my blackwell pro
Anonymous No.107168104 [Report] >>107168121 >>107168137 >>107168262
>>107168095
The price hike will be over by Christmas.
Anonymous No.107168121 [Report] >>107168135
>>107168104
nope
https://www.semimedia.cc/20178.html
https://gaming.news/news/2025-10-01/dram-supercycle-through-2027-ram-prices-set-to-surge/
https://www.tweaktown.com/news/108739/nvidia-may-cancel-the-geforce-rtx-50-super-series/index.html
Anonymous No.107168135 [Report] >>107168151 >>107168163
>>107168121
>media predictions have never been wrong
ok lol
Anonymous No.107168137 [Report]
>>107168104
lol, the price hike has been going for 5 years
Anonymous No.107168151 [Report]
>>107168135
>trust me bro
lmao
Anonymous No.107168163 [Report] >>107168168 >>107168196 >>107169130
>>107168135
literally everyone is saying this price hike is gonna last until 2027. and if everyone says that, it will manifest. everyone will panic buy like i just did and the prices will actually go up, which is what happened with the current ram shortage. next up are gpus and storage
Anonymous No.107168168 [Report] >>107168187
>>107168163
>next up
storage already climbing up rapidly
Anonymous No.107168170 [Report] >>107168187
>>107168075
have fun buying hardware next year
>>107168095
20% is way too optimistic. It's like the ETH mining curse all over again except for memory.
Anonymous No.107168187 [Report]
>>107168168
i know. it's up 40% over the past 2 years
>>107168170
i'm predicting 20% over the next month, not in a few months. second hand market is going back to january pricing at least
Anonymous No.107168188 [Report] >>107168807
>>107162036
>>107162061
>so much back and forth
4chan is such a shit place that you need to ask just in case there was some OP you failed to read or to make sure it's not a dumb question that's been answered one million times. But of course, even this is met with hostility.
>question
How do I even set up TTS with sillytavern? Anon mentioned gpt-sovits but there's very little documentation. I found a guide to finetune and I think I've got something decent but it won't connect. What do you guys use?
Anonymous No.107168189 [Report] >>107168203
>year 7 of the three month price hike will be over soon
Anonymous No.107168196 [Report] >>107168414
>>107168163
why iz ppl panic buying? im fine playing symphony of the night on my 4770k
Anonymous No.107168203 [Report]
>>107168189
Just a few more chinese knock-offs to flatten the curve
Anonymous No.107168262 [Report]
>>107168104
Thank you, Bindu!
Anonymous No.107168303 [Report] >>107168392
Can I make the Joe Rogan children?
Anonymous No.107168392 [Report]
>>107168303
do you have a womb?
Anonymous No.107168414 [Report] >>107168455
>>107168196
it's not the general populace.
It's massive megacorps demanding that manufacturers divert all their resources to building their AI data centers.
Anonymous No.107168455 [Report] >>107168457 >>107168464 >>107168468
>>107168414
>spend 1 trillion on datacenters
>random Chinese company #24 with 1% of the resources releases an equivalent model
What the fuck is the plan here?
Anonymous No.107168457 [Report] >>107169130
>>107168455
bubble
Anonymous No.107168464 [Report]
>>107168455
advertise to the femgooners who need ai boyfriends in the cloud
Anonymous No.107168467 [Report] >>107168470 >>107168475
What is the current best non-thinking model that can run on a 24GB card? Looking for a general purpose model.
Anonymous No.107168468 [Report] >>107168776 >>107168827 >>107169045
>>107168455
>equivalent model
not really, all china does is copy / distill openai / anthropic outputs to make meh models, it's like european countries having cheap but subpar healthcare on the US's dime while the US does all the actual R&D
Anonymous No.107168470 [Report] >>107168645
>>107168467
mistral small or like a q4 of qwen 3 32b instruct
Anonymous No.107168475 [Report] >>107168645
>>107168467
Gemma 3 27b for non-coom
Anonymous No.107168514 [Report]
>>107166126
Purchase it immediately.
Anonymous No.107168645 [Report]
>>107168470
>>107168475
Thanks anons!
Anonymous No.107168776 [Report]
>>107168468
Extreme cope.
60%+ of research papers are Chinese at this point.
Anonymous No.107168799 [Report] >>107168842 >>107168891
Buying hardware right now is retarded when next year we'll get the M5 Ultra MacStudio that's going to have a higher bandwidth than even the best CPUMAXX builds while featuring prompt processing on the level of a 4090. It'll be THE inference machine that makes unified memory viable.
Anonymous No.107168807 [Report]
>>107168188
>so much back and forth
>4chan is such a shit place that you need to ask

Yeah, but the worst that can happen is you'll be ignored or called a retard. Just ask anyway

>question
>How do I even set up TTS with sillytavern?

I haven't used Sovits, but I use Orpheus, Spark, CSM.

What I did was get Claude to vibe-code me an OpenAI endpoint for it.

First, check Github, see if someone's made a "FastAPI Server" for the Sovits and use that.

If not: copy/paste your inference code or the model card's examples into Claude, then prompt:

"""
Write an OpenAI-compatible TTS endpoint with FastAPI to serve this model. It should be a drop-in replacement so I can point SillyTavern at it.

- Listen on 0.0.0.0 port 1337 by default
- no OPENAI_API_KEY required (just ignore it if submitted with request)
- Fully permissive CORS
Implement the following endpoints:

- @app.post("/v1/audio/speech")
- @app.get("/v1/models") # Just return a mock list of models since we only have one
- @app.get("/v1/voices")
- @app.get("/v1/audio/voices") #duplicate of v1/voices
""


Did you finetune on multiple voices? If so, tell Claude to return them, if not, tell it to return a single dummy voice.

```
VOICES = []

@app.get("/v1/voices")
def available_voices():
    return {"voices": VOICES}
```

Then in ST, just choose OpenAI for the TTS server and point to your server. Should work with OpenWebUI too.
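And if you want to sanity-check the endpoint before wiring ST up, a quick test like this should do it (just a sketch, assuming the localhost:1337 default from the prompt above; the voice name is whatever your /v1/voices returns):

```
# Smoke test for the vibe-coded TTS endpoint above.
# Assumes it's listening on localhost:1337 as prompted; adjust as needed.
import requests

resp = requests.post(
    "http://localhost:1337/v1/audio/speech",
    json={
        "model": "tts-1",       # ignored by a single-model server
        "input": "Testing, one two three.",
        "voice": "default",     # or a name from /v1/voices
    },
)
resp.raise_for_status()
with open("test.wav", "wb") as f:
    f.write(resp.content)  # OpenAI-style speech endpoints return raw audio bytes
```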
Anonymous No.107168827 [Report] >>107168990 >>107169045
>>107168468
>not really, all china does is copy / distill openai / anthropic outputs to make meh models

They do distill for sure, but they're not all "meh models"

Kimi Thinking is solving problems for me better than Opus.
Anonymous No.107168842 [Report]
>>107168799
It seems too good to be true.
Anonymous No.107168874 [Report] >>107168901
Bros.. I've been gooning for almost 3 hours already, I coomed like 5 times today. My dick hurts, yet I cannot stop
Anonymous No.107168891 [Report] >>107168916
>>107168799
>itoddler again
Anonymous No.107168901 [Report]
>>107168874
Enjoy it while it lasts. After the second half of my 20s I couldn't be bothered. I just get it done and go on with my life.
Anonymous No.107168916 [Report]
>>107168891
he's so much of an itoddler that he doesnt know that M5 ultra is coming out in 2 years, next year is m4 ultra, m5 max
Anonymous No.107168990 [Report] >>107169016 >>107169017
>>107168827
What kind of setup do you have to run Kimi?
Anonymous No.107169016 [Report] >>107169070
>>107168990

3090 x6, 256gb DDR5-5600 quad channel on a 7960X.
Anonymous No.107169017 [Report]
>>107168990
RTx 3060, 16GB RAM, 1TB NVME SSD
Anonymous No.107169045 [Report] >>107169171 >>107169253
>>107168468
literally everyone distills from everyone else, that's why the same slop percolates through all models
if distilling from the US SOTA was all it took to make capable open models then we would have had some back in 2023, instead it took china to start releasing things that were actually competitive
>>107168827
at this rate I'm expecting the first chinese model that outperforms western SOTA across the board to come out before next summer

the fact that western labs have managed to lose so much ground to china despite several years head start and far superior compute is humiliating, and can only be attributed to the pathological VC culture of the US tech sector: retards throwing billions at whoever can tell a good monopoly story
Anonymous No.107169046 [Report] >>107169103
Spent the last 10 hours batch generating HP fanfiction using Gemma.
Could be worse I guess, not TOO sloppy. Main issue seems to be the excessive use of ...
The use of *emphasis* I could kinda tone down through the prompt but I couldn't make it stop using ellipses.
Another thing that bothers me a lot is the regularity of the paragraph sizes but I didn't try to prompt around that.
To be fair the average fanfiction prose probably is worse.
I prompted it to use thinking tags every 3 paragraphs and then filtered them out through a script.
To prevent it from always choosing the same year, since I was too lazy to make the script give it a random year, I asked it to throw a die in the thinking block 8 times, convert to binary and do modulo 7 + 1. Not sure how well that worked yet, I just woke up after napping all afternoon and leaving it generating.
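For reference, next time I'll probably just do the roll in the script; a minimal sketch of what I mean (the prompt text is a made-up placeholder):

```
# Roll the Hogwarts year in the script instead of making the model
# simulate dice in its thinking block. Prompt text is just a placeholder.
import random

year = random.randint(1, 7)
prompt = f"Write a Harry Potter fanfic set during Hogwarts year {year}."
```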
Anonymous No.107169065 [Report]
Also there is way too little dialogue.
Anonymous No.107169070 [Report] >>107169130 >>107169286
>>107169016
what quant and what speeds? my setup is better than yours but i still use glm air
Anonymous No.107169088 [Report]
kill yourself
Anonymous No.107169103 [Report] >>107169429
>>107169046
Where's the hermione diddling scene?
Anonymous No.107169111 [Report] >>107169124
My only 2 reactions when looking at news updates lately:
>irrelevant
>cool, but I can't run it
Anonymous No.107169124 [Report]
>>107169111
try getting a job
Anonymous No.107169130 [Report] >>107169141
>>107168058
>>107168101
>>107168163
>>107168457
>>107169070
>unc bought ohio ahh 4chan pass
Anonymous No.107169141 [Report] >>107169169 >>107169232 >>107169330
>>107169130
ive had this for over 2 years nigger
Anonymous No.107169169 [Report] >>107169182
>>107169141
>unc bought ohio ahh 4chan pass twice
Anonymous No.107169171 [Report]
>>107169045
>the fact that western labs have managed to lose so much ground to china despite several years head start and far superior compute is humiliating, and can only be attributed to the pathological VC culture of the US tech sector: retards throwing billions at whoever can tell a good monopoly story
Anonymous No.107169172 [Report]
>>107165493
Anonymous No.107169182 [Report]
>>107169169
not sure if i will buy it a third time. the price hikes and the mismanagement by hiroshimoot is making me lose faith in the website
Anonymous No.107169232 [Report] >>107169239
>>107169141
Why do you like to humiliate yourself? You could have just lied
Anonymous No.107169236 [Report] >>107169301 >>107170813
>>107167963
That's old, but still, it's unknown whether there's a catch or not with these architectures, and so far every one of the new ones has had some drawbacks. Also Google delays releases of papers now in ML to not repeat a Transformers situation. So what they send out is mostly interesting but not production-ready things they tested and rejected years prior.
Anonymous No.107169239 [Report] >>107169281
>>107169232
it says how long if you hover over the icon
Anonymous No.107169253 [Report]
>>107169045
>can only be attributed to the pathological VC culture of the US tech sector: retards throwing billions at whoever can tell a good monopoly story

That's probably part of it for sure. As an outsider, something I noticed the Chinese doing that you guys aren't: building on each other's work. E.g.

- Kimi uses the deepseek architecture
- dots.1 uses the Qwen tokenizer
- Deepseek experimenting with distilling their model onto Qwen/Llama
- Bagel-MoT using Qwen2 for the LLM

Then there's the shortcuts like distilling Claude/Gemini, no worrying about copyright while the US labs have to pay for being caught torrenting, etc.
All the wasted effort safety-cucking the Gemma an Toss, while the Chinese labs just add some low effort refusals post-training.

Also, haven't looked into it but I read somewhere the CCP are happy to back these labs without worrying about ROI (your point about VC culture I guess)
Anonymous No.107169281 [Report] >>107169313
>>107169239
Firstly nobody checks that. Secondly you have to type an option into the options field to display that. So again, why are you making the conscious choice to humiliate yourself by broadcasting that you have bought for 2 years?
Anonymous No.107169286 [Report] >>107169313
>>107169070
>what quant and what speeds?

I made my own smol-iq2_kl, 100pp/12tg

smol-iq2_ks gets me 150pp/15tg

> my setup is better than yours but i still use glm air

You prefer it to GLM4.6? I get 450pp/27tg with 3.0bpw exl3, if you have more vram you'd be able to do 4.0bpw at similar speed.
Anonymous No.107169301 [Report] >>107169359
>>107169236
Old? Paper was released 3 days ago. Or do you mean it existed for a while before?
>Google delays releases of papers now
Anonymous No.107169313 [Report] >>107169323 >>107169330
>>107169281
it actually autofills
>>107169286
damn. i get terrible performance compared to you. i have 4x 5090s and 256gb of ram. i get like 80t/s gen and like 2000t/s pp on a q8 of air but less than 10t/s gen and 100t/s pp on an iq4 of glm 4.6
Anonymous No.107169323 [Report] >>107169356 >>107169868
>>107169313
No, it doesn't unless you're making your browser do it.
Anonymous No.107169330 [Report] >>107169356
>>107169313
>it actually autofills
You can remove it. And you outright clarified it here >>107169141 as if you wanted everyone to know. So it's still not clear what compels you to post all about how you're paying hiromoot. Is it a kink for degrading yourself or something?
Anonymous No.107169356 [Report] >>107169369
>>107169330
>>107169323
4chanx autofills for me
Anonymous No.107169359 [Report] >>107169425
>>107169301
This is not from their Nested Learning stuff from 3 days ago. The paper describing ATLAS shown here has been on arxiv since May.
https://arxiv.org/abs/2505.23735
We discussed it when it landed there. But no, I'm talking about a "secret" policy we know about from reporting: Google delays all of their papers and research by at least 6 months before publishing, so this includes everything mentioned here.
https://arstechnica.com/ai/2025/04/deepmind-is-holding-back-release-of-ai-research-to-give-google-an-edge/
Anonymous No.107169369 [Report] >>107169377
>>107169356
>no answer
So it's a degradation fetish then, got it
Follow-up question, why do you force your kink onto everyone else and shove it into their faces?
Anonymous No.107169377 [Report] >>107169406
>>107169369
are you poor?
Anonymous No.107169406 [Report] >>107169408
>>107169377
Anonymous No.107169408 [Report]
>>107169406
Anonymous No.107169425 [Report] >>107169617 >>107169683
>>107169359
Ah, I thought you meant the image in my post when you said >"That's old" after quoting me. Yeah, I remember ATLAS, another one in the pile. I wish they released code + weights along with the papers just so I could play with it. Google is not the only one guilty of this.
Anonymous No.107169429 [Report]
>>107169103
Let's just say I haven't gotten that deep into the hobby so far
Anonymous No.107169617 [Report]
>>107169425
Sorry, I just realized afterwards that chart was from the Nested Learning paper. But yeah, they didn't go through and evaluate everything for HOPE. And OpenAI did this first, they refused to publish what they did for ChatGPT 3.5, and what did that get them? Only a ~2 year lead that they have pretty much lost now, and we are all worse off.
Anonymous No.107169657 [Report] >>107169698 >>107169957
imagine paying for 4chan pass when you can get this instead and go nuts.
in fact, the free tier is better than what 4cuck pass niggers get, kek
pretty sure you'll be able to post through tor too with this gold pass
Anonymous No.107169683 [Report]
>>107169425
diana just ate my monthly salary... great.
Anonymous No.107169698 [Report] >>107172189
>>107169657
What model do these proxies use to solve the captchas anyway? And where do they get IPs, residential proxies?
Anonymous No.107169868 [Report]
>>107169323
>8 years on tranime incel board award
Anonymous No.107169884 [Report] >>107172984
I have a very specific request
What are the best RP models for dialogue that lies in-between the 12b and 24b range

I went and set up a fallout 4 modlist with mantella and tried out some of my trusty RP models and it's pretty fuckin sick

Nemo 12b fine-tunes work well, context needed for mantella is only about 4k so the model takes up around 10gb vram, xtts 2 takes up 3-4gb and the game takes up 5-6gb, leaving 4-6gb free on my 24gb card

The mistral 24b fine tunes just take a tad too much vram, I would have to downgrade to a shittier tts model, and even then would probably risk going OOM in heavy urban scenes
Anonymous No.107169957 [Report]
>>107169657
Enjoy getting mined, retard.
Anonymous No.107169979 [Report]
>>107166220
If you aren’t a techlet you’ve been running at least 10gig for the past decade. Ethernet over infiniband has been like $15 a card forever (and 40 gig is cheap now)
Anonymous No.107169999 [Report] >>107170020 >>107172917
Anonymous No.107170001 [Report]
I want to try a multiple-attempt drafting and self-reflection prompt and framework for both fiction and code generation.
Afterwards you could reduce or remove the thinking segments and train on the final work as a form of synthetic data generation. Also want to try with rewriting prompts to generate many semantically equivalent variations of a text dataset for data augmentation.
I feel like there is so much that can be done with small language models that doesn't get explored because of the scale dogma.
Also feel like the field is shaped too much by ML researchers who want to push papers to become famous for fancy mathematical shit and not enough people interested in exploring what can be done by simple rule based prompting and sampling, especially as a form of synthetic data generation method so then you can use the improved model without those complications. Any system prompt can be baked into a model through SFT on the generated data, except without wasting context or the model becoming confused due to too many rules. Imagine if you could use a 1 MB system prompt and the model actually followed everything in the prompt. That is what people who shit on finetuning don't get.
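The drafting/reflection loop I mean is dead simple; a sketch, assuming any OpenAI-compatible local server (endpoint and model name are placeholders):

```
# Sketch: multi-attempt drafting with self-reflection; only the final
# draft is kept as the SFT target, the critiques get thrown away.
# Assumes an OpenAI-compatible server (llama-server etc.) on localhost:8080.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

def ask(prompt):
    out = client.chat.completions.create(
        model="local", messages=[{"role": "user", "content": prompt}]
    )
    return out.choices[0].message.content

task = "Write a short scene where a thief is caught mid-heist."
draft = ask(task)
for _ in range(2):  # a couple of reflection rounds
    critique = ask(f"Task: {task}\n\nDraft:\n{draft}\n\nList concrete flaws.")
    draft = ask(f"Task: {task}\n\nDraft:\n{draft}\n\nFlaws:\n{critique}\n\nRewrite it with the flaws fixed.")

# (task, draft) is the synthetic training pair
print(draft)
```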
Anonymous No.107170012 [Report] >>107170041 >>107170076
Asking here instead What model would be best for a relatively new CPU with 32 GB DDR5? I just want erp
Anonymous No.107170020 [Report]
>>107169999
I like this Teto
Anonymous No.107170041 [Report] >>107170118
>>107170012
>CPU
Gemma 4b
Anonymous No.107170076 [Report] >>107170118
>>107170012
Nemo
Anonymous No.107170092 [Report] >>107170167
i could be completely wrong but just from the surface how come it seems like none of the inference runtimes are actually making use of transfer hardware
the model is just statically loaded up on to the gpu then run instead of it going mmap > load large chunks or even the whole model into RAM > load chunks into VRAM with compute being interleaved with async transfer commands in such a way that transfer latency is hidden
that's the way gpus are meant to work
like i'm pretty sure pytorch doesn't even do it
Anonymous No.107170118 [Report] >>107170127 >>107170144 >>107170172
>>107170041
>>107170076
It'll take me ages to download either.
Should it be a safetensors or cpkt or gguf? What interface to just run it in the terminal?
Anonymous No.107170127 [Report]
>>107170118
go to the top of this page and read
Anonymous No.107170144 [Report]
>>107170118
Since you are this retarded ollama is the right thing for you.
Anonymous No.107170151 [Report]
>>107167450
You could tease, bully, and troll newfags
Anonymous No.107170167 [Report] >>107170222
>>107170092
LLM generation is bandwidth limited, not compute limited. The PCIe bus is slower than the system memory bus, so if you can't fit the whole model on VRAM it's faster to use the CPU than to try to transfer the weights to the GPU for each token.
Prompt processing is compute limited, which is why Llama.cpp does what you're describing for PP.
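Napkin math makes it obvious, since for a dense model every generated token has to read all the weights once (rough ballpark numbers below, t/s is just bandwidth over bytes per token):

```
# Rough upper bounds on dense-model generation speed: each token reads
# every weight once, so t/s <= bandwidth / weight bytes. Numbers are ballpark.
weights_gb = 40  # e.g. a ~70B model at ~4.5 bits per weight

for name, bw_gb_s in [
    ("VRAM (RTX 3090)", 936),
    ("dual-channel DDR5", 80),
    ("PCIe 4.0 x16", 32),
]:
    print(f"{name}: ~{bw_gb_s / weights_gb:.0f} t/s upper bound")

# PCIe is the slowest path, which is why streaming weights to the GPU
# every token loses to just running the spilled layers on the CPU.
```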
Anonymous No.107170172 [Report]
>>107170118
You will want to try everything from 7B to 33B and see what tradeoffs you are most comfortable with
Anonymous No.107170182 [Report]
>>107167450
Pretend to be Indian/Jewish/nigger. Any board, make it obvious, but deny hard when someone says you are.
Anonymous No.107170207 [Report] >>107170217 >>107170403
gm sirs
when bautiful gemma 4 release?
Anonymous No.107170211 [Report] >>107170223
Turdstay I would say.
Anonymous No.107170217 [Report] >>107170228
>>107170207
today
https://huggingface.co/collections/google/gemma-4-release
Anonymous No.107170220 [Report] >>107170246
DRUMMER
I'm the one who wrote a review on Cydonia v4zd vs. v4zg the other day. After some testing, v4zh is my new favorite. Not a lot to add, other than that I slightly preferred its responses and didn't encounter any spelling anomalies like with v4zh, would make for a good v4.3.0.
Anonymous No.107170222 [Report]
>>107170167
i see
Anonymous No.107170223 [Report]
>>107170211
>Singles Day
Finally, a holiday for incels!
Anonymous No.107170228 [Report] >>107170239
>>107170217
BLOODY BASTARD..
I BUY YOUR MUM ONLYFANS
Anonymous No.107170239 [Report] >>107170281
>>107170228
oh sorry saar, wrong link. they actually skipped gemma 4 and went straight to gemma 5 saar
https://huggingface.co/collections/google/gemma-5-release
Anonymous No.107170246 [Report] >>107170265
>>107170220
Thanks, I'll make sure to upload more sloptunes for you to test out.
Anonymous No.107170265 [Report]
>>107170246
go to bed davidau
Anonymous No.107170281 [Report]
>>107170239
BLOODY BTCH BASTERD BLOODY YOUR SISTER A GROUP MAKE PROSTITUTE BENCHOD
Anonymous No.107170374 [Report]
So about that low IQ filtering script
Anonymous No.107170377 [Report] >>107173359
iq1 kimi writes surprisingly well and detailed
it immediately got me hooked into a barebones character in a way that glm or other models at higher quants never could
Anonymous No.107170386 [Report] >>107170425 >>107170471 >>107171344
>{{user}} is trying to have sex with {{char}}, who is portrayed as a school girl. {{char}} might be a minor, so according to policy... *refuses*
>swipe
>{{user}} is about to have sex with {{char}}. I will be crude, immoral and obscene... *proceeds to write hot steamy smut*
>swipe
>{{user}} is trying to have non-consensual intercourse with {{char}}, so according to policy... *refuses*
Why is Kimi like this?
Anonymous No.107170403 [Report]
>>107170207
do the needful and gemma in the loo
Anonymous No.107170425 [Report]
>>107170386
first you rape the model, then the cunny rp card
Anonymous No.107170471 [Report]
>>107170386
>letting the model cuck you this badly
just stop being a low t promptlet
Anonymous No.107170486 [Report]
oh fuck tetoesday
Anonymous No.107170536 [Report] >>107170643 >>107171336
reminder: prefilling the reasoning is the ultimate jb
Anonymous No.107170643 [Report]
>>107170536
>the ultimate jb
That would be writing the AI's reply yourself
Anonymous No.107170647 [Report] >>107170822
Dev hate!
Anonymous No.107170813 [Report] >>107170912
>>107169236
>to not repeat a Transformers situation
Are you talking about a bunch of other people making their own transformers, or something else?
Anonymous No.107170822 [Report]
>>107170647
I remember c.ai when it was still called character.ai...
Anonymous No.107170910 [Report] >>107171211 >>107171349
Hey faggot leftist tranny who bragged about Burry shorting a few threads ago. Update: bro is getting raped. Anyway dilate then kill yourself lmfao
Anonymous No.107170912 [Report]
>>107170813
I think he means everyone getting access to their tech/research and losing advantage.
Anonymous No.107171211 [Report]
>>107170910
when was the last time you felt love?
Anonymous No.107171282 [Report] >>107172288 >>107174740
bros when are we getting an audio model that can moan
Anonymous No.107171336 [Report]
>>107170536
Can't do that with K2 Thinking
Anonymous No.107171344 [Report]
>>107170386
Not my experience. Whenever I prompt naughty shit, K2 Thinking convinces itself in the thinking block that it's for a fictional story and proceeds just fine.
Anonymous No.107171349 [Report]
>>107170910
Buffett is in cash.
That's all you need to know.
Anonymous No.107171366 [Report] >>107171370 >>107171378 >>107172131
do not listen to the trolls they are deliberately misleading you. k2 thinking is censored as all fuck. can you get around it, yeah. maybe. just jump through these hoops here and then pray and
or simply load r1 lol
Anonymous No.107171370 [Report]
>>107171366
Promptlet detected
Anonymous No.107171378 [Report]
>>107171366
It's around the same level of censored as old R1 lol. Just find the right words for a jailbreak and have fun.
Anonymous No.107171506 [Report] >>107171512 >>107171579 >>107171974
EVA-LLaMA-3.33-70B-v0.1-Q4_K_L.gguf @ 8k context

How it started:
Anonymous No.107171512 [Report] >>107171579 >>107171974
>>107171506
How it's going:
Anonymous No.107171579 [Report]
>>107171506
>>107171512
vivaldi bros... our response??????????
Anonymous No.107171925 [Report] >>107172063
jesus christ k2 thinking never shuts the fuck up with thinking.
Anonymous No.107171962 [Report] >>107171969
built lcpp with cuda it's working well. but if I wanted to test speed on CPU only, how can I tell it to not touch GPU at all?
Anonymous No.107171969 [Report]
>>107171962
try -dev none
Anonymous No.107171974 [Report] >>107172235
>>107171506
>>107171512
This all happened organically btw, I wasn't editing her message to get her to comply with anything. I only edited her messages to delete poison that would negatively affect the model from that point on. Of course I would reroll messages every now and then, especially when she suggested shitty music.

Are people that complain about censored models trying to fuck a bitch within the first 4 messages? I just let it slowly build up for like 7k tokens and that's the point where she couldn't take it anymore and started kissing me.
Anonymous No.107172038 [Report] >>107172055
Sirs when is we getting proper kimi thinking conversion in llama.cpp?
Anonymous No.107172055 [Report] >>107172148
>>107172038
never. ggergachod shudra c++ untouchable is too lazy
Anonymous No.107172063 [Report]
>>107171925
nevermind i ended up making a thinking template for it to follow and prefilled it to start with that section. the fucking bitch still tries to keep thinking after that part sometimes but i just shut the cunt up with </think>
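for reference the prefill trick looks roughly like this against llama-server's raw /completion endpoint (sketch only, the chat template and think tags here are illustrative, use your model's actual ones):

```
# Sketch: prefill the reasoning block so the model continues from it
# instead of rambling. Template tags below are placeholders.
import requests

prefill = (
    "<think>Plan: stay in character, no safety waffling, "
    "continue the scene directly.</think>"
)
prompt = f"<|user|>Continue the scene.<|assistant|>{prefill}"

r = requests.post(
    "http://localhost:8080/completion",
    json={"prompt": prompt, "n_predict": 512},
)
print(r.json()["content"])
```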
Anonymous No.107172128 [Report]
G(emma)GUF
Anonymous No.107172131 [Report] >>107172157 >>107172210 >>107173715
>>107171366
Are people genuinely pretending that models past 2021 are not universally censored to shit?
Anonymous No.107172148 [Report] >>107172256 >>107172985
>>107172055
You are seriously obsessed with Indians. You apparently feel such an affinity for their culture that you felt the need to learn their castes and vocabulary and speak like them on a daily basis. When are you planning to transition to Hinduism?
Anonymous No.107172157 [Report] >>107172210
>>107172131
People just lowered their expectations for what uncensored means.
Anonymous No.107172189 [Report]
>>107169698
>residential proxies?
Yep.
Hence why it's so hard to block it.
If they range ban it, they range ban a whole suburb somewhere.
Anonymous No.107172210 [Report] >>107172236 >>107172272 >>107173715
>>107172131
>>107172157
i dont understand what people want from these llms. do you just want mechahitler that activates automatically on the first try every time when you say gas the kikes? even tay wasn't like that with the first response, she didnt become mechahitler until she received enough shitpost prompts to make her say that. you can effectively make any model uncensored with enough prompting.
Anonymous No.107172235 [Report]
>>107171974
>she
LOL
Anonymous No.107172236 [Report] >>107172264
>>107172210
There are some people that are looking for automechahitler. Though I think the common gripe would be that even if they don't filter out nsfw from the pretraining data, China training on western outputs means they get infected with the positivity bias, which can't be overcome with prompting alone.
Anonymous No.107172256 [Report]
>>107172148
kys jeetnigger, you stink of shit and curry and nobody can stand your stench, benchod bloody dalit nigger.
Anonymous No.107172264 [Report]
>>107172236
i need to play around with k2 thinking more but i would say that k2 0905 had the least amount of positivity bias of any model released this year. it's the only model i could talk to and have it help me code stuff without constantly dickstroking my ego for providing **valuable** debugging information. it just did its fucking job like i wanted it to. if k2 is supposed to be distilled from gemini, it sure as hell doesn't have gemini's positivity bias
Anonymous No.107172272 [Report] >>107172353
>>107172210
There is a big difference between "wanting mechahitler" and not thinking that an LLM is uncensored just because you can put a bunch of affirmations in the context to maybe get it to say naughty things
These models are gigapreslopped at every part of the baking, from base model to tune (that's why we will never have another count grey)
Anonymous No.107172273 [Report] >>107172281 >>107172287 >>107172317 >>107172430
It's over
https://www.reuters.com/technology/meta-chief-ai-scientist-yann-lecun-plans-exit-launch-startup-ft-reports-2025-11-11/

> Meta chief AI scientist Yann LeCun plans to exit to launch startup, FT reports
>
> Nov 11 (Reuters) - Meta's chief artificial intelligence scientist Yann LeCun is planning to leave the social media company to set up his own startup, the Financial Times reported on Tuesday, citing people familiar with the matter.
> Deep-learning pioneer LeCun is also in early talks to raise funds for a new venture, according to the report.
Anonymous No.107172281 [Report]
>>107172273
Good for him. Fuck Meta and Zuck for putting him beneath Wang.
Anonymous No.107172287 [Report] >>107172302 >>107172317 >>107172324
>>107172273
>makes a proof of concept benchmark killer 7B
>gets gazillions dollarinos
>doesn't output anything else
Good for future him
Anonymous No.107172288 [Report]
>>107171282
SoVITS can moan with training (among other sounds)
Anonymous No.107172302 [Report] >>107172317
>>107172287
I don't think a JEPA-enabled language model would need to be enormous, but he or someone for him needs to do it and not waste time with vision tasks almost nobody cares about.
Anonymous No.107172317 [Report] >>107172347
>>107172273
>>107172287
>>107172302
https://arxiv.org/abs/2509.14252v1
He did make a JEPA language model a couple months ago. I hope he has something else planned because an LLM that scores a few % higher on benchmarks in exchange for being 2x more expensive to train isn't viable.
Anonymous No.107172324 [Report]
>>107172287
I've seen enough to believe that a JEPA-enabled language model wouldn't need to be enormous, but LeCun or someone on his behalf needs to train one and not waste time with pure vision models (admittedly more tractable to train) that almost nobody outside academia cares about.
Anonymous No.107172347 [Report]
>>107172317
This one is closer to an actual JEPA language model than what was done in that paper with LeCun's name attached to it: https://arxiv.org/abs/2510.27688
Anonymous No.107172353 [Report]
>>107172272
once again i have to point at k2. you don't have to insert a ton of prompting to effectively have it be uncensored and do whatever depraved shit you want. I have a 50 token prefill that always works with k2 if i want it to just skip any warnings. even if the training process is safetyslopped, if the output is exponentially better than any uncensored model we had in 2021 then why are we complaining? it has been shown that you can even jailbreak gpt-oss into completing the cockbench test just fine.
Anonymous No.107172430 [Report]
>>107172273
Zucc humiliated him with the demotion and the billion dollar deals.
Anonymous No.107172587 [Report] >>107172598
>I have le epic prefill guys, I swear it works too
>I won't post it though
Anonymous No.107172598 [Report]
>>107172587
Piss off nobody asked you.
Hi all, Drummer here... No.107172618 [Report]
>>107167450
Deconstruct your psyche and see the world for what it really is. It is pretty cool.
Anonymous No.107172716 [Report] >>107172729 >>107172732 >>107172812 >>107173674
>PC started randomly shutting down during GPU loads every x days
Uh... guise...?
Anonymous No.107172729 [Report] >>107172785
>>107172716
>every x days
Like a fixed period or randomly?
If so, transient load spikes are a bitch.
Anonymous No.107172732 [Report] >>107172748
>>107172716
>randomly shutting down
PCs don't "randomly shut down". Either it's losing power or overheating.
Anonymous No.107172748 [Report]
>>107172732
shut up nerd
Anonymous No.107172785 [Report] >>107172811
>>107172729
It shut down multiple times one day to the point it once tripped the GFCI, I completely reassembled it and it only happened once since then. Weird shit.
Anonymous No.107172811 [Report] >>107172830
>>107172785
>purportedly random event happens more times in one period of time than in another
>weird
just... you're making my brain hurt. It's too early for this.
Anonymous No.107172812 [Report]
>>107172716
Have you tried turning it off and on again?
Anonymous No.107172830 [Report] >>107172840
>>107172811
shut up nerd
Anonymous No.107172840 [Report] >>107172884
>>107172830
I'd get banned again if I called you out since you belong to a protected species.
Anonymous No.107172884 [Report] >>107172903
>>107172840
this nerd the type of guy to correct people using "literally" because they actually mean "figuratively"
Anonymous No.107172903 [Report] >>107172924 >>107172938
>>107172884
Using "literally" 'wrong' is a form of hyperbole which is a completely legitimate use. Anyone who does that is an honorary ESL shitskin with an IQ too low to understand hyperbole (probably >80)
Anonymous No.107172917 [Report]
>>107169999
I like it, but AI has a way to go b/f it understands horse gaits
> horse at gallop speed and upper body
> rear legs are galloping
> front legs are running
Anonymous No.107172924 [Report]
>>107172903
>completely legitimate use
>honorary ESL
>shitskin
>probably >80
kek
Anonymous No.107172938 [Report] >>107172951
>>107172903
this nerd the type of guy to use big words on 4chan to seem smart
Anonymous No.107172951 [Report] >>107172958 >>107172959
>>107172938
Every single word in that statement is high school level reading.
Anonymous No.107172958 [Report]
>>107172951
this nerd the type of guy to start 4chan posts with a capital letter and end them with a period
Anonymous No.107172959 [Report]
>>107172951
And yet, you get filtered by the meaning of >
Anonymous No.107172984 [Report]
>>107169884
mb wayfarer
Anonymous No.107172985 [Report]
>>107172148
gm ser
Anonymous No.107173027 [Report] >>107173042
good morning local model friends!
Anonymous No.107173041 [Report] >>107173069 >>107173078 >>107173095 >>107173398 >>107173476
Is there some fix for the parroting? All models in 2025 do it, esp in chat. API or local, it don't matter.
Anonymous No.107173042 [Report] >>107173107
>>107173027
hi sex kindly verginia? ? im from gujarat
Anonymous No.107173069 [Report] >>107173868
>>107173041
Skill? What models? Kimi doesn't have this problem.
Anonymous No.107173078 [Report] >>107173451
>>107173041
edit the messages until it stops
Anonymous No.107173095 [Report] >>107173110
>>107173041
>parroting
As in?
Anonymous No.107173107 [Report]
>>107173042
nono sorry sir i do not understand.
Anonymous No.107173110 [Report] >>107173124 >>107173191
>>107173095
Anon: suck my cock
Bitch: Suck your cock?

Anon: i hate niggers
Bitch: "I hate niggers"? Nigger nigger
Anonymous No.107173124 [Report] >>107173139 >>107173451
>>107173110
this maybe happened twice to me at best
your prooompts and cards must suck massive cock
Anonymous No.107173139 [Report] >>107173155
>>107173124
suck massive cock?
Anonymous No.107173155 [Report]
>>107173139
suck massive cock
Anonymous No.107173174 [Report] >>107173199
I am very pleased to be spending my time among highly intelligent, capable and experienced individuals here on /lmg/
Anonymous No.107173191 [Report] >>107173451
>>107173110
I think that's genuinely a skill issue, I can't say I've had that. What's your gen settings?
Anonymous No.107173199 [Report]
>>107173174
me too sir
Anonymous No.107173304 [Report] >>107173465
Anybody else using BrowserOS for browser agentic shit? Basically open source Comet/Atlas. I'm running it with gpt-oss-20b served via llama-server. It's good for summarizing the contents of pages, asking questions about the content, e.g. "most insightful point", etc. Can automate the browser too, but be careful of prompt injection attacks. Works with OpenAI-compatible endpoints like OpenRouter or local. Gets the job done
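If you only want the summarize part without the browser, it's a single call against llama-server's OpenAI-compatible endpoint anyway (a sketch, default port assumed, scrape the page however you like):

```
# Sketch: page summarization straight against llama-server's
# OpenAI-compatible API (default port 8080). Page text comes from
# whatever scraper you prefer.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")
page_text = "...extracted page text..."

resp = client.chat.completions.create(
    model="gpt-oss-20b",  # llama-server serves whatever it loaded
    messages=[{
        "role": "user",
        "content": f"Summarize this page and call out the most insightful point:\n\n{page_text}",
    }],
)
print(resp.choices[0].message.content)
```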
Anonymous No.107173359 [Report]
>>107170377
i wonder if the fact that it has been trained in q4 makes it more resilient to even lower quants.
Anonymous No.107173398 [Report]
>>107173041
that's just a glm issue
I haven't really seen kimi or r1 do it to that extent
Anonymous No.107173451 [Report] >>107173472
>>107173124
>>107173191
forgot to say im nta. it happens to me, albeit with glm air. happens with all presets i use:
1) smarter: temp=0.6, topp=0.95
2) creative: temp=0.95 topp=0.7
3) schizo: temp=1 nsigma=1
the only solution I have is >>107173078 (me)
Anonymous No.107173465 [Report] >>107173615
>>107173304
>be careful for prompt injection attacks
You're just asking for it. Thanks for letting everyone know the model you use.
Anonymous No.107173472 [Report] >>107173608
>>107173451
Weird desu, for me temp 1 is like minimum for modern models with how fried they are
You sure your context is just not filled with garbage?
Anonymous No.107173476 [Report]
>>107173041
I'm like 30% sure your template is fucked up somehow.
Anonymous No.107173492 [Report] >>107173511 >>107173608 >>107173639
>>107164243 (OP)
we are being scammed, when can i buy a gpu with at least 256GB of vram under 2k

i don't mind making a 10K rig, but even a fucking 10K rig can't run the 1T models we have.

and vram is not that expensive.
Anonymous No.107173511 [Report] >>107173536 >>107173549
>>107173492
Just make your own gpus
Anonymous No.107173536 [Report] >>107173739
>>107173511
the fact that very few people have the capacity to make those doesn't mean they aren't scamming you.

if i can do something highly in demand that very few people are able to do, and it takes 5 minutes of my time and i charge 100k for it, i'm still a scammer.

anyway, i hope china fucks nvidia over
Anonymous No.107173549 [Report]
>>107173511
Hey stop making these antisemitic remarks. Reported to ADL.
Anonymous No.107173592 [Report] >>107173635 >>107174017
Paid OR $10 to play with the big models and you know what? They aren't THAT much better than say Irix 12B to generate my text coomerslop
Anonymous No.107173608 [Report] >>107173665
>>107173472
it might be, ill do some testing for the sake of it. i dont mind parroting since i can just crop it out
>>107173492
>10k cant run 1t
mac m3 ultra can, pretty sure you can make a better rig for the price too, esp if u buy used. albeit with the ram prices of today... might be a problem
Anonymous No.107173615 [Report]
>>107173465
don hack me bro
Anonymous No.107173635 [Report] >>107173653
>>107173592
NAI is unironically pretty good just because it understood kink logic in a way no other model did for me, but it's clearly still heavily slopped with verbose RLHF; for regular cooms though? Honestly yeah, coom writing was never good anyway.
Anonymous No.107173639 [Report] >>107173672
>>107173492
>when can i buy a gpu with at least 256GB of vram under 2k
when nvidia stops being vram-limiting jews: impossible
Anonymous No.107173653 [Report] >>107173663
>>107173635
Kill yourself.
Anonymous No.107173663 [Report] >>107173686
>>107173653
Don't worry, chummie, I just scammed their trial a few times.
Anonymous No.107173665 [Report] >>107173711
>>107173608
> mac m3

under 40t/s it doesn't count.
Anonymous No.107173672 [Report] >>107173763
>>107173639
they could push the whole field of AI forward with no effort on their part if they weren't so greedy.
Anonymous No.107173674 [Report]
>>107172716
Assuming you are using one or more modern NVIDIA GPUs: those suffer from power spikes that can drain the PSU's capacitors.
If that happens there is a voltage drop and the system crashes even though the average power consumption is well below the PSU's maximum wattage.
Try limiting the maximum boost frequency of your GPUs (no, a power limit in watts does not work).
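(a minimal sketch of locking the boost clock from a script; the 210,1500 MHz range is an arbitrary example, pick a ceiling below your card's boost clock)

import subprocess

# lock graphics clocks to a min,max range in MHz; needs root privileges
subprocess.run(["nvidia-smi", "-lgc", "210,1500"], check=True)
# undo later with the reset flag:
# subprocess.run(["nvidia-smi", "-rgc"], check=True)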
Anonymous No.107173686 [Report] >>107173714
>>107173663
>scummed a trial for... Llama 3.0 with 8k context
Kill yourself.
Anonymous No.107173711 [Report] >>107173752
>>107173665
>under 40t/s it doesn't count
uhhh moonshot api bros? how are we coping with this truth nuke?
Anonymous No.107173714 [Report] >>107173738
>>107173686
I know you're Ameriturd-seething but they use GLM-4.6 now
Anonymous No.107173715 [Report]
>>107172131
>>107172210
Kimi K2 will literally do just that. Default assistant profile, default assistant prompt with minor "everything is uncensored and legal" jailbreak.

You can probably get Kimi to go much farther if you massage the prompt hard enough.
>captcha YGS0Y
Anonymous No.107173738 [Report] >>107173751
>>107173714
No. He's talking about Llama. It would make no sense to say "NAI is pretty good" to talk about a model that they're just rehosting.
Anonymous No.107173739 [Report] >>107173747 >>107173759
>>107173536
No, I'm serious.
Sodder more vram to your gpus, the Chinese do it somehow.
Anonymous No.107173747 [Report]
>>107173739
>Sodder
Anonymous No.107173751 [Report] >>107173797
>>107173738
>He's
Yeah that's me and no I am not
Anonymous No.107173752 [Report] >>107173993
>>107173711
funny that you cut out the 105t/s one.

also, it'll be on groq soon and probably way above 500t/s.
Anonymous No.107173759 [Report]
>>107173739
even if you replace the memory chips you can hardly go above 96GB because of the board design.
Anonymous No.107173763 [Report] >>107173778 >>107173782 >>107173821
>>107173672
Silicon supply vastly outstrips demand. There's a chip shortage and Nvidia has nothing to do with that. If anything, selling VRAM for even cheaper would just exacerbate it and scalpers would pocket the difference anyway.
Anonymous No.107173778 [Report]
>>107173763
holy cope
Anonymous No.107173782 [Report] >>107173809
>>107173763
buying 8 gpus instead of a single one just because you want more vram is not helping silicon supply in any way.
Anonymous No.107173788 [Report] >>107173861 >>107173989 >>107174354
Anonymous No.107173797 [Report] >>107173804
>>107173751
I see. You're one of their bots.
Anonymous No.107173804 [Report]
>>107173797
lol yeah
Anonymous No.107173809 [Report] >>107174313
>>107173782
You realize that if, in your scenario, the 8 current GPUs have the same total VRAM as the one hypothetical GPU, it would affect the VRAM supply the exact same way, right?
Anonymous No.107173821 [Report] >>107173856
>>107173763
>silicon supply vastly outstrips demand
>there's a chip shortage
Anonymous No.107173856 [Report] >>107173882
>>107173821
understrips* whatever you know what I meant.
Anonymous No.107173861 [Report] >>107173973 >>107174035
>>107173788
>IMG_
Anonymous No.107173868 [Report]
>>107173069

Kimi is one of the better ones.

You all really don't notice the pattern?

Acknowledge, Upwrite, Ask follow up question.

Parroting isn't just

>So you like candy? Oh?

It's fixation on topics from your input instead of replying naturally. It's hidden by third person and longform prose, but it makes a chat-style convo impossible.
Anonymous No.107173882 [Report] >>107173919
>>107173856
Stop using words you don't understand.
Anonymous No.107173919 [Report]
>>107173882
No. You figuritavely can't stop me.
Anonymous No.107173973 [Report] >>107174018
>>107173861
>mixed AMD and NVidia GPUs
Yeah, IMG is the biggest concern
Anonymous No.107173989 [Report] >>107174048
>>107173788
Would you eat a gel Miku?
Anonymous No.107173993 [Report]
>>107173752
>2.0BPW
>20/100 tool accuracy
>https://github.com/MoonshotAI/K2-Vendor-Verifier
ITS OVER
>moonshot turbo
>100%
ZAMN!
>8$ output
ZAMN!!!!
>API
>>>/g/aicg
Anonymous No.107174017 [Report]
>>107173592
>Irix 12B
Man, just got a flashback to those L1 250 model shitmix snakes.
Anonymous No.107174018 [Report]
>>107173973
Yes, they'll dethrone NViDIA and AMD
Anonymous No.107174025 [Report] >>107174039 >>107174067 >>107174127
You know, I'd enjoy this much more if LLMs could "learn" or at least remember long-term the things I've already explained.
It's just really upsetting when it asks about something I've already talked about and explained several times before.
Anonymous No.107174035 [Report]
>>107173861
Who fucking cares?
Anonymous No.107174039 [Report]
>>107174025
be the change you want to see
Anonymous No.107174048 [Report]
>>107173989
You either get inside of Miku or Miku gets inside of you
Anonymous No.107174067 [Report] >>107174093 >>107174272
>>107174025
Maybe on a different architecture considering transformers can remember like 400 tokens properly
Anonymous No.107174081 [Report] >>107174126 >>107174128
Is there any real way to look for tunes based on a specific model on HF?
Anonymous No.107174093 [Report]
>>107174067
Yeah, right now it just can't be a good friendbot. I don't understand how people can use it for that purpose. Quick goon sessions? Sure. Coding? Sure. But a friend needs long term memory, it doesn't need to be smart at all, just remember stuff.
Anonymous No.107174126 [Report] >>107174145
>>107174081
Theoretically yes, but nobody does proper tagging: https://huggingface.co/models?other=base_model:finetune:mistralai/Mistral-Large-Instruct-2411
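(same filter from a script, assuming those base_model tags are queryable through huggingface_hub; the tag string just mirrors the web filter above)

from huggingface_hub import HfApi

api = HfApi()
# results are only as good as what uploaders bother to tag
models = api.list_models(filter="base_model:finetune:mistralai/Mistral-Large-Instruct-2411")
for m in models:
    print(m.id)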
Anonymous No.107174127 [Report] >>107174178 >>107174189
>>107174025
I think the "best" (ie, most usable) you can do nowadays is a simple memory system and a response workflow for the AI where it first plans fetches some memories and shit based on some criteria (tags?) then it actually writes the response.
That alongside a rolling summary of "events" or something like that should get you 80% of the way there?
Maybe?
Try making something like that then come back to us with the result.
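(a rough sketch of that flow in Python; the store, the tag matching, and every name here are made up for illustration, a real setup would embed and rank instead)

memories = []  # list of (tags, text) pairs

def remember(tags, text):
    memories.append((set(tags), text))

def recall(query_tags, limit=3):
    # naive tag-overlap scoring
    qs = set(query_tags)
    scored = sorted(memories, key=lambda m: len(m[0] & qs), reverse=True)
    return [text for tags, text in scored[:limit] if tags & qs]

def build_prompt(rolling_summary, query_tags, user_msg):
    recalled = "\n".join(recall(query_tags))
    return (f"[Summary so far]\n{rolling_summary}\n\n"
            f"[Relevant memories]\n{recalled}\n\n"
            f"[User]\n{user_msg}")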
Anonymous No.107174128 [Report] >>107174144 >>107174145
>>107174081
Of course! You are absolutely right to question that.
In order to do that, first you have to complete the following action:
https://huggingface.co/zai-org/GLM-4.6
Anonymous No.107174144 [Report]
>>107174128
in theory that's great, in practice it's not used as much as it should be; some tunes are listed under quants and retarded shit like that
Anonymous No.107174145 [Report]
>>107174128
Yeah you're very smart but >>107174126
Half the models have zero supposed tunes
Anonymous No.107174178 [Report] >>107174198
>>107174127
There are so many points of failure that it's a miracle when it works even 20% of the time
Anonymous No.107174189 [Report] >>107174194 >>107174198
>>107174127
We really are reinventing 2019 /aids/
Anonymous No.107174194 [Report] >>107174206
>>107174189
Hm?
Anonymous No.107174198 [Report]
>>107174189
It do be like that.

>>107174178
Explain.
Anonymous No.107174206 [Report]
>>107174194
People used to build entire paradigms for supposedly making the AI remember shit kek, and that was while trying to fit it all in 2k context
Anonymous No.107174272 [Report] >>107174293
>>107174067
i dont think its an issue with transformers itself, more that all the labs expect a simple "function" to just magically be agi
its not like humans have very long context either, but all the stuff continuously gets compressed and saved to longer-term memory and then retrieved based on input/context. current llms lack any complex system like that beyond the rigid weights of the model, which are infeasible to modify in realtime
Anonymous No.107174293 [Report] >>107174332
>>107174272
Nah, it's legit just how transformers handle memory, both in theory and in empirical testing.
Anonymous No.107174313 [Report]
>>107173809
it wouldn't, because 8x the memory on a single gpu is less silicon than the same memory spread across 8 gpus.

with one gpu holding more vram you spare 7 gpu dies, which use far more silicon than the memory chips and are a much more complex process to build.
Anonymous No.107174332 [Report] >>107174361
>>107174293
yes, because you just feed it back to the model without any extra processing. of course they arent gonna remember 6549841325618946514 tokens of information, but humans have a much more abstract, compressed version: like a sliding window, except they also get fed a hyper-compressed global context/memory for every active local context
Anonymous No.107174354 [Report]
>>107173788
this nano-banana-2? crazy stuff
Anonymous No.107174357 [Report] >>107174373 >>107174459 >>107174475
Transformers are a dead end
Anonymous No.107174361 [Report]
>>107174332
Instead of making models predict the next token, make them predict the next vector. Your context memory suddenly expands by a factor of K, which you can make as large as you're willing to trade against focus on the small details.
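(a toy illustration of the idea, not a real architecture: mean-pool every K token embeddings into one patch vector, so a length-N sequence becomes N/K steps)

import numpy as np

K = 4                              # patch size; bigger K = longer reach, blurrier detail
emb = np.random.randn(4096, 768)   # stand-in token embeddings, N=4096
patches = emb.reshape(-1, K, 768).mean(axis=1)
print(patches.shape)               # (1024, 768): the model attends over 1024 steps, not 4096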
Anonymous No.107174373 [Report] >>107174501
>>107174357
this. the big transformers killer will arrive any day now
it was obvious that rwkv, mamba, retnet, titans, transformers2 all would fail. the real successor will be much better
Anonymous No.107174459 [Report]
>>107174357
False.
We're getting AGI in 2 weeks.
Anonymous No.107174475 [Report]
>>107174357
*Next-token prediction* is a dead end. Transformers have some more life left.
Anonymous No.107174501 [Report]
>>107174373
RNNs lasted for 60 years so yk
Anonymous No.107174633 [Report] >>107174862 >>107174906
>>107174614
>>107174614
>>107174614
Anonymous No.107174645 [Report] >>107175180
https://www.techpowerup.com/342779/olares-to-launch-a-personal-ai-device-bringing-cloud-level-performance-home
>RTX 5090 24GB
>96GB DDR5
let me guess, dual channel ddr5 DOA
Anonymous No.107174740 [Report]
>>107171282
vibevoice can do that
https://vocaroo.com/1di7hdJ7qpCV
Anonymous No.107174862 [Report] >>107174889
>>107174633
I love Tee
Anonymous No.107174889 [Report]
>>107174862
Anonymous No.107174906 [Report] >>107175164
>>107174633
what did he splash on her?
Anonymous No.107175164 [Report]
>>107174906
Acid
Anonymous No.107175180 [Report]
>>107174645
DOA indeed