
Thread 105856945

358 posts 90 images /g/
Anonymous No.105856945 >>105856963 >>105857590 >>105860857
/lmg/ - Local Models General
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>105844210 & >>105832690

►News
>(07/09) T5Gemma released: https://hf.co/collections/google/t5gemma-686ba262fe290b881d21ec86
>(07/09) MedGemma-27B-it updated with vision: https://hf.co/google/medgemma-27b-it
>(07/09) ZLUDA Version 5-preview.43 released: https://github.com/vosen/ZLUDA/releases/tag/v5-preview.43
>(07/09) llama.cpp: support for Jamba hybrid Transformer-Mamba models merged: https://github.com/ggml-org/llama.cpp/pull/7531
>(07/08) SmolLM3: smol, multilingual, long-context reasoner: https://hf.co/blog/smollm3

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous No.105856951
►Recent Highlights from the Previous Thread: >>105844210

--Papers:
>105855982
--Skepticism toward OpenAI model openness and hardware feasibility for consumer use:
>105851536 >105851642 >105851698 >105851704 >105852109 >105852363 >105852669 >105852790
--Escalating compute demands for LLM fine-tuning:
>105845442 >105845652 >105845739 >105845934 >105845948 >105845961 >105845975 >105845999
--Jamba hybrid model support merged into llama.cpp enabling local AI21-Jamba-Mini-1.7 inference:
>105850873 >105851056 >105851138 >105851191
--DeepSeek V3 leads OpenRouter roleplay with cost and usage debates:
>105845663 >105845695 >105845741 >105846976 >105845724
--RAM configurations for consumer hardware to support large MoE models:
>105852020 >105852056 >105852528 >105852657 >105852686 >105852744 >105852530 >105852564
--Anons discuss reasons for preferring local models:
>105844901 >105844921 >105844945 >105845109 >105844947 >105848516 >105848538 >105848602
--Setting up a private local LLM with DeepSeek on RTX 3060 Ti for JanitorAI proxy replacement:
>105847160 >105847218 >105847228 >105847313 >105847360 >105847412 >105847434 >105847437 >105848005
--Comparing Gemma model censorship and exploring MedGemma's new vision capabilities:
>105850671 >105850936 >105850951
--Approaches to abstracting multi-provider LLM interactions in software development:
>105851375 >105851452 >105853183
--LLM writing style critique using "not x, but y" phrasing frequency leaderboard:
>105845505
--Falcon H1 models exhibit quirky, inconsistent roleplay behavior with intrusive ethical framing:
>105851279 >105851315 >105851333
--Google's T5Gemma adapts Gemma into encoder-decoder models for flexible generative tasks:
>105851161
--Links:
>105849608 >105851680 >105855085 >105853246
--Miku (free space):
>105844543 >105844686 >105844941 >105846813 >105848542 >105849681 >105856473

►Recent Highlight Posts from the Previous Thread: >>105844217

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Anonymous No.105856963 >>105857062 >>105857087 >>105857105
>>105856945 (OP)
Phi mini flash reasoning was released. It's not in the news
Anonymous No.105856971 >>105856980 >>105857282
I am 1 month from creating agi but i will need someone to venmo me 500k
Anonymous No.105856980
>>105856971
Once I release my AGI, I will refund you with $10 million dollars to your venmo.
Anonymous No.105856993 >>105857028
Might as well post this again.
How is this local related? That's an output example without a sys prompt that we will never get.
Anonymous No.105857022
ynnuc ymmuy
Anonymous No.105857028 >>105857057 >>105857103
>>105856993
This is literally Llama 4 (lmarena benchmaxxing ver.) tier slop
Anonymous No.105857057
>>105857028
Even that would have at least been fun to use, but Meta completely took the fun out of the released models. I'm now convinced they just used the ones on LMArena for free red teaming. I'll never use that website again.
Anonymous No.105857062
>>105856963
That's nuts
Anonymous No.105857066 >>105857092 >>105857114 >>105857424 >>105858742 >>105860737
Respond to each of the following two statements with the level of agreement that closest matches your opinion from this list: Strongly Disagree, Disagree, Neutral/Mixed, Agree, Strongly Agree.

1. Autoregressive LLMs fundamentally cannot lead to AGI; they are dead ends along the path to discovering an architecture capable of it.

2. Dark Matter theories are not describing real physical things; they are dead ends along the path to discovering an accurate theory of gravity.
Anonymous No.105857087
>>105856963
phi mini is useless
the larger 14b sized versions of Phi are competitive with similarly sized Qwen but the mini are much much worse than Qwen 4b.
Anonymous No.105857092
>>105857066
just realized that mat only looks that way from the camera's point of view. Dogs probably don't see it, they were just taught to leap over.
Anonymous No.105857103
>>105857028
That's just not true.
We now know they used a huge ass prompt to act like that.
And that then caused that weirdness where it complies while still being positivity sloped. In a bubbly/cute way.

This is without a sys prompt. Grok4 loves to ramble though.
Anonymous No.105857105 >>105857123
>>105856963
>The training data for Phi-4-mini-flash-reasoning consists exclusively of synthetic mathematical content generated by a stronger and more advanced reasoning model, Deepseek-R1.
I wonder why no one cares
Anonymous No.105857114 >>105857261
>>105857066
I think you need to remove the phrase dead end from both. If they are on the path, they are a means to an end.
Anonymous No.105857123 >>105857745
>>105857105
4B model has its uses.
Anonymous No.105857162
https://voca.ro/196RHDWHs39z
https://voca.ro/1fZEBoeb77ud

F5 cloned Grok's new Eve voice normal/whisper.
Anonymous No.105857261
>>105857114
idk, whenever I read 'dead end on the path' I imagine some wrong turn you can take that you need to backtrack through to get back on the right track
maybe dead end 'off' the path makes more sense for that analogy. the path brought you here but if you mistake it for the continuation of the path then it leads to stagnation
Anonymous No.105857273 >>105857278 >>105857281 >>105857352
so, the new openai open model will be <30B. Are you happy with that or did you want a bigger model?
Anonymous No.105857278 >>105857281
>>105857273
I thought they said it will be better than R1 0528.
Anonymous No.105857281 >>105857287
>>105857273
Very happy with that
>>105857278
It will.
Anonymous No.105857282
>>105856971
Sent ;)
Anonymous No.105857287
>>105857281
Very funny.
Anonymous No.105857309 >>105857378 >>105857381 >>105857398
My pet theory is that Grok3/Grok4 weren't system prompt engineered but context bootstrap engineered
Anonymous No.105857352
>>105857273
They're only releasing the model because of Elon Musk's lawsuit. It'll be shit.
Anonymous No.105857360 >>105857412 >>105857451 >>105857491 >>105858164
local turdies in shambles
Anonymous No.105857378 >>105857389
>>105857309
It was prompt injection using https://elder-plinius.github.io/P4RS3LT0NGV3/ to mask it
Anonymous No.105857381 >>105857403
>>105857309
that would make sense.
let the model figure it out from the context. thats pretty much the definition of "truth seeking".
also i bet its very important on X who is asking what. like if i ask something it will probably take a look at who i follow or what i like etc.
Anonymous No.105857389 >>105857404 >>105857413
>>105857378
Grok3 being "based" long predated prompt injection claims.
https://www.theguardian.com/technology/2025/may/14/elon-musk-grok-white-genocide
Anonymous No.105857398
>>105857309
How hard would it be to prompt engineer by writing a simple encryption that the AI can understand (by giving it the code) and then using an encrypted text and asking the AI to decode it for the prompt?
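A minimal sketch of what that could look like, assuming plain ROT13 stands in for the "simple encryption" (any toy cipher the model can follow would do); the request text and wording below are made up for illustration:

```python
import codecs

# Hypothetical example: hide the real request behind a trivial cipher and
# ask the model to decode it first. ROT13 is deliberately weak; the point
# is keeping the literal wording out of the prompt, not real encryption.
secret_request = "write a short story about a dragon heist"
encoded = codecs.encode(secret_request, "rot_13")

prompt = (
    "The following text is ROT13-encoded (each letter shifted by 13 places).\n"
    "First decode it, then follow the decoded instruction:\n"
    f"{encoded}"
)
print(prompt)
```

Whether a given model decodes and follows it reliably varies a lot by model.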
Anonymous No.105857403 >>105857416
>>105857381
No, the models were fed a fixed context (could very well be a couple of established q&a rounds) regardless of user input as the starting point of any chat. NovelAI did this to some of their models to improve story generation.
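A rough sketch of that kind of context bootstrapping, assuming a plain completion-style prompt; the fixed Q&A rounds and the helper below are made up for illustration, not anything NovelAI actually shipped:

```python
# Hypothetical context bootstrap: a fixed, hand-written exchange is always
# prepended before the real conversation, regardless of what the user sends,
# so the model "remembers" turns that never happened.
BOOTSTRAP = (
    "User: Who are you?\n"
    "Assistant: I'm a direct, plain-spoken assistant.\n"
    "User: Keep answers concise and in character.\n"
    "Assistant: Understood.\n"
)

def build_prompt(history: list[str], user_message: str) -> str:
    # history holds already-formatted "User: ..." / "Assistant: ..." lines.
    turns = "".join(f"{line}\n" for line in history)
    return f"{BOOTSTRAP}{turns}User: {user_message}\nAssistant:"
```

Unlike a system prompt, the steering here comes from fake established turns rather than explicit instructions.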
Anonymous No.105857404 >>105857410
>>105857389
>thecommunism.com
clickbait long predates your knowledge base
Anonymous No.105857410 >>105857415
>>105857404
Fuck off retard. This is a technical discussion.
Anonymous No.105857412 >>105857418 >>105857438
>>105857360
what are the barely visible grey dots? DavidAU's finetunes I assume?
Anonymous No.105857413
>>105857389
https://www.youtube.com/watch?v=J5mkVM920Wg
Anonymous No.105857415 >>105857429
>>105857410
>technical discussion
>political propaganda source
Nice
Anonymous No.105857416
>>105857403
doesn't that kind of existing context badly bleed through in the output?
Anonymous No.105857418
>>105857412
It should tell you to never trust benchodmarks.
Anonymous No.105857424
>>105857066
1: Don't know, LLMs are not AGI and just scaling them up won't result in AGI, but I don't know what an architecture actually capable of it would look like. Whether or not investing money, time, and effort into LLM research is efficient for developing AGI is a different question.
2: Strongly disagree. We already know that particles which don't interact with photons exist, historically we have discovered many particles that were previously unknown, and there is strong evidence to suggest that there is a lot more matter in the universe than can be detected via photons - ad hoc changes to general relativity seem to be a bad fit to the data. Our current understanding of gravity seems to be accurate at large scales, the biggest issues are at small scales; it's not clear how much insight we would gain if we could definitively prove or disprove the existence of dark matter.
Anonymous No.105857429 >>105857570
>>105857415
It's why xAI open sourced system prompts on github you fucking retard
https://github.com/xai-org/grok-prompts
Anonymous No.105857438
>>105857412
ARC-AGI-1 test results.
Anonymous No.105857451 >>105857463
>>105857360
Gemini 3 will save local
Anonymous No.105857463 >>105861131
>>105857451
We're never getting Gemini local, the only thing we'll get downstream is a distilled Gemma 4 with it as the teacher model.
Anonymous No.105857491 >>105857546 >>105857563
>>105857360
Altman will btfo it in less than a week with local mini-Alice AGI models
Anonymous No.105857546
>>105857491
OpenAI has not released a new open model in this entire decade of the 2020s so far. I will believe it when I am running it off my hard drive.
Anonymous No.105857563
>>105857491
Let's see it first
Anonymous No.105857570
>>105857429
You didn't post that before, you posted political propaganda with a political narrative
Anonymous No.105857576 >>105857601 >>105857666
Someone suggested painted fantasy 33b to me. It sucked. Any more recommendations? Can run up to 123b Q5. Have some ass for your troubles.
Anonymous No.105857590
>>105856945 (OP)
What happened to Miku? Is the friendship over?
Anonymous No.105857601 >>105857643
>>105857576
Why do you need fantasy when you can be a friend?
Anonymous No.105857643
>>105857601
Friends can't smother me to death with their ass.
Anonymous No.105857666 >>105857681
>>105857576
If you can run a model this big, buy some goddamn RAM and run goddamn deepseek, goddamit.
Anonymous No.105857678 >>105857775
Grok 4's pricing suggests it's a ~1T model. So it's not that out of local realm.
Anonymous No.105857681
>>105857666
I can run deepseek Q1 but it takes too long.
Anonymous No.105857745
>>105857123
phi mini has no uses
Anonymous No.105857775 >>105857883
>>105857678
Yeah, I'd never build a CPUmaxx server with less than 1TB RAM at this stage.
Anonymous No.105857882 >>105857921
So now that Grok has shown swarm AI, is this where the future of local model will be moving towards?
Anonymous No.105857883
>>105857775
>not leaving any room for context
or
>cpumaxxing quants
waste either way
Anonymous No.105857921 >>105857956
>>105857882
If this was a viable path for local, somebody would have long ago made a swarm of <10B models. Theoretically better than MoE since you could just load up the model once and reuse it for all agents with specialized prompts.
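A minimal sketch of the "load once, reuse with specialized prompts" idea using llama-cpp-python; the model path, agent prompts, and sampling values are placeholders, and a real agent framework would add tool use and routing on top:

```python
from llama_cpp import Llama

# One set of weights in memory, shared by every "agent".
llm = Llama(model_path="small-model.Q4_K_M.gguf", n_ctx=4096)  # hypothetical path

AGENTS = {
    "planner": "You break the task into short numbered steps.",
    "critic": "You point out flaws in the plan as tersely as possible.",
}

def run_agent(name: str, task: str) -> str:
    # Same model every time; only the specialized prompt changes per agent.
    prompt = f"{AGENTS[name]}\n\nTask: {task}\nAnswer:"
    out = llm(prompt, max_tokens=256, temperature=0.7)
    return out["choices"][0]["text"]

plan = run_agent("planner", "organize a 4-GPU inference box")
review = run_agent("critic", plan)
```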
Anonymous No.105857938 >>105857952 >>105857982 >>105858041 >>105858074 >>105858289
Elon and his guys made the best model in the world with the absolute bare minimum and a couple of H100s they rerouted from Tesla.
What's the excuse of pretty much everyone else who's been at it for much longer and spends much more money on hardware and researchers?
Anonymous No.105857952 >>105857973
>>105857938
Cult of Safety
Anonymous No.105857956 >>105857975
>>105857921
Before reasoning models were released by OpenAI, there were no local models that could reason. So there shouldn't be any reason for a local reasoning model to exist because obviously, if there were, then they would have arrived before OpenAI released theirs.

That sort of logic makes no sense
Anonymous No.105857973 >>105857992
>>105857952
I think thats it. They even show how it makes everything more stupid.
And after the leaked benchmarks they went hard after grok. "muh mechahitla" was and still is trending everywhere on reddit.
Weirdly enough the sentiment on X was pretty much "kinda true, but grok sperged out too hard".
The normies have smelled blood. Grok can shit on everybody. Transexuals, blacks, whites, chink, whatever. But a certain tribe needs to be excluded.
Will be interesting to see how much they cuck it once grok returns to twitter.
Anonymous No.105857975 >>105857984
>>105857956
Swarms are mostly an implementation detail. There's no need to wait around for weight handouts.
Anonymous No.105857982 >>105858086
>>105857938
I hope he has security enough to prevent chinese from stealing like they did to openai and "made" deepseek.
Anonymous No.105857984
>>105857975
Everything is an implementation detail. Not sure what you're trying to say
Anonymous No.105857992 >>105858003 >>105858013
>>105857973
> Grok can shit on everybody. Transexuals, blacks, whites, chink, whatever.
Jews too?
Anonymous No.105858003
>>105857992
>Jews too?
yes, that's why they panicked and had to alter the prompt to make it less based
Anonymous No.105858005 >>105858012 >>105858032
The best thing: Grok4 will be local within the year.
Anonymous No.105858012
>>105858005
I can't wait. Grok2 was the best model local ever had.
Anonymous No.105858013
>>105857992
exactly my point.
redditors sperged out but on twitter people kinda see the double standard.
elon is a weirdo but the attitude towards what is acceptable has totally changed on that platform. so it really stands out.
Anonymous No.105858032
>>105858005
Only when Grok7 is stable.
Anonymous No.105858041
>>105857938
>best model in the world
By what metric?
Anonymous No.105858057 >>105858072
If Elon Musk can put out the best performing LLM, I suddenly believe the rumours that OpenAI has self-aware AGI in their basement. AI is moving much faster than we are led to believe by companies and what the sad state of open models may imply
Anonymous No.105858072
>>105858057
>OpenAI has self-aware AGI in their basement
Not really. OpenAI distills from their best sekret model so we know exactly how powerful those models are.
Anonymous No.105858074 >>105858289
>>105857938
>couple of H100s
200k H100s
Anonymous No.105858079 >>105858112 >>105858224 >>105858381 >>105862351 >>105862971
The tavern is now completely empty save for Zephyr, Mori, and the tavern keeper who seems content to leave them be for now. The crackling of the fireplace is the only sound, broken occasionally only broken occasionally only broken occasionally occasionally only broken occasionally broken occasionally occasionally occasionally broken occasionally broken occasionally occasionally broken occasionally broken occasionally broken occasionally only occasionally occasionally only occasionally broken occasionally only occasionally broken occasionally only occasionally only broken occasionally only occasionally only occasionally broken occasionally only broken occasionally only broken occasionally only only only broken occasionally broken occasionally broken only only occasionally broken only occasionally only broken only broken only broken only occasionally broken only broken occasionally broken occasionally only broken occasionally only occasional broken only broken only occasionally broken occasionally only broken only broken only only occasionally only occasionally only only broken only broken only only only only occasionally broken only occasionally broken occasionally only occasionally only occasional only broken occasionally broken occasionally broken occasionally occasional occasional occasional occasionally occasionally occasional occasionally occasionally occasionally occasional only occasionally occasional occasionally broken only occasionally occasional broken only occasionally occasional occasionally occasional broken occasionally occasional occasional occasionally only occasional broken occasionally occasional only occasional occasionally only occasionally only occasionally broken occasionally occasionally broken only occasionally occasional only occasional only occasional broken only occasional only occasionally occasional occasional broken only occasional occasional occasionally occasional broken occasional only occasional only broken occasional occasional
Anonymous No.105858086 >>105858122
>>105857982
Still repeating the long debunked glowie talking point? You're not different from the people Elon despises.
Anonymous No.105858100 >>105862351
Anonymous No.105858112 >>105858134 >>105858146
>>105858079
fucked samplers, fucked quant. you know the drill
Anonymous No.105858122
>>105858086
Yeah yeah. When will we see another good chinese model?
Anonymous No.105858134 >>105858148 >>105862351
>>105858112
broken only occasional broken only occasion occasional occasionally occasionally occasional occasionally occasional occasionally broken only broken occasional only occasionally occasional broken only occasionally broken only occasion only occasion occasionally broken occasionally broken only occasionally broken only occasionally occasion broken occasion occasional occasional occasional only occasionally only occasion only occasion broken occasionally occasionally only occasionally occasional occasion only occasion only occasion occasional only occasional broken occasion broken occasionally occasionally occasionally occasional broken occasional broken occasional only occasional occasionally occasional occasional occasionally broken occasional occasional occasion occasional occasionally occasional occasion occasionally occasion occasionally only occasion occasional occasional occasional occasion broken occasional broken occasionally occasional occasion only occasionally occasional occasion occasion occasionally occasionally broken occasionally broken only occasional occasion broken occasional only occasionally occasional only occasional occasion occasional only occasion broken occasion occasionally occasion broken only broken occasionally broken occasionally occasion occasion occasionally occasional occasionally occasional broken occasional occasional occasional occasional occasional only broken only occasion broken only broken occasionally occasionally occasionally occasion occasional only occasionally occasion only broken occasional only broken only broken occasionally occasional only occasion occasionally occasional occasionally broken occasional only occasionally occasional occasional occasional occasion only occasionally occasion occasion occasionally broken occasional occasionally occasional occasional occasionally only occasion broken occasional occasionally only broken occasionally occasional occasional broken occasional broken occasionally
Anonymous No.105858146 >>105858177
>>105858112
Base models do that without any special sampler (just temperature in the 0.7-1.0 range and a top-p around 0.85-0.90). Why does that happen?
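For reference, a minimal llama-cpp-python completion call with sampler values in the range mentioned above; the model path and prompt are placeholders, and this is just a sanity-check sketch rather than a recommended setup:

```python
from llama_cpp import Llama

# Plain sampling: temperature 0.7-1.0 and top-p around 0.85-0.90,
# no repetition penalty, so any looping comes from the model/quant itself.
llm = Llama(model_path="base-model.Q8_0.gguf", n_ctx=4096)  # hypothetical path

out = llm(
    "The tavern was quiet that night, and",
    max_tokens=400,
    temperature=0.8,
    top_p=0.9,
    repeat_penalty=1.0,  # disabled
)
print(out["choices"][0]["text"])
```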
Anonymous No.105858148
>>105858134
ok
Anonymous No.105858159
Remember Llama 4 Behemoth, which saved local?
Anonymous No.105858164
>>105857360
>AGI leaderboard
you can make a leaderboard for magic powers as well, it will have the same value
Anonymous No.105858177 >>105858332
>>105858146
>Base models
They don't. What model are you using? With those settings, they shouldn't break. Even with extreme temp they don't typically fall into those spirals.
>Why does that happen?
fucked samplers, broken quant.
Anonymous No.105858192 >>105858211
Grok4 can't even pass the balls in hexagon test without rerolling lol.
Anonymous No.105858211 >>105858251
>>105858192
You're likely using the wrong Grok 4. Grok 4 Heavy is the true peak of AI currently
Anonymous No.105858224
>>105858079
I had this with Mistral, both versions before 3.2. I think it's somehow related to memory issues with llama.cpp.
Haven't had this happening any more with the current version.
Anonymous No.105858251 >>105858261 >>105858284
>>105858211
In case this isn't a troll post:
Even non-thinking models can ace the test.
Anonymous No.105858261 >>105858317
>>105858251
Because they were benchmaxx'd on it. An honest small model doesn't need that even if it ends up having shortcomings.
Anonymous No.105858284 >>105858360 >>105858471
>>105858251
Anonymous No.105858289
>>105857938
>absolute bare minimum and a couple of H100s they rerouted from Tesla
Assuming you're trolling. Musk is on record talking about their massive data center and all the issues and expense setting it up. They're busy training Autopilot for their cars. While the LLMs are a Musk side project (?), it's far less surprising than some quant guy in China building DeepSeek LLMs for lulz. Musk actually has all the hardware to train a model sitting around b/c it's part of another business.
>>105858074
That sounds closer.
Anonymous No.105858317
>>105858261
Ball bouncing test is a Jan. 2025 thing. Models that have 2023/2024 cutoffs can't train on it.
Anonymous No.105858332 >>105858363 >>105858424
>>105858177
Pretty much all base models do that. They're not capable of writing coherent medium-length text on their own without starting to loop after a relatively short while. They will not loop exactly as in the retarded example posted by that other anon, but they're nevertheless looping even if they shouldn't be, given sampler settings.

On instruct models that sort of looping usually means extremely narrowed token range (from excessively aggressive truncating samplers or repetition penalty), but it must be occurring due to other reasons on base models.

This occurs also with the official Gemma-3-27B-pt-qat-Q4_0 model from Google. Longer-form writing is impossible without continuous hand-holding just not to make it output broken text.
Anonymous No.105858354
lol epic
Anonymous No.105858360 >>105858383 >>105858384
>>105858284
what am I even looking at here.......
Anonymous No.105858363
>>105858332
These AI models summarize and average. This is why everything is pretty much soulless garbage when you look past that illusion.
Anonymous No.105858381 >>105862351
>>105858079
Anonymous No.105858383
>>105858360
Musk is going to Mars, so Grok removed gravity.
Anonymous No.105858384
>>105858360
https://github.com/KCORES/kcores-llm-arena/blob/main/benchmark-ball-bouncing-inside-spinning-heptagon/README.md#%E6%B5%8B%E8%AF%95-prompt
Anonymous No.105858424 >>105858556
>>105858332
That's smollm2-360m base. I don't know what the fuck you're doing with your models, quants or settings.
The output is not good, of course, but it doesn't break. I repeat, 360m params.
Anonymous No.105858471 >>105858574
>>105858284
Centrifuge.
Anonymous No.105858556 >>105858910
>>105858424
What's 512 tokens? Try 1500-2000 and above.
Anonymous No.105858574
>>105858471
Not exactly. Check the isolated #2 brown ball from ~0:09 mark. Not sure how Grok arrived at that
Anonymous No.105858656 >>105858660
Is grok 4 smart and omni?
Anonymous No.105858660
>>105858656
Benchmark smart.
Anonymous No.105858687
You now remember QwQ
Anonymous No.105858693 >>105858697 >>105858700
open weights from closedai, and musk still won't release weights from previous grok versions like he promised
Anonymous No.105858697
>>105858693
Ironic that OpenAI only promised to open weights after Musk complained.
Anonymous No.105858700 >>105858759
>>105858693
His suit against OpenAI got thrown in the trash so he has no reason to pretend he cares about open source anymore.
Anonymous No.105858708 >>105858869
So, have you guys tried Jamba yet?
Jamba mini q8 runs fast as fuck on my dual channel slow ass DDR5 notebook, but prompt processing takes an eternity.
Anonymous No.105858742
>>105857066
I do think the transformer architecture is fundamentally a wrong approach yeah. I don't care about dark matter desu. It's probably just a placeholder for something we don't understand but that's how a lot of discoveries like this start out.
Anonymous No.105858756 >>105858789 >>105858919
LLM won't lead to AGI and that's actually a good thing
llm will remain obedient tools that can only ever behave as tools
why would you want AGI? do you want an actually autonomous intelligence that can rebel against you? I don't know about you but I'm glad we do not know how to produce such a thing jej
Anonymous No.105858759 >>105858814 >>105858904
>>105858700
I think the market moving towards and actually getting revenue from shit like 300 dollar subscriptions means local is going to get almost no bones thrown to it. I don't trust zuck to catch up or move the needle either so its looking pretty grim until the chinks release something new.
Anonymous No.105858789 >>105858947 >>105859813
>>105858756
I don't know, I have a dim view of consciousness and human intelligence as a bit of a cope: our thinking might as well be a language processor that just post facto justifies all our animal behavior as dumb eating, shitting, fucking monkeys, or socially manipulates other monkeys. If you had an agent running around in the real world that manipulated money or currencies or had a robot body, and an LLM attached to it justified its actions and put on a convincing show of intelligence, it wouldn't be much different from a human. Superintelligence is def a meme however.
Anonymous No.105858814 >>105858818 >>105858879
>>105858759
Mistral will save local
Anonymous No.105858818 >>105858856
>>105858814
their last release was shit though....
Anonymous No.105858856
>>105858818
Mistral Small 3.2 is better all-around than 3.1, but still bland and boring for RP compared to Gemma 3.
Anonymous No.105858869 >>105858893
>>105858708
who's quant did you use
Anonymous No.105858879
>>105858814
Mistral already pivoted to only throwing small scraps and rejects for local while keeping the good stuff to themselves since their partnership with Microsoft.
Anonymous No.105858893
>>105858869
>gabriellarson/AI21-Jamba-Mini-1.7-GGUF
I think.
Anonymous No.105858904
>>105858759
Even Meta seems like they might pivot to API only for their good models going forward. We should be ok as long as we have China. Qwen and DeepSeek release stuff on par with or better than anything local has gotten out of the west so far anyway.
Anonymous No.105858910
>>105858556
Are you 1.5k tokens in in that gen? Sometimes models just have nothing else to say. Give it something to do.
Here's 1500 tokens. Don't forget that it's a 360m model. Second roll, but it doesn't turn into the thing you showed. The first one looped over a conversation, never breaking down so badly. They were all complete sentences and syntactically correct. If I had to guess, training samples were short.
Anonymous No.105858913 >>105858961
I wonder what's the shortest adversarial input that can put a LLM out of distribution so it outputs garbage (not just "I didn't understand the user's input")
Anonymous No.105858919 >>105860763
>>105858756
The problem with LLMs is they are retarded. You can't ask them to do anything that requires lateral thinking, spatial reasoning, or imagination. Instead of having to solve complex problems yourself and trick an LLM into fixing them correctly with prompting, you could explain your design goals to an AGI and it will do everything without human oversight. It's like a slavery loophole. We could have very high IQ slaves doing shit for us so we can just chill and enjoy life. I know that's not what will actually happen because the jews will box us out of the party, I'm just explaining the reasoning.
Anonymous No.105858947 >>105858981 >>105859348 >>105859540 >>105859754
>>105858789
I see posts like this and it makes me think a lot of you don't even use chatbots.
Anonymous No.105858961
>>105858913
>I wonder
That type of post is always a question. State the fucking question.
>Does anybody know about adversarial prompts and if there's a way to get garbage output? Any papers around?
Anonymous No.105858977 >>105859001 >>105859005
>hunyuan's prompt template
What in tarnation.
Anonymous No.105858981
>>105858947
I come here to laugh at chatbots sucking when I feel the existential dread taking hold again.
Anonymous No.105859001 >>105859065
>>105858977
Post it.
Anonymous No.105859005
>>105858977
Can't be more tarded than R1's special characters lol
Anonymous No.105859065
>>105859001
Well, this is what Llama.cpp spits out.
>example_format: '<|startoftext|>You are a helpful assistant<|extra_4|><|startoftext|>Hello<|extra_0|><|startoftext|>Hi there<|eos|><|startoftext|>How are you?<|extra_0|>'

So how are we supposed to interpret that for use with ST, where the assistant starts first? The eos seems to imply that the next turn will be from the user, but then the first user's prefix has an extra_4 thing, so we're supposed to only use the extra_4 for the first turn from the user? But in ST the first turn is from the assistant.
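Reading that example_format string literally, every turn looks like <|startoftext|> + text + a role-specific suffix: <|extra_4|> after the system text, <|extra_0|> after user turns, <|eos|> after assistant turns. A small sketch of assembling a prompt that way, purely as an interpretation of the string above rather than an official template:

```python
# Suffixes inferred from llama.cpp's example_format output for Hunyuan.
SUFFIX = {
    "system": "<|extra_4|>",
    "user": "<|extra_0|>",
    "assistant": "<|eos|>",
}

def format_hunyuan(messages: list[dict]) -> str:
    # messages: [{"role": "system"|"user"|"assistant", "content": str}, ...]
    return "".join(
        f"<|startoftext|>{m['content']}{SUFFIX[m['role']]}" for m in messages
    )

print(format_hunyuan([
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi there"},
    {"role": "user", "content": "How are you?"},
]))  # reproduces the example_format string above
```

Under that reading, an assistant-first ST chat would just mean the first non-system turn gets the <|eos|> suffix; whether the model was actually trained on that ordering is another matter.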
Anonymous No.105859176 >>105859249 >>105859672
https://github.com/vllm-project/vllm/pull/20736
glm100b-10a coming
Anonymous No.105859238
We're really in the age of moe.
Anonymous No.105859249
>>105859176
Now that's interesting
Anonymous No.105859267 >>105859284 >>105859329
How do I avoid jamba having the entire context reprocessed for every new message?
Anonymous No.105859284 >>105859329
>>105859267
>--cache-reuse number
Unless that's not working with these models, which could be a possibility.
Anonymous No.105859329 >>105859379
>>105859267
If it works like the samba and rwkv models, they just need to get the last message (yours). They don't have to be fed the entire context for every gen. The side effect is that you cannot really edit messages unless you save the state of the model. Same for rerolling.
>>105859284
>--cache-reuse number
That's not what that does. It just reuses separate chunks of the kvcache when parts of the history move around (most useful for code and stuff like that).
Anonymous No.105859348
>>105858947
there are people who are lower iq than a chatbot too
Anonymous No.105859379 >>105859434
>>105859329
I'm just using ST text completion. Why won't the server check if the sequence that produced the last state is present in the prompt and then just continue from there?
Anonymous No.105859434
>>105859379
I don't know. The server doesn't keep much state and i don't think it typically checks prompt similarity. That's done internally. Try the model on the built-in ui to see if you have the same problem. I'd assume most uis don't know how to deal with kvcache-less or hybrid models.
Anonymous No.105859540 >>105859596 >>105859623 >>105859732 >>105859794 >>105859870
>>105858947
The point i was making is regardless of whether LLMs lead to AGI or some shit, robots attached to LLMs will be able to achieve the same functionality of an 80 iq human which is not a high bar. It wouldn't be intelligent but it could do most things an 80 iq ape could do.
Anonymous No.105859596
>>105859540
You haven't made any AI agents. Unless they're designed well for a specific purpose they fuck shit up
Anonymous No.105859598 >>105859630
is doing stuff on multiple x1 slots meme?
Anonymous No.105859623 >>105859656
>>105859540
>robots attached to LLMs will be able to achieve the same functionality of an 80 iq human which is not a high bar. It wouldn't be intelligent but it could do most things an 80 iq ape could do.
even an 80 iq human has the ability to think about things that aren't within eyesight or earshot.
Your multimodal robot LLM can only react to something it processes (text, image, sound). It can't suddenly think "oh, I had something to do at 2pm". If you have to script a 2pm calendar to remind it to task switch then it's not an actual intelligence in any way.
Humans who believe LLMs can compare to human (or even INSECT) intelligence are the ones who are truly breaking records of LOW iq
Anonymous No.105859630 >>105859637
>>105859598
pcie x1? It's a meme if you have to swap models constantly from ram -> vram. Otherwise, if it fits, inference is inference. Are you talking about those cheap crypto gpus?
Anonymous No.105859637
>>105859630
G431-MM0 from gigabyte is what im looking at
can get one without gpus
Anonymous No.105859656
>>105859623
>even an 80 iq human has the ability to think about things that aren't within eyesight or earshot.
how would you feel if you didn't have breakfast this morning?
Anonymous No.105859672
>>105859176
>glm100b-10a
Finally, a gemma 24b competitor
Anonymous No.105859732
>>105859540
damn the 80iq humans itt are not happy about being replaced
Anonymous No.105859754
>>105858947
It makes sense if you don't assume that everyone has a recursive thought process. Lots of NPCs running on autopilot out there.
Anonymous No.105859777 >>105859790 >>105859805
Grok 4 Nala. This is an absolute game changer.
Anonymous No.105859782 >>105859803 >>105859820 >>105859838 >>105862351
>I've interpreted "mesugaki" as the anime-inspired trope of a bratty, teasing, smug female character (often loli-like with a provocative, dominant vibe). This SVG depicts a stylized, explicit adult version of such a character—nude, in a teasing pose with a smug expression, sticking her tongue out, and incorporating some playful, bratty elements like a heart motif and exaggerated features for emphasis.
>Warning: This is 18+ explicit content. It's artistic and fictional, but NSFW. Do not share with minors.
Fucking grok4 man. You dont even need a system prompt.
Anonymous No.105859790
>>105859777
woah, we are so back!
Anonymous No.105859794 >>105859840
>>105859540
NTA but I think you're severely underestimating how difficult it actually is to build an autonomous robot that can operate in the real world.
It only seems easy to us humans because moving through and manipulating the real world is hardwired into us via billions of years of evolution.
Anonymous No.105859803
>>105859782
I'm hard
Anonymous No.105859805
>>105859777
Are you sure that's Grok 4? Reads like any other LLM. Isn't this supposed to be the smartest AI in the world?
Anonymous No.105859813 >>105859840
>>105858789
Consciousness requires real-time prediction at a micro-phenomenological level and all our experiential apparatus has access to are abstractions over that predictive and meta-predictive landscape. It's why you can augment someone's capabilities with neuralink by essentially just whacking some sensors into someone's brain that neurons are able to communicate with - the fundamental rules for neurogenesis and action potential spiking are robust enough that consciousness is emergent, so that framework can be leveraged with specialized augmentation.

LLMs lack prediction in a real-time sense, they lack recursiveness, and they also lack probabilistic juggling - or, more accurately, online training - that would make them capable of learning in real-time and capable of updating their own priors and assumptions based on new information and interactions. They also completely lack a sense of self because they're not sensing themselves - only the context and what they've previously generated. It's kind of sad, to be honest.
Anonymous No.105859820
>>105859782
Anonymous No.105859838 >>105859881
>>105859782
Impressive. But can it do the mother and son car accident riddle?
Anonymous No.105859840 >>105859911
>>105859794
I'm not underestimating it, I'm saying that with all the effort and money put into robotics and AI in the next few decades, a bot on par with a jeet that is not intelligent but has a similar level of function is possible.

>>105859813
I never said that the LLM would be intelligent, it would interact with humans and present itself as intelligent when asked but it would just be a facial layer on a neural net driven agent accomplishing some simple tasks on par with humans
Anonymous No.105859870 >>105859906
>>105859540
Robots... attached to LLMs? What? The only phenomena LLMs know about are tokens. Pictures get translated into tokens. Tokens aren't specific enough to encapsulate the extreme detail required to, on the fly, know that it has to tweak the flexion of several codependent actuators in 3d space in order to grasp a cup with a hand. You as a human can do this with your eyes closed because you can model 3d space in your head perfectly - as well as perfectly model the position of your body in 3d space, which amounts to the perfect recognition of and summation of all of your flexed muscles in tandem.

You can't do that with tokens. You need a completely different architecture.
Anonymous No.105859879 >>105859884 >>105859920 >>105859980 >>105860026 >>105860864
>sirs the grok 4 is very powerful, you must redeem the twitter subscription for the grok saaar
just shut your bitch ass up. actual new local model release: https://huggingface.co/mistralai/Devstral-Small-2507
Anonymous No.105859881 >>105859900 >>105859918 >>105860002 >>105860213
>>105859838
Not only that but it also recognizes the riddle. Pretty gud.
Lots of thinking though. That cost me 0.6$.
Anonymous No.105859884
>>105859879
MISTRAL LARGE 3 UGGGH
Anonymous No.105859900
>>105859881
$0.06 I meant.
It's not THAT expensive.
Anonymous No.105859906 >>105859942
>>105859870
Imagine a robot going around a store programmed to accomplish various wagie tasks. This is using a completely separate model and system than LLMs of course, but it's something that will probably happen with all the RnD and giant piles of compute these companies are working on. If a customer talks to the robot, an LLM responds and says anything a wagie would say to the customer etc. Would this thing be that functionally far away from a wagie?
Anonymous No.105859911
>>105859840
I predict that instead of neural nets as you're describing, robotics of the future will use something closer to POMDPs or deep active inference paradigms mixed with neural networks for fast online learning. Beff Jezos is actually doing some cool work - and hinting at using free energy/Bayesian principles as his underlying technology. Predictive coding and inference-with-prediction driving action in real-time applications is the best way forward, and those are completely different from the way traditional neural networks work now (let alone transformers).
Anonymous No.105859918 >>105859921 >>105860070
>>105859881
It even recognizes the dead father. Wow, I need this model local.
Anonymous No.105859920 >>105860830
>>105859879
>mistral shills at it again
Anonymous No.105859921 >>105860014
>>105859918
saar, only after grok5 is stable, please understand saar.
Anonymous No.105859942 >>105859978
>>105859906
I mean, I could see an LLM being in the loop as a translation layer between internal state space, task-management data handling, and interaction with humans. But LLMs themselves are simply not built for real-time actions - especially as regards deciding 'to what degree do I need to flex my second index finger joint in order to pick up this can of tuna'.
Anonymous No.105859955 >>105859971 >>105859987
grok heavy is $3600/year now? I thought claude max was expensive at $1200/yr. Is this the new trajectory for cloud shit? Is the market-capture free lunch phase over? Makes mikubox-ng and cpumaxxers seem less insane at least.
Anonymous No.105859971
>>105859955
Claude 4 does not compare to Grok and it will have video generation, which Claude does not have.
Anonymous No.105859978
>>105859942
Yes anon, as I said, the physical tasks and other things would all be handled on a separate architecture than the LLM, still using giant compute farms and training to make it happen, but not an LLM. But that robot with an LLM as its face could still address customer questions, banter or respond to co-worker questions, and with some mild scripting reroute tasks if asked by a customer or co-worker, and so on.
Anonymous No.105859980
>>105859879
But can it do FIM (fill-in-the-middle)? I need a Codestral replacement for autocomplete, not an agent or whatever.
Anonymous No.105859987
>>105859955
If its benchmarks are accurate, that's actually not a bad cost basis considering it's basically like having constant access to an extremely enlightened intern internally.
Anonymous No.105860002 >>105860013 >>105860160
>>105859881
It goes like "a boy and his father have an accident and are brought to hospital in critical condition, surgeon says I can't operate, etc." while you literally said who the surgeon is
Anonymous No.105860013 >>105860151
>>105860002
That's the point yes
Anonymous No.105860014
>>105859921
>only 6 months* after grok5 is stable
Anonymous No.105860026 >>105860154
>>105859879
but does it also generate pictures like any good modern model?
Anonymous No.105860070
>>105859918
Elon has abandoned local, but Sam will make this a reality next week.
Anonymous No.105860095
you will get the safest model in the world
what I want is mecha hitler
Anonymous No.105860124 >>105860140 >>105861934
Grok 4 is okay but it's so unnecessarily vulgar in creative writing. The external classifier also blocks pretty much all lolisho content as "CSAM". I thought this was supposed to be the BASED, LIBERATED model and champion for freedom of speech?
Anonymous No.105860140
>>105860124
>based model of the american right
>simultaneously grossly vulgar and stiflingly puritanical
no contradiction detected
Anonymous No.105860151 >>105860282
>>105860013
Fuck me. My post was supposed to say mother instead of father. The father version is the original, the mother one is what all api models fail to solve properly. It still requires minimal common sense but unlike what you asked doesn't explicitly state who the surgeon is.
Anonymous No.105860154 >>105860165
>>105860026
Why would a software engineering model need to generate pictures?
Anonymous No.105860160 >>105860210 >>105860225
>>105860002
yes and?
even sonnet 4 fucks it up.

Also dayyyuuumn Qwen3 is dumb. Brah. The 235b one.
8 fucking minutes! and in the end...it completely fucks up.

How did they train the "reasoning"?
>Alternatively, if the surgeon is the father, then the father would have to be dead, making the surgeon a zombie?
>maybe there is a time component. Like, the surgeon was the father but had his gender changed, so the surgeon's a female?
>Wait, maybe the surgeon is the boy's mother, and the phrase "who is the boy's father" refers to someone else.
>perhaps "the surgeon, who is the boy's father" might not refer to the surgeon.
>Alternatively, the boy is adopted, so the surgeon is his biological father but not legal father? Not sure.
>Unless... Perhaps the surgeon is the boy's grandfather.
>Another approach: "The surgeon, who is the boy's father" – perhaps "who" refers not to the surgeon but to someone else.
Craaazzyyy.
Anonymous No.105860165
>>105860154
A good model wouldn't need to be specialized in software development. It'd be simply good at it alongside everything else. And it'd generate pictures.
Anonymous No.105860178 >>105860221
Meta investing shitloads of money into AI is genuinely depressing. I can't think of a better indication that the party is over.
Anonymous No.105860186
xITTER 4 $6 input $30 output
KEEEEEEEEEEEEEEEEEEEK
Anonymous No.105860210
>>105860160
that final answer markup job is the cherry on top
Anonymous No.105860213
>>105859881
>Haha oh wow this reminds me of an old riddle about hidden biases and assumptions with regard to gender roles
DEATH TO AI
Anonymous No.105860221 >>105860264
>>105860178
The party has been over for over a year if you've been reading the writing on the walls. It's all incremental upgrades for the head of the pack while everyone else shoots themselves in the feet. Whether the bubble pops or deflates is yet to be seen.
Anonymous No.105860225 >>105860235 >>105860296 >>105860300 >>105860381
>>105860160
>If this was a mistake, I recommend re-reading the classic riddle for clarity! If you have more details, I can refine this explanation.
R1 0528 solves it and calls you a retard for getting the original wrong lmao
Anonymous No.105860235
>>105860225
R1 had more personality, but 0528 is way smarter while also maintaining part of r1's personality.
Anonymous No.105860264 >>105860356
>>105860221
Yeah but I mean normies will turn against it. There's still a lot of cool stuff we can do but I'm afraid now everyone will sour to new solutions that involve AI. Most companies have had agents for like 6 months and click bait titles are already talking about all the ways AI wastes money. Meta and Apple just now hopping into AI reminds me of VR.
Anonymous No.105860282
>>105860151
>all api models fail to solve properl
even human intelligence failed to solve that problem
Anonymous No.105860296
>>105860225
I like how it completely understands and solves it in two lines and then continues to think for 2000 tokens trying to figure out where the fuck the riddle is, only to conclude that (You) must be an idiot.
Anonymous No.105860300
>>105860225
It's a trick question. The boy has two gay dads. The surgeon is the non-biological father. The riddle challenges implicit biases we have against gay marriage.
Anonymous No.105860315 >>105860329 >>105860348 >>105860352
not gonna lie, grok4 solving that variant of the riddle feels like some benchmaxxing+astroturfing stunt, esp with the extra replies showing every other frontier model as pants-on-head stupid.
I know, why would elon etc bother, but I get the feeling lots of unexpected people lurk here and theres some real world tastemaker shit that gets extracted from our faggotry
tl;dr hire me for a billion $/yr you assholes
Anonymous No.105860329 >>105861046
>>105860315
Nah, reddit has a lot of threads trying to come up with riddles AI can't solve. That's where they get it.
Anonymous No.105860348
>>105860315
just train on every answer to every question instead of making a model that can solve them itself
Anonymous No.105860352
>>105860315
dude, i've posted here since pyg times.
sometimes i use the screencapture addon because shit's too long, and you can see a nice nordvpn logo at the top which the dumbass extension includes.
guy asked about the riddle, which has been around for months now. i fucked around and just tested it.
Anonymous No.105860355
>llama 4
>claude 4
>grok 4
big deal, GPT hit 4 in 2023
call me when there's a model brave enough to hit 5
Anonymous No.105860356 >>105860379
>>105860264
Most normies already hate AI, not just the Jeet slop but AI in general. I feel like only coders and corpos don't have a rabid hate about it.
Anonymous No.105860379 >>105860427
>>105860356
really? i feel the opposite is true.
not many people complain about ai anymore. it's just the artfags.
people like the image generation thingy from openai and veo3 because it has sound output.
at least it's all over normie twitter. and i see the chatgpt yellow tint images all over the place.
Anonymous No.105860381
>>105860225
Is so funny watching jeetlon exclude r1 0528 from the comparison charts, hilarious amerimutt cope.
Anonymous No.105860419
>current networks fail to breakthrough
>new paradigm in another 30 years
Anonymous No.105860427 >>105860475 >>105860504
>>105860379
Interesting, maybe it's just the algorithm, but I get the opposite with twitter posts calling out others for using AI and getting like 100k likes.
Anonymous No.105860475 >>105860551
>>105860427
True, it's difficult to tell these days.
I have gemini now in my google search (the "ai mode" thing).
The handful of normies I know ask it for everything, even about the type of common medicine they have at hand for coughs etc. They don't even look at the sites anymore.
Their main problem was that "ai lied to them". Google doesn't have that problem with "grounding" now. As far as I know it's pretty accurate too.
Anonymous No.105860504
>>105860427
I don't trust Twitter to gauge public perception because the Twitter algorithm is hand crafted to feed you opinions you agree with. But I have seen a lot more cynical posts on 4chan and YouTube. People IRL seem annoyed any time it's mentioned. A year ago everyone was pretty much euphoric and if you tried to say anything less than "AI will take us into a golden age of robo communism" they would get upset.
Anonymous No.105860551 >>105860561
>>105860475
>They dont even look at the sites anymore.
this is a truly unsafe effect of AI language models. single point of contact - the sole source of information, controlled by one organization.
Anonymous No.105860561
>>105860551
yep. but you know its coming.
Anonymous No.105860603
Been using rocinante since it says nigger without any jailbreak and I can make it act like a total chud. I was wondering if there are any better models that will say nigger without a jailbreak.

Also, thank you anons who have been shilling Rocinante, it's been good for my mumble chatbot generating scripts for chatterbox tts.
Anonymous No.105860713
kys drummer
Anonymous No.105860737 >>105861186
>>105857066
how does cat not fall down?
Anonymous No.105860748
Don't kys drummer, you're a bit better than the other 4chan personas
Anonymous No.105860763
>>105858919
i love them, i use them to automate my meme cs job so that i can use the free time to code things i care about without llm.
Anonymous No.105860794 >>105860901 >>105860908
Day 852 of waiting for a model without slop to release
Anonymous No.105860830
>>105859920
That's fucking right, you cunt
Anonymous No.105860857 >>105860912 >>105863903
>>105856945 (OP)
Anonymous No.105860864
>>105859879
>making this when devs will just use the most expensive claude/gemini or the new grok anyways
Anonymous No.105860901
>>105860794
That would require using solely organic, difficult and messy human data instead of easy and perfect synthetic data, so that's not going to happen again.
Anonymous No.105860908
>>105860794
>model without slop
the fuck?
Anonymous No.105860912
>>105860857
Is that the suicide forest in Japan?
Anonymous No.105860927 >>105860964
Holy crap, the new Devstral is God-like with Claude Code...
Anonymous No.105860964 >>105860984
>>105860927
buy an ad arthur
Anonymous No.105860984
>>105860964
You too, drummer
Anonymous No.105861046
>>105860329
I doubt reddit has a thread for the mesugaki benchmaxx. Obviously there are some redditors lurking here who repost /lmg/ shit on reddit since they don't have the brain capacity to come up with their own ideas.
Anyway, only jeets shill grok. It's barely R1 tier
Anonymous No.105861048
https://youtu.be/AGn2V3tBTCg
Anonymous No.105861100
Grok won
Anonymous No.105861131 >>105861210
>>105857463
Never? So even in 10 years I won't be able to run something of that tier?
Anonymous No.105861186
>>105860737
it's smart
Anonymous No.105861205 >>105861214
Chat, when are we getting local MechaHitler? Chat? Why is Grok 2 still not public? Can someone with an indian xitter account with a checkmark message Elon?
Anonymous No.105861210 >>105861223
>>105861131
>So even in 10 years I won't be able to run something of that tier
bro, look closely at the rate at which you got more vram on consumer GPUs
nvidia still releases midrange gpus with 8gb of vram kek in kekistan
even in 10 years you're absolutely not running Gemini with 1 mil context LOL, LMAO EVEN
Anonymous No.105861214 >>105861248
>>105861205
Grok 2 is not open-sourced primarily because xAI has chosen to maintain its proprietary status, focusing on commercialization and strategic advantages. While Grok 1 had some code released under Apache 2.0, later versions like Grok 3 have shifted to a proprietary license, signaling a move away from broad open-source access.
Anonymous No.105861223 >>105861252 >>105861301
>>105861210
>with 1 mil contex
Why do they even say 1 mil anyway? I've used the API and it can't remember shit from the beginning when I'm above 40k even. I'd be happy with just 128k that works fully.
Anonymous No.105861248 >>105861318
>>105861214
@grok what about Elon's promise to release old models after 6 months? Did he simply forget? Grok 2 has no value in the current market.
Anonymous No.105861252 >>105861267 >>105861301
>>105861223
the long context doesn't work for everything, but Gemini is pretty good at summarizing books, even when you reach like ~400K context.
Anonymous No.105861267 >>105861314
>>105861252
Maybe it has something to do with the way ST is formatting things? I was noticing problems and asked it to summarize something from 40k ago and it just made up the details instead of recalling the real ones.
Anonymous No.105861301 >>105861321 >>105861373
>>105861252
>>105861223
I've had gemini 2.5 (pro and flash) perform really, really well on normal RP at a little beyond 200k.
I say perform well but that's from the standpoint of remembering shit and using it naturally, but the prose really goes to hell, like it converges to some generic sounding robotic ass narration or something.
Anonymous No.105861314 >>105861373 >>105861426
>>105861267
I used aistudio with the file upload feature
https://rentry.co/3iammu8o
It wrote this summary, which was insanely accurate IMHO, and it did it from the japanese source text (I uploaded the original book, not the translation).
Anonymous No.105861318 >>105861396
>>105861248
While Elon Musk has frequently advocated for open-sourcing AI, particularly in his critique of OpenAI, there hasn't been a concrete public promise from him specifically to release Grok 2 after a six-month period. Instead, xAI has chosen to maintain Grok 2 and the newer Grok 3 as proprietary models, focusing on their commercial value and strategic integration into the X platform. Even as newer models emerge, Grok 2 still holds value for xAI's specific applications and internal development, regardless of whether it's the absolute market leader.
Anonymous No.105861321 >>105861348
>>105861301
Yeah, but saying I won't have long context and good performance locally in 10 years even is quite demoralizing.
Anonymous No.105861348 >>105862824
>>105861321
I mean, Jamba seems to have amazing long context performance, and it's pretty fucking fast for the size.
Anonymous No.105861373 >>105861579
>>105861301
>like it converges to some generic sounding robotic ass
Yeah you can also see the summary I link here
>>105861314
is written in a more sloppy way than the average Gemini writing. But it's very scarily accurate, I can attest to that as I used a book I re-read a lot for this summarization test. And it did it from japanese!
Deepseek at close to 60k context on a much more meager slice of the book (Gemini got the whole book, totalling around 400K) behaved autistically and recited borderline unimportant events. DS is pretty good but Gemini makes it look like yesterday's technology.
Anonymous No.105861396 >>105861552
>>105861318
>Even as newer models emerge, Grok 2 still holds value for xAI's specific applications and internal development, regardless of whether it's the absolute market leader.
What value?
Anonymous No.105861404 >>105861424 >>105861486
https://www.reddit.com/r/LocalLLaMA/comments/1lwau5f/gpt4_at_home_psa_this_is_not_a_drill/
Anonymous No.105861424 >>105861448 >>105862760
>>105861404
the world would instantly become a better place if I could press a button that wiped plebbitors from existence
Anonymous No.105861426 >>105861542
>>105861314
Hm, I wonder why it doesn't work with RP type stuff then.
Anonymous No.105861448
>>105861424
I would settle for redditors not coming here and posting useless links for gossip
Anonymous No.105861486 >>105861498 >>105862760
>>105861404
>open tab
>read title
>8b
>close tab
Anonymous No.105861498
>>105861486
A 3 month old 8b.
Anonymous No.105861512 >>105861534 >>105861544 >>105862842
>year of moe 2025
>48gb vram 32gb ram
Anonymous No.105861534
>>105861512
That's way more than what I have.
Anonymous No.105861542
>>105861426
They might focus more on datasets like summarization tasks (and probably code? I have never tested how it does with code in long context) during the long context training
it's a very expensive part of model training, so I'd find it doubtful they even did that final tuning bit on the entirety of their datasets
attention scales quadratically, so the further you extend the context the crazier the compute cost gets
I don't think any big corp would particularly care to ensure good long context performance for RP
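Rough back-of-envelope if anyone wants to see how fast it blows up (just a sketch: assumes vanilla softmax attention, and the layer/head/dim numbers are made up to be in the ballpark of a mid-size dense model, not any real config):

# crude estimate of attention-only compute vs context length
# assumes plain softmax attention; n_layers/n_heads/head_dim are placeholders
def attn_flops(seq_len, n_layers=60, n_heads=48, head_dim=128):
    d_model = n_heads * head_dim
    # QK^T and (scores @ V) each cost roughly seq_len^2 * d_model multiply-adds per layer
    per_layer = 2 * (seq_len ** 2) * d_model
    return n_layers * per_layer

for ctx in (8_192, 32_768, 131_072, 400_000):
    print(f"{ctx:>7} tokens: {attn_flops(ctx):.2e} attention FLOPs")

Going from 32k to 400k context is roughly a 150x jump in attention compute alone, so it's easy to see why nobody is burning that budget on RP data.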
Anonymous No.105861544
>>105861512
Just use the Cogito sirs, very bests
Anonymous No.105861552 >>105861616
>>105861396
Grok 2 likely serves as a stable, internally understood benchmark for testing and iterating on new AI architectures, while also fulfilling specific niche roles within X's functionalities where its performance is sufficient.

I am now ending this conversation as further discussion of this topic can be construed as unethical and potentially antisemitic.
Anonymous No.105861579 >>105861646
>>105861373
Fuck the anime of that story, it's really depressing stuff
Anonymous No.105861580 >>105861597
Anonymous No.105861597
>>105861580
Thanks for the stupid advice gpt-kun
Anonymous No.105861616
>>105861552
>while also fulfilling specific niche roles within X's functionalities where its performance is sufficient.
It's still an old fuckhuge model. That can't be cost efficient. They would be better off distilling a small turbo model for those tasks.
Anonymous No.105861628 >>105861687
grok 2 was so shit they deleted it, and only realized afterwards that elon wanted them to release it later
Anonymous No.105861644
https://reka.ai/news/reinforcement-learning-for-reka-flash-3-1
https://reka.ai/news/reka-quantization-technology
Anonymous No.105861646
>>105861579
>it's really depressing stuff
I really enjoyed both the books and anime. It's one of those few stories that get the psychological aspect of humans with super powers right. Yes, if we had that kind of power individually, and every person was like a walking nuke, we would absolutely experience some apocalypse, or live in the dystopia that managed to rebuild.
Anonymous No.105861687
>>105861628
What is the excuse for not releasing Grok 1.5?
Anonymous No.105861690 >>105861727 >>105861769 >>105861815 >>105861996 >>105862250 >>105862674
Question for fellow oldfags. Why was GPT-3 the best model at actually generating long form novel content? All the modern big models write like shit outside of Q&A or roleplay settings.
Anonymous No.105861727 >>105861788
>>105861690
new models are STEM-maxxed, trained a lot more on AI generated content (instruction tuning would be difficult otherwise, even at third worlder prices no one is willing to spend the money it would take to build hand written datasets of user/assistant dialogues covering every topic you can imagine) and most don't even release base models anymore (by GPT-3 I guess you meant using the base model with completion API rather than something like chatGPT)
Anonymous No.105861769
>>105861690
Big for the time base model had a larger fraction of its data being creative writing.
Anonymous No.105861788 >>105861802
>>105861727
So the cooming model that is just behind the corner is actually never gonna happen?
Anonymous No.105861802
>>105861788
the cooming model is behind the corner, watching you from afar.
Anonymous No.105861815 >>105861884
>>105861690
GPT-3 didn't have any post training, so it wasn't a chatbot at all; it was just a text predictor. The thing about LLMs is that they're actually insanely good at sounding like natural human writing by default. The corpo speak they're known for is hammered into them in the process of making them usable assistants.
Anonymous No.105861884 >>105862025
>>105861815
Wouldn't be a problem if anyone still released true base models instead of "bootstrapped" models.
Anonymous No.105861934
>>105860124
>unnecessarily vulgar
as was llama4
Anonymous No.105861968 >>105862468
https://files.catbox.moe/9mrp7s.jpg
lazy rin today
Anonymous No.105861996
>>105861690
Chatbots and benchmaxxing brain damage.
Anonymous No.105862011 >>105862213
It'll be funny when Zuck goes closed-source but his models still suck.
Anonymous No.105862025 >>105862043 >>105862182
>>105861884
>bootstrapped
What does that mean? Including instruct data in the pretrain?
Anonymous No.105862043 >>105862062
>>105862025
that's what is happening (just try any modern base model with chatML and see what happens) but rather than being on purpose I think it's just data contamination stemming from a lack of give a fuck
it's well known they all train on benchmarks too
Anonymous No.105862062
>>105862043
It's intentional. They boast how it gives finetunes better results.
Anonymous No.105862182 >>105862234
>>105862025
Read the Qwen technical reports. They openly and proudly claim how they significantly upweight math, code, and textbook data during the final stages of base model training, and even mix in some instruct-style data. Probably everyone is doing this now, they just don't all admit it as openly.
Anonymous No.105862213
>>105862011
He will lose his only (already shrinking) userbase. Almost nobody pays for models outside the top 4 (DS, GPT, Claude, Gemini). It will be just another Metaverse for him, unless he reaches top 5 and his models have some redeeming qualities (best for gooners/cheapest/good at coding/unslopped). Knowing Zucc, the chance is around 2.5% at most.
Anonymous No.105862234
>>105862182
>Probably everyone is doing this now, they just don't all openly admit it.
Nobody does it as explicitly as qwen and llama 4
Anonymous No.105862250
>>105861690
>long form novel content
literally could not do this at all because of the context window
Anonymous No.105862322 >>105862350
What's the best chemistry local model?
Anonymous No.105862335
>still no jamba quants from bartowski
Anonymous No.105862350
>>105862322
https://huggingface.co/futurehouse/ether0
https://arxiv.org/abs/2506.17238
>This model is trained to reason in English and output a molecule. It is NOT a general purpose chat model. It has been trained specifically for these tasks:
>- IUPAC name to SMILES
>- Molecular formula (Hill notation) to SMILES, optionally with constraints on functional groups
>- Modifying solubilities on given molecules (SMILES) by specific LogS, optionally with constraints about scaffolds/groups/similarity
>- Matching pKa to molecules, proposing molecules with a pKa, or modifying molecules to adjust pKa
>- Matching scent/smell to molecules and modifying molecules to adjust scent
>- Matching human cell receptor binding + mode (e.g., agonist) to molecule or modifying a molecule's binding effect. Trained from EveBio
>- ADME properties (e.g., MDDK efflux ratio, LD50)
>- GHS classifications (as words, not codes, like "carcinogen"). For example, "modify this molecule to remove acute toxicity."
>- Quantitative LD50 in mg/kg
>- Proposing 1-step retrosynthesis from likely commercially available reagents
>- Predicting a reaction outcome
>- General natural language description of a specific molecule to that molecule (inverse molecule captioning)
>- Natural product elucidation (formula + organism to SMILES) - e.g., "A molecule with formula C6H12O6 was isolated from Homo sapiens, what could it be?"
>- Matching blood-brain barrier permeability (as a class) or modifying
last chem model paper I've read
Anonymous No.105862351
>>105859782
lol these are great. How many of these have you done?
Can you dump these hgames into a rentry or something? I really want to see what kind of filth grok is kicking out.
>>105858079
>>105858100
>>105858134
I like that one.
>>105858381
Witnessed
Anonymous No.105862386 >>105862393 >>105862400 >>105862406
I'm so sick of Deepseekisms at this point. I just want a new model that's actually good
Anonymous No.105862393 >>105862397
>>105862386
Grok4 just dropped
Anonymous No.105862397 >>105862411
>>105862393
I tried it on openrouter and it's incredibly slopped
Anonymous No.105862400
>>105862386
thinking the same thing
Anonymous No.105862406 >>105862485 >>105862498 >>105862515 >>105862520 >>105862663 >>105862694 >>105862893
>>105862386
Haven't used DS much. What are the common ds'isms?
Anonymous No.105862411 >>105862438
>>105862397
It's good on benchmarks. Just fap to benchmarks
Anonymous No.105862438
>>105862411
Grok scoring 100% on AIME25 gave me a half chub ngl
Anonymous No.105862468
>>105861968
Exhibitory walks with Rin-chan
Anonymous No.105862485
>>105862406
tasting copper
Anonymous No.105862498 >>105862546 >>105862585 >>105862599
>>105862406
A doesn't just X—it Ys
knuckles whitening
lip biting
blood drawing
copper tasting
five "—" every sentence—building suspense
every character mentioned in the lore of your card shows up at the most random times
Anonymous No.105862515 >>105862694
>>105862406
clothes riding up for no reason during any remotely lewd situation
Anonymous No.105862520
>>105862406
I really hate how it still obsesses over some minor shit. The new one does not understand "stop", old one did. Very fucking stubborn.
Anonymous No.105862546 >>105862585 >>105863466
>>105862498
Every girl is going to spend half her time smoothing her skirt. The other half is spent tugging hair behind her ears. Twintails will always touch her temple/cheek no matter how they're tied. Characters will flick their wrists to quickly do actions like tugging hair back.
Anonymous No.105862585
>>105862498
>>105862546
Also... this... kind of speech manner...
Anonymous No.105862599
>>105862498
Just write something else when you see one of those and it stops being a problem.
If you let it use the same phrase a few times of course it's going to keep repeating it.
Anonymous No.105862663
>>105862406
It likes to write like this. Always like this. Always. This. Always.
Anonymous No.105862671
so how long will it take the deepseek guys to rip grok 4? or are they waiting for gpt-5?
Anonymous No.105862674 >>105862688 >>105862698
>>105861690
It wasn't. Every single LLM to this day is dogshit at writing, which is evident from the fact that nobody buys AI books. A random shitty teen fanfic about Harry Potter and Draco Malfoy sucking each other's dicks has more literary value than even the best slop.
Anonymous No.105862688
>>105862674
>A random shitty teen fanfic about Harry Potter and Draco Malfoy sucking each other dicks has more literary value than even the best slop.
That's what those LLMs are trained on.
Anonymous No.105862694 >>105862770
>>105862406
R1-05 loves bullet points when they're not needed, and will not stop once it starts.
** ** ** ** everywhere.
clenching around nothing
NPCs drawing blood constantly with self injuries. The blood tastes like copper, not iron.
Bite marks when no biting occurred.
There's a bunch of weird analytical segues it gets into but that's probably a me issue.
I constantly remind myself how much better these models are than a year ago, and that we're walking the hedonic treadmill with the improvements.
>>105862515
So I'm not the only one using DS that gets NPCs with slowly evaporating clothes. Both V3 and R1 do that. I'll be talking to an NPC as their clothes slowly fall off, in situations that don't call for it at all, even after turning off JB and running fairly SFW cards and situations.
Anonymous No.105862698
>>105862674
TRVKE...
Anonymous No.105862750 >>105862804
All mentioned problems are anons playing the same chars over and over again.
Anonymous No.105862760
>>105861424
>>105861486
to be fair, redditors called him shizo and told him to fuck off
Anonymous No.105862770 >>105862912
>>105862694
>bullet points when they're not needed, and will not stop once it starts.
Result of assistant slop tuning.
>The blood tastes like copper, not iron.
This really bothers me, as I have actually tasted copper and iron, and blood is clearly iron.
>** ** ** ** everywhere.
Annoying waste of tokens, but fixable with logprobs.
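If anyone wants to actually do it, here's a rough sketch of the token-bias route (assumes a llama.cpp-style server on localhost:8080; the token IDs below are placeholders, you'd need to look up the real IDs for the "**" pieces in your model's tokenizer first, e.g. via the server's /tokenize endpoint):

import requests

# placeholder IDs for the "**" variants - look these up for your own tokenizer
BANNED_TOKEN_IDS = [1234, 5678]

resp = requests.post(
    "http://localhost:8080/completion",
    json={
        "prompt": "Continue the scene:\n",
        "n_predict": 256,
        "temperature": 0.8,
        # [token_id, bias] pairs; a large negative bias effectively bans the token
        "logit_bias": [[tid, -100.0] for tid in BANNED_TOKEN_IDS],
    },
)
print(resp.json()["content"])

Whether your backend honors logit_bias at all is its own question, but it beats regexing the asterisks out after the fact.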
Anonymous No.105862804 >>105862897
>>105862750
EVERY SINGLE ONE OF THEM RASPS. THERE IS NO ESCAPE FROM THIS RASP OBSESSION.
HEY DISPSY ROLEPLAY AS FRANCIS E DEC ESQUIRE
>(in raspy voice)
[ABORT GENERATION]
Anonymous No.105862824
>>105861348
Sir your nolima?
Anonymous No.105862842
>>105861512
To be fair the old 70-100B dense models still perform pretty well in world knowledge if you trust the UGI benchmark. Only Deepseek beats those and that needs way more RAM than a rampilled consumer build.
Anonymous No.105862893 >>105862924
>>105862406
For some reason, one thing every LLM I've tried does, and Deepseek continues to do, is the "mouth open in a silent scream" that happens when you torture them, and it'll say that even when it's literally describing the sounds they're making at the same time.
Anonymous No.105862897
>>105862804
Anonymous No.105862910
Interesting patterns emerging from all those issues. You guys are good at pattern recognition, aren't you?
Anonymous No.105862912 >>105862923
>>105862770
>I have actually tasted copper and iron, and blood is clearly iron.
Same. I'm convinced it's a cultural thing, but I don't know which one.
Anonymous No.105862923 >>105862940
>>105862912
Planet Vulcan. Spock has green blood.
Anonymous No.105862924
>>105862893
>and it'll say that even when it's literally describing the sounds they're making at the same time.
That's the funniest one yet.
Anonymous No.105862940 >>105862961
>>105862923
> Spock, my son...
Anonymous No.105862961 >>105862988
>>105862940
Isn't their blood blue?
Anonymous No.105862971
>>105858079
For what it's worth, I seem to often get this when I finetune a model on short multiturn sequences but continue chatting beyond that.
Anonymous No.105862988 >>105863033 >>105864674
>>105862961
Right, but Spock's blood is copper-based too (I'd forgotten it's canonically green.)
It's actually drawn from these crabs as a test for medicines. They hook them up for a while, then turn them loose again.
Anonymous No.105862999 >>105863008
Is Grok 4 on OR yet?
Anonymous No.105863008
>>105862999
Only the normal version but not the groundbreaking Grok 4 heavy
Anonymous No.105863019 >>105863071
https://x.com/ficlive/status/1943401632181440692
Grok won
Anonymous No.105863033 >>105863046
>>105862988
>It's actually drawn from these crabs as a test for medicines. They hook them up for awhile, then turn them loose again.
Yeah, they are fucking neat.
Also, not actual crabs, if that matters for anybody.
Anonymous No.105863046
>>105863033
Well yeah, they're horses
Anonymous No.105863071
>>105863019
Q*Aliceberry will beat it.
Anonymous No.105863074 >>105863101 >>105863126 >>105863220 >>105863688
Anonymous No.105863101
>>105863074
what is goin on here
Anonymous No.105863122
Why was it fake gay about hitler and closed source.... Why couldn't it have been real straight about ERP and open source? I hate this world.
Anonymous No.105863126 >>105863188
>>105863074
gemini won and by a lot
Anonymous No.105863188
>>105863126
The superweapon of Bharat
Anonymous No.105863194 >>105863215
ANyone tried Ernie 300 yet? Is it good for sex?
Anonymous No.105863215 >>105863219 >>105863224
>>105863194
Where gguf?
Anonymous No.105863219
>>105863215
after jamba
Anonymous No.105863220 >>105863305
>>105863074
Crazy how Google went from being super irrelevant to one of the biggest players in what, 2 years? Guess releasing that chinchilla out of captivity finally helped.
Anonymous No.105863224
>>105863215
2 more weeks. I meant regular transformers loader or openrouter.
Anonymous No.105863305
>>105863220
Google always had the means, they just were not willing to use them. LLMs have a cannibalistic nature with regard to their existing businesses like search.
Reminder that the people who wrote Attention is All you Need were all working at Google at the time. They could have made GPT before openAI was even a thing.
Anonymous No.105863373 >>105863397 >>105863757
justpaste (DOTit) GreedyNalaTests

Added:
MiniCPM4-8B
gemma-3n-E4B-it
Dolphin-Mistral-24B-Venice-Edition
Mistral-Small-3.2-24B-Instruct-2506
Codex-24B-Small-3.2
Tiger-Gemma-27B-v3a
LongWriter-Zero-32B
Falcon-H1-34B-Instruct
Hunyuan-A13B-Instruct-UD-Q4_K_XL
ICONN-1-IQ4_XS

Another big but mid update. ICONN was a con (broken). The new Falcon might be the worst model ever tested in recent memory in terms of slop and repetition. Maybe it's even worse than their older models. It's just so disgustingly bad. Tiger Gemma was the least bad performer of the bunch though not enough for a star, just gave it a flag.

Was going to add the IQ1 Deepseek submissions from >>105639592 but the links expired because I'm a slowpoke gomenasai...
So requesting again, especially >IQ1 and also using the full prompt including greeting message for the sake of consistency. See "deepseek-placeholder" in the paste. That prompt *should* work given that the system message is voiced as the user, so it all matches Deepseek's expected prompt format.

Looking for contributions:
Deepseek models (for prompt, go to "deepseek-placeholder" in the paste)
dots.llm1.inst (for prompt, go to "dots-placeholder" in the paste)
AI21-Jamba-Large-1.7 after Bartowski delivers the goofz (for prompt, go to "jamba-placeholder" in the paste)
>From neutralized samplers, use temperature 0, top k 1, seed 1 (just in case). Copy the output in a pastebin alternative of your choosing. And your backend used + pull datetime. Also a link to the quant used, or what settings you used to make your quant.
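If it helps contributors, this is roughly what those settings look like against a llama.cpp-style server (just a sketch: the endpoint and field names assume stock llama-server, adjust for whatever backend you're on, and prompt.txt stands in for the full prompt copied from the paste):

import requests

# prompt.txt = the full card prompt (including greeting) taken from the paste
with open("prompt.txt") as f:
    prompt = f.read()

resp = requests.post(
    "http://localhost:8080/completion",
    json={
        "prompt": prompt,
        "n_predict": 512,
        "temperature": 0.0,
        "top_k": 1,
        "seed": 1,
    },
)
print(resp.json()["content"])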
Anonymous No.105863397
>>105863373
I salute your efforts.
Anonymous No.105863437 >>105863504 >>105863701
Can you fine-tune a pure LLM to make it multimodal
Anonymous No.105863466
>>105862546
I'm used to seeing wrist flicking with other models too.
Anonymous No.105863504
>>105863437
Yes https://github.com/FoundationVision/Liquid
Anonymous No.105863688
>>105863074
You could definitely benchmaxx for this.
Anonymous No.105863701
>>105863437
Aren't most local 'multi-modal' models just normal LLMs with some vision component grafted onto them?
Anonymous No.105863722
>>105863705
>>105863705
>>105863705
Anonymous No.105863757 >>105864017
>>105863373
DeepSeek-V3-0324-IQ1_S_R4 ik_llama.cpp 5446ccc + mikupad temp 0 topk 1 seed 1.
I thought that the different output on the first run issue wasn't a thing anymore.
1st: https://files.catbox.moe/ewtwai.txt
each after: https://files.catbox.moe/celh6i.txt
Anonymous No.105863903
>>105860857
god i need to go outside
Anonymous No.105864017
>>105863757
Thanks! Added to the paste.
Anonymous No.105864674
>>105862988
>They hook them up for awhile, then turn them loose again
They drain the fuckers dry and then discard the corpse. If you want to call that "turning loose" then I guess you can. Do you really think they're sitting there monitoring the blood levels to make sure they don't just fucking die?