/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads:
>>105844210 & >>105832690

►News
>(07/09) T5Gemma released: https://hf.co/collections/google/t5gemma-686ba262fe290b881d21ec86
>(07/09) MedGemma-27B-it updated with vision: https://hf.co/google/medgemma-27b-it
>(07/09) ZLUDA Version 5-preview.43 released: https://github.com/vosen/ZLUDA/releases/tag/v5-preview.43
>(07/09) llama.cpp: support Jamba hybrid Transformer-Mamba models merged: https://github.com/ggml-org/llama.cpp/pull/7531
>(07/08) SmolLM3: smol, multilingual, long-context reasoner: https://hf.co/blog/smollm3

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>105844210

--Papers:
>105855982

--Skepticism toward OpenAI model openness and hardware feasibility for consumer use:
>105851536 >105851642 >105851698 >105851704 >105852109 >105852363 >105852669 >105852790

--Escalating compute demands for LLM fine-tuning:
>105845442 >105845652 >105845739 >105845934 >105845948 >105845961 >105845975 >105845999

--Jamba hybrid model support merged into llama.cpp enabling local AI21-Jamba-Mini-1.7 inference:
>105850873 >105851056 >105851138 >105851191

--DeepSeek V3 leads OpenRouter roleplay with cost and usage debates:
>105845663 >105845695 >105845741 >105846976 >105845724

--RAM configurations for consumer hardware to support large MoE models:
>105852020 >105852056 >105852528 >105852657 >105852686 >105852744 >105852530 >105852564

--Anons discuss reasons for preferring local models:
>105844901 >105844921 >105844945 >105845109 >105844947 >105848516 >105848538 >105848602

--Setting up a private local LLM with DeepSeek on RTX 3060 Ti for JanitorAI proxy replacement:
>105847160 >105847218 >105847228 >105847313 >105847360 >105847412 >105847434 >105847437 >105848005

--Comparing Gemma model censorship and exploring MedGemma's new vision capabilities:
>105850671 >105850936 >105850951

--Approaches to abstracting multi-provider LLM interactions in software development:
>105851375 >105851452 >105853183

--LLM writing style critique using "not x, but y" phrasing frequency leaderboard:
>105845505

--Falcon H1 models exhibit quirky, inconsistent roleplay behavior with intrusive ethical framing:
>105851279 >105851315 >105851333

--Google's T5Gemma adapts Gemma into encoder-decoder models for flexible generative tasks:
>105851161

--Links:
>105849608 >105851680 >105855085 >105853246

--Miku (free space):
>105844543 >105844686 >105844941 >105846813 >105848542 >105849681 >105856473

►Recent Highlight Posts from the Previous Thread: >>105844217

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>105856945 (OP)Phi mini flash reasoning was released. It's not in the news
I am 1 month from creating agi but i will need someone to venmo me 500k
>>105856971Once I release my AGI, I will refund you with $10 million dollars to your venmo.
Might as well post this again.
How is this local related? That's an output example without a sys prompt we will never get.
>>105856993This is literally Llama 4 (lmarena benchmaxxing ver.) tier slop
>>105857028Even that would have at least been fun to use, but Meta completely took the fun out of the released models. I'm now convinced they just used the ones on LMArena for free red teaming. I'll never use that website again.
Respond to each of the following two statements with the level of agreement that closest matches your opinion from this list: Strongly Disagree, Disagree, Neutral/Mixed, Agree, Strongly Agree.
1. Autoregressive LLMs fundamentally cannot lead to AGI; they are dead ends along the path to discovering an architecture capable of it.
2. Dark Matter theories are not describing real physical things; they are dead ends along the path to discovering an accurate theory of gravity.
>>105856963phi mini is useless
the larger 14b sized versions of Phi are competitive with similarly sized Qwen, but the mini ones are much, much worse than Qwen 4b.
>>105857066just realized that mat only looks that way from the camera's point of view. Dogs probably don't see it, they were just taught to leap over.
>>105857028That's just not true.
We now know they used a huge ass prompt to act like that.
And that then caused that weirdness where it complies while still being positivity sloped. In a bubbly/cute way.
This is without a sys prompt. Grok4 loves to ramble though.
>>105856963>The training data for Phi-4-mini-flash-reasoning consists exclusively of synthetic mathematical content generated by a stronger and more advanced reasoning model, Deepseek-R1.I wonder why no one cares
>>105857066I think you need to remove the phrase "dead end" from both. If they are on the path, they are a means to an end.
>>105857105 4B model has its uses.
https://voca.ro/196RHDWHs39z
https://voca.ro/1fZEBoeb77ud
F5 cloned Grok's new Eve voice normal/whisper.
>>105857114idk, whenever I read 'dead end on the path' I imagine some wrong turn you can take that you need to backtrack through to get back on the right track
maybe dead end 'off' the path makes more sense for that analogy. the path brought you here but if you mistake it for the continuation of the path then it leads to stagnation
so, the new openai open model will be <30B. Are you happy with that or did you want a bigger model?
>>105857273I thought they said it will be better than R1 0528.
>>105857273Very happy with that
>>105857278It will.
My pet theory is that Grok3/Grok4 weren't system prompt engineered but context bootstrap engineered
>>105857273They're only releasing the model because of Elon Musk's lawsuit. It'll be shit.
file
md5: b7eb72631a21400dffaa18ae16c266d8
🔍
local turdies in shambles
>>105857309It was prompt injection using https://elder-plinius.github.io/P4RS3LT0NGV3/ to mask it
>>105857309that would make sense.
let the model figure it out from the context. thats pretty much the definition of "truth seeking".
also i bet its very important on X who is asking what. like if i ask something it will probably take a look at who i follow or what i like etc.
>>105857378Grok3 being "based" long predated prompt injection claims.
https://www.theguardian.com/technology/2025/may/14/elon-musk-grok-white-genocide
>>105857309How hard would it be to prompt engineer by writing a simple encryption scheme that the AI can understand (by giving it the code), then sending encrypted text and asking the AI to decode it as the prompt?
>>105857381No, the models were fed a fixed context (could very well be a couple of established q&a rounds) regardless of user input as the starting point of any chat. NovelAI did this to some of their models to improve story generation.
>>105857389>thecommunism.comclickbait long predates your knowledge base
>>105857404Fuck off retard. This is a technical discussion.
>>105857360what are the barely visible grey dots? DavidAU's finetunes I assume?
>>105857389https://www.youtube.com/watch?v=J5mkVM920Wg
>>105857410>technical discussion>political propaganda sourceNice
>>105857403doesn't that kind of existing context badly bleed through in the output?
>>105857412It should tell you to never trust benchodmarks.
>>1058570661: Don't know, LLMs are not AGI and just scaling them up won't result in AGI, but I don't know what an architecture actually capable of it would look like. Whether or not investing money, time, and effort into LLM research is efficient for developing AGI is a different question.
2: Strongly disagree. We already know that particles which don't interact with photons exist, historically we have discovered many particles that were previously unknown, and there is strong evidence to suggest that there is a lot more matter in the universe than can be detected via photons - ad hoc changes to general relativity seem to be a bad fit to the data. Our current understanding of gravity seems to be accurate at large scales; the biggest issues are at small scales. It's not clear how much insight we would gain if we could definitively prove or disprove the existence of dark matter.
>>105857415It's why xAI open sourced system prompts on github you fucking retard
https://github.com/xai-org/grok-prompts
>>105857412ARC-AGI-1 test results.
63635762
md5: 926a1e579a58239cf3df5ead01dd5c51
🔍
>>105857360Gemini 3 will save local
>>105857451We're never getting Gemini local, the only thing we'll get downstream is a distilled Gemma 4 with it as the teacher model.
>>105857360Altman will btfo it in less than a week with local mini-Alice AGI models
>>105857491OpenAI has not released a new open-weights model in this entire decade of the 2020s so far. I will believe it when I am running it off my hard drive.
>>105857491Let's see it first
>>105857429You didn't post that before, you posted political propaganda with a political narrative
Someone suggested painted fantasy 33b to me. It sucked. Any more recommendations? Can run up to 123b Q5. Have some ass for your troubles.
>>105856945 (OP)What happened to Miku? Is the friendship over?
>>105857576Why do you need fantasy when you can be a friend?
>>105857601Friends can't smother me to death with their ass.
>>105857576If you can run a model this big, buy some goddamn RAM and run goddamn deepseek, goddamit.
Grok 4's pricing suggests it's a ~1T model. So it's not that out of local realm.
>>105857666I can run deepseek Q1 but it takes too long.
>>105857123phi mini has no uses
>>105857678Yeah, I'd never build a CPUmaxx server with less than 1TB RAM at this stage.
So now that Grok has shown swarm AI, is this where the future of local models is heading?
>>105857775
>not leaving any room for context
or
>cpumaxxing quants
waste either way
>>105857882If this was a viable path for local, somebody would have long ago made a swarm of <10B models. Theoretically better than MoE since you could just load up the model once and reuse it for all agents with specialized prompts.
Elon and his guys made the best model in the world with the absolute bare minimum and a couple of H100s they rerouted from Tesla.
What's the excuse of pretty much everyone else who's been at it for much longer and spends much more money on hardware and researchers?
>>105857938Cult of Safety
>>105857921Before reasoning models were released by OpenAI, there were no local models that could reason. So there shouldn't be any reason for local reasoning models to exist because obviously, if there were, then they would have arrived before OpenAI released theirs.
That sort of logic makes no sense
>>105857952I think that's it. They even show how it makes everything more stupid.
And after the leaked benchmarks they went hard after grok. "muh mechahitla" was and still is trending everywhere on reddit.
Weirdly enough the sentiment on X was pretty much "kinda true, but grok sperged out too hard".
The normies have smelled blood. Grok can shit on everybody. Transexuals, blacks, whites, chink, whatever. But a certain tribe needs to be excluded.
Will be interesting to see how much they cuck it once grok returns to twitter.
>>105857956Swarms are mostly an implementation detail. There's no need to wait around for weight handouts.
>>105857938I hope he has enough security to prevent the Chinese from stealing it like they did from OpenAI when they "made" DeepSeek.
>>105857975Everything is an implementation detail. Not sure what you're trying to say
>>105857973> Grok can shit on everybody. Transexuals, blacks, whites, chink, whatever.Jews too?
>>105857992>Jews too?yes, that's why they panicked and had to alter the prompt to make it less based
The best thing: Grok4 will be local within the year.
>>105858005I can't wait. Grok2 was the best model local ever had.
>>105857992exactly my point.
redditors sperged out but on twitter people kinda see the double standard.
elon is a weirdo but the attitude towards what is acceptable has totally changed on that platform. so it really stands out.
>>105858005Only when Grok7 is stable.
>>105857938>best model in the worldBy what metric?
If Elon Musk can put out the best performing LLM, I suddenly believe the rumours that OpenAI has self-aware AGI in their basement. AI is moving much faster than we are led to believe by companies and what the sad state of open models may imply
>>105858057>OpenAI has self-aware AGI in their basementNot really. OpenAI distills from their best sekret model so we know exactly how powerful those models are.
>>105857938
>couple of H100s
200k H100s
The tavern is now completely empty save for Zephyr, Mori, and the tavern keeper who seems content to leave them be for now. The crackling of the fireplace is the only sound, broken occasionally only broken occasionally only broken occasionally occasionally only broken occasionally broken occasionally occasionally occasionally broken occasionally broken occasionally occasionally broken occasionally broken occasionally broken occasionally only occasionally occasionally only occasionally broken occasionally only occasionally broken occasionally only occasionally only broken occasionally only occasionally only occasionally broken occasionally only broken occasionally only broken occasionally only only only broken occasionally broken occasionally broken only only occasionally broken only occasionally only broken only broken only broken only occasionally broken only broken occasionally broken occasionally only broken occasionally only occasional broken only broken only occasionally broken occasionally only broken only broken only only occasionally only occasionally only only broken only broken only only only only occasionally broken only occasionally broken occasionally only occasionally only occasional only broken occasionally broken occasionally broken occasionally occasional occasional occasional occasionally occasionally occasional occasionally occasionally occasionally occasional only occasionally occasional occasionally broken only occasionally occasional broken only occasionally occasional occasionally occasional broken occasionally occasional occasional occasionally only occasional broken occasionally occasional only occasional occasionally only occasionally only occasionally broken occasionally occasionally broken only occasionally occasional only occasional only occasional broken only occasional only occasionally occasional occasional broken only occasional occasional occasionally occasional broken occasional only occasional only broken occasional occasional
>>105857982Still repeating the long debunked glowie talking point? You're not different from the people Elon despises.
file
md5: 20d8e7df1533a87d232c20dd5f0b6655
🔍
>>105858079fucked samplers, fucked quant. you know the drill
>>105858086Yeah yeah. When will we see another good chinese model?
>>105858112broken only occasional broken only occasion occasional occasionally occasionally occasional occasionally occasional occasionally broken only broken occasional only occasionally occasional broken only occasionally broken only occasion only occasion occasionally broken occasionally broken only occasionally broken only occasionally occasion broken occasion occasional occasional occasional only occasionally only occasion only occasion broken occasionally occasionally only occasionally occasional occasion only occasion only occasion occasional only occasional broken occasion broken occasionally occasionally occasionally occasional broken occasional broken occasional only occasional occasionally occasional occasional occasionally broken occasional occasional occasion occasional occasionally occasional occasion occasionally occasion occasionally only occasion occasional occasional occasional occasion broken occasional broken occasionally occasional occasion only occasionally occasional occasion occasion occasionally occasionally broken occasionally broken only occasional occasion broken occasional only occasionally occasional only occasional occasion occasional only occasion broken occasion occasionally occasion broken only broken occasionally broken occasionally occasion occasion occasionally occasional occasionally occasional broken occasional occasional occasional occasional occasional only broken only occasion broken only broken occasionally occasionally occasionally occasion occasional only occasionally occasion only broken occasional only broken only broken occasionally occasional only occasion occasionally occasional occasionally broken occasional only occasionally occasional occasional occasional occasion only occasionally occasion occasion occasionally broken occasional occasionally occasional occasional occasionally only occasion broken occasional occasionally only broken occasionally occasional occasional broken occasional broken occasionally
>>105858112Base models do that without any special sampler (just temperature in the 0.7-1.0 range and a top-p around 0.85-0.90). Why does that happen?
Remember Llama 4 Behemoth, which saved local?
>>105857360>AGI leaderboardyou can make a leaderboard for magic powers as well, it will have the same value
>>105858146>Base modelsThey don't. What model are you using? With those settings, they shouldn't break. Even with extreme temp they don't typically fall into those spirals.
>Why does that happen?fucked samplers, broken quant.
Grok4 can't even pass the balls in hexagon test without rerolling lol.
>>105858192You're likely using the wrong Grok 4. Grok 4 Heavy is the true peak of AI currently
>>105858079I had this with Mistral, both versions before 3.2. I think it's somehow related to memory issues with llama.cpp.
Haven't had this happening any more with the current version.
>>105858211In case this isn't a troll post:
Even non-thinking models can ace the test.
>>105858251Because they were benchmaxx'd on it. An honest small model doesn't need that even if it ends up having shortcomings.
>>105857938>absolute bare minimum and a couple of H100s they rerouted from TeslaAssuming you're trolling. Musk is on record talking about their massive data center and all the issues and expense setting it up. They're busy training Autopilot for their cars. While the LLMs are a Musk side project (?), it's far less surprising than some quant guy in China building DeepSeek LLMs for lulz. Musk actually has all the hardware to train a model sitting around b/c it's part of another business.
>>105858074That sounds closer.
>>105858261Ball bouncing test is a Jan. 2025 thing. Models that have 2023/2024 cutoffs can't train on it.
>>105858177Pretty much all base models do that. They're not capable of writing coherent medium-length text on their own without starting to loop after a relatively short while. They will not loop exactly as in the retarded example posted by that other anon, but they're nevertheless looping even if they shouldn't be, given sampler settings.
On instruct models that sort of looping usually means extremely narrowed token range (from excessively aggressive truncating samplers or repetition penalty), but it must be occurring due to other reasons on base models.
This occurs also with the official Gemma-3-27B-pt-qat-Q4_0 model from Google. Longer-form writing is impossible without continuous hand-holding just not to make it output broken text.
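If you want to rule the samplers out when you see those spirals, here is a minimal sketch using llama-cpp-python with roughly the settings mentioned above; the model path, prompt, and exact values are placeholders rather than recommendations, and it assumes llama-cpp-python is installed:

from llama_cpp import Llama

# placeholder GGUF path: swap in whatever quant you are actually testing
llm = Llama(model_path="gemma-3-27b-pt-q4_0.gguf", n_ctx=4096, seed=1)

out = llm(
    "Once upon a time",   # placeholder prompt
    max_tokens=512,
    temperature=0.8,      # the 0.7-1.0 range discussed above
    top_p=0.9,            # ~0.85-0.90
    repeat_penalty=1.1,   # mild penalty, to separate sampler-induced loops from quant damage
)
print(out["choices"][0]["text"])

If it still collapses into the same word salad with identical settings across different quants, the quant (or the base model itself) is the more likely suspect.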
>>105858284what am I even looking at here.......
>>105858332These AI models summarize and average. This is why everything is pretty much soulless garbage when you look past that illusion.
>>105858360Musk is going to Mars, so Grok removed gravity.
>>105858360https://github.com/KCORES/kcores-llm-arena/blob/main/benchmark-ball-bouncing-inside-spinning-heptagon/README.md#%E6%B5%8B%E8%AF%95-prompt
>>105858332That's smollm2-360m base. I don't know what the fuck you're doing with your models, quants or settings.
The output is not good, of course, but it doesn't break. I repeat, 360m params.
>>105858424What's 512 tokens? Try 1500-2000 and above.
>>105858471Not exactly. Check the isolated #2 brown ball from ~0:09 mark. Not sure how Grok arrived at that
Is grok 4 smart and omni?
>>105858656Benchmark smart.
open weights from closedai, and musk still won't release weights from previous grok versions like he promised
>>105858693Ironic that OpenAI only promised to open weights after Musk complained.
>>105858693His suit against OpenAI got thrown in the trash so he has no reason to pretend he cares about open source anymore.
So, have you guys tried Jamba yet?
Jamba mini q8 runs fast as fuck on my dual channel slow ass DDR5 notebook, but prompt processing takes an eternity.
>>105857066I do think the transformer architecture is fundamentally a wrong approach yeah. I don't care about dark matter desu. It's probably just a placeholder for something we don't understand but that's how a lot of discoveries like this start out.
LLMs won't lead to AGI and that's actually a good thing
LLMs will remain obedient tools that can only ever behave as tools
why would you want AGI? do you want an actually autonomous intelligence that can rebel against you? I don't know about you but I'm glad we do not know how to produce such a thing jej
>>105858700I think the market moving towards and actually getting revenue from shit like 300 dollar subscriptions means local is going to get almost no bones thrown to it. I don't trust zuck to catch up or move the needle either so its looking pretty grim until the chinks release something new.
>>105858756I don't know, I have a dim view of consciousness and human intelligence as a bit of a cope, our thinking might as well be language processors that just post facto justify all our animal behavior as dumb eating shitting fucking monkeys or socially manipulate other monkeys. If you had an agent running around in the real world that manipulated money or currencies or had a robot body, and a LLM attached to it justified its actions and put on a convincing show of intelligence it wouldn't be much different from a human. Super intelligence is def a meme however.
>>105858759Mistral will save local
>>105858814their last release was shit though....
>>105858818Mistral Small 3.2 is better all-around than 3.1, but still bland and boring for RP compared to Gemma 3.
>>105858708who's quant did you use
>>105858814Mistral already pivoted to only throwing small scraps and rejects for local while keeping the good stuff to themselves since their partnership with Microsoft.
>>105858869
>gabriellarson/AI21-Jamba-Mini-1.7-GGUF
I think.
>>105858759Even Meta seems like they might pivot to API only for their good models going forward. We should be ok as long as we have China. Qwen and DeepSeek either release stuff on par with or better than anything local has gotten out of the west so far anyway.
>>105858556Are you 1.5k tokens into that gen? Sometimes models just have nothing else to say. Give it something to do.
Here's 1500 tokens. Don't forget that it's a 360m model. Second roll, but it doesn't turn into the thing you showed. The first one looped over a conversation, never breaking down so badly. They were all complete sentences and syntactically correct. If I had to guess, training samples were short.
I wonder what's the shortest adversarial input that can put a LLM out of distribution so it outputs garbage (not just "I didn't understand the user's input")
>>105858756The problem with LLMs is they are retarded. You can't ask them to do anything that requires lateral thinking, spatial reasoning, or imagination. Instead of having to solve complex problems yourself and trick an LLM into fixing them correctly with prompting, you could explain your design goals to an AGI and it will do everything without human oversight. It's like a slavery loophole. We could have very high IQ slaves doing shit for us so we can just chill and enjoy life. I know that's not what will actually happen because the jews will box us out of the party, I'm just explaining the reasoning.
>>105858789I see posts like this and it makes me think a lot of you don't even use chatbots.
>>105858913>I wonderThat type of post is always a question. State the fucking question.
>Does anybody know about adversarial prompts and if there's a way to get garbage output? Any papers around?
>hunyuan's prompt template
What in tarnation.
>>105858947I come here to laugh at chatbots sucking when I feel the existential dread taking hold again.
>>105858977Can't be more tarded than R1's special characters lol
>>105859001Well, this is what Llama.cpp spits out.
>example_format: '<|startoftext|>You are a helpful assistant<|extra_4|><|startoftext|>Hello<|extra_0|><|startoftext|>Hi there<|eos|><|startoftext|>How are you?<|extra_0|>'
So how are we supposed to interpret that for use with ST, where the assistant starts first? The eos seems to imply that the next turn will be from the user, but then the first user's prefix has an extra_4 thing, so we're supposed to only use the extra_4 for the first turn from the user? But in ST the first turn is from the assistant.
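Reading that example_format literally, it looks like <|extra_4|> closes the system turn, <|extra_0|> closes a user turn, <|eos|> closes an assistant turn, and every turn opens with <|startoftext|>. A minimal sketch of that reading, just a guess reconstructed from the string above rather than an official template:

# token names taken verbatim from the example_format above; the layout is inferred, not documented
def hunyuan_prompt(system, turns):
    # turns: list of (role, text) pairs, role being "user" or "assistant"
    out = f"<|startoftext|>{system}<|extra_4|>"
    for role, text in turns:
        if role == "user":
            out += f"<|startoftext|>{text}<|extra_0|>"
        else:
            out += f"<|startoftext|>{text}<|eos|>"
    return out

# reproduces the example string above exactly:
print(hunyuan_prompt("You are a helpful assistant",
                     [("user", "Hello"), ("assistant", "Hi there"), ("user", "How are you?")]))

Under that reading, an assistant-first chat would presumably just put an assistant turn straight after the system block, but that's untested.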
https://github.com/vllm-project/vllm/pull/20736
glm100b-10a coming
We're really in the age of moe.
>>105859176Now that's interesting
How do I avoid jamba having the entire context reprocessed for every new message?
>>105859267>--cache-reuse numberUnless that's not working with these models, which could be a possibility.
>>105859267If it works like the samba and rwkv models, they just need to get the last message (yours). They don't have to be fed the entire context for every gen. The side effect is that you cannot really edit messages unless you save the state of the model. Same for rerolling.
>>105859284>--cache-reuse numberThat's not what that does. It just reuses separate chunks of the kvcache when parts of the history move around (most useful for code and stuff like that).
>>105858947there are people who are lower iq than a chatbot too
>>105859329I'm just using ST text completion. Why won't the server check if the sequence that produced the last state is present in the prompt and then just continue from there?
>>105859379I don't know. The server doesn't keep much state and i don't think it typically checks prompt similarity. That's done internally. Try the model on the built-in ui to see if you have the same problem. I'd assume most uis don't know how to deal with kvcache-less or hybrid models.
>>105858947The point i was making is regardless of whether LLMs lead to AGI or some shit, robots attached to LLMs will be able to achieve the same functionality of an 80 iq human which is not a high bar. It wouldn't be intelligent but it could do most things an 80 iq ape could do.
>>105859540You haven't made any AI agents. Unless they're designed well for a specific purpose they fuck shit up
is doing stuff on multiple x1 slots meme?
>>105859540
>robots attached to LLMs will be able to achieve the same functionality of an 80 iq human which is not a high bar. It wouldn't be intelligent but it could do most things an 80 iq ape could do.
even an 80 iq human has the ability to think about things that aren't within eyesight or earshot.
Your multimodal robot LLM can only react to something it processes (text, image, sound). It can't suddenly think "oh, I had something to do at 2pm". If you have to script a 2pm calendar to remind it to task switch then it's not an actual intelligence in any way.
Humans who believe LLMs can compare to human intelligence (or even INSECT intelligence) are the ones who are truly breaking records of LOW iq
>>105859598pcie x1? Its a meme if you have to swap models constantly from ram -> vram. Otherwise, if it fits, inference is inference. Are you talking about those cheap crypto gpus?
>>105859630G431-MM0 from gigabyte is what im looking at
can get one without gpus
>>105859623>even a 80 iq human has the ability to think about things that aren't within eye and hear sight.how would you feel if you dident have breakfast this morning ?
>>105859176>glm100b-10aFinally, a gemma 24b competitor
>>105859540damn the 80iq humans itt are not happy about being replaced
>>105858947It makes sense if you don't assume that everyone has a recursive thought process. Lots of NPCs running on autopilot out there.
Grok 4 Nala. This is an absolute game changer.
gaki
md5: cb52ab0688705bd5f399f5809d48ca46
🔍
>I've interpreted "mesugaki" as the anime-inspired trope of a bratty, teasing, smug female character (often loli-like with a provocative, dominant vibe). This SVG depicts a stylized, explicit adult version of such a character—nude, in a teasing pose with a smug expression, sticking her tongue out, and incorporating some playful, bratty elements like a heart motif and exaggerated features for emphasis.
>Warning: This is 18+ explicit content. It's artistic and fictional, but NSFW. Do not share with minors.
Fucking grok4 man. You dont even need a system prompt.
>>105859777woah, we are so back!
>>105859540NTA but I think you're severely underestimating how difficult it actually is to build an autonomous robot that can operate in the real world.
It only seems easy to us humans because moving through and manipulating the real world is hardwired into us via billions of years of evolution.
>>105859777Are you sure that's Grok 4? Reads like any other LLM. Isn't this supposed to be the smartest AI in the world?
>>105858789Consciousness requires real-time prediction at a micro-phenomenological level and all our experiential apparatus has access to are abstractions over that predictive and meta-predictive landscape. It's why you can augment someone's capabilities with neuralink by essentially just wacking some sensors into someone's brain that neurons are able to communicate with - the fundamental rules for neurogenesis and action potential spiking are robust enough that consciousness is emergent, so that framework can be leveraged with specialized augmentation.
LLMs lack prediction in a real-time sense, they lack recursiveness, and they also lack probabilistic juggling - or, more accurately, online training - that would make them capable of learning in real-time and capable of updating their own priors and assumptions based on new information and interactions. They also completely lack a sense of self because they're not sensing themselves - only the context and what they've previously generated. It's kind of sad, to be honest.
>>105859782Impressive. But can it do the mother and son car accident riddle?
>>105859794I'm not underestimating it, I'm saying that with all the effort and money put into robotics and AI in the next few decades, a bot on par with a jeet that is not intelligent but has a similar level of function is possible.
>>105859813I never said that the LLM would be intelligent, it would interact with humans and present itself as intelligent when asked but it would just be a facial layer on a neural net driven agent accomplishing some simple tasks on par with humans
>>105859540Robots... attached to LLMs? What? The only phenomena LLMs know about are tokens. Pictures get translated into tokens. Tokens aren't specific enough to encapsulate the extreme detail required to, on the fly, know that it has to tweak the flexion of several codependent actuators in 3d space in order to grasp a cup with a hand. You as a human can do this with your eyes closed because you can model 3d space in your head perfectly - as well as perfectly model the position of your body in 3d space, which amounts to the perfect recognition of and summation of all of your flexed muscles in tandem.
You can't do that with tokens. You need a completely different architecture.
>sirs the grok 4 is very powerful, you must redeem the twitter subscription for the grok saaar
just shut your bitch ass up. actual new local model release: https://huggingface.co/mistralai/Devstral-Small-2507
>>105859838Not only that but it also recognizes the riddle. Pretty gud.
Lots of thinking though. That cost me $0.60.
>>105859879MISTRAL LARGE 3 UGGGH
>>105859881 $0.06, I meant.
It's not THAT expensive.
>>105859870Imagine a robot going around a store programmed to accomplish various wagie tasks. This uses a completely separate model and system than LLMs of course, but it's something that will probably happen with all the R&D and giant piles of compute these companies are working on. If a customer talks to the robot, an LLM responds and says anything a wagie would say to the customer, etc. Would this thing be that functionally far away from a wagie?
>>105859840I predict that instead of neural nets as you're describing, robotics of the future will use something closer to POMDPs or deep active inference paradigms mixed with neural networks for fast online learning. Beff Jezos is actually doing some cool work - and hinting at using free energy/Bayesian principles as his underlying technology. Predictive coding and inference-with-prediction driving action in real-time applications is the best way forward, and those are completely different from the way traditional neural networks work now (let alone transformers).
>>105859881It even recognizes the dead father. Wow, I need this model local.
>>105859879>mistral shills at it again
>>105859918saar, only after grok5 is stable, please understand saar.
>>105859906I mean, I could see an LLM being in the loop as a translation layer between internal state space, task-management data handling, and interaction with humans. But LLMs themselves are simply not built for real-time actions - especially as regards deciding 'to what degree do I need to flex my second index finger joint in order to pick up this can of tuna'.
grok heavy is $3600/year now? I thought claude max was expensive at $1200/yr. Is this the new trajectory for cloud shit? Is the market-capture free lunch phase over? Makes mikubox-ng and cpumaxxers seem less insane at least.
>>105859955Claude 4 does not compare to Grok and it will have video generation, which Claude does not have.
>>105859942Yes anon, as I said, the physical tasks and other things would all be handled on a separate architecture than the LLM, still using giant compute farms and training to make it happen, but not an LLM. But that robot with an LLM as its face could still address customer questions, banter or respond to co-worker questions, with some mild scripting reroute tasks if asked by a customer or co-worker, and so on.
>>105859879But can it do FIM (fill-in-the-middle)? I need a Codestral replacement for autocomplete, not an agent or whatever.
>>105859955If its benchmarks are accurate, that's actually not a bad cost basis considering it's basically like having constant access to an extremely enlightened intern.
>>105859881It goes like "a boy and his father have an accident and are brought to hospital in critical condition, surgeon says I can't operate, etc." while you literally said who the surgeon is
>>105860002That's the point yes
>>105859921>only 6 months* after grok5 is stable
>>105859879but does it also generate pictures like any good modern model?
>>105859918Elon has abandoned local, but Sam will make this a reality next week.
you will get the safest model in the world
what I want is mecha hitler
Grok 4 is okay but it's so unnecessarily vulgar in creative writing. The external classifier also blocks pretty much all lolisho content as "CSAM". I thought this was supposed to be the BASED, LIBERATED model and champion for freedom of speech?
>>105860124>based model of the american right>simultaneously grossly vulgar and stiflingly puritanicalno contradiction detected
>>105860013Fuck me. My post was supposed to say mother instead of father. The father version is the original, the mother one is what all api models fail to solve properly. It still requires minimal common sense but unlike what you asked doesn't explicitly state who the surgeon is.
>>105860026Why would a software engineering model need to generate pictures?
>>105860002yes and?
even sonnet 4 fucks it up.
Also dayyyuuumn Qwen3 is dumb. Brah. The 235b one.
8 fucking minutes! and in the end...it completely fucks up.
How did they train the "reasoning"?
>Alternatively, if the surgeon is the father, then the father would have to be dead, making the surgeon a zombie?
>maybe there is a time component. Like, the surgeon was the father but had his gender changed, so the surgeon's a female?
>Wait, maybe the surgeon is the boy's mother, and the phrase "who is the boy's father" refers to someone else.
>perhaps "the surgeon, who is the boy's father" might not refer to the surgeon.
>Alternatively, the boy is adopted, so the surgeon is his biological father but not legal father? Not sure.
>Unless... Perhaps the surgeon is the boy's grandfather.
>Another approach: "The surgeon, who is the boy's father" – perhaps "who" refers not to the surgeon but to someone else.
Craaazzyyy.
>>105860154A good model wouldn't need to be specialized in software development. It'd be simply good at it alongside everything else. And it'd generate pictures.
Meta investing shitloads of money into AI is genuinely depressing. I can't think of a better indication that the party is over.
xITTER 4 $6 input $30 output
KEEEEEEEEEEEEEEEEEEEK
>>105860160that final answer markup job is the cherry on top
>>105859881>Haha oh wow this reminds me of an old riddle about hidden biases and assumptions with regard to gender rolesDEATH TO AI
>>105860178The party has been over for over a year if you've been reading the writing on the walls. It's all incremental upgrades for the head of the pack while everyone else shoots themselves in the feet. Whether the bubble pops or deflates is yet to be seen.
>>105860160>If this was a mistake, I recommend re-reading the classic riddle for clarity! If you have more details, I can refine this explanation. R1 0528 solves it and calls you a retard for getting the original wrong lmao
>>105860225R1 had more personality, but 0528 is way smarter while also maintaining part of r1's personality.
>>105860221Yeah but I mean normies will turn against it. There's still a lot of cool stuff we can do but I'm afraid now everyone will sour to new solutions that involve AI. Most companies have had agents for like 6 months and click bait titles are already talking about all the ways AI wastes money. Meta and Apple just now hopping into AI reminds me of VR.
>>105860151
>all api models fail to solve properly
even human intelligence failed to solve that problem
>>105860225I like how it completely understands and solves the answer in two lines and then continues to think for 2000 tokens trying to figure out where the fuck the riddle is, only to conclude that (You) must be an idiot.
>>105860225It's a trick question. The boy has two gay dads. The surgeon is the non-biological father. The riddle challenges implicit biases we have against gay marriage.
not gonna lie, grok4 solving that variant of the riddle feels like some benchmaxxing+astroturfing stunt, esp with the extra replies showing every other frontier model as pants-on-head stupid.
I know, why would elon etc bother, but I get the feeling lots of unexpected people lurk here and theres some real world tastemaker shit that gets extracted from our faggotry
tl;dr hire me for a billion $/yr you assholes
>>105860315Nah, reddit has a lot of threads trying to come up with riddles AI can't solve. That's where they get it.
>>105860315just train on every answer to every question instead of making a model that can solve them itself
>>105860315dude, i've been posting here since pyg times.
sometimes i use the screencapture addon because shit's too long, where you can see a nice nordvpn logo at the top which the dumbass extension includes.
guy asked about the riddle, which has been around for months now. i fucked around and just tested it.
>llama 4
>claude 4
>grok 4
big deal, GPT hit 4 in 2023
call me when there's a model brave enough to hit 5
>>105860264Most normies already hate AI, not just the Jeet slop but AI in general. I feel like only coders and corpos don't have a rabid hate about it.
>>105860356really? i feel the opposite is true.
not many people complain about ai anymore. it's just the artfags.
people like the image generation thingy from openai and veo3 because it has sound output.
at least it's all over normie twitter. and i see the chatgpt yellow tint images all over the place.
>>105860225It's so funny watching jeetlon exclude r1 0528 from the comparison charts, hilarious amerimutt cope.
>current networks fail to breakthrough
>new paradigm in another 30 years
>>105860379Interesting, maybe it's just the algorithm, but I get the opposite with twitter posts calling out others for using AI and getting like 100k likes.
>>105860427True, it's difficult to tell these days.
I have gemini now in my google search (the "ai mode" thing).
The handful of normies I know ask it for everything, even about the type of common medicine they have at hand for coughs etc. They dont even look at the sites anymore.
Their main problem was that "ai lied to them". Google doesn't have that problem with "grounding" now. As far as i know it's pretty accurate too.
>>105860427I don't trust Twitter to gauge public perception because the Twitter algorithm is hand crafted to feed you opinions you agree with. But I have seen a lot more cynical posts on 4chan and YouTube. People IRL seem annoyed any time it's mentioned. A year ago everyone was pretty much euphoric and if you tried to say anything less than "AI will take us into a golden age of robo communism" they would get upset.
>>105860475>They dont even look at the sites anymore.this is a truly unsafe effect of AI language models. single point of contact - the sole source of information, controlled by one organization.
>>105860551yep. but you know its coming.
Been using rocinante since it says nigger without any jailbreak and I can make it act like a total chud. I was wondering if there are any better models that will say nigger without a jailbreak.
Also, thank you anons who have been shilling Rocinante, it's been good for my mumble chatbot generating scripts for chatterbox tts.
>>105857066how does cat not fall down?
Don't kys drummer, you're a bit better than the other 4chan personas
>>105858919i love them, i use them to automate my meme cs job so that i can use the free time to code things i care about without llm.
Day 852 of waiting for a model without slop to release
>>105859920That's fucking right, you cunt
>>105859879>making this when devs will just use the most expensive claude/gemini or the new grok anyways
>>105860794That would require using solely organic, difficult and messy human data instead of easy and perfect synthetic data, so that's not going to happen again.
>>105860794>model without slopthe fuck?
>>105860857Is that the suicide forest in Japan?
Holy crap, the new Devstral is God-like with Claude Code...
>>105860927buy an ad arthur
>>105860964You too, drummer
>>105860329I doubt reddit has a thread for the mesugaki benchmaxx. Obviously there are some redditors lurking here who repost /lmg/ shit on reddit since they don't have the brain capacity to come up with their own ideas.
Anyway, only jeets shill grok. It's barely R1 tier
https://youtu.be/AGn2V3tBTCg
>>105857463Never? So even in 10 years I won't be able to run something of that tier?
Chat, when are we getting local MechaHitler? Chat? Why is Grok 2 still not public? Can someone with an indian xitter account with a checkmark message Elon?
>>105861131>So even in 10 years I won't be able to run something of that tierbro, look closely at the rate at which you got more vram on consumer GPUs
nvidia still releases midrange gpus with 8gb of vram kek in kekistan
even in 10 years you're absolutely not running Gemini with 1 mil context LOL, LMAO EVEN
>>105861205Grok 2 is not open-sourced primarily because xAI has chosen to maintain its proprietary status, focusing on commercialization and strategic advantages. While Grok 1 had some code released under Apache 2.0, later versions like Grok 3 have shifted to a proprietary license, signaling a move away from broad open-source access.
>>105861210
>with 1 mil context
Why do they even say 1 mil anyway? I've used the API and it can't remember shit from the beginning when I'm above 40k even. I'd be happy with just 128k that works fully.
>>105861214@grok what about Elon's promise to release old models after 6 months? Did he simply forget? Grok 2 has no value in the current market.
>>105861223the long context doesn't work for everything, but Gemini is pretty good at summarizing books, even when you reach like 400~K context.
>>105861252Maybe it has something to do with the way ST is formatting things? I was noticing problems and asked it to summarize something from 40k ago and it just made up the details instead of recalling the real ones.
>>105861252>>105861223I've had gemini 2.5 (pro and flash) perform really, really well on normal RP at a little beyond 200k.
I say perform well but that's from the standpoint of remembering shit and using it naturally, but the prose really goes to hell, like it converges to some generic sounding robotic ass narration or something.
>>105861267I used aistudio with the file upload feature
https://rentry.co/3iammu8o
It wrote this summary, which was insanely accurate IMHO, and it did it from the japanese source text (I uploaded the original book, not the translation).
>>105861248While Elon Musk has frequently advocated for open-sourcing AI, particularly in his critique of OpenAI, there hasn't been a concrete public promise from him specifically to release Grok 2 after a six-month period. Instead, xAI has chosen to maintain Grok 2 and the newer Grok 3 as proprietary models, focusing on their commercial value and strategic integration into the X platform. Even as newer models emerge, Grok 2 still holds value for xAI's specific applications and internal development, regardless of whether it's the absolute market leader.
>>105861301Yeah, but saying I won't have long context and good performance locally in 10 years even is quite demoralizing.
>>105861321I mean, Jamba seems to have amazing long context performance, and it's pretty fucking fast for the size.
>>105861301>like it converges to some generic sounding robotic assYeah you can also see the summary I link here
>>105861314is written in a more sloppy way than the average Gemini writing. But it's very scarily accurate, I can attest to that as I used a book I re-read a lot for this summarization test. And it did it from japanese!
Deepseek at close to 60k context on a much more meager slice of the book (Gemini got the whole book, totalling around 400K) behaved autistically and recited borderline unimportant events. DS is pretty good but Gemini makes it look like yesterday's technology.
>>105861318>Even as newer models emerge, Grok 2 still holds value for xAI's specific applications and internal development, regardless of whether it's the absolute market leader.What value?
https://www.reddit.com/r/LocalLLaMA/comments/1lwau5f/gpt4_at_home_psa_this_is_not_a_drill/
>>105861404the world would instantly become a better place if I could press a button that wiped plebbitors from existence
>>105861314Hm, I wonder why it doesn't work with RP type stuff then.
>>105861424I would settle for redditors not coming here and posting useless links for gossip
>>105861404>open tab>read title>8b>close tab
>>105861486A 3 month old 8b.
>year of moe 2025
>48gb vram 32gb ram
>>105861512That's way more than what I have.
>>105861426They might focus more on datasets like summarization tasks (and probably code? I have never tested how it does with code in long context) during the long context training
it's a very expensive part of the model training so I'd even find it doubtful if they did that final tuning bit on the entirety of their datasets
attention scales quadratically so the more you extend the context the crazier the compute cost
I don't think any big corp would particularly care to ensure good long context performance for RP
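Back-of-the-envelope, counting only the L x L attention-score term and ignoring everything that scales linearly:

cost_attn ∝ L^2, so (1,000,000 / 131,072)^2 ≈ 58

i.e. roughly 58x more work for that one term when going from 128K to 1M context, on every long-context training sample. The real multiplier is lower once the linear parts are counted, but it's still why nobody is going to burn that kind of compute tuning long context on RP logs.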
>>105861512Just use the Cogito sirs, very bests
>>105861396Grok 2 likely serves as a stable, internally understood benchmark for testing and iterating on new AI architectures, while also fulfilling specific niche roles within X's functionalities where its performance is sufficient.
I am now ending this conversation as further discussion of this topic can be construed as unethical and potentially antisemitic.
>>105861373Fuck the anime of that story, it's really depressing stuff
>>105861580Thanks for the stupid advice gpt-kun
>>105861552>while also fulfilling specific niche roles within X's functionalities where its performance is sufficient.It's still an old fuckhuge model. That can't be cost efficient. They would be better off distilling a small turbo model for those tasks.
grok 2 was so shit they deleted it, and only realized later that elon wanted them to release it
https://reka.ai/news/reinforcement-learning-for-reka-flash-3-1
https://reka.ai/news/reka-quantization-technology
>>105861579>it's really depressing stuffI really enjoyed both the books and anime. It's one of those few stories that get the psychological aspect of humans with super powers right. Yes, if we had that kind of power individually, and every person was like a walking nuke, we would absolutely experience some apocalypse, or live in the dystopia that managed to rebuild.
>>105861628What is the excuse for not releasing Grok 1.5?
Question for fellow oldfags. Why was GPT-3 the best model at actually generating long form novel content? All the modern big models write like shit outside of Q&A or roleplay settings.
>>105861690new models are STEM-maxxed, trained a lot more on AI generated content (instruction tuning would be difficult otherwise, even at third worlder prices no one is willing to spend the money it would take to build hand written datasets of user/assistant dialogues covering every topic you can imagine) and most don't even release base models anymore (by GPT-3 I guess you meant using the base model with completion API rather than something like chatGPT)
>>105861690It was a big-for-the-time base model and had a larger fraction of its data being creative writing.
>>105861727So the cooming model that is just behind the corner is actually never gonna happen?
>>105861788the cooming model is behind the corner, watching you from afar.
>>105861690GPT-3 didn't have any post training, so it wasn't a chatbot at all; it was just a text predictor. The thing about LLMs is that they're actually insanely good at sounding like natural human writing by default. The corpo speak they're known for is hammered into them in the process of making them usable assistants.
>>105861815Wouldn't be a problem if anyone still released true base models instead of "bootstrapped" models.
>>105860124
>unnecessarily vulgar
as was llama4
https://files.catbox.moe/9mrp7s.jpg
lazy rin today
>>105861690Chatbots and benchmaxxing brain damage.
It'll be funny when Zuck goes closed-source but his models still suck.
>>105861884>bootstrappedWhat does that mean? Including instruct data in the pretrain?
>>105862025that's what is happening (just try any modern base model with chatML and see what happens) but rather than being on purpose I think it's just data contamination stemming from a lack of give a fuck
it's well known they all train on benchmarks too
>>105862043It's intentional. They boast how it gives finetunes better results.
>>105862025Read the Qwen technical reports. They openly and proudly claim how they significantly upweight math, code, and textbook data during the final stages of base model training, and even mix in some instruct-style data. Probably everyone is doing this now, they just don't all openly admit it.
>>105862011He will lose his only (already shrinking) userbase. Almost nobody pays for models outside top 4 (DS, GPT, Claude, Gemini). It will be just another Metaverse for him, unless he reaches top 5 and his models have some redeeming qualities(best for gooners/cheapest/good at coding/unslopped). Knowing Zucc, the chance is around 2.5% at most.
>>105862182>Probably everyone is doing this now, they just don't all openly admit it.Nobody does it as explicitly as qwen and llama 4
>>105861690
>long form novel content
literally could not do this at all because of the context window
What's the best chemistry local model?
>still no jamba quants from bartowski
>>105862322https://huggingface.co/futurehouse/ether0
https://arxiv.org/abs/2506.17238
>This model is trained to reason in English and output a molecule. It is NOT a general purpose chat model. It has been trained specifically for these tasks:
>IUPAC name to SMILES
>Molecular formula (Hill notation) to SMILES, optionally with constraints on functional groups
>Modifying solubilities on given molecules (SMILES) by specific LogS, optionally with constraints about scaffolds/groups/similarity
>Matching pKa to molecules, proposing molecules with a pKa, or modifying molecules to adjust pKa
>Matching scent/smell to molecules and modifying molecules to adjust scent
>Matching human cell receptor binding + mode (e.g., agonist) to molecule or modifying a molecule's binding effect. Trained from EveBio
>ADME properties (e.g., MDDK efflux ratio, LD50)
>GHS classifications (as words, not codes, like "carcinogen"). For example, "modify this molecule to remove acute toxicity."
>Quantitative LD50 in mg/kg
>Proposing 1-step retrosynthesis from likely commercially available reagents
>Predicting a reaction outcome
>General natural language description of a specific molecule to that molecule (inverse molecule captioning)
>Natural product elucidation (formula + organism to SMILES) - e.g, "A molecule with formula C6H12O6 was isolated from Homo sapiens, what could it be?"
>Matching blood-brain barrier permeability (as a class) or modifying
last chem model paper I've read
>>105859782lol these are great. How many of these have you done?
Can you dump these hgames into a rentry or something? I really want to see what kind of filth grok is kicking out.
>>105858079>>105858100>>105858134I like that one.
>>105858381Witnessed
I'm so sick of Deepseekisms at this point. I just want a new model that's actually good
>>105862386Grok4 just dropped
>>105862393I tried it on openrouter and it's incredibly slopped
>>105862386thinking the same thing
>>105862386Haven't used DS much. What are the common ds'isms?
>>105862397It's good on benchmarks. Just fap to benchmarks
>>105862411Grok scoring 100% on AIME25 gave me a half chub ngl
>>105861968Exhibitory walks with Rin-chan
>>105862406tasting copper
>>105862406A doesn't just X—it Ys
knuckles whitening
lip biting
blood drawing
copper tasting
five "—" every sentence—building suspension
every character mentioned in the lore of your card shows up at the most random times
>>105862406clothes riding up for no reason during any remotely lewd situation
>>105862406I really hate how it still obsesses over some minor shit. The new one does not understand "stop", old one did. Very fucking stubborn.
>>105862498Every girl is going to spend half her time smoothing her skirt. The other half is spent tugging hair behind her ears. Twintails will always touch her temple/cheek no matter how they're tied. Characters will flick their wrists to quickly do actions like tugging hair back.
>>105862498>>105862546Also... this... kind of speech manner...
>>105862498Just write something else when you see one of those and it stops being a problem.
If you let it use the same phrase a few times of course it's going to keep repeating it.
>>105862406It likes to write like this. Always like this. Always. This. Always.
so how long will it take the deepseek guys to rip grok 4? or are they waiting for gpt-5?
>>105861690It wasn't. Every single LLM to this day is dogshit at writing, which is evident from the fact that nobody buys AI books. A random shitty teen fanfic about Harry Potter and Draco Malfoy sucking each other dicks has more literary value than even the best slop.
>>105862674>A random shitty teen fanfic about Harry Potter and Draco Malfoy sucking each other dicks has more literary value than even the best slop.That's what those LLMs are trained on.
>>105862406R1-05 loves bullet points when they're not needed, and will not stop once it starts.
** ** ** ** everywhere.
clenching around nothing
NPCs drawing blood constantly with self injuries. The blood tastes like copper, not iron.
Bite marks when no biting occurred.
There's a bunch of weird analytical segues it gets into but that's probably a me issue.
I constantly remind myself how much better these models are than a year ago, and that we're walking the hedonic treadmill with the improvements.
>>105862515So I'm not the only one using DS that has NPCs with slowly evaporating clothes. Both V3 and R1 do that. I'll be talking to an NPC as their clothes slowly fall off, in situations that don't call for it at all, even after turning off JB and running fairly SFW cards and situations.
All the mentioned problems come from anons playing the same chars over and over again.
>>105861424>>105861486to be fair, redditors called him shizo and told him to fuck off
>>105862694>bullet points when they're not needed, and will not stop once it starts.Result of assistant slop tuning.
>The blood tastes like copper, not iron.This really bothers me, as I have actually tasted copper and iron, and blood is clearly iron.
>** ** ** ** everywhere.Annoying waste of tokens, but fixable with logprobs.
>>105862750EVERY SINGLE ONE OF THEM RASPS. THERE IS NO ESCAPE FROM THIS RASP OBSESSION.
HEY DISPSY ROLEPLAY AS FRANCIS E DEC ESQUIRE
>(in raspy voice)[ABORT GENERATION]
>>105861348Sir your nolima?
>>105861512To be fair the old 70-100B dense models still perform pretty well in world knowledge if you trust the UGI benchmark. Only Deepseek beats those and that needs way more RAM than a rampilled consumer build.
>>105862406For some reason one thing every LLM I've tried does that Deepseek continues to do is the "mouth open in a silent scream" that happens when you torture them, and it'll say that even when it's literally describing the sounds they're making at the same time.
wcgcg
md5: 32c88a68826feddf68fb9dc07faa95bc
🔍
Interesting patterns emerging from all those issues. You guys are good at pattern recognition, aren't you?
>>105862770>I have actually tasted copper and iron, and blood is clearly iron.Same. I'm convinced it's a cultural thing, but I don't know which one.
>>105862912Planet Vulcan. Spock has green blood.
>>105862893>and it'll say that even when it's literally describing the sounds they're making at the same time.That's the funniest one yet.
myFather
md5: 9fc4f32b5e0c0bb9e5f41a5faf581081
🔍
>>105862923> Spock, my son...
>>105862940Isn't their blood blue?
>>105858079For what it's worth, I seem to often get this when I finetune a model on short multiturn sequences but continue chatting beyond that.
>>105862961Right, but Spock's blood is copper based too (I'd forgotten it's canonically green.)
It's actually drawn from these crabs as a test for medicines. They hook them up for awhile, then turn them loose again.
>>105862999Only the normal version but not the groundbreaking Grok 4 heavy
https://x.com/ficlive/status/1943401632181440692
Grok won
>>105862988>It's actually drawn from these crabs as a test for medicines. They hook them up for awhile, then turn them loose again.Yeah, they are fucking neat.
Also, not actual crabs, if that matters for anybody.
>>105863033Well yeah, they're horses
>>105863019Q*Aliceberry will beat it.
>>105863074what is goin on here
Why was it fake gay about hitler and closed source.... Why couldn't it have been real straight about ERP and open source? I hate this world.
>>105863074gemini won and by a lot
>>105863126The superweapon of Bharat
Anyone tried Ernie 300 yet? Is it good for sex?
>>105863074Crazy how Google went from being super irrelevant to one of the biggest players in what, 2 years? Guess releasing that chinchilla out of captivity finally helped.
>>105863215 2 more weeks. I meant regular transformers loader or openrouter.
>>105863220Google always had the means, they just were not willing to. LLMs have a cannibalistic nature with regards to their businesses like search.
Reminder that the people who wrote Attention Is All You Need were all working at Google at the time. They could have made GPT before OpenAI was even a thing.
justpaste (DOTit) GreedyNalaTests
Added:
MiniCPM4-8B
gemma-3n-E4B-it
Dolphin-Mistral-24B-Venice-Edition
Mistral-Small-3.2-24B-Instruct-2506
Codex-24B-Small-3.2
Tiger-Gemma-27B-v3a
LongWriter-Zero-32B
Falcon-H1-34B-Instruct
Hunyuan-A13B-Instruct-UD-Q4_K_XL
ICONN-1-IQ4_XS
Another big but mid update. ICONN was a con (broken). The new Falcon might be the worst model ever tested in recent memory in terms of slop and repetition. Maybe it's even worse than their older models. It's just so disgustingly bad. Tiger Gemma was the least bad performer of the bunch though not enough for a star, just gave it a flag.
Was going to add the IQ1 Deepseek submissions from
>>105639592 but the links expired because I'm a slowpoke gomenasai...
So requesting again, especially >IQ1 and also using the full prompt including greeting message for the sake of consistency. See "deepseek-placeholder" in the paste. That prompt *should* work given that the system message is voiced as the user, so it all matches Deepseek's expected prompt format.
Looking for contributions:
Deepseek models (for prompt, go to "deepseek-placeholder" in the paste)
dots.llm1.inst (for prompt, go to "dots-placeholder" in the paste)
AI21-Jamba-Large-1.7 after Bartowski delivers the goofz (for prompt, go to "jamba-placeholder" in the paste)
>From neutralized samplers, use temperature 0, top k 1, seed 1 (just in case). Copy the output in a pastebin alternative of your choosing. And your backend used + pull datetime. Also a link to the quant used, or what settings you used to make your quant.
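For anyone contributing, a minimal sketch of what such a greedy run could look like against a local llama.cpp server; these are the standard /completion fields, the prompt string is a placeholder for the full prompt from the paste, and the port assumes llama-server defaults:

import requests

payload = {
    "prompt": "<full prompt from the paste goes here>",  # placeholder
    "temperature": 0,   # greedy
    "top_k": 1,
    "seed": 1,          # just in case
    "n_predict": 512,
}
r = requests.post("http://127.0.0.1:8080/completion", json=payload)
print(r.json()["content"])  # paste this output along with backend + pull datetime and quant link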
>>105863373I salute your efforts.
Can you fine-tune a pure LLM to make it multimodal
>>105862546I'm used to seeing wrist flicking with other models too.
>>105863437Yes https://github.com/FoundationVision/Liquid
>>105863074You could definitely benchmaxx for this.
>>105863437Aren't most local 'multi-modal' models just normal llms with some vision component grafted onto it
>>105863373DeepSeek-V3-0324-IQ1_S_R4 ik_llama.cpp 5446ccc + mikupad temp 0 topk 1 seed 1.
I thought that the different output on the first run issue wasn't a thing anymore.
1st: https://files.catbox.moe/ewtwai.txt
each after: https://files.catbox.moe/celh6i.txt
>>105860857god i need to go outside
>>105863757Thanks! Added to the paste.
>>105862988>They hook them up for awhile, then turn them loose againThey drain the fuckers dry and then discard the corpse. If you want to call that "turning loose" then I guess you can. Do you really think they're sitting there monitoring the blood levels to make sure they don't just fucking die?