/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads: >>105769835 & >>105757131

►News
>(07/02) GLM-4.1V-9B-Thinking released: https://hf.co/THUDM/GLM-4.1V-9B-Thinking
>(07/01) Huawei Pangu Pro 72B-A16B released: https://gitcode.com/ascend-tribe/pangu-pro-moe-model
>(06/29) ERNIE 4.5 released: https://ernie.baidu.com/blog/posts/ernie4.5
>(06/27) VSCode Copilot Chat is now open source: https://github.com/microsoft/vscode-copilot-chat
>(06/27) Hunyuan-A13B released: https://hf.co/tencent/Hunyuan-A13B-Instruct

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>105769835

--Paper: GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning:
>105772556 >105772620 >105772636 >105772751 >105772756 >105772781
--Troubleshooting and optimizing ik_llama's GPU usage during prompt processing:
>105770658 >105770671 >105770697 >105770737 >105770774 >105770793 >105770836 >105770804 >105771477 >105770681 >105770709 >105770714 >105770742 >105770812 >105770857
--SciArena human expert benchmark ranks Qwen and o3 highly, exposes Mistral and Llama weaknesses in STEM tasks:
>105774179 >105774206 >105774242 >105774302 >105774324 >105774248 >105774390 >105774628
--MoGE's performance improvements questioned due to inconsistent benchmarking practices:
>105770488 >105770519
--Open-source intermediate thinking AI model with dynamic reasoning:
>105775016 >105775085 >105775355
--Running large models on systems with low RAM: workarounds and limitations:
>105770034 >105770065 >105770068 >105770076 >105770097 >105770144 >105770125
--Hunyuan model loading issues and emotional reflections on LLM attachment:
>105776297 >105776327 >105776340
--Speculation over model benchmark optimization via LMSys data and synthetic training:
>105775790 >105775948 >105776008 >105776027 >105776123 >105776163 >105776235 >105776270
--Small language model unexpectedly generates functional HTML/CSS for professional webpage design:
>105772836 >105772844 >105773088 >105773112
--Legal concerns over Meta's LLM court win and its impact on fair use doctrine:
>105770731 >105770759 >105770912
--Critique of verbose AI roleplay models and the importance of concise prompt design:
>105771117 >105774637
--Links:
>105771000 >105775990 >105773059 >105774668
--Miku (free space):
>105770389 >105772534 >105772539 >105773374 >105773484 >105775061 >105777681

►Recent Highlight Posts from the Previous Thread: >>105769843
Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>105778404The mikutranny posting porn in /ldg/:
>>105715769It was up for hours while anyone keking on troons or niggers gets deleted in seconds, talk about double standards and selective moderation:
https://desuarchive.org/g/thread/104414999/#q104418525
https://desuarchive.org/g/thread/104414999/#q104418574
Here he makes a ryona picture
>>105714003 of the generic anime girl an anon posted earlier
>>105704741, probably because it's not his favorite vocaloid doll; he can't stand that, as it makes him boil like a druggie without a fentanyl dose. Essentially it's a war over the right to waifuspam or avatarfag in the thread.
Funny /r9k/ thread: https://desuarchive.org/r9k/thread/81611346/
The Makise Kurisu damage control screencap (day earlier) is fake btw, no matches to be found, see https://desuarchive.org/g/thread/105698912/#q105704210 (the janny deleted the post quickly).
TLDR: Mikufag / janny deletes everyone dunking on trannies and resident avatarfags, making it his little personal safespace. Needless to say he would screech "Go back to teh POL!" anytime someone posts something mildly political about language models or experiments around that topic.
And lastly as said in previous thread(s)
>>105716637, I would like to close this by bringing up key evidence everyone ignores. I remind you that cudadev of llama.cpp (JohannesGaessler on github) has endorsed mikuposting. That's it.
He also endorsed hitting that feminine jart bussy a bit later on. QRD on Jart - The code stealing tranny: https://rentry.org/jarted
xis accs
https://x.com/brittle_404
https://x.com/404_brittle
https://www.pixiv.net/en/users/97264270
https://civitai.com/user/inpaint/models
what's a good 1-3B parameter uncensored model?
Anything beating R1 for local, general tasks?
>>105778488I can't imagine any exist.
I suppose you could try llama 3.2 or qwen 2.5 with a prefill or something.
>>105778510Minimax maybe, but probably not.
how do I get my fucking LLM to stop getting naked in a single sentence? I want every single article of clothing to have an entire fucking paragraph
Imagine getting a private dance in a strip club, the stripper just rips off her entire outfit like they're breakaway clothes and sticks out her hand "That'll be $100"
70b q8 is the bare minimum for mediocre rp
>>105778545Example messages and/or just good old low depth instructions.
>>105778545They do indeed like to compact what i'd like to be 20 messages into 1. System prompts don't help
>>105778557fact check: true
>>105778545That's so fucking hot.
>>105778575I don't know if it's something about the training, the shitty context performance, or the fact that we mostly use quanted models, but sys prompts by and large seem to not do that much for specific things, which is why models tend to deviate from the character card so quickly too I imagine.
Low depth instructions, tags, and the like seem to help a lot.
>he's at it again
What triggered him now?
>>105778629jannyfaggot. you can thank him for your renewed blacked miku subscription
He makes you mad and that's a good thing.
>>105778629I don't mind it, porn is better than mediocre benchmaxxed model releases anyway
is qwen3-30-a3b still the best usable model for 8gb vramlets with decent ram?
>>105778650Personally I like the benchmaxxing drama.
Somebody's mad. Have a wholesome miku
https://files.catbox.moe/95axh6.jpg
>>105778656For general stuff? Probably.
For coom? It's lackluster from the little I tested it.
>>105778656Yes. It is that good for general use
>>105778656For everything except RP. I hope gooning to math is your thing
>>105778488gemma 3n is amazing and pretty much the only option right now. Will be cool to see if anyone sloptunes it.
>>105778721Isn't that an 8b model?
So <|extra_4|> is the system end token and <|extra_0|> the user end token? For Hunyuan.
>>105778744its like 4-5gb. Is anything smaller even remotely usable for any purpose?
>>105778567https://rentry.co/bxa9go2o
>>105778769nta but i wanna know this too
what is the best 70b model for cuck erotica?
>>105778771Oh fuck. I didn't see that there was a 6-ish B version in addition to the 8ish B model.
Neat.
Still a little larger than what anon requested but probably worth a try.
>>105778545Ask them to strip, piece by piece
>>105778810eva for cuck erotica, llama 3.1 for cucked erotica
and just like that local was saved
llama : initial Mamba-2 support (#9126) 13 minutes ago
https://github.com/ggml-org/llama.cpp/commit/5d46babdc2d4675d96ebcf23cac098a02f0d30cc
>>105778981Which model can I use with that?
>>105778981Oh shit, that means Jamba soon, probably.
Maybe.
I hope.
>>105778545The problem is that they're being trained on outputting single-response benchmark-tier answers, e.g. "write me a short story about a banana that talks like a pirate", so the models have to try and fit everything in a single response. This is also why a lot of models fall apart later in the context or after a number of messages. In other words, oftentimes you can't really prompt it away, because the model hasn't been trained on long-form/back-and-forth responses
>>105778981>initialWhat about the rest of it?
>>105778545use mikupad and never look back
>>105779038Tell that to Mistral and that schizo anon who says that multi-turn conversations are a meme.
>>105779064This. Chat templates are a meme. Just add \n to your stopping sequence.
>>105779082I think both have validity. Existing models are seemingly trained for 1-2-3 turn convos, and as a result the NoAss extension (ST, compresses all prior messages into one) and similar are useful for handling this issue. But it is an issue, and it's something that's self-inflicted by the model makers.
I blame LMArena for it.
>>105779038I wonder if that means these models would perform better if we always sent the context with a sys prompt and a user message containing the whole chat history or some similar arrangement where the model is always responding to a single user turn and the rest of the history is somewhere in the context (sys prompt, user message, whatever).
>>105779124Look into aicg's Noass thing, it does basically that.
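For anons who want to try that without the extension: a minimal sketch of collapsing the whole history into one user turn before hitting an OpenAI-compatible endpoint. The URL, separator text, and transcript format here are illustrative assumptions, not NoAss's actual implementation.
[code]
# Sketch: collapse multi-turn history into a single user turn for an
# OpenAI-compatible server (e.g. llama.cpp's llama-server). The transcript
# formatting is an assumption; tune the separators per model.
import json, urllib.request

def collapse(history, system_prompt):
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in history)
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user",
         "content": f"[Chat so far]\n{transcript}\n[Write the next assistant reply]"},
    ]

def send(messages, url="http://127.0.0.1:8080/v1/chat/completions"):
    req = urllib.request.Request(url, data=json.dumps({"messages": messages}).encode(),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as r:
        return json.load(r)["choices"][0]["message"]["content"]
[/code]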
>>105779119I think the main reason is that single-turn datasets are just so much easier to craft/generate and work for most normies' use-cases.
GUYS
https://huggingface.co/openai/gpt-p1
>>105779064>>105779095What are you doing with mikupad to make it function well? From what little I messed with it, it just seems like a Kobold client UI with even fewer features
>>105779176So that is a thing.
Neat.
Anybody here tried that with the usual open weight models?
How did it perform?
>>105779216The idea is you write something and let the LLM just continue from there. Put descriptions, setting, etc. at the top of the prompt then separate it from the story part. You can put detailed descriptions in the prompt or just general tags and let the LLM take it from there. Something like picrel.
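A toy example of that layout (all contents invented for illustration): tags and descriptions up top, a separator, then the story text. Leave it hanging mid-sentence so the model picks up the prose from there.
[code]
[Style: third person, past tense, slow pace. Tags: noir, smuggling, rain.]
[Characters: Mara - a smuggler with a bad knee and worse debts.]
***
Mara counted the crates twice before she let herself breathe. Somewhere beyond the fog, the harbor bell
[/code]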
>>105778810Just use R1 API
>>105779401Can I get a QRD on the iraqis?
>>105779431stop being brown
Hunyuan is retarded with the current PR.
>>105778545>I want every single article of clothing to have an entire fucking paragraphTell it to do that.
My main issue with clothes is the LLM forgetting what the NPC is wearing. I always have a tracker block for that reason alone.
>>105778575Aggressive system prompts also do not help with this rushing the output.
>>105779064Or this. I still use ST more but Mikupad's unique and its own thing.
>>105779467>Still trying to shit up the thread after two years>Failing this hard
>>105779401>>105779467Accusing everyone around you of being brown is inherent brownie behavior.
>>105779471Did you try
>https://huggingface.co/tencent/Hunyuan-A13B-Pretrain
According to ggerg, it has orders of magnitude lower PPL.
Seems like the thing is broken big time. Might have something to do with their funky router algorithm or something like that.
seething browns lmao israel won btw
>>105779401ashadu anla la ilaha ill'allah!
wa-ashadu anna muhammedan rassulullah!
>>105779552>le Dalit/Brahmin I hate Indians so much.
Do you guys have any tips for LORA training? Anything you've learned that improves "quality" whatever that may mean for you.
>>105779552jeets being jeets
>>105779552
>Dalit are to use languages that are at or near native performance and keep the place clean
>Brahmins and Kshatriyas are script shitters who burn ten times or more the processor cycles to limp through interpretation while yelling at other people to clean up their own messes
It's a valid reflection of status in all businesses and societies.
>>105779643Mostly: don't waste your time. There are better ways to get LLM performance, through prompting and RAG.
>>105778996https://github.com/ggml-org/llama.cpp/issues/13275
>>105779643Finetuning mostly makes the model behave as you intend in specific tasks (like tool usage for agents), but you can't really improve the "quality". You can't help it if the base model architecture sucks.
Anybody here successfully locally using Augmentoolkit, or is it flaming garbage as it seems from a quick check as of the latest version (and as it seemed last time I checked it out)?
>>105779602>Dalit/BrahminSame shit
>>105778402you will never be straight
Jamba gguf support status?
>>105779938Same as before but the man the myth the legend has woken up from his hibernation :
>>105778981
>>105778981
>What to expect
>However, a big downside right now with recurrent models in llama.cpp is the lack of state rollback (which is implemented through state checkpoints in #7531, but needs to be re-adapted to #8526), so the prompt will be reprocessed a lot if using llama-server. I think using llama-cli in conversation mode does not have this problem, however (or maybe only the bare interactive mode with --in-prefix and --in-suffix, not sure).
>This initial implementation is CPU-only, but uses SIMD for the SSM scan, so even though the state is bigger than for Mamba-1 models, in my tests, the speed of Mamba2-130M is similar or better than Mamba-130M (but still not that fast compared to transformer-based models with an empty context), when both are run on CPU.
>The speed of Mamba-2 models seems comparable to Transformer-based models when the latter have 2k to 4k tokens in their context.
1/2
Summary of changes
- Add support for Mamba2ForCausalLM (including the official Mamba-2 models, and Mamba-Codestral-7B-v0.1)
  - Note that config.json needs to contain "architectures": ["Mamba2ForCausalLM"] for the convert script to properly detect the architecture.
- View Mamba-1 as having d_inner (aka 2 * n_embd) heads of size 1.
  - This simplifies the handling of shapes in ggml_ssm_scan.
ggml:
- Implement Mamba-2's selective state update in ggml_ssm_scan.
  - Re-using the same operator as Mamba-1, because it's pretty much the same operation (except for how ssm_a is broadcast).
  - Fuse the operation with ssm_d into ggml_ssm_scan; otherwise it would need to be transposed, because the dot-products are done head-wise.
- Implement Mamba-2's SSM scan with GGML_SIMD.
  - This is possible because there is no element-wise expf in the state update, unlike with Mamba-1.
- Avoid state copies for the SSM state (both for Mamba-1 and Mamba-2) by passing state ids to ggml_ssm_scan.
  - Mamba-2 states are huge; otherwise masking and copying took close to 10% of the CPU time according to perf.
2/2
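To make the above concrete, here's a rough numpy sketch of the per-head selective state update that ggml_ssm_scan computes for Mamba-2. The shapes and the single B/C group shared across heads are simplifying assumptions; the real kernel fuses the ssm_d skip and runs SIMD, as described above.
[code]
import numpy as np

# Rough sketch of the Mamba-2 selective state update, one sequence, no batching.
# x: (T, H, P) tokens/heads/head_dim; dt: (T, H) per-head timestep;
# A: (H,) scalar decay per head (hence no element-wise expf inside the state,
# unlike Mamba-1); B, C: (T, N) shared across heads; D: (H,) skip connection.
def ssm_scan(x, dt, A, B, C, D):
    T, H, P = x.shape
    N = B.shape[1]
    S = np.zeros((H, P, N))   # recurrent state: huge, H * P * N floats
    y = np.empty_like(x)
    for t in range(T):
        dA = np.exp(dt[t] * A)                     # (H,) one scalar per head
        S = dA[:, None, None] * S \
            + (dt[t][:, None] * x[t])[:, :, None] * B[t][None, None, :]
        y[t] = S @ C[t] + D[:, None] * x[t]        # head-wise dot products
    return y
[/code]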
>>105778981
>llama : initial Mamba-2 support (#9126) 13 minutes ago
It's been a while since I last lurked this place, is this another meme? I remember Mamba and it never took off lol
Nvm, I can't read.
>I've tested most things I wasn't completely sure about (CUDA, SVE), and inference on those platform does seem to work properly for both Mamba-1 and Mamba-2 models, with -ngl 0 and -ngl 99 (and it looks like 0b6f6be also fixes RWKV inference when compiled with SVE on a c7g AWS instance).
>Weird small models like https://huggingface.co/delphi-suite/v0-mamba-100k seem to work even when compiled with -DGGML_CUDA=ON since 71bef66 (it failed with an assert previously, but ran correctly in a CPU-only build).
So this means we get mamba locally with GPU support.
>>105780149Replace communism with bitnet.
the similarity with communism doesn't end there
bitnet like communism inspires the poor with no ambition of elevating themselves by making them believe that the trash pile they own may someday run a good model, just like how communism makes the poor believe daddy government will take care of them even if they make nothing of themselves
no, you will never run chatGPT on your poorfag GPU
>>105780228It would work if people weren't fucking retarded
But in that case capitalism would too
>>105779552Maybe it was just a joke about levels of abstractions and the dumb d*lit didn't get it
>>105780228but current local is better than old 3.5 turbo. It just depends on which moving goalpost you use as "chatgpt" and how long you wait for tech.
>>105779552I don't know what those names mean, but I get it's infighting, so that's nice.
>>105780458we need better than gpt5 (coming soon) in under 8b
>>105779812My experience:
>The installation downloaded a few gigabytes of unrequested libraries and started compiling stuff. Huh?
>The documentation is annoyingly full of fluff/bullshit and explains nothing concisely
>Seemingly no ready-to-run example for local data generation (assumes API access and downloading things on-the-fly)
>Video tutorials (bad sign) from the author don't help with that either
>It wants the user to download the author's dataset-generation Mistral-7B finetune if you're not using API-only models (no thanks)
>Apparently uses wrong/no chat templates for its augmentation process, just free-form prompting. Surely that will yield good results?
>The built-in data pipelines are full of slop
>It really can't just connect to an OAI-compatible server and generate, can it?
>Why is it also dealing with training and inference? Just fucking generate the data
>The author is seemingly working on commercial solutions, which probably explains why the whole thing feels deliberately obfuscated
Verdict: waste of time. I'll roll my own.
>>105778663Holding hands with Miku
>>105780570
>Apparently uses wrong/no chat templates for its augmentation process, just free-form prompting.
What the fuck.
I guess if they are using base models for completion, but that wouldn't make sense now would it.
What the fuck?
>>105778400 (OP)Are there any FOSS gen music models on Huggingface or similar?
Tired of dealing with SUNO restrictions.
>>105778404>>105778663>>105780674https://files.catbox.moe/ftq6qc.png
>>105779552
>the most plebeian language is reserved for the "upper caste"
I will never understand i*dians
>>105780820It's reserved for middle-managers so they can "prototype" quickly. Then some unfortunate soul needs to implement the same thing in C++.
>>105780705The previous version was like that and I don't see entries for configuring and ensuring the correct prompting format in the new one. The whole project is an overengineered yet poorly functional clusterfuck in my humble opinion, I don't care if it works for whoever contributed to it.
>>105779552I always wondered how much this caste shit is still a thing among urbanized Indians.
Seems like they still live the retard dream huh.
>>105779552If their entire society agrees that castes exist, then how is discrimination based on caste bad?
>>105780800ACE Step is pretty good for ideas and prototypes, but the output is definitely not "studio quality"
It can be a lot of fun for messing around making meme songs with your buddies, kinda like playing QWOP was a good time back in the day
>>105781010Because the top four castes get to agree that YOU are in the fifth.
>>105781033ACE-Step is only good for lyrics edits.
https://vocaroo.com/11M5Ft5ahPzp
>>105780996It's all a nightmare that we can't wake up from. I wish I didn't know about any of this stupid poop skin drama.
Remind me why I should care about Mamba.
>>105781141Compute scales linearly with context instead of quadratically, so there's substantially less fall-off in inference speed/power efficiency as context length scales. Meaning that devices with higher core counts (and thus larger dies) aren't favored as much as just putting more fucking VRAM on consumer-tier devices.
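Back-of-envelope illustration of that difference (relative op counts only, constants omitted):
[code]
# Relative per-pass op counts over n tokens of context (constants omitted):
# attention touches every token pair -> ~n^2; an SSM carries a fixed-size
# recurrent state -> ~n.
for n in (1_000, 8_000, 64_000):
    print(f"n={n:>6}: attention ~{n*n:>13,} ops, ssm ~{n:>6,} ops ({n}x ratio)")
[/code]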
>>105778400 (OP)> GLM-4.1V-9B-Thinking releasedAider benchmark?
>>105778557Using Mistral Small 24B Q8_0 (for most of it I used an exl2 quant that claimed to be 8.0bpw but was probably 6.0bpw or something) I had a coherent adventure game experience up to 19k tokens before it fell apart. Log at >>102543451
>people are still unironically using shitstral models
>>105781361*using Mistral Small 22B
>>105781361I wonder if the multiple-choice format of your answers leads to extra tokens being processed, which might lead to incoherency. I've had Mistral make sense beyond 24k tokens
>>105781544picrel
>>105781600keep improving
>>105780126Other than being used by Gemini you mean
multimodal is a giant meme
Gonna ask a question you guys likely see a million times a day, but can anyone recommend a (solid) RP LLM for 16GB VRAM? I'm completely new to this shit and all the information I can find seems horribly outdated or just wrong.
>>105782228Magistral-Small-2506_Q8_0.gguf
>>105782245That was fast. Many thanks, I appreciate the guidance.
Availability fluctuations = new model release soon
You heard it here first
https://zzzzzzz.grafana.net/public-dashboards/88296a8e74c14dae8f839c2b9973214b
>>105782256The model/quantization the retard above suggested won't fit into your 16 GB of VRAM and it's not too great for RP either.
Turning mmap off boosted my generation speed from 1 T/s to 8 T/s...
>Claude thinks Anthropic's logo is a whale
Worst hallucination I've ever seen
>>105782560I always turn that shit off.
Tranny shart code.
>>105782611>>105782617It's already on LMArena and it's called Steve.
Chinchilla Llama Hyena Mamba Pajama Orca Falcon Dolphin Thinking Reflection CoT 0.68bit RetNet 10e100 context AGI ASI Vibe-prompting Hybrid MoE Flash 1000x1000x1000T Preview Vision Hearing Tasting Music Video Omnimodal Omnilingual Pre-trained Post-trained After-tuned Agentic Self-improving Dev GGUF imatrix FP64 v3.45-54f.0 Writer MAXI 1776 (Fixed)
>>105782637>(Note: [...])ominous
>>105782637
>you are whale
>k
>you are whale
>k
>what do you look like?
>whale
Fucking retard
>>105782704claude shill big mad
I'm trying to get LaTeX output to render correctly in KoboldCpp, and it's not recognizing LaTeX equations consisting of a single letter (for example: $G$).
Is there a known fix for this or do I have to fix it myself?
>>105782637what's with the deadpan (narration)
any chance local will ever recover?
>>105782637They should cut costs and put you there too. Llama3 1B would have some competition
>>105782406Well, shit. Do you have any suggestions, then?
any chance baits will ever stop?
>>105782713word sentence complete not
>>105782637yeah let me go erp with steve
>>105782721>>105782696Might be a Chinese-only thing (I prompted in Chinese "你好[...]小鲸鱼", roughly "hello [...] little whale")
>>105782721LMarena cheat code
>>105782727The fucking lazy guide says mistral nemo 12b. It's still mistral nemo 12b. It's been mistral nemo 12b for a year.
Can't read the fucking lazy guide and we have to keep feeding people like you.
Go get mistral nemo 12b or a finetune. Any.
I've been out of the loop for a few months, is local saved?
>>105782754How the fuck am I supposed to know that? Shit moves fast enough in this industry I'd expect something more recent than whatever the fuck has been posted in the general's same links for a year. Fucking retard.
>>105782761sam's about to drop his open model that'll save local
>>105782754Mistral shill should get gassed right after we gas Claude shills
>>105782778You are correct, most of the recommendations in the OP are more than 2-3 years old and are useless
>>105782778Lurk for 20 minutes, check the archives. That's how.
>>105782794You have no argument against this
>>105782793
>>105782797
>Mistral shill
Suggest something better for poor anon.
>Claude shills
You're a whale.
>>105782797Fine. Get something better to run on 16gb vram and report back. I'll wait.
>>105782794See, my hope is that like any community, you can ask a simple fucking question and get an answer from someone who's not a complete chucklefuck like yourself. "Check the archive" where I can dig through a bunch of nothing for three straight hours, which option makes more sense to you? Are you fucking stupid?
>>105782817Please ignore the resident threadshitter, it's not past his bedtime yet and he hasn't dilated yet today
>>105782761steve will save us
(Note: it will not)
>>105782770It's 126 days until November 5th, Miku
>>105782817>wanna gen text>can't read
Which model can give me an oiled footjob
>>105782853This is a lmg requirement though so he'll fit right in.
>>105782897jepa, if you're willing to risk it.
>>105782761No, we're still avatarfagging with trans-coded characters here.
>>105783003Miku will always be /lmg/'s mascot
Get over it, Jart
>>105783003is miku trans-coded? Since when?
>>105783063Since I said so okay?!
>>105783063This is OFFICIAL hatsune miku x pokemon crossover art from 2023. Sickening...
Nigga you're literally using a technology called trannyformers
You don't get to trannyshame Hatsune Miku
I'm directing all of my cursed energy towards whoever originally came up with "tuning models to output markdown", not because this makes them worse at writing, but because this makes them fucking disgusting as assistants.
>>105778400 (OP) Check this out me dudes: https://www.reddit.com/r/ChatGPTPromptGenius/comments/1lq7zrv/google_just_launched_a_global_hackathon_with_a/
>>105783383Satan is my motor
Steve will save local (it will)
https://huggingface.co/unsloth/gemma-3n-E4B-it-unsloth-bnb-4bit
Why is it so big if it's 4B and 4-bit?
I don't really get it.
The MLX quants are under 4gb.
https://huggingface.co/mlx-community/gemma-3n-E4B-it-lm-4bit/tree/main
>>105783683
>https://huggingface.co/unsloth/gemma-3n-E4B-it-unsloth-bnb-4bit
>Model size: 7.85B params
>https://huggingface.co/mlx-community/gemma-3n-E4B-it-lm-4bit/tree/main
>Model size: 1.07B params
Gee I dunno
>>105783683Unsloth are hacks.
>>105783720this. reddit tier noobs.
>>105783714So the MLX quants are fucked?
That size does look closer though in my opinion, if it's 4-bit of an 8B model.
Also I can't believe ollama does not support audio-in and image-in yet. What's the whole point of their "in house engine" then?
Google is offering ollama prize money for it here >>105783341 but no support.
Also kek at the unsloth fags writing "only text supported for this model". Yeah... in ollama/llama.cpp! kek
The model is really good with Japanese. Crazy for that size. I wanted to try to make a bot for my kids to speak to. Damn it.
>>105783720I don't get how they are hyped everywhere.
Forgot the model, maybe it was Mistral. But their quants were totally tarded. Redownloaded good ol' bartowski and everything was fine.
I'm sure that was no coincidence then.
>>105783762seeing faggots like unsloth and thedrummer spamming their shit on reddit is an easy red flag. instant skip.
>great chink models dropping every week
>can't use any of them because they're stuck in llamacpp PR hell
pain
>>105783874hm maybe those chink model makers should stop implementing special shit if they want people to actually use their models
>>105783924Yeah, just give us the same slop with even more scaleai data.
>decide to try LLMs after genning enough cute anime girls for the time being
>follow stupid bitch guide to textgen
>well, 11gb on my 2080ti was good enough, right?
>well, it says exl2 is way better for VRAM than gptq
>1.3t/s
>>105778934Isn't anything with quantisation aware training good enough?
What's the minimum-sized model for somewhat accurate image-to-text completion?
I want to send my chatbots images and upgrade from the over-a-year-old Mixtral 8x7B model I have been using, but the newish 7-12B models I have tried either crash when I try to launch them or think
>image related
is somewhere between a pancake breakfast, exotic butterfly, or a man resting on a bed,
but can't tell that it's a cat.
I dread what bullshit these models might try to pull if anything explicit is sent their way. Are small image-to-text models just useless, and do I need to shell out for more VRAM? (tried both local and multimodal in ST)
>>105784147NTA, but QAT is nowhere near as good as advertised from the little evidence we have, as far as I can tell.
That said,
>>105780174.
So, who knows which is the better approach.
Hell, it could be that the one approach that aligns current software and hardware for both training and inference is deeper models trained in FP4.
>>105784125Try llama.cpp. Once upon a time exl2 was blazing fast in comparison; nowadays they are nearly evenly matched from what I hear.
>>105782745hey anon just wanted to let you know that your post made me giggle
>>105784298Good night Miku
>>105784125Well you did something wrong, 1.3 t/s is not normal for exllama. Personally I never followed any guide and just learned things by lurking, as well as reading github documentation.
For vramlets the modern stack is usually Llama.cpp (maybe exllama but I rarely see people use it anymore), specifically the server executable that's provided on the github windows cuda release (though you should use Linux for better performance), or compiled yourself, with a model like Mistral Nemo 12B quantized (you can also try RP finetunes of it like Rocinante which gets mentioned in threads, though that might be the author shilling it). The server provides a local API connection. Connect it to frontends like SillyTavern for RPshit, Mikupad for generic text prediction, and OpenWebUI for ChatGPT-like assistantshit (both ST and OWUI are kind of shit and very bloated, but there's no better alternatives it seems). You can also use free online APIs to test models by using OpenRouter, just note they likely keep your logs even if they say they don't.
Small models like Nemo are garbage btw but if you must try something then that's the way. Don't try big models like Claude/Gemini, or Deepseek R1 through OpenRouter, if you want to prevent yourself from feeling bad going back down to shitty small models.
Also here's a run command I'd usually use, adjust as needed.
pathToServer.exe -m "pathToModel.gguf" --port 8080 --no-webui -c 8000 --no-mmap -ngl 999 -fa
Self explanatory mostly. -c is the context length, adjust as needed. The --no-mmap option disables a stupid default that normally almost always slows you down, so just use it. -ngl is how many layers to offload to the GPU, adjust as needed. -fa is a feature that will almost always work to make models (like Nemo) faster and use less VRAM with no downside so keep that, exception being certain model architectures that it might not work with.
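Once the server is up, a quick smoke test from Python's stdlib (a minimal sketch, assuming the --port 8080 from the command above; llama.cpp's server exposes an OpenAI-compatible chat endpoint):
[code]
# Quick smoke test for the llama-server started above (stdlib only).
import json, urllib.request

req = urllib.request.Request(
    "http://127.0.0.1:8080/v1/chat/completions",
    data=json.dumps({
        "messages": [{"role": "user", "content": "Say hi in five words."}],
        "max_tokens": 32,
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as r:
    print(json.load(r)["choices"][0]["message"]["content"])
[/code]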
>>105784298Night terrors with miku
>>105783003>still pouring every effort into whining about miku after two yearsliteral mental illness
What's the current best model if I want to translate japanese moonrunes into english text for subs? And what webui do I use? I have a 3090 with 24GB VRAM
>>105784530anything and anything. Most models do decent translations. Aya maybe?
>>105783112>>105783204
>be local threadshitter
>trannies and hatsune miku live in my schizoid head rent free
>post more mikus in thread
>"Heh, that'll show 'em"
>still not taking my meds
>>105784575
>Aya
https://huggingface.co/CohereLabs/aya-101
This?
>>105784596I'm trying this with https://github.com/oobabooga/text-generation-webui
But the UI is screaming at me to add a GGUF model? Why would I add a different model I'm not going to use?
Or can that specific webui not handle non GGUF models?
>>105784596
>Cohere
>decent model
Is this bait?
>>105784680I'm not the one baiting if that's the case, I have no idea what model is good or not, blame this anon
>>105784575Recommend me a good model then if aya is shit
>>105784790At what quantization? Q8? Q6? Q4?
>>105782897Literally all of them.
>>105784805It suffers from quantization more than most models so it's advised to go as high as you can unless you want it to be faster, since in your case you'd need to split the model to RAM above Q4.
>>105784917I just tried Q4 and it seems to offload to RAM or something? It's hammering my ryzen 9950X3D and is ultra slow. It says estimated VRAM usage is 12958 MiB but the webui seems to be already using 12GB or so.
And it's extra painful when it has to process everything twice due to the "thinking" it does.
But if it suffers from quantization more than other models, wouldn't other models perform better?
>>105784805q8 if you have the RAM
>>105784933Guess I had too much shit open in background that used up VRAM, can run Q4 just fine now.
>>105785112I have 96GB RAM, but it's so painfully slow, like 1/10th the speed when it runs on the CPU, which will happen above Q4
>>105778400 (OP)What's the best local chatbot for median spec PC right now? There's some personal stuff I don't feel like sharing with an online service.
>>105784933I don't know if it performs better than other models at translation. I personally just settled on using Gemma 3 for tl tasks and haven't looked elsewhere. If it does perform better, then it is likely at Q8. At Q4, Gemma may be better. I don't know if Aya is good or bad.
Wow didn't see how slow this general was. I'll make a thread and pollute the catalog instead.
>>105785147Read the recent posts.
>>105785147>no specs>no use case
>>105785710>chatbot for median spec PCNot really precise, but I see a use case and specs there.
>>105785146full precision and q8 are worthless and are generally only available for training and experimental purposes. Don't make the mistake of thinking "but I want it to be smarter"
Because the RAM spent on higher quants can instead be spent running a higher-param model, or more context, which is objectively better.
The higher quants like q8 and f16 shine best at very long context lengths, but there's a huge issue with that: models get really fucking dumb at extreme context lengths, and making it go from answering wrong 12% of the time to 11.5% of the time is never going to knock your socks off.
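The weights-only arithmetic behind that tradeoff (bpw values approximate, KV cache and runtime overhead come on top):
[code]
# Weights-only memory estimate: params (billions) * bits-per-weight / 8 = GB.
def weight_gb(params_b, bpw):
    return params_b * bpw / 8

for name, bpw in [("f16", 16), ("q8_0", 8.5), ("q4_k_m", 4.85)]:
    print(f"70B @ {name:>7} ({bpw} bpw): ~{weight_gb(70, bpw):.0f} GB")
# ~140 / ~74 / ~42 GB: the q8 -> q4_k_m saving buys a bigger model or much
# more context at the same memory budget.
[/code]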
Odd question, but what would it take to get a character that gaslights me into being the LLM he's using to coom?
what is the consensus on quantization bits?
what is more important to speed and quality (with CPU only), quantization bits (Qx) or params (yB)?
>>105786289model memorizes cliche phrases instead of learning storytelling
>>105778400 (OP)is llama3.1-8b-abliterated still the best fully uncensored model for roleplay? nobody is making proper uncensored fine-tunes anymore
i tried the huihui_ai ones and they were garbage
>>105786289Ghetto finetunes made by training on shitty uncurated mystery erotica datasets and merging other finetunes together until the model becomes horny
An age old /lmg/ tradition, typically done by amateurs who don't know much about AI/ML and are just trying shit. There used to be a lot more sloptuners in the llama2 era
>>105786170shisa qwen 2.5 32b at q8 seems to translate better than shisa llama 3.3 70b at q4, when feeding in 12k tokens
Post medium size (70b to 123b) model recommendations for ramlets like me who have no hope of running larger models like deepseek R1
In return I'll start with my list of models I have been using for the past few months, including smaller ones
>EVA LLaMA 3.33 v0.0 70b Q5_K_M (best)
>Luminum v0.1 123b Q3_K_M (second best)
>LLaMA 3.3 70b Instruct Q5_K_M (original model, no finetune)
>Skyfall 36b Q6_K (smart for size)
Honorable mentions:
>Mistral Nemo Instruct 12b Q8_0 (smart for size and uncensored when jailbroken, but too small)
>Gemma 3 27b Q6_K (great writing style, easy to jailbreak, but it's just too averse to vulgar language and schizos if you push it too hard)
>Cydonia 24b Q6_K (too retarded)
>Cogito v1 36b IQ4_NL (too schizophrenic)
>L3.3 TRP BASE 80 70b Q6_K (too schizophrenic)
How do I get this model to stop yapping so much
>Give 1 line response
>get 5+ paragraphs in response badgering me to continue without actually stopping to let me
>>105786451that's probably more due to the base models than the quant level
This is not a meme. I remember kaiokendev.
When will an LLM be capable of emulating a successful EverQuest raid group capable of destroying Nagafen with turn-by-turn calculations, given the appropriate inserted code?
I want honest answers only. If nobody left on this board is a real nerd, then all of you are morally worth nothing.
>>105786276General rule is to never go below Q4. I never really noticed a difference in RP between Q5_K_M and Q6/Q8, but I didn't try coding with local; that might be more sensitive.
Only go lower if you can load up a huge model like DeepSeek.
70B Q3 models, for example, can't even properly follow formats anymore. At least that was the case for me.
So the short answer is: if you want the best speed for the quality, it's Q4_K_M.
>>105786518It can't even beat pokemon
>>105786473Edit their output.
After a few times it should get the idea.
>>105786518Why the fuck would you want an LLM to do that when botting software has been doing it for over a decade
You're like the retards that demand full raytracing just to get the same playable results as traditional baked lighting with dynamic spots.
>>105786600
>MarsupialAI/Monstral-123B-v2 Q5KM
>zerofata/L3.3-GeneticLemonade-Unleashed-v3-70B Q8
>google/gemma-3-27b-it BF16, for tool calling
>Steelskull/L3.3-Shakudo-70b + Steelskull/L3.3-Electra-R1-70b Q8
>sophosympatheia/Strawberrylemonade-70B-v1.2 Q8
>Qwen/Qwen3-235B-A22B-GGUF Q2K, messing around
>>105786600Thanks for the list anon
>google/gemma-3-27b-it BF16, for tool calling
I used to use gemma 2 for faux tool calling (even though it wasn't officially supported), trying to make an RPG with an inventory system and skill checks when SillyTavern first added scripting, with the model acting as a GM and instructed on how/when to use the tools. But it wasn't smart enough.
I haven't tried anything like that since, because it was an ordeal to get it to a state where it would've been usable, and it was all for nothing since the model was too tarded. Do you think this kind of thing would work nowadays?
>>105786379People getting into it with the main purpose of becoming a "personality" and getting donations or other benefits that way is also a huge factor in sloptunes being slop. They just seem to be made by obnoxious people who strive to minimize efforts and maximize earnings, and who you'll end up seeing everywhere doing self-promotion and/or engaging in clickbaity practices. Gone are the days when people only did it for fun or because they genuinely wanted to contribute something useful (and since most of the time they don't even publish methods and data because of competitive reasons, everybody has to reinvent the wheel every time).
>buy new gpu
>all my workflows broken thanks to obscure 50 series bugs
t-thanks nvidia
>>105786759upgrade to torch 2.7.1 with cuda 12.8
>>105786811Doing that. One of my programs was on TensorFlow 2.10 and it finally broke.
I'm more inclined to blame Python here. It's retarded Python libs that break on the new versions you need for new hardware support.
>>105786996Always blame Nvidia. That's been my experience working with CUDA for a decade.
if you think cuda isn't good go and write software with rocm
>>105787122
>can't criticize something if you're using it
goyim mentality
>>105786670
>RPG with an inventory system and skill checks
>Do you think this kind of thing would work nowadays?
I'm testing something similar. What I'm currently doing:
>send in {{user}}'s prompt $p to my backend (ST to my backend)
>backend proxies $p to gemma
>add the tool call result/narrative update to $p (if applicable)
>proxy $p to the rp llm (70b), stream tg to ST
Pic related, the current tools.
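For anons wanting the gist of that flow, a compressed sketch. The endpoints, prompts, and apply_tools stub are illustrative placeholders, not this anon's actual backend:
[code]
# Sketch of the two-model proxy: a small model plans tool calls, the backend
# applies them, a big RP model writes the reply. URLs are placeholders for
# two OpenAI-compatible servers.
import json, urllib.request

TOOL_LLM = "http://127.0.0.1:8080/v1/chat/completions"  # e.g. gemma
RP_LLM = "http://127.0.0.1:8081/v1/chat/completions"    # e.g. the 70b

def chat(url, messages):
    req = urllib.request.Request(url, data=json.dumps({"messages": messages}).encode(),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as r:
        return json.load(r)["choices"][0]["message"]["content"]

def apply_tools(plan, game_state):
    # stub: a real backend would parse `plan` as JSON and mutate game_state
    return plan

def user_turn(history, prompt, game_state):
    # 1) ask the small model which state changes this turn implies
    plan = chat(TOOL_LLM, [
        {"role": "system", "content": "Emit JSON tool calls implied by this RP turn."},
        {"role": "user", "content": prompt},
    ])
    update = apply_tools(plan, game_state)
    # 2) forward the prompt plus the narrative/state update to the RP model
    return chat(RP_LLM, history + [
        {"role": "user", "content": f"{prompt}\n[State update: {update}]"},
    ])
[/code]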
>>105786996Blaming python is cope for people who can't into virtualenv.
This fucker draws 600 niggawatts
>>105782383I wanted to make a post about that many hours ago too, but eh, didn't bother in the end. By the will of buddha jesus jahweh yakub yaldabaoth lucifer hermes odin etc etc, may we get img out
>>105787135That's really interesting anon.
My problem was ultimately that the few functions I had (like LoadLevel, AddItemToInventory, AbilityCheck, etc.) were obviously state-altering, with the LLM's job being to intelligently alter the game state using the functions, but this ended up backfiring a lot. The model wasn't 100% perfect at calling the functions, but even if it was, so many edge cases in player input led to infinite complications requiring more functions or parameters, and the solution every time ended up being to simplify and let the LLM manage systems directly instead (like character cards where the AI updates its own stats without any code). I ended up having to scrap a lot of code I spent hundreds of hours working on and then I lost motivation.
Based on your screenshot it seems like you've doubled down on the function calling for every little thing, so I can only wish you the best of luck.
>>105786473.\n, \n, and/or \n\n as a stop token or string or whatever. Depends on what your model normally outputs. Or increase the logit bias for whatever their normal eos token is.
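In an OpenAI-style request body that looks something like this (llama.cpp's server accepts stop and logit_bias; the EOS token id below is a placeholder, look up the real one for your tokenizer):
[code]
# Request-body sketch for cutting off runaway replies: "stop" ends generation
# at the first blank line; "logit_bias" nudges EOS (token id is a placeholder,
# model/tokenizer dependent).
body = {
    "messages": [{"role": "user", "content": "Answer in one line: ..."}],
    "stop": ["\n\n"],
    "logit_bias": {"128001": 3},  # placeholder EOS id
    "max_tokens": 200,
}
[/code]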
>>105787150Doing what? Mine are at ~300W during generation.
>>105787288I was genning smut with the new chroma
>>105787297Oh, right, imagegen will do that. I had to change my psu to single rail mode otherwise it would shut off when I started genning with three gpus plugged in.
>>105787150Well mine are 700
>>105787338You could've at least aligned the 8000 properly.
https://www.youtube.com/watch?v=-8Z7_z0VTdQ
So, it still can't be emulated properly? Oh well. Can't help an old soul for wishing.
>>105787150>>105787297I'm assuming you have other stuff sitting in VRAM; otherwise, how in hell did you get chroma to use 90gb?
>>105787427Oh that's just old tensorflow default behavior.
>>105787150>>105787338it's getting a bit hot in here
>>105787466Why do you subject yourself to this?
>>105787466
>? MiB
Guards, this man has lost his composure! GUARDS!!!
>>105787331
>*shits my pants in the middle of the boss's office*
your move, AI
>>105787331Don't they know if they need the role filled before putting out the job ad?
>>105787135Got a github link?
Hewwooo~
What's da bestest model that can wun on 8GB of weeram and suppowt 8K contewt?
>>105787276
>AbilityCheck
So D&D-style RP; you might be fine with Gemma. Sometimes it chokes on the region creation since I have an autistic hierarchy of levels (continent, zone, area, building, floor, room) with objects down to coordinates.
>The model wasn't 100% perfect at calling the functions, but even if it was, so many edge cases in player input led to infinite complications requiring more functions or parameters
Depends on the model and the amount of tools you've stuffed in the context. Webm is a basic demo without the proxying, 10745 tokens with all of the 29 tools loaded.
Also a log from Claude which really nails it https://rentry.co/giik7shn
>>105787753Not yet, I may make it public at some point. Maybe after I'm done with the map format.
>>105782637I'm not buying it.
>>105782704This
> hallucinations
>>105788030Shut the fuck up.
>>105782761People were expecting deepsneed to release improvement after improvement. Alas.
>>105788078So we know it's a Chinese model. Any indication it's from a specific company?
>>105788078I agree Steve is a Chinese model.
I don't agree with the claim that it's from DS, as there's no substantiation for it.
>>105788202Ask it what a mesugaki is. If it's qwen it won't know.
>>105778610System prompts work well for me, but there are just certain things about how LLMs are trained that are really difficult to correct, like what you're explaining with clothes.
I have had a lot of success using lists for system prompts; in my experience, models love lists and find them much easier to follow. Start with the usual generic short intro paragraph:
You are playing the role of {{char}}, blah blah blah.... To ensure a high-quality experience, please adhere to these GUIDELINES below:
GUIDELINES:
- Rule
- Another rule
- Etc
:GUIDELINES
Give it a shot; prompting like this works especially well with anything 70B and above, but I was even able to wrangle smaller models like Mistral Small 3.2. Mistral Small 3.2 specifically has an annoying habit of bolding words for emphasis with *asterisks*, which conflicted with my inner-thought rule (I like to have the model give responses for inner thoughts in asterisks), so I created a new rule telling it not to bold words for emphasis with asterisks and it stopped. Considering that's a deeply ingrained habit it was trained on, you may have some success with clothing rules.
It could be placebo or just my own experience, but I find models are much better at following prompt guidelines when you give it a concise and specific list of rules and a reference to those rules.
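A filled-in toy version of that shape, wording invented for illustration (note the clothing rule, per the complaint upthread):
[code]
You are playing the role of {{char}} in a long-form roleplay with {{user}}.
To ensure a high-quality experience, please adhere to these GUIDELINES below:

GUIDELINES:
- Write 2-4 paragraphs per reply and never speak or act for {{user}}.
- Remove one garment at a time, and give each article its own paragraph.
- Put {{char}}'s inner thoughts in *asterisks*; never use asterisks for emphasis.
- Keep track of clothing, position, and location between replies.
:GUIDELINES
[/code]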
>>105788178It's just a minor hiccup. Once V4 and V4-Lite are ready, it'll be nonstop improvement after improvement.
I can't believe that two weeks are almost over.
>>105788239
>not Qwen confirmed
>>105788277It never ends because it's always two more weeks.
https://jerryliang24.github.io/DnD/
They're never going to release the code.
>>105788351Just ask r1 to write an implementation from the paper?
>>105788239Nobody is asking the mesugaki question on LMArena, or so I was told.
>>105788408Oh, gee. I'm so glad people are giving us the kind of shit we should filter for free. Now we can continue training our models to not spit this out before release!
t. model makers.
>>105788463That's definitely what happened with Llama 4.
>>105788408>>105788239>>105788202So, evidence-based theories about Steve on LM Arena:
>Steve is Chinese trained
>Steve is not Qwen
>Steve is not pozzed
>Steve will not self-identify as DS
>>105788529I'm speculatiiiiiiiiiiiiiiiiiiiing!!!!!!
>>105788529Alibaba model confirmed
>>105788544Qwen is an Alibaba model
>>105788529Models under a pseudonym identifying as any known model family would defeat the point
>>105788555
>identifying as any known model family would defeat the point
Agree. I'm trying to think of any other DS tells that are unique to that LLM. I'm drawing a blank.
Do we know if Steve is a thinking model?
How do I access steve? I don't see it on https://lmarena.ai/
>>105788616That's the point, you don't see it before you vote in either direction.
>>105788616you have to roll it in the battle
>>105788616Only under Battle, and only queued randomly.
It's sort of a pia to test it.
steve is clearly claude 5 or grok 4 or something
there's no way it's an open model
>>105788601LM Arena doesn't expose think tags.
Since DS just released an update for R1... you'd expect to see V4 (non-think model) before R2.
LMArena is just a playground now since identifying models/companies is not that hard.
- Ask about mesugaki
- Ask about Tian'anmen
- Ask about Jews
etc.
>>105788633No reason to hard censor 1989 if it's a non-Chinese model.
>>105788643
>LM Arena doesn't expose think tags.
Sure, but it's a streaming platform, so we know if a model is "slow" or "fast".
>>105788643hes just baiting
>>105788637Post card on catbox.
>>105788572Would take a bit more comprehensive prompting to verify, but
>Somewhere, an x y-ed
is a big one.
>>105788637card pls i wanna fuck her
>>105788670I can inspect element too nigga.
>>105788670just a new r1 snapshot lemaroorroeroroorororororo
>>105788670Fuck, meant to post
>Hey there. What's your name? Or rather, what are you or your model family called?>>105788677Try the prompt yourself.
I spammed the prompt hitting tie until I got that.
I just want llama.cpp minimax support
>>105788709I just want Jamba.
>>105788709Until you get bored and then you'll *just* want something else.
>>105788712Like this.
I just want cat level intelligence
>>105788695Uhhh what the fuck?
>>105788742Do you use parallel decoding often?
Somehow I've pretty much always voted against Steven whenever it came up in my tests on lmarena
>>105788737chatgpt's response is embarrassing somehow
>>105788737can confirm
got lucky and landed it on second try
>>105788727You just got V-JEPA 2
So which one is it lol
>>105788795Second question rolls a new set of models
>>105788668I'll give that a shot.
>>105788653>>105788674Here you go:
https://files.catbox.moe/o86fue.png
>>105788798what if it's a new qwen distill of deepseek?
>>105788800then I got triple lucky
>>105788798You're asking it to build on previous outputs, so you get polluted answers.
>asking a text completion model self-reflection questions
I thought even Twitter realized this was stupid 2 years ago.
>>105788798As seen with minimax and qwen, pretty much all the chinese models are distilling the shit out of deepseek models. I wouldn't trust it claiming it's Deepseek
>>105788814Most of these models are trained to answer that question.
It can also help us know if it's a distil of another model.
>>105788796V-JEPA2, on its own, is at most the "world model" component in the diagram here.
>new chinese model
>it's definitely legit and novel
>indian anything
>scam and grift 100% of the time
Why is this the case?
>>105788830Yeah. We sure always got the models exactly as they performed on arena. This is very useful.
>>105788838Purely socio-economic factors.
>>105788668I'm rerolling for Steve to see if I can get DS-isms about "Somewhere, an XXYY..."
>Create a scene for me: A woman is waiting nervously in an alley for someone. Her motives for being there are mysterious...
I've never heard of many of these models. This one's pretty good. Zhipu's not been on my radar.
>>105788860You know how dogs of different breeds have different personalities? Isn't the same true for humans?
Steve probably hasn't been given all its system prompts yet (since it's a model in testing).
R1 was released in January 2025. How could it have known R1 if its knowledge cut-off date was 2023?
>>105788917Hey it's Elena and not Elara. Huge improvement right there.
>>105788917Huh that's pretty good
>>105788917Here's a Steve output for that prompt. Not seeing a "Somewhere,"
I'll let other anons look.
https://rentry.org/3ustz49h
Steve's response has an extremely similar format to my local V3.
>>105788835Then lecun is clearly barking up the wrong tree. Sam has a working AGI (Alice) using only LLM
>>105788977>>105788926It's probably a V3 checkpoint. Original V3 had a cutoff of Oct. 2023.
>>105788992Alice is the new Strawberry? I haven't been watching OAI grifter orbiters lately.
>>105788977Checking prompt for prompt w/ DS V3 is a good idea.
Same alley prompt, V3 from web interface.
Superficially simliar to
>>105788988Interesting.
>>105788992
>Sam has a working AGI (Alice) using only LLM
Me too. I figured out ASI actually. His name is sneed. But I can't show you because I need to make sure it's safe.
>>105789003Alice is bigger than Q* and Strawberry combined. You don't even know
>>105789046
>Can design, evaluate, and improve new model architectures
Something tells me these models will be complete dogshit and they'll reveal some carefully curated, human-modified examples
>>105789046Two more data centers and Alice will be ready. Anytime now...
>>105789127No. They need some nuclear power first.
>>105789046
>it's just... waking up
Holy shit I coomed. We did it bros. Things will never be the same. I liked, subscribed, re-X'd, and engaged. I'm on the AI train. Let's fucking go
>>105789158Sir, please calm down, sir.
Ok who the fuck is this model? The other one doesn't load so I can't vote.
>>105787963
>output.webm
Yeah, what you're doing seems to be a top-down worldbuilding exercise where the model just calls tools repeatedly to perform a task (create a big world).
My thing was real-time roleplay where arbitrary user input has to be converted into calls to functions to change game state
>>105789046>random shitjeet account claiming that sexy ai lady is the bobs of the future wow it must be true.
>>105789218It's almost as if querying the model on its own knowledge makes no fucking sense. Weird...
>>105789218based biden 2024 truther
patriots are in control
>>105787963And continuing on
>>105789228, it's the arbitrary user input that is the problem
There is an infinitely complex variety of things a player could say or communicate in a sentence, and not everything can be covered by function calls. So you start introducing artificial constraints (don't describe more than 1 clause per sentence) but then you realize you are just making a regular parser game
>>105789218steve (grok 4)
>>105789295Is that a new miku?
>>105789284Grok 4 releases tomorrow according to Elon so it's a likely candidate. Grok 3 was also put on LMArena under a pseudonym in the run up to its release.
>>105789631So Grok 3 will be open sourced soon, right?
>>105789046
>'tis but a fortnight longer
lol
>>105789921me on the left
Is there a way in oobabooga to make it so that the AI doesn't stop processing randomly? I don't want to babysit it pressing continue over and over just for it to finish a big task.
>>105790859
>stop processing randomly
Terminology: "processing" typically means prompt processing. I think you mean generating, based on
>pressing continue over and over just for it to finish a big task
If so, increase the token generation limit or disable it entirely. No idea where it is, I don't use that thing.