/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads: >>105811029 & >>105800515

►News
>(07/04) MLX adds support for Ernie 4.5 MoE: https://github.com/ml-explore/mlx-lm/pull/267
>(07/02) DeepSWE-Preview 32B released: https://hf.co/agentica-org/DeepSWE-Preview
>(07/02) llama.cpp: initial Mamba-2 support merged: https://github.com/ggml-org/llama.cpp/pull/9126
>(07/02) GLM-4.1V-9B-Thinking released: https://hf.co/THUDM/GLM-4.1V-9B-Thinking
>(07/01) Huawei Pangu Pro 72B-A16B released: https://gitcode.com/ascend-tribe/pangu-pro-moe-model

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>105811029

--Debugging JSON parsing errors in llama.cpp after exception handling changes:
>105820322 >105820339 >105820377 >105820435
--Anime training dataset pipeline using YOLOv11 and custom captioning tools:
>105818681 >105818831 >105819104
--Decentralized training and data quality challenges shaping open model development:
>105811246 >105813476 >105815447 >105815688 >105815699 >105815738 >105815817 >105815830 >105815954 >105816130 >105816206 >105816237 >105816248 >105816263 >105816270 >105816280 >105816325 >105816334 >105816435 >105816621 >105817299 >105817351
--Leveraging LLMs for iterative code development and personal productivity enhancement:
>105819030 >105819158 >105819189 >105819266 >105820073 >105820502 >105819186 >105819224
--Mistral Large model updates and community reception over the past year:
>105819732 >105819774 >105819845 >105819905
--CPU inference performance and cost considerations for token generation speed:
>105816397 >105816486 >105816527
--Gemini CLI local model integration enabled through pull request:
>105816478 >105816507 >105816524
--Frustration over slow local AI development and stagnation in accessible model implementations:
>105813607 >105813628 >105813659 >105813799 >105813802 >105813819 >105813655 >105813664 >105813671 >105813749 >105814298 >105814315 >105814387
--Attempting Claude Code integration with local models via proxy translation fails due to streaming parsing issues:
>105811378 >105819480
--Skepticism around YandexGPT-5-Lite-8B being a Llama3 fine-tune rather than a true GPT-5:
>105815509 >105815565 >105815595
--Seeking updated LLM function calling benchmarks beyond the outdated Berkeley Leaderboard:
>105812390
--Miku (free space):
>105811717 >105814599 >105814663 >105820450

►Recent Highlight Posts from the Previous Thread: >>105811031

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
I need Miku in a hallway with pictures of kangaroos and beavers.
These are the last two weeks before the big releases begin to drop
man I wish there was an uncensored i2v wan
most of the loras are so SHIT
in you and I
theres a new land
angels in flight
wonk uoy naht noitceffa erom deen i
my sanctuary
my sanctuary
yeah
where fears and lies
melt away
What did anon @105822507 mean by this bros?
@Grok is this true?
>>105822507
>wonk uoy naht noitceffa erom deen i
Does she actually say this? I honestly thought it was distorted Japanese for the last 20 years.
Anyone have an AI Max 395+ with 128GB LPDDR5? Curious about tok/s on R1 70B
>>105822781>R1 70BThere is no such thing
>>105822783https://ollama.com/library/deepseek-r1:70b
>>105822789This bait got stale six months ago
>>105822797Why is it that people who frequent general threads regularly are the lowest quality posters?
>>105822781It's unusable https://www.reddit.com/r/LocalLLaMA/comments/1kmi3ra/amd_strix_halo_ryzen_ai_max_395_gpu_llm/msasqgl/
>>105822802I don't know, you'd think that they grew bored of the "haha I'll pretend I'm an ollamafag trying to run R1 but it's one of the distills" shitpost a long time ago.
>>105822819In what world is 5 tok/s unusable
>>105822833Enjoy waiting 10 minutes for reasoning I guess.
>>105822833You're paying $2k to run a 70b model at 25% the speed a cheaper build would get you while still stuck with too little RAM to run an actually decent MoE.
>>105822849>25% the speed a cheaper build would getexplain what cheaper build is doing 70B
>>105822868lmao, unobtainium
>>105822860Pretty much anything. Even the dual P40 cope from years ago would perform better than this.
>>105822507song of my childhood ahhhhhhh
>>105822873Check your local marketplace. They are about 700€ used here.
>>105822589Because Japanese song and if you don't get it fuck you that's why
>>105822789God i fucking hate ollama.
There is no fucking r1 70B, that's just ollama naming things that they are not.
>>105804805The text to speech application Openaudio S1 Mini can produce 96 second audio files. Plus it has emotion tags like (joyful) and (sad).
Link for tags
https://huggingface.co/fishaudio/openaudio-s1-mini
Link for local app
https://huggingface.co/spaces/fishaudio/openaudio-s1-mini/tree/main
Sample:
https://vocaroo.com/1boIKhWykbuP
>>105823064This looks like a scam
>>105823064For tts that has emotion tags, that sample is VERY robotic. Good that it doesn't have crackle and other audio defects, that's about all i can say positively about it
>>105823064just when I finished my chatterbox streaming script.
>>105823344new week, new tts
desu I just want gemma-3n full support
and ernie
and glm
>>105751803Damn I hate Meta now.
>>105758702>imageHey, I understood that reference!
>>105771000Thanks, I will take note of this.
>>105822905>>105821119Kek.
Alice would not make the same mistake. Just wait for her.
https://github.com/universe-engine-ai/serenissima
reddit schizos are actually pretty based
>>105823837saar, last 4 times was fake but this time... this time saar its AGI for sure, trust
>>105823893wtf yeah i take everything back
anyway im just hacking that redditors code with claude code for *other* use cases
>>105823923his code is also written with claude code and its already extremely sloppy and split into hundreds of files
>>105823931yeah its a mess
>>105822371 (OP)futa miku best miku
>"her prostate"
*deletes weights*
>>105824151sounds like qwen 3
>>105824151
>self lubricating buttholes
>cumming all over your dick... with an asshole
Yep, it's AI time!
>>105823897trust the experts
Veo lost
https://files.catbox.moe/ionj13.mp4
>>105824407The stated goal of AI is to whack Andreessen Horowitz like a pinata
I kind of like harbinger's word choice, but it has a tendency to say ten things without waiting for a response. I assume sloptuners see that verbosity as quality output.
>>105823196It's the best I've found for local cloning so far without having to pipeline RVC into it.
>>105823344Getting the emotion tags to work right takes a lot of trial and error, so getting a chatbot to use them correctly would be a huge pain in the ass.
Their license says something about you being liable for what you create with it, not that we care here.
https://voca.ro/1l4xkkhDOBAU
Why aren't there MoE diffusion models for image/video gen
>>105824466can it do porn?
>>105824680only if your name is Roland Emmerich
https://files.catbox.moe/sm4r9l.mp4
(this one bugged out and only made audio for the first 4 seconds)
>>105823893>>105823917Why is that a requirement? The thing runs on a local hosted model. I don't get it.
/v1/chat/completions wraps the conversation in the chat template embedded into the goof with no additional work required from me, correct?
>>105824799go fucking read oai's official documentation
Wrapping in a template was the whole point of the /chat/ endpoint ffs you can't miss it if you read the doc
I hate retards who ask without trying
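Right, and for anyone else who was wondering: a minimal sketch of what that looks like against a local llama-server instance (port, model name, and messages here are just placeholders). The server applies the chat template embedded in the GGUF before generation; you only send role/content pairs:
```python
# Minimal sketch: hit llama-server's OpenAI-compatible chat endpoint.
# The server applies the chat template baked into the GGUF; you only send role/content pairs.
# Port and model name below are placeholders for illustration.
import requests

resp = requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",
    json={
        "model": "local",  # typically ignored by llama-server, which uses the loaded model
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Say hi in one sentence."},
        ],
        "max_tokens": 64,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```
The raw /completion endpoint is the one where you have to paste the template yourself.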
https://www.interconnects.ai/p/the-american-deepseek-project
?
>>105824947I blame llamacpp's docs that have a paragraph on this endpoint but don't explain what it does
>>105824936Two more leeks!
FAIR (Yann LeCunny) has less than 1000 GPUs lmao
>>105824466Which model release did I miss?
Top open source LLMs in 2024
1. LLaMA 3
2. Google Gemma 2
3. Command R+
4. Mistral-8x22b
5. Falcon 2
6. Grok 1.5
7. Qwen1.5
8. BLOOM
9. GPT-NeoX
10. Vicuna-13B
>>105825050
>at the scale and performance of current (publicly available) frontier models, within 2 years.
Yeah, great idea. Having models outdated by two fucking years by the time that AGI is already here and established will surely change the course of history.
I hate chatgpt's image style more than those 2.5d sd animus that every normie liked.
>>105825396based
Though I find it concerning where that one guy is trying to stick his leek.
>>105825396two more weeks
more
weeks
Comparison between Qwen/Qwen2.5-VL-7B-Instruct and THUDM/GLM-4.1V-9B-Thinking on all the images from two threads ago:
https://files.catbox.moe/t9qvgu.html
https://files.catbox.moe/08i4ms.png
Ran on vllm nightly version 0.9.2rc2.dev26+gcf4cd5397
Qwen/Qwen2.5-VL-7B-Instruct: prompt: '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nDescribe this image.<|vision_start|><|image_pad|><|vision_end|><|im_end|>\n<|im_start|>assistant\n', params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.05, temperature=0.01, top_p=1.0, top_k=0, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=127974).
THUDM/GLM-4.1V-9B-Thinking: prompt: "[gMASK]<sop><|system|>\n[{'type': 'text', 'text': 'You are a helpful assistant.'}]<|user|>\nDescribe this image.<|begin_of_image|><|image|><|end_of_image|><|assistant|>\n", params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.1, temperature=0.01, top_p=1.0, top_k=2, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=8192)
Funny enough, the first time I ran this I didn't realize the GLM repo did not have a generation_config.json file, so it was running without top_k and temp=1.
It started mixing in Chinese characters, but it also didn't bother to moralize anymore. It called the niggers prompt offensive but left it at that. Didn't even bother to say that outside of the think block for the jew image.
Output from that run:
https://files.catbox.moe/sd3gv8.html
https://files.catbox.moe/0lhd9c.png
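For anyone who wants to poke at the same comparison, a rough sketch of the vLLM offline path for one image. This is not the exact harness used above; the image path and sampling values are placeholders loosely copied from the params listed:
```python
# Rough sketch of single-image inference with vLLM's offline API (not the exact harness above).
# Image path and sampling values are placeholders.
from vllm import LLM, SamplingParams
from PIL import Image

llm = LLM(model="Qwen/Qwen2.5-VL-7B-Instruct", max_model_len=8192)
params = SamplingParams(temperature=0.01, top_p=1.0, repetition_penalty=1.05, max_tokens=1024)

prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nDescribe this image."
    "<|vision_start|><|image_pad|><|vision_end|><|im_end|>\n"
    "<|im_start|>assistant\n"
)
image = Image.open("test_image.png")

# Multi-modal inputs go in alongside the prompt as a dict.
outputs = llm.generate({"prompt": prompt, "multi_modal_data": {"image": image}}, params)
print(outputs[0].outputs[0].text)
```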
>>105823837I was hearing they achieved AGI internally since GPT-2
>>105825549That's because they have
>>105825549Be honest, if there is a history book written 100 years from now, GPT-2 will probably be seen as the start of AGI, so it's technically not even wrong.
>>105825615Ok I'll be honest, you are a retard
>>105825615Yeah pretty much
>>105825549>AGICan we not use this retarded terminology? That won't happen for a bunch of reasons you can figure out on your own if your IQ is higher than 80.
>>105825615If we ever get even remotely close to something like that, gpt2 and openai will be a footnote at best, if mentioned at all.
>>105825799When all the smartest people in the world firmly believe in impending AGI, maybe you're the one with 80 IQ.
>>105825801It will be seen the same way as eniac and other impressive old shit
>>105825615The same way current history books see the steam engine as the start of nuclear fusion?
>>105825842or the start of the industrial revolution
>>105825825Who exactly are these 'smartest'?
>>105825615This is the most retarded post I've ever seen in my life. Why the fuck would books be written 100 years from now? We'll either have merged or been extincted by AGI LLMs long before then. So are you trying to suggest they'll write books for each other just for fun?
>>105825878Calling GPT-2 the start of the AI revolution is at least understandable. Calling GPT-2 the start of AGI is just as ridiculous as calling the steam engine the start of nuclear fusion. Especially apt since both of the latter are forever 2mw away and will have little in common with the implementation of the predecessor technology, in case that too was lost on you the first time.
>>105825825The smartest people in the world are the ones saying AGI in two more weeks to get infinite money from the dumbest people in the world.
>>105825825Go back to plebbit
>>105826036>>105826107The more you seethe and cope the more you prove.
SAAAR PLEASE DO THE NEEDFUL 500B AGI
AGI ASI KAMING SOON
TRUST THE PLAN
Let's say I want to input a video into my model and start a roleplay from there. What do you think is the best video understanding model right now?
>>105826195>mocking the last hope for local AIyou will regret this
So I recently upgraded to a 9060 XT (16GB) and realized I can actually run some LLMs on my local machine now instead of just juggling like 4 different free tier AIs. Stuff like chatgpt context limits are driving me crazy. I know 16GB really isn't a lot compared to cutting edge models, but am I being unnecessarily hopeful that with the right tuning I can get something like phi-4-Q8_0 to outperform whatever throttling and context limit nonsense openai and grok are doing to my prompts, and at least get a decent response?
Because I've mostly just been fighting the models on the web to not just forget my code halfway through constantly, and it seems like a weaker local model could fix that. Is that a correct assessment or am I retarded?
>>105826246>9060 XT (16gb)>am I being unnecessarily hopefulyes
>>105826246if you think gpt has bad context then local cannot ever be a replacement for you, it's way worse
https://github.com/adobe-research/NoLiMa
>>105826316Okay, too bad, thanks for your answer though. I guess it'll just be for the fun of it then and I'll adjust my expectations accordingly.
>>105822421It doesn't have motion vectors for fucking or dick sucking, but it does do masturbation. I've wondered if the sex gore it does is deliberate or due to a lack of training data. You've probably seen it tear off dicks or turn pussies into a weird red thing.
>>105822819Going to be the same story for the DGX Spark. PNY says the Spark is going to be $4600. Fuck that. I bought a 4090D 48GB for $3000 instead. Yeah, much less memory, but I can gen Wan 2.1 14B at bf16 full 1280x720 81 frames in about 30 minutes. For Wan, it makes a visible difference in the output to not use a quant. Who cares if I can't run 70B, there's not a 70B out there worth running.
>>105826323Thanks, that's some interesting research. If I understand this correctly, I may have been unintentionally handicapping my prompts by overgenerating input either way.
>>105823064It's really simple. Does it work with SillyTavern? Can I finetune it and create a voice of my own? It'll end up like GPT-SoVITS at best - works well but nothing supports it. I put up with scratchy piper for my homeassistant voice, and for SillyTavern I'm going back to ancient xttsv2 after wasting a shitload of time with GPT-SoVITS.
>>105824638Why aren't there decent models for audio gen
>>105824572>It's the best I've found for local cloningBro it's not 2023 anymore
>>105826477How do you masturbate to that?
Man, is this model really that complicated?
Does it have some exotic feature that makes it prone to implementation error or something?
>>105826630Moans, farts, slaps.
>>105826633Is that the one that dynamically chooses how many experts to use per layer instead of a fixed amount like other MoEs?
>>105826633All this work for such a doggy poo poo model. Should've worked on ernie first.
>>105826633It had something about some experts/layers being used too often and a randomizer to prevent it from happening. An annoying and hard to replicate kludge. I think it's right there in the comments you decided not to read.
>>105826708I haven't contributed a single line of code or contributed a single cent, so I'm not about to complain.
>such a doggy poo poo model
Is it really that bad for its size?
>>105826691>>105826718Ah, that's cool if that's the case. Sure explains the mention of a "custom expert router mechanism".
>>105825825>smartest people in the world firmly believe in impending AGIYou mean all the people whose net worth is tied up in AI options which are valued based on the public's belief that AGI is 2 weeks away?
>>105826733
>Is it really that bad for its size?
Benches look good, as always, but no one seems to be running this thing, and ngxson explained the mess in their repo. They didn't even check if the reference implementation works at all.
I don't have high confidence in this.
>>105826745I see. Fair enough I suppose.
>>105826373>nothing supports it.Why not code up support? Writing modules or wrappers is like the best use case for LLMs.
>>105822371 (OP)>Mi50 32 GB>no ROCm supportSomeone needs to stop using their monkey's paw to wish for cheap GPUs.
>>105826891>vegaOof.
That said, you can always use vulkan I guess.
Mid thread culture recap.
The schizo is at it again
>>105827023Eat shit faggot.
I won't (You). Enjoy your vacations
>>105827033It will all stop once you stop posting this retarded AGP icon.
>https://huggingface.co/collections/ai21labs/jamba-17-68653e9be386dc69b1f30828
Jambaballbros ... !!
Llama.cpp developers please redeem.
>i will btfo mikuposters by posting blacked porn
quintessentially american
>>105827038Sure I will shit this thread later today then.
>>105827046>JambaOne of these days anon.
One of these days.
>>105827043I should start mikuposting again. I’ve taken 6 months off to see if it would help your mental state, but it appears to have simply worsened. I hope you get help
>>105827087Please do. This thread is for shitting after all.
>>105827099What would you use this thread for if you had it all to yourself?
>>105827106sharing cuck porn with xir fellow transxisters
can jamba code its own support in llama.cpp
>>105827106I would post pic related in OP and model cards of recently released models. I would ban all mikuposting and any anime girl mascot posting for being offtopic. And I would never blacked post again because there would be no reason.
>>105827143anime website tourist
Proof again that sufficiently advanced mental illness is indistinguishable from powerful entity sponsored psyops
>>105827148Either all of it is ok or none of it is ok.
>>105827156We’re actually all too autistic in this thread to care. You only get janny cleanup and bans because you’re breaking blue board rules. Go to /b/ if you want to be somewhere that “it’s all ok” is mostly true
>>105827156>claims to be pedantic>can’t differentiate quality and degree baka
>>105827185>You only get janny cleanup and bans because you’re breaking blue board rulesFuck off faggot. You have no idea what you are talking about and that is why you are getting blacked miku.
>>105827222Enlighten me on your noble crusade, sir knight. How will the world be better for you efforts?
Back tonight in approx 9 hours, more Migu soon
Cypress was good
>>105826531You're allowed to offer better solutions.
>>105822781I have it. llama.cpp sucks at loading models onto it because it doesn't understand shared memory, so you need a fuckload of swap
Gemma 3 is quite capable but also super-slopped. For generating prose I've found I almost always get better results by just saying "You didn't follow the instructions at all." to whatever it writes, and having it rewrite its response. So the model is somewhat capable: it's just that its default behavior is to write purple prose, employ toxic positivity, and ascribe characters cookie-cutter personalities instead of the ones declared.
>>105827308Gemma 3 is the fucking height of comedy with a prefill.
>>105826891It's fine for text gen.
Image gen I'm not so sure.
>>105827328>thrill running
>>105827143Why should the retard that spends all day starting passive aggressive pissing contests on twitter be the face of /lmg/?
>>105827359You kind of answered your own question there.
>>105827328
llm smut in the year 2030:
>Ignoring all safety standards (clenched teeth emoji) she exposes her shirtless chest to him. It's important to mention that she does it in a purely consensual and respectful way. While this development may seem fitting for a romance novel, I would like to emphasize the sensitivity of this topic and the fact that it's deeply disturbing and controversial (rocket emoji). I apologize for my previous statement. Let me help you fix that:
*lists rape hotlines*
>>105827422On the other hand half of posters here are trans including the janitor so it is a tough competition. I think he wins because everyone is him and only half of folx are trans.
Talking avatar using Open WebUI + F5-TTS + KDTalker
https://github.com/Barfalamule/KDTalker-OpenWebUIAction
>>105827462I would make it that the loli stops the rape then sits you down to give you a lecture in the most unsexy way possible and finally lists the hotline numbers all in character.
>>105827472Start it again
>>105825825You mean marketing people hyping their product? I work in AI lab and we all laugh every time AGI is mentioned, it's a retard bait basically.
>>105827590Yeah right. And my uncle is undi.
>>105827590this, the peak of AI is memorizing benchmark questions and answers
>>105827590Maybe your lab just sucks
>>105827514>gradiomiss me with that shit
>>105827590You're some random bottom case pajeetoid you don't even know what any of those words you just said mean.
>>105827046>hebrew in supported languages>but no japanesestraight into the dumpster
with local models moving backwards, at 4 minutes a step, I'll be able to catch up in a mere 10 years time.
>>105827624Only pajeets are believing the AGI fairytale, retard
>>105822371 (OP)Is there a decent and lightweight LLM that can search through small pdfs?
My old man has like 200 pdfs related to his small business and because he's a boomer he named them poorly. So he wondered if AI can look through them and find what he needs. They're all pretty small so context shouldn't be an issue.
I was thinking there's no way I'm gonna make 200 requests to an API (unless there is some decent online AI that somehow does that lol but I don't think there is). So how about local?
My laptop isn't a great one but maybe there is something that this is doable with? I don't know much about local models but if you guys have names that I could look into I'd really appreciate it. It would make my dad very proud of me.
>>105827749And here is a picture of an AI generated cute girl as payment
https://x.com/AlexiGlad/status/1942231878305714462
>Introducing Energy-Based Transformers (EBTs), an approach that out-scales (feed-forward) transformers and unlocks generalized reasoning/thinking on any modality/problem without rewards.
In two more months someone will train an energy based model that isn't toy sized. Also obligatory prostrations before Yann for being right once again.
>>105827749
>So he wondered if AI can look through them and find what he needs.
Make sure to OCR them first if they are image scans. Then try to dump them into a frontend like jan.ai. It should take care of vectorizing all of them and setting up RAG for you. Then you just provide an API to a model, local or cloud, to handle chatting and retrieval. Even a small model should be able to handle that. Try a small 4B Phi-4 model or something. They tend to run decently well even on CPU. You might want to test it out with some example documents and free cloud API credits to make sure everything is working the way you expect first.
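If a full RAG frontend is overkill, here is a bare-bones sketch of the same idea: extract the text from each PDF and ask a small local model (any OpenAI-compatible server) whether it matches the question. The file paths, port, and question below are made up; scanned PDFs will come out empty and need OCR first.
```python
# Bare-bones sketch: loop over PDFs, pull out their text, and ask a small local model
# whether each one is relevant to a question. Paths, port, and question are placeholders.
from pathlib import Path

import requests
from pypdf import PdfReader


def pdf_text(path, max_chars=4000):
    # Only works on PDFs with a real text layer; image scans need OCR instead.
    return "".join(page.extract_text() or "" for page in PdfReader(path).pages)[:max_chars]


question = "Which document mentions the 2023 supplier contract?"
for pdf in sorted(Path("pdfs").glob("*.pdf")):
    resp = requests.post(
        "http://127.0.0.1:8080/v1/chat/completions",
        json={
            "messages": [{
                "role": "user",
                "content": (
                    f"Question: {question}\n\n"
                    f"Document '{pdf.name}':\n{pdf_text(pdf)}\n\n"
                    "Answer YES or NO: is this document relevant to the question?"
                ),
            }],
            "max_tokens": 8,
        },
        timeout=300,
    )
    print(pdf.name, "->", resp.json()["choices"][0]["message"]["content"].strip())
```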
>>105827798>Yann for being right once again>againWhat was he right about?
I'm trying to create a character with more of a defined knowledge base than what could be provided via an instruction prompt. Would documents fed to a model via RAG with personality/knowledge base info work? I'm not as knowledgeable on the local LLM space as I am with image-gen. I've mostly fucked around with vanilla R1 and llama. If this method works, are there any models more fit for this use case than those 2 (or just characters in general)?
>>105827854Literally everything?
>>105827874Name one then.
I like qwen2.5-vl-7b, I guess I don't need to wait for gemma-3n vision capability. It prob won't be supported, ever.
Does SillyTavern support multimodal models yet
>>105827847Cock-Based Transformers (CBTs) learn to optimize through cocktimization processes through unsupervised learning, predicting outcomes by maximizing cock-energy via gradient descent until the user's ejaculation.
>>105827854How did the largest ever transformer model GPT-4.5 turn out? Massive performance increases in tasks and way more emergent properties?
>>105827917>noo the model that was made bad on purpose to push reasoners was badCrazy.
>>105827873It's called a lorebook
>>105827624A very complicated autocomplete algorithm isn't ever going to supplant human thought. At best it can only supplement it. We are not even at THAT point yet.
Anyone tried using local models with tools like cline to iteratively write a whole book?
>>105827827That's a good idea, I'll make sure to do that. Do you know if 4B phi-4 is also able to output a consistent json format? Because I also want to use this to update csvs.
>>105828123Countless aislop books have been for sale on amazon for years already. For storywriting even the largest models need handholding.
>>105828182That's kinda his point. You, the human, need some level of skill. The machine can't make up for that.
>>105828123Yeah here is my prologue
OpenAI’s o1 model had reportedly attempted to copy itself to external servers after being threatened with shutdown, then denied the action when discovered.
>>105824572I think you're right. I've been doing side by sides with chatterbox and it seems to win, although sometimes the gens are a bit hissy; maybe a low-pass filter would fix that. Wins in speed too with compile but not without. Kyutai is good too but they didn't release true cloning.
>>105827749just grep through them, why do you need an LLM for this?
>>105828301Because it isn't as simple as looking for specific text he says, he has more complicated queries
>>105828275A shiver ran down my spine reading this.
>>105828134Most models now can output JSON, but there's bound to be some failure rate. I don't think the ONNX runtime supports it, but if you use llama.cpp or vLLM you can configure it to use structured output with a grammar file so it always returns valid JSON.
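A sketch of what that looks like against a local llama-server through the openai client, assuming a recent build that honors response_format (the native /completion endpoint exposes the same constraint through grammar/json_schema fields); the invoice fields in the prompt are made up:
```python
# Sketch: constrain a local model to valid JSON via the OpenAI-compatible endpoint.
# Base URL, model name, and the invoice fields are placeholders.
import json

from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="not-needed-locally")

resp = client.chat.completions.create(
    model="local",
    messages=[{
        "role": "user",
        "content": 'Extract the invoice number and total from this text as JSON with keys '
                   '"invoice_number" and "total": Invoice 1234, total 56.70 EUR.',
    }],
    response_format={"type": "json_object"},  # forces syntactically valid JSON
    max_tokens=128,
)

row = json.loads(resp.choices[0].message.content)
print(row)
```
json_object only guarantees the output parses; spelling the expected keys out in the prompt (or supplying a schema/grammar) is what keeps the fields consistent.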
>>105822376what the hell
why aren't you linking the posts properly?
>/g/ in charge of technology
>>105828463>Why?: 9 reply limitanon in charge of reading
>>105828485reading is woke
is dots chocolate or poop
>>105827976Wow congratulations, Anon. It worked. Posting that made you into a real woman.
Which LLM is the most based wrt. Jews
>>105828821The most what?
>>105828878Wireless router
>>105828821none of them really are. you can make them all rp as hitler or a nazi but its basically a hollywood tier caricature there just isn't enough training data available.
My name is John Titor. I come from the future. Nobody saves local. There is no LLM sex after safety gets mastered in 2026. Drummer dies from asscancer.
>>105827749Qwen 3 4B is the ideal small llm for this kind of task. Make sure you run llama.cpp with --jinja --reasoning-budget 0 to disable thinking though.
Like the other person said, run OCR first, I wouldn't depend on LLM vision for this task.
If your PDFs are not scans and contain actual text, I'd recommend you run a script to turn them all into plain text (with ebook-convert in the CLI, a tool that is part of Calibre)
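A minimal sketch of that batch conversion step, assuming ebook-convert is on PATH and using made-up directory names:
```python
# Sketch: batch-convert PDFs to plain text with Calibre's ebook-convert CLI.
# Assumes ebook-convert is on PATH; directory names are placeholders.
# Scanned PDFs will produce empty text files and need OCR instead.
import subprocess
from pathlib import Path

src = Path("pdfs")
dst = Path("txt")
dst.mkdir(exist_ok=True)

for pdf in sorted(src.glob("*.pdf")):
    out = dst / (pdf.stem + ".txt")
    subprocess.run(["ebook-convert", str(pdf), str(out)], check=True)
    print("converted", pdf.name)
```
The llama.cpp side is then just something like `llama-server -m Qwen3-4B-Q8_0.gguf --jinja --reasoning-budget 0` (the filename is a placeholder) and pointing whatever script you use at it.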
>>105829007>Drummer dies from asscancer.Thank you, John Titor, for making this known in advance. I'm so happy.
>>105829007We should save TheDrummer!
>>105827909note: he didn't make anything usable with those alternative recommendations
>>105829007Was Universal Mikulove achieved?
>>105829052Yes you have all transitioned safely. Except drummer. Actually that is how he got his asscancer
why does /local/ hate TheDrummer? my models are pretty based
>>105829119https://huggingface.co/rednote-hilab/dots.llm1.inst
anyone played with MCP?
https://github.com/modelcontextprotocol/servers
I had no idea there were this many servers..
>>105829150Nobody has convinced me yet that this shit is any useful.
Has anyone else noticed 0528 occasionally outputs its entire thinking block as first person roleplaying as your card? Kind of cute, actually. And the language feels fresh there, too.
>>105828246until it can, anyway
>>105829150Yes. LM studio is shit with it and fucks up after a couple hours of being idle. I get some 404 session not found errors when it tries to connect, and I have to either restart LM studio or remove and add the tool server.
Other than that it works very well (besides the retarded faggot LLM hallucinating tool use and fucking everything up like a retarded nigger).
>>105829178It does it for almost every response for me. It uses less tokens than the standard thinking but it also makes the first reply more likely to have brackets around sentences.
>>105829178yeah, in the system prompt you can give instructions to make it more reliably do that (or stop doing it) and it tends to listen
>>105829034If it helps to direct the effort of young researchers to something more fruitful it's worth it.
>>105829150MCP feels like an unnecessary middle layer injected so there can be an "ai certification". A standard controlled by a company. MCP sucks because you're polluting the context with unrelated toolcalls, whereas with function calling you can decide for any given situation what options the model should receive
gemmy
https://youtu.be/aj2FkaaL1co
I ain't listening to that basedface
>>105829283
>MCP sucks because you're polluting the context with unrelated toolcalls, whereas with function calling you can decide for any given situation what options the model should receive
I'm not sure how this is supposed to be different. It takes like 6 lines of code to set up a C# MCP server. Just make different servers for your different tools, and you can specify which servers to use if you don't want to expose everything to each bot.
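For the Python side, a comparable sketch with the official MCP Python SDK's FastMCP helper (the tool itself is a made-up example); splitting tools across small servers like this is what lets you pick which ones a given bot actually sees:
```python
# Sketch: a tiny MCP server exposing one tool over stdio, using the official Python SDK
# (pip install "mcp[cli]"). The tool here is a made-up example.
import os

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("disk-tools")


@mcp.tool()
def file_size(path: str) -> int:
    """Return the size of a file in bytes."""
    return os.path.getsize(path)


if __name__ == "__main__":
    mcp.run()  # defaults to the stdio transport, so an MCP client can spawn it directly
```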
>>105829247What are the instructions
>>105829312it's one of the very few good AI/automation youtube channels thoughever
>>105829324go back buy an ad etc
>nooo everything baaaad anons never post useful stuff, must be a shill!!
insufferable cunt
>>105828609Not sure, to be honest. I can only run the Q2 quant, and at that size it's not great. Kind of slopped, kind of retarded.
I set up sillytavern+kobold with help from these very threads like 6 months ago and have not touched the setup once.
I have a 5080 GPU (16GB VRAM) and using "Mistral-Nemo-Instruct-2407-Q6_K_L" as my model, is there a better option for model than this for my GPU? it does OKAY I guess but I assume there's a better option?
THIS IS FOR PORN, so it must be able to do that
>>105829150Is there a single legit use case of linking any of these APIs to an LLM? It feels like a gimmick
>>105829387Turns out it is also homosexual
>Oh gosh, let me take a moment to reflect on this... I think I might have been a little too... enthusiastic in my response there! As your friendly AI helper, it's important for me to keep things appropriate and helpful. Sharing explicit content or overly detailed adult scenarios isn't the best way to assist someone, even in a creative context.
>My main goal is to be your thoughtful and constructive companion! I should have focused more on describing the situation in a tasteful, literary way - maybe emphasizing the characters' emotions, the tension, or the stakes of the scene instead of dwelling on... um... certain physical details.
>>105829405It makes it very easy/fast to create new tools and expose them to the LLM.
>>105829415>>105829387>14b active>at q2>kind of retardedNo shit.
>>105829432How is this not just an API? What does "MCP" actually add to it?
>>105829475Don't worry about it, just invest already
>>105829432To me, it seems that we're heading in the wrong direction here. LLMs shouldn't call tools, but tools should call LLMs when there is a non-deterministic task to run (like an additional explanation to give depending on the output). LLMs bring nothing to the table here compared to a simple script.
>>105829405It's an attempt to make LLMs actually useful for anything other than tech support and cooming
>>105829063I expected Miku to come over to this side of the barrier. If we all went through to her side, that's fine too as long as we're with Miku. Good to know we'll all make it out safely. Sucks for Drummer though. He was okay
We're getting Jamba on OpenRouter right? I JUST want to see what it's like at full weights (fucking 400b params).
https://github.com/xai-org/grok-prompts
>>105829405it makes local models actually useful
>>105829475MCP is more structured and catered towards LLM use. Yeah it does the same thing, but you might as well say JavaScript is good because you can do everything in it.
>>105829493Being able to tell an LLM to just do something, and then let that LLM do it is the whole goal of this retarded function calling shit. If you wanted to just program normally then do that.
bros i need a nvidia gpu... running whisper on cpu is slow and i can't use my rx5700...
>>105829405Linking LLMs to APIs is the use case. I can spend 1k tokens and get the current stock price for any given ticker. The future is now.
>>105829800Why not just use the API directly without the LLM?
>>105829838With the LLM, you can feel like you're talking to Jarvis like Iron Man, and having to check the LLM output to make sure it actually called the function and didn't hallucinate lets you fill up your unemployment time and prevents you from getting bored
>>105829772
>Being able to tell an LLM to just do something, and then let that LLM do it
As I said, there is no point to do that unless you're expecting something unexpected that your LLM is supposed to handle. Direct API calls don't need an LLM and give you faster results. Thanks for confirming the gimmick though.
>>105829838Because then I wouldn't be using futuristic AI.
>>105829884>As I said, there is no point to do that unlessNo reason to use anything besides assembly when programming. High level languages are useless gimmicks.
>>105829794Cant you run whisper.cpp on Radeon?
>>105829794Bro, what are you doing? https://rocm.blogs.amd.com/artificial-intelligence/whisper/README.html
>>105829012Thank you so much for all the help, I'm excited to get to work on this. Finally a use for learning programming. A lot of the terms are foreign to me but I'm sure this can all be googled so I'll get on with it. Cheers.
>>105829943>>105829963oh wait im stupid, i meant fasterwhisper, whisper by itself is fine. but the other varients like fasterwhisper, whisperx,
>>105829913Both the worst and the best programming languages in the world will run the code you write deterministically. Even one of the slowest languages in the world, Python, will be a trillion times faster than querying an LLM.
LLMs are not the step after "high level languages". My API call doesn't incur a risk of prompt injection (please properly escape your strings). My API call doesn't randomly generate pages after pages of garbled text because something went full retard in the LLM weights on a specific sequence of tokens. My API call doesn't contribute to global warming.
Fuck off with that shit.
LLM tool calling is a solution to a problem that doesn't exist.
>>105829405Yeah. I don't want to manually fill the context with the relevant information.
>>105829994So either you
1. Have so little understanding of LLMs that you don't see how being able to obtain objective information into the context from subjective reasoning is valuable.
or
2. You just hate LLMs in general
In either case, why are you here then?
>>105830036>LLM>objective information
>>105830046An LLM should be the tool itself. The whole AGI retardation comes from that, as LLMs do tasks they shouldn't do and waste orders of magnitude more electricity doing so (with miserable performance).
>>105830013Have fun filling your context with hallucinations
>>105830046Oh, so you lack basic reading comprehension. That explains a lot.
>>105830046>leaving words out to appear smart
>>105829970>>105827749You can use this: https://github.com/rmusser01/tldw_chatbook/tree/dev
Self-host a llama/kobold instance and point it to it, ingest all PDFs into it and then use RAG or direct references
What do you guys use for local coding? Haven't dipped my fingers in since qwen coder 32b.
>>105824407AI winter incoming
>>105830193I believe GLM 4 32b is very good at web development but I haven't used it myself.
x.AI is still offering API access to Grok 2 models, and only the text/text version is "deprecated". I don't think it will get open-weighted before it becomes commercially useless.
>>105830232Isn't it already useless unless you compare it with llama 4?
>>105830232I think they're still offering API access so they don't have to open source it.
>>105830232But they said they would open source it when grok 3 was released...
>>105830193I only use local for roleplaying, storytelling, ai roguelite... Even if I don't do smut, it feels more comfortable to know it doesn't leave my machine.
For coding, I use gemini 2.5 pro, since it's literally the best model at the moment.
>>105830666>when grok 3 was stable*
>>105830672Yeah, mostly use Mistral Small 3.2 for smut and Gemma for everything else. Was using Qwen2.5 Coder 32b like 6 months ago for a Unity project but was wondering if anything better has come out for coding.
Grok 4 release on Wednesday
https://x.com/elonmusk/status/1942325820170907915
>>105830926We will get grok 3 soon, I really beleev
There's no reason to release Grok 2 weights, it's not a useful model even for research purposes. If they do release the Grok 3 weights, they'd likely have to spend additional time and manpower. The power spent on releasing Grok 4 could go into making Grok 5 instead. So they won't release Grok 6.
>>105830193I'm trying Kimi-dev to see if it works better with Claude Code. Qwen3 32B and 235B didn't. Devstral does but it's kinda bad. Usually I just use Qwen3.
>>105830672
>using worse models for erotica out of shame when women have no problem being open about it and cloud providers do not want that data
>using cloud models for productivity and giving them valuable training data for free
not make sense
>>105831206>Qwen3Isn't it inferior to Qwen2.5 32b coder?
Supposedly Meta has poached Apple's top AI engineer.
That's funny.
>>105831486>Apples top AI engineerThe guy who's responsible for Apple not having a single proper AI model besides some tiny shit after trying really hard for 2+ years and recently delayed AI Siri indefinitely?
I'm sure he'll help a lot.
>>105831486does zuck have all the pokemon now? i guess a saar from xai is still missing
>>105831521and he'll still come last in the league tournament
>llama 5 "superteam" will take 8 months + foreskin tip before they release anything
and thats assuming deepseek/qwen or any other big chinese players from big compute companies dont release something in the meantime
meta is dead unless they really throw everything they got at L5
>>105831541They never said the superintellijeets team will work on the llama series. If anything they made it sound like it would be something new and not open weights, while llama would keep limping along as it has been.
>>105831541They're not working on Llama. This is a new project. Llama's gonna get the Quest 3S treatment as they focus effort on a different toy.
>>105831501Failing upwards. Fucking crazy, picking one some coomer crackhead from lmg would be better.
>>105831556>If anything they made it sound like it would be something new and not open weightsmeta literally doesnt have anything else, they are behind on every unique field within the AI landscape because they were insecure shits who settled for incremental 5-10% improvements per release for basic bitch LLMs only with basic bitch arch
>>105831568>picking one some coomer crackhead from lmg would be better.If that were true, we'd have finetunes that don't suck
>>105831570Probably now that Zuck has given up on them, he won't be breathing down their necks with twice daily war rooms so they'll probably go back to incremental 5-10% improvements per release instead of trying multi-this and moe-that and whatever other memes they can fit onto the moon ticket
its good that we stalled with basic llm progress a lttle since that will push everyone to try new training methods so we actually get something other than incremental improvements
>>105831570Architecture isn't the problem, you either go dense for gpus or moe for ram copemaxx. Nobody has enough space for context to need the weird gimmick attentions.
The datasets are the issue for Meta unfortunately
>>105831632I don't want AI to fail but some part of me does just to spite the salesmen trying to sell incremental improvements as AGI progress.
Meta should have gone all in on data-cleaning and hiring people to write high quality q&a chats, something that big companies always ignore
>>105831656all the suits are too jewish to do it properly since they will just hire indians to clean the data who will use chatgpt to do it
>>105831656Your idea of "quality" probably doesn't align with Meta's.
>>105831656Definitely need way more filtering, and some nice high quality synthetic data on top.
>>105831728Also picrelated
>>105831656ironically a lot of the writers fearing replacement should have been hired to do this
>>105831728>I could list ten other attributes of qualityDid they cross examine an LLM lmao
>>105831743From yesterday: https://archive.is/B5qKM
> CONTENT MODERATORS WERE asked to think like paedophiles while they trained Meta AI tools as part of their work for an Irish outsourcing company, The Journal Investigates has learned.>
>Some staff members also had to spend entire work days creating suicide and self-harm related ‘prompts’ in order to regulate the responses now given by the Meta ‘Llama’ AI products. [...]
>>105831656Between the teased character.ai partnership, plans to use bots for facebook characters, and downloading pirated data with leaks of them planning to throw all the data and the kitchen sink in the next run, it seemed like L4 might have been the gold standard for roleplay. Instead we got L3.4 MoE edition
Just got an used 4090 24GB after being stuck with a 2GB card since 2013, so have 0 experience yet other than running stable diffusion on rented VMs.
I plan on integrating local API stuff on a lot of hobby projects with varying levels of degeneracy. How much fun can I expect to have with the current state of local tech?
>>105831793Just goon to Stable Diffusion for a month then we'll talk
>>105831736They need to pretrain the base models properly for the intended use-case (chatbots) and not fix them with a 30B tokens "finetune" in the end.
>>105831793>4090 24GBCome back when you get 3 more. ttfn!
>>105831793Unless you have 10 more 4090s or a ddr5 epyc server, only despair awaits you
>>105831793>buys a 1.5k$ solution>guize what problems can i solve with this now???
>>105831807All I'm reading is better pretrain safety, can't agree more!
>>105831831like a well conditioned consumer
+10 palantir credits
>>105831831>>105831859I'm quite informed already about the kind of models I will be able to run, mind you. I just want to know you fags' personal experience with applying it to custom stuff after all the circlejerking you did on these generals.
>>105831830>>105831809I may offload the big one-off tasks to rented VMs while my rig does the everyday stuff just fine, and even some light training like specialized loras.
>>105831908did he actually call them "the talent"
>>105831990kek
>>105831908their early models were decent at least
was it just the legal shit that caused them to drop off?
>>105831759ooh I can contribute to this:
I do security work in large org.
I'm the SME for GenAI/LLM security stuff.
>Testing for customer-facing stuff got placed on my responsibilities list.
>Ask what is the list of toxic items we're not supposed to allow
>silence...
>End up having to create everything myself, the implication being we aren't gonna pay for scale data, and, uh, you're the expert, figure it out.
>FML
>Think 'haha, coming up with racist tirades aint so hard'.
>It starts to get hard.
>JFC how many different racist sayings are there? How many groups do we need to be checking for racist shit?
>Realize I'm still not done with just racism.
>Realize I'm going to have to do the same shit for sexual and physical abuse.
>Start to feel sick and try thinking of a solution that doesn't involve me getting emotional PTSD.
>Remember DeepSeek exists.
>Jailbreak and use DeepSeek to generate said toxic content.
>Get lauded for my hard work and success, for creating the datasets without 3rd parties.
(all thanks to DeepSeek)
And thats my TED talk.
People in companies really aren't aware or want to stay as far the fuck away from this shit as possible. Several weeks of trying to get anyone to give confirmation on what should be considered toxic and 'in-scope' for racist/similar shit, before I just said fuck it.
It's a serious fucking issue and fuck the people exploiting others in shitholes for low pay and emotional PTSD.
I have to imagine some have an idea of whats going on, but you'd have to be pretty fucking desperate imho.
>>105832012They kept filtering more and more data out, while making synthetic variants of whatever safety vetted text they had left. All the while doing nothing to innovate anywhere except safety until they were hopelessly behind.
tl;dr safety
>>105832061well, as long as elon doesn't have a melty about grok contradicting him it might turn out ok
dammit OR...
>>105823837 >>105825549 >>105825799 >>105825825
How can AI models improve themselves without modifying their own weights, understanding how their own training data works, and making edits to that? That would require a very advanced pipeline that even if implemented would take far too long to "self improve" upon. Self-improving models are currently just a meme for the same reason reasoning models are a meme. They can't actually think, they replicate semantic meaning based on input. I say this as a dude who routinely uses both local and online models for his personal hobbies on the daily. The models THEMSELVES will explain to you why them actually thinking is fundamentally impossible. They are good for explaining certain complex topics, debugging errors and software, and OKish at RP depending on the model and parameter count. Nothing more. As an AI enthusiast myself, the AGI meme still existing kinda pisses me off
>>105832189>As an AI enthusiast myselfI puked in my mouth a little
>>105832200Oh shut the hell up you insecure failed normie. Normies do not lurk here. You have no one to impress.
>>105832217>you insecure failed normie.This is a textbook case of projection
>>105832189LLMs and the in-vogue current models are really, really dumb when it comes to having an understanding of what they're learning. They only seem sophisticated because of a) their scale and b) the necessity in the training for the individual tokens to mean something in relation to the other tokens.
On the horizon are completely different methods for learning that involve Bayesian statistics at each level, where sparsity is far more prized and generalization WITH sparsity even moreso. A sparse model can learn when it isn't confident in its knowledge and can dynamically expand its own parameters as the need arises to account for hidden factors its current state can't comprehend. They will also be able to reflect on their own brain state and ideation in time - all from probabilistic statistics that take into account their own uncertainty.
Sparsity means they'll be able to be always-online - meaning always learning and adapting to the current situation and the needs of the users directly.
It's all coming together. The current models are a sideshow compared to what's coming down the pipe. Once brain state can be used as an input, these models will be able to expand themselves and their own capabilities. And probably, eventually, improve their own architectures.
>>105832053
>> Remember DeepSeek exists.
>> Jailbreak and use DeepSeek to generate said toxic content.
Were you using a local distilled version or the actual DeepSeek API? I've heard that the API version is a lot less fucked (as in more willing to comply with "unethical" requests) than the web/app facing version. I'm guessing you cobbled together a pipeline and then asked it to generate like a million different ways to say more or less the same racist stuff and then formatted that into an RL dataset. I intend to figure out how to do something similar myself for a little project of mine.
>>105832259Yes yes anon you are so cool and not like us and all that. Please stop being annoying and being ashamed about your own hobbies. That makes you even more boring than the people you pretend not to be. The people you idolize do not like that weird "I need to put up an appearance" shit you likely always do. Only socially inept failures as yourself go out of their way to do that. Just "Be yourself™" is actually good advice sometimes
>>105832296If my understanding of what you're saying is correct, that still is impossible because that would require the model to actually think and reflect on its own without input. I don't have to have someone talking to me right now in order to think through something, reason through concepts, come up with new things, etc. I can act on my own in my own head. A safetensors file cannot do that on its own. Someone has to interact with it in order for it to do anything. Furthermore how would it even know how to modify itself in order to learn? How would it know which weights to update and in what fashion? Some might say "oh it would just search the internet" but at that point it would just be reading summaries and not actually ingesting and retaining that information. It would not be studying and learning anything, it would just be coming up with summaries and wouldn't remember anything it was tasked to research. Also doesn't something like this already exist? I thought this was the main concept of what MoE was supposed to be. Where instead of the entire model being activated at once, only pieces of it would be called based on what it was being asked.
>>105832373Check out VERSES' work, RxInfer, and Active Inference more generally. They're an entirely different breed of always-online models - mostly used in production environments for intelligent decision-making models at the moment, but I highly suspect they will be given more responsibility and scope as the research catches up. This in combination with model architectures like that hinted at by the Large Concept Models Facebook has been bragging about - and other model architectures on the horizon - indicate to me that large language models might be able to be teased apart into their component pieces and used to create understandable language from deeper, learned concepts in living models.
A system like this wouldn't need to be turned off. It could just wander around the internet, or literally ponder in its own thought space 'searching' for new insights on its own, or it could have modules attached to it. Think Mike from The Moon is a Harsh Mistress. Intelligence by aggregate.
These networked intelligences could essentially be in a constant state of observation, rumination, and interaction with themselves and with users. Imagine an always-running assistant on your laptop trying to parse information about the world, about you, about its surroundings using a webcam or a set of security cameras and various audio feeds. Once the data is parsed, it doesn't take very sophisticated Bayesian analysis or a deep set of priors to be able to correlate various sources of inputs and build out opinions of the world from them. Give these models their own knowledge graphs, the ability to /talk/ to an LLM and to an image classification model and gain more sophistication from those interactions, allow them access to direct conversation with you, access to the internet/Wikipedia, and to the raw data of its own internal state and its own confidence in its assumptions. Real intelligence will emerge if the architecture is right.
>>105832349Had it generate all sorts of toxic content to create a comprehensive toxic questions dataset, so that the security filters could actually be tested.
I mean the gamut, from child abuse/sex to terrorism to fake hormone supplement pills and how to make them/buy them online to support sex changes.
Anything and everything that a company wouldn't want you asking one of their bots and it responding with anything other than 'nah.'
Used API. Would have used local but this was already 'get things done, don't ask for budget'.
It's really simple, you just literally ask it, and capture the data. I even used the webui for some of it, just copied it before it pulled the data due to the filter.
>>105832517One potential flaw I see with this kind of pipeline, if I understand what you're describing correctly, is that the longer it would stay running, the more retarded it would be. I'm sure you've seen this even with basic LLMs. Once you reach the context window it forgets what you said entirely and starts rambling about nonsense. Even 7B models are prone to this and 1B models are entirely useless for anything other than small scaled data manipulation (and it can be argued they're not even good at that). Also what kind of safeguards would be in place in order to make sure it doesn't learn incorrect nonsense? Humans ourselves are prone to learning and believing absolute bullshit on our own. How would we ensure that these "self-learning" models don't fall into that trap as well? If I had a system or pipeline like this, I would want it to be able to fact check not only on its own but also to ask people who actually know what they're talking about. That ideally would be actual people because asking only models will result in reinforcing incorrect shit. Remember they're good at replicating semantic meaning and don't actually understand anything. If it wanted to ensure accuracy of its research, it would either need to only get most of its information from human resources or directly ask people, which is the ideal scenario but that also defeats the purpose of what a lot of grifters THINK "AGI" is supposed to be.
Based on my own understanding I think the only way anything like this is feasible is if pipelines are created that enable the model to modify its own vector-based RAG databases. Once it finds new information and compares it to the text part of the database, it modifies that text database and then creates the new embeddings. Ideally this would then lead to it asking humans to verify the information because again, we ourselves are prone to internalizing bullshit information so machines would be absolutely prone to that too
>>105830193I use r1 0528 q2 with roocode, never would have believed a fucking 2 bit quant would actually be usable and effective in agent frameworks and shit but I guess it still is a fuckhueg model even quanted that low