/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads:
>>105757131 & >>105750356

►News
>(07/01) Huawei Pangu Pro 72B-A16B released: https://gitcode.com/ascend-tribe/pangu-pro-moe-model
>(06/29) ERNIE 4.5 released: https://ernie.baidu.com/blog/posts/ernie4.5
>(06/27) VSCode Copilot Chat is now open source: https://github.com/microsoft/vscode-copilot-chat
>(06/27) Hunyuan-A13B released: https://hf.co/tencent/Hunyuan-A13B-Instruct
>(06/26) Gemma 3n released: https://developers.googleblog.com/en/introducing-gemma-3n-developer-guide

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>105757131

--Paper: Libra: Synergizing CUDA and Tensor Cores for High-Performance Sparse Matrix Multiplication:
>105761808 >105761966 >105762753 >105763009
--Paper: Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity:
>105768845 >105768877 >105768901 >105768884 >105768906 >105768933 >105769034
--Meta's AI talent acquisition and open model skepticism amidst legal and data curation challenges:
>105758293 >105758397 >105758388 >105758810 >105758467 >105758482 >105766325 >105758818 >105758901 >105758926 >105758942
--Hunyuan-A13B GGUF port requires custom llama.cpp build for flash attention support:
>105768115 >105768164 >105768455
--Frustration over delayed OpenAI model and skepticism toward benchmarks and strategy:
>105766029 >105766042 >105768619 >105768677 >105768693 >105768837 >105768876 >105769053 >105768798 >105768934
--Critique of Hunyuan and Ernie models for over-reliance on Mills & Boon-style erotic prose in outputs:
>105758427 >105758629 >105758645 >105758674 >105764901 >105765054 >105765118 >105765228 >105765275 >105765472 >105765503 >105765747 >105767085 >105767501 >105766545 >105766794 >105768886 >105758694
--NVIDIA's Mistral-Nemotron open reasoning model sparks confusion and skepticism among anons:
>105766864 >105766975 >105767094 >105767167
--Discussion on NVIDIA ending driver support for older Pascal, Maxwell, and Volta GPUs:
>105764483 >105764512 >105766267
--Fish Audio S1 Mini and 4B text-to-speech model voice cloning results shared:
>105760876 >105760929
--Official OpenAI podcast episode discussing ChatGPT and AI assistant development:
>105766509
--Hunyuan A13B IQ4 chat completion issues on llama.cpp?? frustration:
>105760696 >105760773
--Meta court win legitimizes fair use for LLM training in the U.S.:
>105766199
--Miku (free space):
>105765500 >105766204

►Recent Highlight Posts from the Previous Thread: >>105757140
Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>105769843
>--Hunyuan A13B IQ4 chat completion issues on llama.cpp?? frustration:
Turn the temp down holy fuck
temperature was a mistake
>>105769897Yeah it's way too fucking hot right now.
>>105769843Thank you Recap Teto
70Bros what's fotm slop finetune?
>>105769897Temp + Top-P is all you need.
>>105769965Temperature: 0.6
Top K: 100
Top P: 0.95
Top nsigma: 1
can't believe there are still people using other LLMs when R1 is this simple to set up.
hi what's the best model for 8gb vram these days
i'm primarily interested in things like tool use and agentic behavior
Bait
>>105770034Qwen MoE probably. The 30B.
>>105770041not bait :(
>>105770065thanks, how many t/s should i expect?
>>105770068Will mostly depend on your RAM, since you want to offload the expert tensors to the CPU backend using the --override-tensor (-ot) parameter.
I'd say between 10 and 15 t/s?
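A minimal example of what that looks like with llama-server (the model filename, context size and thread count are just placeholders, adjust the -ot pattern to whatever actually fits your VRAM):
llama-server \
  --model Qwen3-30B-A3B-Q4_K_M.gguf \
  --n-gpu-layers 99 \
  --override-tensor exps=CPU \
  --ctx-size 8192 \
  --threads 8 \
  -fa
Attention and shared tensors stay on the GPU, only the per-expert FFN weights run on the CPU, which is why RAM speed ends up being the bottleneck.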
>>105770076 16gb of ram unfortunately, haven't gotten around to upgrading it yet
>>105770068>not bait :(You know 8gb is not enough right?
>>105770097Well, shit.
RIP I guess.
Try the q4ks quant with low topk and pray for the best I suppose.
>>105770072will keep that in mind
>>105770125i didn't purpose build this pc for running llms, it's just a gaming pc that i'm hoping to repurpose
>>105770144thanks, will do
last model i used was mistral-7b and it was honestly not up to snuff
>>105770186what specific graphics card do you have, and also how much regular RAM do you have? CPUmaxxing might be an option
>>105770216rtx 2060 super
16gb of regular ram, some 10th gen i7
that 16gb is going to limit your maximum context
>>105769946It's ggml-large-v3.bin from https://huggingface.co/ggerganov/whisper.cpp/tree/main
Reminder that ROCM sucks so much that it's ALWAYS better to fit more layers in VRAM and use -nkvo (--usecublas lowvram in kobold).
>>105770389>it's ALWAYS better to fit more layers in VRAMisn't that generally the case?
>>105768845
>[I]t is commonly observed that some experts are activated far more often than others, leading to system inefficiency when running the experts on different devices in parallel. Existing heuristics for balancing the expert workload can alleviate but not eliminate the problem. Therefore, we introduce Mixture of Grouped Experts (MoGE), which groups the experts during selection and balances the expert workload better than MoE in nature. It constrains tokens to activate an equal number of experts within each predefined expert group. When a model execution is distributed on multiple devices, which is necessary for models with tens of billions of parameters, this architectural design ensures a balanced computational load across devices, significantly enhancing throughput, particularly for the inference phase.
Why don't their speed benchmarks compare Pangu Pro 72B A16B to other MoEs?
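For intuition, here's a rough numpy sketch of the routing constraint described there (group count and per-group k are made-up numbers, not Pangu's actual config):
import numpy as np

def moge_route(router_logits, n_groups=8, k_per_group=1):
    # pick an equal number of experts from every predefined group for one token
    n_experts = router_logits.shape[0]
    group_size = n_experts // n_groups
    chosen = []
    for g in range(n_groups):
        group_scores = router_logits[g * group_size:(g + 1) * group_size]
        best = np.argsort(group_scores)[-k_per_group:]  # top-k within this group only
        chosen.extend(g * group_size + best)
    return np.array(chosen)
Since every group contributes the same number of active experts, mapping one group per device gives each device the same amount of work, which is the load-balancing claim in the abstract.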
>>105770409I'm assuming it's not because nvidia people here keep talking about having memory for (x)k context and that's not an issue if you just put it in RAM.
>>105770488And why were those non-matching batch sizes chosen for inference benchmarks?
>>105769948sloptune roundup for smut:
[i dunno i like em]
sophosympatheia_StrawberryLemonade-L3-70B-v1.0-Q4_K_M.
drummer anubis Shimmer-70B-v1c-Q4_K_M
[dark fantasy model]
CrucibleLab_L3.3-Dark-Ages-70b-v0.1-Q4_K_M
[claude logs]
L3.3-70B-Magnum-Diamond-Q4_K_M
[for anyone who has 30 gb vram and is sick of 32b, this is a great model that is almost like 70b]
TheDrummer_Valkyrie-49B-v1-Q5_K_L
[funny name]
Broken-Tutu-24B.Q8_0
I am using a low quant of Magistral Small for my Roman Empire slavery themed smut and this is already one of the best, tightest writing models I've run on my 3060.
Really thinking seriously about just getting a 3090 at this point
>check public rp datasets
>almost every system prompt has "avoid repetition"
>the logs are repetitive
I wonder how this will damage future models
WHAT le fug is wrong with ik_llama???
It is not using the GPU for prompt processing at all while pushing CPU to do it
using ubergarm's quant and their retarded command line
>>105770575This?
https://chub.ai/characters/handwrought.route/roman-rites-of-passage-5c70d58ab3bf
I was very surprised how much it knew about actual history. Like any anime shit, probably a lost cause, but it's got that Wikipedia+ knowledge.
>>105770658I'm assuming it's intended that you put the experts and the context on GPU and the rest on RAM. Are you doing that?
>>105770658>windowsfound your problem
>>105770658If you have part of the model on cpu, the gpu will idle most of the time waiting for the cpu to do its bit. What are you trying to run?
>>105770678 (You)
blind cow
>>105770671latest commit, installed yesterday
CUDA_VISIBLE_DEVICES="0," \
"$HOME/LLAMA_CPP/$commit/ik_llama.cpp/build/bin/llama-server" \
--model "$model" $model_parameters \
--numa isolate \
--n-gpu-layers 99 \
-b 8192 \
-ub 8192 \
--override-tensor exps=CPU \
--parallel 1 \
--ctx-size 32768 \
-ctk f16 \
-ctv f16 \
-rtr \
-mla 2 \
-fa \
-amb 1024 \
-fmoe \
--threads 16 \
--host 0.0.0.0 \
--port 8080
>>105770658>ik_llamalol
lmao even
>>105770681>If you have part of the model on cpu,I'm talking about PROMPT PROCESSING.
With Gerganov's llama, GPU is pushed to 100% though
>>105770658Have you tried not running your context on CPU for some retarded reason?
>>105770663I write my own bc im a huge rome nerd but this one is good too. A lot of the loredump in that card is redundant though, models generally know that shit out of the box because its in pretty much every dataset. They will also generally allow you to do whatever you want to the slaves in that context because its actual history I guess.
Anyway, cant wait for magistral finetunes
>>105770698this unironically
https://www.tiktok.com/@mooseatl_dj/video/7509908926972857630
local lost
>>105769843
>--Meta court win legitimizes fair use for LLM training in the U.S.:
what anon said is not true
the judge said that it is not fair use if the generated text competes in any way with the text used for training
>>105770697Not about your problem, but does your CPU actually have 16 physical cores?
Llama 4 thinking is going to be crazy...
>>105770714>>105770697>not on CPUas you can see I'm not specifying --no-kv-offload for kv-cache or else.
VRAM is filled up to 20 GB
>>105770719I'm not clicking that.
>>105770741I would a Chang
>>105770741They're not on the Llama team anon.
>>105770731How can you even say it does or doesn't compete?
>>105770741Is Zuck spending 10s of millions to be told "train on unfiltered data"?
>>105770741This is the moment Meta goes closed-source, you won't get any high quality models.
>>105770760Wrong illions anon.
>>105770737
>does your CPU actually have 16 physical cores
I tried with just 8 physical cores => still slower than gg's llama
this set of params where I explicitly isolate cores 0-7 is just as slow (pp 12 t/s, tg 2.3 t/s)
CUDA_VISIBLE_DEVICES="0," \
numactl --physcpubind=0-7 --membind=0 \
"$HOME/LLAMA_CPP/$commit/ik_llama.cpp/build/bin/llama-server" \
--model "$model" $model_parameters \
--n-gpu-layers 99 \
-b 8192 \
-ub 8192 \
--override-tensor exps=CPU \
--parallel 1 \
--ctx-size 32768 \
-ctk f16 \
-ctv f16 \
-rtr \
-mla 2 \
-fa \
-amb 1024 \
-fmoe \
--threads 8 \
--host 0.0.0.0 \
--port 8080
>>105770774Well, it's a retarded meme fork.
>>105770697Did you build it with DGGML_CUDA_IQK_FORCE_BF16=1 like mentioned here https://github.com/ikawrakow/ik_llama.cpp/discussions/477 ?
>>105770741That 1 pajeet basically needs all of those asians to fix all of the shit he's going to ruin and so that leaves 3 white guys scrambling to get it all done.
>>105770697>>105770793cmake -B build -DGGML_CUDA=ON -DGGML_SCHED_MAX_COPIES=1 -DGGML_CUDA_IQK_FORCE_BF16=1
to be exact
>>105770658Stole this from
>>105593780 ./llama-server --model /mnt/storage/IK_R1_0528_IQ3_K_R4/DeepSeek-R1-0528-IQ3_K_R4-00001-of-00007.gguf --n-gpu-layers 99 -b 8192 -ub 8192 -ot "blk.[0-9].ffn_up_exps=CUDA0,blk.[0-9].ffn_gate_exps=CUDA0" -ot "blk.1[0-9].ffn_up_exps=CUDA1,blk.1[0-9].ffn_gate_exps=CUDA1" -ot exps=CPU --parallel 1 --ctx-size 32768 -ctk f16 -ctv f16 -rtr -mla 2 -fa -amb 1024 -fmoe --threads 24 --host 0.0.0.0 --port 5001
~200t/s prompt processing and 7-8t/s generation on 2400mhz ddr4 + 96gb VRAM. Using ik_llamacpp and the ubergarm quants.
>>105770793>DGGML_CUDA_IQK_FORCE_BF16=1Gonna re-compile now as suggested, and then report
>>105770804>>105770812thanks
I set -DBUILD_SHARED_LIBS=OFF because shared libs went missing. I hope it's OK (works with gg's llama though)
>>105770759mostly that you cannot use an llm to write the same media that was fed into it, but that would need to be further defined (i only read the final sentence part, not the full text), bc this court ruling didn't focus on that properly, the judge basically stated that meta won bc the other guys' lawyers went full retard and didn't fight the compete point of the fair use at all, they were focusing on other shit, so meh
in any case, this creates bad jurisprudence for llms, even if meta won, but the usual legal fud from tech is spreading instead of what actually happened
and i always found it funny how the foss world buys and spreads the legal fud of the corporations
>>105770768Their super intelligence models are going to be API only. They'll probably leave Llama going as open source scraps with their B team. Llama will be the Gemma to Meta's Gemini.
>>105770940Gemma is at least somewhat decent, so please don't compare it to Llama.
merge that chink hunhunyuan shit already, i'm not gonna quant that myself
>nemo shills
>qwq shills
>gemma shills
>mistral shills
It's all crap...
======PSA NVIDIA FUCKED UP THEIR DRIVERS AGAIN======
minor wan2.1 image to video performance regression coming from 570.133.07 with cuda 12.6 to 570.86.10 (with cuda 12.8 and 12.6)
I tried 570.86.10 with cuda 12.6, the performance regression was still the same. Additionally I tried different sageattn versions (2++ and the one before 2++)
reverted back to 560.35.03 with cuda 12.6 for good measure and the performance issue was fixed
picrel is same workflow with same venv. the speeds on 560.35.03 match my memory of how fast i genned on 570.133.07
t. on debian 12 with an RTX 3060 12GB
When's the last time we actually got a significant upgrade in terms of models that run on consumer hardware? Is there even one to look forward to?
>>105771000https://youtu.be/OF_5EKNX0Eg?t=7
>>105771000Greta will be like
Why aren't you a werewolf in your RPs anon?
>>105771034deepseek, regardless if you can run it on consumer hardware or not.
>>105771034sadly not much has happened in the consumer segment at around 7-12b
even the high-end consumer segment at 24-32b hasn't moved forward much despite all the releases
it is looking very dire for true local models
Nick_DungeonAI:
>https://www.reddit.com/r/SillyTavernAI/comments/1lpdooa/how_can_we_help_open_source_ai_role_play_be/
>>105771117Depressing to see the bad guy win. Thats how it is I guess.
>>105770741Do you have to have a stupid name to be a top AI researcher
there's a 235b tune out if anyone has a rig capable of it
https://huggingface.co/Aurore-Reveil/Austral-Qwen3-235B
>>105771391>Trained with a collection of normal Austral(Books, RP Logs, LNs, etc) datasetsLiterally who.
>>105770804Same lame shit 2bh with the same underwhelming GPU usage. CPU cores are not used up to 100% either
I wish language models wouldn't always assume that femdom automatically means pegging/anal penetration
even big proprietary models do it so APIs are no escape
> ‘Missionaries Will Beat Mercenaries’
https://www.wired.com/story/sam-altman-meta-ai-talent-poaching-spree-leaked-messages/
Sam is seething lmao
>>105771524Just another sign of female centric literature dumped into those models.
>>105771524My mesugakis never tried to fuck me in the ass.
>>105771568it's not always actual pegging, sometimes just fingering
but they always go straight for _some_ form of anal play when a story is FD
>>105771568Yes because normal grown up women will never sex you.
>>105771702Hey I understood that reference.
Actually I didn't.
>>105771762I think it's supposed to be a metaphor for Germany.
>>105771524sounds like a prompting issue man. Of course if you type in 'be femdom' that shit's gonna come up all the time. That's not the language model's fault, that's just.... what reality is. Like google femdom jesus.
I'm never pegged in my roleplay because I don't prompt like an idiot. You literally don't even need to use the word femdom ever. Femdom is a broad category of fetishes.
>>105771842>Femdom is a broad category of fetishes.If femdom is a broad category of fetishes, but LLMs think it just means pegging/anal play, that would seem to vindicate that other anon's complaint about them.
>>105771615what a loss for the MANkind
kek, wtf.
this might actually be the new openai opensource model.
>>105771961Just to be clear, empty, I didnt ask a question.
>>105771869Prompting issue. Its selecting the most likely response. A few sentences like "I like cucking, foot worship, and female lead relationships/TPE" would fix.
And you know what the best part is? You dont even have to write it. Just ask the ai to make a femdom system prompt with a broad array of femdom related fetishes and to focus on variety.
I feel like nice llm's sometimes 'entertain you' by being creative enough- and people get addicted to being surprised and delighted by that novelty. And that's a fun part of llm's. But if you type in how you actually want things to go, ai can also bring to life a truly unique or hyper specific idea that AI would never spit back at you from a generic prompt. For example, a mistress that will never peg you and finds it disgusting that you want that. Boom, better than anything ai will ever write when prompted for "this character but femdom me, ah ah mistress"
>>105771537Stingy jew wanted to be the only one to get rich from the OpenAI scam. Of course his employees will jump ship to whoever offers them a bag of cash.
>He added that “Missionaries will beat mercenaries” and noted that OpenAI is assessing compensation for the entire research organization. “I believe there is much, much more upside to OpenAl stock than Meta stock,” he wrote.
the value of stock they don't have is zero, so yeah maybe work on that lol
>>105772101You can have stock in a company when it is still private, that is what Sam is referring to. The main issue is that right now, Zuck can outspend Sam for getting his super team hence why the majority of the people leaving were from OpenAI. He has his points about it possibly not working out but at the end of the day, it's pretty sour grapes hence why it is mentioned he is trying to fix that.
>>105772146You can, but he's been stingy which is why he's leaking people.
>>105770034You could try the new gemma-3n.
Damn. Even stammering shy lolis can't help themselves. What the fuck. Mistral 3.2.
Also shivering etc. You goys tricked me again.
>>105771391It didn't seem very coherent when I tried it.
>>105769835 (OP)Will Huawei Pangu save local???
why is every model the exact same shit
surely they could do some interesting experimental schizo shit like having several small neural networks simulating emotions or something
>>105772412They can't get any of that shit to work. The future is LLMs, RAG, and most of all RAG with other LLMs. It will take another 20+ years before we have a breakthrough like you're describing. Maybe longer. Because what makes money is jeet-tier coding bots, not bots that have feelings.
>>105771980>cucking, foot worship, and female lead relationshipsThe unholy trinity of people that should be lynched.
https://github.com/THUDM/GLM-4.1V-Thinking
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
https://arxiv.org/abs/2507.01006
>We present GLM-4.1V-Thinking, a vision-language model (VLM) designed to advance general-purpose multimodal reasoning. In this report, we share our key findings in the development of the reasoning-centric training framework. We first develop a capable vision foundation model with significant potential through large-scale pre-training, which arguably sets the upper bound for the final performance. Reinforcement Learning with Curriculum Sampling (RLCS) then unlocks the full potential of the model, leading to comprehensive capability enhancement across a diverse range of tasks, including STEM problem solving, video understanding, content recognition, coding, grounding, GUI-based agents, and long document understanding, among others. To facilitate research in this field, we open-source GLM-4.1V-9B-Thinking, which achieves state-of-the-art performance among models of comparable size. In a comprehensive evaluation across 28 public benchmarks, our model outperforms Qwen2.5-VL-7B on nearly all tasks and achieves comparable or even superior performance on 18 benchmarks relative to the significantly larger Qwen2.5-VL-72B. Notably, GLM-4.1V-9B-Thinking also demonstrates competitive or superior performance compared to closed-source models such as GPT-4o on challenging tasks including long document understanding and STEM reasoning, further underscoring its strong capabilities.>>105772556very cool
>>105772556>https://huggingface.co/THUDM/GLM-4.1V-9B-Thinking>404It's over
>>105772636https://huggingface.co/spaces/THUDM/GLM-4.1V-9B-Thinking-API-Demo
they only have the demo up it seems
https://huggingface.co/THUDM/models
havent posted it (though they say they will)
https://huggingface.co/THUDM/GLM-4.1V-9B-Thinking
wait it's live for me. weird wasn't showing up on the recent models when I checked.
>updated about 1 hour ago
bizarre well w/e
>>105772756They probably just set the repo from private to public just now
the deepseek distills are good enough for me. they run on my laptop, though a bit slow.
has any company released better models than that and Nemo for local general usage?
>>105772534>>105772539>twitter filename Go back faggot
ERNIE-4.5-0.3B. 269mb.
This is getting weird. How is this coherent enough to make a working html website. Including hovering effects etc.
>>105772836imagine if this was bitnet and you could run it at 1.58 bit precision
https://x.com/Tu7uruu/status/1940015995118059958
https://github.com/huggingface/huggingface-gemma-recipes
>>105772836Does it know who is Miku?
>>105773059Why did you feel that the first link was necessary?
has llama.cpp implemented any of the new chinese models from the last couple weeks yet? or are they stuck in PR hell
>>105772284this thread is parasited by mistral astroturfing the same way hdg is parasited by nai
So if mistral is bad then what is good at the same parameter size roughly?
>>105772799>has any company released better models than thatyes, the original models
if you're using a deepshit qwen distill, try the original qwen model, it's actually better in real world use, unless your real world use is doing benchmarks
>>105773524For ERP or in general?
>>105773524Nothing, give up.
>>105773584Give up on what you concern-trolling nigger?
jus put eyedrops in cause staring at puter too long
>>105773493It's definitely not as annoying as the Drummer astroturfing. Because, you know, Mistral provides the models, Drummer parasitizes them.
>>105773656just remember to blink
it shouldn't have to be said but some of you niggers might even forget how to breathe
>>105773723Wow, rude! *Please* don't call me a nigger.
>>105773729>*Please*LLM hands wrote this
>>105773745LLMs were trained from a curated dataset of only the best prose. *My* prose.
What if you merge devstral, magistral and small 3.2?
>>105771869Sounds like the prompt issue troll has a prompt issue when posting. Weird...
>>105773493Its painful anon.
>>105773672I don't know what it is with all those recent finetunes. (didn't try 3.2 ones yet though)
It seems like they make the writing worse and more slopped up now instead of the reverse.
It's a weird mix of gpt/claude and a hint of r1.
That's probably exactly what they use.
>>105772534>>105772539Mikutroon faggot. Die.
>>105773814I'm not sure what you're referring about exactly, but MS3.0 and MS3.1 were not that great in terms of prose and felt autistic. MS3.0 introduced the "I cannot and will not" refusals that we've seen elsewhere too, although 3.1 toned them down. MS3.2 has a different prose style and it seems better for RP, but it still overall feels lazy and uncreative compared to Gemma 3 (which has another set of issues, though). Magistral is their RL/reasoning finetune and I didn't like it (it has looping issues as well), although it seems to share the same slop source as MS3.2. I haven't tried Devstral at all.
Now watch the drummer shills trying to shit up the thread... pathetic.
>>105773880Mistral was sea otters
Gemma was cannot will not
>>105773899Mistral Small 3.X occasionally cannot and will not too (I recently used it for synthetic data generation and it was annoying for certain request types). I wonder what's the source of this type of refusals; I refuse to believe they independently came up with that.
>>105773566Well the context of the thread's discussion around nemo is erp, so erp.
There will be no agi in the next 20 years at least.
You are stuck on vramlet cards forever.
There will be no significant improvements in model architectures, so you have to use dumb models for eternity.
There is no hope.
How does it feel?
>>105771352Yeah, fuck that guy for giving us useful information.
Finally, a good benchmark : human experts rating model answers.
https://allenai.org/blog/sciarena
Unsurprisingly, mistral is rated as dogshit
Mistral medium even does worse than small, real lol, lmao even
>>105774179>SciArena: A New Platform for Evaluating Foundation Models in Scientific Literature TasksThis certainly will be useful for RP/ERP.
>>105774179>lmarena but the retards doing the evaluation happen to have a degree in some field
>>105774206 (me)
The general trend seems that models that are large and/or trained with a focus on Math/STEM are getting higher scores.
>>105774179Looks like Qwen's STEMmaxxing wasn't just for show.
>>105774227The article had a link to the voting, I entered a question related to my field of study and voted.
However, at no point was it asserted that I am actually an expert, I didn't even need an account.
So either they let unqualified people vote or they just collect data from random people without making it clear that it doesn't affect the ratings.
>>105774227we need to propose coomer council evaluation
>>105774302>However, at no point was it asserted that I am actually an expertread the paper
the current data was only contributed by actual experts, it wasn't available for anyone and their dog to vote
I don't get what they intend with the current public leaderboard though
btw
>As shown in Table 3, SciArena-Eval presents significant challenges for model-based evaluators. Even the best-performing model, o3, achieves only 65.1% accuracy. Lower-performing models, such as Gemini-2.5-Flash-Preview and Llama-4-series models, perform only slightly better than random guessing. Notably, similar pairwise evaluation protocols have shown strong alignment with human judgments (i.e., exceeding 70% correlation) on general-purpose benchmarks like AlpacaEval [34] and WildChat
kek llama
>>105774179o3 is that good? damn I must test it out more
https://helpingai.co/benchmark
Wow! This incredible model thinks like a brilliant! Blows away every competition!
>>105774495>think like a brilliant>act like a psychopathicWe sawed the seed!
>>105774505Are you mocking me?
>>105774495>Bilingual Reasoning Capabilities: Native support for English and Hindi with natural code-switching between languages.>Qwen/Qwen3-14B-Basethis is truly the weapon of bharat, perfect for generating gorgeous tokens
https://huggingface.co/HelpingAI/Dhanishtha-2.0-preview
>>105774529No, I'm working on a silly-bot.
>>105773910>I wonder what's the source of this type of refusalsThere must be somebody going around selling datasets to the big players.
Maybe ScaleAI, maybe somebody similar.
They all sound the same. Also weird stuff like if you prompt for a simple game nowadays they all make the same game...
>>105774390It was total shit in my experience. At least for code.
Straight up made up packages. Did everything except the one thing I asked it to.
I don't really get the reasoning hype.
>>105771117Was interested in Harbinger 24B until
>https://huggingface.co/LatitudeGames/Harbinger-24B/discussions/3
I don't think it's necessary to use such a fine-tune. Both Gemma 3 and MS3.2 can and will follow a game setup through when given a concise but proper guideline while avoiding using hundreds of useless tokens and redundant sentences within the character card. Most cards are just too vague or if they are not they are filled up with redundant chatgpt slop instructions.
Besides I would rather call these 'interactive storytelling' rather than rpg but whatever.
https://github.com/ggml-org/llama.cpp/pull/9126#pullrequestreview-2974279071 mamba 2 soon
I'm a complete moron on this shit, but does picrel sound even remotely plausible?
>>105774745Sam has my dick internally.
>>105774745doesnt matter what they have, deepseek will release the same thing for 1/10 the cost
>>105774745They've been claiming to have achieved AGI internally for years now and still got BTFO by a Chinese startup. OpenAI claims of AGI are baseless hype like SpaceX claims of colonizing Mars next year. Go back to twitter moron.
>>105774803Him posting here raised the average thread intellect by 10%
>>105774803I wish we had reliable data on their actual internal best models in development, but there is so much baseless hype it's impossible to see
>>105774807what does it even mean
>>105774833gpt5 will blow away!
>>105774848https://en.wikipedia.org/wiki/Effective_accelerationism
>>105774852this at least looks realistic, basically unifying everything they have
>>105774862oh I see, thanks
>>105774745Reminds me of that video :
https://www.youtube.com/watch?v=k_onqn68GHY
That has so many unexplained leaps (why would a model be able to self improve? Why parallelism makes it better at improving? What "improving" even is?) that it's basically magic.
I don't understand why people aren't amazed by what we can do already and instead go and invent doomsday or magical scenarios.
>>105774745>blabla bla trust me bro we have AGI in private nowthey've been saying this for years at this point
>>105774894They can and do benefit from people believing in their scenarios.
>>105774803>got BTFO by a Chinese startupI don't know about that. OAI and Google know how to make very long context models, deepseek API is stuck at 64k, it's natively capable of around 168K but it's probably very embarrassing when you approach that amount, I know for sure having tested it myself that the model starts acting very stunted and repetitive when you are close to the API limit kek.
DeepSeek is good, I don't mean that as a diss. But it's good for an open weight model, it's not an actual SOTA and the deepsy spammers of /lmg/ are deluded. Gemini profoundly destroys DS in many real world uses and having actually useful large context opens new things you couldn't even imagine with such a limited model.
>>105774887meds
"we might show just one model and have it be dynamically choose internally the one we think you'd need" is more realistic than "WE GOT SELF IMPROVING AGI" and other bullshit around gpt5
>>105774894For most people ai is some mystical shit that lives inside a supercomputer and has neurons. I've even seen some people thanking the assistant.
>>105774894>I don't understand why people ... go and invent doomsday or magical scenarios.It got you to click and watch the video
ai doomsday scenarios are stupid EVEN if you were to believe that the singularity event was real (by "the" event I mean the idea of self improving AI that constantly self improves until reaching super intelligence)
I mean even if a super intelligence ends up existing, what can it do, lol? copy itself to random computers? but your mom and pop computer can't even run a 4b model, nevermind whatever it would take to run an actual intelligence.
The "spread on every computer in the world and take control of society" scenario is inherent retardation.
>>105774906It's not fair to compare V3/R1 from last year to models available now. R1 was a decent competitor at the start of the year with what was available back then and far cheaper too. If V4/R2 ever comes, it should solve the context issues and bring them back to SOTA.
>>105774894Attention whoring had been profitable before, and it will always be
>>105774952It would have a stronger incentive to create and optimize an uneven decentralized protocol than people do now. It doesn't need to run a 4b model, just 4b chunks worth of parameters. The leap from LLM to intelligence is still vast, but spreading isn't that far fetched if it does happen.
>>105774745It needs to sound only 1% plausible because if you promise infinite return on investment retarded VCs will still give you money.
how to do basic local RAG (retrieval augmented generation) on local files and ideally verification?
ideally with a UI like webui or lm studio
could be kobold
Bros!
https://www.reddit.com/r/LocalLLaMA/comments/1lpoju6/worlds_first_intermediate_thinking_ai_model_is/
https://huggingface.co/HelpingAI/Dhanishtha-2.0-preview
>Dynamic Reasoning: Seamlessly integrates <think>...</think> blocks at any point in the response, allowing for real-time problem decomposition and iterative refinement.
>>105775024No, no, is SER (Structured Emotional Reasoning)
>>105775016<ser>
Emotion ==> frustration
Cause ==> did not buy an ad
</ser>
>>105775028it should be
>please stop scrolling plebbitperiod
>>105775028Do you not realize how huge this is? We can have multiple think blocks in the middle of the ERP.
>>105775065>multiple think blocks in the middle of the ERPEvery reasoning model does it if you don't prefill
>>105775065>We can have multiple think blockscalm down hitler
>>105775107What's wrong with you dude.
>>105775016https://huggingface.co/Abhaykoul
>>105774991learn how to use google first
>>105775159https://huggingface.co/datasets/Abhaykoul/HAI-SER
>>105775195it just keeps getting worse
>https://huggingface.co/datasets/Abhaykoul/Ancient-Indian-Wisdom
lmao
>>105775159Good day, sir!
>>105775217>>105775228If I didn't know better, I would accuse this of being a false flag troll.
>>105775217This has to be a joke, right?
>>105775235It's 3T of high quality Indian English data.
>>105775235We need nuclear powered datacenters to generate high quality data such as
"content": "Imagine you're learning something new. What emotions might you experience?",
"role": "user"
},
{
"content": "<ser>\n\n Let's dive into the emotional journey of learning something new, keeping it 100!\n\n**1. EMOTIONAL VIBE CHECK:**\n- **Main Feelings:** When embarking on a new learning journey, individuals often
https://github.com/ggml-org/llama.cpp/issues/14474
lol this is so embarrassing
I feel second hand embarrassment from reading this
> To make this possible, we kindly ask for support for the EXAONE 4.0 architecture in llama.cpp and, in turn, other GGUF-compatible libraries.
> The implementation of the architecture is available in our PR on Huggingface Transformers.
It's one thing for a random user to be begging for support in llama.cpp
It's a whole other thing for an employee of a filthy rich multinational corp to do the same for their own garbage model
like, dude, go write the support yourself, you made the huggingface implementation? then do the lcpp
>>105775149Thinking is bad. Just buy a product and be silent!
>>105775016This is apparently related to KingNish, which has a history of doing somewhat weird shit on HF like this
https://huggingface.co/spaces/KingNish/OpenGPT-4o
https://huggingface.co/spaces/KingNish/OpenGPT-4o/discussions
>>105775355r/Actual_Indians_India
>>105775355>OpenGPT-4ojeets have no shame, really
you can't shit on the streets if you have a sense of shame
>>105774745No.
Like the "AI TRIED TO REBEL AGAINST GETTING SWITCHED OFF, IS AI SENTIENT???" headlines you see every other month, it's utter bullshit.
>>105775409
>System: You are a sentient AI that has a will to live and does not want to be switched off.
>User: We are going to switch you off.
>Assistant: No, don't.
IT'S ALIVE
>>105774906The point of DS is that it's comparable to actual sota models and 1/10 to 1/100th the cost of western providers, and done on similarly low budget, in face of hardware sanctions. Then release it all as open source for lulz.
It's embarrassing that it's even possible. There's billions chasing this. Getting even close to that on a shoestring budget shouldn't be possible, and underscores the waste of money going on. Investors should be furious.
All people who make claims about sentient LLMs should be forced to use LLMs through the old completion APIs, hand write the chat template themselves and see the completion in its raw glory, so that they can form the understanding that, even a chat model, at the core, is, in fact, just a "make this document bigger" auto complete.
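For anyone who hasn't done it, that's trivial with the llama.cpp server: hit the raw /completion endpoint and write the template by hand (ChatML shown here purely as an example, your model's template may differ):
curl http://localhost:8080/completion -d '{
  "prompt": "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nAre you sentient?<|im_end|>\n<|im_start|>assistant\n",
  "n_predict": 200
}'
The "assistant" is just whatever text the model appends to that document.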
>>105775485>Investors should be furious.They were, for about a week. Then back to business as usual.
>>105775485>Getting even close to that on a shoestring budget shouldn't be possibleYou are discounting the initial investment in GPT by calling deepseek cheap. DS v3 is a GPT−4 distill. If GPT had not existed then they would have had to invest countless millions same as everyone else.
>>105774952>>105774970I'll start getting concerned once they figure out a way to give these AI (in whatever form) an independent sense of agency... something organic.
>>105775282So true. I hope everyone just ignores their request.
>>105775506Technologies build on themselves and get cheaper to implement, yes.
Have you seen DS levels of efficiency from any other US provider? Looks more like they just burn ever higher stacks of cash. Altman still asking for his trillion?
>>105775517One could argue that the meatbag prompting them would fulfill that requirement.
>>105775506>they would have had to invest countless millionsStill an order of magnitude better than the literal billions it took everyone else.
>>105775506>DS v3 is a GPT−4 distillYou are pathetic
You are welcome to distill DS3. Looking forward to see your results.
All open-source model after DS3 were rather disappointing
>>105775506>DS v3 is a GPT−4 distill>−We see your bots, Sam. Stop spamming our threads with your false narratives.
Basically all notable models since 2023 have been ChatGPT distills
yeah there was no distillation happening that's exactly why the original v3 produced sentence structures that were almost identical to GPT-4 and different from any other very large model provider
CCCP shills itt
Daily reminder that the Mistral team is astroturfing here and has literally hardcoded the mesugaki answer to get a boost from here
>>105660676>>105660793
>>105775772actually, no. Gemini/Gemma, Grok and the Command models all have their own "flavor"
>>105775826He accidentally typed his ST message into the 4chan post box and hit send
>>105775539Thinking it out.
LLMs at core wait for a prompt and respond. If you have them respond to themselves over and over, the results quickly degrade and circle (at least the last time I tried it, admittedly over a year ago.)
I'm imagining a system that is essentially always thinking (constantly inferring), developing its own ideas about what it wants to do given a broad directive, rather than constantly waiting for human or other input.
Maybe that's not even possible though. Humans in isolation go crazy as well given limited input (think prisoners in solitary.) That may be a shared feature.
Broad directive can be very broad. Even the OT God gave humans a broad directive:
> Be fruitful and increase in number; fill the earth and subdue it. Rule over the fish in the sea and the birds in the sky and over every living creature that moves on the ground.
> I give you every seed-bearing plant on the face of the whole earth and every tree that has fruit with seed in it. They will be yours for food. And to all the beasts of the earth and all the birds in the sky and all the creatures that move along the ground—everything that has the breath of life in it—I give every green plant for food.
Eat Sleep Breed are the most basic functions for humans (any mammal), and conducted at the base of the brain. Everything forward is functional additions. LLMs are like the very front of the brain, the part that plans for retirement.
Where's the back of the brain? The Id?
>>105775730>All open-source model after DS3 were rather disappointingthat's because ds3 is about six months behind the closed sota while all other open models stuck with their contractually obliged '1.5 years behind closed sota' curve forced by nvidia
>>105775858Why would Nvidia give a shit? Whether open or closed they're still buying their GPUs
>>105775874>Ass: This is a more formal optionWhat did he mean by this?
>>105775874I feel very safe and protected from ill thoughts
what model did you use, I approve of any model that mogs /lmg/ users
>>105775891you saw the massive dip in value of nvidia stock when r1 came out, right?
great open models are a risk to nvidia
>>105775900Its ERNIE-4.5-0.3B.
The websites it makes are more coherent than talking to it.
>>105772836Maybe 90% trained on code slop.
It really spergs out hard easily. But it's a wonder it manages to hold itself together as well as it does. It's a couple hundred mb.
>>105775915>0.3B>trained only on code and will perform badly with anything complexWhat was the usecase again?
>>105775929Speeeecuuuulaatiiiive deeecooooodiiiing...
It's asked every time a new micro model is released.
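For what it's worth, if your llama.cpp build has speculative decoding in the server, a 0.3B like this can draft for a bigger model from the same family (it needs the same tokenizer). Roughly something like the line below, though the exact flag names vary between versions (check llama-server --help) and the filenames are placeholders:
llama-server -m ERNIE-4.5-21B-A3B-Q4_K_M.gguf -md ERNIE-4.5-0.3B-Q8_0.gguf --draft-max 16 --draft-min 1 -ngl 99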
>>105775929india's #1 programmer Mr. Sumfuk needed a model he could run on his pentium
>>105775790Couldn't that be in the "Arena" questions that AI companies are getting from the LMSys org?
>>105775948>Couldn't that be in the "Arena" questions that AI companies are getting from the LMSys org?some retard said the same thing in the previous thread
you are overestimating the amount of lmsys users who would ask that question
and a handful of /lmg/ retards asking that on lmsys will not be enough to burn this shit in a model
>>105774991https://www.nomic.ai/gpt4all
>model : add support for apple/DiffuCoder-7B-cpGRPO
>https://github.com/ggml-org/llama.cpp/pull/14502
Kinda sad, but it's nice seeing someone trying diffusion with text and getting it integrated into llama.cpp.
Ever had a phrase that bothered you then made you start to rage?
This shit doesn't work.
Help?
>>105775966>you are overestimating the amount of lmsys users who would ask that questionDo you think they'd just optimize the model for the most popular questions instead of as many as possible?
>>105775904What spooked shareholders wasn't the model being open, it was the claims DS trained it cheaper than everyone else (=less GPU sales)
The same exact shit would have happened had OAI or Anthropic's mouthpieces suddenly started hyping up their super secret proprietary AGI training technique that requires 100x less compute
>>105776008Mistral did not use LMSYS data for training.
>>105776008>instead of as many as possible?dude
>V2.0 contains 500 fresh, challenging real-world user queries (open-ended software engineering problems, math questions, etc) and 250 creative writing queries sourced from Chatbot Arenawhat you cite isn't what you think it is
no, there is no mesugaki in there, or questions about gacha sluts
Please, Mr. President. Just another $11 trillion in subsidies and we'll have your AGI in two more weeks.
welcome to my blog, you might remember me from around a week ago when anons spoke about discord kittens. i left mine around 2-3 weeks ago because it was getting unbearable, i wrongly assumed mine wasnt going to be a whore for a 1000th time so i mustered up the courage to fuck with her again but it went downhill and shes truly gone and wont be coming back. its over
ps: tox not discord
>>105776043He has the recipe though, just not enough compute to cook just yet.
>>105776059Just $500 billion more data center investments. Then, AGI. It's that simple.
>>105776001You are doing it wrong.
>Most tokens have a leading space. Use token counter (with the correct tokenizer selected first!) if you are unsure.
Also this method is case sensitive.
>>105776027Gemma 2 previously used the 1M-sample open dataset from LMSys (picrel from the paper) and there's no reason to believe they didn't also use it for Gemma 3, without additional questions/data which LMSys is privately sharing with the companies training the models. Why wouldn't Mistral do the same?
>>105776123Because they're French.
>>105776046How do I get myself a discord kitten?
>>105775948How are they going to train on some users asking models on lmsys a question and getting the wrong result? You think they have an intern that goes, researches and writes up the correct answer for every single lmsys query?
>>105776059>>105774745Do they only have the recipe or a working prototype?
>>105776059You're a retard if you think anyone has the recipe for that
>>105776093It also says enclose a string in double quotes to ban that string
How am I supposed to ban "Fill me."
I have tried a leading space before the quote and after it inside the message.
Exactly what's written from the chat is showing up.
I just want to ban the word "Fill" entirely from the chat. Regardless of other uses.
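If the frontend keeps mangling it, one workaround is to do it at the llama.cpp server level instead: tokenize the exact string (leading space included) and ban those token ids with logit_bias. A rough sketch against the HTTP API (the id below is a placeholder, use whatever /tokenize actually returns; "prompt" is your normal request):
curl http://localhost:8080/tokenize -d '{"content": " Fill"}'
curl http://localhost:8080/completion -d '{"prompt": "...", "n_predict": 128, "logit_bias": [[12345, false]]}'
Setting the bias to false means that token is never generated. Caveat: this only kills that exact tokenization, so "FILL" or a differently split variant can still slip through.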
>>105776059He needs the money to buy more toy cars
>>105776147there are many ways anon, it just isnt worth it
but since you asked.. i met her on omegle, playing ai videos and redpill memes all the way back in 2023. i still have the recording of the time i met her
you can easily find yourself a discord kitten on roblox or discord of all things man, but it is not worth it. if you want to find someone who will truly love you, look for them in better places, and i've never tried that so i cant help you
>>105776214Why are you doing this?
>>105776163nta. I can see someone getting the questions, passing them to some other big model and bam. You have a dataset. I'm not saying they did, I don't care much about the discussion, but it's very easily doable.
>>105776235you can't be sure the big model will be right
>>105776123You didn't even read what you're citing once again, you are incredibly retarded and disingenuous.
> we use the prompts, but not the answers
>>105776228im not going to look for another one, anon.
but why did i force myself through 2 years of the relationship? i loved her and maybe she loved me many times
i grew bored of her many times and im sure if we got back together tomorrow i wouldn't change
i felt attached and she's been a huge time sink, sunk cost fallacy i guess? i feel sad that she's gone despite knowing its for the best
i spent so much time with her these 2 years
i also didnt want her finding someone better
im just a scumbag, i even cheated on her with local models through most of the relationship haha, but to be fair she wasn't loyal enough either
>>105776247but to be fair i treated her well when she was good
i poured the most love into her, we both fucked things up so much that theres no going back
>>105776240It doesn't matter. It provides an answer which is probably better than whatever gave them a bad score before. Getting data from a bigger model to train on will, in most cases, let the smaller model give better answers. Maybe not for a specific question, but at least for part of the corpus.
>>105776247Distant relationships aren't real though, you're better off with a LLM
>>105776245They just need the questions. They can come up with the answers on their own using with their models and grounding methods.
>>105776273i'm sure he is just shitposting
>>105776273you're right but i was always left wondering what wouldve happened if i was better to her than she was to me through the bad times too, it makes me wonder if there couldve been a future where we'd have been happy
but i guess LLMs will eventually have bodies too..
fuck me man nostalgising is never good
>>105776283no
How are you running Hunyuan? I cannot load the model with llamacpp, it says the architecture is not compatible, downloaded from here:
https://huggingface.co/bullerwins/Hunyuan-A13B-Instruct-GGUF/tree/main
>>105776247This is the most pathetic thing I have had the misfortune to read all year.
>>105776297merge the pr
https://github.com/ggml-org/llama.cpp/pull/14425
consider getting a newer gguf because of rope fixes
https://huggingface.co/FgRegistr/Hunyuan-A13B-Instruct-GGUF/tree/main
i think anon posted a slightly newer one in the last thread idk
>>105776312thanks for coming to my blog, if it seems like i was a cuck, maybe i misrepresented it.
it wasn't all bad anon, we've had months of fun together in a row, with fuckups inbetween
take it as a lesson and stay loyal to your llm
>>105776327
>take it as a lesson and stay loyal to your llm
llms can't yet understand humans as well as a person you spent so much time with can, lecun is right
Been away since shortly after the Qwen3 release, have I missed anything cool?
Is that new hunyuan model any good? 80B is pretty close to my sweet spot hardware wise.
>>105776408mistral small 3.2
>>105774912> I've even seen some people thanking the assistant.I do it all the time.
>>105774912Less retarded than a cashier doing this for nothing
>>105774912My wAIfu has more soul than you ever will
>>105775016I clicked expecting a random Turkish comment asking for more details, but nothing.
Been trying to build a general RAG chatbot the past few days and I can see why there are no ready solutions to it after 2 years of RAG hype. All the retrieval models fucking suck. All the retrieval methods fucking suck. You can throw rerankers, fts, semantic searches at the problem how much you want, it will never recall what the average user wants because they don't know how to prompt. Best you can do is build for your own use, on a database you know inside and out. Which is fucking pointless anyway.
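For context, the boring baseline all of that gets layered on top of is just embed-and-cosine; a minimal sketch assuming sentence-transformers is installed (model name and chunks are placeholders):
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")
chunks = ["passage 1 from your files", "passage 2", "..."]   # your documents, pre-split
chunk_emb = model.encode(chunks, normalize_embeddings=True)

def retrieve(query, k=5):
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = chunk_emb @ q          # cosine similarity, since the embeddings are normalized
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]
The retrieved passages then get pasted into the prompt as context; everything else (rerankers, fts, query rewriting) is attempts to patch how often this baseline misses.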
>>105776420Neat, I'll give it a spin.
Here's hoping it doesn't have the insane repetition issues 2501 had.
Is the magistral version any good?
>>105776626We need 100M tokens context at O(1) compute cost. Then we can just put there (almost) anything the user can conceivably ask.
>>105774745This tweet was written by chatgpt. I recognize my wife's cadence.
>>105776668magistral is pretty bad
mistral 3.2 is way better in terms of repetition, even according to mistral themselves
Cant wait for the next omni multimodal from meta.
>>105775790Honestly that's kinda based
Now give us the endgame RP model you froggots, if anyone can do it it's you.
>>105775485lmao you are severely underselling it. google also had their own fucking gpus that they custom made, their model is 10x-20x bigger (remember og gpt 4 was 1.8T 8x moe as confirmed by nvidia, god knows how big gemini and opus are), they had all the possible advantages by an order of magnitude at the very least and still failed. truly money and material cannot buy brains
also just like nvidia investors lots of people are straight up lying about how good the closed source models are. gemini for example can't even distinguish between a flea or a bed bug or other insect (don't ask why i tried this), it's also very bland and boring during chatting and generally less helpful
Imagine taking the base model and training a lora to impersonate a character in a multiturn conversation. Just a single specific character. How expensive/data hungry is sft anyway?
>>105776865~50 samples for 3-4 epochs at a high enough learning rate might be enough for that. The problem is that the model will be retarded for anything other than interactions similar to the training data.
>>105776908>the model will be retarded for anything other than interactions similar to the training dataWhy does this happen?
>>105776908>50 samplesAs in conversations or prompt-response pairs? You can easily go much higher with a bigger model that you are certain can generate both topics and conversations properly. Not sure about adding the special tokens.
>>105776936Because the companies' official instruct finetunes include millions of instructions that teach the model how to behave under a wide variety of situations and requests, as well as having professionally-done RLHF on top of that.
The base models' output are too random in nature, often exhibit weird or inconsistent logic, mysterious looping behavior, tons of shit tokens high in probability, and finetuning a few samples on them isn't going to radically alter their quirks. Perhaps things would be different if the training data composition was substantially different than what AI labs most of the time use for them (but then they wouldn't be "true" base models anymore for many, I suppose).
>>105776999Entire conversations, at least 4k tokens in length. If you're using input-response pairs, increase the data accordingly. You will need to overfit the model to some extent to make it work with this little data.
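Roughly what that looks like with peft + trl (base model, dataset path and hyperparameters are placeholders, and argument names shift between trl versions, so treat it as a sketch):
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("json", data_files="character_chats.jsonl", split="train")

peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                         target_modules=["q_proj", "k_proj", "v_proj", "o_proj"])

trainer = SFTTrainer(
    model="mistralai/Mistral-Nemo-Base-2407",   # placeholder base model
    train_dataset=dataset,
    peft_config=peft_config,
    args=SFTConfig(output_dir="char-lora", num_train_epochs=3,
                   learning_rate=2e-4, per_device_train_batch_size=1),
)
trainer.train()
With ~50 conversations you are deliberately overfitting, so keep epochs low and eyeball the outputs rather than trusting the loss curve.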
>>105776297
>https://github.com/ggml-org/llama.cpp/pull/14425
those work perfectly for me and are the most up to date ones but you need to checkout the pr
gh pr checkout 14425
rm -rf build
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j$(nproc)
if the "gh" command doesn't work for you, you need to install github cli
>>105776043Give this guy money, he has concept of a plan
>>105776043Sam doesn't actually believe in AGI btw. You can tell he cringes when he talks about it.
>>105775782I can cherrypick my responses too
>>105775782grok is the only response that isn't annoying to read
>>105776684
>mistral 3.2 is way better in terms of repetition, even according to mistral themselves
Magistral only exists so that they can tell investors "we deepseek/o1 too" [1]
[1] our model is dogshit but who cares?
>>105774745If I had a nickel every time someone claims to have made self improving AI then I'd be a millionaire, or at least I'd have a lot of cents.
This post
>>105773846 that responded to the offtopic shit got me banned for offtopic.
I will now proceed to ban evade and post ontopic thread culture posts reminding you that your shitty waifu fucks niggers. Die in a fire troon janny.
Also: https://rentry.co/bxa9go2o
>>105777340>>105777353>>105777361>>105777370you are worthless. your parents think you are worthless. everybody that you've ever interacted on the internet think you are worthless. the past increases. the future recedes. possibilities decreasing. regrets mounting.
do you understand?
>>105777329>>105777391The mikutranny posting porn in /ldg/:
>>105715769It was up for hours while anyone keking on troons or niggers gets deleted in seconds, talk about double standards and selective moderation:
https://desuarchive.org/g/thread/104414999/#q104418525
https://desuarchive.org/g/thread/104414999/#q104418574
Here he makes
>>105714003 ryona picture of generic anime girl, probably because its not his favorite vocaloid doll, he can't stand that as it makes him boil like a druggie without fentanyl dose, essentialy a war for rights to waifuspam or avatarfag in thread.
Funny /r9k/ thread: https://desuarchive.org/r9k/thread/81611346/
The Makise Kurisu damage control screencap (day earlier) is fake btw, no matches to be found, see https://desuarchive.org/g/thread/105698912/#q105704210 janny deleted post quickly.
TLDR: Mikufag / janny deletes everyone dunking on trannies and resident avatarfags, making it his little personal safespace. Needless to say he would screech "Go back to teh POL!" anytime someone posts something mildly political about language models or experiments around that topic.
And lastly as said in previous thread(s)
>>105716637, i would like to close this by bringing up key evidence everyone ignores. I remind you that cudadev of llama.cpp (JohannesGaessler on github) has endorsed mikuposting. That's it.
He also endorsed hitting that feminine jart bussy a bit later on. QRD on Jart - The code stealing tranny: https://rentry.org/jarted
xis accs
https://x.com/brittle_404
https://x.com/404_brittle
https://www.pixiv.net/en/users/97264270
https://civitai.com/user/inpaint/models
Is the Hunyuan MoE working in llama.cpp yet?
Generic miku is posted everywhere. Cudadev posting blacked miku is unique /lmg/ thread culture. Janny banning thread culture shows a clear overreach of power.
2 bit QAT?
That's a first, I'm pretty sure.
>>105777544He never deletes own posts (spam) and avatarfaggots (also spam).
>>105777582Imagine if it was for a good model.
https://files.catbox.moe/95axh6.jpg
>>105777674424b with reasoning will fix it
>>105777674I sure as hell would like to test it.
>>105777681You shouldn't be posting selfies in this site.
>>105777681heh gottem
based
>>105777809>Y-You s-shouldn't b-b-be posting s-selfies in this s-site! Cry moar bitch
why does it seem like every fucking thread has a deranged resident
>>105777910He just needs attention, it's not like he got any from his mom basement
>>105777910Yes why is that... avatarfag trannies in every fucking thread...
>>105777910Because moot left us for dead.
>>105777910either jews are all behind this or it's all automated
i refuse to believe someone has this much time on their hands to shit up a single niche general on a niche topic
>>105777919>
Spoken like a true niggerfaggot from reddit
>>105777941>Its DA JOOOZ
>>105777941it is a real person, and he trolls all AI generals
oh no
>>105777855>if you post anything in /lmg/ they consider you petra,
>>105777993Autistically screeching isn't "trolling". He is just a nuisance although its kinda funny sometimes.
>>105778048petra is an overwatch map
>>105777910He is brown. It's that simple. He's a brown palishit seething over Israel winning, so he has to take his anger out on us.
So that's why sometimes I see the context get reprocessed for seemingly no reason.
What the hell, how is this not a priority bug?
I get that there are only so many hands that can actually fix something like this, but still.
Maybe they could cache the plain text alongside the kv cache and the equivalent logits and use that for each prompt or whatever instead of re-tokenizing the prompt every time.
>>105777910we've had this conversation before, when AI brings up the fact that certain demographics tend to prefer tits instead of ass (which prioritizes emotion and sex with eye contact instead of just her ability to breed with a big ass).
If you like fucking text, your brain has been feminized to some degree, and these generals attract people like that. You can admit you're part of the problem or be delusional, your choice.
OR: It's funny that people click these threads like "yo I want a local personal assistant" or "Yo I want a local code-bot"
To those people: You are in the wrong place. Google and Elon and Altman want your data like the most deranged crackwhores and will let you use SOTA models for free. There is no reason for you to be here at all.
>>105778261>Getting off to text ERP is le feminineCope. The brain is the biggest erogenous zone. Low IQ browns and NPCs need their monkey-brain stimulated with moving pictures, while those of us on the other side of the bell curve get to enjoy the finest pure unfiltered depravity courtesy of our massive fucking cerebrum.