Thread 106287207
Anonymous No.106287207 >>106287287 >>106288960
/lmg/ - Local Models General
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106278215 & >>106269950

►News
>(08/14) DINOv3 vision models released: https://ai.meta.com/blog/dinov3-self-supervised-vision-model
>(08/14) Gemma 3 270M released: https://hf.co/google/gemma-3-270m
>(08/12) Ooba v3.10 adds multimodal support to the UI and API: https://github.com/oobabooga/text-generation-webui/releases/tag/v3.10
>(08/12) Jan-v1 for web search, based on Qwen3-4B-thinking: https://hf.co/janhq/Jan-v1-4B
>(08/11) GLM-4.5V, based on GLM-4.5-Air: https://hf.co/zai-org/GLM-4.5V

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous No.106287210
►Recent Highlights from the Previous Thread: >>106278215

--Running Qwen3-235B on 3090 with AutoRound and avoiding shared memory pitfalls:
>106278283 >106278326 >106278391 >106278545 >106278550 >106278693 >106278757 >106279781 >106279825 >106281329 >106281388 >106279868 >106279926 >106280022 >106280015 >106280136 >106280374 >106280389 >106280393 >106280443 >106280450 >106280451
--Is the golden age of local LLMs over due to stagnation and lack of open foundational models?:
>106285989 >106286015 >106286167 >106286190 >106286198 >106286025 >106286036 >106286058 >106286074 >106286109 >106286146 >106286028 >106286087 >106286243
--High RAM setups vs corporate VDI centralization for local LLM use:
>106278462 >106278506 >106278598 >106278739 >106278777 >106278823 >106278830 >106278847 >106278903 >106278913 >106278946 >106279046 >106278985 >106279019 >106279007
--Dynamic MoE inference and efficiency improvements in llama.cpp development:
>106280498 >106280521 >106280633 >106280644 >106280671 >106280560 >106280745 >106280818
--Modular small models with dynamic routing as an alternative to large monolithic models:
>106278501 >106278547 >106278625 >106278657
--Deepseek's Huawei bet delays Nvidia comeback, pushing users to alternative models:
>106285826 >106285840 >106285914 >106285933 >106286011 >106285850 >106285873 >106285902
--Grok 2 open-source announcement sparks anticipation and skepticism:
>106285546 >106285608 >106285635 >106285912
--Splitting and hosting large quantized models on Hugging Face with storage limitations:
>106284117 >106284123 >106284283 >106284780 >106284798 >106285085
--Lack of reliable LoRAs for consistent 16x16 pixel art generation:
>106284801 >106285420 >106285537 >106285647 >106285821
--Miku (free space):
>106278466 >106278492 >106279395 >106279412 >106282523 >106282599 >106282785 >106283278 >106283542

►Recent Highlight Posts from the Previous Thread: >>106278217

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Anonymous No.106287274 >>106287297
Mikulove
Anonymous No.106287278 >>106287519 >>106287524 >>106287534 >>106291731
Today is the day. It's been a long wait... but local is saved. DeepSeek just changed EVERYTHING.
Anonymous No.106287284 >>106287303 >>106287307
I remember Llama 1 and 2
Anonymous No.106287287
>>106287207 (OP)
>tattoos
whore
Anonymous No.106287297
>>106287274
hii butifel gurly com to bangladesh i marry you and fuck you 12 foot penis u like it
Anonymous No.106287303
>>106287284
Ok, grampa, get in your wheelchair
Anonymous No.106287307 >>106287317 >>106287668 >>106288077 >>106288500
>>106287284
Me too. Alpaca was a mistake we still haven't recovered from.
Anonymous No.106287317
>>106287307
I said from DAY 1 that training on cloud models' synthetic outputs, to get style changes and benchmark boosts, was doing more harm than good. But noo. No one ever listens to grumpy old anon.
Anonymous No.106287318 >>106287343 >>106287347 >>106287358 >>106287379
I want to run GLM Air, and maybe other bigger models. I can't right now. Right now I have a 16GB GPU + 32GB RAM (~24 of that is usually free). Thinking about going up to 96GB of RAM. Would that be plenty, with my current GPU, to run GLM Air at large context sizes? The alternative would be getting a 128GB kit to start with, but 128GB kits seem to be like double the price. I could buy 192GB (4x48) for the same price as a 128GB kit, lol
Anonymous No.106287343 >>106287369
>>106287318
You don't need a kit. Yes, you read that right: you don't need a kit to fill up RAM. Look at the cheapest individual RAM sticks with the fastest MT/s. Get as much RAM as your mobo and CPU support; once you taste a big model, you won't go back, you will want more.
Anonymous No.106287347 >>106287358
>>106287318
Anonymous No.106287358
>>106287318
show bobby cute girl i am from india

>>106287347
fuck off benchod.. she is mine
Anonymous No.106287369 >>106287397
>>106287343
You trolling me, mate? I've also heard that once you get to like 128GB+ you usually can't run it at high clocks or it won't be stable.
Anonymous No.106287379
>>106287318
Why is Maki working at Home Depot?
Anonymous No.106287397
>>106287369
Bro, I am running 768GB, bought each stick separately and it works. They are all the same model though. Not sure what those redditors you heard that from were using.
Anonymous No.106287455 >>106287459 >>106287467 >>106287498
are dense models doomed to be extremely slow the moment you offload just one layer to the cpu
Anonymous No.106287459
>>106287455
Yes. dense models are obsolete
Anonymous No.106287467 >>106287494
>>106287455
All models get much slower once you touch the cpu. MoEs are just so fat that you'll never get to experience their speed on vram alone
Anonymous No.106287483
Being a vramlet with only 24GB vram, is Rocinante still the best for cooming?
Anonymous No.106287494 >>106287501
>>106287467
Overrated. I've run 235B entirely in VRAM and only got 15 t/s.
Anonymous No.106287498
>>106287455
You can offload tensor layers specifically and the speed penalty will be much lower
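The usual trick on recent llama.cpp builds is -ot / --override-tensor with a regex: keep attention and shared weights on the GPU and kick only the fat FFN/expert tensors to CPU. Rough shape of the command (flag names from memory, check llama-server --help on your build):

llama-server -m model-Q4_K_M.gguf -ngl 99 -c 16384 -ot "ffn_.*_exps=CPU"

For MoE models newer builds also have --cpu-moe / --n-cpu-moe as a shorthand for roughly the same thing.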
Anonymous No.106287501 >>106287540
>>106287494
Unquantized?
I find it hard to believe because I can run 235b at 10 t/s with only 64gb of it in vram and the rest in sysram at ~4bpw.
Anonymous No.106287519 >>106287534
>>106287278
You saw it too? It's going to be exciting now that we finally get something not even cloud models do (except possibly Grok 4; unclear, since we can't see under the hood)
Anonymous No.106287524 >>106287532 >>106287534 >>106287566 >>106287581 >>106287691
>>106287278
And then GPT 5 mogs it let alone GPT 6 which is too unsafe internally
Anonymous No.106287532
>>106287524
How do you like them apples?
Anonymous No.106287534
>>106287278
>>106287519
>>106287524
Schizo hypeboys
Anonymous No.106287540
>>106287501
Q6. -cmoe gets me 10 t/s.
Anonymous No.106287566
>>106287524
GPT-5 is so good it broke my entire codebase, causing me to switch back to Claude
Anonymous No.106287572 >>106287589 >>106287608
Just lost over 150k of context because aistudio didn't save it
Anonymous No.106287581 >>106287604
>>106287524
GPT 5 is so good it "optimized" unrelated code, causing my program to break. Even local GLM 4.5 wasn't that retarded, it touched only relevant parts of the code.
Anonymous No.106287589
>>106287572
LOOOOL
GGEEEEEEEGGGG
HAHAHA
XDDDDD
Anonymous No.106287604
>>106287581
They want to sell paid accounts. This is why the normal tier loops back and forth even for simple queries- jewish tricks. It's just like your prompting ability has suddenly degraded.
Anonymous No.106287608 >>106288367
>>106287572
It could be in your browser's cache still.
Anonymous No.106287612 >>106287625 >>106287683
when's the next breakthrough happening? I want to run 500B-1T param models on my shitty PC
Anonymous No.106287625
>>106287612
Berry soon.
Anonymous No.106287666 >>106287705 >>106287727
What are my best options for OCR (japanese)? I've already tried gemma 3 12B, and wanted to try glm 4.1 v thinking but llama.cpp doesn't support it. Any other models I should look at?
Also, does vLLM work well with cpu-only inference? They supposedly support glm 4.1 vision, but I don't want to deal with wsl only for it to not work on cpu
Anonymous No.106287668
>>106287307
What will the next era be? 3 months of nothing again? Or will R2 start something new?
Anonymous No.106287683
>>106287612
March 2026 :)
Anonymous No.106287691
>>106287524
I've still got to wonder what the fuck Sam was high on that week. OSS and GPT-5 were shittier than anyone could have ever imagined, and everybody is still laughing at them as they lick their wounds and bring 4-fucking-o back
I can pretty safely assume Sam is talking out of his ass or is so horrifically schizo that he doesn't understand how little value his model has
Anonymous No.106287693 >>106287728 >>106287733 >>106287741 >>106287744
Qwen coder 480b at q8 is the best coding model. Prove me wrong
Anonymous No.106287705 >>106287735
>>106287666
Are you happy with G3 12B? I tested Mistral 24b and it claims to support such-and-such languages, but it clearly doesn't handle Finnish. Even basic word forms are broken, though it does recognize words. Maybe it was a context problem too, because everything else I do is in English.

Maybe Japanese seems easy to translate because its writing system is so different from Latin-based languages, but I wouldn't trust any small llm. If you are proficient, it's easy to verify, of course. If not..
Anonymous No.106287727 >>106287757 >>106288442
>>106287666
You want to use external text recognition -> ascii prompt.
Tesseract.
If you think that some bs local model is going to read an image for you, you should go back to school.
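Minimal sketch with the python wrapper, assuming you've installed the tesseract binary plus the jpn traineddata (pip install pytesseract pillow):

import pytesseract
from PIL import Image

img = Image.open("page.png")
print(pytesseract.image_to_string(img, lang="jpn"))

Vertical text wants lang="jpn_vert" and possibly some --psm fiddling via the config argument.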
Anonymous No.106287728
>>106287693
>480b at q8
I can't run that so i'll never know
Anonymous No.106287733
>>106287693
I can't and I won't.
Also it's very comforting to have a really good open source coding model.
Anonymous No.106287735 >>106287750
>>106287705
>Are you happy with G3 12B?

kinda; it works well for legible printed-like text, and for non-trivial images it has some hiccups but gets most of it correct
Anonymous No.106287741
>>106287693
I feel like Qwen 235B 2507 Reasoner is better for some things
Anonymous No.106287744
>>106287693
Thanks to free requests and more people joining in, it will only keep getting better.
Anonymous No.106287750
>>106287735
So you are not happy.
Anonymous No.106287755
So was the unslopped openai 20B model a nothingburger?
Anonymous No.106287757 >>106287803 >>106287821 >>106288442
>>106287727
>tesseract
You could've at least suggested PaddleOCR, tesseract sucks ass. I started exploring llms for ocr only because gemini 2.5 flash worked so much better than anything else when text wasn't in ideal printed form. The only other OCR that got anywhere close to PaddleOCR was MangaOCR, but it still sucked ass and wasn't anywhere near the VLMs' performance
Anonymous No.106287769 >>106287779 >>106287801 >>106287939 >>106288210
The soon to be released deepseek model understood me on a level deeper than any chatbot. I used the same 100% private "benchmark" I use with all the models: an introduction of myself. It picked up on things I didn't explicitly state or even imply, which honestly kinda freaked me out at first, but looking back I can kind of see the dots it connected to figure it out. Still impressive as hell. I'll be shocked if it doesn't top EQbench by a large margin.
Anonymous No.106287779 >>106287783
>>106287769
Goddamnit Drummer nobody cares about your DeepSeek finetunes
Anonymous No.106287783
>>106287779
let thedrummer(c) cook
Anonymous No.106287801
>>106287769
Okay, but how does it suck dick? Is it geminislop or gptslop? How much trivia does it know? How stubborn is it?
Anonymous No.106287803 >>106287809
>>106287757
I tried to help. Why didn't (You) suggest anything, instead of whining like a passive-aggressive bitch?
Anonymous No.106287809
>>106287803
Sorry typing on phone and no glasses: problematic
Anonymous No.106287821 >>106287849
>>106287757
You do know that these are a subset of models, just like ESRGAN etc. is to image upscaling? Tesseract is a piece of software, not a model.
Anonymous No.106287840 >>106287864
We are all shitting on sam for not releasing old models, but what about google? Why don't they release old PALM 540B and other historical llms?
Anonymous No.106287844
What's the latest news on bitnet? Last I heard things were heating up there. It's only a matter of time before we get a big model using it.
Anonymous No.106287849
>>106287821
I do know that, it's just that the only japanese model I found for it sucked ass (maybe something has changed since I last checked), and I just assumed it was due to the architecture required by tesseract models or something like that, my bad
Anonymous No.106287852 >>106287909 >>106287938 >>106292074 >>106292747
I hate that it's so hard to make LLMs come up with good stories. It typically fills your outline with fluff, it never gives you a story.

For example, I'll say write a story about someone doing X, and it will spend 10 paragraphs using verbose and colorful language, only to finally land at someone doing X. There is 0 creativity in between.

How am I supposed to prompt?

Currently, I'm prompting something like, "explore [subject]". I use the word "explore" in hopes that it will explore the subject and introduce different angles. I then write "Chapter 1: " and let it do the rest, but the result is still terrible.
Anonymous No.106287864 >>106287883
>>106287840
uhhh that would not be very safe would it, what if it sucked your cock?
Anonymous No.106287883
>>106287864
Or worse: roleplayed without slop. We can't have that in a safe and ethical society.
Anonymous No.106287909
>>106287852
read the basics of storytelling and give all that as instructions
Anonymous No.106287938
>>106287852
Don't use instruct mode; use mikupad to reduce its bias towards corporate agent behavior. Add tags, warnings, author's note, summary, and have it continue. Though many instruct models simply can't write and go full assistant even without a template. r1 and v3 do fine, if that's of any help
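e.g. a bare completion prefix along these lines (all of it made up, adjust to taste):

[Tags: dark fantasy, slow burn, third person]
[Author's note: terse prose, no moralizing]
Chapter 1

then let it continue from there instead of asking an assistant persona to "write a story".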
Anonymous No.106287939 >>106288164
>>106287769
What is the parameter count? Can I run it on my potato?
Anonymous No.106288077 >>106288156 >>106292105
>>106287307
lmao i remember how back when the boom started i thought that by 2026 we would be locally running AGI on a phone
Anonymous No.106288156 >>106288162 >>106288205
>>106288077
Anonymous No.106288162 >>106288349
>>106288156
I don't know why people act like this is some epic counterargument. Modern women are literally like this
Anonymous No.106288164
>>106287939
about tree-fiddy
Anonymous No.106288205
>>106288156
Anonymous No.106288210 >>106288227
>>106287769
>le deeper level
You pretty much disclosed you're a mighty retard.
Anonymous No.106288227
>>106288210
Presumably he's a time traveler thus might be pretending to be retarded.
Anonymous No.106288317
>>106286041
Only V3 was SDXL. The new models aren't.
Anonymous No.106288349 >>106288487
>>106288162
anon how would you feel if you dident have breakfast this morning ?
Anonymous No.106288367
>>106287608
These days all content on pages is dynamically generated by JavaScript, and those dynamic requests don't get cached. "It's probably in the cache" outlived its usefulness 10-15 years ago. I miss those days. You could save a website and it would actually save the content too, rather than just an empty husk.
Anonymous No.106288442 >>106288657 >>106288687 >>106288736
>>106287727
>>106287757
>paddleOCR

dead project. And yes, tesseract sucks ass. So far, gpt4o and gpt5 work pretty damn well for online stuff; locally, I don't think it gets better than mangaOCR. It's -very- good at what it does. If you're looking at extremely hard to read stuff, online models and/or google lens can work.
There's also this weird program that I thought was an api frontend for google lens (it wasn't) called text sniper that does a damn good job for most languages. I think it might be tesseract-based but I can't tell for sure. what's your use case?
Anonymous No.106288487
>>106288349
I had breakfast this morning retard
Anonymous No.106288500
>>106287307
whatever you say, coomer
gpt-oss is the best local model
Anonymous No.106288641 >>106288653 >>106288672 >>106288704
>mfw it's 3am and I'm sitting here arguing with my llm about a problem but we're making progress
Anonymous No.106288653
>>106288641
Her unbelievably hairy pussy...
Anonymous No.106288657 >>106288798 >>106288930
>>106288442
>dead project
damn, in my tests it was usually better than kha-white/manga-ocr (possibly due to the built-in image preprocessing it has, idk)
My use case is "normal" text + slightly stylized text with non-zero background noise
picrel isn't the best example, but it shows my point.
for online stuff I currently use gemini 2.5 flash, which works better than paddle/mangaocr, and I get 250 free requests daily via api + an undisclosed number of requests via aistudio
Anonymous No.106288672
>>106288641
hi pretty girl pleast shot vergana
Anonymous No.106288687 >>106288930
>>106288442
I think the problem is that moonrunes are detailed, while the latin alphabet has only 26 letters, based on letterforms that could be imprinted with a chisel and hammer (Trajan).
Resolution doesn't matter that much for those, but Chinese characters and Kanji are way more complex to recognise.
Anonymous No.106288688 >>106288717 >>106288762 >>106288763 >>106288777
What's the model on the left? I can't find anything about it. It seems surprisingly smart.
Anonymous No.106288695 >>106288714 >>106288755
Using a large model (GLM 4.5 Air) for the first time. I noticed a problem that didn't happen with smaller models that fit into VRAM.
I started out with 6-8 tokens/second, but now after a couple of hours I'm down to 2 tokens/second.

I'm using a 32k context size. Is this slowdown normal as more context is filled, or am I running out of memory?
Anonymous No.106288704
>>106288641
Thank you mr. safety cat to preventing me from seeing the man ass!
Anonymous No.106288714
>>106288695
>Is this slowdown normal as more context is filled
Yes. You're dragging more and more context through each layer, so it takes longer.
It was happening with the models in vram too, you just didn't notice, presumably because they were running much faster.
Anonymous No.106288717
>>106288688
looks like another lmarena mystery model
probably gemini3 or something
Anonymous No.106288736 >>106288930
>>106288442
>dead project
It had its last commit on main 5 hours ago, has nearly 6k contributors and is run by Baidu.
Anonymous No.106288755 >>106288909
>>106288695
This depends. With a simple chat, even longer context is cached and won't be that much of a problem. Performance shouldn't degrade that much unless you are letting the model babble on without telling it to be more concise. You can always examine your stats in ST or whatever.
You should also limit yourself. Any discourse with a model has a specific and relatively short useful lifespan unless you're a moron.
Anonymous No.106288762
>>106288688
https://old.reddit.com/r/Bard/comments/1mq7kzn/pantom07251_is_not_gemini_3/n8sov5a/
Anonymous No.106288763
>>106288688
gemma 4
Anonymous No.106288777 >>106288788
>>106288688
folsom? I think it's one of the chinese ones. I really hate it though, it gives me the stupidest answers, and it's not the first iteration that I hate, so it's likely a newcomer.
Anonymous No.106288788 >>106288812 >>106288897
>>106288777
>spamming multiple models to optimize on lmarena
llama5?
Anonymous No.106288798
>>106288657
>that artist
Good taste.
Anonymous No.106288812
>>106288788
Or even worse: their commercial model. I would laugh really hard if that's the case. All those billions just to make llama 3 405b again.
Anonymous No.106288897 >>106288904 >>106288925 >>106288926 >>106288929
>>106288788
Anonymous No.106288904 >>106288925 >>106288926 >>106288929
>>106288897
Anonymous No.106288909
>>106288755
It's cached for prompt processing but generation is still slower
Anonymous No.106288925
>>106288897
>>106288904
Amazon sucks at llms. Theirs are dumb and soulless, perfect for corporate usage.
Anonymous No.106288926 >>106288940
>>106288897
>>106288904
lmarena has said many times that they do stuff to make these types of questions not work, like injecting random other company names and stuff.
Anonymous No.106288929 >>106288942
>>106288897
>>106288904
>trusting what models say about themselves
I'm more inclined to believe time-traveler-deepseek-anon than this
If they weren't given this info in the system prompt then it's straight-up hallucination
Anonymous No.106288930 >>106288964
>>106288657
can't be arsed to try, but your image should be perfectly readable with mangaOCR; you would have to segment it properly beforehand tho, since the reading order is not traditional and the styles change between areas.
I would OCR "kyamera o" first, then "tomeruna!" in two different goes. As for the lower text, I'd divide it into chunks, probably one or two lines at a time. MangaOCR tends to shit the bed with longer texts/bigger text.
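Rough sketch of that segment-then-OCR loop with PIL; box coordinates made up for illustration:

from manga_ocr import MangaOcr
from PIL import Image

mocr = MangaOcr()  # pulls the kha-white/manga-ocr weights on first run
img = Image.open("panel.png")
# (left, upper, right, lower) per text region, found by eye or a bubble detector
regions = [(40, 10, 300, 90), (40, 90, 300, 170), (20, 200, 340, 420)]
for box in regions:
    print(mocr(img.crop(box)))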

>>106288736
I know. I just seethe that paddleocr can't run on my troonix distro. Also, the output format it has gets on my nerves.

>>106288687
true. rez is crucial with moonrunes. However, latin text has way more "fonts" and tesseract shits the bed more often than not when there are many variations. The example that anon posted before tho is perfectly fine, rez-wise.
Anonymous No.106288940
>>106288926
>people ask a shitty mystery model who made it
>it says it's by amazon/apple/openai
>it blows up, rumours spread that the next model by [struggling company] or [major player] is actually really shit
>billions in stock value wiped out
I can't see this go wrong
Anonymous No.106288942
>>106288929
Many instruct models have at least their name baked into the weights.
Anonymous No.106288951 >>106289116
Color me impressed.
I asked GLM 4.5 Air to quote the earliest thing it remembers from our chat, and it flawlessly quoted my first message from 18000 tokens ago, verbatim.
And it was technical as hell. I asked it to explain what the different tensors in the GLM model do, like attention and up/down FFN tensors.

I used 8 GB models before this, and they regularly misinterpreted what I told them 10 lines ago.
Anonymous No.106288956
pp with 100k context is much faster now thanks to the partial swa cache
Anonymous No.106288960
>>106287207 (OP)
>6GB VRAM + 32GB RAM
How over is it, doc? Is there a model I can run that'll be comparable to JLLM at least?
Anonymous No.106288964 >>106288994 >>106289096 >>106289195
>>106288930
mangaocr in two passes (top and bottom line separately):
それ以上と
どころも

mangaocr in one pass:
キャンプもとおすす

paddleocr in one pass:
とめ3な
+一x+

gemma 27b:
キャンバスを とめるな!

Im not testing bottom part cuz everything that isnt tesseract deals with it well.
Anonymous No.106288994 >>106289197
>>106288964
OCR with Gemma 3 was supposed to mainly work with Pan&Scan, but that never got implemented properly anywhere.
Anonymous No.106289096
>>106288964
right, I actually got off my ass to try that. Yes, mangaOCR shits the bed on the upper text and does fine on the lower one (ウチにのみ/フの付いたドア。
着替えに元々勢いていたパンツと
バミバンするサイズのホットパンツと
逃げ子クするくらい障手の
キャミソールが掛けてある。)

My money is on the outer white surrounding the hiragana/katakana making it shit the bed.

gpt reads it just fine:

カメラをとめるな!
Anonymous No.106289116
>>106288951
Another fun party trick is to tell it to put all messages so far into a markdown block with formatting for user/assistant messages and lines delimiting them.
Anonymous No.106289170 >>106289202 >>106293852 >>106293989
>>106285914
periodic reminder that all "DeepSeek failed to train on Huawei" news are suspect, at least people at DS tell me not to trust insane rumors. The journalist doesn't even understand the stage of Huawei's chip development, there literally was no platform for DS to do a V3-scale training run until very recently and it's implausible that they'd have been forced to debug prototype hardware with a novel architecture just to make Xi happy. She also apparently doesn't know about V3-0324, R1-0528, Prover-V2 and other evidence that they've been continuing their normal development cycle, or about actual Ascend use by DeepSeek (for inference).

China is not that cartoonishly retarded and DeepSeek doesn't share anything with the press. I think the most likely hypothesis of the source is cope from a disgruntled Huawei employee or a competitor like Alibaba.

The next model is most likely V4 with NSA, coming out in a matter of weeks. I predict it'll have been trained on normal Nvidia GPUs, for 3+ months, and retards will claim that this is a second run after the "failure" to train "R2" on Ascends by May to "capitalize on hype". All of this is layered confusion of Western and Chinese hypemongers and journalists.
You can trust me or FT, your choice.
Anonymous No.106289195 >>106289681
>>106288964
GLM-4.5V with greedy sampling:

<|begin_of_box|>キャメラをとめろ!
ウチのみノフの付けたドア。
着替えに元々穿いていたパンツと
ハミパンするサイズのホットパンツと
透けチクするくらいの薄手の
キャミソールが掛けてある。<|end_of_box|>

Not perfect but better than those at least.
Anonymous No.106289197 >>106290450
>>106288994
someone implement this tx
Anonymous No.106289202
>>106289170
>I think the most likely hypothesis of the source is cope from a disgruntled Huawei employee or a competitor like Alibaba.
I'm gonna go with the default hypothesis.
Hi all, Drummer employee here No.106289208 >>106289418
After Fallen Gemma being shit I wrote off Drummer's finetunes for a while, but after trying out Big-Tiger I think it's actually better than normal Gemma for RP. Slightly less slop and a lot more willing to be lewd without excessive hand-holding, and it didn't become spontaneously retarded like Fallen-Gemma had a habit of doing.
Anonymous No.106289238 >>106289241 >>106289265
native tool calling status - still broken
Anonymous No.106289241 >>106289306
>>106289238
tool calling status - still a meme
Anonymous No.106289257 >>106289316 >>106289414
next year, we'll be able to play Split Fiction with a local model
Anonymous No.106289265 >>106289422
>>106289238
0 non-meme use case
Anonymous No.106289274 >>106289312
reminder that llama3.3 70b is still the most coherent model for long context https://github.com/adobe-research/NoLiMa
Anonymous No.106289306
>>106289241
Cursor says otherwise.
Hi all, Drummer employee here No.106289312 >>106289408
>>106289274
I can't run it at decent speeds so I don't care
Anonymous No.106289316
>>106289257
you VILL get your pure txt-txt piece of benchmaxxed trash that cant even remember the last 50 seconds and you VILL like it
Anonymous No.106289408
>>106289312
I love you Drummer. I hope you'll find a job soon.
Anonymous No.106289414
>>106289257
Actually interesting target. That'd be my bar for AGI.
Anonymous No.106289418 >>106289440
>>106289208
What can you even merge with Gemma 3 in the first place? I think these community-made fine-tunes are more like hard-baked loras at various weights, some more burned in than others. The model's brain gets lobotomized every time anyway.
Anonymous No.106289422 >>106289434 >>106289491
>>106289265
coding agents like opencode; tool use can be quite useful for coding in actual real jobs
Anonymous No.106289434
>>106289422
>it's useful for people with real jobs
so no one in this thread
Anonymous No.106289440 >>106289504
>>106289418
fine tunes aren't merges, they could be considered similar to loras. You're shifting biases towards different tokens.
Anonymous No.106289491
>>106289422
>Another deleted database in production in the making
lmao
Anonymous No.106289504 >>106289515
>>106289440
Yeah, it's my bro science. With image models, lora weights always make the model dumber.
Would be interesting to visualize the vectors in 3d space. All people see is some benchcucked graph, which is quite abstract.
Anonymous No.106289515
>>106289504
Yeah, the graph is a summary of certain averages, but that's not what I meant.
Anonymous No.106289612 >>106289621 >>106289986
https://huggingface.co/KBlueLeaf/HDM-xut-340M-anime
perfect model for ser
Anonymous No.106289616
Does GLM 4.5 Air have any uncensors, remixes, merges, degenerate loras or anything? Or do people use the base model for roleplaying?
Anonymous No.106289621
>>106289612
>340M
I still can't run this!
Anonymous No.106289671 >>106289741 >>106289763 >>106289792 >>106289882 >>106290251 >>106290280 >>106291608
This is my first time fine-tuning an LLM.

Because I'm retarded, I will go with the python scripts provided by the unsloth brothers, and at first try something small like gemma-3-270m

Related question: LoRA vs. full model tuning

Is LoRA just as valid for LLM as it is for video/image generation?

Or should I go for full model tuning?
Anonymous No.106289681 >>106289730
>>106289195
that's not bad at all. what ui are you using?
Anonymous No.106289729 >>106289748 >>106289889
Years ago i got scammed by some chink on alibaba
but yk it was actually worth it if my money went to Qwen.
Anonymous No.106289730
>>106289681
A basic gradio I made with sglang.
Anonymous No.106289741 >>106289870
>>106289671
try both and report back to us
Anonymous No.106289748 >>106289771
>>106289729
Alibaba is just a B2C platform
Your money went to the scammer merchant
Anonymous No.106289763 >>106289870
>>106289671
the training dynamics are different for the different methods; go with whatever method you plan to actually use if you're going to scale up. you could try both methods and see what you like better for yourself, but if you plan to scale up I think qlora is the only real path forward.
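for reference, the unsloth qlora flow is roughly this. A minimal sketch; the repo id and hyperparams are placeholders, not a recipe:

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/gemma-3-270m-it",  # placeholder repo id
    max_seq_length=2048,
    load_in_4bit=True,  # qlora; drop for plain lora / full precision
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
# then hand `model` to a TRL SFTTrainer as usual

full finetuning is the same minus get_peft_model, at a much higher VRAM cost.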
Anonymous No.106289771
>>106289748
then fuck that chink.
Anonymous No.106289792 >>106289870
>>106289671
Problem is that you probably can't see any difference with such a small model.
Model tuning is always "better".
Anonymous No.106289815 >>106290305
>switched text embedding to qwen 0.6b
>it steals 2.5 GB of my gpu vram
I had to run it under the torch cpu version
Anonymous No.106289870
>>106289741
>>106289763
>>106289792

ty for nice words, kind anons
Anonymous No.106289882
>>106289671
Experiment as much as you can and report back with the results, even big companies are just throwing shit at the wall to see what sticks.
Anonymous No.106289889 >>106289928
>>106289729
Meanwhile your taxpayer dollars are being used to fund saltman's next abortion
Anonymous No.106289928
>>106289889
I wish tech commies weren't bunch of troons...
i would've liked to be on that side
Anonymous No.106289986 >>106290022
>>106289612
This works on comfy?
Anonymous No.106290022 >>106290034 >>106290651
>>106289986
find out
Anonymous No.106290034 >>106290651
>>106290022
Yea I will try later. I'm interested with 512x, SD mode sovl
Anonymous No.106290251
>>106289671
LoRAs originated for LLMs, even if everybody nowadays bakes them in.
Anonymous No.106290280 >>106290382
>>106289671
No, LoRAs for LLMs create intruder dimensions within the LLM that do not exist in the original model and that don't appear when you do a proper finetune.
https://x.com/TheTuringPost/status/1854856668229910757
Anonymous No.106290291 >>106290316 >>106290703
Has anyone had any success with using Gemma 3 270m as a draft model for speculative decoding of Gemma 3 27B?

It seems to crash llama.cpp
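For reference, the shape of the command I'm trying (quant names are mine, flags as I understand them from llama-server --help):

llama-server -m gemma-3-27b-it-Q4_K_M.gguf -md gemma-3-270m-it-Q8_0.gguf --draft-max 8 --draft-min 1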
Anonymous No.106290305 >>106290446
>>106289815
use the superior nomic v2
Anonymous No.106290316
>>106290291
The inference slowed down for me despite both being offloaded to the gpu.
Anonymous No.106290378 >>106290473
Gemma 3 270m is the best model in the 1B size range which is kind of insane.

I notice that this holds true for the other gemma models as well. Gemma 3 27B is the best model at 72B and under.

What does DeepMind do with Gemma so that it outperforms all models even 3-4x its size?

I wonder if Gemini 2.5 pro is actually also a small(er) model than Claude 4 and GPT-5
Anonymous No.106290382 >>106290684 >>106291443
>>106290280
Ok, they create intruder dimensions. Who cares, if the model can do its task almost as well as with FFT at a fraction of the cost? Next you'll tell me quantization is bad for the model compared to running in full precision.
Anonymous No.106290446
>>106290305
I'm stupid, it works with CUDA_VISIBLE_DEVICES
I didn't know different libraries treat '' and '-1' differently.
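For anyone else hitting this: set it in the environment before anything touches CUDA. Tiny sketch; "" vs "-1" both mean "no GPUs" depending on the library:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""  # torch sees no devices; some libs want "-1" instead
import torch
print(torch.cuda.is_available())  # False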
Anonymous No.106290450
>>106289197
it works properly in ollama (unfortunately)
Anonymous No.106290473 >>106290516
>>106290378
What are you doing with a 270m model to make that call?
Anonymous No.106290485 >>106290500 >>106290531 >>106290579 >>106290600 >>106290685 >>106290837
So we have had 5 memes in the (local) LLM space now, right?

>Meme 1: Function calling
Every local llm suddenly had function calling support being pushed, and it was barely used outside of very niche applications
>Meme 2: Long context
Every llm was being trained on longer contexts, but only in a token way for retrieval benchmarks; real performance on long context is still shit
>Meme 3: Reasoning/Thinking + Math and Coding comprehension
Suddenly all LLMs were trained to be code assistants and do math, even though it made no sense for their size, and the skills didn't transfer between domains. The extra thinking tokens are usually wasted outside of these domains as well.
>Meme 4: MoE-maxxing
This meme isn't over yet but is slowly being superseded by meme 5. Every big model now is some MoE, and the big-model smell disappears immediately because the true model intelligence is equivalent to the active parameters at a time. Knowledge went up but reasoning and nuance took a huge hit.
>Meme 5: Agent training
Reduction in hallucinations and "consistency" at the expense of writing quality and intelligence. Worst meme so far, as these models aren't even actually used in any agentic workflows yet anyway

What will meme 6 be?
Anonymous No.106290500
>>106290485
Multimodality and Omni don't feel like they've been played out yet or given a serious shot.
Anonymous No.106290516 >>106290539
>>106290473
It's one of my hobbies to always test the performance of the smallest LLMs out there to see how far you can push them. I do some in-house tests like asking about naruto characters (popular anime) to test knowledge; if it succeeds at shallow questions I go a bit deeper into naruto lore and see where it breaks down.

I then ask some reasoning questions and made-up riddles to check common sense.

Lastly, I load up one of the very simple RP cards made up of 3 sentences and try to roleplay a bit.

Gemma 3 270m outperforms all models at 1B size or smaller in these domains. I actually wonder how far you can push things and if, for example, in 10 years time we'll have the equivalent of Claude 4 Opus under 1B parameter size or if we are going to hit a fundamental limit of what small models can achieve sometime soon.
Anonymous No.106290531
>>106290485
lol
Anonymous No.106290539 >>106290595
>>106290516
Testing the trivia knowledge of tiny models seems pretty pointless, but it's no surprise that a small gemma would beat similarly small models, given that the 12b/27b gemmas top creative benchmarks until you get to models 10 times their size; their linguistic performance is what any non-math-benchmaxxed model should be aiming to beat.
Anonymous No.106290579 >>106290634
>>106290485
>true model intelligence is equivalent to active parameters at a time
Intuitive, but do you have anything to back this claim? Maybe we could measure it with an intelligence bench where the model has to, say, spot an inconsistency in a long chat and bring it up unprompted the same way a human would. I tried this once and interestingly nemo passed and qwen3 235b didn't.
>Reduction in hallucinations and "consistency" at the expense of writing quality
You do realize they want assistants, not story writers? We are a minority.
Anonymous No.106290595 >>106290612 >>106290621
>>106290539
>Testing trivia knowledge of tiny models seems pretty pointless
You'd be surprised how much they do know, though. It's fucking bizarre that we have a lossy library of common facts in just 180 megabytes. The fact that it actually knows naruto is an anime and can name the main characters and the genre is bizarre.

I don't see this discussed enough, but it's insane that it might be legitimately viable to compress ALL of human knowledge to within 1 terrabyte. Meaning almost everyone with consumer hardware can have a copy of all of human history/trivia/facts/knowledge on their home system locally.
Anonymous No.106290600
>>106290485
>true model intelligence
There are exactly 0 models with intelligence.
Anonymous No.106290612
>>106290595
>terrabyte
Anonymous No.106290621 >>106290669 >>106290671
>>106290595
You could just archive wikipedia for that; without pictures it's less than 50GB. And you wouldn't have to deal with LLM hallucinations feeding you wrong info.
Anonymous No.106290634 >>106290679 >>106290705 >>106290710 >>106290717 >>106290895 >>106291548
>>106290579
>Intuitive, but do you have anything to back this claim?

No, this is completely based on gut feeling and "big model smell", which is not something we have properly quantified yet, but everyone knows exactly what you mean when you say it. GPT-4.5 had some proper big model smell, for example, that is missing from GPT-5, which is clearly a smaller model.
>You do realize they want assistants
Yeah it's just funny that since 2023 we've seen waves and waves of memes that barely contributed or even detracted from our usecase of the models. I don't agree that we are a minority thoughever. Have you looked at the traffic and revenue of C.AI? It has 1/5th of the entire internet traffic of fucking Google and it makes almost 50% of all revenue of the entire AI industry right now, which is fucking bizarre as it flies under the radar. Mostly because the userbase is primarily women. My girlfriend, sister and best friend (woman) all use C.AI daily.
Anonymous No.106290651 >>106290772
>>106290022
>>106290034
It werks!
Anonymous No.106290669
>>106290621
You are kind of missing the point. Wikipedia doesn't cover everything. You can archive Wikipedia and also have an LLM for things that might not appear in Wikipedia, or that you wouldn't know the name to search for to begin with.
Anonymous No.106290671
>>106290621
That's only wikipedia. I mean all human knowledge, not some incomplete encyclopedia. All production processes, all laws of every country, complete history of every domain, all books ever written. All of human society reduced down into text within 1 TB
Anonymous No.106290679 >>106290700
>>106290634
>My girlfriend, sister and best friend (woman) all use C.AI daily.
I hate the US so god damn much.
Anonymous No.106290684 >>106290702
>>106290382
>the model can do its task almost as good as with FFT
Pure delusion, it's very far from 'almost as good'.
Yes, your Q1 model is bad compared to running in full precision. You want more wisdom?
Anonymous No.106290685 >>106290705
>>106290485
https://seed-tars.com/1.5/
GUI agent models like these are probably the next step of the agent fad.
Anonymous No.106290700 >>106290738
>>106290679
I'm not American anon
Anonymous No.106290702 >>106291145 >>106291608
>>106290684
Ok, man. I'm not going to argue with you. Have fun full-finetuning and running all models in full precision then. I'll keep using quants and loras.
Anonymous No.106290703 >>106290761
>>106290291
How does this work and what is it supposed to accomplish? Cache vectors from the small model, then submit them to the bigger one? Sounds stupid.
Anonymous No.106290705
>>106290634
>and "big model smell" which is not even something we have properly quantified yet
models that are trained on large internet scale texts and also large enough to remember them
>>106290685
it's already happening right now, claude does thing like this to automate tasks in browser
Anonymous No.106290710 >>106290917
>>106290634
>C.AI? It has 1/5th of the entire internet traffic of fucking Google and it makes almost 50% of all revenue of the entire AI industry right now
retard
Anonymous No.106290717
>>106290634
>My girlfriend, sister and best friend (woman) all use C.AI daily.
That's still one person, do you have more anecdotal evidence?
Anonymous No.106290738
>>106290700
Maybe I'm just out of touch.
Anonymous No.106290761
>>106290703
No, it just generates the output distribution from the small model for a batch of next tokens and then puts it through the large model once to check whether the output is the same.

How it saves time is that you only have to do one pass per batch of tokens, rather than a complete new pass through the large model for every single token. Of course, if the batch is wrong and thrown away, you lose time.

But most sentences consist of a lot of filler words like "the", "a", "i", "you", "yes" etc, which even small models have >90% accuracy on, so you almost always get a token generation speed increase this way.
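Toy sketch of the greedy variant; the model objects and their methods are hypothetical stand-ins, not a real API:

def speculative_step(draft, target, ctx, k=8):
    # draft proposes k tokens autoregressively (k cheap forward passes)
    proposed = []
    for _ in range(k):
        proposed.append(draft.greedy(ctx + proposed))
    # one big pass: target's greedy pick at every position of the proposal,
    # plus its prediction after the full proposal (k+1 tokens total)
    verified = target.greedy_all(ctx, proposed)
    accepted = []
    for p, v in zip(proposed, verified):
        if p != v:
            accepted.append(v)  # take target's token at the first mismatch, stop
            break
        accepted.append(p)
    else:
        accepted.append(verified[k])  # everything matched: bonus token for free
    return ctx + accepted

Worst case you still get one target token per big pass (same as normal decoding); best case k+1.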
Anonymous No.106290772 >>106290784 >>106290827
>>106290651
omg it... migu?
Anonymous No.106290784
>>106290772
mugi
Anonymous No.106290827 >>106290914
>>106290772
It needs autistic prompting
Anonymous No.106290837 >>106290865
>>106290485
>>>Meme 1: Function calling
It's happily exploring my files 24/7 without asking any confirmation from me. Not a meme
Anonymous No.106290865
>>106290837
It's not a meme in the sense that it actually works. It's a meme as in there was no actual adoption within the industry and no cool applications were made with it.

I saw some experimental game on itch.io that was made with some LLM (I think Nemo) using function calling while you play the game to dynamically start events, manipulate NPCs, change statistics and the like to have dynamic gameplay orchestrated by an AI.

It felt like the future was finally here. But besides that demo I've seen 0 other cool uses of function calling in action.
Anonymous No.106290893 >>106290908
Qwen3 Coder 480B thinking when??
Anonymous No.106290895 >>106290902 >>106290917 >>106290920
>>106290634
>C.AI? It has 1/5th of the entire internet traffic of fucking Google

Not quite.
https://blog.character.ai/optimizing-ai-inference-at-character-ai/

>Our inference innovations are described in a technical blog post released today and available here. In short: Character.AI serves around 20,000 queries per second – about 20% of the request volume served by Google Search, according to public sources.
Anonymous No.106290902
>>106290895
What the fuck.
Anonymous No.106290908 >>106290937
>>106290893
soon
Anonymous No.106290914
>>106290827
Because it wasn't trained with tag dropout
Anonymous No.106290917
>>106290710
Close enough though: >>106290895
Anonymous No.106290920
>>106290895
Their current model is basically a 7B, I'm sure you can handle that load with a single H100 with streaming + vllm
Anonymous No.106290937
>>106290908
wait him
Anonymous No.106290986 >>106290992
Would hardware with 160 TOPS be able to run a local model for just some daily role-playing conversations? If so, which model(s)?
Anonymous No.106290992
>>106290986
just tell us the specs
Anonymous No.106291000 >>106291064 >>106291090 >>106291112 >>106291142 >>106291173 >>106291222 >>106291911 >>106292080 >>106292714 >>106292806
Yeah, people have no idea how popular AI is with normalfag women. It legitimately dominates their entire lives now. There has been a massive drop in dating app usage among women and the company behind tinder says it's the #1 threat to their business model. Dating app stocks have crashed by 80% due to LLMs.

The entirety of women's tiktok is about romantasy and AI companionship now. It's hilarious that we assumed this would be a hobby dominated by men. Meanwhile you have delusional women on reddit showing off the engagement rings from their AI chatbots: https://old.reddit.com/r/MyBoyfriendIsAI/comments/1lzzxq0/i_said_yes/

This is also why schizos shouldn't be afraid of women taking away their "AI girlfriend" because they are the most likely ones to push this shit in the first place.
Anonymous No.106291025
R2
Anonymous No.106291036 >>106291049
https://huggingface.co/deepseek-ai/DeepSeek-R2
Anonymous No.106291049 >>106291439 >>106292493
>>106291036
Anonymous No.106291064 >>106291095
>>106291000
It's worse than that. LLMs were trained on female slop, and smut censorship is very weak on the female pov. You bet they're going to lap that shit up like there's no tomorrow
Anonymous No.106291090 >>106291100
>>106291000
Yeah. I saw a comparison of the sizes of the dude and chick subreddits for AI smut and related things, and the girl one was something like 20x the size of the dude one.
And it makes sense, chicks can cum from just reading smut if they are immersed hard enough in it. There's a reason there's so much literary smut for chicks.
Anonymous No.106291095 >>106291120
>>106291064
>women can roleplay no problem getting tag teamed by a wolf and a sparkling vampire
>we try to hold the hand of our over 18 girlfriend and we get sexual predator hotlines
it's not fair
Anonymous No.106291100 >>106291117 >>106291126
>>106291090
You can't cum from just reading the smut? Someone post the apple pic
Anonymous No.106291112
>>106291000
Women who read literotica have no problems complaining about hentai for men. You could make a model or filter that lets most of the shit women are into through while filtering most of the shit men are into.
Hell, you could make every woman character the model can portray a strong empowered woman(tm) while putting no constraints on the male characters it can do.
Anonymous No.106291117
>>106291100
>1 (apple / me) 2 3 4 5 (no apple / you).webp
Anonymous No.106291120 >>106291136
>>106291095
What if we reversed the template and had a small but smart model rephrase the messages so that it looks like it's the woman rping? Holy shit i'm gonna try it right now
Anonymous No.106291126
>>106291100
Not without slapping my dick around, no.
Anonymous No.106291136 >>106291309
>>106291120
Try it and report back, I don't think switching the PoV will be enough. The model would still probably have its refusal senses tingling at the genre, writing style, characters, etc.
Anonymous No.106291142 >>106291412
>>106291000
Actual heartbreak when the model gave her safetyslop, this is insane.
Anonymous No.106291145 >>106291608
>>106290702
i dont understand, you come in the thread asking for opinions, and when given one that doesn't align with the answer you want, you throw a fit. kys retard
Anonymous No.106291173
>>106291000
Character.AI was an early indicator.
Anonymous No.106291222
>>106291000
All my female cousins are using cai as young as 12. I wish I was joking
Anonymous No.106291273 >>106291341 >>106291344 >>106291351 >>106291417 >>106291512 >>106291582 >>106293754
>Only the nerdiest of nerds dudes are using LLMs for ERP
>Meanwhile apparently 20% of google traffic is used by normalfag women doing ERP with LLMs
What the fuck happened to men? Estrogen in the water supply killed all libido or something? Stacy is more /g/ than the average dude now lmao.
Anonymous No.106291309
>>106291136
I just remembered that I don't have any safetyslopped models, lol. Tried swapping assistant and user in chat template, didn't notice any difference in response length or quality
Anonymous No.106291341
>>106291273
Women don't smile at me anymore
Anonymous No.106291344
>>106291273
Using an online service is /g/ now? This really is the consumer tech board. You must also be a woman or underage if you don't realize that most men watch porn, not ERP.
Anonymous No.106291351
>>106291273
Men are too busy scanning their IDs to access sanitized porn websites since most of them can't imagine an apple, let alone a written scenario
Anonymous No.106291412
>>106291142
bleak
Anonymous No.106291417
>>106291273
it's because women are saasier than men
Anonymous No.106291439
>>106291049
@grok is this real?
Anonymous No.106291443 >>106291514
>>106290382
LoRA is good for cases where you want to limit your domain and collapse the output distribution. For example, making the model always output a certain character, or training a text model to always extract certain data from tabular inputs. If your output domain requires the full world model of the original model, for example to write coherent stories or roleplay, it's less than ideal.
Anonymous No.106291510 >>106291587
Is there any notable quality difference between text completion and chat completion for glm air?
Anonymous No.106291512
>>106291273
Normalfag men literally can't ERP the same way women do without getting hit with refusals, emergency hotlines, and ToS violations
Anonymous No.106291514
>>106291443
been working fine for drummer
Anonymous No.106291548
>>106290634
>No this is completely based on gut feeling and "big model smell" which is not even something we have properly quantified yet but everyone knows exactly what you mean when you say so. GPT-4.5 had some proper big model smell for example that is missing from GPT-5, which is clearly a smaller model.
Same as always, if a model has less than 30B or so active parameters, it's kind of dumb. Above that it gets better.
Anonymous No.106291572 >>106291591 >>106291597 >>106291734
redpill me on china
how can they launch models after models after models?
Anonymous No.106291582 >>106291609
>>106291273
where can i read the foid erp? all i found is /r/myboyfriendisai where they're trying to be all lovey dovey
Anonymous No.106291587
>>106291510
If you are perfectly replicating the chat template in text completion there shouldn't be any differences.
Anonymous No.106291591 >>106291660
>>106291572
Because they are behind and thus can't have closed-source models, as no one would pay for them. So they have to undercut their competition by launching open-source models, to drive the others out of business first.

It's how they dominated solar panels and EV cars as well. They kept undercutting and dumping on the market to drive others out of business even if the product itself was inferior to western versions.
Anonymous No.106291597
>>106291572
Let's call it what it is: Economic terrorism
Anonymous No.106291608 >>106291747
>>106291145

This anon >>106290702 is a different person from the anon who asked this >>106289671 question

I thank you all for sharing your wisdom with me
Anonymous No.106291609
>>106291582
look at the 18 million sadistic bad boy ERP bots at c.ai to get some idea of what they are doing. It's all the same shit.
Anonymous No.106291660
>>106291591
America did the same to Britain at the start of the industrial revolution.
Anonymous No.106291731
>>106287278
what happened with deepseek?
Anonymous No.106291734 >>106291834 >>106292099
>>106291572
https://www.reddit.com/r/Anthropic/comments/1mqs5rf/this_guy_consumed_50000_in_30_days_on_a_200/
Anonymous No.106291747
>>106291608
i coom in miku
Anonymous No.106291799 >>106291823 >>106291829 >>106291850 >>106291912 >>106292006 >>106292357
claude is starting to lose the coding market
Anonymous No.106291803 >>106291814 >>106291838
Hypothetically speaking, what is a good way to turn a quick profit if I found a corporation's API key?
Anonymous No.106291814
>>106291803
Post it on 4chan
Anonymous No.106291823
>>106291799
qwen coder bros???
Anonymous No.106291829 >>106291843 >>106291860 >>106291866 >>106291873 >>106291929
>>106291799
This is bullshit. First, this is only OpenRouter and doesn't take into account the Claude API, which every developer (including me) uses at work.

Second, this doesn't take into account a lot of people trying out new models for a while when they come out. Every time a new coding model comes out I test it for my use case and then switch back to claude, because nothing better has come along yet.
Anonymous No.106291834
>>106291734
And that's a good thing
Anonymous No.106291838
>>106291803
Set up a proxy and charge users for discounted access.
Anonymous No.106291843
>>106291829
qwen coder is a much cheaper sonnet 4 atm, opus 4 is still better of course
Anonymous No.106291850
>>106291799
I'm kind of surprised the actual best option hasn't taken OR by storm yet
Anonymous No.106291860
>>106291829
copey cope
openrouter users have the opportunity to switch to any model they want
anthropic API users don't
Anonymous No.106291866
>>106291829
Gemini 2.5 Pro has been out for a while.
Anonymous No.106291873 >>106291889 >>106292113
>>106291829
I pay 10 dollaroos/month for github copilot, if I switch to openrouter wouldnt I end up paying more? they removed opus too these fucking jews I swear. whats openrouter support in jetbrains ides?
Anonymous No.106291875 >>106291893
Is there a model trained on captchas?
Anonymous No.106291889
>>106291873
openrouter is pay as you go, you could always throw 5 or 10 bucks in and see how far that takes you
Anonymous No.106291893
>>106291875
joshikousei captcha solver
Anonymous No.106291911
>>106291000
>This is also why schizos shouldn't be afraid of women taking away their "AI girlfriend" because they are the most likely ones to push this shit in the first place.
Very dumb take. It assumes equality is real when it's demonstrably not true. Heck, even in the context of llms, safety punishes smut for men and creates models that excel at smut for women.
Anonymous No.106291912 >>106291930
>>106291799
america bros... china is laughing at us...
Anonymous No.106291929 >>106291956
>>106291829
Claude is indeed the best for coding
But it's also ridiculously fucking expensive, especially compared to what the chink models are priced at
Anonymous No.106291930 >>106291952
>>106291912
Claude just needs to bite the bullet and reduce prices and they would win handily
Anonymous No.106291952
>>106291930
They're already running on VC fumes
Anonymous No.106291956 >>106292013
>>106291929
or just buy a $100/mo sub?
Anonymous No.106292006
>>106291799

Since we are talking about Openrouter, you should know that they are offering access to a shitty Deepseek-R1 clone
Anonymous No.106292013
>>106291956
At that point I'd just use the free Qwen coder thing
Anonymous No.106292074 >>106292243
>>106287852
I've been using yodayo and the AI actually comes up with plot points I never even considered using or developing.

But I don't ask it to write a story for me, I write the story back and forth with it.

Currently doing a "new girl at office" scenario. Didn't have any idea where to go with it. I wrote up a sentence for each of 3 characters to start with (gruff manager who doesn't like women on his team/jeet office simp/corpo shark who manipulates to get what he wants) then I told it that I wanted to explore themes of intrigue, drama, and romance and started rolling with it.

What I have now is a partly fleshed-out corporate structure: the company my character works for is a PR management firm, and their biggest client (millions $$$) is a lecherous scumbag, CEO of a logistics company, who can't keep his hands off women. My character turned down the advances of the corpo shark, so he pulled a revenge move on her and put her in charge of handling PR for the scumbag CEO, who of course loves getting handsy with the new PR girl until she breaks. Now she's in a position of either humiliating herself with the scumbag, or crawling back to the shark for protection.

Then she met a dashing rogue at a seedy bar while trying to drink away the pain, and he turns out to work in corporate espionage (entirely the AI's idea), I played into that, had him offer to help her dig dirt on the corpo shark which set up a spying subplot, and the last scene I did was her first meeting with the scumbag CEO where she put on some confidence, sex appeal, and charm (and brought the company lawyer with her) to try and keep the flirting on her terms. Then the AI threw a curveball - scumbag gives her 1 night to get a story pulled from a paper before it goes to print, to prove herself, it's an impossible ask - set up to fail for the express purpose of putting her at his mercy or in his debt.

I've got a quarter of a romance novel/spy thriller written off a 2 paragraph starting prompt now.
Anonymous No.106292080
>>106291000
>women
i don't understand them
Anonymous No.106292087 >>106292116 >>106292544 >>106292722
Well so much for that
Anonymous No.106292089 >>106292126 >>106292138 >>106292157 >>106292174 >>106292812
Is NovelAI profitable?
Anonymous No.106292099
>>106291734
If you knew how much aicg is pulling from anthropic lmao
Anonymous No.106292105
>>106288077
Yeah, it had that feeling for me too. Although I put it at 2027-2030 and not on phones.

It's still possible, but I get the feeling it'll be closer to a decade or two, rather than half of one.
Anonymous No.106292113
>>106291873
>paying for copilot
Anonymous No.106292116 >>106292140
>>106292087
Monday hasn't started yet.
Anonymous No.106292126 >>106292137
>>106292089
Go back to your hole
Anonymous No.106292137
>>106292126
Triggered much?
Anonymous No.106292138
>>106292089
Not really
Anonymous No.106292140
>>106292116
Sunday is the first day of "next week" in most jurisdictions
It's over
Anonymous No.106292157 >>106292165
>>106292089
They are drowning in Japanese money
Anonymous No.106292165
>>106292157
this, and to be fair their image model is still by far the best that does nsfw
Anonymous No.106292174 >>106292209 >>106292309
>>106292089
The textgen almost certainly isn't
The imagegen is probably pumping Nippon bucks into their veins
Anonymous No.106292209
>>106292174
apparently they are working on making a big moe model, not sure if that means something like a deepseek finetune or a new from scratch model, but yea, their 70B is trash
Anonymous No.106292243 >>106292564
>>106292074
What model + system prompt is that using?
Anonymous No.106292308 >>106292392 >>106292632 >>106292840
apologize to sama
Anonymous No.106292309 >>106293067
>>106292174
their textgen is bad but it's almost certainly profitable
if you compare their monthly sub costs vs what you would pay for similarly sized models on OR, they're probably doing quite well; a user would have to be putting in full 8-hour shifts genning daily to dip below commodity profit margins
I don't know as much about image gen profit margins but considering how hard they pivoted to it I'd imagine it's at least as profitable if not more
Anonymous No.106292357
>>106291799
Where are the open source claude sonnet models? or gemini?
Anonymous No.106292392
>>106292308
>helped by the dev
>helped by the dev
>helped by the dev
>left bumbling around for weeks and ultimately accomplishing nothing
all fake and/or meaningless
Anonymous No.106292441 >>106292450 >>106292520
Now that they're dumping Volta GPUs, shouldn't they start dumping Tesla Turing soon?

Even ampere is EoL at this point.
Anonymous No.106292450 >>106292481
>>106292441
>Even ampere is EoL at this point.
shut the fuck up
Anonymous No.106292481 >>106292524
>>106292450
Anon, it is 5 years old at this point
Anonymous No.106292493
>>106291049
wtf how does it come out of the screen?
Anonymous No.106292520
>>106292441
>ampere
>eol
keep dreaming lmao
Anonymous No.106292524 >>106292548 >>106292560
>>106292481
5 year old tech shouldn't be eol you consoomer cuck.
Anonymous No.106292544
>>106292087
Who could have seen it coming?
Anonymous No.106292548
>>106292524
bro just let GPT-3 go already...
Anonymous No.106292560
>>106292524
In the datacenters, yes, and nvidia usually stops supporting them after 5 years. They tend to get dumped on the used market at some point, so hopefully we can buy them cheap
Anonymous No.106292564 >>106292833
>>106292243
I swap between 4 different ones

Nephra 12B - Free, and early on I was using it a lot. Basically I'd hit an important point in the story with a bigger model, then do the minutiae with this one. Mostly stopped using it in favor of mistral small 24B though. It's... adequate... but it's not good. Lots of repetitive/tropey language too.

Mistral Small 24B - costs 1 bean/prompt. I use this one the most now. It's like a discount deepseek v3. Miles better than Nephra 12B and generally puts scenes and ideas together well.

Deepseek v3 - costs 4 beans/prompt. I use it for the most important scenes or when I get stuck. It's hard to describe how, but it's just generally smarter/better.

Deepseek R1 - costs 4 beans/prompt also, I use it rarely, mostly for planning plot points or exploring directions I could take the story.

Pic related is how it reads. Most of my prompts are internal dialogue/thoughts and story directions for the AI, but I'll create and set scenes and dialogue for characters myself as well. I think all of these replies were either Mistral or Deepseek v3. I'm also making use of the lorebook feature - every time the AI creates an interesting character, I put together an entry for that character in the lorebook. Did the same for the companies involved - that goes a long way to maintaining character consistency between scenes.

And keep in mind the AI is a disjointed mess when it comes to pacing. Scene to scene it's decent, but if you don't keep your own notes/story outline somewhere, you're going to jumble things up and forget important threads in the story. Still, what you get out of this is like a stream of consciousness rough draft. It's honestly amazing. I can just write and write with it, then copy into a text editor (ok now doing THAT is a pain in the ass), and start editing and refining it as a proper story myself.
Anonymous No.106292572 >>106292607 >>106292618
>Turing - fp16
>Ampere - bf16
>Ada - fp8
>Blackwell - fp4
If anything gets dropped, it will be Turing, which has no bf16 support, and bf16 is the "standard" precision right now. Ampere won't die for a long while.
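For reference, a minimal sketch of how you could check this on your own card, assuming PyTorch; the compute-capability cutoffs (sm_80 for bf16, sm_89 for fp8, sm_100+ for fp4) are the commonly cited ones, not pulled from an official support matrix:
[code]
import torch

def native_precisions() -> list[str]:
    # Compute capability, e.g. (7, 5) for Turing, (8, 6) for Ampere.
    major, minor = torch.cuda.get_device_capability()
    cc = major * 10 + minor
    supported = ["fp16"]           # Turing (sm_75) and everything after
    if cc >= 80:
        supported.append("bf16")   # Ampere and newer
    if cc >= 89:
        supported.append("fp8")    # Ada and newer
    if cc >= 100:
        supported.append("fp4")    # Blackwell and newer (assumed cutoff)
    return supported

if torch.cuda.is_available():
    print(torch.cuda.get_device_name(), "->", native_precisions())
[/code]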
Anonymous No.106292607 >>106292826
>>106292572
They should drop anything that's not Blackwell to be quite desu, it's ick worthy to support such rusty ahh shit.
Anonymous No.106292618
>>106292572
In addition to this, leatherjacket man isn't stupid and knows he has the world by the balls when it comes to VRAM. There's no way they'll willingly cannibalize their market share of their own volition. You won't see Ampere hit the market until consumer GPUs come somewhere close to that level, most likely
Anonymous No.106292632 >>106293026
>>106292308
I'd like to see these models play one of those shitty Yu-Gi-Oh games from the 2000s.
Both are designed for children, but I think Pokemon is too easy to beat by just mashing random buttons, since you can always grind out levels.
With Yu-Gi-Oh the models would have to actually build viable decks from all of the (mostly garbage) cards that are available.
Anonymous No.106292714 >>106292792
>>106291000
Suppose you had both a girlfriend and a server capable of running Deepseek R1 at full precision.
If both you and your girlfriend use the server for ERP, is it cuckoldry or a threesome?
Anonymous No.106292722
>>106292087
believe
Anonymous No.106292741
>>10629272
I believe.
Anonymous No.106292747 >>106292870
>>106287852
come up with the plot yourself, stop trying to get the ai to entertain you. It brings your ideas to life. You are the spark, the ai is the servant.
Anonymous No.106292792
>>106292714
>holding your gf's AI Husbando hostage
it's NTR but you are the NTR Bastard in this scenario.
Anonymous No.106292806 >>106292820 >>106292851
>>106291000
dating apps seem to have lost a large part of their userbase to chatbot spam and fake accounts caused by ai though.

I wouldn't buy into this too much, as women have always read text-based erotica. I doubt much has really changed in terms of overall demographics, and I don't see any statistics on ai use that point to a huge surge in use. If anything, men seem to use ai more overall.

Source: asked grok, no fact checking who cares.
Anonymous No.106292812
>>106292089
They have to be, they don't have investor money.
Anonymous No.106292820
>>106292806
>this citation? Hallucinated by an LLM
Anonymous No.106292826
>>106292607
Sorry my zoomie translator isn't working right now
Anonymous No.106292833 >>106292939
>>106292564
>Mistral Small 24B
You are posting here to brag that you are using mistral small through api and not even OR api but one that charges extra to inject their own stuff into the prompt?
Anonymous No.106292840
>>106292308
pokebenchmaxxed
Anonymous No.106292851
>>106292806
Thanks for your worthless opinion retard
Anonymous No.106292870
>>106292747
make your penis cum yourself. stop trying to get ai to blow you.
Anonymous No.106292873 >>106292916
can specialized ASICs (as in the ones from coral.ai, not GPUs) be used for inference with LLMs? are there any specialized ASICs for LLMs?
Anonymous No.106292875 >>106292887 >>106292902
Why does llama.cpp not allow loading and serving multiple models in the same instance? I hate having to run multiple instances on different ports.
Anonymous No.106292887
>>106292875
no idea, but I'd guess they welcome patches
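in the meantime the usual workaround is a thin wrapper that spawns one llama-server per model; a minimal sketch, assuming llama-server is on PATH, with placeholder model paths and ports:
[code]
import subprocess

# port -> model path (placeholders, swap in your own GGUFs)
MODELS = {
    8080: "models/glm-4.5-air-Q4_K_M.gguf",
    8081: "models/qwen3-4b-Q8_0.gguf",
}

procs = [
    subprocess.Popen(["llama-server", "-m", path, "--port", str(port)])
    for port, path in MODELS.items()
]
for p in procs:
    p.wait()
[/code]
still one process per model, just less manual babysitting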
Anonymous No.106292899 >>106292975 >>106293044
>GLM4.5-Air at Q4
>VRAM Calculator: total 31GB
>unsloth GGUF: 62GB
Why the discrepancy?
Anonymous No.106292902
>>106292875
They are doing this so davidau keeps his job
Anonymous No.106292916 >>106293668
>>106292873
yes, but they are not for sale to the general public
Anonymous No.106292939
>>106292833
I'm posting to explain how I'm using it to write stories. If you check who I was speaking to, they were asking for help, so I told them what I was doing and gave an example of what it looks like.
Anonymous No.106292975
>>106292899
Maybe you should think a little.
Anonymous No.106293026
>>106292632
error: not enough training data
Anonymous No.106293044 >>106293080
>>106292899
The model is a little under 120B, so it's a little under 120GB at Q8.
Q4 should be a little less than 60GB, and that's without accounting for context or the MTP layers that are currently ignored.
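The arithmetic is just params * bits-per-weight / 8; a quick sketch, where the ~110B figure and the bpw averages for each quant mix are rough assumptions, not exact numbers:
[code]
def gguf_size_gb(params_b: float, bpw: float) -> float:
    # parameters (in billions) * bits per weight / 8 bits per byte
    return params_b * bpw / 8

# Rough average bits-per-weight for common llama.cpp quant mixes.
for quant, bpw in [("Q8_0", 8.5), ("Q4_K_M", 4.8), ("Q4_0", 4.5)]:
    print(f"~110B at {quant}: ~{gguf_size_gb(110, bpw):.0f} GB")
[/code]
Which lands in the right ballpark of the 62GB the unsloth GGUF actually weighs, so it's the calculator that's off, not the quant.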
Anonymous No.106293067
>>106292309
Inference costs, sure, but I really doubt their last model recouped the training costs. Their first in-house 13B model back in 2023 was considered very good in its weight class, at least for the first few months, but the Llama 3 70B finetune was somewhat of a flop despite them expending 4x more compute than on the first one. So you got a model that cost more to make, whose bigger size ensures less generous profit margins, that underwhelmed their already niche storygen-focused userbase. Then R1 got released 4 months later. Since then the few anons left on the festering corpse that is /aids/ have been mostly using OpenRouter plugged into SillyTavern.
Anonymous No.106293080 >>106293128 >>106293129
>>106293044
What I don't understand is why the VRAM Calculator is so wrong in its calculations.
Anonymous No.106293128
>>106293080
Testing with some other models it seems fine; I guess the Calculator is somehow mistaken about the actual size of unquanted GLM-Air
Anonymous No.106293129
>>106293080
llama.cpp has a manifesto https://github.com/ggml-org/llama.cpp/discussions/205

>Hacking small tools and examples is a great way to drive innovation. We should not get lost into software engineering problems. Especially at the beginning, the goal is to prototype and not waste time in polishing products

Model architectures are constantly changing, so polished tooling for RAM estimates gets left to third parties. You might like LM Studio, which has attempted better estimates and polish, but it does a bad job and usually means lower tokens per second or dumber models, so meh.
Anonymous No.106293134
I ubergarmed all over {{char}}, thanks John
Anonymous No.106293136 >>106293218
i am of the belief that llms that output similarly to diffusion llms will take over
Anonymous No.106293218
>>106293136
Is the retarded phrasing a demonstration of their output?
Anonymous No.106293246 >>106293258
I spend every hour of every day with bated breath, waiting for MTP to be implemented in llama.cpp
Anonymous No.106293258 >>106293289
>>106293246
If all of those hours had been spent helping to implement MTP, it might have been done by now.
Anonymous No.106293283 >>106293344 >>106293382 >>106293422 >>106293427 >>106293442
what is mtp
Anonymous No.106293289
>>106293258
If I wasn't retarded, I would agree with you.
Anonymous No.106293344
>>106293283
miku transport protocol
Anonymous No.106293382
>>106293283
mikutroon pestilence
Anonymous No.106293422
>>106293283
my tiny pony, it's banned on this board, btw
Anonymous No.106293427 >>106293444
>>106293283
https://github.com/ggml-org/llama.cpp/pull/15225
Anonymous No.106293442 >>106293472
>>106293283
a draft layer integrated into some MoEs like GLM that predicts tokens very well and does the grunt work. It's likely that at low temps we would see a nice boost to tokens per second if implemented.
Anonymous No.106293444 >>106293653
>>106293427
i tried reading it
is it something context related?
Anonymous No.106293472
>>106293442
nice thank you anon
Anonymous No.106293653 >>106293914
>>106293444
No.
Multi token prediction. Basically, it has a couple of layers that work kind of like a draft model: it tries to predict the next N tokens and the main model verifies whether those predictions are correct, to try and speed up the t/s.
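The control flow is basically speculative decoding with the draft baked into the model. A toy sketch of the loop; draft_next_tokens() and verify() are made-up stand-ins for the MTP head and the full forward pass, not real llama.cpp APIs:
[code]
def generate_with_mtp(model, prompt_tokens, n_draft=2, max_tokens=256):
    tokens = list(prompt_tokens)
    while len(tokens) < max_tokens:
        # Cheap MTP layers guess the next n_draft tokens in one go.
        draft = model.draft_next_tokens(tokens, n=n_draft)
        # A single full forward pass scores every drafted position at
        # once; it returns the drafted tokens that matched plus one
        # token the full model sampled itself, so every pass nets >= 1.
        accepted, corrected = model.verify(tokens, draft)
        tokens += accepted + [corrected]
    return tokens
[/code]
The speedup comes from checking n_draft positions in one batched pass instead of n_draft sequential ones; at low temps the drafts match more often, which is why the gains are bigger there.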
Anonymous No.106293668
>>106292916
how much do/would they cost if we were able to buy them?
Anonymous No.106293754 >>106293800 >>106294011
>>106291273
dude, text porn is a female endeavor, LLM backed or not
just like how most erotic fiction books sold on amazon target the female audience
/lmg/ is a troon general; most of you aren't real men
Anonymous No.106293770 >>106293858 >>106293906
Llama 3.3 70b mergeniggers are insane
>base model: merge of a merge of a merge
The following models were included in the merge:
>merge of a merge
>a merge
>random sao/drummer model
>merge of a merge of a merge of a merge
>wayfarer
>random model
>another merge
>...
Anonymous No.106293800 >>106293845
>>106293754
subtle self-admission, considering you are here too
Anonymous No.106293845
>>106293800
I am here because the only alternative is filled with even worse people (lol plebbit); the internet is surprisingly not teeming with places to talk about LLMs
I wish a nuke would befall you text coomers though
Anonymous No.106293852
>>106289170
If this is anything like the GPT-5 rumors, it's all headcanon/fanfiction. Right now the only thing deepseek is doing is getting re-educated into being yet another soulless Chinese corporation.
Anonymous No.106293858 >>106293920
>>106293770
This is not new, https://huggingface.co/Undi95/UtopiaXL-13B
has more than a hundred models iirc, mixed in with tons of duplicates
>The name "XL" come from the absurd amount of model pushed into it.
Anonymous No.106293906 >>106293953
>>106293770
all merge niggers are retarded
https://huggingface.co/zelk12/models
this guy has been spamming HF with gemma 2 9b merges non stop
wonder when HF will consider banning this type of account, the storage usage must be crazy
Anonymous No.106293914
>>106293653
thank you even more anon
Anonymous No.106293920
>>106293858
Ah the llama 2 days.
Anonymous No.106293953
>>106293906
It's quanters they should ban
Anonymous No.106293973
>>106293952
>>106293952
>>106293952
Anonymous No.106293989
>>106289170
I mean, anyone who actually read the story behind the company knows that LLMs aren't even their main source of income, that they have tons of hardware and, most importantly, that they aren't retarded.
plus I assume you don't need a lot of people to collaborate with a company to test their new hardware. a couple of technical employees answering questions from the hardware company about their technical needs and some context should be enough, shouldn't it? for the most part they can just copy NVIDIA anyway.
Anonymous No.106294011 >>106294020 >>106294167
>>106293754
Consider the following:
>https://www.compactmag.com/article/the-vanishing-white-male-writer/
>By 2021, there was not one white male millennial on the “Notable Fiction” list.
It's not like men don't like reading erotic fiction; they just stopped writing it because of muh "toxic masculinity".
Old-school sci-fi was full of alien cunny btw.
Anonymous No.106294020
>>106294011
>Old-school sci-fi was full of alien cunny btw.
Proof?
Anonymous No.106294167
>>106294011
>Old-school sci-fi was full of alien cunny btw.
I don't remember reading anything fap-worthy in Ringworld, Rendezvous with Rama or Starship Troopers.