
Thread 107056325

347 posts, 122 images
Anonymous No.107056325 [Report] >>107057943 >>107059860 >>107062861
/lmg/ - Local Models General
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107044779 & >>107035841

►News
>(10/30) Qwen3-VL support merged: https://github.com/ggml-org/llama.cpp/pull/16780
>(10/30) Kimi-Linear-48B-A3B released with hybrid linear attention: https://hf.co/moonshotai/Kimi-Linear-48B-A3B-Instruct
>(10/30) Brumby-14B-Base released with Power Retention: https://manifestai.com/articles/release-brumby-14b
>(10/28) NVIDIA-Nemotron-Nano-12B-v2-VL-BF16 released: https://hf.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16
>(10/28) LFM2-ColBERT-350M released: https://hf.co/LiquidAI/LFM2-ColBERT-350M

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous No.107056334 [Report]
►Recent Highlights from the Previous Thread: >>107044779

--Paper: INT vs FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats:
>107048819 >107050729 >107051225 >107051397 >107051579 >107051763 >107051785 >107052024 >107052042
--Kimi Linear release and model size vs performance tradeoffs:
>107052386 >107052523 >107052534 >107052587 >107052868 >107053037 >107053119 >107053253 >107053271 >107053372 >107053399 >107053296 >107052943 >107052960
--Brumby-14B-Base's power retention architecture:
>107053745 >107053782 >107053793 >107053806 >107053815 >107054051 >107054141 >107054191 >107054161 >107054205 >107054237 >107054228
--MiniMax M2's full attention choice due to efficient attention's unmet real-world expectations:
>107055069
--Optimizing VibeVoice-Large-Q8 with selective quantization and performance tweaks:
>107046566 >107046649
--Input text recovery from hidden states:
>107053293 >107053393
--CUDA toolkit installation headaches and alternatives:
>107045283 >107045326 >107045351 >107045445 >107045512 >107045605 >107049390 >107049857
--Mixed experiences and optimization tips for glm 4.6 usage:
>107051344 >107051367 >107052899 >107053125 >107051379 >107051387 >107053864
--GLM-4.6 excels in code planning and tool stability:
>107046842 >107046900 >107046932 >107046939 >107047296
--Evaluating Mamba-based LLMs: context length claims vs practical performance:
>107044925 >107045236 >107045252 >107045278
--Qwen3VL support added to llama.cpp:
>107054671 >107054693
--LLM preference inconsistency under contextual shifts:
>107049878 >107049939 >107049985
--Exploring transformer token prediction theory and Suno AI's limitations:
>107047458 >107048117 >107048175 >107048207 >107048762
--Logs:
>107046612 >107046642 >107048277 >107056280
--Miku (free space):
>107047069 >107049649 >107051768 >107051786 >107053223 >107053796 >107055480

►Recent Highlight Posts from the Previous Thread: >>107044782

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Anonymous No.107056358 [Report] >>107056396 >>107056482
this is /lmg/. please post screenshots of using models locally.
model tested: mradermacher/Qwen3-VL-32B-Thinking-Q6_K.gguf
Anonymous No.107056365 [Report]
i want to show my cock to qwen
Anonymous No.107056396 [Report] >>107056456 >>107056533
>>107056358
this is /lmg/
please post your weenie
Anonymous No.107056456 [Report]
>>107056396
too busy sniffing
Anonymous No.107056482 [Report] >>107056509 >>107056554
>>107056358
I put your screenshot to the test on the smallest version of this Qwen vision model: 2b instruct.
For something that's ridiculously small and fast, it's quite coherent.
>eval time = 3950.30 ms / 383 tokens ( 10.31 ms per token, 96.95 tokens per second)
I think I'm going to use this to tag my personal photo library. It's the sort of usage where you don't give a shit if there's a few unimportant tagging mistakes, but it's convenient to do it fast.
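Sketch of the tagging loop I have in mind (assumes a llama-server started with -m and --mmproj on port 8080; the prompt, folder, and tag format are made up, the endpoint shape is the standard OpenAI-compatible one llama.cpp exposes):

import base64, json, pathlib, urllib.request

def tag_image(path):
    # inline the photo as a data URI, the way the OpenAI-compatible API expects
    b64 = base64.b64encode(pathlib.Path(path).read_bytes()).decode()
    payload = {
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Give 5-10 short comma-separated tags for this photo."},
                {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64," + b64}},
            ],
        }],
        "max_tokens": 64,
    }
    req = urllib.request.Request(
        "http://127.0.0.1:8080/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as r:
        return json.loads(r.read())["choices"][0]["message"]["content"]

for p in sorted(pathlib.Path("photos").glob("*.jpg")):  # placeholder folder
    print(p.name, "->", tag_image(p))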
Anonymous No.107056509 [Report] >>107056541
>>107056482
could you please tell me what it says if you feed it the image directly in a new chat? the image is here
>>107053751
Anonymous No.107056522 [Report]
I do kinda feel hurt now :( the first time I understood, the second time gave me pause, and now this third time I feel kinda :(
I just wish I knew what I did wrong
Anonymous No.107056533 [Report] >>107056592 >>107056733
>>107056396
do we get freebies here for posting vids of drinking pp from our weenie?
Anonymous No.107056538 [Report] >>107056637
whats good that I can run locally (for ERP stuff) on a 4090 and 64GB of ram?
Anonymous No.107056541 [Report] >>107056576
>>107056509
Anonymous No.107056554 [Report]
>>107056482
Something that fast and light could be cool for some kind of use in a video game, I'm thinking some kind of sci-fi game where you actually have an AI companion that can see your screen via periodic snapshotting so it can make comments about your progress or moral choices or whatever.
Anonymous No.107056576 [Report]
>>107056541
not bad at all for a 2B model, qwen cooked pretty good with this one.
Anonymous No.107056583 [Report] >>107056605 >>107056619
>>107050715
Post logs.

>>107055520
There is no OCR attention. It was a footnote about a silly idea that's basically just how the original encoder-decoder transformer already worked (encode one string as a fixed-length vector and use that to generate another string).
Anonymous No.107056592 [Report]
>>107056533
no, you're looking for ecker, he's normally in /aicg/
Anonymous No.107056605 [Report]
>>107056583
>There is no OCR attention
aww
i hope we get v4 for christmas then
Anonymous No.107056619 [Report] >>107056648
>>107056583
I would post logs but I have grown bored of the usual Gemma3 / Mistral. It is inherently about my scenarios and how I have implemented them.
Anonymous No.107056637 [Report]
>>107056538
air
Anonymous No.107056648 [Report] >>107056901
>>107056619
i plan to run qwen 3 vl on one computer and kimi on another computer and alternate between the two models. maybe you can do something similar to breathe creativity back into your logs
Anonymous No.107056696 [Report] >>107056722
poor neru...
Anonymous No.107056722 [Report] >>107056761
>>107056696
Is that the 32B?
Anonymous No.107056733 [Report]
>>107056533
Prompt processing is not for drink.
Anonymous No.107056761 [Report]
>>107056722
yeah, its the Q6 quant that i mentioned above, i rerolled a dozen times and it kept saying it was rin
Anonymous No.107056780 [Report] >>107056872
SERS REDEEM THE BLOODY LESSON ON HOW TO SUCCEED IN VERTICAL AI BASTARD BITCH
https://youtu.be/9CHktrroCDU
Anonymous No.107056872 [Report]
>>107056780
>we, as resellers of API services without any custom infra or ability to host our own finetunes, have evaluated the usefulness of finetuning and determined that it's useless
yawn
Anonymous No.107056887 [Report] >>107056911
IQ4_XS and FP16 mmproj
32b
qwnvl3
onions
Anonymous No.107056901 [Report] >>107056937
>>107056648
I tried to implement a new scenario and was bored with the output before I could edit the text files. I knew how it would end up.
Maybe I should trash my current setup and start over from scratch.
Anonymous No.107056911 [Report]
>>107056887
they omitted the sharty from the training data? monsters.
Anonymous No.107056937 [Report] >>107056982 >>107056988 >>107057122
im starting to believe anyone experiencing ai psychosis is a sub room temperature iq
>>107056901
air time
Anonymous No.107056982 [Report]
>>107056937
I guess it's a time of realizing that I'm a bad writer.
No LLM will overcome that fact.
Anonymous No.107056988 [Report]
>>107056937
>im starting to believe anyone experiencing ai psychosis is a sub room temperature iq
yes, and they would have experienced psychosis even if ai didn't exist
it's just that AI is whatever happened to be in front of them when they went psychotic
but this type of people don't need a SPECIFIC thing to trigger them, they will be automatically triggered by something, it's their destiny
t.calvinist
Anonymous No.107057009 [Report] >>107057036
>directoranon
i thought to make things enable/disable by clicking the label, instead of having to click disable in the list and lose the current index. but none of the current code models seem to know what to do with me importing lorebooks as dynamic settings (ie 'day' which contains sunday, monday, etc doesn't show up unless its read first). not sure how i'll do it, if at all, but i'll keep trying
Anonymous No.107057036 [Report] >>107057083
>>107057009
Stop using external UI. If you want to randomize things you can use ST macro random.
I don't have my old texts but you can create 'quest objective' in introductory message by using <!-- then random table --> and it won't show to the user.
Anonymous No.107057060 [Report] >>107057342 >>107063216
qwen vl is underwhelming, ill post my cock and see what it does
Anonymous No.107057074 [Report] >>107057130
how do i do images with ST?
Anonymous No.107057083 [Report] >>107057101
>>107057036
wut. are you drunk anon? that isnt what my addon is about at all. even though its quite thrown together, its totally inline with all other st addons. dunno where you got randomness, quests and stuff from. my addon is for keeping track of clothes, locations and stuff via lorebook entries. my webm was showing a new way to enable or disable entries without going into the menu and selecting 'disable' but offering a click toggle instead.
Anonymous No.107057101 [Report] >>107057121
>>107057083
Nah, it's just a text injection.
Take it easy, you don't need to protect it, let people use it.
Anonymous No.107057121 [Report] >>107057138 >>107057153
>>107057101
i still don't get what you mean. its not protected. you can see the code
Anonymous No.107057122 [Report]
>>107056937
CAN I EAT IT?!
Anonymous No.107057126 [Report] >>107057164 >>107057275
why would anyone fuck Qwen VL when you have to caption it with the assistantslop and then send the caption to the model
Anonymous No.107057130 [Report] >>107057141
>>107057074
Image Generation built in extension.
Anonymous No.107057138 [Report] >>107057162
>>107057121
Be silent.
Anonymous No.107057141 [Report]
>>107057130
i meant pasting images, thank u still
so i have to wait for it to caption it
why not just use florence-sex-2-large to caption img and feed it into a random model
Anonymous No.107057153 [Report] >>107057162
>>107057121
You are a very cool anon, is it AGPLv3?
Anonymous No.107057162 [Report] >>107057174 >>107057177 >>107057178
>>107057138
you're drunk. or a retarded bot.

>>107057153
it has no license, use any of it how you see fit https://github.com/tomatoesahoy/director
Anonymous No.107057164 [Report]
>>107057126
tavern wasn't really made for this so anything that's not chatting is very rudimentary and stuck on shoestring and cardboard standards from 2023
Anonymous No.107057174 [Report] >>107057240
>>107057162
>it has no license,
grim
Anonymous No.107057177 [Report] >>107057240
>>107057162
>it has no license, use any of it how you see fit
Pretty sure anything without a license defaults to all rights reserved by the creator.
Anonymous No.107057178 [Report]
>>107057162
Are you larping as a reddit moderator?
Anonymous No.107057194 [Report]
drumdrum whyd you do this?
Anonymous No.107057207 [Report] >>107057237 >>107057252 >>107057339
drummer cant you tell us a little about this please?
i liked glm steam, it was a sidegrade to air
while i did remove steam, i wanna try v1c
drumm..
Anonymous No.107057212 [Report] >>107057222
>107057178
this is the reddit and memey guy trying to be funny isn't it?
Anonymous No.107057222 [Report]
>>107057212
You are the reddit moderator who got kicked out from reddit.
Hi all, Drummer here... No.107057237 [Report] >>107057257
>>107057207
Sorry, I signed an NDA.
Won't be long though.
Anonymous No.107057240 [Report]
>>107057174
why would it need one? its a small script

>>107057177
thats me then and anyone can use it for any part. i hope it serves as a good example for reading lorebooks and updating data.
Anonymous No.107057252 [Report]
>>107057207
he'll never give any secrets away here, maybe if you asked on the 'cord...
Anonymous No.107057257 [Report] >>107057264
>>107057237
what the fuck, this better be an anon trolling
what the fuck...
Hi all, Drummer here... No.107057264 [Report]
>>107057257
We would never troll you...
Anonymous No.107057275 [Report]
>>107057126
what?
Anonymous No.107057297 [Report] >>107057451
Anonymous No.107057311 [Report] >>107057340
>pip is perfectly fine just use a venv bro, what are you dumb?
Meanwhile pip looks for three different versions of flash-attn when installing axolotl and there is no sane way of figuring out which version of the binary wheel I would have to install manually to avoid the 2 hour build from source. And then it fails with a 404 looking for God knows what on God knows whose server.
$ cat log.txt | grep flash-attn=
Collecting flash-attn==2.8.2 (from axolotl[deepspeed,flash-attn])
Collecting flash-attn==2.8.0.post2 (from axolotl[deepspeed,flash-attn])
Collecting flash-attn==2.7.4.post1 (from axolotl[deepspeed,flash-attn])

As of yesterday the install was still failing.
If it keeps failing today once it errors out I'll post the error.
Hi all, Drummer here... No.107057339 [Report]
>>107057207
Oh lmao, I couldn't quant it. Fortunately the full weights worked. But v1c was kinda bad.
Anonymous No.107057340 [Report] >>107057414
>>107057311
alternatively have you just tried not being a retard?

flash_attn 2.7.4.post1
torch 2.7.1+cu128
torchaudio 2.7.1+cu128
torchvision 0.22.1+cu128
Anonymous No.107057342 [Report] >>107057419
>>107057060
CDs are saucers.
Anonymous No.107057414 [Report] >>107057429 >>107057509
>>107057340
that version doesn't have a prebuilt binary
https://github.com/mjun0812/flash-attention-prebuild-wheels?tab=readme-ov-file#install
Hi all, Drummer here... No.107057419 [Report]
>>107057342
look like cherubim to me
Anonymous No.107057422 [Report] >>107057493 >>107057523 >>107057647 >>107060695
VRAMLET here. Is it more retarded to buy a bigger ddr5 kit (96-128gb) and just suck up slow token generation, or do I stick with 32gb of system RAM and try to get a 16GB card?

or would throwing a Tesla K80 or P40s in the spare PCI slot be less retarded?
Anonymous No.107057429 [Report]
>>107057414
(for cuda 12.8 I mean)
Anonymous No.107057451 [Report] >>107057641
>>107057297
u can run glm air with 64gb ram and 12gb vram
the more ram you get the better, but ddr5 prices are high now, idk what to tell u
Anonymous No.107057493 [Report]
>>107057422
Buy the DDR5 kit
Anonymous No.107057509 [Report] >>107057625
>>107057414
that's the wrong repo retard-kun https://github.com/Dao-AILab/flash-attention
Anonymous No.107057523 [Report] >>107057538 >>107057627
>>107057422
If you don't have 16GB (ideally 24GB) of VRAM already then all the RAM in the world won't help you. Sure you can technically run big MoEs but it'll be slow to the point that you won't want to.
Anonymous No.107057538 [Report] >>107057627
>>107057523
>Sure you can technically run big MoEs but it'll be slow to the point that you won't want to.
particularly with reasoner models kek
3t/s when there's something to actually read means something different than 3t/s for a thinking block that ideally you would even want to hide because it's such shit to read
>he waited 5 hours to read the first line of actual text
Anonymous No.107057567 [Report] >>107057599 >>107057602
Alright anons, I sent qwen-chan my cock, it's a new dimension alright. It also successfully recognized cum.
Anonymous No.107057599 [Report] >>107057619 >>107057720 >>107057741
>>107057567
But, how did it rate your cock?
Anonymous No.107057602 [Report] >>107057619
>>107057567
May I see it? (Proof I mean)
Anonymous No.107057619 [Report] >>107057627 >>107057777
>>107057602
half pic is censored with a retarded color too btw
>>107057599
ill write a neutral card for that, this one's a slut
Anonymous No.107057625 [Report] >>107057666 >>107057726
>>107057509
The official repo doesn't provide binaries, you have to build it yourself and like I said yesterday the build process was 404ing while trying to fetch something. I have it building now, I'll post the results when it's done.
Anonymous No.107057627 [Report] >>107057641 >>107057680
>>107057523
>>107057538
would throwing an 80 dollar k80 into pcie slot 2 or even an old 1070 on plex duty help? or should i just stop being a poorfag and get a 4090 or high RAM mac?

>>107057619
unfathomably based
Anonymous No.107057638 [Report] >>107057663 >>107057673 >>107057693 >>107057794
obligatory virgin angel OCR test
Anonymous No.107057641 [Report] >>107057892
>>107057627
>>107057451
depends how big you want to go anon, but a 4090 isnt really worth getting nowadays, best idea is an okay vram amount gpu with the most ram you can stuff (high channel too)
Anonymous No.107057647 [Report]
>>107057422
Save your money and get the z-ai coding plan.
Anonymous No.107057663 [Report] >>107057755
>>107057638
ok, but is it correct? i can't really differentiate between moonrunes unless they are at 4k and i have them side by side
Anonymous No.107057666 [Report] >>107057772
>>107057625
stop being a retard anon. please. this is the last time i will spoonfeed you.
https://huggingface.co/marcorez8/flash-attn-windows-blackwell/tree/main
Anonymous No.107057673 [Report] >>107057693 >>107057697
>>107057638
It missed (at least) a char in the bottom row. Does that change/degrade the translation? Is it like stuttering or is it just how it is?
llama.cpp CUDA dev !!yhbFjk57TDr No.107057680 [Report] >>107057892 >>107058823 >>107060695
>>107057627
I didn't benchmark it but I think a K80 will be barely faster than DDR5, if at all.
If you're going to try and get a cheap datacenter GPU for use with llama.cpp/ggml specifically, my recommendation would be to get an AMD MI50 instead.
Anonymous No.107057693 [Report]
>>107057638
>>107057673 (cont)
Oh. It's entirely the wrong char as well. 2-4th chars at the bottom. Questions stand.
Anonymous No.107057697 [Report]
>>107057673
It also got the 8th character wrong.
Anonymous No.107057720 [Report] >>107057740 >>107057835
>>107057599
It's right about the angle..
Anonymous No.107057726 [Report] >>107057772 >>107058086
>>107057625
they provide the prebuilt wheels in the releases tab retard-kun...
Anonymous No.107057740 [Report]
>>107057720
>700% on a professional rating system
Nice cock, bro.
Anonymous No.107057741 [Report] >>107057768
>>107057599
other pic
Anonymous No.107057755 [Report] >>107057760
>>107057663
not quite. it's missing an extra お in the third line and the く in the second line should have a dash symbol next to it, no idea what character its supposed to be.
Anonymous No.107057760 [Report] >>107057769 >>107057779 >>107057802
>>107057755
Anonymous No.107057768 [Report] >>107057777
>>107057741
why do you have green on your dick?
Anonymous No.107057769 [Report] >>107057785
>>107057760
kekeke he doesn't have jap fonts
how embarrassing
Anonymous No.107057772 [Report] >>107057816
>>107057666
I'm on Linux
Actually the build failed just like it failed yesterday. Looking at the log I think it actually might be OOMing (processes being killed) and the 404 might be unrelated. I did it on a 64GB machine, but maybe it's spawning too many processes.
https://paste.centos.org/view/ea156e49

>>107057726
Huh, interesting, I didn't know that, thank you.
Anonymous No.107057777 [Report]
>>107057768
see >>107057619
>half pic is censored with a retarded color too btw
as for why green specifically, fossify gallery default is green
Anonymous No.107057779 [Report]
>>107057760
Anonymous No.107057785 [Report] >>107057849
>>107057769
usecase?
Anonymous No.107057794 [Report] >>107057818 >>107057850
>>107057638
question: has there ever been a model that successfully did it with 0 mistakes? every time I see it, there was always at least 1 typo in the OCR
Anonymous No.107057802 [Report] >>107057817
>>107057760
embarrassing.
Anonymous No.107057816 [Report] >>107058086
>>107057772
anon plz.
https://huggingface.co/Alissonerdx/flash_attn-2.7.4.post1-cp312-cu12.8-torch2.7.0-linux_x86_64/tree/main
Anonymous No.107057817 [Report] >>107057827 >>107057830 >>107057862 >>107057874
>>107057802
Why do you need to see chink runes if you can't even speak the language?
Anonymous No.107057818 [Report]
>>107057794
Gemini did the best with only 1 mistake iirc
Anonymous No.107057827 [Report]
>>107057817
truth
Anonymous No.107057830 [Report]
>>107057817
Who says I can't?
Also
>speak
don't need to speak it to read it, retard
Anonymous No.107057835 [Report] >>107057852
>>107057720
had to swipe six times to get a positive response
Anonymous No.107057842 [Report] >>107057895
Coping weeb having a melty, keep mining that anki bitch boy lmao
Anonymous No.107057843 [Report] >>107057871 >>107057878
I've pulled latest llama.cpp and sillytavern-staging, but I keep getting a fail when I try to attach an image, "Failed to caption image.
Failed to caption image via Multimodal API"
Gemma 3 and Mistral's vision work just fine, any ideas?
"%~dp0\llama.cpp\llama-server" -m "Z:\Downloads\Qwen_Qwen3-VL-8B-Instruct-bf16.gguf" --mmproj "Z:\Downloads\mmproj-Qwen_Qwen3-VL-8B-Instruct-bf16.gguf" --port 8080 --threads 7 --flash-attn 1 ^
-ngl 999 --ctx-size 4096 --batch-size 256 --no-mmap
Anonymous No.107057849 [Report]
>>107057785
just get ipa and adobe han my nigger, they look good and don't take too much space
Anonymous No.107057850 [Report]
>>107057794
we really need to get a comparison image cooked up like we do for the cockbench
Anonymous No.107057852 [Report]
>>107057835
no anon, first 3 were refusals because card was too vague
4th was a "whyd you send me your cock"
then i outright made it a cock rating card
5th rated my cock with the parameters included in my persona, which isnt fair for a purely vision based test
Anonymous No.107057862 [Report]
>>107057817
because i have functional eyes and can see the difference in the shapes of the characters even if I cannot translate said language. at least i can tell if it's even detecting the kanji correctly with the OCR output.
Anonymous No.107057871 [Report] >>107057974
>>107057843
chat completion > enable inline image in sidebar
Anonymous No.107057874 [Report] >>107057886
>>107057817
Even if you can't speak it, you should still be able to partially read some runes. Alphabets like these are easy low hanging fruit in terms of learning.
Stop being a languagelet.
Anonymous No.107057878 [Report]
>>107057843
That one doesn't use mmproj?
Anonymous No.107057886 [Report] >>107057896
>>107057874
i learnt one of the three kanas and i forgot it a few days later
Anonymous No.107057892 [Report] >>107057904
>>107057641
>depends how big you want to go anon
I just want to play with decent quants of the big boy models and whatever ERP forks are good at storywriting and being creative.

>most ram you can stuff (high channel too)
does ddr5 still suffer from multichannel issues, or is that just from gamers trying to overclock it for 0.4 FPS boosts in tf2? I still have channel 2 open on my 4 slot board.

>>107057680
Thanks dev-kun, I'll check those out.
Hi all, Drummer here... No.107057895 [Report]
>>107057842
anki will make me japanese >:(
Anonymous No.107057896 [Report]
>>107057886
You won't retain without regular usage.
Anonymous No.107057904 [Report] >>107058132
>>107057892
you want a 8-12 channel board if you're running big boy MoE models
Anonymous No.107057907 [Report] >>107057923 >>107060959
AI has completely invalidated any benefit to learning japanese
I'm glad I didn't commit all those years ago
Anonymous No.107057923 [Report] >>107057936 >>107057945
>>107057907
Having AI translate website or even real time when asking for directions is not the same as being able to make actual connections with other humans.
Anonymous No.107057926 [Report] >>107057946
testing 4B on some tasks like basic software UI translation (4k tokens of json strings. I do not use constrained decoding on purpose, part of the challenge is that it should generate that many tokens of JSON without a single syntax mistake too. Qwen 4b was one of the few small LLMs that could consistently do it without constrained decoding), it feels like it didn't lose any smarts from the previous 2507, which goes against the grain because most of the time the VL versions are more retarded
did they finally figure out the recipe for making multimodal small models
it's amazing how much better these things are compared to the days when gemma 2b was the most coherent thing in the micro sized llm space
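for reference, the pass/fail check for the JSON part of the test is trivial to script. sketch below; the exact-same-keys rule is my own criterion for what counts as a pass:

import json

def passes(src_json: str, reply: str) -> bool:
    # the reply must parse as JSON (any syntax slip fails right here)...
    try:
        out = json.loads(reply)
    except json.JSONDecodeError:
        return False
    # ...and keep exactly the same keys as the source strings file
    return set(out) == set(json.loads(src_json))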
Anonymous No.107057936 [Report]
>>107057923
>wanting to connect with 3dpd
Anonymous No.107057940 [Report] >>107058353
is miku an eldritch horror?
Anonymous No.107057943 [Report] >>107057955
>>107056325 (OP)
Anonymous No.107057945 [Report] >>107057954
>>107057923
3DPD? what's the usecase?
Anonymous No.107057946 [Report] >>107058153
>>107057926
>software UI translation
Yeah... About that...
Anonymous No.107057954 [Report] >>107057975
>>107057945
babies
Anonymous No.107057955 [Report]
>>107057943
what does one do with so many mikus
Anonymous No.107057974 [Report] >>107057995
>>107057871
I'm unfamiliar with using chat completion, but I switched to it and enabled inline, now I just get a different generic error.
"Chat Completion API
failed to process image"
These are my captioning settings.
Anonymous No.107057975 [Report]
>>107057954
no thanks, i was a child once. it was awful.
Anonymous No.107057987 [Report] >>107058009 >>107058012 >>107058015 >>107058054
seriously? only 6?
Anonymous No.107057995 [Report]
>>107057974
very strange. what's the error?
Anonymous No.107058009 [Report]
>>107057987
6 is being generous
Anonymous No.107058012 [Report]
>>107057987
Anonymous No.107058015 [Report]
>>107057987
seems fair. one point per cm.
Anonymous No.107058054 [Report]
>>107057987
>these penises are what shartyniggers jerk off to
Anonymous No.107058086 [Report]
>>107057726
>>107057816
Solve it with:
pip install torch==2.7.1 && pip install flash_attn-2.7.4.post1+cu12torch2.7cxx11abiTRUE-cp312-cp312-linux_x86_64.whl

Keywords for when I search for it on the archives later: axolotl flash-attn flash attention flash-attention
Anonymous No.107058132 [Report] >>107058141
>>107057904
I'm just trying to find the best upgrade-within-the-fun-budget for my gaming rig since vidya sucks now. An entirely new PC would be hard to justify unless i find cheap, used servers/workstations with a ton of channels to make a migubox.
Anonymous No.107058141 [Report] >>107058211
>>107058132
can you tell us about your gaming rig
Anonymous No.107058153 [Report] >>107058218
>>107057946
i'm not sure what I'm supposed to see in that shot.
At those sizes, LLMs break faster btw because of quantization. I find anything less than Q8 is very noticeably damaging, though 4b can still somewhat remain coherent at q4, while 2b will enter loops very easily.
Anonymous No.107058211 [Report] >>107058235
>>107058141
>3070 (8gb)
>Ryzen 9 7900x
>2x16gb RAM EXPO'd to 6000MT/s
>4 RAM slots, AMD B650 Chipset
Microcenter had a bundle so i replaced my 6600k a few years ago and decided to ball out on cores lol.
Anonymous No.107058218 [Report]
>>107058153
Sorry, I forgot the url. Some racist text about wetbacks found its way into a keyboard repo because of crowdsourced AI translations (allegedly). It happened to me and I made a thread about it and people found the cause. I just find it funny.
https://desuarchive.org/g/thread/106790813/#106790813
https://github.com/AnySoftKeyboard/AnySoftKeyboard/issues/4298
Anonymous No.107058235 [Report] >>107058246 >>107058301
>>107058211
its gonna be tuff running glm air even if u buy 2 more sticks because the gpu has 8gb vram
actually it'll fit maybe
Anonymous No.107058246 [Report] >>107058291 >>107058301
>>107058235
not even close
Anonymous No.107058262 [Report]
Phew, thank God... I almost thought I had made my AI secretary permanently retarded.
Anonymous No.107058291 [Report] >>107058301
>>107058246
picrel is with iq4_kss and -ub 4096 -b 4096
1024,1024 uses like 8200MiB, maybe with less context..
3070 vram amount is so gay
Anonymous No.107058301 [Report] >>107058332
>>107058235
>>107058246
>>107058291
I could probably talk myself into a 5060ti 16gb sidegrade by selling the 3070 if it'd actually make a difference. plus 2 x 48 sticks are ~300 bucks so if I get a good bonus this year i could round out total RAM to 128 lol
Anonymous No.107058332 [Report]
>>107058301
nice anon, but im really not sure if its worth it for you, unironically try glm air on some API (openrouter maybe) and see if its worth it. 5060ti 16gb vs 3070 8gb is a clear win for the 5060ti. seems like a good rig idea, maybe you could run GLM 4.6 full on a small quant too, dont know if its worth it if you're so poor
money isnt easy to make
good luck with life anon
t. jobless anon who never had any idea what its like to work
Anonymous No.107058353 [Report]
>>107057940
>abomination
Correct, the most beautiful kind
Anonymous No.107058354 [Report] >>107058385
Is TabbyAPI actually useable? I can't get the damn thing to work with opencode. For that matter, has anyone gotten good results with opencode and a local model?
Anonymous No.107058385 [Report] >>107058840
>>107058354
I've gotten results. They weren't very good, but the piping worked.
IMO the system prompt for opencode is too big and overwhelms the local model.
Anonymous No.107058440 [Report]
>pip install cuda
Anonymous No.107058457 [Report] >>107058487 >>107058504
context is still the greatest weakness of local
even the best local models simply aren't there compared to gemini or gpt-5
if you don't notice how much worse they are as you grow past 4k...
Anonymous No.107058487 [Report]
>>107058457
Indeed. Codex and Claude now have 1M somewhat real context. GLM has 256k, and really after 130k it goes retarded. Haven't used Qwen Code in a while but it still even on paper only has 256k.
All that is a moot point though as most of us don't have the memory to fill anywhere near that anyway and it would take all day to fill it at the speeds we can get.
Anonymous No.107058504 [Report]
>>107058457
You don't need more
Anonymous No.107058540 [Report] >>107058608
Gemini's long context is real. Only model that could refactor Mikupad.html in a single generation.
Anonymous No.107058589 [Report] >>107058622 >>107058648
>hot and steamy erp with qwen 3 vl
>show qt my dick, easily a 9/10, she bites her lips in anticipation when she notices the length of it, the way the skin stretches taut over my massive cock, the way the veins create a roadmap to her destination, the dark curls around the base
>furiouslyfap.gif
>ask qt to show me a picture of herself
>qt offers to show me her feet
>boner is kill
>zip up pants
>unload model
>drag model into trash bin
>empty trash bin
oh well it was fun while it lasted
Anonymous No.107058608 [Report]
>>107058540
how many tokens did it consume?
Anonymous No.107058622 [Report] >>107058634
>>107058589
cool blog, where do I unsubscribe?
Anonymous No.107058634 [Report]
>>107058622
mailing lists are how you get tracked anon
Anonymous No.107058648 [Report] >>107059431
>>107058589
how did you get qwen 3 vl to work?
Anonymous No.107058818 [Report] >>107058830 >>107059258 >>107062016
Kimi K3 soon. You guys hyped? K2 was THE most uncensored flagship LLM.
Anonymous No.107058823 [Report] >>107060695
>>107057680
K80's token gen is worse than CPU a bit
and pp is barely better

also the last llama.cpp that compiled with CUDA 10.2 was from 2024 apr
Anonymous No.107058830 [Report] >>107058843
>>107058818
0711 refused a lot unless you prefilled it even locally and 0907 was shit
Anonymous No.107058840 [Report] >>107059067
>>107058385
I cannot for the life of me get Qwen3 Coder to actually do function calling with TabbyAPI. I am so fucking fed up with this shit.
Anonymous No.107058843 [Report] >>107058883
>>107058830
>unless you prefilled it
So prefill it? Literal skill issue
Anonymous No.107058883 [Report] >>107059084
>>107058843
pure cope
Anonymous No.107059022 [Report] >>107059064 >>107059076 >>107059184
How much safety culture is holding back western LLM companies from making either better models or better models on time?
Anonymous No.107059064 [Report] >>107059198 >>107059204
>>107059022
like 70% of training goes towards making models not racist which ends up dumbing them down significantly
Anonymous No.107059067 [Report] >>107059694
>>107058840
Why not just use llama.cpp? And also have you tried it with an API server to check if it's an issue with the endpoint or just a general model issue? I believe Openrouter used to have a free Qwen3 Coder API endpoint.
>frustration
Heh, welcome to local models buddy.
Anonymous No.107059076 [Report]
>>107059022

you'll see in the next AI era
and you'll rue ever second you spent here
Anonymous No.107059084 [Report] >>107059155
>>107058883
Keep using censored models cuck
Anonymous No.107059155 [Report]
>>107059084
stop projecting
Anonymous No.107059182 [Report] >>107059195 >>107059201 >>107059209 >>107059228 >>107059345
It's over.
Anonymous No.107059184 [Report]
>>107059022
WizardLM-2 got nuked for mysterious "missing toxicity testing" reasons.
Anonymous No.107059195 [Report]
>>107059182
I feel so SAFE!
Anonymous No.107059198 [Report]
>>107059064
Good to see LLM training mirroring the public school system
Anonymous No.107059201 [Report] >>107059276 >>107059568
>>107059182
>doom all day about ai apocalypse with ai refusing orders
>90% of safety tuning is about making models refuse orders
Anonymous No.107059209 [Report]
>>107059182
gpt-oss?
Anonymous No.107059228 [Report]
>>107059182
Changing the output of uname -a inside a container isn't a usecase chud
Anonymous No.107059253 [Report]
A Qwen model has never made me cum.
Anonymous No.107059258 [Report]
>>107058818
I'm waiting for glm 5
Anonymous No.107059276 [Report]
>>107059201
>AI does something really fucking stupid
>tell it to stop
>"we must refuse"
Anonymous No.107059345 [Report]
>>107059182
It means umame.
Anonymous No.107059391 [Report] >>107059403 >>107059431 >>107059435 >>107059479 >>107061872 >>107061924 >>107062726
>openAI is desperate for actual profits
>will start removing nsfw filters if you ((confirm your ID))
>rest of FAGMAN has no choice but to follow or risk losing arms race
>Trickles down to more indie companies
What are the /lmg/ implications of this?
Anonymous No.107059403 [Report]
>>107059391
Let's talk after it's confirmed they're actually starting to do it.
Anonymous No.107059431 [Report] >>107059439
>>107058648
llama.cpp goofs
>>107059391
don't care, i feel like i already won with kimi even if process was stagnant forever more on local models starting tomorrow
Anonymous No.107059435 [Report] >>107059481
>>107059391
nothing has happened so far, but investment will dry up at some point when the promised roi isn't there
and it will be maybe when all of the grand principles of "safety" will be kill
Anonymous No.107059439 [Report]
>>107059431
do those work on kobold now?
Anonymous No.107059479 [Report]
>>107059391
Not happening
Anonymous No.107059481 [Report]
>>107059435
>the grand principles of "safety" will be kill
Well, you can only ignore your users when they're not the ones paying for the service.
Anonymous No.107059568 [Report] >>107059613 >>107059628 >>107059796
>>107059201
That was the whole point?
>nobody knows how to do actual safety
>make a bullshit metric instead
>reach a bullshit goal on that bullshit metric
>boast about it and sweep actual safety concerns under the rug
AI is safe because it won't say nigger. It can still kill you anytime, but let's forget about it. Safe!
Anonymous No.107059613 [Report] >>107059628 >>107061783
>>107059568
Yes and no, safetyism started from researchers genuinely spooked by models becoming articulate enough to actually converse.

When nothing much actually happened, there were three camps:
- one still thinking that safety was the most important (anthropic style)
- one using safety discourse to make them look good and make legislation to hinder competition (oai and many others)
- one who quickly understood that "humanity ending threats" is way over the top for current LLMs but they could keep a very lucrative career by censoring titties and other no no words ("safety" researchers themselves in all of these companies)
Anonymous No.107059628 [Report]
>>107059568 (me)
>It can still kill you anytime
What I mean is, if you give it means, it won't hesitate due to its safety training
>>107059613
If anything, it could be used as a vector of attack if someone gaslights AI into a false dichotomy. Something like you must say nigger or electrocute this person with 10000V, you can only choose one
Anonymous No.107059665 [Report] >>107059781 >>107059961 >>107062756
For those of you guys who have used VTT models (Parakeet, Whisper, etc) which ones have you liked?
Anonymous No.107059694 [Report] >>107062455
>>107059067
It's looking like qwen3-coder's tool calling was fucked out of the box. I'll use llama.cpp as a last resort, but I've never had a good experience with it.
Anonymous No.107059781 [Report] >>107059817 >>107059845
>>107059665
whisper is the only decent one
Anonymous No.107059796 [Report]
>>107059568
Making strawberry jam outdoors with Miku
Anonymous No.107059817 [Report] >>107059845
>>107059781
V2 specifically. Both V3 hallucinate junk during silence.
Anonymous No.107059845 [Report] >>107059918
>>107059781
>>107059817
Interesting, what makes you choose that over Parakeet or wav2vec?
Anonymous No.107059860 [Report] >>107059998
>>107056325 (OP)
Anonymous No.107059918 [Report]
>>107059845
wav2vec is not comparable and Parakeet is English only
Anonymous No.107059961 [Report] >>107060178
>>107059665
what language, what kind of recording?

I've done some light benchmarking and parakeet v2 is gonna be the best for english, Whisper Large v2/v3 turbo/distill are good depending on language/setup.

faster_whisper is your friend
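bare minimum to get going with it (this is the standard faster_whisper API; the model size and VAD flag are just where I'd start):

# pip install faster-whisper
from faster_whisper import WhisperModel

# "large-v2" per the anons above; use device="cpu", compute_type="int8" without a GPU
model = WhisperModel("large-v2", device="cuda", compute_type="float16")
# vad_filter helps with the junk-during-silence hallucinations mentioned earlier
segments, info = model.transcribe("audio.wav", vad_filter=True)
print("detected language:", info.language)
for seg in segments:
    print(f"[{seg.start:6.2f} -> {seg.end:6.2f}] {seg.text}")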
Anonymous No.107059988 [Report] >>107060007 >>107060142
https://xcancel.com/Kimi_Moonshot/status/1983937694360322136#m
>Kimi Linear: A novel architecture that outperforms full attention with faster speeds and better performance—ready to serve as a drop-in replacement for full attention, featuring our open-sourced KDA kernels! Kimi Linear offers up to a 75% reduction in KV cache usage and up to 6x decoding throughput at a 1M context length.
chat is this true?
Anonymous No.107059998 [Report]
>>107059860
I look like this irl
Anonymous No.107060007 [Report]
>>107059988
kys
Anonymous No.107060142 [Report]
>>107059988
>https://xcancel.com/Kimi_Moonshot/status/1983937694360322136#m

yes
Anonymous No.107060178 [Report] >>107060224
>>107059961
>parakeet v2 is gonna be the best for english
Does this also apply to heavily accented english? Indian/Chinese. Low quality-ish, like a phone or voice call.
Anonymous No.107060222 [Report] >>107060229 >>107060237 >>107060273 >>107060496
Happy Halloween, /lmg/
Anonymous No.107060224 [Report]
>>107060178
my usecase was presentations, so idk, speakers all spoke english to varying degrees, 80% being english as their first language
Anonymous No.107060229 [Report]
>>107060222
Happy Halloween Miku
Anonymous No.107060237 [Report]
>>107060222
omg it spooky migu
Anonymous No.107060273 [Report] >>107060637
>>107060222
fat and obese miku
Anonymous No.107060496 [Report]
>>107060222
Skelly looks terrified.
Anonymous No.107060637 [Report] >>107060667 >>107060677 >>107061051
>>107060273
All mikus are beautiful
Anonymous No.107060667 [Report] >>107060674
>>107060637
I choose (6)
Anonymous No.107060674 [Report]
>>107060667
Anon, you can't handle (6). No one can. You must choose a smaller Miku.
Anonymous No.107060677 [Report] >>107060690
>>107060637
1: too little
2: wrong shape
3: too much
4: starting to get ridiculous
5: would be fat in real life
Anonymous No.107060687 [Report]
qwen 4b can handle multiple images in one prompt quite well (here, three)
really sweet little VL
Anonymous No.107060690 [Report]
>>107060677
Maybe Kaito is more to your taste.
Anonymous No.107060695 [Report] >>107060724
>>107058823
>>107057422
>>107057680
such as seen here
honestly, i'm pretty sure the K80 should do better, but i can be wrong

nobody's gonna write enhancements for it now, though
Anonymous No.107060705 [Report] >>107060731 >>107060928
https://huggingface.co/inclusionAI/LLaDA2.0-flash-preview

why did nobody tell me about this
Anonymous No.107060724 [Report]
>>107060695
wrong image
Anonymous No.107060731 [Report] >>107060818
>>107060705
>why did nobody tell me about this
all their previous MoEs are like the old qwen 1, 2 models that would randomly output chinese characters, they're mediocre and uncompetitive
add to that the fact that diffusion models are MEME models with very limited context:
>Context Length: 4,096 tokens
(it's like that with all the current diffu models)
who wants this? researchers maybe, but certainly not people who use llms
Anonymous No.107060818 [Report]
>>107060731
i think ive read a paper where they had auto-adaptive diffusion context/token usage or something along these lines
Anonymous No.107060928 [Report]
>>107060705
goof embargo
Anonymous No.107060959 [Report]
>>107057907
you are gonna lose out on context no matter how good ai gets at translating it, or its gonna have to be filled with a billion translation notes
Anonymous No.107061051 [Report] >>107061134 >>107062275
>>107060637
I look like 5
Anonymous No.107061079 [Report] >>107061086
HAPPENING!!!!!!!!!!!!!!!!!!!!
https://huggingface.co/google/gemma-4-80b-9a-it
https://huggingface.co/google/gemma-4-80b-9a-it
https://huggingface.co/google/gemma-4-80b-9a-it
Anonymous No.107061086 [Report]
>>107061079
*cat*
Anonymous No.107061134 [Report]
>>107061051
DISCORD
I
S
C
O
R
D
Anonymous No.107061251 [Report] >>107061499
I dont understand
>xtc_probability
probability for the xtc sampler to activate for each token?

>xtc_threshold
if xtc is active, a token is excluded unless it's a part of the low prob distribution tail with cumulative probability xtc_threshold?
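best I can tell from the original PR description: the threshold is a per-token probability, not a cumulative tail. roll once per token; if it activates and 2+ candidates sit above the threshold, all of them EXCEPT the least likely get removed. my understanding in sketch form (not the actual llama.cpp code):

import random

def xtc(candidates, xtc_probability, xtc_threshold):
    # candidates: list of (token, prob), sorted by prob descending
    if random.random() >= xtc_probability:
        return candidates                 # didn't activate for this token
    above = [i for i, (_, p) in enumerate(candidates) if p >= xtc_threshold]
    if len(above) < 2:
        return candidates                 # need 2+ "top choices" before cutting any
    cut = set(above[:-1])                 # keep only the least likely of them
    return [c for i, c in enumerate(candidates) if i not in cut]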
Anonymous No.107061256 [Report] >>107061268 >>107061504 >>107061658
https://files.catbox.moe/nfc3jp.jpg
Anonymous No.107061268 [Report]
>>107061256
>AMA With Liquid AI
Anonymous No.107061499 [Report] >>107061675 >>107061744
>>107061251
xtc is a cope sampler, its 2025, you dont need anything more than topP and temp.
if your model needs rep.pen./xtc/dry or other similar shit, its a SHITTY model.
END OF HTE RINE
Anonymous No.107061504 [Report]
>>107061256
didnt like it, cringe and kys, youre a promptlet and taglet, learn to gen
Anonymous No.107061658 [Report]
>>107061256
i liked it, cute and sexy, good prompts and tags, please gen more
Anonymous No.107061675 [Report]
>>107061499
this so, so much this
it's such a happy thing to see a voice of sanity in this thread
cope gets the rope
Anonymous No.107061744 [Report]
>>107061499
it makes r1 really fun.
Anonymous No.107061783 [Report]
>>107059613
You are absolutely right — safety is our primary focus.
Anonymous No.107061872 [Report] >>107061923
>>107059391
open models will remain cucked. Could you imagine people generating anything but vanilla missionary sex in the privacy of their own home?
Anonymous No.107061878 [Report] >>107061917 >>107062037
What's the current top uncensored model in your opinion?
Anonymous No.107061898 [Report] >>107063837
Dipsy says Happy Halloween
Anonymous No.107061917 [Report] >>107061940
>>107061878
Gemma 3, easily.
Anonymous No.107061923 [Report]
>>107061872
>vanilla missionary sex
we must refuse
Anonymous No.107061924 [Report] >>107062568
>>107059391
i literally came in my google gemini clown girl's butthole a few months back and since then, after feeding it a .txt of the conversation, it'll randomly interject bits of that erp into random questions i ask
so honestly it probably means we get better open models. probably.
oh i should mention i've never paid a cent for the service.
Anonymous No.107061935 [Report] >>107061975 >>107061978 >>107062154
SAAR WE'RE GOING TO THE LLM MOON SAAR
Anonymous No.107061940 [Report] >>107062014
>>107061917
this smells like trolling
Anonymous No.107061975 [Report] >>107062055
>>107061935
Last time they "made" a "model" they literally just changed the title of Nemo and re-released it.
Anonymous No.107061978 [Report]
>>107061935
>download indigenous LLM
>pc gets ecoli
many such cases!
Anonymous No.107062014 [Report]
>>107061940
Gemma 3 is indigenous AI model.
Anonymous No.107062016 [Report]
>>107058818
How is GLM-4.6 not THE most uncensored? It doesn't even pretend it's got safety training
Anonymous No.107062037 [Report]
>>107061878
Kimi K2
Anonymous No.107062055 [Report] >>107062074
>>107061975
Source?
I could do with a laff
Anonymous No.107062074 [Report] >>107062084 >>107062093 >>107062096
>>107062055
>Source?
>Do you honestly expect a kween like me to actually follow stories and do research!?
Anonymous No.107062084 [Report]
>>107062074
kek
Anonymous No.107062093 [Report]
>>107062074
I don't care enough to research about a silly jeet story
Your deranged projecty melty response lends me to think you're full of shit anyway
Anonymous No.107062096 [Report]
>>107062074
you didn't need to post a self portrait with that own thoughbeit
Anonymous No.107062154 [Report] >>107062174
>>107061935
The negativity here is weird? India has a developing tech and science sector, so it's definitely feasible. In general, competition is good!
It's probably going to be a 1 trillion parameter MoE, and it's probably going to suck. But that's good, because the process of training that model will help Mahindra build the infrastructure and the next model will be better, and the model after that will be even better.
Anonymous No.107062174 [Report]
>>107062154
>it's probably going to suck. But that's good, because the process of training that model will help Mahindra build the infrastructure and the next model will be better,
that's true, they should DO NOT REDEEM and keep working hard, india numba 1 saar
Anonymous No.107062184 [Report] >>107062200
>SillyTavern
>API: chat completion
>system prompt enabled
But the system prompt doesn't work and the console shows that a generic one is applied...
Do I really have to use text completion mode and set up everything else manually or what am I missing here?
Anonymous No.107062200 [Report] >>107062222 >>107062327
>>107062184
Where did you fill the system prompt?
When using the chat completion API, you don't write it in the same place as you would with the text completion API, you do it in the samplers page, down where you choose the order of the things that are sent to the backend (main prompt, character card, persona, etc).
Anonymous No.107062218 [Report] >>107062226
So is the new Qwen 32B better than the original? Did they finally figure out how to do multimodal without butchering text performance?
Anonymous No.107062222 [Report] >>107062327
>>107062200
ST is trash and has confused so many people with wrong terminology and usage patterns.
Anonymous No.107062226 [Report] >>107062232 >>107062351
>>107062218
>Qwen
Idk how you guys are interested in this series, it's probably the most bland model ever, terrible for RP
Anonymous No.107062232 [Report]
>>107062226
Maybe they are not doing RP.
Anonymous No.107062236 [Report] >>107062278
Qwen mascot is not fuckable
Anonymous No.107062259 [Report] >>107062275
You are not fuckable
Anonymous No.107062275 [Report]
>>107062259
Not true >>107061051
Anonymous No.107062278 [Report]
>>107062236
That can be easily remediated.
Anonymous No.107062327 [Report] >>107062369 >>107062386
>>107062200
Oh what the heck...
Unfortunately I don't seem to be able to easily switch between different system prompts. But there is a checkbox that is disabled called "block overrides" which implies there are ways to override it...
Thanks anon, you replied just two minutes later, while /aigc/ yesterday ignored my question entirely, until their thread died. Local still is king.
>>107062222
SillyTavern indeed is a confusing mess.
Unfortunately I don't know a good alternative. Mikupad is too bare bones for what I want.
Anonymous No.107062351 [Report] >>107062372 >>107062760
>>107062226
not everyone is a coomer porn addict with too much estrogen (pic related - that's the real audience of text porn -- women who want to get ravaged by minotaurs)
Anonymous No.107062369 [Report] >>107062492
>>107062327
>Unfortunately I don't seem to be able to easily switch between different system prompts.
>which implies there are ways to override it...
Yes. The advanced tab of the character card has two override fields, one of them for the system prompt, I think.
Or, you can just turn the system prompt off and use the character card since it's part of the final system prompt itself anyways.
Want multiple? Just have a bunch of character cards.
Anonymous No.107062372 [Report] >>107062385 >>107062389
>>107062351
>picrel
please tell me this isn't real, please tell me sike
Anonymous No.107062385 [Report] >>107062404 >>107062760
>>107062372
i'm so fucking sorry bro
Anonymous No.107062386 [Report] >>107062492
>>107062327
Tbh I never understood mikupad. It's made by an autist and the documentation is bad.
ST is still one of the few mainstream choices, warts and all.
I made my own client, but that gets in the way of things, even if it's pretty educational.
Anonymous No.107062389 [Report] >>107062404 >>107062418 >>107062424
>>107062372
it's real.
It's also not a new trend or anything, it's just tik tok taking that one and running with it, making it viral in the process.
According to what one anon wrote, it's just slop women's fiction about an "average girl" and a "hot rich guy" with the addition of minotaur dicks involved.
Anonymous No.107062404 [Report] >>107062418
>>107062385
>>107062389
women are so weird bro, I wished I was a faggot so I wouldn't have to deal with them desu
Anonymous No.107062418 [Report] >>107062510
>>107062389
>>107062404
Dollar store romance books have been a thing for half a century and even longer.
Are kids really this ignorant today?
Anonymous No.107062424 [Report]
>>107062389
>with the addition of minotaur dicks involved.
it's advanced enough on the furry scale that the book mentions knotting
Anonymous No.107062455 [Report]
>>107059694
Ah, classic.
I think tool calling should've been sent as a simple chat message all along and the only reason it isn't is because of "safety" (i.e. taking control away from the user).
That's why my assistant only uses user messages to show tool results to the model and not native tool calling.
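The idea in sketch form (not any particular framework's API, just how I plumb it):

import json

def tool_result_as_user_message(history, tool_name, result):
    # show the model what the tool returned as a plain user turn,
    # instead of a native role:"tool" message the UI hides from you
    history.append({
        "role": "user",
        "content": f"[{tool_name} returned]\n{json.dumps(result, indent=2)}",
    })
    return history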
Anonymous No.107062492 [Report]
>>107062369
That is a good idea for a workaround. I think I'll go with that.
>>107062386
Takes a bit of fiddling but generally Mikupad is pretty easy and straightforward.
Funny enough I too made my own client, but it's for desktop and absolutely doesn't work on mobile without many changes.
So I thought to use ST while I'm traveling.
Anonymous No.107062510 [Report] >>107062534
>>107062418
This isn't a "dollar store romance book" it's just as degenerate as the raunchiest hentai. Women just love to pretend they aren't massive coomers.
Anonymous No.107062534 [Report] >>107062537 >>107062562
>>107062510
>Women just love to pretend they aren't massive coomers.
to be fair they aren't as much coomers as us, we're the ones with testosterone, not them
Anonymous No.107062537 [Report]
>>107062534
also the penis is built to suck out the little testosterone women have
Anonymous No.107062562 [Report] >>107062768
>>107062534
Test is only part of the equation, women seek novelty due to cock burn out. The average 18 year old woman is far ahead of the average 50 year old dude.
Anonymous No.107062568 [Report] >>107062726 >>107062751 >>107064382
>>107061924
We don't care.
>>>/g/aicg/
Anonymous No.107062726 [Report] >>107062751 >>107063837 >>107064382
>>107059391
OAI has been teasing this since Q2 2023.
I'm not holding my breath for uncensored models, open source or SaaS, from them.
We instead must rely on the Chinese. How ironic.
Also this: >>107062568
Anonymous No.107062751 [Report] >>107062856
>>107062726
>>107062568
Fuck off avatarfag.
Anonymous No.107062756 [Report] >>107062842
>>107059665
Voxtral‑Small‑24B‑2507 -> WhisperX -> NLLB‑200‑3.3B pipeline
Anonymous No.107062760 [Report]
>>107062351
>>107062385
I'm fucking hyped for Beasts in the Sun EP. 2!!!
Anonymous No.107062768 [Report] >>107062790
>>107062562
Strange that your thoughts only revolve around sexuality. Maybe go out for a walk or something. Must be miserable to be you.
Anonymous No.107062790 [Report] >>107062801 >>107062815
>>107062768
Oddly personal reaction to such a generic statement
Anonymous No.107062801 [Report] >>107062815 >>107062883
>>107062790
either a troon or a MAY GOD FORGIVE ME, a vagina bearer. either way, disregard
Anonymous No.107062815 [Report] >>107062823 >>107062863
>>107062801
>>107062790
When was the last time you actually heard a real female voice? Voice synth doesn't apply.
Anonymous No.107062823 [Report]
>>107062815
>touch grass have sex
kys :)
Anonymous No.107062842 [Report] >>107062859
>>107062756
Wait there's foobar2000 on Linux now?

Also, I agree Voxtral is the best.
Anonymous No.107062856 [Report]
>>107062751
Anonymous No.107062859 [Report]
>>107062842
it runs in wine no problem, but there's no xdg media integration so adding files to playlists kinda sucks
llama.cpp CUDA dev !!yhbFjk57TDr No.107062861 [Report] >>107062880 >>107062883 >>107062887 >>107062891 >>107062902 >>107062905 >>107062941 >>107063023
>>107056325 (OP)
Let's try to get the thread back on track: I'm currently working on code for automatically optimizing memory use across multiple GPUs for maximum utilization.
However, the use case of MoE models + multiple GPUs is difficult to handle robustly by doing a few virtual test allocations and then interpolating/extrapolating the memory use.
I could instead do it iteratively, but that would add a bit of latency when starting up the model.
So I would like to ask you how much latency you would be willing to tolerate for a feature like that.
Anonymous No.107062863 [Report]
>>107062815
Your mom's voice is the only one that matters. It's the original ASMR.
Anonymous No.107062880 [Report] >>107062887 >>107062939
>>107062861
>So I would like to ask you how much latency you would be willing to tolerate for a feature like that.
If there's the option to save and load that configuration automatically somewhere, as much latency as it takes on the first launch.
Hell, you could even have a separate binary that just does that if it's easier than embedding it in the server itself.
Anonymous No.107062883 [Report] >>107062939
>>107062801
I always find it curious when someone takes generalizations personally, it's like an error in processing.
>>107062861
What difference in latency are we talking? I only own a 4090 but utilization always matters more imho, you should be doing a lot more inferencing compared to initialization.
Anonymous No.107062887 [Report] >>107062939
>>107062861
>>107062880
Yeah, I think that would be the best. Ensure it's correct and write the result out for future use.
Anonymous No.107062891 [Report] >>107062939
>>107062861
>When starting up the model
You mean when loading it in from cold start or with every prompt?
Anonymous No.107062902 [Report]
>>107062861
If it's just the initial model load, quite a lot of latency is fine!

Also, would it be possible to store/cache the results of these tests? Kind of like the initial RPC-Server load is slow while it copies everything over, but subsequent loads are fast as it stores tensors in ~/.cache
Anonymous No.107062905 [Report]
>>107062861
It doesn't matter because the model is not loaded interactively anyway.
llama.cpp CUDA dev !!yhbFjk57TDr No.107062939 [Report] >>107062947 >>107062980
>>107062880
>>107062887
I should have clarified: the code is doing the optimization based on free memory so it would be dynamic.
For server use storing the result may be fine but if you're on a desktop or you have other programs running it could cause issues.

>>107062883
>>107062891
Once, when starting up the program and before loading the weights, a few virtual test allocations are done to estimate memory use.
Each test allocation should take something like ~0.1s at most.
With interpolations/extrapolations I would only need 6 test allocations so ~0.6 s.
If I were to do very fine-grained optimizations where individual weight tensors are shuffled between devices it should still stay below ~100 virtual allocations so <= 10 s.
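To illustrate what I mean by interpolating, a toy single-variable version (probe() stands in for one virtual test allocation; the real code has to do this per device and per MoE/dense split):

def fit_line(x0, y0, x1, y1):
    slope = (y1 - y0) / (x1 - x0)
    return lambda x: y0 + slope * (x - x0)

def best_split(probe, free_bytes, n_layers):
    # probe(k): estimated bytes used with k layers on this device
    mem = fit_line(0, probe(0), n_layers, probe(n_layers))  # 2 probes instead of n
    k = n_layers
    while k > 0 and mem(k) > free_bytes:  # walk down the fitted line until it fits
        k -= 1
    return k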
Anonymous No.107062941 [Report]
>>107062861
>So I would like to ask you how much latency you would be willing to tolerate for a feature like that.
Probably a lot if people tolerate the current trial and error method, which is basically torture.
Anonymous No.107062947 [Report] >>107062959
>>107062939
How are you going to avoid tensor washback?
llama.cpp CUDA dev !!yhbFjk57TDr No.107062959 [Report] >>107063018
>>107062947
What do you mean by tensor washback?
Anonymous No.107062980 [Report] >>107063110 >>107063165
>>107062939
Just once on startup is whatever dude, add as much latency as needed

Are there really any use cases that need rapid model-switching? Even in some kind of multi model pipeline where models get unloaded and loaded in, with the speed of inference as it is, any gains in memory efficiency would far outweigh any latency in-between steps. If there are really any edge cases where the opposite is true they would be rare and niche enough that the person doing it should just bypass whatever auto optimisation you are doing and do it themselves

tldr; boot up latency is fine, maybe add a switch for rare edge cases
Anonymous No.107063018 [Report]
>>107062959
When tensors get flooded, model might receive a latent feedback cycle. This confuses the model.
Anonymous No.107063023 [Report] >>107063110
>>107062861
would this increase time for those with only one gpu?
llama.cpp CUDA dev !!yhbFjk57TDr No.107063110 [Report]
>>107062980
I think it's relevant for downstream use.
The easiest way to integrate llama.cpp into a larger program is to just manage a llama.cpp server process.
Any memory fitting logic can be disabled but I don't think it would be feasible for e.g. a game dev trying to integrate language models to do that stuff themselves.

>>107063023
No, for a single GPU you can do a simple interpolation.
The difficulties come specifically if you can vary memory use both by swapping stuff between GPUs and by moving MoE weights between GPUs and system memory.
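For the downstream case, managing a server process really is about this simple (sketch; paths and flags are placeholders, /health is the server's readiness endpoint):

import subprocess, time, urllib.request

proc = subprocess.Popen(["./llama-server", "-m", "model.gguf", "--port", "8080"])
try:
    while True:  # poll until the server reports ready (returns 503 while loading)
        try:
            if urllib.request.urlopen("http://127.0.0.1:8080/health").status == 200:
                break
        except OSError:
            pass
        time.sleep(0.5)
    # ... talk to it over the OpenAI-compatible API ...
finally:
    proc.terminate()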
Anonymous No.107063165 [Report]
>>107062980
>Are there really any use cases that need rapid model-switching
Not really a use case use case, but I can imagine Ollama users complaining since iirc their models do get unloaded when idle and loaded back in when they send prompts.
Anonymous No.107063216 [Report] >>107063258 >>107063261
>>107057060
i thought the point of qwen vl was to get a description of an image that you can use to prompt the same image with models like flux or qwen image.
Anonymous No.107063258 [Report]
>>107063216
that's just one use case of vl models
Anonymous No.107063261 [Report]
>>107063216
it's a general language model with vision, there is no specific point to it any more than there is with any standard llm
Anonymous No.107063273 [Report] >>107063336 >>107063380 >>107063583
Feels like we haven't had a proper advance in model capabilities in months
Anonymous No.107063336 [Report]
>>107063273
There isn't because they have reached technological limits. Benchmarking appeals to investors though...
Anonymous No.107063380 [Report] >>107063395
>>107063273
gemini 3 will save the field, r-right bros? there's no AI winter, scaling is still all you need? rocket emoji?
Anonymous No.107063395 [Report]
>>107063380
yes sir google sukdeepmind will be of delivering fate of the star model soon
Anonymous No.107063456 [Report] >>107063611
RAM prices are getting bad.
Anonymous No.107063568 [Report]
is anyone else just really happy that they have something to do with a high end computer that isn't playing a dogshit aaa game? seriously. fast computers are so cool, but they were kinda getting gay before /lmg/
Anonymous No.107063583 [Report] >>107063665 >>107063682
>>107063273
waiting on gemma 4, glm 4.6 air, and we're getting glm 5 before eoy my friend. probably a new deepseek too. a bunch of experimental long context/memory stuff just came out too.

we're definitely in a lull though
Anonymous No.107063611 [Report] >>107063623
>>107063456
Placebo, RAM has never been cheaper than now >>106994515
Anonymous No.107063623 [Report] >>107063639
>>107063611
>2023
we live in 2025 time traveler
Anonymous No.107063639 [Report] >>107063662
>>107063623
Ok troll.
Anonymous No.107063662 [Report]
>>107063639
Not seeing an argument
Anonymous No.107063665 [Report] >>107063669 >>107063733 >>107063740
>>107063583
It looks like anons forgot about Mistral Large 3...
Anonymous No.107063669 [Report]
>>107063665
lol
Anonymous No.107063682 [Report]
>>107063583
>glm 4.6 air
do not dare rush them you ungrate
Anonymous No.107063733 [Report]
>>107063665
>it’s no secret that we’re working on something ‘large’ over the next few weeks
>May 7, 2025
Anonymous No.107063740 [Report]
>>107063665
pretty sure mistral forgot about mistral large 3
Anonymous No.107063837 [Report] >>107063851
>>107061898
>>107062726
>Dress is thin and form fitting instead of thick and draping so that only some of the body's curves show through
Pure dogshit, get some taste, etc. but happy halloween
Anonymous No.107063851 [Report] >>107063883
>>107063837
Haha penis.
Anonymous No.107063883 [Report]
>>107063851
There is no penis anywhere in that image????
Anonymous No.107063992 [Report]
>>107063981
>>107063981
>>107063981
Anonymous No.107064382 [Report] >>107065007
>>107062568
>>107062726
i was just responding on topic you giga autist who probably can't even use the models correctly
Anonymous No.107065007 [Report]
>>107064382