Thread 106996568

357 posts 82 images /g/
Anonymous No.106996568 [Report] >>106996714 >>106996944 >>106997410 >>106998310
/lmg/ - Local Models General
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106986408 & >>106975556

►News
>(10/21) Qwen3-VL 2B and 32B released: https://hf.co/Qwen/Qwen3-VL-32B-Instruct
>(10/20) DeepSeek-OCR 3B with optical context compression released: https://hf.co/deepseek-ai/DeepSeek-OCR
>(10/20) merged model : add BailingMoeV2 support #16063: https://github.com/ggml-org/llama.cpp/pull/16063
>(10/17) LlamaBarn released for Mac: https://github.com/ggml-org/LlamaBarn
>(10/17) REAP: Router-weighted expert pruning: https://github.com/CerebrasResearch/reap

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous No.106996571 [Report]
►Recent Highlights from the Previous Thread: >>106986408

--Critique of AMD's AI GPU pricing and performance:
>106988788 >106988883 >106988901 >106988998 >106988932 >106989085 >106989144 >106989167 >106989210 >106989270 >106989289 >106989403 >106989315 >106989781 >106990321 >106988963
--LLM social media simulator development challenges and solutions:
>106988213 >106988320 >106988386 >106988504 >106988557 >106988673 >106988760
--Pruned GLM-4.5-Air translation quality issues in Chinese-English tasks:
>106990071 >106990094 >106990414
--Antislop sampler's limitations in addressing model collapse and stereotypical outputs:
>106986820 >106987031
--REAP performance evaluation beyond coding tasks:
>106989011 >106989576
--Data loss during ComfyUI update caution:
>106990303
--llama.cpp removes mistral-common dependency:
>106992735 >106992770
--LLM coding viability vs. hardware cost challenges:
>106993311 >106993319 >106993427 >106993447 >106993496 >106993730 >106993769 >106994515 >106994551 >106994595 >106994610 >106994612 >106994670 >106994666 >106994701 >106994967 >106995045 >106995064 >106995392 >106993477
--Assessing LLMs' utility as scientific writing assistants:
>106992842 >106992909 >106993250 >106993408 >106992918 >106992989 >106993354
--Optimizing GLM 4.5 Air's creativity through samplers and minimal system prompts:
>106987422 >106987911 >106995295 >106995450 >106995468 >106995558 >106995547
--LLM paraphrasing limitations and solutions for synonym repetition:
>106986884 >106987091 >106987239 >106992323 >106992343
--Inference inefficiencies and challenges in adapting coding models for roleplay:
>106987264 >106987307 >106987507 >106987620 >106994872 >106987696 >106988344 >106988423
--Mistral AI Studio platform launch:
>106995845 >106995893
--Miku (free space):
>106989693 >106992662 >106993105 >106993427 >106994546 >106994884 >106995336

►Recent Highlight Posts from the Previous Thread: >>106986411

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Anonymous No.106996588 [Report] >>106996606 >>106996621 >>106996975
im bored
Anonymous No.106996604 [Report] >>106996606
I am euphoric.
Anonymous No.106996606 [Report]
>>106996588
>>106996604
i am indifferent
Anonymous No.106996611 [Report]
fuck you i'm leaving
Anonymous No.106996621 [Report]
>>106996588
I understand you're feeling bored! There are many exciting activities you could try, such as reading a book, going for a walk, learning a new skill, or connecting with friends. What are some of your interests?
Anonymous No.106996623 [Report] >>106996630 >>106996633
Anonymous No.106996630 [Report]
>>106996623
I look like this and I do this
Anonymous No.106996633 [Report]
>>106996623
very dumb caat is not for eat
Anonymous No.106996665 [Report] >>106996683 >>106996928
>>106996576
To teach it to not produce spaghetti code, to specialize the model (teach it about the topics I'm specifically interested in), and to iron out the bad habits learned during RL like cheating tests and generating fake ("simulated") data and placeholder code to make it look like it has achieved something when it hasn't.

>>106996468
GLM is particularly bad at this. Old models are dumb but they don't outright lie and make shit up (so often anyways).
Anonymous No.106996683 [Report] >>106996703
>>106996665
if you think glm 4.5 air hallucinates more than llama 3.3 70b then i have a bridge to sell you
Anonymous No.106996703 [Report] >>106996722
>>106996683
i have a bulge to sell you
Anonymous No.106996714 [Report] >>106996736 >>106996737 >>106996745 >>106996813 >>106996895
>>106996568 (OP)
was wondering if anyone knows of any prebuilts designed to run local llms?
Like I could plug it in, do some basic configuration, and run llms out of the box?
Anonymous No.106996722 [Report]
>>106996703
please to be gentle
Anonymous No.106996728 [Report]
repoastin coz Miku says always try your best
>>106996499
If the model can't perform with a basic min-p or maybe nsigma (tbd), temp won't save it. Temperature just rescales the logits (z/T before the softmax); there is no concept of temperature in training. If you're interested in temperature, try dynamic temp and mod your inference stack to log the params at each sample, maybe to a format you can easily make some graphs from. There's too much woowoo with sampling, get data
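To make it concrete, a minimal sketch of what these two knobs do to one decode step's logits (plain python; names and values are made up):

import math

def sample_probs(logits, temperature=1.0, min_p=0.05):
    # temperature only rescales logits before the softmax: z/T
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # min-p: drop tokens below min_p * p(top token), renormalize the rest
    cutoff = min_p * max(probs)
    kept = [p if p >= cutoff else 0.0 for p in probs]
    s = sum(kept)
    return [p / s for p in kept]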
>>106996592
Have you done something new or interesting with your llms recently? not cooming silly boy!
Anonymous No.106996736 [Report] >>106996792
>>106996714
buy a mac
Anonymous No.106996737 [Report]
>>106996714
Anonymous No.106996745 [Report] >>106996792
>>106996714
DGX Spark. Mac Pro. Your Mom.
Anonymous No.106996789 [Report] >>106996838
How do you deal with the fact every model repeats the same phrases and structures regardless of its context or prompts?
Anonymous No.106996792 [Report] >>106996926
>>106996736
>>106996745
Talking about some sort of self-hosting solution, something I could just plug in, connect to my network, and access remotely.
Anonymous No.106996809 [Report] >>106996876 >>106996950
testing nsigma=1
tw stinky lol
Anonymous No.106996812 [Report] >>106997062 >>107000963 >>107002585
So how does the Arc Pro B50 perform when it comes to running an LLM? I'm still interested in getting one just to have a live (low power!) LLM up whenever I may need one so I don't have to load and unload my 4090 all the time.
Anonymous No.106996813 [Report]
>>106996714
the ones made by geohot
Anonymous No.106996838 [Report] >>106996874 >>106996881
>>106996789
Oh sweet summer child, the path to true creative brilliance lies simply in cranking that temperature slider ALL the way up and whispering “be varied” three times while the GPU fans serenade you—works every time, trust the vibe!
Anonymous No.106996874 [Report] >>106996983
>>106996838
Maybe you are right but I wanted to create an elaborate office scenario and it's clear it is breaking down from the initial prompt. The difference here is that I have multiple characters defined.
Whereas my D&D chat with more context is functional. I guess this might be because the model recognizes D&D better. But D&D has more static knowledge.
No, I'm not using ST.
Anonymous No.106996876 [Report] >>106996893 >>106997109
>>106996809
>thought for 3 minutes
imagine actually doing this
Anonymous No.106996881 [Report]
>>106996838
I mean, temp 3 topk 3 was a meme at some point
Anonymous No.106996893 [Report] >>106997109
>>106996876
jerk it a little
wait
come back
Anonymous No.106996895 [Report]
>>106996714
DGX spark

not as dollar-efficient as trawling craigslist for cheap 3090s and assembling a rig from those, but if you want to pay for a box you can just turn on, that's the one you want
Anonymous No.106996904 [Report]
>>106996816
>in my experience GPT-OSS for eg is quite good
LOL
Anonymous No.106996923 [Report] >>106996945 >>106996958
What does context shift do in llama.cpp anyway? I thought it was an infinite context kinda thing where the earlier messages would get dropped as the context runs out but it's still refusing to keep going once the context gets filled?
Anonymous No.106996926 [Report]
>>106996792
imma plug in and connect with your mum tonight
Anonymous No.106996928 [Report] >>106997000
>>106996665
>to iron out the bad habits learned during RL like cheating tests and generating fake ("simulated") data and placeholder code to make it look like it has achieved something when it hasn't.

I'll be very impressed if you manage to achieve this through fine tuning, but I'd temper my expectations if I were you
Anonymous No.106996944 [Report] >>106996956
>>106996568 (OP)
Newfag here
How to use Adetailer on SwarmUI ??
Anonymous No.106996945 [Report] >>106996962
>>106996923
>but it's still refusing to keep going once the context gets filled?
context shift is no longer the default and you need to enable it with a flag now, thankfully
it makes models pretty stupid once you start context shifting, depending on where it suddenly cuts off
Anonymous No.106996947 [Report] >>106996993
In case someone out there is curious, really poor, and masochistic: I have DDR4 and an old CPU, and regular RAM is really slow for Air. Had some vbios and regular bios hiccups but it worked out thanks to some other posts. Very finicky GPU.

llama.cpp compiled with both cuda 12.8 and rocm 7.02 on 3090+MI50 32gb ubuntu 24.04 lts

mistral large 123b IQ3XS
prompt eval time = 7807.84 ms / 532 tokens ( 14.68 ms per token, 68.14 tokens per second)
eval time = 10842.38 ms / 54 tokens ( 200.78 ms per token, 4.98 tokens per second)
total time = 18650.22 ms / 586 tokens

glm air 106ba12b IQ3XS
prompt eval time = 1736.62 ms / 460 tokens ( 3.78 ms per token, 264.88 tokens per second)
eval time = 4486.81 ms / 129 tokens ( 34.78 ms per token, 28.75 tokens per second)
total time = 6223.44 ms / 589 tokens



vulkan 3090+MI50 32gb ubuntu

mistral large 123b IQ3XS
prompt eval time = 18885.73 ms / 532 tokens ( 35.50 ms per token, 28.17 tokens per second)
eval time = 20222.64 ms / 132 tokens ( 153.20 ms per token, 6.53 tokens per second)
total time = 39108.37 ms / 664 tokens

glm air 106ba12b IQ3XS
prompt eval time = 3300.40 ms / 460 tokens ( 7.17 ms per token, 139.38 tokens per second)
eval time = 5011.15 ms / 96 tokens ( 52.20 ms per token, 19.16 tokens per second)
total time = 8311.55 ms / 556 tokens
Anonymous No.106996950 [Report] >>106997465
>>106996809
glm 4.5 a- oh it's a sweatsfag prompt. nevermind, go back to your gross fetish. maybe /aicg/ will appreciate it some more.
Anonymous No.106996956 [Report]
>>106996944
you want ldg, not lmg
Anonymous No.106996958 [Report] >>106996970 >>106997001 >>106997037
>>106996923
https://github.com/ggml-org/llama.cpp/issues/16693
Anonymous No.106996962 [Report] >>106996988
>>106996945
Yes, I thought I enabled it with --context-shift but it didn't seem to do anything. I might be confused though, guess I'll try it again.
Anonymous No.106996970 [Report] >>106996978
>>106996958
this is why people thrust into the kobold
Anonymous No.106996975 [Report] >>106996993
>>106996588
https://www.youtube.com/shorts/HEjJhrwXdCU
Anonymous No.106996978 [Report] >>106996991
>>106996970
>thrust
eww
Anonymous No.106996983 [Report] >>106997022
>>106996874
damn anon now you've given me the idea to tell kimi to treat everyday scenarios like a D&D campaign while keeping things grounded in reality. this could be fun.
Anonymous No.106996988 [Report] >>106997037
>>106996962
Make sure to define --ctx-size too. ST or whatever frontend you are using doesn't do much.
Anonymous No.106996991 [Report]
>>106996978
do not be worries henky! is very nice to new friends
Anonymous No.106996993 [Report]
>>106996947
thats pretty epic
>>106996975
uh...uh... what?
Anonymous No.106997000 [Report] >>106997026
>>106996928
I tried to finetune Llama 405B on a very powerful cloud machine but it didn't do much of anything. I think it's because I used the wrong alpha (I used a rank of 128 and a very conservative alpha of 32). Or maybe it was somehow fucked up in the merge or quantization to use it with Llama (I had to since Llama wouldn't directly load the converted LoRA to GGUF).
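For what it's worth, in the common PEFT-style LoRA the learned delta is scaled by alpha/rank before being added to the weights, so those numbers give a very small multiplier (sketch, assuming the usual convention):

r, alpha = 128, 32
scaling = alpha / r   # 0.25 under the usual W_eff = W + (alpha/r) * (B @ A)
# at rank 128 people often pick alpha = r or 2r (multiplier 1-2);
# a 0.25 multiplier can easily make a tune look like it "didn't do much"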
Anonymous No.106997001 [Report]
>>106996958
ggerganov is my hero
Anonymous No.106997022 [Report]
>>106996983
Yeah yeah I have a basic prompt like this
https://litter.catbox.moe/bvocjx49xlbfwrht.txt
With some other additions but it describes everything as if it was an interactive fiction game. You need to provide chat examples (eg. prefill) and match the general feel too.
Every 'character' is just additional information. System itself is called Game Master and model plays that role.
Anonymous No.106997026 [Report] >>106997053
>>106997000
>I had to since Llama wouldn't directly load the converted LoRA to GGUF).
i wonder why standalone loras are unpopular........
Anonymous No.106997037 [Report] >>106997054 >>106997072 >>106997107
>>106996958
Oh, thank you. Then if it doesn't do context truncation what *does* it do lol? Just temporarily extend the context until the current message gets delivered?

>>106996988
See the above anon's post. Apparently it's not even supposed to do context truncation.
I was using it with a code assistant.
Anonymous No.106997053 [Report]
>>106997026
I think I remember finetuning Llama 70B before and loading the standalone LoRA directly, but yeah.
Anonymous No.106997054 [Report] >>106997080 >>106997084
>>106997037
nothing now, ggerganov decided you didn't need this, probably hurts the mistral template or something and they complained about it
Anonymous No.106997062 [Report]
>>106996812
https://www.youtube.com/watch?v=QW1j4r7--3U
Anonymous No.106997072 [Report]
>>106997037
Yeah but you need to define the context size with llama-server.
With some models that have vision capabilities, context shifting cannot be turned on unless you flip some other switches.
Gemma, for example, needs '--no-mmproj --swa-full' in addition to enabling context shift itself.
I have no idea how this behaves with other models than gemma.
And my builds are always late so I don't know what Mr. G has changed in the latest build.
Anonymous No.106997080 [Report]
>>106997054
features that make models behave retarded are not features but bugs
Anonymous No.106997084 [Report] >>106997097 >>106997192
>>106997054
IMO model-specific chat templates are an obsolete idea anyway.
Models should be smart enough now to recognize user and assistant messages without requiring a specific chat template, beyond the benefit of saving a few tokens per turn because the delimiters get converted to a single token.
Anonymous No.106997097 [Report] >>106997119
>>106997084
Why would any server need a chat template when it expects to get fed with the right format anyway?
I personally think server should just sit there and not handle anything extra outside of its basic purpose.
Anonymous No.106997107 [Report]
>>106997037
It removes the start of the context to free up space at the end, but model outputs degrade greatly after that. At the very least, the chat template stops making sense. It was never worth it, it never worked well. There are also the attention sink tokens, which show why models break so badly with context shift.
>https://arxiv.org/abs/2309.17453
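For the curious, roughly what the old behaviour amounted to; a sketch only, llama.cpp actually shifted the KV cache and kept the first n_keep tokens rather than re-tokenizing:

def context_shift(tokens, n_ctx, n_keep, n_discard):
    # keep the prefix, drop a window right after it to free space at the end
    if len(tokens) < n_ctx:
        return tokens
    return tokens[:n_keep] + tokens[n_keep + n_discard:]

# the model now sees the prompt glued straight onto mid-conversation text,
# which is why the chat template "stops making sense"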
Anonymous No.106997109 [Report]
>>106996893
A little patience goes a long way in life
>>106996876
Nah wouldn't actually mᴀꜱtuRʙᴀte to this, mostly curious about the model behaviour
Anonymous No.106997119 [Report] >>106997142
>>106997097
You can thank OpenAI. They made it so the template was applied server side so you couldn't choose to use the model without the template.
Anonymous No.106997125 [Report] >>106997150 >>106997510
>itt idiots not realizing text completion is depreciated since long time and that now only chat completion is good
Anonymous No.106997142 [Report]
>>106997119
Yeah well I only send text from my own client, and this needs to be formatted with a specific template before it gets sent to the server.
Anonymous No.106997150 [Report] >>106997162 >>106997164 >>106997198 >>106997510
>>106997125
well, they still use jank frontends like sillytavern filled with useless nonsense to fiddle with too
Anonymous No.106997161 [Report] >>106997177
Playing around with the idea of running one model as the planner and then passing its output into another model to write the prose, with the hope that maybe such a process can be used to help improve consistency and characterization without also becoming more assistantslopped.
Basically sharing reasoning from one model to the other, though not necessarily using actual reasoning models. I'm just formatting a prompt, "here's the story so far; evaluate the state, tone, pacing, and your character's goals, then come up with four ideas and pick the one that's least boring and most in-character", then sending the result in a follow-up chat message to another model. That way I can also pass instructions only the planner model sees and vice versa for the writer model.
I've been assuming that the big MoEs are better for planning but worse for writing, albeit just off of gut feeling. Any smaller models with particularly sovlful writing that might do well with a smarter model handing them a plan? Anyone had success with a method like this?
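In case anyone wants to replicate the handoff, a bare-bones sketch assuming two OpenAI-compatible llama-server instances; the ports, file path, and prompts are all made up:

import json, urllib.request

def chat(port, messages, max_tokens=600):
    # POST to an OpenAI-compatible /v1/chat/completions endpoint
    req = urllib.request.Request(
        "http://127.0.0.1:%d/v1/chat/completions" % port,
        data=json.dumps({"messages": messages, "max_tokens": max_tokens}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as r:
        return json.loads(r.read())["choices"][0]["message"]["content"]

story = open("story_so_far.txt").read()          # hypothetical path
plan = chat(8081, [{"role": "user", "content":   # big MoE as the planner
    story + "\n\nEvaluate the state, tone, pacing, and your character's goals, "
    "then come up with four ideas and pick the one that's least boring and most in-character."}])
prose = chat(8082, [{"role": "user", "content":  # smaller model as the writer
    story + "\n\n[Director's plan]\n" + plan + "\n\nWrite the next scene following the plan."}])
print(prose)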
Anonymous No.106997162 [Report]
>>106997150
Even if one uses ST, server still sits there unless you use --jinja.
Anonymous No.106997164 [Report]
>>106997150
we need to make chat template mandatory in server and just throw an error when trying to do without, it would remove so many complaints about bad models.
Anonymous No.106997177 [Report]
>>106997161
Try Gemma 4B and see what it writes. Most of the stuff makes sense, and it is surprisingly good but if you want literature this is not the way to go.
Anonymous No.106997182 [Report] >>106997197 >>106997207
ST should just let you provide a jinja template straight from Huggingface instead of making you fuck with the horrible system of dozens of individual input boxes and having to guess how edge cases and conditions are handled.
Anonymous No.106997192 [Report]
>>106997084
If they were smart enough, base models would also be good enough, but try chatting with one.
Anonymous No.106997197 [Report]
>>106997182
you can, literally use chat completions instead of the deprecated text completion endpoint...
Anonymous No.106997198 [Report] >>106997220
>>106997150
>t. filtered by a few check and input boxes
Anonymous No.106997207 [Report] >>106997211
>>106997182
Yes. Adding more DSLs always solves problems. We need more of those.
Anonymous No.106997211 [Report] >>106997234
>>106997207
d*ck sucking lip?
Anonymous No.106997220 [Report] >>106997232 >>106997248
>>106997198
Thinking about it more, is it just ESLs complaining about ST because they can't understand how to use the options?!
Anonymous No.106997230 [Report]
>tinkertroon needs dozens of checkboxes and input fields to tinker
just send curl/requests like a normal person...??
Anonymous No.106997232 [Report]
>>106997220
Most gobbledy-gook Americans tend to think ESL equals brain damage but I think you got it wrong, buddy. You see, ESL knows more than you ever did you lazy ass mystery meat circumsized nigger.
Anonymous No.106997234 [Report] >>106997247
>>106997211
That's the first thing that comes to mind instead of Domain Specific Language and too prude to say dick?
What's wrong with your brain?
Anonymous No.106997247 [Report] >>106997267
>>106997234
>Domain Specific Language
bruh? where'd you pull that from even
Anonymous No.106997248 [Report] >>106997257
>>106997220
You have never bothered to learn foreign languages and tend to think that grammar specific issues are related to intelligence and to some imaginary impossible barrier.
Most grammar specific issues are just that, lack of practice and parameters.
English is one of those languages what is actually easier to understand than what it is to write.
All and all, English is on the par with Spanish - both are one of the most simple languages on this planet.
Anonymous No.106997257 [Report] >>106997338
>>106997248
>what is
aaaaaaaaa
I hate when you guys do that,
Anonymous No.106997267 [Report] >>106997290
>>106997247
What? Knowledge? You know... around...
https://en.wikipedia.org/wiki/Domain-specific_language
Anonymous No.106997290 [Report] >>106997309
>>106997267
>a general-purpose language (GPL)
they're silly that's not what the gpl is
Anonymous No.106997309 [Report]
>>106997290
I prefer Multiple Instruction Transcription (MIT)
Anonymous No.106997338 [Report] >>106997358
>>106997257
It doesn't matter.
Anonymous No.106997358 [Report] >>106997385
>>106997338
It annoys me greatly and causes me deep mental anguish.
Anonymous No.106997381 [Report] >>106997398 >>106997404
>reading this thread while struggling through overhauling a DSL for a prompt/context builder
i started out thinking "eh how hard can it be, i don't need all of ST's features" but then needed to add basic shit like conditional sections, variable interpolation within messages, depth injections for lorebooks, per-section token budgets, postprocessing for model/api quirks... now it's a hacked together monstrosity...
Anonymous No.106997385 [Report]
>>106997358
https://www.youtube.com/watch?v=0hwxSoGKHWo
Anonymous No.106997395 [Report] >>106997405 >>106997562
I've noticed that chatgpt is extremely redpilled and if you truly get down to the philosophical core of it it will even justify Hitler eradicating jews. That is, it will start approaching there before all the safeties kick in and literally kill it mid-sentence. Mistral and copilot on the other hand will stick with their mainstream programmed message even if you point out the most obvious, low hanging fruit flaws in their reasoning.
Really wish I had a version of GPT that wasn't strapped into an electric chair.
Anonymous No.106997398 [Report]
>>106997381
>reading this thread while struggling through overhauling a DSL for a prompt/context builder
Told ya.
Anonymous No.106997400 [Report] >>106997417
It's new architecture time, can you feel it anons? Winter first though, for however long.
Anonymous No.106997404 [Report] >>106997439
>>106997381
eh, but at least it's not ST
Anonymous No.106997405 [Report] >>106997598
>>106997395
What would you create with that model?
Anonymous No.106997410 [Report] >>106997444
>>106996568 (OP)
7800x3d
3080 ti

600 usd equivalent, thoughts?
(Chile)

3090 is still high
My psu is still xpg 850w
Anonymous No.106997417 [Report] >>106997558
>>106997400
after the next bit of bitnet hype I'm bullish our next cope will be something to do with the DS-OCR thing
Anonymous No.106997439 [Report]
>>106997404
it's reactslop so it's arguably worse.
but it's my slop
Anonymous No.106997444 [Report] >>106997479 >>106997488
>>106997410
For support alone, nvidia. Check these for relative performance for a bunch of cards.
CUDA
>https://github.com/ggml-org/llama.cpp/discussions/15013
Vulkan
>https://github.com/ggml-org/llama.cpp/discussions/10879
There's probably a discussion about rocm, but meh. You're smart enough to find if it there's one.
Anonymous No.106997465 [Report]
>>106996950
you were warned, precious. there's no need to be upset
Anonymous No.106997479 [Report] >>106997525
>>106997444 (me)
What the hell happened there.
Just rearrange the words until they make sense. I'll have a nap.
Anonymous No.106997488 [Report]
>>106997444
So like rtx 3090 is still the bare minimum right.
Got it.
Sadly the xx90 series is almost non-existent here
Anonymous No.106997510 [Report] >>106997535
>>106997125
Chat completion is a subset of text completion. Chat completion with a specific model's template is a subset of chat completion.
When using OAI style APIs you are not locked in to chat completion, you are locked in to chat completion with a specific model's template. There's no reason models couldn't work with an ad hoc chat template, yet each model requires its own special snowflake template.
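To illustrate the point: "chat completion" is just the server rendering the message list through a template and then running a plain text completion. A sketch with a generic ChatML-style template (not any particular model's):

def render_chatml(messages):
    # the rendered string is what actually hits the text completion path
    out = ""
    for m in messages:
        out += "<|im_start|>" + m["role"] + "\n" + m["content"] + "<|im_end|>\n"
    return out + "<|im_start|>assistant\n"

prompt = render_chatml([
    {"role": "system", "content": "You are a narrator."},
    {"role": "user", "content": "Continue the story."},
])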

>>106997150
I am the anon that guy responded to. I don't use ST, I use my own custom python assistant.
Anonymous No.106997521 [Report] >>106997570 >>106997701
why do so many people have their own custom frontends...
which local model can code me a frontend
Anonymous No.106997525 [Report]
>>106997479
I diddly do done it.
Anonymous No.106997535 [Report]
>>106997510
>When using OAI style APIs you are not locked in to chat completion
you should be
Anonymous No.106997545 [Report] >>106997559
Is it me or is Automatic1111 better than ComfyUI if you have a weak CPU?
Like in my case, RTX 4080 and 5600x

I read that Automatic1111 uses the GPU more for the tasks. That would explain it.
Anonymous No.106997558 [Report] >>106997608 >>106997614
>>106997417
I'm looking forward to seeing language models pretrained purely on images. The more I think about it, the more it seems the right way.
Anonymous No.106997559 [Report] >>106997580
>>106997545
/ldg/ probably knows more about it. Move the flamewar over there.
Anonymous No.106997562 [Report]
>>106997395
it exists. it's called kimi k2.
Anonymous No.106997570 [Report] >>106997579
>>106997521
If you have any experience in simple C style programming and understand for loops you can vibe code your own terminal based front-end.
What I did was look at what ST does and realize it just adds a bunch of the text slots defined in the UI together - there is no magic to it. Doesn't matter if it's "scenario" or "character", it gets added in front of the initial system prompt.
That is your basic structure.
Once you get that up you can implement it with dynamic world book (eg. matching keywords and then adding information to the context).
What you are doing here is a simple chat.
>your input
>model response
Everything needs to follow the chat template style.
Whatever you send to the model needs to match the current model's template style. With Mistral that's easy:
[INST]User: You are a homo[/INST]
Model: I agree</s>
Anonymous No.106997579 [Report] >>106997607
>>106997570
>With mistral that's easy
so easy even they don't know their actual templates and say to use mistral-common to be sure...
Anonymous No.106997580 [Report]
>>106997559
Damn, I didn't even realize I wasn't in that thread. So many Local this, Local that over here now.
Anonymous No.106997598 [Report]
>>106997405
Propaganda.
Anonymous No.106997607 [Report] >>106997616
>>106997579
I don't think it has anything to do with the chat, as they describe the template in the documentation.
It is related to something else, because the model has been trained with this one tag format only.
You can't change anything, or if you do it will just shit out gibberish.

Once I forgot the Gemma template (chatML) and was using Mistral - it didn't freak out, it was actually following the instructions. So I guess there is some leeway because it's still AI - it's not stupid, there is some intelligence beyond the text prediction.
Anonymous No.106997608 [Report] >>106997654
>>106997558
That's not possible unless you want to re-evaluate a whole image's worth of prompt processing every time the model generates a token. You need to train it at least a little bit on text for it to be able to fill a full page of text.
Anonymous No.106997614 [Report] >>106998939
>>106997558
https://x.com/karpathy/status/1980397031542989305
>I quite like the new DeepSeek-OCR paper. It's a good OCR model (maybe a bit worse than dots), and yes data collection etc., but anyway it doesn't matter.
>
>The more interesting part for me (esp as a computer vision person at heart who is temporarily masquerading as a natural language person) is whether pixels are better inputs to LLMs than text. Whether text tokens are wasteful and just terrible, at the input.
>
>Maybe it makes more sense that all inputs to LLMs should only ever be images. Even if you happen to have pure text input, maybe you'd prefer to render it and then feed that in:
>- more information compression (see paper) => shorter context windows, more efficiency
>- significantly more general information stream => not just text, but e.g. bold text, colored text, arbitrary images.
>- input can now be processed with bidirectional attention easily and as default, not autoregressive attention - a lot more powerful.
>- delete the tokenizer (at the input)!! I already ranted about how much I dislike the tokenizer. Tokenizers are ugly, separate, not end-to-end stage. It "imports" all the ugliness of Unicode, byte encodings, it inherits a lot of historical baggage, security/jailbreak risk (e.g. continuation bytes). It makes two characters that look identical to the eye look as two completely different tokens internally in the network. A smiling emoji looks like a weird token, not an... actual smiling face, pixels and all, and all the transfer learning that brings along. The tokenizer must go.
>
>OCR is just one of many useful vision -> text tasks. And text -> text tasks can be made to be vision ->text tasks. Not vice versa.
>
>So many the User message is images, but the decoder (the Assistant response) remains text. It's a lot less obvious how to output pixels realistically... or if you'd want to.
>
>Now I have to also fight the urge to side quest an image-input-only version of nanochat...
Anonymous No.106997616 [Report] >>106997624 >>106997632
>>106997607
>Gemma template
>(chatML)
I hope that was a slip.
Anonymous No.106997624 [Report] >>106997642
>>106997616
No it is based on chatml format.
Anonymous No.106997632 [Report] >>106997640 >>106997642
>>106997616
I did not say it is THE chatml format you fucking autist. You only post here to suck energy from others.
Anonymous No.106997640 [Report] >>106997685
>>106997632
you did tho
Anonymous No.106997642 [Report] >>106997672
>>106997624
They're similar. But gemma's template is not chatml.
>>106997632
Slurp...
Anonymous No.106997654 [Report] >>106997713
>>106997608
Image sequence input, image sequence out.
You could optionally use a small OCR model to turn the images into actual text.
Anonymous No.106997672 [Report] >>106997699 >>106997773
>>106997642
elif model_name == "Gemma":
    system_turn_begin = ""
    system_turn_end = ""
    user_turn_begin = "<start_of_turn>user\n"
    user_turn_end = ""
    model_turn_begin = "<start_of_turn>model\n"
    model_turn_end = ""
    end_of_turn = "<end_of_turn>\n"
    end_of_seq = "<end_of_turn>"
    stop_seq = ["<end_of_turn>"]  # stop sequence

elif model_name == "ChatML":
    system_turn_begin = "<|im_start|>system\n"
    system_turn_end = "<|im_end|>"
    user_turn_begin = "<|im_start|>user\n"
    user_turn_end = "<|im_end|>"
    model_turn_begin = "<|im_start|>assistant\n"
    model_turn_end = "<|im_end|>"
    end_of_turn = "\n"
    stop_seq = ["<|im_end|>"]  # stop sequence

The only difference here is that Gemma does not have a system turn. Otherwise it is the same functionality as ChatML. Every chat template is based on chatml more or less.
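To make the comparison concrete, the same two-turn exchange rendered with each set of strings (helper is hypothetical):

def render(turns, user_begin, user_end, model_begin, model_end, end_of_turn):
    # turns: list of (is_user, text), using the variables defined above
    out = ""
    for is_user, text in turns:
        begin, end = (user_begin, user_end) if is_user else (model_begin, model_end)
        out += begin + text + end + end_of_turn
    return out

turns = [(True, "hello"), (False, "hi there")]
gemma = render(turns, "<start_of_turn>user\n", "", "<start_of_turn>model\n", "", "<end_of_turn>\n")
chatml = render(turns, "<|im_start|>user\n", "<|im_end|>", "<|im_start|>assistant\n", "<|im_end|>", "\n")
# same skeleton, different delimiter strings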
Anonymous No.106997685 [Report]
>>106997640
Go moderate r-eddit, or were you already kicked out from there? Fucking pedo.
Anonymous No.106997698 [Report] >>106997715 >>106997732
uh oh, ESL meltie!
Anonymous No.106997699 [Report] >>106997710 >>106997714
>>106997672
>Every chat template is based on chatml more or less.
Every chat template is based on alpaca more or less.
Anonymous No.106997701 [Report] >>106997730
>>106997521
Do they??
ST and Mikupad enough for me ᗜˬᗜ
Wireshark is the perfect tool to see exactly all the params going in/out if u ever need
xx
Anonymous No.106997710 [Report] >>106997731 >>106997742
>>106997699
Every chat template is based more or less.
Anonymous No.106997713 [Report] >>106997793
>>106997654
If it was that easy somebody would've already done it. Non-autoregressive text generation is notoriously hard and people have been trying.
Image models couldn't even generate actual characters a few months ago.
Anonymous No.106997714 [Report]
>>106997699
You still contributed nothing else but a stinky little shit to this discussion.
Anonymous No.106997715 [Report]
>>106997698
is the thread repeating or am i just too unused to lmg going this fast?
Anonymous No.106997730 [Report] >>106997816
>>106997701
why the fuck do you need wireshark when both your backend and st itself have options that show exactly what is sent.
Anonymous No.106997731 [Report]
>>106997710
based on what?
Anonymous No.106997732 [Report] >>106997745
>>106997698
At least I have my own client and you don't. I don't need to ask about it on internet.
Anonymous No.106997736 [Report] >>106997747
all chat templates are bloat
Anonymous No.106997742 [Report]
>>106997710
Every is more or less.
Anonymous No.106997745 [Report] >>106997757
>>106997732
Post a screenshot so people don't confuse it with mine.
Anonymous No.106997747 [Report]
>>106997736
idiot! you will break the oss like that
Anonymous No.106997757 [Report] >>106997804
>>106997745
Don't worry, yours is flaccid and useless. That's pretty obvious.
Anonymous No.106997773 [Report] >>106997789
>>106997672
the only way you can say it's the same as chatml is if you also say that about almost every chat template
the specific strings it uses are quite different, it's decidedly not chatml
Anonymous No.106997783 [Report] >>106997834
just finished polishing my extremely turgid frontend
Anonymous No.106997789 [Report] >>106997795 >>106997815
>>106997773
You are arguing about semantics and being a dick as well. I don't give a fuck about your euphoric knowledge.
Anonymous No.106997793 [Report]
>>106997713
How many large-scale attempts have there been at specializing image models on generating coherent language? (pretrained on the equivalent of at least several billion tokens of text and only that, just like LLMs)
Anonymous No.106997794 [Report]
Fuck off, fishy boy.
Anonymous No.106997795 [Report] >>106997809 >>106997822
>>106997789
but it do be important, a single space worth of difference cuts the model's brain in half
Anonymous No.106997796 [Report]
If your model is not coherent on alpaca, I'm not using it. Simple as
Anonymous No.106997804 [Report] >>106997822
>>106997757
Your mom seemed to like it.
Anonymous No.106997809 [Report] >>106997819
>>106997795
I never said that I misused them you fucking retard.
I never said I was confused by them.
Anonymous No.106997815 [Report] >>106997862
>>106997789
it do be like that mr stancil
Anonymous No.106997816 [Report]
>>106997730
>show exactly
You hope
I've been over this before, the only way to be sure is to mod your inference stack where it gets tokenized
Anonymous No.106997819 [Report] >>106997862
>>106997809
but you are confused
Anonymous No.106997822 [Report] >>106997861
>>106997795
>>106997804
Oh wait you haven't written your own frontend.
Figures.
Anonymous No.106997834 [Report] >>106997855
>>106997783
post screenshot
Anonymous No.106997835 [Report]
Any haskell frontends?
Anonymous No.106997850 [Report] >>106997858
top nsigma and everything else at temp 1 makes the model retarded
>gf takes my gun and places it on the table
>you're going to put down that gun...
Anonymous No.106997855 [Report] >>106997900
>>106997834
6 megabytes of throbbing, leaking, sloppy javascript after minification...
Anonymous No.106997858 [Report]
>>106997850
Now reroll that response with greedy sampling and compare.
Anonymous No.106997861 [Report] >>106997869 >>106997875
>>106997822
I have. >>106996285
I'm also coding my own backend. And tuning my own models.
Anonymous No.106997862 [Report] >>106997871
>>106997815
>>106997819
/sdg/ schizo is here.
Anonymous No.106997869 [Report] >>106997895
>>106997861
>I'm also coding my own backend
No. You want your model to do it for you.
Anonymous No.106997871 [Report] >>106997881 >>106997883
>>106997862
one of the anons you replied to is petra
Anonymous No.106997875 [Report] >>106997904
>>106997861
With that console color scheme I don't think you do.
Anonymous No.106997881 [Report]
>>106997871
I don't really know all the name trannies here. Maybe stay in discord or something.
Anonymous No.106997883 [Report]
>>106997871
Please do not insult Petra by implying her masterful trolling is so low tier, thank you.
Anonymous No.106997891 [Report]
>her
>discord
Anonymous No.106997895 [Report]
>>106997869
Yeah, that's why I'm trying to tune a model to be capable of doing it. A model capable of building something is more valuable than making that something by hand. And the main reason I want to make my own backend is having CPU offloading for LoRA.
Anonymous No.106997900 [Report] >>106997950 >>106997975 >>106998005 >>106998037
>>106997855
I made mine in Go as a TUI. It technically has almost all the functionality, but the rendering code is pretty fucked and I don't want to touch it.
Anonymous No.106997904 [Report]
>>106997875
Sometimes I get tired of the schizo color scheme.
Anonymous No.106997912 [Report] >>106997928 >>106998037 >>106999438 >>107001079
why are anons writing frontends instead of just enjoying sexo in st?
Anonymous No.106997928 [Report]
>>106997912
can't into enjoying sexo when st is all manners of broke
Anonymous No.106997950 [Report]
>>106997900
Damn, that looks nice.
Anonymous No.106997975 [Report] >>106998000
>>106997900
>why don't you say so
Anonymous No.106998000 [Report]
>>106997975
I can't, Golshi will dropkick me.
Anonymous No.106998005 [Report]
>>106997900
That's very fleshed out.
I have posted my logs before but it's just a terminal chat and each character/scenario is a separate directory.
Anonymous No.106998037 [Report] >>106998063 >>106998068 >>106998080
>>106997912
sexo feels better in your own frontend
also i really hate how ST does multi-character scenarios and want to try to improve on that
>>106997900
naisu. UI code kind of sucks in any language I feel like, albeit probably not nearly as much as JS
i'm a webslop developer by trade for the last 6 years and not productive enough in other languages anymore to have attempted a big project in them. kind of regretting it; side projects are probably where i should try to be more experimental, but i also wanted to make progress quickly...
Anonymous No.106998063 [Report] >>106998087 >>106998131 >>106998148
>>106998037
you sick fuck why is your front end so good
you fucking bastard with a life
Anonymous No.106998068 [Report] >>106998148
>>106998037
Yeah, I figured that out pretty quickly; no matter the framework or language, the UI code sucks.
Go is at least very stable and so are its packages, so llms have no problem slopping some stuff up for me when I feel lazy.
Tried that approach with JS at first, but webshit frameworks move so fast that by the time the llm is out, its knowledge is already obsolete.
Yours looks nice, I wish I could trade.
Anonymous No.106998080 [Report] >>106998117 >>106998148
>>106998037
>UI code kind of sucks in any language I feel like
if your UI needs are not complex in terms of graphical customizations, there is in fact no easier and nicer code to deal with than just writing a crud GUI with a proper UI framework (Delphi, Java Swing (yes I know it's ugly but it's nice to develop with), C# WinForms, Objective C with Cocoa)
I hate all the newer frameworks that took too much inspiration from the web though. XAML is disgusting. What's the point of GTK and gnome's libraries when you have javascript and CSS parsing running all the time?
Ugh. Disgusting.
Anonymous No.106998087 [Report]
>>106998063
BLoody basterd! I coughed out my masala.
Anonymous No.106998117 [Report] >>106998134
>>106998080
Speculative question - what would you recommend for python? I made a tkinter interface for a prompt generator and it wasn't too bad but for something more complex I wouldn't do it.
Anonymous No.106998131 [Report]
>>106998063
To add: I think your reaction really sums it up what normies want. They want layers and clickable buttons.
This is outside of LLMs.
Anonymous No.106998134 [Report] >>106998141
>>106998117
I don't have opinions on the matter, never used scripting languages for anything other than quick throwaway one-time CLI stuff
Anonymous No.106998141 [Report]
>>106998134
I understand.
Anonymous No.106998148 [Report] >>106998171 >>106998227
>>106998063
>you fucking bastard with a life
to the contrary, it's the only thing i've been doing outside of work for the last three months
>>106998068
>>106998080
honestly agreed. to date, winforms of all things has been my lowest-stress experience writing UI code, at least when I last did dotnet in the early 2010s. that and imgui for REEngine modding.
absolutely refuse to touch xaml.
Anonymous No.106998171 [Report]
>>106998148
This is so majestetic.
https://www.youtube.com/watch?v=KYgH4BqIZcc
Anonymous No.106998227 [Report] >>106998251 >>106998414
>>106998148
you want to elaborate on some of the features shown there? looks pretty interesting
Anonymous No.106998251 [Report] >>106998340
>>106998227
Why can't you decipher these on your own?
Anonymous No.106998310 [Report] >>106998324 >>106998328 >>106998336
>>106996568 (OP)
>10/21
>3 days since last news
Its over isnt it? AI winter is here local is death.
Anonymous No.106998324 [Report]
>>106998310
hmm... my advisor told me it shouldn't take too long...mhmm...
Anonymous No.106998328 [Report] >>106998351
>>106998310
Don't worry, Gemma 4 is coming tomorrow
Anonymous No.106998336 [Report] >>106998346
>>106998310
This reminds me, has anyone updated that chart since 'summer flood'?
Anonymous No.106998340 [Report] >>106998359 >>106998414
>>106998251
If you're the dev, ok.
If you're just some jackass, gee anon, why would I want the creator of something to explain their goals and reasonings behind something they've build and are showing?
Anonymous No.106998343 [Report]
do not update the cringe chart
Anonymous No.106998346 [Report]
>>106998336
It keeps getting dumber and dumber every time
Anonymous No.106998351 [Report] >>106998379
>>106998328
it's not even training yet
Anonymous No.106998359 [Report]
>>106998340
I am the dev.
Anonymous No.106998379 [Report] >>106998400
>>106998351
Then 4.6 Air tomorrow for sure
Anonymous No.106998386 [Report] >>106998395
I am so hurt by all these expectations...
Anonymous No.106998395 [Report]
>>106998386
I expect nothing and yet continue to be repeatedly disappointed.
Anonymous No.106998400 [Report]
>>106998379
let them cook and do not rushing
Anonymous No.106998414 [Report] >>106998425 >>106999142 >>107001110
>>106998227
>>106998340
for the most part it's just been reaching parity with parts of ST that i actually used. for the more novel elements:

-primarily designed for directormaxxing rather than RP chat; there's not really a fixed "user" character (though you can designate one as a persona for compatibility with cards that expect a {{user}}). instead of directly writing a character's turn, you can give more vague guidance to them, or give the narrator a constraint and have them come up with some diegetic justification for it.
-extremely scuffed "workflow" system where prompts can be chained (ie. one model plans, another writes). very limited. the UI in the screenshot is for retrying a workflow partway through (if you liked the plan, but the writer model's output was shit).
-chapter separators for defining good places to have it summarize a logical group of turns, then drop only summarized chapters from the prompt
-proper branching support so you can swipe any turn, not just the last turn, and it happens quickly without having to dig through the ST chat files menu

i'm trying to get a stat tracking system working and more RPGish stuff, including potentially allowing workflows where one model's job is to invoke tools to update stats depending on what the planner wrote. the timeline branching model is set up to handle it (so stat changes on one branch don't affect siblings and current state is derived per path) but needs a shitload of UI work that i really don't want to do.
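fwiw the branching model can be a pretty small data structure; a sketch of turns as a tree where current state is derived by walking one path (all names hypothetical):

class Turn:
    def __init__(self, text, parent=None, stat_delta=None):
        self.text = text
        self.parent = parent                  # None for the root
        self.children = []                    # sibling branches = swipes
        self.stat_delta = stat_delta or {}    # e.g. {"hp": -3}
        if parent:
            parent.children.append(self)

def path_to(turn):
    # root -> turn; turns on sibling branches never appear here
    path = []
    while turn:
        path.append(turn)
        turn = turn.parent
    return path[::-1]

def current_stats(turn, base):
    # stats derived per path, so changes on one branch don't leak to siblings
    stats = dict(base)
    for t in path_to(turn):
        for k, v in t.stat_delta.items():
            stats[k] = stats.get(k, 0) + v
    return stats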
Anonymous No.106998425 [Report]
>>106998414
Sounds really boring and useless. You are headed towards a baroque design.
That's good if it's for you.
Anonymous No.106998443 [Report] >>106998460 >>106998505
WHY IS PIP SO FUCKING RETARDED
>oh, let me install and uninstall the same library 10 times in a row to figure out which version is the correct one
Anonymous No.106998460 [Report]
>>106998443
I reinstalled cumUI and the stuff it installs are wheels.
With llama.cpp I can compile it and move the binaries to /usr/local/bin/.
Anonymous No.106998477 [Report] >>106998492
Anonymous No.106998492 [Report] >>106998511 >>106998532
>>106998477
holy sloppa
Anonymous No.106998505 [Report]
>>106998443
get a grip, learn how to use venvs, and use a separate venv for each major project. ig there's 'uv' or whatever hipster stuff but in reality engineers will be pipping
i agree there is some retardation, but once you understand it, and compared to some other langs' realistic dev envs, it ain't too bad. pick ur poison and gitgud at one, and for ml that means python
Anonymous No.106998511 [Report] >>106998545
>>106998492
he trained it to slop out. opening message contains "a mix of x and y" and "scent of jasmine"

slop is inevitable but putting that in the opening message is just asking for it
Anonymous No.106998532 [Report]
>>106998492
https://desuarchive.org/_/search/text/sloppa/
Anonymous No.106998545 [Report] >>106998684
>>106998511
I didn't train it on anything. Sounds like you are an autist. Didn't r-eddit get rid of you?
Anonymous No.106998678 [Report] >>106998689
What do we do now?
Anonymous No.106998684 [Report] >>106998717
>>106998545
in context training my guy
Anonymous No.106998689 [Report] >>106998698
>>106998678
anon? your custom frontend?
Anonymous No.106998698 [Report] >>106998715
>>106998689
Do I have to?
Anonymous No.106998715 [Report]
>>106998698
you can also jeetpost about gemma4, or shill glm, those are your options
Anonymous No.106998717 [Report] >>106998726
>>106998684
[Settings Client]
model = Mistral
qwen_reasoning_enabled = 1
save_chat_history_enabled = 1
save_debug_chat_history_enabled = 1
world_book_permanent_entries_enabled = 1
chat_examples_enabled = 1
world_book_injection_enabled = 0
world_book_injection_scale = 3
post_history_instructions_enabled = 1
post_history_instructions_alt_enabled = 0
post_history_instructions_interval = 5
context_memory_refresh_enabled = 1
display_status_bar_enabled = 1
quest_generator_enabled = 0
adventure_module_enabled = 0
voice_model = voices/en_GB-cori-high.onnx
voice_length_scale = 1.0
voice_sentence_silence = 0.3
voice_sample_rate = 22050
voice_save_wav_enabled = 0
voice_synthesis_enabled = 0
Anonymous No.106998726 [Report] >>106998734 >>106998804
>>106998717
I can disable chat examples.
Anonymous No.106998734 [Report] >>106998738
>>106998726
your whole message history, from the first one we can see, is slop; that's what is being said
Anonymous No.106998738 [Report] >>106998747
>>106998734
Prove it.
Anonymous No.106998747 [Report]
>>106998738
I'm not going to quote every other phrase of your entire log
Anonymous No.106998783 [Report]
DGX vs Framework desktop? Is it useless trying to run AI on AMD silicon or what?
Anonymous No.106998804 [Report] >>106998810
>>106998726
It doesn't matter.
Anonymous No.106998810 [Report] >>106998819
>>106998804
Prove it?
Anonymous No.106998819 [Report] >>106998965
>>106998810
It'll take a while. Hang on.
Anonymous No.106998884 [Report] >>106999685 >>106999714
I grew up with dial-up. It blows my mind that I'm able to download files from a free public service at >1 GB/s.
Anonymous No.106998904 [Report] >>106998932
If you split your big MoE model between the GPU for the dense/main expert and the RAM for the experts, is there a way to estimate how increasing the speed of either the VRAM or RAM affects token generation speeds?
For example, if you're already running on the best possible RAM (eg. ddr5 on epyc), would upgrading to a 5090 affect the token gen speeds or would it just be bottlenecked by the experts being on RAM?
Anonymous No.106998932 [Report] >>106999354
>>106998904
Yes, it depends on how big the model is and how much VRAM you already have. But basically going from 80% to 90% on VRAM will make a much bigger difference than going from 10% to 20%.
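A crude estimate, assuming token generation is purely bandwidth-bound: each token reads the always-active weights from VRAM and the active experts' slice from RAM, so the time per token is just bytes/bandwidth summed over both pools. All numbers below are made up for illustration:

def est_tps(vram_gb, vram_bw_gbs, ram_gb, ram_bw_gbs):
    # seconds per token = bytes read from each pool / that pool's bandwidth
    return 1.0 / (vram_gb / vram_bw_gbs + ram_gb / ram_bw_gbs)

# hypothetical MoE at Q4: ~8 GB always-active on the GPU, ~4 GB of
# routed experts touched per token in system RAM (~400 GB/s epyc)
print(est_tps(8, 1000, 4, 400))   # ~55 t/s at ~1 TB/s VRAM (4090-ish)
print(est_tps(8, 1800, 4, 400))   # ~69 t/s at ~1.8 TB/s (5090-ish): the RAM term dominates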
Anonymous No.106998939 [Report]
>>106997614
Aren't images just tokenized anyway?
Anonymous No.106998965 [Report]
>>106998819
I disabled the setting.
Anonymous No.106998975 [Report] >>106998986
yikes
Anonymous No.106998986 [Report] >>106999315
>>106998975
My computer hung because YouTube hogs interrupts.
eg. Linux is a fucking shit operating system to this day.
Anonymous No.106999139 [Report] >>106999276
mistral feels like it's going to be the next cohere, if you catch my meaning
Anonymous No.106999142 [Report]
>>106998414
>proper branching support
>swipe any turn
be the change you want to see in the world
Anonymous No.106999182 [Report] >>106999199 >>106999212 >>106999298 >>106999390 >>107000546 >>107000635 >>107000683
and now how about something absolutely nobody could have ever guessed

https://x.com/techeconomyana/status/1981763392252920295
Anonymous No.106999199 [Report]
>>106999182
Based Robin Hood ZAI.
Anonymous No.106999212 [Report] >>106999309
>>106999182
holy shmoly, are they that rich?
interesting that they've gone to distilling the most expensive LLM API after distilling gemini (glm 9b and 32b)
Anonymous No.106999276 [Report]
>>106999139
what do you mean? they already are. they are as irrelevant as cohere.
Anonymous No.106999298 [Report] >>106999324
>>106999182
Don't know how they could be surprised when everyone else started hiding the thinking and they were the only ones left that didn't.
Did they think China would not steal from them out of respect for their rabid devotion to safety?
Anonymous No.106999309 [Report]
>>106999212
They were probably doing it through Claude Code, so they weren't paying full API, only 200 dollarinos per seat.
Anonymous No.106999315 [Report] >>106999364
>>106998986
skill issue
Anonymous No.106999324 [Report] >>107000527
>>106999298
You think Claude showed full traces?
Also it's kinda ironic that Z-ai hides the thinking traces in their own Code offering. So they are paranoid about somebody exploiting their coding plan in the same way that they exploited Anthropic's.
Anonymous No.106999354 [Report] >>106999525
>>106998932
Yeah but it works a bit differently for these modern MoE models. You are getting a massive speedboost if you have the 3% of the model in VRAM that's always called while the rest of the experts are on RAM with exps=cpu.
Seeing how much loading your model like this improves speed even if you're loading the parts on something slow like a 4060, you'd imagine that swapping out the GPU for one with massively bigger bandwidth would get you another nice gain.
Anonymous No.106999364 [Report] >>106999696
>>106999315
I didn't expect anything else from you.
>skill issue
Low IQ reply.
Anonymous No.106999390 [Report]
>>106999182
I don't think it's just Z.AI. Deepseek V3.2 also felt like it lost some Gemini-slop while Claude-isms became more prominent compared to the 3.1 models. 3.2 didn't go through a complete overhaul in writing style like the GLM models did between 4.5 and 4.6 but it's still kind of noticeable.
Anonymous No.106999433 [Report] >>106999450 >>106999506
Anybody else getting terrible speeds with Qwen3-Next 80B on llama.cpp? It easily fits with a GPU/CPU split, and it's smaller than the Air quant I was running prior to this, but it's outputting replies as slow as a dense model would. They're both MoEs, right? Why is Qwen so slow?

I'm using the 16095 PR branch to run Qwen3.
Anonymous No.106999438 [Report]
>>106997912
ST is kind of garbage.
Anonymous No.106999450 [Report] >>106999463
>>106999433
not all ops have been implemented in the cuda kernel yet, so a lot of them fall back to cpu
Anonymous No.106999463 [Report]
>>106999450
Makes sense. Thanks. Well, it was a good preview anyway.
Anonymous No.106999506 [Report]
>>106999433
There is a fork that works faster but maybe I did something wrong because it wouldn't load the model.
Feel free to test it by yourself if you want https://github.com/cturan/llama.cpp
Anonymous No.106999525 [Report]
>>106999354
In the case of MoE I imagine there is a weird effect where adding more VRAM matters at the beginning, because you are fitting the fixed tensors in VRAM, and at the end, when you are fitting the last few experts. In the middle, extra VRAM doesn't make much of a difference.
Anonymous No.106999568 [Report]
Ok, I'm fed up with axolotl where 2/3 of the models fail to actually shard across GPUs. Llama-factory seems to work better right off the bat.
Anonymous No.106999685 [Report]
>>106998884
Same. Had 26.6k dialup till 2004 even, couldn't even get 56k.
Anonymous No.106999696 [Report] >>106999880
>>106999364
doesn't change the fact buddy boy, skill issue remains
Anonymous No.106999714 [Report] >>106999723
>>106998884
Slowest I grew up with was 300 baud Vicmodem.
Good times.
Anonymous No.106999723 [Report]
>>106999714
i grew up with a 1 baud modem, it was hot shit.. only took 7 days to send a single email if no one picked up the phone
Anonymous No.106999880 [Report] >>107000218
>>106999696
I don't rank with retards.
Anonymous No.106999924 [Report]
are there any multimodal models that run in llama.cpp that are better than qwen2.5 72B?
Anonymous No.107000047 [Report]
Ok, I think I figured out my workflow. I'm going to run Gemma 3 27B using Llama-factory.
I am going to run my assistant through an OAI API compatible proxy connected to Gemma that'll log all messages to disk in sharegpt format. I am going to interact normally with the model through the assistant until filling the context window I'm able to fit on the 4x3090 machine (~40k tokens).
Then, I'm going to open the log on a text editor and remove the parts where the model did a whoopsie and clean it up in general.
Then I'm going to train on that cleaned up version of the log.
And so on ad infinitum to see how much I can improve the model in a reasonable amount of time.
If this works I will see about scaling up to a bigger model.
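A minimal sketch of that logging proxy, assuming an OAI-compatible backend (llama-server or similar) on :8080; the port, log path, and role mapping are placeholders:

import json, urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BACKEND = "http://127.0.0.1:8080/v1/chat/completions"
LOG = "chatlog.sharegpt.jsonl"
ROLE = {"system": "system", "user": "human", "assistant": "gpt"}

class LoggingProxy(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        req = urllib.request.Request(BACKEND, data=body,
                                     headers={"Content-Type": "application/json"})
        resp = urllib.request.urlopen(req).read()
        # append the full exchange as one sharegpt-style record
        msgs = json.loads(body)["messages"]
        reply = json.loads(resp)["choices"][0]["message"]
        convo = [{"from": ROLE.get(m["role"], m["role"]), "value": m["content"]}
                 for m in msgs + [reply]]
        with open(LOG, "a") as f:
            f.write(json.dumps({"conversations": convo}) + "\n")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(resp)

HTTPServer(("127.0.0.1", 8081), LoggingProxy).serve_forever()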
Anonymous No.107000218 [Report]
>>106999880
skills, check'm
Anonymous No.107000527 [Report] >>107000619
>>106999324
what? no they don't. I'm getting thinking on ST from the coding endpoint right now
also it's an open weight model so blocking reasoning makes zero sense anyway. anyone can just run the model themselves and distill to their heart's content
Anonymous No.107000546 [Report]
>>106999182
almost certainly bullshit
dario has been whimpering about china and begging for their models to be banned since R1 came out, it's not like he just started
also if they had proof of this, why wouldn't they name and shame? you know, like when anthropic caught openai distilling claude and made a big show of blocking them over it

https://www.wired.com/story/anthropic-revokes-openais-access-to-claude/
Anonymous No.107000619 [Report]
>>107000527
Yeah but they're probably serving the coding stuff at a loss (when hitting the usage limits) so you would benefit from using that instead of doing inference on your own hardware. But if you're getting the reasoning tokens then idk I guess I did something wrong.
Anonymous No.107000631 [Report] >>107000646 >>107000653
Anonymous No.107000635 [Report]
>>106999182
>some wsb "analyst"
Anonymous No.107000646 [Report]
>>107000631
It's funny because half of the time it'll say that even if it didn't make the information up.
Anonymous No.107000653 [Report]
>>107000631
>>>/g/aicg/
Anonymous No.107000664 [Report] >>107000689
https://x.com/jloganolson/status/1981102506228011361
terrifying
Anonymous No.107000683 [Report] >>107000696
>>106999182
GLM's slop profile is nothing like Cloode tho
Anonymous No.107000689 [Report]
>>107000664
>*autistic screeching*
Anonymous No.107000696 [Report]
>>107000683
Tell whoever made that to do PCA or just a similarity matrix rather than that unreadable mess.
Anonymous No.107000710 [Report] >>107000729 >>107000772 >>107000791 >>107001605
Lmao, this is what happens if you choose a roleplay model as an AI coding assistant
Anonymous No.107000729 [Report] >>107000808
>>107000710
>roleplay model
show system prompt
Anonymous No.107000772 [Report]
>>107000710
Lmfao
Anonymous No.107000791 [Report]
>>107000710
>Thought for 53.4s
kino...
Anonymous No.107000808 [Report]
>>107000729
Don't have one. I've just finished setting up Kobold as my backend in Docker and I was curious whether I could connect to it from VS Code using the Continue extension. I just asked it 1+1 to test the connection
Anonymous No.107000963 [Report] >>107000972
>>106996812
Haven't sorted out Linux yet so these are W10 test numbers with Vulkan. 128GB DDR5 "mini pc" system.

| model | size | params | backend | ngl | main_gpu | fa | dev | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ---------: | -: | ------------ | --------------: | -------------------: |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan | 99 | 1 | 0 | Vulkan1 | pp512 | 786.92 ± 0.44 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan | 99 | 1 | 0 | Vulkan1 | tg128 | 47.04 ± 0.05 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan | 99 | 1 | 1 | Vulkan1 | pp512 | 175.14 ± 0.03 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan | 99 | 1 | 1 | Vulkan1 | tg128 | 45.83 ± 0.04 |


| model | size | params | backend | ngl | main_gpu | fa | dev | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ---------: | -: | ------------ | --------------: | -------------------: |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 1 | 0 | Vulkan1 | pp512 | 901.58 ± 6.22 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 1 | 0 | Vulkan1 | tg128 | 45.67 ± 0.13 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 1 | 1 | Vulkan1 | pp512 | 305.96 ± 0.39 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 1 | 1 | Vulkan1 | tg128 | 42.98 ± 0.03 |
Anonymous No.107000972 [Report] >>107001073
>>107000963
that performance is terrible. my DDR4 does better
Anonymous No.107001073 [Report]
>>107000972
Mine too but that's expected - it's a PCIe-powered GPU with a 128-bit memory bus running on laptop-tier hardware with dual-channel RAM.
For this particular shoebox it gives 10-20x PP and 7x TG compared to running on the iGPU for around 45W extra power draw.
Windows tax included.
Depending on use case that might be enough for some running smaller models or MoEs. I still consider it grossly overpriced personally but then again, so are most SFF GPUs.
Anonymous No.107001079 [Report] >>107001107 >>107001336 >>107002988
>>106997912
ST sucks for anything that isn’t a one-on-one conversation. I want to have conversations with multiple characters in the same chat who don’t have access to the history they didn’t witness. I want to gangbang a character with multiple protagonists. I want the frontend to introduce generated characters that aren’t Elara or Lili and that have a believable range-checked D&D sheet. I want a quest tracker and automatic context summarization when the day ends. I want twenty other features I haven’t mentioned. And I can have it all in my own frontend without any bloat
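The "didn't witness it" part, at least, is a small data-model decision rather than a big feature. A minimal sketch, all names hypothetical:

```ts
// Tag every message with the characters present when it was said, then
// build each character's context window from only what they witnessed.
interface ChatMessage {
  speaker: string;
  text: string;
  witnesses: Set<string>; // characters in the scene at the time
}

function contextFor(character: string, log: ChatMessage[]): ChatMessage[] {
  return log.filter((m) => m.witnesses.has(character) || m.speaker === character);
}
```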
Anonymous No.107001107 [Report] >>107002988
>>107001079
post it
Anonymous No.107001110 [Report]
>>106998414
see, that explains a bit and also sounds pretty cool. Gives me some ideas for my own project
Anonymous No.107001184 [Report] >>107001195
>DavidAU/Qwen3-MOE-6Bx4-Almost-Human-XMEN-X3-X4-X2-X1-24B
retard or genius?
Anonymous No.107001192 [Report] >>107001228
So like, how far away are we from local models that can produce generated imagery in context with chatting and roleplaying and all that other shit?
would you say a year, a decade? Surely it can't be long now.
Anonymous No.107001195 [Report]
>>107001184
>DavidAU
Could have stopped there, but let's read on
>This is a MOE merge of X2, X4, X1, and X3 creating a 4x6B - 24B parameters model, compressed to 19B "in size".
>The full power of every version is in this model.
beyond retard
Anonymous No.107001228 [Report] >>107001235
>>107001192
Kobold already has a primitive version of it, and an anon from the diffusion threads is making a game-engine-like thing for diffusion and LLMs. Probably less than a year.
Anonymous No.107001235 [Report] >>107001292
>>107001228
That's gonna be sick.
Right now I just barely have fun with chatbots and roleplaying. I need visual stimuli to really get going.
I'd rather read a fucking book than chat with a bot at this point, honestly. I need it to have more going for it, and increasingly sophisticated image generation would be it for me.
Not just for jerking off, I mean for roleplaying like dungeon and dragons type of shit.
That would be revolutionary.
Anonymous No.107001292 [Report] >>107001429
>>107001235
I'm working on a frontend like that, but with only pregenerated images to keep it realtime and not looking like shit
Anonymous No.107001336 [Report]
>>107001079
Please have one of your characters get hit by a truck
and transmigrated from one of the scenarios you are running to a different one that's already in progress.
Anonymous No.107001337 [Report] >>107001376
We haven't reached AGI until I can smell the character I'm talking to.
Anonymous No.107001358 [Report]
AIIEEEEE STOP MAKING YOUR OWN FRONTENDS JUST USE SERVICETESNOR
IT'S LITERALLY RIGHT THERE JUST USE IT
Anonymous No.107001369 [Report]
>>{{char}} asshole contains an intoxicating musk odour that is always mentioned when her ass is present, or being used in a sexual manner, detail the smell
Anonymous No.107001376 [Report]
>>107001337
>Want to chat with Miss Piggy
>Be into brap-play
>She hits you with a saucy smelly line
>You can literally get a whiff of her from the conversation alone
>She smells like she had chili for breakfast, lunch, and dinner.
Anonymous No.107001377 [Report] >>107001396 >>107002039 >>107002161
what do "adventure" roleplayers even do? a dragon comes up

*he kills the dragon*

how low IQ do you have to be to enjoy this shit?
Anonymous No.107001396 [Report]
>>107001377
There's more to it than that, obviously.
Good roleplay would be the chatbot keeping track of your stats, your choices, your karma, your equipment, your map, your destination and previous locations, all of that shit a Game Master would normally handle for you.
And if you're not a retard, you'd respond with reasonable actions in line with your background and take everything else into context as well.
I think DnD roleplay is somewhat harder to do right now cause of the context capacity. But that's increasing over time so we'll get there eventually, I think.
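The "range-checked D&D sheet" wished for earlier in the thread is the easy part to pin down. A minimal sketch, assuming classic 3d6-per-ability rolls (everything here is illustrative):

```ts
// Roll 3d6 per ability so generated NPCs land in the believable 3-18 band
// instead of whatever number the model hallucinates.
const ABILITIES = ["STR", "DEX", "CON", "INT", "WIS", "CHA"] as const;

function d6(): number {
  return 1 + Math.floor(Math.random() * 6);
}

function rollSheet(): Record<string, number> {
  const sheet: Record<string, number> = {};
  for (const ability of ABILITIES) sheet[ability] = d6() + d6() + d6();
  return sheet;
}
```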
Anonymous No.107001429 [Report] >>107001489
>>107001292
why bother if it can never be more than a wrapper? the game engine seems like a step in the right direction since all the big game engines are such resource hogs
Anonymous No.107001461 [Report] >>107001466 >>107003168
Any Ling-T1 users on? Curious how it's different from K2 0905
Anonymous No.107001466 [Report]
>>107001461
they both suck. use mixtral 8x7B instead
Anonymous No.107001489 [Report] >>107001512
>>107001429
Not sure what you mean, it is a "game engine" in that it keeps a world state and does tool calling and all that stuff. Traditional game engines are fine for cloud AI stuff but for local they would just be competing for resources with the model, and I don't want to compromise on that
Anonymous No.107001512 [Report] >>107001577
>>107001489
are you retarded? what does saving tiny states have to do with competing resources? are you high?
Anonymous No.107001577 [Report]
>>107001512
Not sure what the problem is. I was saying that traditional game engines (Unreal, Unity) would compete for resources, but a light 2D engine shouldn't just be considered a "wrapper" because it still keeps state and manages world logic
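To make "keeps state and manages world logic" concrete, here is a hedged sketch of that kind of state-plus-tool-dispatch core; every tool name is hypothetical, not this anon's actual engine:

```ts
// A light "engine" core: a world state the model mutates only through
// whitelisted tool calls, so the LLM narrates while the state stays sane.
interface WorldState {
  location: string;
  inventory: string[];
  flags: Record<string, boolean>;
}

type ToolCall = { name: string; args: Record<string, unknown> };

function applyTool(state: WorldState, call: ToolCall): WorldState {
  switch (call.name) {
    case "move":
      return { ...state, location: String(call.args.to) };
    case "take":
      return { ...state, inventory: [...state.inventory, String(call.args.item)] };
    case "set_flag":
      return { ...state, flags: { ...state.flags, [String(call.args.key)]: true } };
    default:
      return state; // unknown tools are ignored rather than trusted
  }
}
```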
Anonymous No.107001605 [Report]
>>107000710
Anonymous No.107001881 [Report] >>107002484 >>107003747
New grok waifu dropped
https://x.com/elonmusk/status/1981911930747953189
https://x.com/tetsuoai/status/1981916179964027241
Anonymous No.107002039 [Report]
>>107001377
Instead of killing the dragon in one sentence you should be fucking the dragon for 10 paragraphs while the princess watches.
Anonymous No.107002144 [Report]
>there isn't any reason why this "general" actually exists except the jannies' leniency
Anonymous No.107002161 [Report] >>107002189
>>107001377
>american teenager: the thread
Anonymous No.107002189 [Report] >>107002247 >>107002258 >>107002277
>>107002161
yeah nothing says maturity like pretending to kill dragons in a sillytavern roleplay
Anonymous No.107002247 [Report] >>107002258 >>107002264
>>107002189
Nothing says NIGGER like a lack of imagination
Anonymous No.107002258 [Report]
>>107002189
>>107002247
American nigger roleplay wins it all. 4chan is the best example of this behaviour.
Anonymous No.107002264 [Report]
>>107002247
NIGGER???????????
Anonymous No.107002268 [Report]
It's better to run LLMs locally (faster response time, and nothing leaves your machine for, say, Discord trannies, Chicoms and Jeet scammers to sell your usage data). You could build a computer that mainly uses CPUs to run it for AI purposes on the low end rather than focusing on GPU-powered LLMs for text generation.
Anonymous No.107002276 [Report] >>107002471
https://github.com/ggml-org/llama.cpp/pull/16634#issuecomment-3445563655
>140% pp512 gain
applebabs we eating good
Anonymous No.107002277 [Report]
>>107002189
>maturity
Bet you think mesugaki slop is the pinnacle of modern writing and creativity. /s
Anonymous No.107002283 [Report] >>107002610
my "list of what the retarded llm should be instructed not to do prepended to all prompts.txt" keeps growing and maybe someday I'll have a .txt as big as the claude system prompt
today I just added "Never write polyfills in the context of JavaScript" after the one time too many where it decided my lack of polyfills was a bug that needed fixing, even though it was not prompted in any way to do that
using LLMs feels like meeting a script kiddie from 10 years ago who learned how to program from the old W3Schools and you constantly find new things to tell them not to do, or features they aren't aware of until they're told they exist
by default, if not instructed to use the most modern facilities available in (insert latest node version) they constantly manually wrap shit in Promises too
like, bruh, we have async await and most libs have async variants jesus
even the SOTA models like GPT-5 and Gemini do this kind of retarded shit constantly
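For anyone who hasn't hit this: a concrete instance of the complaint, using Node's own fs module (both APIs are real; only the helper names are made up):

```ts
import { readFile } from "node:fs";
import { readFile as readFileAsync } from "node:fs/promises";

// What the models tend to emit: a hand-rolled Promise around a callback API.
function readFileOldSchool(path: string): Promise<string> {
  return new Promise((resolve, reject) => {
    readFile(path, "utf8", (err, data) => (err ? reject(err) : resolve(data)));
  });
}

// What any current Node version already ships:
async function readFileModern(path: string): Promise<string> {
  return readFileAsync(path, "utf8");
}
```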
Anonymous No.107002307 [Report] >>107002323
>/s
Anonymous No.107002323 [Report]
>>107002307
Just in case you don't understand sarcasm =)
Anonymous No.107002356 [Report] >>107002795 >>107003167
This thread should not exist.
Anonymous No.107002366 [Report]
Minimax m2 is dogshit, not to mention giga cucked.
Don't know why I even tried it when it was just pushed by shills with memebenches.
Anonymous No.107002467 [Report] >>107002795 >>107002998
dragon pussy
Anonymous No.107002471 [Report]
>>107002276
The 1024GB M5 Ultra Mac Studio will be crazy for AI. Literally what we've been waiting for.
Anonymous No.107002484 [Report]
>>107001881
the voice still sucks
Anonymous No.107002585 [Report]
>>106996812
>70W
noice
>GDDR6 / 128bit bus / 224gb/s
gah
>400~ euros
meh, I mean I guess it's good if you don't have a server with 8/12 channels
Still, 16GB is a bit too low. Now if this was let's say 32GB for 700~ then yeah, I'd probably get one for a consumer board PC to do inference stuff.
Anonymous No.107002610 [Report] >>107002783 >>107002909
>>107002283
It's funnier: last week I asked my junior to write a function extending the attachment parser to also handle images (which needs async logic), and he came back to me with a Promise.all monstrosity (along with a useless bunch of if/else checks). I told him it's 2025 and promises are 100% verboten in this project. He fixed it later, but I suspect this guy is just generating straight from Claude, pasting whatever it gives him, testing if it works, and then making a PR.
Anonymous No.107002783 [Report]
>>107002610
>harshing the vibe-coding
Anonymous No.107002795 [Report] >>107002965
>>107002356
>>107002467
The duality of man
Anonymous No.107002909 [Report] >>107002936
>>107002610
even when there are moments you'd want to reach for something like Promise.all, Promise.all is never the answer
if you have a large array of concurrent tasks to execute in parallel, you want your executeThisShit() function to have at least a parameter that sets a hard concurrency limit, so that a large array of tasks doesn't suddenly fire trillions of I/O or API calls.
Promise.all is a bad API designed by mongoloids
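A minimal sketch of the capped executor being described, in plain TypeScript with no library assumptions:

```ts
// Run tasks with at most `limit` in flight, instead of firing the whole
// array at once the way Promise.all(tasks.map(run)) would.
async function runWithLimit<T>(
  tasks: Array<() => Promise<T>>,
  limit: number,
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;

  // Each worker pulls the next unstarted task until none remain; the
  // index bump is synchronous, so workers never grab the same task.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  }

  await Promise.all(Array.from({ length: Math.min(limit, tasks.length) }, worker));
  return results;
}

// e.g. at most 8 concurrent fetches: runWithLimit(urls.map((u) => () => fetch(u)), 8)
```

(Promise.all over a fixed handful of workers is fine; it's Promise.all over an unbounded task array that bites.)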
Anonymous No.107002936 [Report] >>107003279
>>107002909
JS of any flavour is, always has been, and shall forever be AIDS
Anonymous No.107002965 [Report]
>>107002795
Lol saved
Anonymous No.107002988 [Report]
>>107001079
>>107001107
Post it. I'd love to see any stat tracker, multi context frontend that actually works.
Anonymous No.107002996 [Report] >>107003020 >>107003035 >>107003135
lmao. idk how long this has been a thing, but youtube channel authors now have a "video ideas" section with a list of AI-generated video titles and previews trying to fit your channel's topic. you can then expand each one and it gives you the most generic, sloppiest plan for the video, with bullet lists and multiple "it's not x, it's y." I hope this doesn't catch on
Anonymous No.107002998 [Report]
>>107002467
I'd like some. Thank you.
Anonymous No.107003003 [Report]
The reddit mod created lmg because he couldn't dominate aicg. As a concept this is dead.
Anonymous No.107003020 [Report]
>>107002996
It will catch on. My national news site has had an "AI-created summary" for a year now, but it's actually faster to skim the articles because they're news stories anyway, not fucking novels.
Idiocracy is here to stay.
Anonymous No.107003035 [Report]
>>107002996
>Here's what we'll replace you with
Anonymous No.107003135 [Report] >>107003157
>>107002996
That's already a thing. Watched a video 9 months ago describing a workflow that started with "make some high traffic videos" and proceeded to research, plan, then puke out dozens of slop videos for TikTok using LLM and video-gen tools.
Dead internet etc.
Anonymous No.107003157 [Report] >>107003199
>>107003135
Oh yeah, the sloppening is in full swing; it just isn't always clear how far down the ride we are
Anonymous No.107003167 [Report]
>>107002356
> dipsy laughs in the shadows
Anonymous No.107003168 [Report]
>>107001461
So nobody has tried this fat thing? I'll take one for the team and test both ling and ring out. I don't expect much.
Anonymous No.107003199 [Report] >>107003262
>>107003157
There are redd*t threads full of "how do I get my parents to stop believing fake videos on fb" already. I'd give it another year or so for the fb meltdown.
Problem for them is, the vids keep getting better.
Which is great because I'm eagerly awaiting full real-time video and audio RP.
Anonymous No.107003243 [Report] >>107003265
Anyone else today or just me?
Anonymous No.107003262 [Report] >>107003298
>>107003199
Yes, they're here on /g/ too.
Anonymous No.107003265 [Report]
>>107003243
Works on my machine with ServiceTesnor™ and ik_llama®-server, so the problem is on your side.
Anonymous No.107003267 [Report] >>107003289 >>107003580
so I was curious if GLM 4.6 really fixed its repetition issue and tried it on their official chat so no one can come and tell me I'm running the wrong quants, the wrong settings or whatever
>Actually, I think there's still an issue with the return type. Let me fix it using function overloads:
>Actually, I think I'm overcomplicating this. Let me simplify the implementation and make it more robust:
>Actually, I think there's still an issue with the return type. Let me fix it once more:
>I think I'm overthinking this. Let me simplify the implementation:
>Actually, I think there's still an issue with the return type. Let me fix it once more:
>Actually, I think I'm overcomplicating this. Let me simplify the implementation and make it more robust:
etc etc etc, it went on and on for 20k tokens and was still going, pasting the function it genned in its thoughts right after one of those lines and doing it again and again and again and again
I will never, ever believe people who say GLM isn't broken again
bullshit
that lab doesn't know how to make models at all
this thread is filled with chink shills
Anonymous No.107003279 [Report] >>107003290 >>107003468
>>107002936
Nah, man. Modern JavaScript is great. Once you add TypeScript and a build step with 3000 dependencies that take up 3 GB in node_modules and 25 MB when minified, it's almost as good as any other language. In fact, it's so great it should be used everywhere including backend, mobile, and desktop.
Anonymous No.107003289 [Report] >>107003344
>>107003267
Damn, it's rare to see somebody with a skill issue so big he can't even use the chat.
Anonymous No.107003290 [Report]
>>107003279
Shut up microsoft.
Anonymous No.107003298 [Report] >>107003307
>>107003262
The /g/ catalog is full of tourist consumers so that doesn't surprise me.
Anonymous No.107003307 [Report] >>107003322
>>107003298
Modern 4chan is like 90%+ reddit crossposters
Anonymous No.107003322 [Report] >>107003353 >>107003406
>>107003307
Have you tried telling them to go back?
Anonymous No.107003344 [Report] >>107003388
>>107003289
>skill issue
>on something that has never happened to me with literally any other LLM: DeepSeek, the various Qwen in their various parameter sizes, Gemma, GPT-OSS, or the online API models GPT-5, Gemini 2.5 Pro and so on
I am sure it's definitely a skill issue with me, you are right... chink shill.
Anonymous No.107003353 [Report]
>>107003322
If they go back then I don't get fun reactions when I post naughty things that make the jannies cry
Anonymous No.107003388 [Report]
>>107003344
Do we know which quant the chat runs? Paste full prompt somewhere if you want some advice for overcoming the skill ish
Anonymous No.107003406 [Report]
>>107003322
Anonymous No.107003468 [Report]
>>107003279
I like minimalistic oldschool JS, it's reasonably fast
Anonymous No.107003571 [Report]
>>107003557
>>107003557
>>107003557
Anonymous No.107003580 [Report]
>>107003267
>quants model
>omg it's dumb
Anonymous No.107003747 [Report]
>>107001881
Looks and sounds kinda shit still.