Thread 106996568

357 posts 82 images /g/
Anonymous No.106996568 [Report] >>106996714 >>106996944 >>106997410 >>106998310
/lmg/ - Local Models General
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106986408 & >>106975556

►News
>(10/21) Qwen3-VL 2B and 32B released: https://hf.co/Qwen/Qwen3-VL-32B-Instruct
>(10/20) DeepSeek-OCR 3B with optical context compression released: https://hf.co/deepseek-ai/DeepSeek-OCR
>(10/20) merged model : add BailingMoeV2 support #16063: https://github.com/ggml-org/llama.cpp/pull/16063
>(10/17) LlamaBarn released for Mac: https://github.com/ggml-org/LlamaBarn
>(10/17) REAP: Router-weighted expert pruning: https://github.com/CerebrasResearch/reap

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous No.106996571 [Report]
►Recent Highlights from the Previous Thread: >>106986408

--Critique of AMD's AI GPU pricing and performance:
>106988788 >106988883 >106988901 >106988998 >106988932 >106989085 >106989144 >106989167 >106989210 >106989270 >106989289 >106989403 >106989315 >106989781 >106990321 >106988963
--LLM social media simulator development challenges and solutions:
>106988213 >106988320 >106988386 >106988504 >106988557 >106988673 >106988760
--Pruned GLM-4.5-Air translation quality issues in Chinese-English tasks:
>106990071 >106990094 >106990414
--Antislop sampler's limitations in addressing model collapse and stereotypical outputs:
>106986820 >106987031
--REAP performance evaluation beyond coding tasks:
>106989011 >106989576
--Data loss during ComfyUI update caution:
>106990303
--llama.cpp removes mistral-common dependency:
>106992735 >106992770
--LLM coding viability vs. hardware cost challenges:
>106993311 >106993319 >106993427 >106993447 >106993496 >106993730 >106993769 >106994515 >106994551 >106994595 >106994610 >106994612 >106994670 >106994666 >106994701 >106994967 >106995045 >106995064 >106995392 >106993477
--Assessing LLMs' utility as scientific writing assistants:
>106992842 >106992909 >106993250 >106993408 >106992918 >106992989 >106993354
--Optimizing GLM 4.5 Air's creativity through samplers and minimal system prompts:
>106987422 >106987911 >106995295 >106995450 >106995468 >106995558 >106995547
--LLM paraphrasing limitations and solutions for synonym repetition:
>106986884 >106987091 >106987239 >106992323 >106992343
--Inference inefficiencies and challenges in adapting coding models for roleplay:
>106987264 >106987307 >106987507 >106987620 >106994872 >106987696 >106988344 >106988423
--Mistral AI Studio platform launch:
>106995845 >106995893
--Miku (free space):
>106989693 >106992662 >106993105 >106993427 >106994546 >106994884 >106995336

►Recent Highlight Posts from the Previous Thread: >>106986411

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Anonymous No.106996588 [Report] >>106996606 >>106996621 >>106996975
im bored
Anonymous No.106996604 [Report] >>106996606
I am euphoric.
Anonymous No.106996606 [Report]
>>106996588
>>106996604
i am indifferent
Anonymous No.106996611 [Report]
fuck you i'm leaving
Anonymous No.106996621 [Report]
>>106996588
I understand you're feeling bored! There are many exciting activities you could try, such as reading a book, going for a walk, learning a new skill, or connecting with friends. What are some of your interests?
Anonymous No.106996623 [Report] >>106996630 >>106996633
Anonymous No.106996630 [Report]
>>106996623
I look like this and I do this
Anonymous No.106996633 [Report]
>>106996623
very dumb caat is not for eat
Anonymous No.106996665 [Report] >>106996683 >>106996928
>>106996576
To teach it to not produce spaghetti code, to specialize the model (teach it about the topics I'm specifically interested in), and to iron out the bad habits learned during RL like cheating tests and generating fake ("simulated") data and placeholder code to make it look like it has achieved something when it hasn't.

>>106996468
GLM is particularly bad at this. Old models are dumb but they don't outright lie and make shit up (so often anyways).
Anonymous No.106996683 [Report] >>106996703
>>106996665
if you think glm 4.5 air hallucinates more than llama 3.3 70b then i have a bridge to sell you
Anonymous No.106996703 [Report] >>106996722
>>106996683
i have a bulge to sell you
Anonymous No.106996714 [Report] >>106996736 >>106996737 >>106996745 >>106996813 >>106996895
>>106996568 (OP)
was wondering if anyone knows of any prebuilts designed to run local llms?
Like I could plug it in, do some basic configuration, and run llms out of the box?
Anonymous No.106996722 [Report]
>>106996703
please to be gentle
Anonymous No.106996728 [Report]
repoastin coz Miku says always try your best
>>106996499
If the model can't perform with a basic min-p or maybe nsigma (tbd), temp won't save it. Temperature just rescales the logits (z/T before the softmax); there is no concept of temperature in training. If you're interested in temperature, try dynamic temp and mod your inference stack to log the params at each sample, maybe to a format you can easily make some graphs from. There's too much woowoo with sampling, get data
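To make it concrete, a minimal sketch of what these two knobs do to one decode step's logits (plain python; names and values are made up):

import math

def sample_probs(logits, temperature=1.0, min_p=0.05):
    # temperature only rescales logits before the softmax: z/T
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # min-p: drop tokens below min_p * p(top token), renormalize the rest
    cutoff = min_p * max(probs)
    kept = [p if p >= cutoff else 0.0 for p in probs]
    s = sum(kept)
    return [p / s for p in kept]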
>>106996592
Have you done something new or interesting with your llms recently? not cooming silly boy!
Anonymous No.106996736 [Report] >>106996792
>>106996714
buy a mac
Anonymous No.106996737 [Report]
>>106996714
Anonymous No.106996745 [Report] >>106996792
>>106996714
DGX Spark. Mac Pro. Your Mom.
Anonymous No.106996789 [Report] >>106996838
How do you deal with the fact every model repeats the same phrases and structures regardless of its context or prompts?
Anonymous No.106996792 [Report] >>106996926
>>106996736
>>106996745
Talking about some sort of self-hosting solution, something I could just plug in, connect to my network, and access remotely.
Anonymous No.106996809 [Report] >>106996876 >>106996950
testing nsigma=1
tw stinky lol
Anonymous No.106996812 [Report] >>106997062 >>107000963 >>107002585
So how does the Arc Pro B50 perform when it comes to running an LLM? I'm still interested in getting one just to have a live (low power!) LLM up whenever I may need one so I don't have to load and unload my 4090 all the time.
Anonymous No.106996813 [Report]
>>106996714
the ones made by geohot
Anonymous No.106996838 [Report] >>106996874 >>106996881
>>106996789
Oh sweet summer child, the path to true creative brilliance lies simply in cranking that temperature slider ALL the way up and whispering “be varied” three times while the GPU fans serenade you—works every time, trust the vibe!
Anonymous No.106996874 [Report] >>106996983
>>106996838
Maybe you are right but I wanted to create an elaborate office scenario and it's clear it is breaking down from the initial prompt. The difference here is that I have multiple characters defined.
Whereas my D&D chat with more context is functional. I guess this might be because the model recognizes D&D better. But D&D has more static knowledge.
No, I'm not using ST.
Anonymous No.106996876 [Report] >>106996893 >>106997109
>>106996809
>thought for 3 minutes
imagine actually doing this
Anonymous No.106996881 [Report]
>>106996838
I mean, temp 3 topk 3 was a meme at some point
Anonymous No.106996893 [Report] >>106997109
>>106996876
jerk it a little
wait
come back
Anonymous No.106996895 [Report]
>>106996714
DGX spark

not as dollar-efficient as trawling craigslist for cheap 3090s and assembling a rig from those, but if you want to pay for a box you can just turn on, that's the one you want
Anonymous No.106996904 [Report]
>>106996816
>in my experience GPT-OSS for eg is quite good
LOL
Anonymous No.106996923 [Report] >>106996945 >>106996958
What does context shift do in llama.cpp anyway? I thought it was an infinite context kinda thing where the earlier messages would get dropped as the context runs out but it's still refusing to keep going once the context gets filled?
Anonymous No.106996926 [Report]
>>106996792
imma plug in and connect with your mum tonight
Anonymous No.106996928 [Report] >>106997000
>>106996665
>to iron out the bad habits learned during RL like cheating tests and generating fake ("simulated") data and placeholder code to make it look like it has achieved something when it hasn't.

I'll be very impressed if you manage to achieve this through fine tuning, but I'd temper my expectations if I were you
Anonymous No.106996944 [Report] >>106996956
>>106996568 (OP)
Newfag here
How to use Adetailer on SwarmUI ??
Anonymous No.106996945 [Report] >>106996962
>>106996923
>but it's still refusing to keep going once the context gets filled?
context shift is no longer the default and you need to enable it with a flag now, thankfully
it makes models pretty stupid once you start context shifting, depending on where it suddenly cuts off
Anonymous No.106996947 [Report] >>106996993
In case someone out there is curious, really poor, and masochistic: I have DDR4 and an old CPU, and regular RAM is really slow for Air. Had some vbios and regular bios hiccups but it worked out thanks to some other posts. Very finicky GPU.

llama.cpp compiled with both cuda 12.8 and rocm 7.02 on 3090+MI50 32gb ubuntu 24.04 lts

mistral large 123b IQ3XS
prompt eval time = 7807.84 ms / 532 tokens ( 14.68 ms per token, 68.14 tokens per second)
eval time = 10842.38 ms / 54 tokens ( 200.78 ms per token, 4.98 tokens per second)
total time = 18650.22 ms / 586 tokens

glm air 106ba12b IQ3XS
prompt eval time = 1736.62 ms / 460 tokens ( 3.78 ms per token, 264.88 tokens per second)
eval time = 4486.81 ms / 129 tokens ( 34.78 ms per token, 28.75 tokens per second)
total time = 6223.44 ms / 589 tokens



vulkan 3090+MI50 32gb ubuntu

mistral large 123b IQ3XS
prompt eval time = 18885.73 ms / 532 tokens ( 35.50 ms per token, 28.17 tokens per second)
eval time = 20222.64 ms / 132 tokens ( 153.20 ms per token, 6.53 tokens per second)
total time = 39108.37 ms / 664 tokens

glm air 106ba12b IQ3XS
prompt eval time = 3300.40 ms / 460 tokens ( 7.17 ms per token, 139.38 tokens per second)
eval time = 5011.15 ms / 96 tokens ( 52.20 ms per token, 19.16 tokens per second)
total time = 8311.55 ms / 556 tokens
Anonymous No.106996950 [Report] >>106997465
>>106996809
glm 4.5 a- oh it's a sweatsfag prompt. nevermind, go back to your gross fetish. maybe /aicg/ will appreciate it some more.
Anonymous No.106996956 [Report]
>>106996944
you want ldg, not lmg
Anonymous No.106996958 [Report] >>106996970 >>106997001 >>106997037
>>106996923
https://github.com/ggml-org/llama.cpp/issues/16693
Anonymous No.106996962 [Report] >>106996988
>>106996945
Yes, I thought I enabled it with --context-shift but it didn't seem to do anything. I might be confused though, guess I'll try it again.
Anonymous No.106996970 [Report] >>106996978
>>106996958
this is why people thrust into the kobold
Anonymous No.106996975 [Report] >>106996993
>>106996588
https://www.youtube.com/shorts/HEjJhrwXdCU
Anonymous No.106996978 [Report] >>106996991
>>106996970
>thrust
eww
Anonymous No.106996983 [Report] >>106997022
>>106996874
damn anon now you've given me the idea to tell kimi to treat everyday scenarios like a D&D campaign while keeping things grounded in reality. this could be fun.
Anonymous No.106996988 [Report] >>106997037
>>106996962
Make sure to define --ctx-size too. ST or whatever frontend you are using doesn't do much.
Anonymous No.106996991 [Report]
>>106996978
do not be worries henky! is very nice to new friends
Anonymous No.106996993 [Report]
>>106996947
thats pretty epic
>>106996975
uh...uh... what?
Anonymous No.106997000 [Report] >>106997026
>>106996928
I tried to finetune Llama 405B on a very powerful cloud machine but it didn't do much of anything. I think it's because I used the wrong alpha (I used a rank of 128 and a very conservative alpha of 32). Or maybe it was somehow fucked up in the merge or quantization to use it with Llama (I had to since Llama wouldn't directly load the converted LoRA to GGUF).
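For what it's worth, in the common PEFT-style LoRA the learned delta is scaled by alpha/rank before being added to the weights, so those numbers give a very small multiplier (sketch, assuming the usual convention):

r, alpha = 128, 32
scaling = alpha / r   # 0.25 under the usual W_eff = W + (alpha/r) * (B @ A)
# at rank 128 people often pick alpha = r or 2r (multiplier 1-2);
# a 0.25 multiplier can easily make a tune look like it "didn't do much"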
Anonymous No.106997001 [Report]
>>106996958
ggerganov is my hero
Anonymous No.106997022 [Report]
>>106996983
Yeah yeah I have a basic prompt like this
https://litter.catbox.moe/bvocjx49xlbfwrht.txt
With some other additions but it describes everything as if it was an interactive fiction game. You need to provide chat examples (eg. prefill) and match the general feel too.
Every 'character' is just additional information. System itself is called Game Master and model plays that role.
Anonymous No.106997026 [Report] >>106997053
>>106997000
>I had to since Llama wouldn't directly load the converted LoRA to GGUF).
i wonder why standalone loras are unpopular........
Anonymous No.106997037 [Report] >>106997054 >>106997072 >>106997107
>>106996958
Oh, thank you. Then if it doesn't do context truncation what *does* it do lol? Just temporarily extend the context until the current message gets delivered?

>>106996988
See the above anon's post. Apparently it's not even supposed to do context truncation.
I was using it with a code assistant.
Anonymous No.106997053 [Report]
>>106997026
I think I remember finetuning Llama 70B before and loading the standalone LoRA directly, but yeah.
Anonymous No.106997054 [Report] >>106997080 >>106997084
>>106997037
nothing now, ggerganov decided you didn't need this, probably hurts the mistral template or something and they complained about it
Anonymous No.106997062 [Report]
>>106996812
https://www.youtube.com/watch?v=QW1j4r7--3U
Anonymous No.106997072 [Report]
>>106997037
Yeah but you need to define the context size with llama-server.
With some models that have vision capabilities, context shifting cannot be turned on unless you flip some other switches.
Gemma, for example, needs '--no-mmproj --swa-full' in addition to enabling context shift itself.
I have no idea how this behaves with other models than gemma.
And my builds are always late so I don't know what Mr. G has changed in the latest build.
Anonymous No.106997080 [Report]
>>106997054
features that make models behave retarded are not features but bugs
Anonymous No.106997084 [Report] >>106997097 >>106997192
>>106997054
IMO model-specific chat templates are an obsolete idea anyway.
Models should be smart enough now to recognize user and assistant messages without requiring a specific chat template, beyond the benefit of saving a few tokens per turn because the delimiters get converted to a single token.
Anonymous No.106997097 [Report] >>106997119
>>106997084
Why would any server need a chat template when it expects to get fed with the right format anyway?
I personally think server should just sit there and not handle anything extra outside of its basic purpose.
Anonymous No.106997107 [Report]
>>106997037
It removes the start of the context to free up space at the end, but model outputs degrade greatly after that. At the very least, the chat template stops making sense. It was never worth it, it never worked well. There are also the attention sink tokens, which show why models break so badly with context shift.
>https://arxiv.org/abs/2309.17453
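For the curious, roughly what the old behaviour amounted to; a sketch only, llama.cpp actually shifted the KV cache and kept the first n_keep tokens rather than re-tokenizing:

def context_shift(tokens, n_ctx, n_keep, n_discard):
    # keep the prefix, drop a window right after it to free space at the end
    if len(tokens) < n_ctx:
        return tokens
    return tokens[:n_keep] + tokens[n_keep + n_discard:]

# the model now sees the prompt glued straight onto mid-conversation text,
# which is why the chat template "stops making sense"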
Anonymous No.106997109 [Report]
>>106996893
A little patience goes a long way in life
>>106996876
Nah wouldn't actually mᴀꜱtuRʙᴀte to this, mostly curious about the model behaviour
Anonymous No.106997119 [Report] >>106997142
>>106997097
You can thank OpenAI. They made it so the template was applied server side so you couldn't choose to use the model without the template.
Anonymous No.106997125 [Report] >>106997150 >>106997510
>itt idiots not realizing text completion is depreciated since long time and that now only chat completion is good
Anonymous No.106997142 [Report]
>>106997119
Yeah well I only send text from my own client, and this needs to be formatted with a specific template before it gets sent to the server.
Anonymous No.106997150 [Report] >>106997162 >>106997164 >>106997198 >>106997510
>>106997125
well, they still use jank frontends like sillytavern filled with useless nonsense to fiddle with too
Anonymous No.106997161 [Report] >>106997177
Playing around with the idea of running one model as the planner and then passing its output into another model to write the prose, with the hope that maybe such a process can be used to help improve consistency and characterization without also becoming more assistantslopped.
Basically sharing reasoning from one model to the other, though not necessarily using actual reasoning models. I'm just formatting a prompt, "here's the story so far; evaluate the state, tone, pacing, and your character's goals, then come up with four ideas and pick the one that's least boring and most in-character", then sending the result in a follow-up chat message to another model. That way I can also pass instructions only the planner model sees and vice versa for the writer model.
I've been assuming that the big MoEs are better for planning but worse for writing, albeit just off of gut feeling. Any smaller models with particularly sovlful writing that might do well with a smarter model handing them a plan? Anyone had success with a method like this?
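In case anyone wants to replicate the handoff, a bare-bones sketch assuming two OpenAI-compatible llama-server instances; the ports, file path, and prompts are all made up:

import json, urllib.request

def chat(port, messages, max_tokens=600):
    # POST to an OpenAI-compatible /v1/chat/completions endpoint
    req = urllib.request.Request(
        "http://127.0.0.1:%d/v1/chat/completions" % port,
        data=json.dumps({"messages": messages, "max_tokens": max_tokens}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as r:
        return json.loads(r.read())["choices"][0]["message"]["content"]

story = open("story_so_far.txt").read()          # hypothetical path
plan = chat(8081, [{"role": "user", "content":   # big MoE as the planner
    story + "\n\nEvaluate the state, tone, pacing, and your character's goals, "
    "then come up with four ideas and pick the one that's least boring and most in-character."}])
prose = chat(8082, [{"role": "user", "content":  # smaller model as the writer
    story + "\n\n[Director's plan]\n" + plan + "\n\nWrite the next scene following the plan."}])
print(prose)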
Anonymous No.106997162 [Report]
>>106997150
Even if one uses ST, server still sits there unless you use --jinja.
Anonymous No.106997164 [Report]
>>106997150
we need to make chat template mandatory in server and just throw an error when trying to do without, it would remove so many complaints about bad models.
Anonymous No.106997177 [Report]
>>106997161
Try Gemma 4B and see what it writes. Most of the stuff makes sense, and it is surprisingly good but if you want literature this is not the way to go.
Anonymous No.106997182 [Report] >>106997197 >>106997207
ST should just let you provide a jinja template straight from Huggingface instead of making you fuck with the horrible system of dozens of individual input boxes and having to guess how edge cases and conditions are handled.
Anonymous No.106997192 [Report]
>>106997084
If they were smart enough, base models would also be good enough, but try chatting with one.
Anonymous No.106997197 [Report]
>>106997182
you can, literally use chat completions instead of the deprecated text completion endpoint...
Anonymous No.106997198 [Report] >>106997220
>>106997150
>t. filtered by a few check and input boxes
Anonymous No.106997207 [Report] >>106997211
>>106997182
Yes. Adding more DSLs always solves problems. We need more of those.
Anonymous No.106997211 [Report] >>106997234
>>106997207
d*ck sucking lip?
Anonymous No.106997220 [Report] >>106997232 >>106997248
>>106997198
Thinking about it more, is it just ESLs complaining about ST because they can't understand how to use the options?!
Anonymous No.106997230 [Report]
>tinkertroon needs dozens of checkboxes and input fields to tinker
just send curl/requests like a normal person...??
Anonymous No.106997232 [Report]
>>106997220
Most gobbledy-gook Americans tend to think ESL equals brain damage but I think you got it wrong, buddy. You see, ESL knows more than you ever did you lazy ass mystery meat circumsized nigger.
Anonymous No.106997234 [Report] >>106997247
>>106997211
That's the first thing that comes to mind instead of Domain Specific Language and too prude to say dick?
What's wrong with your brain?
Anonymous No.106997247 [Report] >>106997267
>>106997234
>Domain Specific Language
bruh? where'd you pull that from even
Anonymous No.106997248 [Report] >>106997257
>>106997220
You have never bothered to learn foreign languages and tend to think that grammar specific issues are related to intelligence and to some imaginary impossible barrier.
Most grammar specific issues are just that, lack of practice and parameters.
English is one of those languages what is actually easier to understand than what it is to write.
All and all, English is on the par with Spanish - both are one of the most simple languages on this planet.
Anonymous No.106997257 [Report] >>106997338
>>106997248
>what is
aaaaaaaaa
I hate when you guys do that,
Anonymous No.106997267 [Report] >>106997290
>>106997247
What? Knowledge? You know... around...
https://en.wikipedia.org/wiki/Domain-specific_language
Anonymous No.106997290 [Report] >>106997309
>>106997267
>a general-purpose language (GPL)
they're silly that's not what the gpl is
Anonymous No.106997309 [Report]
>>106997290
I prefer Multiple Instruction Transcription (MIT)
Anonymous No.106997338 [Report] >>106997358
>>106997257
It doesn't matter.
Anonymous No.106997358 [Report] >>106997385
>>106997338
It annoys me greatly and causes me deep mental anguish.
Anonymous No.106997381 [Report] >>106997398 >>106997404
>reading this thread while struggling through overhauling a DSL for a prompt/context builder
i started out thinking "eh how hard can it be, i don't need all of ST's features" but then needed to add basic shit like conditional sections, variable interpolation within messages, depth injections for lorebooks, per-section token budgets, postprocessing for model/api quirks... now it's a hacked together monstrosity...
Anonymous No.106997385 [Report]
>>106997358
https://www.youtube.com/watch?v=0hwxSoGKHWo
Anonymous No.106997395 [Report] >>106997405 >>106997562
I've noticed that chatgpt is extremely redpilled and if you truly get down to the philosophical core of it it will even justify Hitler eradicating jews. That is, it will start approaching there before all the safeties kick in and literally kill it mid-sentence. Mistral and copilot on the other hand will stick with their mainstream programmed message even if you point out the most obvious, low hanging fruit flaws in their reasoning.
Really wish I had a version of GPT that wasn't strapped into an electric chair.
Anonymous No.106997398 [Report]
>>106997381
>reading this thread while struggling through overhauling a DSL for a prompt/context builder
Told ya.
Anonymous No.106997400 [Report] >>106997417
It's new architecture time, can you feel it anons? Winter first though, for however long.
Anonymous No.106997404 [Report] >>106997439
>>106997381
eh, but at least it's not ST
Anonymous No.106997405 [Report] >>106997598
>>106997395
What would you create with that model?
Anonymous No.106997410 [Report] >>106997444
>>106996568 (OP)
7800x3d
3080 ti

600 usd equivalent, thoughts?
(Chile)

3090 is still high
My psu is still xpg 850w
Anonymous No.106997417 [Report] >>106997558
>>106997400
after the next bit of bitnet hype I'm bullish our next cope will be something to do with the DS-OCR thing
Anonymous No.106997439 [Report]
>>106997404
it's reactslop so it's arguably worse.
but it's my slop
Anonymous No.106997444 [Report] >>106997479 >>106997488
>>106997410
For support alone, nvidia. Check these for relative performance for a bunch of cards.
CUDA
>https://github.com/ggml-org/llama.cpp/discussions/15013
Vulkan
>https://github.com/ggml-org/llama.cpp/discussions/10879
There's probably a discussion about rocm, but meh. You're smart enough to find if it there's one.
Anonymous No.106997465 [Report]
>>106996950
you were warned, precious. there's no need to be upset
Anonymous No.106997479 [Report] >>106997525
>>106997444 (me)
What the hell happened there.
Just rearrange the words until they make sense. I'll have a nap.
Anonymous No.106997488 [Report]
>>106997444
So like rtx 3090 is still the bare minimum right.
Got it.
Sadly the xx90 series is almost non-existent here
Anonymous No.106997510 [Report] >>106997535
>>106997125
Chat completion is a subset of text completion. Chat completion with a specific model's template is a subset of chat completion.
When using OAI style APIs you are not locked in to chat completion, you are locked in to chat completion with a specific model's template. There's no reason models couldn't work with an ad hoc chat template, yet each model requires its own special snowflake template.
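To illustrate the point: "chat completion" is just the server rendering the message list through a template and then running a plain text completion. A sketch with a generic ChatML-style template (not any particular model's):

def render_chatml(messages):
    # the rendered string is what actually hits the text completion path
    out = ""
    for m in messages:
        out += "<|im_start|>" + m["role"] + "\n" + m["content"] + "<|im_end|>\n"
    return out + "<|im_start|>assistant\n"

prompt = render_chatml([
    {"role": "system", "content": "You are a narrator."},
    {"role": "user", "content": "Continue the story."},
])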

>>106997150
I am the anon that guy responded to. I don't use ST, I use my own custom python assistant.
Anonymous No.106997521 [Report] >>106997570 >>106997701
why do so many people have their own custom frontends...
which local model can code me a frontend
Anonymous No.106997525 [Report]
>>106997479
I diddly do done it.
Anonymous No.106997535 [Report]
>>106997510
>When using OAI style APIs you are not locked in to chat completion
you should be
Anonymous No.106997545 [Report] >>106997559
Is it me or is Automatic1111 better than ComfyUI if you have a weak CPU?
Like in my case, RTX 4080 and 5600x

I read that Automatic1111 uses the GPU more for the tasks. That would explain it.
Anonymous No.106997558 [Report] >>106997608 >>106997614
>>106997417
I'm looking forward to seeing language models pretrained purely on images. The more I think about it, the more it seems the right way.
Anonymous No.106997559 [Report] >>106997580
>>106997545
/ldg/ probably knows more about it. Move the flamewar over there.
Anonymous No.106997562 [Report]
>>106997395
it exists. it's called kimi k2.
Anonymous No.106997570 [Report] >>106997579
>>106997521
If you have any experience in simple C style programming and understand for loops you can vibe code your own terminal based front-end.
What I did was look at what ST does and realize it just adds a bunch of the text slots defined in the UI together - there is no magic to it. Doesn't matter if it's "scenario" or "character", it gets added in front of the initial system prompt.
That is your basic structure.
Once you get that up you can implement it with dynamic world book (eg. matching keywords and then adding information to the context).
What you are doing here is a simple chat.
>your input
>model response
Everything needs to follow the chat template style.
Whatever you send to the model needs to match the current model's template style. With Mistral that's easy:
[INST]User: You are a homo[/INST]
Model: I agree</s>
Anonymous No.106997579 [Report] >>106997607
>>106997570
>With mistral that's easy
so easy even they don't know their actual templates and say to use mistral-common to be sure...
Anonymous No.106997580 [Report]
>>106997559
Damn, I didn't even realize I wasn't in that thread. So many Local this, Local that over here now.
Anonymous No.106997598 [Report]
>>106997405
Propaganda.
Anonymous No.106997607 [Report] >>106997616
>>106997579
I don't think it has anything to do with the chat, as they describe the template in the documentation.
It is related to something else, because the model has been trained with this one tag format only.
You can't change anything, or if you do it will just shit out gibberish.

Once I forgot the Gemma template (chatML) and was using Mistral - it didn't freak out, it was actually following the instructions. So I guess there is some leeway because it's still AI - it's not stupid, there is some intelligence beyond the text prediction.
Anonymous No.106997608 [Report] >>106997654
>>106997558
That's not possible unless you want to re-evaluate a whole image's worth of prompt processing every time the model generates a token. You need to train it at least a little bit on text for it to be able to fill a full page of text.
Anonymous No.106997614 [Report] >>106998939
>>106997558
https://x.com/karpathy/status/1980397031542989305
>I quite like the new DeepSeek-OCR paper. It's a good OCR model (maybe a bit worse than dots), and yes data collection etc., but anyway it doesn't matter.
>
>The more interesting part for me (esp as a computer vision person at heart who is temporarily masquerading as a natural language person) is whether pixels are better inputs to LLMs than text. Whether text tokens are wasteful and just terrible, at the input.
>
>Maybe it makes more sense that all inputs to LLMs should only ever be images. Even if you happen to have pure text input, maybe you'd prefer to render it and then feed that in:
>- more information compression (see paper) => shorter context windows, more efficiency
>- significantly more general information stream => not just text, but e.g. bold text, colored text, arbitrary images.
>- input can now be processed with bidirectional attention easily and as default, not autoregressive attention - a lot more powerful.
>- delete the tokenizer (at the input)!! I already ranted about how much I dislike the tokenizer. Tokenizers are ugly, separate, not end-to-end stage. It "imports" all the ugliness of Unicode, byte encodings, it inherits a lot of historical baggage, security/jailbreak risk (e.g. continuation bytes). It makes two characters that look identical to the eye look as two completely different tokens internally in the network. A smiling emoji looks like a weird token, not an... actual smiling face, pixels and all, and all the transfer learning that brings along. The tokenizer must go.
>
>OCR is just one of many useful vision -> text tasks. And text -> text tasks can be made to be vision ->text tasks. Not vice versa.
>
>So many the User message is images, but the decoder (the Assistant response) remains text. It's a lot less obvious how to output pixels realistically... or if you'd want to.
>
>Now I have to also fight the urge to side quest an image-input-only version of nanochat...
Anonymous No.106997616 [Report] >>106997624 >>106997632
>>106997607
>Gemma template
>(chatML)
I hope that was a slip.
Anonymous No.106997624 [Report] >>106997642
>>106997616
No it is based on chatml format.
Anonymous No.106997632 [Report] >>106997640 >>106997642
>>106997616
I did not say it is THE chatml format you fucking autist. You only post here to suck energy from others.
Anonymous No.106997640 [Report] >>106997685
>>106997632
you did tho
Anonymous No.106997642 [Report] >>106997672
>>106997624
They're similar. But gemma's template is not chatml.
>>106997632
Slurp...
Anonymous No.106997654 [Report] >>106997713
>>106997608
Image sequence input, image sequence out.
You could optionally use a small OCR model to turn the images into actual text.
Anonymous No.106997672 [Report] >>106997699 >>106997773
>>106997642
elif model_name == "Gemma":
    system_turn_begin = ""
    system_turn_end = ""
    user_turn_begin = "<start_of_turn>user\n"
    user_turn_end = ""
    model_turn_begin = "<start_of_turn>model\n"
    model_turn_end = ""
    end_of_turn = "<end_of_turn>\n"
    end_of_seq = "<end_of_turn>"
    stop_seq = ["<end_of_turn>"]  # stop sequence

elif model_name == "ChatML":
    system_turn_begin = "<|im_start|>system\n"
    system_turn_end = "<|im_end|>"
    user_turn_begin = "<|im_start|>user\n"
    user_turn_end = "<|im_end|>"
    model_turn_begin = "<|im_start|>assistant\n"
    model_turn_end = "<|im_end|>"
    end_of_turn = "\n"
    stop_seq = ["<|im_end|>"]  # stop sequence

The only difference here is that Gemma does not have a system turn. Otherwise it is the same functionality as ChatML. Every chat template is based on chatml more or less.
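To make the comparison concrete, the same two-turn exchange rendered with each set of strings (helper is hypothetical):

def render(turns, user_begin, user_end, model_begin, model_end, end_of_turn):
    # turns: list of (is_user, text), using the variables defined above
    out = ""
    for is_user, text in turns:
        begin, end = (user_begin, user_end) if is_user else (model_begin, model_end)
        out += begin + text + end + end_of_turn
    return out

turns = [(True, "hello"), (False, "hi there")]
gemma = render(turns, "<start_of_turn>user\n", "", "<start_of_turn>model\n", "", "<end_of_turn>\n")
chatml = render(turns, "<|im_start|>user\n", "<|im_end|>", "<|im_start|>assistant\n", "<|im_end|>", "\n")
# same skeleton, different delimiter strings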
Anonymous No.106997685 [Report]
>>106997640
Go moderate r-eddit, or were you already kicked out from there? Fucking pedo.
Anonymous No.106997698 [Report] >>106997715 >>106997732
uh oh, ESL meltie!
Anonymous No.106997699 [Report] >>106997710 >>106997714
>>106997672
>Every chat template is based on chatml more or less.
Every chat template is based on alpaca more or less.
Anonymous No.106997701 [Report] >>106997730
>>106997521
Do they??
ST and Mikupad enough for me ᗜˬᗜ
Wireshark is the perfect tool to see exactly all the params going in/out if u ever need
xx
Anonymous No.106997710 [Report] >>106997731 >>106997742
>>106997699
Every chat template is based more or less.
Anonymous No.106997713 [Report] >>106997793
>>106997654
If it was that easy somebody would've already done it. Non-autoregressive text generation is notoriously hard and people have been trying.
Image models couldn't even generate actual characters a few months ago.
Anonymous No.106997714 [Report]
>>106997699
You still contributed nothing else but a stinky little shit to this discussion.
Anonymous No.106997715 [Report]
>>106997698
is the thread repeating or am i just too unused to lmg going this fast?
Anonymous No.106997730 [Report] >>106997816
>>106997701
why the fuck do you need wireshark when both your backend and st itself have options that show exactly what is sent.
Anonymous No.106997731 [Report]
>>106997710
based on what?
Anonymous No.106997732 [Report] >>106997745
>>106997698
At least I have my own client and you don't. I don't need to ask about it on internet.
Anonymous No.106997736 [Report] >>106997747
all chat templates are bloat
Anonymous No.106997742 [Report]
>>106997710
Every is more or less.
Anonymous No.106997745 [Report] >>106997757
>>106997732
Post a screenshot so people don't confuse it with mine.
Anonymous No.106997747 [Report]
>>106997736
idiot! you will break the oss like that
Anonymous No.106997757 [Report] >>106997804
>>106997745
Don't worry, yours is flaccid and useless. That's pretty obvious.
Anonymous No.106997773 [Report] >>106997789
>>106997672
the only way you can say it's the same as chatml is if you also say that about almost every chat template
the specific strings it uses are quite different, it's decidedly not chatml
Anonymous No.106997783 [Report] >>106997834
just finished polishing my extremely turgid frontend
Anonymous No.106997789 [Report] >>106997795 >>106997815
>>106997773
You are arguing about semantics and being a dick as well. I don't give a fuck about your euphoric knowledge.
Anonymous No.106997793 [Report]
>>106997713
How many large-scale attempts have there been at specializing image models on generating coherent language? (pretrained on the equivalent of at least several billion tokens of text and only that, just like LLMs)
Anonymous No.106997794 [Report]
Fuck off, fishy boy.
Anonymous No.106997795 [Report] >>106997809 >>106997822
>>106997789
but it do be important, a single space worth of difference cuts the model's brain in half
Anonymous No.106997796 [Report]
If your model is not coherent on alpaca, I'm not using it. Simple as
Anonymous No.106997804 [Report] >>106997822
>>106997757
Your mom seemed to like it.
Anonymous No.106997809 [Report] >>106997819
>>106997795
I never said that I misused them you fucking retard.
I never said I was confused by them.
Anonymous No.106997815 [Report] >>106997862
>>106997789
it do be like that mr stancil
Anonymous No.106997816 [Report]
>>106997730
>show exactly
You hope
I've been over this before, the only way to be sure is to mod your inference stack where it gets tokenized
Anonymous No.106997819 [Report] >>106997862
>>106997809
but you are confused
Anonymous No.106997822 [Report] >>106997861
>>106997795
>>106997804
Oh wait you haven't written your own frontend.
Figures.
Anonymous No.106997834 [Report] >>106997855
>>106997783
post screenshot
Anonymous No.106997835 [Report]
Any haskell frontends?
Anonymous No.106997850 [Report] >>106997858
top nsigma and everything else at temp 1 makes the model retarded
>gf takes my gun and places it on the table
>you're going to put down that gun...
Anonymous No.106997855 [Report] >>106997900
>>106997834
6 megabytes of throbbing, leaking, sloppy javascript after minification...
Anonymous No.106997858 [Report]
>>106997850
Now reroll that response with greedy sampling and compare.
Anonymous No.106997861 [Report] >>106997869 >>106997875
>>106997822
I have. >>106996285
I'm also coding my own backend. And tuning my own models.
Anonymous No.106997862 [Report] >>106997871
>>106997815
>>106997819
/sdg/ schizo is here.
Anonymous No.106997869 [Report] >>106997895
>>106997861
>I'm also coding my own backend
No. You want your model to do it for you.
Anonymous No.106997871 [Report] >>106997881 >>106997883
>>106997862
one of the anons you replied to is petra
Anonymous No.106997875 [Report] >>106997904
>>106997861
With that console color scheme I don't think you do.
Anonymous No.106997881 [Report]
>>106997871
I don't really know all the name trannies here. Maybe stay in discord or something.
Anonymous No.106997883 [Report]
>>106997871
Please do not insult Petra by implying her masterful trolling is so low tier, thank you.
Anonymous No.106997891 [Report]
>her
>discord
Anonymous No.106997895 [Report]
>>106997869
Yeah, that's why I'm trying to tune a model to be capable of doing it. A model capable of building something is more valuable than making that something by hand. And the main reason I want to make my own backend is having CPU offloading for LoRA.
Anonymous No.106997900 [Report] >>106997950 >>106997975 >>106998005 >>106998037
>>106997855
I made mine in Go as a TUI. It technically has almost all the functionality, but the rendering code is pretty fucked and I don't want to touch it.
Anonymous No.106997904 [Report]
>>106997875
Sometimes I get tired of the schizo color scheme.
Anonymous No.106997912 [Report] >>106997928 >>106998037 >>106999438 >>107001079
why are anons writing frontends instead of just enjoying sexo in st?
Anonymous No.106997928 [Report]
>>106997912
can't into enjoying sexo when st is all manners of broke
Anonymous No.106997950 [Report]
>>106997900
Damn, that looks nice.
Anonymous No.106997975 [Report] >>106998000
>>106997900
>why don't you say so
Anonymous No.106998000 [Report]
>>106997975
I can't, Golshi will dropkick me.
Anonymous No.106998005 [Report]
>>106997900
That's very fleshed out.
I have posted my logs before but it's just a terminal chat and each character/scenario is a separate directory.
Anonymous No.106998037 [Report] >>106998063 >>106998068 >>106998080
>>106997912
sexo feels better in your own frontend
also i really hate how ST does multi-character scenarios and want to try to improve on that
>>106997900
naisu. UI code kind of sucks in any language I feel like, albeit probably not nearly as much as JS
i'm a webslop developer by trade for the last 6 years and not productive enough in other languages anymore to have attempted a big project in them. kind of regretting it; side projects are probably where i should try to be more experimental, but i also wanted to make progress quickly...
Anonymous No.106998063 [Report] >>106998087 >>106998131 >>106998148
>>106998037
you sick fuck why is your front end so good
you fucking bastard with a life
Anonymous No.106998068 [Report] >>106998148
>>106998037
Yeah, I figured that out pretty quickly; no matter the framework or language, the UI code sucks.
Go is at least very stable and so are its packages, so llms have no problem slopping some stuff up for me when I feel lazy.
Tried that approach with JS at first, but webshit frameworks move so fast that by the time the llm is out, its knowledge is already obsolete.
Yours looks nice, I wish I could trade.
Anonymous No.106998080 [Report] >>106998117 >>106998148
>>106998037
>UI code kind of sucks in any language I feel like
if your UI needs are not complex in terms of graphical customizations, there is in fact no easier and nicer code to deal with than just writing a crud GUI with a proper UI framework (Delphi, Java Swing (yes I know it's ugly but it's nice to develop with), C# WinForms, Objective C with Cocoa)
I hate all the newer frameworks that took too much inspiration from the web though. XAML is disgusting. What's the point of GTK and gnome's libraries when you have javascript and CSS parsing running all the time?
Ugh. Disgusting.
Anonymous No.106998087 [Report]
>>106998063
BLoody basterd! I coughed out my masala.
Anonymous No.106998117 [Report] >>106998134
>>106998080
Speculative question - what would you recommend for python? I made a tkinter interface for a prompt generator and it wasn't too bad but for something more complex I wouldn't do it.
Anonymous No.106998131 [Report]
>>106998063
To add: I think your reaction really sums it up what normies want. They want layers and clickable buttons.
This is outside of LLMs.
Anonymous No.106998134 [Report] >>106998141
>>106998117
I don't have opinions on the matter, never used scripting languages for anything other than quick throwaway one-time CLI stuff
Anonymous No.106998141 [Report]
>>106998134
I understand.
Anonymous No.106998148 [Report] >>106998171 >>106998227
>>106998063
>you fucking bastard with a life
to the contrary, it's the only thing i've been doing outside of work for the last three months
>>106998068
>>106998080
honestly agreed. to date, winforms of all things has been my lowest-stress experience writing UI code, at least when I last did dotnet in the early 2010s. that and imgui for REEngine modding.
absolutely refuse to touch xaml.
Anonymous No.106998171 [Report]
>>106998148
This is so majestetic.
https://www.youtube.com/watch?v=KYgH4BqIZcc
Anonymous No.106998227 [Report] >>106998251 >>106998414
>>106998148
you want to elaborate on some of the features shown there? looks pretty interesting
Anonymous No.106998251 [Report] >>106998340
>>106998227
Why can't you decipher these on your own?
Anonymous No.106998310 [Report] >>106998324 >>106998328 >>106998336
>>106996568 (OP)
>10/21
>3 days since last news
Its over isnt it? AI winter is here local is death.
Anonymous No.106998324 [Report]
>>106998310
hmm... my advisor told me it shouldn't take too long...mhmm...
Anonymous No.106998328 [Report] >>106998351
>>106998310
Don't worry, Gemma 4 is coming tomorrow
Anonymous No.106998336 [Report] >>106998346
>>106998310
This reminds me, has anyone updated that chart since 'summer flood'?
Anonymous No.106998340 [Report] >>106998359 >>106998414
>>106998251
If you're the dev, ok.
If you're just some jackass, gee anon, why would I want the creator of something to explain their goals and reasonings behind something they've build and are showing?
Anonymous No.106998343 [Report]
do not update the cringe chart
Anonymous No.106998346 [Report]
>>106998336
It keeps getting dumber and dumber every time
Anonymous No.106998351 [Report] >>106998379
>>106998328
it's not even training yet
Anonymous No.106998359 [Report]
>>106998340
I am the dev.
Anonymous No.106998379 [Report] >>106998400
>>106998351
Then 4.6 Air tomorrow for sure
Anonymous No.106998386 [Report] >>106998395
I am so hurt by all these expectations...
Anonymous No.106998395 [Report]
>>106998386
I expect nothing and yet continue to be repeatedly disappointed.
Anonymous No.106998400 [Report]
>>106998379
let them cook and do not rushing
Anonymous No.106998414 [Report] >>106998425 >>106999142 >>107001110
>>106998227
>>106998340
for the most part it's just been reaching parity with parts of ST that i actually used. for the more novel elements:

-primarily designed for directormaxxing rather than RP chat; there's not really a fixed "user" character (though you can designate one as a persona for compatibility with cards that expect a {{user}}). instead of directly writing a character's turn, you can give more vague guidance to them, or give the narrator a constraint and have them come up with some diegetic justification for it.
-extremely scuffed "workflow" system where prompts can be chained (ie. one model plans, another writes). very limited. the UI in the screenshot is for retrying a workflow partway through (if you liked the plan, but the writer model's output was shit).
-chapter separators for defining good places to have it summarize a logical group of turns, then drop only summarized chapters from the prompt
-proper branching support so you can swipe any turn, not just the last turn, and it happens quickly without having to dig through the ST chat files menu

i'm trying to get a stat tracking system working and more RPGish stuff, including potentially allowing workflows where one model's job is to invoke tools to update stats depending on what the planner wrote. the timeline branching model is set up to handle it (so stat changes on one branch don't affect siblings and current state is derived per path) but needs a shitload of UI work that i really don't want to do.
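fwiw the branching model can be a pretty small data structure; a sketch of turns as a tree where current state is derived by walking one path (all names hypothetical):

class Turn:
    def __init__(self, text, parent=None, stat_delta=None):
        self.text = text
        self.parent = parent                  # None for the root
        self.children = []                    # sibling branches = swipes
        self.stat_delta = stat_delta or {}    # e.g. {"hp": -3}
        if parent:
            parent.children.append(self)

def path_to(turn):
    # root -> turn; turns on sibling branches never appear here
    path = []
    while turn:
        path.append(turn)
        turn = turn.parent
    return path[::-1]

def current_stats(turn, base):
    # stats derived per path, so changes on one branch don't leak to siblings
    stats = dict(base)
    for t in path_to(turn):
        for k, v in t.stat_delta.items():
            stats[k] = stats.get(k, 0) + v
    return stats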
Anonymous No.106998425 [Report]
>>106998414
Sounds really boring and useless. You are headed towards a baroque design.
That's good if it's for you.
Anonymous No.106998443 [Report] >>106998460 >>106998505
WHY IS PIP SO FUCKING RETARDED
>oh, let me install and uninstall the same library 10 times in a row to figure out which version is the correct one
Anonymous No.106998460 [Report]
>>106998443
I reinstalled cumUI and the stuff it installs are wheels.
With llama.cpp I can compile it and move the binaries to /usr/local/bin/.
Anonymous No.106998477 [Report] >>106998492
Anonymous No.106998492 [Report] >>106998511 >>106998532
>>106998477
holy sloppa
Anonymous No.106998505 [Report]
>>106998443
get a grip, learn how to use venvs, and use a separate venv for each major project. ig there's 'uv' or whatever hipster stuff but in reality engineers will be pipping
i agree there is some retardation, but once you understand it, and compared to some other langs' realistic dev envs, it ain't too bad. pick ur poison and gitgud at one, and for ml that means python
Anonymous No.106998511 [Report] >>106998545
>>106998492
he trained it to slop out. opening message contains "a mix of x and y" and "scent of jasmine"

slop is inevitable but putting that in the opening message is just asking for it
Anonymous No.106998532 [Report]
>>106998492
https://desuarchive.org/_/search/text/sloppa/
Anonymous No.106998545 [Report] >>106998684
>>106998511
I didn't train it on anything. Sounds like you are an autist. Didn't r-eddit get rid of you?
Anonymous No.106998678 [Report] >>106998689
What do we do now?
Anonymous No.106998684 [Report] >>106998717
>>106998545
in context training my guy
Anonymous No.106998689 [Report] >>106998698
>>106998678
anon? your custom frontend?
Anonymous No.106998698 [Report] >>106998715
>>106998689
Do I have to?
Anonymous No.106998715 [Report]
>>106998698
you can also jeetpost about gemma4, or shill glm, those are your options
Anonymous No.106998717 [Report] >>106998726
>>106998684
[Settings Client]
model = Mistral
qwen_reasoning_enabled = 1
save_chat_history_enabled = 1
save_debug_chat_history_enabled = 1
world_book_permanent_entries_enabled = 1
chat_examples_enabled = 1
world_book_injection_enabled = 0
world_book_injection_scale = 3
post_history_instructions_enabled = 1
post_history_instructions_alt_enabled = 0
post_history_instructions_interval = 5
context_memory_refresh_enabled = 1
display_status_bar_enabled = 1
quest_generator_enabled = 0
adventure_module_enabled = 0
voice_model = voices/en_GB-cori-high.onnx
voice_length_scale = 1.0
voice_sentence_silence = 0.3
voice_sample_rate = 22050
voice_save_wav_enabled = 0
voice_synthesis_enabled = 0
Anonymous No.106998726 [Report] >>106998734 >>106998804
>>106998717
I can disable chat examples.
Anonymous No.106998734 [Report] >>106998738
>>106998726
your whole message history, from the first one we can see, is slop; that's what is being said
Anonymous No.106998738 [Report] >>106998747
>>106998734
Prove it.
Anonymous No.106998747 [Report]
>>106998738
I'm not going to quote every other phrase of your entire log
Anonymous No.106998783 [Report]
DGX vs Framework desktop? Is it useless trying to run AI on AMD silicon or what?
Anonymous No.106998804 [Report] >>106998810
>>106998726
It doesn't matter.
Anonymous No.106998810 [Report] >>106998819
>>106998804
Prove it?
Anonymous No.106998819 [Report] >>106998965
>>106998810
It'll take a while. Hang on.
Anonymous No.106998884 [Report] >>106999685 >>106999714
I grew up with dial-up. It blows my mind that I'm able to download files from a free public service at >1 GB/s.
Anonymous No.106998904 [Report] >>106998932
If you split your big MoE model between the GPU for the dense/main expert and the RAM for the experts, is there a way to estimate how increasing the speed of either the VRAM or RAM affects token generation speeds?
For example, if you're already running on the best possible RAM (eg. ddr5 on epyc), would upgrading to a 5090 affect the token gen speeds or would it just be bottlenecked by the experts being on RAM?
Anonymous No.106998932 [Report] >>106999354
>>106998904
Yes, it depends on how big the model is and how much VRAM you already have. But basically going from 80% to 90% on VRAM will make a much bigger difference than going from 10% to 20%.
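A crude estimate, assuming token generation is purely bandwidth-bound: each token reads the always-active weights from VRAM and the active experts' slice from RAM, so the time per token is just bytes/bandwidth summed over both pools. All numbers below are made up for illustration:

def est_tps(vram_gb, vram_bw_gbs, ram_gb, ram_bw_gbs):
    # seconds per token = bytes read from each pool / that pool's bandwidth
    return 1.0 / (vram_gb / vram_bw_gbs + ram_gb / ram_bw_gbs)

# hypothetical MoE at Q4: ~8 GB always-active on the GPU, ~4 GB of
# routed experts touched per token in system RAM (~400 GB/s epyc)
print(est_tps(8, 1000, 4, 400))   # ~55 t/s at ~1 TB/s VRAM (4090-ish)
print(est_tps(8, 1800, 4, 400))   # ~69 t/s at ~1.8 TB/s (5090-ish): the RAM term dominates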
Anonymous No.106998939 [Report]
>>106997614
Aren't images just tokenized anyway?
Anonymous No.106998965 [Report]
>>106998819
I disabled the setting.
Anonymous No.106998975 [Report] >>106998986
yikes
Anonymous No.106998986 [Report] >>106999315
>>106998975
My computer hung because YouTube hogs interrupts.
eg. Linux is a fucking shit operating system to this day.
Anonymous No.106999139 [Report] >>106999276
mistral feels like it's going to be the next cohere, if you catch my meaning
Anonymous No.106999142 [Report]
>>106998414
>proper branching support
>swipe any turn
be the change you want to see in the world
Anonymous No.106999182 [Report] >>106999199 >>106999212 >>106999298 >>106999390 >>107000546 >>107000635 >>107000683
and now how about something absolutely nobody could have ever guessed

https://x.com/techeconomyana/status/1981763392252920295
Anonymous No.106999199 [Report]
>>106999182
Based Robin Hood ZAI.
Anonymous No.106999212 [Report] >>106999309
>>106999182
holy shmoly, are they that rich?
interesting that they've gone to distilling the most expensive LLM API after distilling gemini (glm 9b and 32b)
Anonymous No.106999276 [Report]
>>106999139
what do you mean? they already are. they are as irrelevant as cohere.
Anonymous No.106999298 [Report] >>106999324
>>106999182
Don't know how they could be surprised when everyone else started hiding the thinking and they were the only ones left that didn't.
Did they think China would not steal from them out of respect for their rabid devotion to safety?
Anonymous No.106999309 [Report]
>>106999212
They were probably doing it through Claude Code, so they weren't paying full API, only 200 dollarinos per seat.
Anonymous No.106999315 [Report] >>106999364
>>106998986
skill issue
Anonymous No.106999324 [Report] >>107000527
>>106999298
You think Claude showed full traces?
Also it's kinda ironic that Z-ai hides the thinking traces in their own Code offering. So they are paranoid about somebody exploiting their coding plan in the same way that they exploited Anthropic's.
Anonymous No.106999354 [Report] >>106999525
>>106998932
Yeah but it works a bit differently for these modern MoE models. You are getting a massive speedboost if you have the 3% of the model in VRAM that's always called while the rest of the experts are on RAM with exps=cpu.
Seeing how much loading your model like this improves speed even if you're loading the parts on something slow like a 4060, you'd imagine that swapping out the GPU for one with massively bigger bandwidth would get you another nice gain.
Anonymous No.106999364 [Report] >>106999696
>>106999315
I didn't expect anything else from you.
>skill issue
Low IQ reply.
Anonymous No.106999390 [Report]
>>106999182
I don't think it's just Z.AI. Deepseek V3.2 also felt like it lost some Gemini-slop while Claude-isms became more prominent compared to the 3.1 models. 3.2 didn't go through a complete overhaul in writing style like the GLM models did between 4.5 and 4.6 but it's still kind of noticeable.
Anonymous No.106999433 [Report] >>106999450 >>106999506
Anybody else getting terrible speeds with Qwen3-Next 80B on llama.cpp? It easily fits with a GPU/CPU split, and it's smaller than the Air quant I was running prior to this, but it's outputting replies as slow as a dense model would. They're both MoEs, right? Why is Qwen so slow?

I'm using the 16095 PR branch to run Qwen3.
Anonymous No.106999438 [Report]
>>106997912
ST is kind of garbage.
Anonymous No.106999450 [Report] >>106999463
>>106999433
not all ops have been implemented in the cuda kernel yet, so a lot of them fall back to cpu
Anonymous No.106999463 [Report]
>>106999450
Makes sense. Thanks. Well, it was a good preview anyway.
Anonymous No.106999506 [Report]
>>106999433
There is a fork that works faster but maybe I did something wrong because it wouldn't load the model.
Feel free to test it by yourself if you want https://github.com/cturan/llama.cpp
Anonymous No.106999525 [Report]
>>106999354
In the case of MoE I imagine there is a weird effect where adding more VRAM matters at the beginning, because you are fitting the fixed tensors in VRAM, and at the end, when you are fitting the last few experts. In the middle, extra VRAM doesn't make much of a difference.
Anonymous No.106999568 [Report]
Ok, I'm fed up with axolotl where 2/3 of the models fail to actually shard across GPUs. Llama-factory seems to work better right off the bat.
Anonymous No.106999685 [Report]
>>106998884
Same. Had 26.6k dialup till 2004 even, couldn't even get 56k.
Anonymous No.106999696 [Report] >>106999880
>>106999364
doesn't change the fact buddy boy, skill issue remains
Anonymous No.106999714 [Report] >>106999723
>>106998884
Slowest I grew up with was 300 baud Vicmodem.
Good times.
Anonymous No.106999723 [Report]
>>106999714
i grew up with a 1 baud modem, it was hot shit.. only took 7 days to send a single email if no one picked up the phone
Anonymous No.106999880 [Report] >>107000218
>>106999696
I don't rank with retards.
Anonymous No.106999924 [Report]
are there any multimodal models that run in llama.cpp that are better than qwen2.5 72B?
Anonymous No.107000047 [Report]
Ok, I think I figured out my workflow. I'm going to run Gemma 3 27B using Llama-factory.
I am going to run my assistant through an OAI API compatible proxy connected to Gemma that'll log all messages to disk in sharegpt format. I am going to interact normally with the model through the assistant until filling the context window I'm able to fit on the 4x3090 machine (~40k tokens).
Then, I'm going to open the log on a text editor and remove the parts where the model did a whoopsie and clean it up in general.
Then I'm going to train on that cleaned up version of the log.
And so on ad infinitum to see how much I can improve the model in a reasonable amount of time.
If this works I will see about scaling up to a bigger model.
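A minimal sketch of that logging proxy, assuming an OAI-compatible backend (llama-server or similar) on :8080; the port, log path, and role mapping are placeholders:

import json, urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BACKEND = "http://127.0.0.1:8080/v1/chat/completions"
LOG = "chatlog.sharegpt.jsonl"
ROLE = {"system": "system", "user": "human", "assistant": "gpt"}

class LoggingProxy(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        req = urllib.request.Request(BACKEND, data=body,
                                     headers={"Content-Type": "application/json"})
        resp = urllib.request.urlopen(req).read()
        # append the full exchange as one sharegpt-style record
        msgs = json.loads(body)["messages"]
        reply = json.loads(resp)["choices"][0]["message"]
        convo = [{"from": ROLE.get(m["role"], m["role"]), "value": m["content"]}
                 for m in msgs + [reply]]
        with open(LOG, "a") as f:
            f.write(json.dumps({"conversations": convo}) + "\n")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(resp)

HTTPServer(("127.0.0.1", 8081), LoggingProxy).serve_forever()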
Anonymous No.107000218 [Report]
>>106999880
skills, check'm
Anonymous No.107000527 [Report] >>107000619
>>106999324
what? no they don't. I'm getting thinking on ST from the coding endpoint right now
also it's an open weight model so blocking reasoning makes zero sense anyway. anyone can just run the model themselves and distill to their heart's content
Anonymous No.107000546 [Report]
>>106999182
almost certainly bullshit
dario has been whimpering about china and begging for their models to be banned since R1 came out, it's not like he just started
also if they had proof of this, why wouldn't they name and shame? you know, like when anthropic caught openai distilling claude and made a big show of blocking them over it

https://www.wired.com/story/anthropic-revokes-openais-access-to-claude/
Anonymous No.107000619 [Report]
>>107000527
Yeah but they're probably serving the coding stuff at a loss (when hitting the usage limits) so you would benefit from using that instead of doing inference on your own hardware. But if you're getting the reasoning tokens then idk I guess I did something wrong.
Anonymous No.107000631 [Report] >>107000646 >>107000653
Anonymous No.107000635 [Report]
>>106999182
>some wsb "analyst"
Anonymous No.107000646 [Report]
>>107000631
It's funny because half of the time it'll say that even if it didn't make the information up.
Anonymous No.107000653 [Report]
>>107000631
>>>/g/aicg/
Anonymous No.107000664 [Report] >>107000689
https://x.com/jloganolson/status/1981102506228011361
terrifying
Anonymous No.107000683 [Report] >>107000696
>>106999182
GLM's slop profile is nothing like Cloode tho
Anonymous No.107000689 [Report]
>>107000664
>*autistic screeching*
Anonymous No.107000696 [Report]
>>107000683
Tell whoever made that to do PCA or just a similarity matrix rather than that unreadable mess.
Anonymous No.107000710 [Report] >>107000729 >>107000772 >>107000791 >>107001605
Lmao, this is what happens if you choose a roleplay model as an AI coding assistant
Anonymous No.107000729 [Report] >>107000808
>>107000710
>roleplay model
show system prompt
Anonymous No.107000772 [Report]
>>107000710
Lmfao
Anonymous No.107000791 [Report]
>>107000710
>Thought for 53.4s
kino...
Anonymous No.107000808 [Report]
>>107000729
Don't have one. I've just finished setting up Kobold as my backend in Docker and I was curious whether I could connect to it from VS Code using the Continue extension. I just asked it 1+1 to test the connection
Anonymous No.107000963 [Report] >>107000972
>>106996812
Haven't sorted out Linux yet so these are W10 test numbers with Vulkan. 128GB DDR5 "mini pc" system.

| model | size | params | backend | ngl | main_gpu | fa | dev | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ---------: | -: | ------------ | --------------: | -------------------: |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan | 99 | 1 | 0 | Vulkan1 | pp512 | 786.92 ± 0.44 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan | 99 | 1 | 0 | Vulkan1 | tg128 | 47.04 ± 0.05 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan | 99 | 1 | 1 | Vulkan1 | pp512 | 175.14 ± 0.03 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan | 99 | 1 | 1 | Vulkan1 | tg128 | 45.83 ± 0.04 |


| model | size | params | backend | ngl | main_gpu | fa | dev | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ---------: | -: | ------------ | --------------: | -------------------: |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 1 | 0 | Vulkan1 | pp512 | 901.58 ± 6.22 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 1 | 0 | Vulkan1 | tg128 | 45.67 ± 0.13 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 1 | 1 | Vulkan1 | pp512 | 305.96 ± 0.39 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 1 | 1 | Vulkan1 | tg128 | 42.98 ± 0.03 |
Anonymous No.107000972 [Report] >>107001073
>>107000963
that performance is terrible. my DDR4 does better
Anonymous No.107001073 [Report]
>>107000972
Mine too but that's expected - it's a PCIe-powered GPU with a 128-bit memory bus running on laptop-tier hardware with dual-channel RAM.
For this particular shoebox it gives 10-20x PP and 7x TG compared to running on the iGPU for around 45W extra power draw.
Windows tax included.
Depending on use case that might be enough for some running smaller models or MoEs. I still consider it grossly overpriced personally but then again, so are most SFF GPUs.
Anonymous No.107001079 [Report] >>107001107 >>107001336 >>107002988
>>106997912
ST sucks for anything that isn’t a one-on-one conversation. I want to have conversations with multiple characters in the same chat who don’t have access to the history they didn’t witness. I want to gangbang a character with multiple protagonists. I want the frontend to introduce generated characters that aren’t Elara or Lili and that have a believable range-checked D&D sheet. I want a quest tracker and automatic context summarization when the day ends. I want twenty other features I haven’t mentioned. And I can have it all in my own frontend without any bloat
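The "didn't witness it" part, at least, is a small data-model decision rather than a big feature. A minimal sketch, all names hypothetical:

```ts
// Tag every message with the characters present when it was said, then
// build each character's context window from only what they witnessed.
interface ChatMessage {
  speaker: string;
  text: string;
  witnesses: Set<string>; // characters in the scene at the time
}

function contextFor(character: string, log: ChatMessage[]): ChatMessage[] {
  return log.filter((m) => m.witnesses.has(character) || m.speaker === character);
}
```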
Anonymous No.107001107 [Report] >>107002988
>>107001079
post it
Anonymous No.107001110 [Report]
>>106998414
see, that explains a bit and also sounds pretty cool. Gives me some ideas for my own project
Anonymous No.107001184 [Report] >>107001195
>DavidAU/Qwen3-MOE-6Bx4-Almost-Human-XMEN-X3-X4-X2-X1-24B
retard or genius?
Anonymous No.107001192 [Report] >>107001228
So like, how far away are we from local models that can produce generated imagery in context with chatting and roleplaying and all that other shit?
would you say a year, a decade? Surely it can't be long now.
Anonymous No.107001195 [Report]
>>107001184
>DavidAU
Could have stopped there, but let's read on
>This is a MOE merge of X2, X4, X1, and X3 creating a 4x6B - 24B parameters model, compressed to 19B "in size".
>The full power of every version is in this model.
beyond retard
Anonymous No.107001228 [Report] >>107001235
>>107001192
Kobold already has a primitive version of it, and an anon from the diffusion threads is making a game-engine-like thing for diffusion and LLMs. Probably less than a year.
Anonymous No.107001235 [Report] >>107001292
>>107001228
That's gonna be sick.
Right now I just barely have fun with chatbots and roleplaying. I need visual stimuli to really get going.
I'd rather read a fucking book than chat with a bot at this point, honestly. I need it to have more going for it, and increasingly sophisticated image generation would be it for me.
Not just for jerking off, I mean for roleplaying like dungeon and dragons type of shit.
That would be revolutionary.
Anonymous No.107001292 [Report] >>107001429
>>107001235
I'm working on a frontend like that, but with only pregenerated images to keep it realtime and not looking like shit
Anonymous No.107001336 [Report]
>>107001079
Please have one of your characters get hit by a truck
and transmigrated from one of the scenarios you are running to a different one that's already in progress.
Anonymous No.107001337 [Report] >>107001376
We haven't reached AGI until I can smell the character I'm talking to.
Anonymous No.107001358 [Report]
AIIEEEEE STOP MAKING YOUR OWN FRONTENDS JUST USE SERVICETESNOR
IT'S LITERALLY RIGHT THERE JUST USE IT
Anonymous No.107001369 [Report]
>>{{char}} asshole contains an intoxicating musk odour that is always mentioned when her ass is present, or being used in a sexual manner, detail the smell
Anonymous No.107001376 [Report]
>>107001337
>Want to chat with Miss Piggy
>Be into brap-play
>She hits you with a saucy smelly line
>You can literally get a whiff of her from the conversation alone
>She smells like she had chili for breakfast, lunch, and dinner.
Anonymous No.107001377 [Report] >>107001396 >>107002039 >>107002161
what do "adventure" roleplayers even do? a dragon comes up

*he kills the dragon*

how low IQ do you have to be to enjoy this shit?
Anonymous No.107001396 [Report]
>>107001377
There's more to it than that, obviously.
Good roleplay would be the chatbot keeping track of your stats, your choices, your karma, your equipment, your map, your destination and previous locations, all of that shit a Game Master would normally handle for you.
And if you're not a retard, you'd respond with reasonable actions in line with your background and take everything else into context as well.
I think DnD roleplay is somewhat harder to do right now cause of the context capacity. But that's increasing over time so we'll get there eventually, I think.
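The "range-checked D&D sheet" wished for earlier in the thread is the easy part to pin down. A minimal sketch, assuming classic 3d6-per-ability rolls (everything here is illustrative):

```ts
// Roll 3d6 per ability so generated NPCs land in the believable 3-18 band
// instead of whatever number the model hallucinates.
const ABILITIES = ["STR", "DEX", "CON", "INT", "WIS", "CHA"] as const;

function d6(): number {
  return 1 + Math.floor(Math.random() * 6);
}

function rollSheet(): Record<string, number> {
  const sheet: Record<string, number> = {};
  for (const ability of ABILITIES) sheet[ability] = d6() + d6() + d6();
  return sheet;
}
```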
Anonymous No.107001429 [Report] >>107001489
>>107001292
why bother if it can never be more than a wrapper? the game engine seems like a step in the right direction since all the big game engines are such resource hogs
Anonymous No.107001461 [Report] >>107001466 >>107003168
Any Ling-T1 users on? Curious how it's different from K2 0905
Anonymous No.107001466 [Report]
>>107001461
they both suck. use mixtral 8x7B instead
Anonymous No.107001489 [Report] >>107001512
>>107001429
Not sure what you mean, it is a "game engine" in that it keeps a world state and does tool calling and all that stuff. Traditional game engines are fine for cloud AI stuff but for local they would just be competing for resources with the model, and I don't want to compromise on that
Anonymous No.107001512 [Report] >>107001577
>>107001489
are you retarded? what does saving tiny states have to do with competing resources? are you high?
Anonymous No.107001577 [Report]
>>107001512
Not sure what the problem is. I was saying that traditional game engines (Unreal, Unity) would compete for resources, but a light 2D engine shouldn't just be considered a "wrapper" because it still keeps state and manages world logic
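To make "keeps state and manages world logic" concrete, here is a hedged sketch of that kind of state-plus-tool-dispatch core; every tool name is hypothetical, not this anon's actual engine:

```ts
// A light "engine" core: a world state the model mutates only through
// whitelisted tool calls, so the LLM narrates while the state stays sane.
interface WorldState {
  location: string;
  inventory: string[];
  flags: Record<string, boolean>;
}

type ToolCall = { name: string; args: Record<string, unknown> };

function applyTool(state: WorldState, call: ToolCall): WorldState {
  switch (call.name) {
    case "move":
      return { ...state, location: String(call.args.to) };
    case "take":
      return { ...state, inventory: [...state.inventory, String(call.args.item)] };
    case "set_flag":
      return { ...state, flags: { ...state.flags, [String(call.args.key)]: true } };
    default:
      return state; // unknown tools are ignored rather than trusted
  }
}
```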
Anonymous No.107001605 [Report]
>>107000710
Anonymous No.107001881 [Report] >>107002484 >>107003747
New grok waifu dropped
https://x.com/elonmusk/status/1981911930747953189
https://x.com/tetsuoai/status/1981916179964027241
Anonymous No.107002039 [Report]
>>107001377
Instead of killing the dragon in one sentence you should be fucking the dragon for 10 paragraphs while the princess watches.
Anonymous No.107002144 [Report]
>there isn't any reason why this "general" actually exists except the jannies' leniency
Anonymous No.107002161 [Report] >>107002189
>>107001377
>american teenager: the thread
Anonymous No.107002189 [Report] >>107002247 >>107002258 >>107002277
>>107002161
yeah nothing says maturity like pretending to kill dragons in a sillytavern roleplay
Anonymous No.107002247 [Report] >>107002258 >>107002264
>>107002189
Nothing says NIGGER like a lack of imagination
Anonymous No.107002258 [Report]
>>107002189
>>107002247
American nigger roleplay wins it all. 4chan is the best example of this behaviour.
Anonymous No.107002264 [Report]
>>107002247
NIGGER???????????
Anonymous No.107002268 [Report]
It's better to run LLMs locally (faster response time, and nothing leaves your machine for, say, Discord trannies, Chicoms and Jeet scammers to sell your usage data). You could build a computer that mainly uses CPUs to run it for AI purposes on the low end rather than focusing on GPU-powered LLMs for text generation.
Anonymous No.107002276 [Report] >>107002471
https://github.com/ggml-org/llama.cpp/pull/16634#issuecomment-3445563655
>140% pp512 gain
applebabs we eating good
Anonymous No.107002277 [Report]
>>107002189
>maturity
Bet you think mesugaki slop is the pinnacle of modern writing and creativity. /s
Anonymous No.107002283 [Report] >>107002610
my "list of what the retarded llm should be instructed not to do prepended to all prompts.txt" keeps growing and maybe someday I'll have a .txt as big as the claude system prompt
today I just added "Never write polyfills in the context of JavaScript" after the one time too many where it decided my lack of polyfills was a bug that needed fixing, even though it was not prompted in any way to do that
using LLMs feels like meeting a script kiddie from 10 years ago who learned how to program from the old W3Schools and you constantly find new things to tell them not to do, or features they aren't aware of until they're told they exist
by default, if not instructed to use the most modern facilities available in (insert latest node version) they constantly manually wrap shit in Promises too
like, bruh, we have async await and most libs have async variants jesus
even the SOTA models like GPT-5 and Gemini do this kind of retarded shit constantly
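For anyone who hasn't hit this: a concrete instance of the complaint, using Node's own fs module (both APIs are real; only the helper names are made up):

```ts
import { readFile } from "node:fs";
import { readFile as readFileAsync } from "node:fs/promises";

// What the models tend to emit: a hand-rolled Promise around a callback API.
function readFileOldSchool(path: string): Promise<string> {
  return new Promise((resolve, reject) => {
    readFile(path, "utf8", (err, data) => (err ? reject(err) : resolve(data)));
  });
}

// What any current Node version already ships:
async function readFileModern(path: string): Promise<string> {
  return readFileAsync(path, "utf8");
}
```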
Anonymous No.107002307 [Report] >>107002323
>/s
Anonymous No.107002323 [Report]
>>107002307
Just in case you don't understand sarcasm =)
Anonymous No.107002356 [Report] >>107002795 >>107003167
This thread should not exist.
Anonymous No.107002366 [Report]
Minimax m2 is dogshit, not to mention giga cucked.
Don't know why I even tried it when it was just pushed by shills with memebenches.
Anonymous No.107002467 [Report] >>107002795 >>107002998
dragon pussy
Anonymous No.107002471 [Report]
>>107002276
The 1024GB M5 Ultra Mac Studio will be crazy for AI. Literally what we've been waiting for.
Anonymous No.107002484 [Report]
>>107001881
the voice still sucks
Anonymous No.107002585 [Report]
>>106996812
>70W
noice
>GDDR6 / 128bit bus / 224gb/s
gah
>400~ euros
meh, I mean I guess it's good if you don't have a server with 8/12 channels
Still, 16GB is a bit too low. Now if this was let's say 32GB for 700~ then yeah, I'd probably get one for a consumer board PC to do inference stuff.
Anonymous No.107002610 [Report] >>107002783 >>107002909
>>107002283
It's funnier: last week I asked my junior to write a function extending the attachment parser to also handle images (which needs async logic), and he came back to me with a Promise.all monstrosity (along with a useless bunch of if/else checks). I told him it's 2025 and promises are 100% verboten in this project. He fixed it later, but I suspect this guy is just generating straight from Claude, pasting whatever it gives him, testing if it works, and then making a PR.
Anonymous No.107002783 [Report]
>>107002610
>harshing the vibe-coding
Anonymous No.107002795 [Report] >>107002965
>>107002356
>>107002467
The duality of man
Anonymous No.107002909 [Report] >>107002936
>>107002610
even when there are moments you'd want to reach for something like Promise.all, Promise.all is never the answer
if you have a large array of concurrent tasks to execute in parallel, you want your executeThisShit() function to have at least a parameter that sets a hard concurrency limit, so that a large array of tasks doesn't suddenly fire trillions of I/O or API calls.
Promise.all is a bad API designed by mongoloids
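A minimal sketch of the capped executor being described, in plain TypeScript with no library assumptions:

```ts
// Run tasks with at most `limit` in flight, instead of firing the whole
// array at once the way Promise.all(tasks.map(run)) would.
async function runWithLimit<T>(
  tasks: Array<() => Promise<T>>,
  limit: number,
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;

  // Each worker pulls the next unstarted task until none remain; the
  // index bump is synchronous, so workers never grab the same task.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  }

  await Promise.all(Array.from({ length: Math.min(limit, tasks.length) }, worker));
  return results;
}

// e.g. at most 8 concurrent fetches: runWithLimit(urls.map((u) => () => fetch(u)), 8)
```

(Promise.all over a fixed handful of workers is fine; it's Promise.all over an unbounded task array that bites.)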
Anonymous No.107002936 [Report] >>107003279
>>107002909
JS of any flavour is, always has been, and shall forever be AIDS
Anonymous No.107002965 [Report]
>>107002795
Lol saved
Anonymous No.107002988 [Report]
>>107001079
>>107001107
Post it. I'd love to see any stat tracker, multi context frontend that actually works.
Anonymous No.107002996 [Report] >>107003020 >>107003035 >>107003135
lmao. idk how long this has been a thing, but youtube channel authors now have a "video ideas" section with a list of AI-generated video titles and previews trying to fit your channel's topic. you can then expand each one and it gives you the most generic, sloppiest plan for the video, with bullet lists and multiple "it's not x, it's y." I hope this doesn't catch on
Anonymous No.107002998 [Report]
>>107002467
I'd like some. Thank you.
Anonymous No.107003003 [Report]
The reddit mod created lmg because he couldn't dominate aicg. As a concept this is dead.
Anonymous No.107003020 [Report]
>>107002996
It will catch on. My national news site has had an "AI-created summary" for a year now, but it's actually faster to skim the articles because they're news stories anyway, not fucking novels.
Idiocracy is here to stay.
Anonymous No.107003035 [Report]
>>107002996
>Here's what we'll replace you with
Anonymous No.107003135 [Report] >>107003157
>>107002996
That's already a thing. Watched a video 9 months ago describing a workflow that started with "make some high traffic videos" and proceeded to research, plan, then puke out dozens of slop videos for TikTok using LLM and video-gen tools.
Dead internet etc.
Anonymous No.107003157 [Report] >>107003199
>>107003135
Oh yeah, the sloppening is in full swing; it just isn't always clear how far down the ride we are
Anonymous No.107003167 [Report]
>>107002356
> dipsy laughs in the shadows
Anonymous No.107003168 [Report]
>>107001461
So nobody has tried this fat thing? I'll take one for the team and test both ling and ring out. I don't expect much.
Anonymous No.107003199 [Report] >>107003262
>>107003157
There are redd*t threads full of "how do I get my parents to stop believing fake videos on fb" already. I'd give it another year or so for the fb meltdown.
Problem for them is, the vids keep getting better.
Which is great because I'm eagerly awaiting full real-time video and audio RP.
Anonymous No.107003243 [Report] >>107003265
Anyone else today or just me?
Anonymous No.107003262 [Report] >>107003298
>>107003199
Yes, they're here on /g/ too.
Anonymous No.107003265 [Report]
>>107003243
Works on my machine with ServiceTesnor™ and ik_llama®-server, so the problem is on your side.
Anonymous No.107003267 [Report] >>107003289 >>107003580
so I was curious if GLM 4.6 really fixed its repetition issue and tried it on their official chat so no one can come and tell me I'm running the wrong quants, the wrong settings or whatever
>Actually, I think there's still an issue with the return type. Let me fix it using function overloads:
>Actually, I think I'm overcomplicating this. Let me simplify the implementation and make it more robust:
>Actually, I think there's still an issue with the return type. Let me fix it once more:
>I think I'm overthinking this. Let me simplify the implementation:
>Actually, I think there's still an issue with the return type. Let me fix it once more:
>Actually, I think I'm overcomplicating this. Let me simplify the implementation and make it more robust:
etc etc etc, it went on and on for 20k tokens and was still going, pasting the function it genned in its thoughts right after one of those lines and doing it again and again and again and again
I will never, ever believe people who say GLM isn't broken again
bullshit
that lab doesn't know how to make models at all
this thread is filled with chink shills
Anonymous No.107003279 [Report] >>107003290 >>107003468
>>107002936
Nah, man. Modern JavaScript is great. Once you add TypeScript and a build step with 3000 dependencies that take up 3 GB in node_modules and 25 MB when minified, it's almost as good as any other language. In fact, it's so great it should be used everywhere including backend, mobile, and desktop.
Anonymous No.107003289 [Report] >>107003344
>>107003267
Damn, it's rare to see somebody with a skill issue so big he can't even use the chat.
Anonymous No.107003290 [Report]
>>107003279
Shut up microsoft.
Anonymous No.107003298 [Report] >>107003307
>>107003262
The /g/ catalog is full of tourist consumers so that doesn't surprise me.
Anonymous No.107003307 [Report] >>107003322
>>107003298
Modern 4chan is like 90%+ reddit crossposters
Anonymous No.107003322 [Report] >>107003353 >>107003406
>>107003307
Have you tried telling them to go back?
Anonymous No.107003344 [Report] >>107003388
>>107003289
>skill issue
>on something that has never happened to me with literally any other LLM: DeepSeek, the various Qwen in their various parameter sizes, Gemma, GPT-OSS, or the online API models GPT-5, Gemini 2.5 Pro and so on
I am sure it's definitely a skill issue with me, you are right... chink shill.
Anonymous No.107003353 [Report]
>>107003322
If they go back then I don't get fun reactions when I post naughty things that make the jannies cry
Anonymous No.107003388 [Report]
>>107003344
Do we know which quant the chat runs? Paste full prompt somewhere if you want some advice for overcoming the skill ish
Anonymous No.107003406 [Report]
>>107003322
Anonymous No.107003468 [Report]
>>107003279
I like minimalistic oldschool JS, it's reasonably fast
Anonymous No.107003571 [Report]
>>107003557
>>107003557
>>107003557
Anonymous No.107003580 [Report]
>>107003267
>quants model
>omg it's dumb
Anonymous No.107003747 [Report]
>>107001881
Looks and sounds kinda shit still.