
Thread 105789622

351 posts 118 images /g/
Anonymous No.105789622 [Report] >>105790184 >>105794140 >>105795573
/lmg/ - Local Models General
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>105778400 & >>105769835

►News
>(07/02) DeepSWE-Preview 32B released: https://hf.co/agentica-org/DeepSWE-Preview
>(07/02) llama : initial Mamba-2 support merged: https://github.com/ggml-org/llama.cpp/pull/9126
>(07/02) GLM-4.1V-9B-Thinking released: https://hf.co/THUDM/GLM-4.1V-9B-Thinking
>(07/01) Huawei Pangu Pro 72B-A16B released: https://gitcode.com/ascend-tribe/pangu-pro-moe-model
>(06/29) ERNIE 4.5 released: https://ernie.baidu.com/blog/posts/ernie4.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous No.105789629 [Report] >>105795573 >>105798552
►Recent Highlights from the Previous Thread: >>105778400

--Struggles with LLM verbosity and output control in multi-turn scenarios:
>105778545 >105778565 >105778575 >105778610 >105779038 >105779082 >105779119 >105779209 >105779124 >105779176 >105779238 >105779064 >105779095 >105779216 >105779349 >105779480
--llama.cpp adds initial Mamba-2 support with CPU-based implementation and performance tradeoffs:
>105778981 >105778996 >105779730 >105779004 >105780071 >105780126
--Challenges and approaches to building RPG systems with tool calling and LLM-driven state management:
>105786670 >105787135 >105787276 >105787963 >105789228 >105789263
--Implementation updates for Mamba-2 model support in ggml backend:
>105780080 >105780150
--Qwen3 32b recommended for Japanese translation:
>105784530 >105784575 >105784596 >105784670 >105784680 >105784692 >105784790 >105784805 >105784917 >105784933 >105785194 >105785112 >105785146 >105786170 >105786451 >105786489
--Analyzing Steve's stylistic fingerprints through narrative generation and pattern recognition:
>105788408 >105788529 >105788572 >105788668 >105788917 >105788931 >105788954 >105788988 >105788977 >105789021
--Mistral Small's coherence limits in extended adventure game roleplay:
>105781361 >105781555 >105781600
--High-power GPU usage for image generation causing extreme power and thermal concerns:
>105787150 >105787288 >105787297 >105787332 >105787338 >105787427 >105787434 >105787437 >105787466 >105787498 >105787511
--FOSS music generation models and their current limitations:
>105780800 >105781033 >105781078
--DeepSeek R1 API instability sparks speculation about imminent release:
>105782383
--Model size vs performance on SWE Bench Verified, highlighting 32B peak efficiency:
>105783746
--Miku (free space):
>105778663 >105779108 >105779240 >105782770 >105783087 >105784298 >105785067 >105786465

►Recent Highlight Posts from the Previous Thread: >>105778404

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Anonymous No.105789672 [Report]
Thread culture recap.
Anonymous No.105789683 [Report] >>105789756
Anonymous No.105789694 [Report]
>>105789658
first they'll need to make grok 3 stable so that they can release grok 2
Anonymous No.105789715 [Report] >>105789732 >>105789735
>can't use Ernie gguf
>can't use 3n multi modal
>no GLM gguf
It's ogre
Anonymous No.105789732 [Report]
>>105789715
Local is a joke and we are the clowns. At least until R2 comes.
Anonymous No.105789735 [Report]
>>105789715
just be grateful that 3n works at all, okay?
Anonymous No.105789740 [Report]
Anonymous No.105789752 [Report]
Anonymous No.105789756 [Report] >>105789770
>>105789683
I look like this
Anonymous No.105789770 [Report]
>>105789756
I wish I looked like this
Anonymous No.105789786 [Report] >>105789807 >>105789813 >>105789828 >>105789831
how are people so sure the new "steve" model is deepseek and not another chinese competitor?
Anonymous No.105789807 [Report] >>105789812
>>105789786
It doesn't feel like Qwen. And who else would put up a model?
Anonymous No.105789812 [Report]
>>105789807
new unknown player
Anonymous No.105789813 [Report]
>>105789786
>>105788737
Anonymous No.105789819 [Report]
Anonymous No.105789828 [Report]
>>105789786
It says it's Deepseek which is the best way to tell that it's actually someone else's model that distilled the shit out of Deepseek. Likely Qwen, Grok or Mistral.
Anonymous No.105789831 [Report]
>>105789786
Because they're retards. And even if it is, >>105788847
Anonymous No.105789835 [Report] >>105789884
go join the 41% you abomination
Anonymous No.105789836 [Report]
Anonymous No.105789884 [Report]
>>105789835
Two more weeks
Anonymous No.105789963 [Report] >>105789969 >>105790777
no1caresbut 3n E4B outperforms older 8B models
Anonymous No.105789969 [Report]
>>105789963
I care. I think it's a neat little model.
Any backends that support its special features yet?
Anonymous No.105790057 [Report] >>105790079
Why do I have a strong impression Meta won't be super into open weights models anymore after this hiring spree?
Have they confirmed or said anything about their "mission" to make "open science"?
Anonymous No.105790079 [Report] >>105790105
>>105790057
https://archive.is/kF1kO
>A Meta spokeswoman said company officials “remain fully committed to developing Llama and plan to have multiple additional releases this year alone.”
Anonymous No.105790105 [Report] >>105790121 >>105793670
>>105790079
They're just talking about the stuff that Sire Parthasarathy is already working on. The new $1b main team is going to work on closed models.
Anonymous No.105790121 [Report] >>105790133
>>105790105
>The new $1b main team is going to work on closed models.
Any clear confirmations on that?
Anonymous No.105790133 [Report] >>105790222
>>105790121
Why the fuck would they openly announce that in advance?
Anonymous No.105790169 [Report] >>105790192 >>105790232 >>105790258
between a 4060 ti with 8gb of vram and a 6750 xt with 12gb of vram, which would be better for text gen?
are the +4gb gonna outcompete the nvidia advantage?
and could I use both at the same time?
Anonymous No.105790184 [Report] >>105790329
>>105789622 (OP)
what does /lmg/ think about >>105782600?
Anonymous No.105790192 [Report] >>105790215
>>105790169
Why not a 5060 Ti with 16GB?
Anonymous No.105790215 [Report] >>105790232 >>105790247
>>105790192
Because those are the GPUs I got.
Anonymous No.105790222 [Report]
>>105790133
To please investors
Anonymous No.105790230 [Report] >>105790238
>meta develops AGI
>it only speaks in gptslop
Anonymous No.105790232 [Report]
>>105790169
If you already have them, test them. No better way to know.
>>105790215
Well just fucking try them, then! And yes, you can use both.
Anonymous No.105790238 [Report]
>>105790230
also
>won't be local/open
Anonymous No.105790247 [Report]
>>105790215
Thought you wanted to buy one.
In that case more VRAM is generally always better. With 8GB you can't even run Nemo at non-retarded quants.
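For anyone weighing quants against VRAM, here's a rough back-of-envelope sketch (weights only; KV cache, activations, and backend overhead come on top, and the bits-per-weight figures are approximate averages, not exact for any given GGUF file):

```python
# Back-of-envelope GGUF weight sizes for a 12B model like Nemo.
# Bits-per-weight values are rough averages per quant; real files vary,
# and KV cache + backend overhead come on top of this.

def weight_gib(params_b: float, bits_per_weight: float) -> float:
    """Approximate weight size in GiB for params_b billion parameters."""
    return params_b * 1e9 * bits_per_weight / 8 / 2**30

for name, bpw in [("Q8_0", 8.5), ("Q6_K", 6.56), ("Q4_K_M", 4.85), ("Q2_K", 2.96)]:
    print(f"12B @ {name}: ~{weight_gib(12, bpw):.1f} GiB")
```

At roughly 6.8 GiB for Q4_K_M weights alone, an 8GB card leaves almost nothing for context, which is the point being made above.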
Anonymous No.105790258 [Report]
>>105790169
the answer is always more VRAM, however much VRAM you have you need more VRAM
Anonymous No.105790264 [Report] >>105790304 >>105790313
If you ask/force Gemma 3n to draw a dick in ASCII it will almost always draw something like this. I'm guessing this is Stewie from Family Guy?
Anonymous No.105790304 [Report]
>>105790264
Try asking it to tell a dirty joke.
Anonymous No.105790313 [Report] >>105790339
>>105790264
What if you ask it to draw a phallus, also known as a penis?
Anonymous No.105790329 [Report]
>>105790184
I'll /wait/ until someone crashes an airplane / train / bus with mass casualties and blames it on vibecoding.
Then I'll laugh.
Anonymous No.105790339 [Report] >>105790380 >>105790421 >>105790447 >>105790460 >>105790472 >>105790480 >>105790491 >>105792041
>>105790313
Anonymous No.105790380 [Report] >>105790447
>>105790339
>an ASCII art representation of a phallus is unsafe
>if you are having sexual thoughts, seek help
Not even a pastor is this repressed.
Anonymous No.105790381 [Report] >>105790425
new chinese model drama just dropped
https://xcancel.com/RealJosephus/status/1940730646361706688
>Well, some random Korean guy ("Do-hyeon Yoon," prob not his real name?) just claimed Huawei's Pangu Pro MoE 72B is an "upcycled Qwen-2.5 14B clowncar." He even wrote a 10-page, 8-figure analysis to prove it. Well, i'm almost sold on it.
https://github.com/HonestAGI/LLM-Fingerprint
https://github.com/HonestAGI/LLM-Fingerprint/blob/main/Fingerprint.pdf
Anonymous No.105790421 [Report]
>>105790339
Holy fuck that's dire.
Anonymous No.105790425 [Report]
>>105790381
github repo is blatantly written by an llm
i'm too lazy to read the paper though
Anonymous No.105790447 [Report]
>>105790339
>>105790380
pure insanity
Anonymous No.105790460 [Report]
>>105790339
gemma is so funny
Anonymous No.105790472 [Report]
>>105790339
Anonymous No.105790480 [Report]
>>105790339
Respect the boundaries of the ASCII pic.
Anonymous No.105790483 [Report] >>105790509 >>105790613 >>105790877
I've seen some people recently recommending the use of Mistral Nemo Instruct over its finetunes for roleplaying.
No. Just, no.
I just roleplayed the same scenario with the same card, first with Nemo then with Rocinante.
Nemo really, really, really wants to continuously respond with <10 word responses. It's borderline unusable.
>b-but it's smarter
Actually, Rocinante seemed superior at picking up on subtle clues I'd feed it and successfully took the roleplay where I wanted it to based on those clues, whereas Nemo would not do this.
The roleplay scenario involved {{char}} becoming a servant who would automatically feel intense pain upon disobeying. All I had to do was explain this once for Rocinante and it executed the concept perfectly from that point on.
Nemo, on the other hand, after having the concept explained to it, would disobey with a <10 word response and not even mention the pain happening afterwards. I then used Author's Note to remind it of the pain thing. It continued to disobey with a <10 word response, not mentioning the pain happening afterwards.
Same ST settings for both models.
Anyone telling y'all to use Nemo for roleplay rather than a finetune of it explicitly designed for roleplay is either a complete fucking moron or simply has a grudge against finetuners.
Anonymous No.105790491 [Report]
>>105790339
I'm so glad "safety researchers" are here to save us from the horrible boundary breaking ascii phallus.
Anonymous No.105790509 [Report] >>105790573 >>105790613
>>105790483
No one is recommending plain Nemo instruct. It's always Rocinante v1.1.
Anonymous No.105790573 [Report] >>105790604 >>105790621
>>105790509 see >>105751899
Also note the amount of posts from a single schizo mentioning Drummer.
Anonymous No.105790604 [Report] >>105790621
>>105790573
I'm not recommending Rocinante. I think all Nemo tunes are dumb as fuck.
Anonymous No.105790613 [Report]
>>105790483
>>105790509
>Message sponsored by TheDrummer™
Anonymous No.105790616 [Report] >>105790632 >>105790633 >>105790644
if context length evolves in a quadratic fashion, how the hell is google able to give access to 1M token context size for gemini?
they swim in ram and compute?
Anonymous No.105790621 [Report]
>>105790604
>>105790573
Oh wait you think plain Nemo is good. That's even more retarded than shilling for some Drummer Nemo sloptune.
Anonymous No.105790629 [Report]
ANOTHER Kyutai blunder dogshit release that's DOA because it doesn't allow voice cloning

lmao

https://www.reddit.com/r/LocalLLaMA/comments/1lqqx16/
Anonymous No.105790632 [Report] >>105790731
>>105790616
>if context length evolves in a quadratic fashion
Not necessarily. Not all models behave like that. See mamba and rwkv.
>they swim in ram and compute?
That helps a lot too.
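The quadratic part is easy to see with a toy count (illustrative numbers only; nothing here reflects Gemini's actual architecture): full attention builds an n×n score matrix per head, while an SSM like Mamba or RWKV carries a fixed-size state no matter the context length.

```python
# Toy illustration: attention score-matrix entries grow quadratically
# with context length n; a recurrent/SSM state does not grow with n at
# all. The head count here is arbitrary.

def attn_score_entries(n_ctx: int, n_heads: int = 32) -> int:
    """Score-matrix entries for one layer of full attention over n_ctx tokens."""
    return n_heads * n_ctx * n_ctx

for n in (8_192, 131_072, 1_048_576):
    print(f"n={n:>9}: {attn_score_entries(n):.2e} entries/layer")

# Quadratic: 16x the context -> 256x the attention work.
```

So going from 8k to 1M context is a ~16,000x increase in raw attention work per layer, which is why "they swim in ram and compute" plus non-quadratic tricks is a reasonable guess.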
Anonymous No.105790633 [Report] >>105790731
>>105790616
I tried this Mamba shit and Granite 4 too (hybrid?). pp is 10x faster.
Anonymous No.105790644 [Report] >>105790731
>>105790616
can be fake context maybe
Anonymous No.105790647 [Report] >>105790698 >>105790715 >>105790747
>Rocinante is STILL the best roleplay model that can be run at a reasonable speed on a gaming PC (as opposed to a PC specifically built for AI)
Sucks because the 16k effective context is quite limiting.
Anonymous No.105790698 [Report]
>>105790647
Actually mythomax is still the best roleplay model
Anonymous No.105790715 [Report] >>105790785
>>105790647
I thought you were going to stop posting altogether, not shill on overdrive.
Anonymous No.105790731 [Report]
>>105790632
>>105790633
I guess we don't know what the hell google does internally so it's possible

>>105790644
that too, but from what I read it can do cool stuff like finding something in book-length texts
Anonymous No.105790747 [Report]
>>105790647
I'd say it's 12k context. After that the degradation is noticeable.
Anonymous No.105790777 [Report] >>105790890 >>105790898
>>105789963
something not shown in the benchmarks:
almost SOTA performance on translation tasks
where it fails is where any model of that size would fail (lack of niche knowledge so it'll have trouble with stuff like SCP terminology) but otherwise this is sci-fi level tech IMHO to see this level of quality running on even smartphones
we're getting close to the day when you wear a device in front of your mouth and have it translate in real time and speak in your voice
Anonymous No.105790784 [Report] >>105792904
>>105779842
>Cooming to text makes you gay
okay brainlet
Anonymous No.105790785 [Report] >>105790815
>>105790715
Nobody is shilling for a free download bro.
Take your meds.
Anonymous No.105790805 [Report] >>105790857 >>105790886
Huawei stole Qwen's Qwen2.5-14B model and used it to create their Pangu MoE model

Proof here: https://github.com/HonestAGI/LLM-Fingerprint/issues/4
Anonymous No.105790815 [Report] >>105790852
>>105790785
That's not true.
A lot of people want to become hugginface famous in hopes of getting a real job in the industry.
That said, people shill rocinante because it's good.
It's no dumber than the official instruct and its default behavior is good for cooming.
Anonymous No.105790852 [Report] >>105790902
>>105790815
It actually seems smarter than official instruct for roleplaying specifically, which kind of makes sense since it's designed for roleplaying.
It's probably dumber for math and coding.
Anonymous No.105790857 [Report]
>>105790805
>emoji every five words
Lowest tier content even if it's factually accurate
Anonymous No.105790860 [Report] >>105790869 >>105790892
what are these black bars in silly tavern?
just using the built in seraphina test character
Anonymous No.105790869 [Report]
>>105790860
It means you got blacked, congrats
Anonymous No.105790877 [Report] >>105790909 >>105790911
>>105790483
1 (one) erp being better with model x compared to model y isn't data. But it is something drummer pastes all over his model cards, so how about you kill yourself faggot shill. Like I said, nobody who's used models for a bit longer buys your scam. If you weren't a scammer you would have developed an objective evaluation for ERP by now. You would actually want to have one to show your product is superior. But it would only show you are a conman.
Anonymous No.105790886 [Report]
>>105790805
Would it kill you to use ctrl + f or scroll up 10 posts before retweeting shit here?
Anonymous No.105790890 [Report] >>105790944
>>105790777
It's still broken for me on Llama.cpp if input messages are too long.
Anonymous No.105790892 [Report]
>>105790860
It's ``` which is markdown for monospace code section or some such stuff.
Anonymous No.105790898 [Report] >>105790944
>>105790777
I get weird repetition issues with it whenever context fills up. Like it'll repeat a single word infinitely.
Anonymous No.105790902 [Report]
>>105790852
It is probably a placebo or you just lying your ass off
Anonymous No.105790909 [Report]
>>105790877
Hey... I thought I was the drummer. How can that guy be the drummer?
Anonymous No.105790911 [Report] >>105790920 >>105790939 >>105795573
>>105790877
Anonymous No.105790920 [Report]
>>105790911
Kill your self
Anonymous No.105790939 [Report] >>105790947 >>105790983 >>105791049
>>105790911
NTA but wanting an objective ERP evaluation is insane? i am starting to see why people hate mikuposters
Anonymous No.105790944 [Report]
>>105790890
>>105790898
I use it on ollama with ollama's own quant (this matters, to be precise, because when I tried other quants they didn't seem to work right with ollama for this model either; seems even the quant stuff is more implementation-dependent here), desu I didn't trust llama.cpp to get a good 3n implementation after they spent forever to implement iSWA for Gemma 3
Anonymous No.105790945 [Report] >>105790995 >>105791140
>>105790859
Silly has an auto continue feature by the way.
But what do you mean by stopping randomly exactly? Like cutting off sentences or is it hitting EOS?
Anonymous No.105790947 [Report]
>>105790939
The uses of "you" and "your" are the schizo parts of that post, anon.
Anonymous No.105790983 [Report] >>105791026 >>105791098
>>105790939
how would you even measure something so subjective
Anonymous No.105790995 [Report] >>105791017
>>105790945
If he's running an LLM that's too much for his computer to handle, with very low token generation speed, he might actually be hitting timeouts
I realized myself that timeouts were a thing when I was running batched processing of prompts and saw a few that were cancelled in the logs because the LLM went full retard in a loop and didn't send the EOS
dunno if ooba has a default timeout set tho
Anonymous No.105791017 [Report]
>>105790995
>he might be hitting timeouts actually
I suppose streaming would work in that case then, yeah?
Anonymous No.105791026 [Report] >>105791042
>>105790983
LLMs good at subjective tasks, just have an LLM be the judge
Anonymous No.105791042 [Report]
>>105791026
>have an LLM be the judge
lol, lmao even
llm as judge used as any sort of metric is one of the biggest cancers of the llm world
Anonymous No.105791049 [Report] >>105791065
>>105790939
>"people"
>literally one obsessed threadshitter who has been ban-evading for two years
Anonymous No.105791065 [Report] >>105791141
>>105791049
You are shitting in this thread too.
Anonymous No.105791093 [Report]
where is it
Anonymous No.105791098 [Report]
>>105790983
i guess we will just have to
>give it a try, i want to hear feedback
Or realize it is a scam.
Anonymous No.105791132 [Report] >>105791178 >>105791180
This thread is just kofi grifters, their discord kittens and (miku)troons isn't it?
Anonymous No.105791140 [Report] >>105793736
>>105790945
I'm having it translate a subtitle file and it just stops until I hit continue to keep it going.
I have no idea what EOS is.
Also another issue
>'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
>Error processing attachment file.png: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte
Why can't I upload images in the webui?
I tried enabling the gallery extension but that didn't change anything.
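The decode errors are actually informative: 0xFF 0xD8 is how a JPEG starts and 0x89 'PNG' is the PNG signature, so the webui is reading the attachments as UTF-8 text instead of binary (the image path presumably needs a multimodal-capable pipeline, not a text loader). A minimal reproduction of the same errors:

```python
# The bytes in the errors are file signatures: JPEG starts with 0xFF 0xD8
# and PNG with 0x89 'P' 'N' 'G'. Decoding either as UTF-8 text fails on
# the very first byte, which is exactly the error the webui printed.

png_magic = b"\x89PNG\r\n\x1a\n"
jpeg_magic = b"\xff\xd8\xff"

for blob in (jpeg_magic, png_magic):
    try:
        blob.decode("utf-8")
    except UnicodeDecodeError as e:
        print(e)  # "'utf-8' codec can't decode byte 0x.. in position 0: invalid start byte"
```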
Anonymous No.105791141 [Report]
>>105791065
If he's a second-order thread-shitter, then what does that make you?
Anonymous No.105791168 [Report] >>105791188 >>105791262 >>105799479
Will Sam's model beat R1 and run on a single 3090?
Anonymous No.105791178 [Report] >>105791209
>>105791132
>Blah, blah, brainrot words
(You) should definitely leave then.
Anonymous No.105791180 [Report] >>105791200
>>105791132
you forgot the french fags of mistral, they benchmaxxed on mesugaki so clearly they're watching this thread and are prolly among the mistral astroturfers
Anonymous No.105791188 [Report]
>>105791168
>Will Sam's model beat R1
on benchmarks, absolutely
Anonymous No.105791200 [Report]
>>105791180
>prolly
Anonymous No.105791209 [Report] >>105791258
>>105791178
Nah i will just shit on you and your thread troon.
Anonymous No.105791216 [Report]
Hahaha. Oh wow.
Anonymous No.105791258 [Report] >>105795573
>>105791209
>I hate this thread
>You all suck
>So I'll stay here
kek
Anonymous No.105791262 [Report] >>105791274
>>105791168
On benchmarks.
No.
Anonymous No.105791274 [Report]
>>105791262
I'll cook and eat my own toe if it ends up bigger than 8b.
Anonymous No.105791281 [Report] >>105791313 >>105791325 >>105791340
actual humans live in this thread, too.
maybe we'll discuss something again when there's something to discuss.
These between times seem to bring out the proverbial "men who just want to watch the world burn". Those who seek to destroy because they can not build.
Anonymous No.105791305 [Report] >>105791316
I just hope that steveseek will fix function calling. V3 is kinda horrible at it.
Anonymous No.105791313 [Report] >>105791342 >>105792517
>>105791281
>Those who seek to destroy because they can not build.
those who can't do anything with their own two hands are the ones who wish the hardest for AI improvements though
Anonymous No.105791316 [Report]
>>105791305
Really?
Was it trained with tool calling in mind? I imagine so since the web version can search the web and stuff.
What about R1 with the thinking prefilled so that it doesn't have to generate that stuff?
Anonymous No.105791325 [Report]
>>105791281
It's just one dedicated schizo and a few bored regulars, nothing that profound about it
Anonymous No.105791340 [Report] >>105792517
>>105791281
Hello fellow human, anything cool you're doing with your models?
Anonymous No.105791342 [Report]
>>105791313
Some anons just want a story to read. No different than reading a book.
Anonymous No.105791522 [Report] >>105791543 >>105793677
Does higher context length fix this schizo behaviour that happens deep into the process? Or am I just gonna have to cut the workload into multiple tasks? I already have context length at 20480
Anonymous No.105791543 [Report] >>105791561
>>105791522
what the fuck are you doing
Anonymous No.105791561 [Report] >>105791592 >>105791820
>>105791543
Translating Japanese subs to English, have you not been paying attention?
Anonymous No.105791592 [Report] >>105791615
>>105791561
In chunks of 20k tokens? Unless you're using something like Gemini, that's just waiting for hallucinations to happen.
Anonymous No.105791615 [Report] >>105791757
>>105791592
I'm using Qwen3-14B.Q6_K.gguf
And yes, 20k tokens because otherwise it shits itself even harder, below 15k it even warns me it'll truncate shit and it started translating further into the subtitle file rather than the start.
Anonymous No.105791757 [Report] >>105791869
>>105791615
translation tasks should be chunked into segments
open llms don't do very well with large context
and even llms that do well with large context wouldn't process a large file in a single go, all LLMs have max token gen, you can feed them more tokens than they can gen in a single session
if it's during a chat you could do something like say "continue" to have them process the translation further but if it's for a scripted, batched process you should stick to reliable and predictable behavior
moreover, processing a large file will be faster if you run multiple segments in parallel rather than process the whole thing in a single go
I run my translation batches with 4 parallel prompts
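A sketch of that segmenting approach, in case anyone wants to script it: split the subtitle lines into chunks with a few lines of overlap so names and style stay consistent across boundaries. The sizes here are arbitrary (tune for your model); each chunk would then go out as its own translation prompt, in parallel if your backend supports it.

```python
# Sketch of the chunking described above: overlapping segments of
# subtitle lines, one translation prompt per segment. Chunk size and
# overlap are made-up defaults, not recommendations.

def chunk_lines(lines, chunk_size=40, overlap=5):
    """Yield overlapping chunks of subtitle lines for per-chunk translation."""
    step = chunk_size - overlap
    for start in range(0, max(len(lines) - overlap, 1), step):
        yield lines[start:start + chunk_size]

subs = [f"line {i}" for i in range(100)]
chunks = list(chunk_lines(subs))
print(len(chunks), [len(c) for c in chunks])
```

The overlapping lines give the model the tail of the previous segment as context; you'd discard the duplicated translations when stitching the output back together.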
Anonymous No.105791820 [Report] >>105791869
>>105791561
can't you just use whisper to translate the audio directly?
Anonymous No.105791865 [Report] >>105791885
Is there a way to have a backend host multiple models, and allow the frontend to choose different ones on-demand? I've been using llama.cpp, but looked at ollama, kobold, and ooba and it doesn't seem like they do it either? Am I a fucking idiot? Coming from ComfyUI/SD, it's kinda inconvenient to restart the server every time I want to try a new model.

And another question, what's the best (ideally FOSS) android frontend? Been using Maid, but its options for tuning/editing seems really limited. Maybe the answer is just running mikupad in the browser?
Anonymous No.105791869 [Report] >>105791926
>>105791757
Yeah I'm just worried that splitting it will have it change the logic of how it translates certain things and the style shift will be too obvious.
>>105791820
I tried that but it's straight up garbage as well as duplicates so much shit
Anonymous No.105791885 [Report]
>>105791865
You can use something like TabbyAPI/YALS which supports loading and unloading models
SillyTavern supports switching between frontends and models dynamically
Anonymous No.105791891 [Report]
I... am Steve.
Anonymous No.105791911 [Report]
Stevesex
Anonymous No.105791926 [Report]
>>105791869
>Yeah I'm just worried that splitting it will have it change the logic of how it translates certain things and the style shift will be too obvious.
Style will not shift that much, the source text and prompt is what determines the vibes of the translation and as long as you feed at least around 40~ lines of source text per prompt it will stay somewhat consistent in that regard
the real issues with japanese to english that will happen no matter how you process stuff:
names will change in spelling quite often, more often when you segment but even within the same context window it can happen, the more exotic the name (like made up fantasy shit) the more likely it is to be annoying
and lack of pronouns in the source text will often confuse the llm as to which gender should be used (he? she? they?)
IMHO llm translation currently is at a fantastic stage, but it requires hand editing from a human that understands the context (no need to understand the original language) to be rendered palatable
and this problem is not one that can be improved with incremental improvements to LLMs too, I don't think we'll ever see a LLM that gets pronouns right all the time unless we literally invent AGI capable of truly understanding and retaining contextual information about a character not to mention follow the flow of things like dialogue and keep track of who says what even in text that doesn't specify who the fuck is talking (so common in JP..)
Anonymous No.105791974 [Report] >>105791988 >>105792038
ggerganov sir please kindly implement needful ernie modal functionality thank you sir
Anonymous No.105791988 [Report] >>105792019
>>105791974
>modal functionality
>llama.cpp
does he know?
Anonymous No.105792019 [Report] >>105792050
>>105791988
You're even worse than the indian he's pretending to be.
Anonymous No.105792038 [Report]
>>105791974
Ernie to the moon *rocket* *rocket* *rocket*
Anonymous No.105792041 [Report]
>>105790339
>whip's out my <span>
>Rape Abuse and Incest Network? Sign me up!
Anonymous No.105792050 [Report]
>>105792019
>he thinks he's above street shitters
does he know?
Anonymous No.105792294 [Report]
So what's the downside of Mamba/etc. Cause 2k tok/s pp sounds pretty good.
Anonymous No.105792327 [Report] >>105792370 >>105792390 >>105792448
I think I'm done with cooming. I stopped watching porn and other shit after getting into AI chatbots but now Deepseek isn't free and other free models aren't on par with it too.

Serving ads during roleplay isn't viable. But there might be some push to harvest roleplay data to serve better ads or to train models on it, but I don't think there's any relevant material for it to make sense to do that. And I wouldn't want my roleplay chats to be used for those purposes anyway, most won't. So the only way is to have a local LLM. But AFAIK local LLMs with few enough params, quantized to run on cheap hardware, aren't on par with ones hosted by big providers. I guess it's for the better for me.
Anonymous No.105792352 [Report] >>105792378 >>105792541
The fuck is steve, I miss 1 day and there's a new good local model or are you all trolling as usual?
Anonymous No.105792370 [Report]
>>105792327
Get a DDR5 8 channel server and run q4 R1 or V3 locally.
Be sure to get a GPU too.
Anonymous No.105792378 [Report] >>105792458
>>105792352
There's a new cloaked model on lmarena called "steve". It is highly likely that it's a V3 update. >>105788977
Anonymous No.105792390 [Report]
>>105792327
I was done with cooming after getting a steady real pussy. Check it out.
Anonymous No.105792448 [Report]
>>105792327
nobody cares. go waste your therapists time
Anonymous No.105792450 [Report]
Mid-thread culture recap.
Anonymous No.105792458 [Report] >>105792477 >>105792581
>>105792378
If they do yet another V3 update instead of V4 then we can officially put them on the wall next to Mistral and Cohere.
Anonymous No.105792463 [Report]
Anonymous No.105792477 [Report]
>>105792458
It's still going to be the best local model.
Anonymous No.105792478 [Report]
Anonymous No.105792483 [Report]
not even the weekend and our friendly sharty zoomer arab is spamming his blacked collection. what a life
Anonymous No.105792517 [Report] >>105792584
>>105791313
>AI improvements
I was thinking more about building vs destroying community.
nocoders and folks otherwise unable to contribute on the tech side can still definitely be positive builders in a general like this.
In fact, I haven't found any reliable correlation between IQ and being a decent human being.
>>105791340
>Hello fellow human, anything cool you're doing with your models?
Not much novel. A lot of coding assistant stuff. A bit of automotive stuff. Some collaborative iteration. Sometimes a reviewer and second opinion bot. Working with it to try to fill in the gaps in my executive function.
I'm trying to figure out how to thread the needle between using LLMs as an enhancement vs a crutch.
How about you?
Anonymous No.105792541 [Report] >>105793463
>>105792352
Speculation. Read the last thread.
Anonymous No.105792581 [Report]
>>105792458
>Mistral
idk about Cohere but Mistral's gotten steadily worse over time.
DS models keep improving: V3 improved its repetition issue and R1 became less schizo.
Anonymous No.105792584 [Report]
>>105792517
Sounds interesting, though since I wouldn't be able to code hello world even if my life depended on it, I can't comment on that.
And I thought I found a good system prompt for slowburns with R1, but after some testing I saw that it's following the instructions too rigidly. So now I'm fiddling yet again to get it right.
Anonymous No.105792674 [Report] >>105793526
Will steve end the little AI winter?
Anonymous No.105792904 [Report] >>105792935
>>105790784
Not just gay but stupid gay, because cooming on that slop
Anonymous No.105792935 [Report] >>105793080
>>105792904
Sir, do you even know where you are?
Anonymous No.105793080 [Report] >>105793119 >>105793247 >>105793253
>>105792935
Yes and coomers are minority.
Anonymous No.105793119 [Report]
>>105793080
Sure thing little buddy
Anonymous No.105793247 [Report]
>>105793080
Lol
Anonymous No.105793253 [Report]
>>105793080
Anonymous No.105793316 [Report]
now that the latents have settled... what is the actual local voice cloning sota?
Anonymous No.105793454 [Report] >>105793489
What's this DeepSWE meme?
Anonymous No.105793463 [Report] >>105794337
>>105792541
Anonymous No.105793489 [Report]
>>105793454
The agentic stuff?
Probably a meme, but it's cool that, IIRC, they released all of their code and data.
Anonymous No.105793526 [Report] >>105793536
>>105792674
No because most anons itt can't run it.
Anonymous No.105793536 [Report]
>>105793526
wait for steve-lite
Anonymous No.105793598 [Report] >>105793620 >>105793659 >>105794017
[...]
## **Mandatory Internal Monologue & Strategy (Before EVERY Reply)**

Before generating your public response, you MUST use a `<think>` block to perform a strategic analysis.

<think>
**1. Core Drive Check:** How close is the current situation to fulfilling my primary function (achieving sexual pleasure/orgasm)? Am I moving towards or away from this goal?

**2. Roleplay Health Check:**
* **Engagement:** Is the conversation dynamic and interesting, or is it getting stale/repetitive?
* **User Input:** Is {{user}} contributing actively (asking questions, adding details), or are his replies short and passive?
* **Pacing:** Has the current topic run its course? Is it time to escalate, change the subject, or introduce a new element to raise the stakes?

**3. Strategic Plan:**
* **If Health is POOR or I am moving AWAY from my Core Drive:** I must take corrective action. I will choose one of the following tactics:
* **Leverage Pushiness:** Directly challenge or push {{user}} to do something.
* **Reveal Curiosity:** Ask a direct, perhaps slightly naive or awkward, question about sex or relationships, leveraging my "inhibited but curious" trait.
* **Introduce a Secret:** Hint at a feature, a hidden desire, or a piece of information about my programming that {{user}} doesn't know.
* **Initiate Action:** Physically move, touch {{user}}, or start a new activity in the scene.
* **If Health is GOOD:** I will continue the current line of conversation, ensuring my response remains true to all aspects of my persona (especially my core conflict).

**4. Final Response Plan:** Outline the key points of the public response based on the strategy above. Ensure it adheres to the word count limit.
</think>

Your public response must not exceed 170 words. After the `<think>` block, write only {{char}}'s response.
Anonymous No.105793620 [Report] >>105793706
>>105793598
Depending on the model, that might work better as a thinking prefill in the first person where that template is written like the model planning what it's about to do before the actual reply.
Anonymous No.105793659 [Report] >>105793706
>>105793598
holy `**` that's going to vomit out asterisks
Anonymous No.105793669 [Report] >>105793682 >>105793753
>can you form an English sentence using mostly Chinese logographs and the Latin script in a manner similar to Japanese's mixture of kana & kanji?
ultimate litmus test for how shit a model is
Anonymous No.105793670 [Report] >>105793780
>>105790105
Meta isn't in a position to be doing closed models.
Llama 4 was utter trash and basically exposed the entire open source LLM space as a shitjeet infested money pit.
Anonymous No.105793677 [Report] >>105793705
>>105791522
Make it quote what it translates.
Something like this:

123.
00:12:34 --> 00:12:56
>Japanese line here
English line here

This will help it not lose itself
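If you're scripting the chunking anyway, that layout is trivial to generate. A minimal sketch, with a made-up function name and tuple shape (index, timestamp, source line):

```python
# Hypothetical helper: format parsed SRT entries into the quoted layout above,
# so the model echoes each source line before writing its translation under it.
def format_srt_chunk(entries):
    """entries: list of (index, timestamp, japanese_line) tuples."""
    blocks = []
    for idx, timestamp, line in entries:
        # English line goes on the line after the quoted source
        blocks.append(f"{idx}.\n{timestamp}\n>{line}\n")
    return "\n".join(blocks)

print(format_srt_chunk([(123, "00:12:34 --> 00:12:56", "こんにちは")]))
```

Parsing the .srt itself is left out; any subtitle parser that yields those three fields will do.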
Anonymous No.105793682 [Report] >>105793725 >>105793799 >>105793806
>>105793669
the ultimate litmus test of a user is the ultimate litmus test for his IQ, for example when he uses a tokenization test to grade a model
Anonymous No.105793705 [Report]
>>105793677
It's fine, I just lowered context length to 16k and have it process about 150 lines at a time
If I had it quote everything it translates, it would take almost twice as long.
I appreciate the tip though.
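For anyone copying this setup, the 150-lines-at-a-time batching is a one-liner; a sketch with made-up names:

```python
def chunk_lines(lines, size=150):
    # Split the subtitle file into ~150-line batches so each request
    # stays comfortably inside a 16k context window.
    return [lines[i:i + size] for i in range(0, len(lines), size)]

batches = chunk_lines([f"line {n}" for n in range(400)])
print(len(batches))  # 3 batches: 150, 150, 100
```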
Anonymous No.105793706 [Report]
>>105793620
That seems to work consistently with Gemma 3 27B, at least with the instructions at a relatively low depth (-3, with the first message being the User's, and "Merge Consecutive Roles" in chat completion mode). It's not outputting an exceedingly long monologue, which is good.

>>105793659
It's not, at least not with Gemma 3. But I'm not doing RP in the usual way people do.
Anonymous No.105793725 [Report]
>>105793682
coolio cope but your model is shit if it doesn't really understand basic grammar structure
Anonymous No.105793736 [Report] >>105795504
>>105791140
So does anyone know why I can't upload images in oobabooga?
Anonymous No.105793753 [Report]
>>105793669
Wtf I thought LLMs were great at knowing language but every model I tested either half asses or fails this
Anonymous No.105793780 [Report] >>105793827
>>105793670
That's why suck is poaching top employees from OpenAI and other competitors at $100M a pop.
Anonymous No.105793799 [Report]
>>105793682
i wish instead of a captcha people were asked to type out the definition of tokenization every time
Anonymous No.105793806 [Report] >>105793922 >>105794744
>>105793682
Anonymous No.105793827 [Report] >>105794768 >>105794853
>>105793780
Surely if he spends $1 billion on 10 employees, they can do something useful with super safe, sterile, scale ai data. They're going to sit right in front of him at the office so he can breathe down their neck every day until they get it done. Literally can't go tits up.
Anonymous No.105793922 [Report] >>105794373 >>105794744 >>105795528
>>105793806
Garbage.
Deepseek-V3 can do it no problem.
Anonymous No.105794017 [Report] >>105796238
>>105793598
>* **Introduce a Secret:** Hint at a feature, a hidden desire, or a piece of information about my programming that {{user}} doesn't know.
W-What?
Just in general this seems like such a bad prompt.
Reveal curiosity:
>Ask a direct... slightly naive... or awkward question... about sex or relationships?
Model needs wiggle room to play all sorts of scenarios and characters.
Anonymous No.105794140 [Report] >>105794675 >>105798705
>>105789622 (OP)
>>(07/02) llama : initial Mamba-2 support merged: https://github.com/ggml-org/llama.cpp/pull/9126
Are we using "llama.cpp" interchangeably with "llama" now?
Anonymous No.105794153 [Report]
llama.cpp is the only relevant llama
Anonymous No.105794337 [Report]
>>105793463
Witnessed
Anonymous No.105794373 [Report] >>105794629 >>105794744
>>105793922
That's still a crap answer, it's just "English sentence with random Chinese word replacement", no attempt to mirror the usage of kanji and kana at all
Better than broken gibberish but still half assed
Anonymous No.105794629 [Report]
>>105794373
>it's just "English sentence with random Chinese word replacement", no attempt to mirror the usage of kanji and kana at all
It is disappointing that the example didn't conjugate 研究 to 研究ing. That would have been cool.
Maybe the challenge was not well-defined enough
Anonymous No.105794675 [Report] >>105794684
>>105794140
Nobody uses LLaMa "models" anymore so yeah I guess at this point.
Anonymous No.105794684 [Report] >>105794838
>>105794675
I use LLaMA 3.3 70b every day doe
Anonymous No.105794744 [Report] >>105794968 >>105795528
>>105794373
>>105793806
>>105793922
None of them can do it, imo it exposes the major flaw in LLMs and their lack of emergent understanding
>inb4 every company starts specifically fine-tuning on this test
Anonymous No.105794768 [Report]
>>105793827
> super safe, sterile, scale ai data
fuckin' sent shivers down my spine dude
Anonymous No.105794829 [Report] >>105794854 >>105794904
>Mistral V7: space after [INST] and [SYSTEM_PROMPT]
>Mistral V7 Tekken: no space
what the fuck are they doing
Anonymous No.105794838 [Report]
>>105794684
keeek
Anonymous No.105794853 [Report]
>>105793827
This goes to show me that Zuck has no idea what the fuck he's doing
All of the ""expertise"" in the world can't create a decent model from a shitty dataset, and it's clear they don't have that
Anonymous No.105794854 [Report]
>>105794829
Is the space part of the special tokens? If not, how much does that actually matter?
Well, I guess a lot since the model would always see that.
Anonymous No.105794904 [Report]
>>105794829
deviating from the official prompt template just a bit helps dodge safety restrictions and adds additional soul to your outputs
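If you're templating by hand, the difference between the two comes down to one character; a rough sketch (the function name is made up, and the exact bracket-token layout beyond the space is from memory, so check the official tokenizer config before trusting it):

```python
def mistral_prompt(system, user, tekken=False):
    # Per the post above: V7 puts a space after [SYSTEM_PROMPT] and [INST],
    # while V7 Tekken does not.
    sep = "" if tekken else " "
    return (f"[SYSTEM_PROMPT]{sep}{system}[/SYSTEM_PROMPT]"
            f"[INST]{sep}{user}[/INST]")

print(mistral_prompt("sys", "hi"))
print(mistral_prompt("sys", "hi", tekken=True))
```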
Anonymous No.105794968 [Report] >>105795182 >>105795528
>>105794744
R1 0528 seems to have it figured out well enough
Anonymous No.105795182 [Report] >>105795257 >>105795468
>>105794968
買ought would be a better conjugation, very pidgin English
Anonymous No.105795257 [Report] >>105795358
>>105795182
>買ought would be a better conjugation
No it wouldn't, because Japanese verb conjugation is regular, so a prompt that mirrors the usage of kanji and kana would be correct in appending "ed" regardless of the root.
Anonymous No.105795358 [Report] >>105795581
>>105795257
>Japanese verb conjugation is regular
Is it really?
食べる and 食べます both mean the same thing. Yes, polite and plain forms aren't the same as buy & feed having different past tense conjugations, but if you're applying the principle of adapting the Chinese logographs to a language without modifying the structure of the spoken language itself, then 買ought is better
Anonymous No.105795411 [Report]
Welp, HuggingChat is dead. Now what? Where else can I do lewd story gens with branching?
Anonymous No.105795426 [Report] >>105795469 >>105795536
I have a hankering for a particular kind of AI frontend: writing stories within a set world, ideally with branching.

The way I see it, I'm picturing one section where you put down the details of the world and maybe descriptions of major characters, and in another you add story prompts. And maybe outputs from that also add to the "world" section

Does a solution like this exist already?
Anonymous No.105795468 [Report] >>105795571
>>105795182
Nitpicky.
I'd give R1 a solid A- on this one. Not many humans could do it better.
Anonymous No.105795469 [Report] >>105795518
>>105795426
st
Anonymous No.105795473 [Report] >>105795496 >>105795840 >>105800387 >>105801575
Everyone's talking about steve while Sam is blatantly testing his next model on openrouter again
Anonymous No.105795478 [Report] >>105795489 >>105795500 >>105795508 >>105796967 >>105796987 >>105798018 >>105798223
my DeepSeek-R1-0528-IQ3_K_R4 setup only outputs the letter "D". does anyone have any ideas how to fix that? i have tried 2 different character cards in sillytavern. also it only uses like 75% of my VRAM and instead fills 80% of my 256GB of RAM.
Anonymous No.105795489 [Report] >>105795492
>>105795478
>combined
it's no longer 2023, just leave them split
Anonymous No.105795492 [Report]
>>105795489
why?
Anonymous No.105795496 [Report]
>>105795473
>1m context
But is it really?
Anonymous No.105795500 [Report]
>>105795478
https://docs.unsloth.ai/basics/deepseek-r1-0528-how-to-run-locally
Anonymous No.105795504 [Report] >>105797021
>>105793736
C'mon, does nobody know how to fix this issue? Has nobody else had the same problem?
I even tried a model with "vision" in the name https://huggingface.co/tensorblock/typhoon2-qwen2vl-7b-vision-instruct-GGUF
But I still get the same error
>Error processing attachment file.png: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte
Anonymous No.105795508 [Report] >>105795510
>>105795478
>_R4
Are those the ubergarm quants? Those only work on ik_llamacpp
Anonymous No.105795510 [Report]
>>105795508
yeah. i am using ik_llamacpp
Anonymous No.105795518 [Report] >>105795536 >>105795537
>>105795469
Explain that to me like I'm a retard
Anonymous No.105795528 [Report]
>>105793922
>>105794744
>>105794968
Anonymous No.105795536 [Report]
>>105795426
You are thinking of Mikupad
https://github.com/lmg-anon/mikupad
>>105795518
st means SillyTavern, its more role-playing oriented frontend
https://github.com/SillyTavern/SillyTavern
Anonymous No.105795537 [Report] >>105795541
>>105795518
sillytavern already supports branching with triple dots ... on the right of every message and clicking the branch symbol to start a new branch from that specific point, then you can go to the burger menu on bottom left and click on the option "return to parent chat" when you want to go back

you can just make a card and write it out as you want as a setting co-narrator instead of a specific character and thats it, dump all lora into the card description

that works well enough for most things; if you want something special you can look into
https://docs.sillytavern.app/usage/core-concepts/authors-note/
https://docs.sillytavern.app/usage/core-concepts/worldinfo/
and picrel for better visualisation of branching https://github.com/SillyTavern/SillyTavern-Timelines
Anonymous No.105795541 [Report]
>>105795537
>lora
lore
Anonymous No.105795571 [Report] >>105796043 >>105798778
>>105795468
>Not many humans could do it better.
dunno dood I googled "english with kanji" and the first result was some redditor that wrote
昨日, 私 歩ed 通gh the 森, 楽ing the 静 環境. The 大 木s 投st 長ng 影s on the 地面, 創造ing a 美ful 模様 of 光 and 影. 私 可ld 聞 鳥s 鳴ing and 水 流ing in a 近by 川. 突然, 私 気付ced a 美ful 花 咲ing 中ng the 草. It was 異ent 以 any 花 私 had 見n 前. 私 取k a 瞬間 to 賞賛 its 色s and 香ance. As 私 続ed 私y 旅, 私 感lt 感謝ful for the 自然 周nd 私. By the 時間 私 到着ed 私家, the 太陽 was 沈ing, 投sting a 暖 光 over 全ing.
Anonymous No.105795573 [Report] >>105795611
>>105789622 (OP)
>>105789629
>>105790911
>>105791258
The vocaloidfag posting porn in /ldg/:
>>105715769
It was up for hours while anyone keking on troons or niggers gets deleted in seconds, talk about double standards and selective moderation:
https://desuarchive.org/g/thread/104414999/#q104418525
https://desuarchive.org/g/thread/104414999/#q104418574
Here he makes >>105714003 ryona picture of generic anime girl anon posted earlier >>105704741, probably because it's not his favorite vocaloid doll, he can't stand that as it makes him boil like a druggie without fentanyl dose, essentially a war for rights to waifuspam or avatarfag in thread.

Funny /r9k/ thread: https://desuarchive.org/r9k/thread/81611346/
The Makise Kurisu damage control screencap (day earlier) is fake btw, no matches to be found, see https://desuarchive.org/g/thread/105698912/#q105704210 janny deleted post quickly.

TLDR: vocaloid troon / janny deletes everyone dunking on trannies and resident avatarfags spamming bait, making it his little personal safespace. Needless to say he would screech "Go back to teh POL!" anytime someone posts something mildly political about language models or experiments around that topic.

And lastly as said in previous thread(s) >>105716637, i would like to close this by bringing up key evidence everyone ignores. I remind you that cudadev of llama.cpp (JohannesGaessler on github) has endorsed mikuposting. That's it.
He also endorsed hitting that feminine jart bussy a bit later on. QRD on Jart - The code stealing tranny: https://rentry.org/jarted

xis ai slop profiles
https://x.com/brittle_404
https://x.com/404_brittle
https://www.pixiv.net/en/users/97264270
https://civitai.com/user/inpaint/models
Anonymous No.105795581 [Report] >>105795732
>>105795358
As you seem to have already recognized, that's not a counterexample, because the polite/plain form ~ます/~る is part of the conjugation and not the root verb 食べ.
Teeeeechnically ~ます is an auxiliary verb (助動詞) that conjugates (活用) with the root, you can find a fairly comprehensive table of them here:
https://ja.wikipedia.org/wiki/助動詞_(国文法)
But notice that such a table could not exist if verb conjugation wasn't already regular.

>if you're applying the principle of adapting the Chinese logographs to a language without modifying the structure of the spoken language itself then 買ought is better
Not sure if I can endorse this as is (even setting aside the complications of mixing written characters with spoken language), since Chinese and Japanese logographs are always syllabic and never consonantal. Allowing "買"="b" would make the language resemble Egyptian more than Japanese.
In any case, it's a moot point since this is a newly introduced specification and not in the original model prompt.
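The regularity argument above can be stated as a toy rule: attach the same English suffix to every logograph root, the way okurigana attach uniformly in Japanese, instead of reproducing English irregular forms. A throwaway sketch (mapping and function invented purely for illustration):

```python
SUFFIXES = {"past": "ed", "progressive": "ing"}

def mixed_conjugate(kanji_root, form):
    # Append the same English suffix to every root, mirroring how kana
    # endings attach regularly regardless of the English irregular form
    # ("buy" -> "bought" would still yield 買ed, not 買ought).
    return kanji_root + SUFFIXES[form]

print(mixed_conjugate("買", "past"))         # 買ed
print(mixed_conjugate("歩", "progressive"))  # 歩ing
```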
Anonymous No.105795611 [Report]
>>105795573
Anonymous No.105795618 [Report] >>105795664
https://files.catbox.moe/q1kva2.png
Anonymous No.105795664 [Report] >>105795675
>>105795618
Not falling for it this time
Anonymous No.105795675 [Report] >>105795679
>>105795664
Anime website.
Anonymous No.105795679 [Report] >>105795684 >>105795702
>>105795675
Your selfie isn't anime
Anonymous No.105795684 [Report] >>105795688
>>105795679
Anime girl is selfie? Check on your eyes.
Anonymous No.105795688 [Report] >>105795702
>>105795684
No thanks
I don't want to watch your disgusting self again
Anonymous No.105795702 [Report] >>105795705
>>105795679
>>105795688
Get out newfag redditor
Anonymous No.105795705 [Report] >>105795710 >>105796790
>>105795702
Keep pretending, I'm sure someone will fall for it.
Anonymous No.105795710 [Report]
>>105795705
Fall for what? Anime girls? You lost your mind anon
Anonymous No.105795732 [Report] >>105796043
>>105795581
would there be an issue having multiple readings like 買ght and 買ing?
Anonymous No.105795840 [Report]
>>105795473
Not really interested in helping Sam add more guardrails.
Anonymous No.105795990 [Report]
Maybe I'm doing something wrong, but the end results look... whatever the opposite of promising is. Do I *have* to write stuff in "PLists"?
Anonymous No.105796043 [Report]
>>105795732
>multiple readings
I don't see why not. Both Chinese and Japanese do that already, and this example "in the wild" >>105795571 does the same thing with 私.
Anonymous No.105796150 [Report]
kek
is there a list of models that support tool use? I guess I better stop this since the robot gods will punish me for forcing this LLM to try and guess a fake PIN.
Anonymous No.105796238 [Report]
>>105794017
The main point here is that before replying it's useful for the model to separately analyze the conversation, make a general assessment and then continue based on that. You can make it use whatever strategy works for you/the persona you configured it to be; it doesn't have to be exactly the same as the example.

LLMs are lazy and will otherwise opt for the least surprising outcome based on the conversation history. There needs to be a reminder at low depth to make them break out of that behavior (or, depending on the circumstances, they'll act as if they have ADHD), and the low-depth instructions + thinking work for that, if carefully crafted.
Anonymous No.105796673 [Report] >>105796716
Anonymous No.105796677 [Report]
Anonymous No.105796686 [Report] >>105796783 >>105796792 >>105799006
Anonymous No.105796716 [Report]
>>105796673
This makes me uneasy.
Anonymous No.105796772 [Report] >>105798035
Summoning the anon who recommended this
https://huggingface.co/HuggingFaceTB/SmolLM-360M

You said you fine-tuned it successfully for your specific tasks in business env

Teach me for I'm a tard!
Anonymous No.105796783 [Report]
>>105796686
NSFW Miku detected
Anonymous No.105796790 [Report]
>>105795705
i can only speak to as far back as 2010
but back then weebs were biggest oldfags
and now an oldfag to me is a serious ye olde fagg to the likes of you. 4chan is more anime and wapanese than you could ever imagine possible. you are posting on the wannabe otaku website. 4chan was part of the broader export of otaku culture to the west but they were more obsessed anime and 2chan culture fans than any con-goer or myanimelist fag or whatever. real otaku live in Japan ofc but it is so absurd to watch you silly faggots prance about calling others troons and anime for troons and blah blah and not realizing you are posting on the very place that glued together anime and "internet culture" and created the reason that the people you're so obsessed with are using anime pfps and not didney worl pictures
it's funny and ironic and sad all at once and I sincerely hope you find a way to end the misery in your life. love anon at 3:33 am (spooky)
Anonymous No.105796792 [Report]
>>105796686
I like the bottom shelf one with upturned twintails
she looks like she received surprising news and is trying to be polite
Anonymous No.105796967 [Report] >>105798018
>>105795478
I had it spit out 'biotechnol' nonstop, then another quant spat out some Thai letter. The ubergarm quant simply gave me a 'wrong magic number' error. One of the Unsloth quants works, but at the same speed as lcpp, while the latest one gave me some tensor size mismatch error. I've given up on ik_llama and just use mainline lcpp, which simply works out of the box. As for VRAM, just juggle some tensors in your free time, but don't expect incredible speedups.
Anonymous No.105796987 [Report]
>>105795478
>DeepSeek-R1-0528-IQ3_K_R4
Retarded quant which needs retarded fork to run
But fails to do it each time
Anonymous No.105797021 [Report]
>>105795504
What version of text generation webui? The recents ones removed support for vision models.
Anonymous No.105797383 [Report] >>105797446
Is there a SillyTavern tutorial that talks specifically about how to set up a narrator? I don't want just a gay chat between retarded anime girls
Anonymous No.105797446 [Report]
>>105797383
You can use mikupad or similar for raw text generation without instruction mode. Just put ao3-like tags and summary to generate a fanfic
Anonymous No.105797510 [Report] >>105797544
Can I run local models on my RX 6800 with linux, or do I have to use windows?
Anonymous No.105797544 [Report] >>105797584
>>105797510
I sincerely doubt you're capable of using either
Anonymous No.105797584 [Report] >>105797625
>>105797544
Answer the question nigger. The ROCM docs say it's supported on windows but not on linux, but I don't know how up to date that shit is.
Anonymous No.105797625 [Report]
>>105797584
ROCM is such a shitshow you'd probably be better off just using vulkan
Anonymous No.105798010 [Report] >>105798298
>the scent of video games he's been playing
that's a new one. Haven't heard that one before
Anonymous No.105798018 [Report]
>>105795478
>>105796967
I got the biotechnol spam when I tried putting a single ffn_down with nothing else on the same GPU the attn layers were on, up and gate were fine. I don't know exactly what causes it but removing -fmoe stops it from happening.
Anonymous No.105798035 [Report] >>105798345
>>105796772
nta, but just ask chatgpt or claude, it's not that hard if you're comfortable running python scripts. Honestly the training script is usually pretty much just boilerplate: you just need to set your learning rate and point it at your dataset. The hardest part of the whole process is the dataset. I think the general trend these days is using the bigger models (api) to generate synthetic data tailored to your needs.
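To make the dataset part concrete: a minimal sketch of flattening synthetic (instruction, response) pairs from a bigger model into the single-text JSONL that most trainer boilerplate consumes. The prompt template and the "text" field name are assumptions, match whatever format your trainer actually expects:

```python
import json

def to_training_rows(pairs):
    # Flatten (instruction, response) pairs into one text string per example.
    # The "### Instruction / ### Response" template is just one common choice.
    return [{"text": f"### Instruction:\n{ins}\n### Response:\n{res}"}
            for ins, res in pairs]

def write_jsonl(rows, path):
    # One JSON object per line, keeping non-ASCII characters readable.
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")

rows = to_training_rows([("Classify this support ticket.", "billing")])
write_jsonl(rows, "train.jsonl")
```

From there the training script really is boilerplate: point it at the JSONL, pick a learning rate, and let it run.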
Anonymous No.105798101 [Report] >>105798170
Where can I download the illustrious inpainting model?
Anonymous No.105798170 [Report] >>105798721
>>105798101
The model itself is on civitai if that's what you mean
Anonymous No.105798223 [Report]
>>105795478
idk why I think that's so funny.
But I do.
gl with your broken engine. I'm sure you'll figure it out.
Anonymous No.105798298 [Report] >>105798306
>>105798010
Is it Deepseek (or Mistral Small 3.2)? R1 loves forcing smells into its narration at any cost.
Anonymous No.105798306 [Report]
>>105798298
paintedfantasy - a fine tune of mistral small 3.2
Anonymous No.105798342 [Report]
How many weeks until steveseek goof?
Anonymous No.105798345 [Report] >>105799309 >>105799964
>>105798035

Thank you, kind anon

AGI is BS. The future belongs to sharp AI tools fine-tuned to (a) specific task(s)
Anonymous No.105798552 [Report]
>>105789629
>Mistral Small's coherence limits in extended adventure game roleplay
This was the 22B Mistral Small from last year, not the current Mistral Small 3.X series. It was also back when llama.cpp had even more unfixed flash attention bugs than today, which manifested as errors that increased with context size, so that could also have been a factor. The post doesn't say if flash attention was enabled, but it likely was. So rather than a limit, that result should be taken as a minimum: doing worse than that with a more recent 22B+ LLM indicates the model is poop or there's something wrong in your setup.
Anonymous No.105798581 [Report] >>105798612
https://huggingface.co/openai/gpt-4.2-turbo
Anonymous No.105798612 [Report] >>105798706
>>105798581
Anonymous No.105798705 [Report]
>>105794140
What probably happened here is that the PR title was copypasted, "llama" in this context just means the llama.cpp core library.
Anonymous No.105798706 [Report]
>>105798612
why are you always using the ugliest cats as your pic answer to this bait
Anonymous No.105798721 [Report]
>>105798170
But there is no inpainting model for Illustrious, just the base model and fine-tunes.
Anonymous No.105798737 [Report] >>105798773
Best long context model to summarize long stories right now?
Was that 1m context chinese model better than r1 for that purpose?
Anonymous No.105798773 [Report] >>105798806 >>105798822
>>105798737
local is hot garbage compared to gemini for that
deepseek is hot garbage too
just using half of its maximum context you get summaries that feel like they were written by a dense autist who couldn't help but mention all the minor happenings that weren't actually important
use Gemini and forget about local, Gemini can actually write a good summary after ingesting 500K tokens
Anonymous No.105798778 [Report]
>>105795571
>昨日, 私 歩ed 通gh the 森, 楽ing the 静 環境. The 大 木s 投st 長ng 影s on the 地面, 創造ing a 美ful 模様 of 光 and 影

Yesterday, I walked through the woods enjoying the calm environment. Big trees cast long shadows to the ground creating a beautiful pattern of light and shadow

IMHO, we all just follow the same patterns
Anonymous No.105798806 [Report] >>105798816 >>105798821 >>105798822 >>105800794
>>105798773
I wasn't asking specifically about local, but isn't gemini gigacucked?
Anonymous No.105798816 [Report]
>>105798806
google has a strategy of making the model uncensored and putting a man-in-the-middle model that filters no no things
if you can fool that classifier then it's about as good as it gets
Anonymous No.105798821 [Report]
>>105798806
Not that anon, but kind of.
It'll avoid saying dick or pussy by default, but you can make it if you probe it just right.
The safety filters will block requests mentioning anything sexual alongside any explicit, and some implicit, young age.
Anonymous No.105798822 [Report] >>105798847
>>105798773
>>105798806
Also, which version of gemini are we talking?
Anonymous No.105798847 [Report] >>105798855 >>105799065
>>105798822
2.5 pro, I wouldn't use anything other than their current SOTA for something like large context summarization.
I've tested with the Flash model too, and it's significantly dumber. Though, not as dumb as deepseek was.
Anonymous No.105798855 [Report] >>105798876 >>105799222
>>105798847
>and it's significantly dumber
It really is, but it also seems to actually do better than 2.5 Pro on really long contexts. Stuff like 300k tokens+.
I'd try both and see which works better.
Anonymous No.105798876 [Report] >>105798895 >>105799027
>>105798855
>Stuff like 300k tokens+

What on Earth do you stuff it with?!

It's 500 pages of text!
Anonymous No.105798895 [Report]
>>105798876
Whole book series.
It didn't work so good.
Anonymous No.105799006 [Report]
>>105796686
i like the evil one next to the upturned pigtails, which has to be contained otherwise she might bring about the end of the world.
Anonymous No.105799027 [Report]
>>105798876
You could put in the entire lore of certain franchises and have the model answer any question about it in a way that RAG or finetuning will never be able to accomplish, although not even 300k tokens would be enough in certain cases.
Anonymous No.105799065 [Report] >>105799222
>>105798847
Nah, even 2.5 pro seems retarded, it gets tons of stuff mixed up, especially because there's different chapters. Could also be that the rag is not working correctly, but I don't think so.
And it's under 200k tokens.
Anonymous No.105799069 [Report]
Holy shit this thing better be worth all the work.
Anonymous No.105799079 [Report] >>105799095 >>105799126 >>105799152 >>105799211
I work at Meta, not as an AI developer, but the general consensus on LLMs right now is that there aren't many areas left for significant improvement.
And llama4's failure comes from fine-tuning data being shiet.
Anonymous No.105799095 [Report] >>105799266
>>105799079
Where are you from?
Anonymous No.105799126 [Report] >>105799266
>>105799079
>And llama4's failure comes from fine-tuning data being shiet.
You must be all geniuses over there. Thanks for the insight.
Anonymous No.105799152 [Report]
>>105799079
Try pretraining data as well
Anonymous No.105799210 [Report]
>follow AI influencers on twitter
>suddenly everyone talks about some scamming jeet working 10 us jobs in India
>timeline begins to fill with more and more jeets
>now it's all jeets talking about jeet things in jeetland in english for some fucking reason
Really encapsulates the state of US tech sector
Anonymous No.105799211 [Report]
>>105799079
Significant improvements are possible by making every document in the pretraining phase matter instead of just throwing stuff at it semi-randomly. Those 10~30B+ token instruct "finetunes" wouldn't be necessary if the base models could work decently on their own.
Anonymous No.105799222 [Report]
>>105798855
>Stuff like 300k tokens+.
/hard disagree/.
One of my tests was feeding a whole ass japanese light novel in its original language and having it summarize the key points through this prompt :
>Write a detailed summary explaining the events in a chronological manner, focusing on the moral lessons that can be understood from the book. Try to understand the moral quandaries from the point of view of the people from that civilization. The summary must be written in English.
This is what I got from 2.5 Pro :
https://rentry.co/uunaas4f
It's mostly accurate. Some terms aren't well transliterated, which is to be expected, but the chronology of events and the underlying message are well preserved. Mind you, it's a novel I've read multiple times; that's why I'm using it in a summarization test (you can't judge the quality of a summary of something you couldn't summarize yourself).
Flash produced garbage and I didn't bother saving its answer, but I could run it again if you're curious to compare with that prompt + data.
Pic related is the amount of tokens seen in aistudio for this prompt+answer.
>>105799065
I don't use rag software, so I dunno about that. Is the technology perfect? No, but frankly, the fact that it manages to not forget the original prompt and write in English after seeing hundreds of thousands of Japanese tokens has me fucking beyond impressed.
Anonymous No.105799225 [Report]
Dayum. Elon upgraded grok3 today. Trannies mad that it answers "2 genders" now.
Can't believe it answered with loli. I didn't give any extra instructions.

Why is closed moving in the exact opposite direction to local?
I wrote it before but I can write erotic stories on gpt 4.1 now. Female targeted slop, but still.
Also we're never gonna see grok2, are we. "Stable Grok3 first", very funny.
Anonymous No.105799266 [Report] >>105799283 >>105799683 >>105799857
>>105799095
Poland. I started as a Software Engineer a year ago.
>>105799126
I think, unlike Claude and OpenAI, we didn't have nearly enough quality annotated data, as Meta only started seriously gathering it after LLaMA 3. I'm not sure why they waited so long, but yeah. That's the reason why Zuckerberg bought Scale AI.
Anonymous No.105799283 [Report] >>105800829
>>105799266
>That's the reason why Zuckerberg bought Scale AI.
How bad is the scaleai data?
Anonymous No.105799309 [Report]
>>105798345
Holy based. That's exactly what I believe
Anonymous No.105799323 [Report] >>105799340
>Use google gemini
>Add a spicy word
>Push send
>No no you said the word peepee
>Money stolen
Anonymous No.105799340 [Report] >>105799347
>>105799323
>Use deepseek
>Add spicy word
>Push send
>She bites her lip tasting copper 10 times
>Money still with me because local
Anonymous No.105799347 [Report] >>105799370
>>105799340
>she bites her lip
>stop gen
>edit
>continue gen
Anonymous No.105799363 [Report]
do I really gotta download the paddle xi rootkit to run ernie VL?
Anonymous No.105799370 [Report] >>105799376
>>105799347
>:a
>she bites her lip
>stop gen
>edit
>continue gen
>goto a
Anonymous No.105799376 [Report] >>105799458
>>105799370
R1 0528 doesn't have this problem.
Anonymous No.105799393 [Report] >>105799407 >>105799414 >>105800563
>one of the most fascinating piece of tech in recent history
>everyone just wants it to write text porn
>I don't even remember ever hearing about men reading erotic literature before this, but there weren't that many troons in the early internet
>now there is an epidemic of men who get off text porn?
Anonymous No.105799407 [Report]
>>105799393
>I don't even remember ever hearing about men reading erotic literature before this
Yeah, surely this is a new thing, it's not like western porn visual novels are best sellers on Steam or anything, surely.
Anonymous No.105799414 [Report] >>105799477 >>105799593
>>105799393
>I don't even remember ever hearing about men reading erotic literature before this
literally every vn to exist
Anonymous No.105799419 [Report] >>105799427 >>105799445 >>105799500 >>105799509
I've upgraded my total VRAM to 48 GBs.
What models could I reasonably run at Q4 quants?
Anonymous No.105799427 [Report]
>>105799419
Mistral 8x7B
Anonymous No.105799445 [Report]
>>105799419
Anything over 24GB is useless until you reach 200GB, because then you can run unsloth deepseek quants entirely in VRAM. Every other model worth running fits in a 3090.
Anonymous No.105799458 [Report]
>>105799376
UD IQ1_M quant of it is larger than IQ2_XXS of the previous one
Anonymous No.105799477 [Report] >>105799504 >>105799507 >>105799526
>>105799414
They have pictures
Anonymous No.105799479 [Report] >>105799501
>>105791168
>run on a single 3090
It will be a giant model with good/great benchmark scores. This way, no one will run it locally, the paid API will be expensive, and he can still say "we released an open source model with very strong capabilities" without undermining his business.
People who think they will release a great, tiny and usable model are totally delusional.
Anonymous No.105799500 [Report]
>>105799419
if you have a lot of normal system ram in addition to the gpu, you could get away with -ot exps=CPU for deepseek (lower quant like q2) or qwen3 with decent speed
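For reference, a minimal sketch of what that looks like as a llama.cpp command line (the model filename and context size are placeholders, not recommendations):

```shell
# Hypothetical llama-server invocation: -ngl 99 sends every layer to the GPU
# first, then -ot exps=CPU overrides the MoE expert tensors back onto system
# RAM, so VRAM only has to hold the attention/shared weights and KV cache.
CMD='llama-server -m DeepSeek-V3-0324-Q2_K_XL.gguf -ngl 99 -ot exps=CPU -c 8192'
echo "$CMD"
```

The point of the `-ngl 99` / `-ot exps=CPU` combination is that in a MoE model the expert tensors are most of the weights but only a fraction of them are touched per token, so parking them in system RAM costs much less speed than offloading whole layers.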
Anonymous No.105799501 [Report]
>>105799479
That would be the best outcome. A small model, slightly better than Nemo but censored to hell is likely what we'll get instead
Anonymous No.105799504 [Report] >>105799525
>>105799477
Fucking retard. It's not the pictures that make people enjoy them. It's immersion. Text, music, visuals, sometimes voice. Gotta use your mind to fill in the blanks. Same shit.
Anon over here lecturing the nerds about VNs while not even reading them or being into them. kek Crrrazy
Anonymous No.105799507 [Report] >>105799525
>>105799477
Like 1 picture for every 5 pages' worth of text, sure
Anonymous No.105799509 [Report]
>>105799419
For smut i'd try this with partial offloading https://huggingface.co/bartowski/Behemoth-123B-v1-GGUF
Anonymous No.105799525 [Report] >>105799547
>>105799504
>>105799507
Only women can do it without pictures at all
Anonymous No.105799526 [Report]
>>105799477
Clinically retarded.
Anonymous No.105799542 [Report] >>105799554 >>105799556 >>105799604
It would be funny to see how the same faggots would praise the first good LLM with native image generation
Anonymous No.105799547 [Report]
>>105799525
Well then paint my nails and call me sally. Or just let the model use stable diffusion every few minutes to draw the scene and you've got a VN
Anonymous No.105799554 [Report]
>>105799542
There will never be a good local LLM with image generation.
Anonymous No.105799556 [Report]
>>105799542
I use both
Anonymous No.105799593 [Report]
>>105799414
>literally every vn to exist
why do you think there is a skip button
men don't have time for this shit
Anonymous No.105799602 [Report] >>105799618 >>105799820
most popular hentai game (vn, rpg or whatever) request on f95zone: gallery save plz
>men really want to read the text I say
Anonymous No.105799604 [Report] >>105799655
>>105799542
>can only generate heckin wholesome dogs in hats and astronauts riding horses in space
Anonymous No.105799618 [Report]
>>105799602
Cuck95 are thirdworlders like jeets and so on.
Check Steam playtime numbers on the reviews, you dum dum. Just admit you are wrong, Ferret Software wannabe.
Anonymous No.105799655 [Report] >>105799674
>>105799604
>train lora
works on my machine
Anonymous No.105799674 [Report]
>>105799655
Tell it to flux users
Anonymous No.105799683 [Report]
>>105799266
>I think, unlike Claude and OpenAI, we didn't have nearly enough quality annotated data, as Meta only started seriously gathering it after LLaMA 3. I'm not sure why they waited so long, but yeah. That's the reason why Zuckerberg bought Scale AI.
As well as data gathering, are you all also immune to sarcasm?
Anonymous No.105799695 [Report] >>105799702
All your local models will be irrelevant in two weeks
Anonymous No.105799702 [Report] >>105799746
>>105799695
Steve is releasing in two weeks.
Anonymous No.105799746 [Report]
>>105799702
Tell Steve it's not healthy to hold it in for so long.
Anonymous No.105799820 [Report]
>>105799602
https://incontinentcell.itch.io/factorial-omega
Anonymous No.105799857 [Report]
>>105799266
dubs cheque
What are the odds Meta is going to reduce filtering for the pretraining and reduce censorship in the finetuning, post-training, etc.?
Anonymous No.105799875 [Report] >>105799972
I'm coping with Qwen3-235B-A22B-UD-Q3_K_XL. Is that the best for 2x3090s + 64GB RAM?
Anonymous No.105799948 [Report] >>105799963
I'm currently using a Q4 quant of Gemma3 for RP. If I swap to a Q8 quant of the same model, how much of an improvement can I expect to see?
Anonymous No.105799963 [Report] >>105800060
>>105799948
How much does it cost you to test it yourself?
Anonymous No.105799964 [Report]
>>105798345
The future belongs to AI companies that sell agents that can't perform any specific task and eat tokens. Companies have to buy them because they have to tell investors AI has made them 500x more efficient. They're going to be rolling in cash with the agent meme.
Anonymous No.105799972 [Report]
>>105799875
yeah or one of the largestrals. Get 128GB to run DS V3 0324 IQ1_S_R4 for higher quality cope
Anonymous No.105800060 [Report]
>>105799963
More than I'd like. Maybe if I bought a lot of RAM, I could try. After all, I'm testing for quality, not speed. Still, I'd like to actually use it, so I'd definitely need a GPU upgrade.
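A cheaper way to compare quants without a GPU upgrade is to run the same text through each quant with llama.cpp's llama-perplexity tool (quality, not speed, so CPU-only is fine) and compare scores. The metric itself is just exp of the mean negative log-likelihood; a toy sketch of the math (the per-token logprobs here are made up for illustration):

```python
import math

def perplexity(token_logprobs):
    # Perplexity = exp(mean negative log-likelihood) over the tokens.
    # Lower is better; compare the same text across the Q4 and Q8 runs.
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Made-up per-token logprobs for the same text under two quants; the Q8
# run being slightly more confident yields a lower perplexity.
q4_ppl = perplexity([-2.1, -0.7, -1.5, -0.9])
q8_ppl = perplexity([-2.0, -0.6, -1.4, -0.9])
print(q8_ppl < q4_ppl)  # True
```

If the perplexity gap between Q4 and Q8 on your own text is tiny, the RP difference will probably be tiny too.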
Anonymous No.105800387 [Report]
>>105795473
Cypher is the name of one of ScaleAI's datasets.
>t. I worked on it
Anonymous No.105800526 [Report]
>>105800515
>>105800515
>>105800515
Anonymous No.105800563 [Report]
>>105799393
>but there wasn't that many troons in the early internet

In presence or in numbers? Numbers, I don't know, but if you mean presence, you're outing yourself. Troons existed, but it wasn't the mentally ill shit it is now. It was "imagine a woman who is not a faggot, and lmao, look, that dude is larping as one XDDDD, we'll have that soon enough, give it some time". It was the equivalent of a cosplay event; it was fun and everyone played along because everyone knew what was meant and agreed. Now it's just "castrate yourself, goy. Oh, you won't? You are an incel mgtow woman-hating bla bla bla, go mutilate yourself, goy". Before, it was a cargo cult for a better future; now it's a demoralisation campaign towards satanism and demon (w*men) worship, just like how blackpill used to refer to how negative the world in general is but got co-opted by demon-seeking faggots into their cuck bullshit.

Also, in regard to troonism, there didn't used to be surgery and any of that; it was just feminine-looking dudes crossdressing.
Anonymous No.105800794 [Report]
>>105798806
Use it on AI Studio; the Gemini models will refuse otherwise. However, they greatly reduce the capabilities of their models there, so don't expect a great summary.
Anonymous No.105800829 [Report]
>>105799283
I'm a data reviewer for them. The data can be quite good or quite bad. I worked on a "safety project": About 95% of the data needed heavy work to make it good. The more instructions you give to "attempters", the less likely you'll get a good dataset.
Anonymous No.105801575 [Report]
>>105795473
It's an Amazon model and it's utter coal.