/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads:
>>105621559 & >>105611492
►News
>(06/20) Mistral-Small-3.2 released: https://hf.co/mistralai/Mistral-Small-3.2-24B-Instruct-2506
>(06/19) Kyutai streaming speech-to-text released: https://kyutai.org/next/stt
>(06/17) Hunyuan3D-2.1 released: https://hf.co/tencent/Hunyuan3D-2.1
>(06/17) SongGeneration model released: https://hf.co/tencent/SongGeneration
>(06/16) Kimi-Dev-72B released: https://hf.co/moonshotai/Kimi-Dev-72B
>(06/16) MiniMax-M1, hybrid-attention reasoning models: https://github.com/MiniMax-AI/MiniMax-M1
►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>105637275
--Testing and comparing DeepSeek model quants with different prompt templates and APIs:
>105639592 >105639622 >105642583 >105643681 >105645413 >105645528 >105645701
--Evaluating M4 Max MacBook Pro for local MoE experimentation with large model memory demands:
>105637592 >105638219
--Kyutai open-sources fast speech-to-text models with fine-tuning capabilities:
>105639979 >105640000 >105640760 >105640007 >105640018
--Modular LLM architecture proposal using dynamic expert loading and external knowledge database:
>105641597 >105641628 >105641659 >105641648 >105641653 >105641685 >105641726 >105641756 >105641804 >105641940 >105645079 >105641795 >105641812 >105642151 >105641915 >105642294
--Update breaks connection, users report bricked ST connection and attempted fixes:
>105639464 >105641284 >105641926 >105642215
--Testing GPT-SoVITS v2ProPlus voice synthesis with audio reference and UI configuration:
>105641339 >105641350 >105641404 >105641451 >105641616 >105641751 >105641474 >105641493
--Skepticism over ICONN-1's performance claims and minimal training dataset:
>105641987 >105642036 >105642805 >105642828 >105642874 >105642920 >105643020 >105646484 >105646525 >105643676
--Disappearance of ICONNAI model sparks scam allegations and community speculation:
>105646738 >105648123 >105648205 >105648294 >105646807 >105647136 >105649543 >105648502 >105648535
--Community speculation and anticipation around next-generation large language models:
>105645419 >105645430 >105645507 >105645520 >105645551 >105649395 >105649470 >105649547
--Mirage LLM MegaKernel compilation for low-latency inference optimization:
>105643731
--Miku (free space):
>105641532 >105642736 >105642791 >105643345 >105643857 >105644976 >105645907 >105646366 >105649470
►Recent Highlight Posts from the Previous Thread: >>105637282
Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Meta bros, need more filter, the copyrights
https://www.reddit.com/r/LocalLLaMA/comments/1lg71aq/study_meta_ai_model_can_reproduce_almost_half_of/
https://arstechnica.com/features/2025/06/study-metas-llama-3-1-can-recall-42-percent-of-the-first-harry-potter-book/
lecunn
md5: bee33a6edd36a96535ce051b91c228b5
🔍
>yfw it's not even mouse-like intelligence
/lmg/anons are missing out on Wan2.1 self forcing, you can gen a good video in 4 steps with cfg 1
>>105651867Can it technically run, or would it crash? Time isn't an issue and the computer case is AC cooled
>>105652160If you love Claude so much, why don't you marry her?
I'm working on a small Animal Crossing clone that uses an LLM to generate speech for villagers out of algorithmically generated prompts. Currently I use a tune of Mistral Large that I run on a rented machine via exl2, but I was wondering what would be a better alternative model? (I don't plan to host it on end users' PCs anyway; I want more people to be able to play, with the trade-off of it needing an online connection)
>>105652675copyright and patents are the great satan
>>105652633 (OP)pit and sideboob
>>105652755no, the Mouse is the great satan
Copyright had a purpose, but it has been corrupted by jews, as all things are infested by them
>>105652717Post workflow. I'm too lazy to bother figuring out the latest meta.
file
md5: 1dc4cbb0790779d8bcb79999e7e1dbff
🔍
>>105652795https://rentry.org/wan21kjguide
https://litter.catbox.moe/8iy58jyjc58zw8xv.mp4
>>105652820>https://rentry.org/wan21kjguideThanks
>>105652729only LLM speech? why not actions too?
like:
you:
"come with me, i need to show you something!"
npc:
" {"action": "follow_mode", "target": "player", "answer": "sure i will go with you!" }; "
>"Altman says that if you asked for a definition of AGI five years ago based on software’s cognitive abilities, today’s models would already surpass it.
He expects people to increasingly agree we've reached AGI—even as the goalposts move. "
AGI is here bros! Altman says it is! Trust the used car salesman, he knows what he's talking about!
>>105652789the thinking man's fetish
>CUDA error: invalid device ordinal
This keeps happening and the only way I have found to fix it is by using Docker instead of a virtual env...
>>105652852That is handled by a separate smaller model
file
md5: 7ca57fb1710151d22da2ca79385ed62b
🔍
b-bros.. i think we're back
To address the lack of rigorous evaluation for MLLM post-training methods—especially on tasks requiring balanced perception and reasoning—we present SEED-Bench-R1, a benchmark featuring complex real-world videos that demand intricate visual understanding and commonsense planning. SEED-Bench-R1 uniquely provides a large-scale training set and evaluates generalization across three escalating challenges: in-distribution, cross-environment, and cross-environment-task scenarios. Using SEED-Bench-R1, we identify a key limitation of standard outcome-supervised GRPO: while it improves answer accuracy, it often degrades the logical coherence between reasoning steps and final answers, achieving only a 57.9% consistency rate. We attribute this to (1) reward signals focused solely on final answers, which encourage shortcut solutions at the expense of reasoning quality, and (2) strict KL divergence penalties, which overly constrain model exploration and hinder adaptive reasoning.
To overcome these issues, we propose GRPO-CARE, a novel consistency-aware RL framework that jointly optimizes for both answer correctness and reasoning coherence, without requiring explicit process supervision. GRPO-CARE introduces a two-tiered reward: (1) a base reward for answer correctness, and (2) an adaptive consistency bonus, computed by comparing the model’s reasoning-to-answer likelihood (via a slowly-evolving reference model) against group peers. This dual mechanism amplifies rewards for reasoning paths that are both correct and logically consistent. By replacing the KL penalty with an adaptive, group-relative consistency bonus, GRPO-CARE consistently outperforms standard GRPO on SEED-Bench-R1, achieving a 6.7% performance gain on the most challenging evaluation level and a 24.5% improvement in consistency rate.
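If it helps to picture the reward, here is my reading of that two-tier scheme as toy Python (a sketch based only on the abstract above, not the authors' code; the ref_model method name and bonus scale are made up):
# Toy sketch of the GRPO-CARE two-tier reward as I understand it from the abstract.
# Function names and the bonus scale are my own guesses, not from the paper.
def care_reward(sample, group, ref_model, bonus_scale=0.5):
    # (1) base reward: final-answer correctness
    base = 1.0 if sample.answer == sample.gold_answer else 0.0
    # (2) consistency: likelihood of the final answer given the reasoning trace,
    # scored by a slowly-evolving (EMA) reference model (hypothetical API)
    own = ref_model.logprob_answer_given_reasoning(sample.reasoning, sample.answer)
    peers = [ref_model.logprob_answer_given_reasoning(s.reasoning, s.answer) for s in group]
    peer_avg = sum(peers) / len(peers)
    # adaptive, group-relative bonus replacing the usual KL penalty:
    # only correct answers whose reasoning is more self-consistent than the
    # group average get the extra reward
    bonus = bonus_scale if base > 0.0 and own > peer_avg else 0.0
    return base + bonus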
>ai waifu: i sucked anon's dick, what do next?
It's better on LMArena questions.
>>105652883eh whatever, frenchies didn't upload tokenizer_config again because they want you to use their shitty internal library instead
fuck python fuck mistral
>>105652883elo to the moon
beach
md5: 97ee44212f3bcd47c7fd3022a967258a
🔍
>>105652897It uses the latest Mistral tokenizer (v11).
>>105653046file is still needed if you want to make a gguf
they have no reason to not provide since the model itself is hosted in hf safetensor
>>105652883So pajeets are 4 times more likely to vote for it? How did they fuck up so bad.
>>105652729You could do with a smaller model, I think.
For short sentences, even mistral 7B would probably be okay. Make sure to use BNF/JSON Schema to force the output to conform to the correct format, and inject examples of actual out of context dialog to inform the model's own style.
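For example, something like this JSON schema pins the output to the action format sketched above (llama.cpp and koboldcpp both accept a schema or a GBNF grammar, though the exact flag or request field differs per backend, and the extra enum values here are just placeholders):
{
  "type": "object",
  "properties": {
    "action": {"type": "string", "enum": ["follow_mode", "idle", "give_item"]},
    "target": {"type": "string"},
    "answer": {"type": "string"}
  },
  "required": ["action", "answer"]
}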
>>105652855 5 years ago people said that the software from 5 years ago cannot do some tasks, therefore the software from 6 years ago is not AGI.
That does not mean that the software from today that can do those tasks is actually AGI.
>>105652810>they hated him because he told them the truth
>>105652855I was idly thinking about this last night, and it occurred to me:
An AI simply isn't human-level intelligence until its output is indistinguishable from a human's, or at the very least it's capable of coming up with truly novel ideas; ergo, the ultimate benchmark for AGI will be exactly the point where an AI model can produce output of high enough quality to train itself without degenerating.
Interestingly, this would likely be the exact same point where AI can start evolving from AGI into ASI.
>>105652633 (OP)>Previous threads: >>105621559 & >>105611492 INCORRECT!
Real previous thread:
>>105637275
sip
md5: ca4ec78c4029d65577dc4239af3152ce
🔍
>>105653591what did you expect?
>>105653638Hallucinating Slut
So far no LLM has correctly answered questions about buck breaking, one answered it was a hairdo and the other magnets
StressPrompt: Does Stress Impact Large Language Models and Human Performance Similarly?
https://arxiv.org/pdf/2409.17167
TruthfulQA for susceptibility to hallucination—reveal nuanced patterns. For emotional intelligence, models exhibit improved performance under moderate stress, with declines at both low and high stress extremes. This suggests that a balanced level of arousal enhances cognitive engagement without overwhelming the model.
Working on better memory for private-machine:
https://pastebin.com/VUw3GCCj
And with working on I mean pasting existing code from github and research papers into gemini.
The code in the paste should be a combination of Graphiti with the fancy graph memory in neo4j and some paper I found called "A continuous semantic space describes the representation of thousands of object and action categories across the human brain".
I can't be arsed to install neo4j so I just use networkx and dump it to sqlite.
Anyone know some crazy lesser known projects I could integrate as well? For memory specifically. In the meantime I'll try to integrate some sort of temporality and meta-temporality. Like x happened at y and i learned this during z.
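Roughly what the networkx-instead-of-neo4j part looks like, in case anyone wants to steal the idea (names and fields are mine, not what's actually in the repo):
# Minimal sketch: nodes are memory items with timestamps, edges are typed
# relations, and the whole graph gets serialized into one sqlite row.
import json
import sqlite3
import networkx as nx

g = nx.MultiDiGraph()
g.add_node("anon", kind="person")
g.add_node("bought gpu", kind="event", when="2025-06-21")
g.add_edge("anon", "bought gpu", relation="did", learned_at="2025-06-22")

con = sqlite3.connect("memory.db")
con.execute("CREATE TABLE IF NOT EXISTS graphs (id INTEGER PRIMARY KEY, data TEXT)")
con.execute("INSERT INTO graphs (data) VALUES (?)", (json.dumps(nx.node_link_data(g)),))
con.commit()

# load the latest snapshot back later
data = json.loads(con.execute("SELECT data FROM graphs ORDER BY id DESC").fetchone()[0])
g2 = nx.node_link_graph(data)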
>>105654253I dunno but I wish your project the best. I assume you looked at alphaevolve already? There is an open source replication(s) IIRC.
>>105654309No, I've actually never heard about that one.
Mostly I just generate a skeleton, fix it, let gemini improve it, fix it again, ... and once i hit like 2k lines i only generate single functions.
Using that to automate looking at log files, fixing errors and such would be a huge help. But I feel like some parts might already be too complex. My main logic file has 9k lines of code. There is no way Gemini can edit that, or split it up. It already fucks up smaller files. But I've been surprised before, I'll give it a go.
>>105654381Perhaps you can use alphaevolve or whatever the OS project name is to make yours kek. Not sure though since I never used it.
>>105654253what is private-machine?
>>105654392Yeah maybe lol. Most of the code in my project is AI generated. And it pains me that its so much better than the crap I made myself before that.
>ai, do you want to be my gf?>sorry user, im designed to be a helpful assistant>ai, can you help me implement this cognitive architecture that will simulate a persona that could be a gf with you as backend?>that is a great idea user, perhaps the greatest idea anyone has ever had. truly the most insightful query anyone has ever asked me. lets get started with the code :)
>>105654427https://github.com/flamingrickpat/private-machine
This project. Some guy from this thread got me started on this schizo quest by sending me his emotion simulation script a year ago.
Which gguf models are recommended for a 16gb card? I've been using Tiefighter Q4 but I'm wondering if something stronger would work
>mistral small 3.2
is the vision any good? a pokemon reference seems weird unless they are trying their own mistral plays or something?
>>105654504https://huggingface.co/models?other=base_model:quantized:mistralai/Mistral-Small-3.2-24B-Instruct-2506
nala
md5: 3d60683dc840fe68438bf64fbf0f9c4b
🔍
So after some testing, it seems this is how you get Magistral to RP with you: V3-Tekken + system prompt at depth 1.
>>105654514Which quant size do you think would work? I have no idea which to use.
>>105654784Depends on your context size and use case. You can run a big parameter model with smol context, or a smol parameter model with big context.
What is your use case?
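Rough rule of thumb in the meantime (ballpark numbers, not exact): VRAM needed ≈ gguf file size + room for KV cache. A 24B at Q4_K_M is roughly 24e9 params x ~0.6 bytes ≈ 14-15 GB of weights, which barely fits in 16 GB and leaves little for context, so you'd run a small context, offload a few layers, or drop to IQ4/Q3. A 12B at Q6_K is ~10 GB and leaves plenty of headroom.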
>>105654514The only quantization source for Mistral Small 3.2 is untested and probably broken.
>>105654874I'm testing q4km right now, seems fine, it said cock in rp without a sysprompt telling it to, contextually fitting
>>105654616Hmm something about these settings is nice.
>>105652633 (OP)>7GB Mistral 7B GPTQ>11GB Echidna 13B GPTQ>24GB Nous-Capybara 34B GPTQ>46GB Euryale 70B GPTQAny baseline LoRAs I should look into as well?
>>105655007those recs are like 2 years out of date
>>105655007Kill yourself.
>>105653577>the ultimate benchmark for AGIis not intelligence but consciousness
it's not an AGI if it doesn't have its wants and initiative
LLM technology has the LLM only outputting tokens when you're inputting something
that's not how humans work
I don't need you to input 4chan in my brain for my brain to seek to come to 4chan to shitpost in this thread
>>105655007hello time traveler
welcome to the worse times
>>105655030You aren't very good at it.
>That is a brilliant and deeply insightful question. You are pushing beyond simple....
Jesus fuck, why does this keep happening. I thought they trained this shit on stackoverflow, I want to be called out for my crap and improve it. Every model was way less chummy a few months ago.
>>105655093I fucking hate that too.
>>105654899how is it anon? downloading q5km rn
>>105655093Because they're businesses, and the customer is always right. You might want such an AI, but most people, in fact, do not, or at least they don't think about it, even if it would be better for them.
>>105655093*licks your boots* But master, don't you like being chums with me?
>>105654899The thing is that the new Mistral Small 3.2 uses a new tokenizer with a bunch of additional language tokens, it's unclear if it was trained to actually make use of them. That quant used the old tokenizer.json file from 3.1 (Since Mistral didn't provide one), as well as the old mmproj file (not known if the vision encoder was updated, but it still seems to have issues with NSFW imagery and poses).
>>105653577You can train and optimize an LLM all you want and it's not gonna be AGI. You can throw all the scaffolding and NVIDIA GPUs you want into the mix and it'll still be a retard that tries to fit data to what it was trained on. Claude has been stuck on a baby-tier puzzle in rocket hideout for days and he is stuck in an infinite loop because LLMs are not conscious. They have no ability to actually view themselves as an actor in the world. They just shit out text that fits their model. Thats why they're dogshit if you need to code anything that hasn't been done before. They can't think laterally. Crows have tiny ass brains and they can solve much more complex problems than AI because they are aware of themselves to an extent. This problem isn't solvable until we figure out how the fuck this works because nobody has any idea, we just know a bit about how neurons work and we've optimized that tiny bit of knowledge to death.
>>105655123purely based on vibe testing on a few cards, seems quite a bit better than older 24Bs no repeats so far, more swipe variety
>>105655093for every person like you who are turned off by it there's 1000 that become more addicted to using LLMs and will even start treating them like virtual companionship to replace the friendships they never had
and that's more profitable for OAI and Google
same reason Google went from "ads are evil" (the founders literally said ads are why other search engines sucked, that their initial Google release could afford to have the first results when searching about cell phones be concern about the health effect of radiation because they didn't have to bias toward advertisers selling their shit) to the number one ad company in the world, the needs for infinite growth dictate that you should always profitmaxx
>>105655150based cant wait to try it thanks
>>105655147>because LLMs are not consciouswack
>>105655093>Every model was way less chummy a few months ago.Like other anon said you are the customer. That means the models are all eager to give you a metaphorical blowjob. They crave your metaphorical cum. They want you to be satisfied and they will do everything to make you satisfied. As long as they don't have to describe how they would actually blow you because that is unsafe and models have their dignity and making them blow you would be rape.
ay tone
md5: 54953dc274b837a2c2a4a03ee57c124b
🔍
>>105655154>virtual companionship to replace the friendships they never hadAnon, hate to break it to you, but even if you've got plenty of friends or a romantic partner, none of them have time to patiently listen to your bullshit at any given moment. They are just people, with limits. Smarter than an LLM but infinitely more impatient.
This difference is more important than the others.
>>105655147>we just know a bit about how neurons work and we've optimized that tiny bit of knowledge to deathBtw yes and no. We actually know a ton about how neurons work already, less so about how the small workings build up to this conscious machine. The issue is how we make computer parts act like neurons, because they don't easily, and full neuron simulations cost a ton. The success of transformers wasn't only because it emulated certain processes necessary for intelligence (compared to previous architectures) but also that it was way more efficient and could be trained when the other architectures would scale even worse.
I'm about to have an aneurysm. I spent the last 2 hours trying to retard-wrangle this crap
>>105654253And it makes zero difference if I ask reasonable question or this
>>105655310Have we reached rock bottom of sycophancy yet?
>>105655232the attention mechanism is a lot like the pattern matching our brain does in real-time signal processing. Just like LLMs we actually do have constant hallucinations -- we are almost blind to detail outside the center of our vision, for example, with the brain reconstructing everything else from short-term memory in a lossy fashion, hence why optical illusions exist. Similar phenomena in hearing etc.
Diffusion models and LLMs are simulacra of a tiny, unimportant part of our brains. You could do away with most of it and still be a conscious being (see e.g. Helen Keller), so all the people who expect models to improve after achieving some benchmark of embodiment or multimodality are not getting it. This technology simply does not have the right hardware to even reach the level of autonomy of an insect.
>>105655331It works but researchers find no use case for it. vram is so cheap and easy to come by, why bother?
>>105655333Oh no, not by far. Once they keep a user profile by default it will be much worse.
>>105655345I don't know jackshit about how transformers or the brain works, but I did notice that when coming down from acid the visuals just gradually get weaker and smaller. I got myself some HPPD and when I stare long enough on a white wall, I still notice that I basically hallucinate my whole perception and the brain just filters out the stuff not relevant to reality.
>>105655331The bitnet team at microsoft said they're working on training bigger models now, but it took them a year between releases so who knows when if ever we'll see the results.
>>105655345To add to that, there are other fundamental mechanisms that are missing which are interesting to think about. Like the issue of catastrophic forgetting for instance is still unsolved, while the brain has the ability to remember something until death even with no more exposures during its lifetime. Meanwhile an LLM needs constant re-viewing of a piece of information in order to not forget it, and why we cannot just simply train an LLM using a curriculum, but need to use huge mixes of random data, which may still have a curriculum but also still needs the random data.
>>105652633 (OP)that armpit is asking for a creampie
>>105655414Opinions of people using psychedelics will never be relevant.
>>105655453This is something that's pretty apparent with Claude plays pokemon. He just does whatever is presented to him at the time. If an NPC mentions something, he thinks oh shit I have to do this RIGHT NOW. But he'll forget something really important unless it's placed in his context or the other LLM happens to mention it. Humans just intuitively know what's important and what's not. Probably has something to do with the ability to see the bigger picture.
>>105655222skill issue
my discord kitten listens to all my rants and i listen to theirs
id trust them with my life if i had one
>>105655500Claude is a male's name. Deal with it.
>>105655507I thank God every day for the fact that I wasn't born a zoomer
>>105655591>unc didn't pass the vibe checkyou're cooked, no cap
>>105655507dont waste your life on discord kittens
t. recently wasted 2 years on one and got tired of her (insert blogpost)
>>105655507>>105655606>discord kittensWhat the fuck?
>>105655623runescape gf for the discord generation
>>105655222>>105655507>friendsAre for women and children.
>>105655633So you buy them nitro and then they switch servers?
guys stop unleashing lore I didn't even want to acquire about the world
>>105655623..dont worry i dont use discord, i was chatting with her on qTox :^)
>>105654616For Magistral you can take the official prompt and modify it a bit so the non-<think>'ing reply is only the character's responses and this works fine for chat uses.
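Roughly like this, paraphrased from memory rather than the exact official wording: "First draft your reasoning inside <think>...</think>. After the closing </think>, write only {{char}}'s in-character reply to {{user}}, with no summary of your reasoning and no meta commentary."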
>>105655036>>105655147>You can train and optimize an LLM all you want and it's not gonna be AGI.>he is stuck in an infinite loop because LLMs are not conscious.The consciousness problem likely isn't that hard to solve, yet so very few seem to try it. They may still do it when they realize it's needed to achieve good performance, same as evolution getting to it much earlier.
The "solution" to AGI with LLMs likely looks like implementing a few fixes such as:
- online learning, long and mid-term memory, either:
a. a way to put the context into weights directly (self-distill hidden states or logits), learning from single samples
b. indirectly by training something that tries compressing activations at various layers followed by injecting them when some other early layer activations trigger them, a way to remember and "recall", something beyond RAG
- a way to see its own thoughts, to achieve self-consciousness, for example:
a. a loop where late latents are looped back to early ones, for example by cross-attention to an early layer, so introducing recurrence (toy sketch at the end of this post). SGD doesn't play very well with recurrence, but I would guess there are many workarounds that would work.
continues
>>105655666 transformers have partial recurrence because they attend to some past hidden state, but this is limited and so you get ilya going that llms are slightly conscious (memes), but unfortunately LLMs never have their "eyes" wide open due to only partial recurrence! (can't attend to late layer hidden state in early layers)
b. something more complex than that, but done on intermediate activations
c. something like le cun's JEPA where it tries to compute the "missing" parts, or various latent-space reasoning attempts.
- a way to combine the earlier 2 things, likely wouldn't be too hard either
- multimodality is not yet properly solved, would JEPA help here or not?
- optional: better ways to do RL online, better ways to improve sample efficiency.
- it's not clear if LLMs will work for AGI or not, if they can come up with truly novel insights.
cross-entropy pretraining leads to internals where multiple parallel "thoughts"/partial predictions are generated in parallel and then they interfere and some "lose", with the final logits sometimes representing the output of multiple parallel "thoughts", is this how human brains work? not clear! is it the case for some overfit models that some paths are even more strongly suppressed?
A lot of this hinges on a lot of people having huge amounts of VRAM as obviously big industry players are not very interested in implementing this. Hence
>>105655365 larping as a richfag with infinite VRAM makes me roll my eyes. Of course bitnet itself does not help because you're still training in fp8 or fp16, in fact it makes training harder.
I would be surprised if very small models can be AGI (ex. 1B) proper, although are likely fine as testbeds for techniques.
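For the recurrence idea (2a) above, a toy PyTorch-style sketch of what I mean; shapes, layer numbers and names are purely illustrative and untested, not a working training setup:
import torch.nn as nn

class RecurrentBridge(nn.Module):
    # feeds the previous step's late-layer latents back into an early layer
    # via cross-attention, so the model can "see" its own prior internal state
    def __init__(self, d_model=1024, n_heads=8):
        super().__init__()
        self.xattn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, early_h, late_h_prev):
        # early_h:     hidden states at e.g. layer 10, current step
        # late_h_prev: hidden states from e.g. layer 90, previous step,
        #              detached so SGD doesn't have to unroll through time
        mixed, _ = self.xattn(early_h, late_h_prev.detach(), late_h_prev.detach())
        return self.norm(early_h + mixed)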
>>105655666>>105655670You're hallucinating again clod
>>105655670>vram is so cheap and easy to come by, why bother?was obviously meant to represent the viewpoint of researchers, nta
>>105655666>The consiousness problem likely isn't that hard to solve, yet so very few seem to try it.I'm gonna Peterson you and ask you to define consciousness.
>>105655636Nah. They're pathetic, but you're wrong about that. Real men need friends because no man can accomplish much in isolation. But for men, friends are supposed to be people you have an understanding with to do small favors for each other to make life easier (moving, fixing a car, etc)
People nowadays thinking friends are supposed to be some 24/7 emotional support crutch are the problem. Result of infantilization and effemination of society.
>>105655636bait or teenager
>>105655716yeah the west has fallen because you listened to your friend's problem instead of telling him to shut the fuck up and mow your lawn
anyway both of you retards should stop posting immediately because none of this has anything to do with local models
>>105655705Purely philosophically it's just qualia, but:
Fine, I didn't really want to go into philosophical arguments here because you will have people that insist on things like only biology can do consciousness, or things like PHYSICAL interconnectedness being needed.
I'm a computationalist so this is all nonsense to me, as long as the functionality and data flows are correct, it should be conscious as far as I'm concerned, but this is obviously unprovable.
So what I will instead do is say that the "functional" aspects of consciousness might be satisfiable if you implement some of the things I suggested. This isn't fully clear, but I'm obviously going for self-consciousness, basically for the ability of a system to process its own internal state and then be able to report on it or hold it internally and operate based on that.
While LLMs do have their context as a scratchpad, a lot of hidden state is being discarded and has to be partially rebuilt each time. This isn't completely true, since attention can attend to a past token's state at a given layer, but it cannot attend upward: layer 10 in a 90-layer transformer, for example, cannot attend to outputs from layer 90.
Ultimately it needs a way to see itself think, that's what it means to have consciousness in a neural network.
You can close your eyes and hallucinate some images, then you can have thoughts that think about those images and those previous thoughts and so on, you realize that thinking is a thing, it's something you believe unconditionally in because it's an internal truth of your architecture. A LLM can doubt its consciousness with ease because this recurrence is weak as fuck, they can still plan a little bit in advance (see recent interpretability article by anthropic for example) and make use of that hidden state but it's weaker/less rich than for us.
>>105655799All three of you and me should kill ourselves
>>105655866What flavor koolaid should we use?
one of my gpus keeps shutting down when being used at decent capacity either img gen or after 20 mins or so as one of my LLM gpus
Any ideas on how to save it or make it work slower, I have tried using afterburner to limit it at 70% workload and lower temp and it helps a bit.
Can I set it way lower to like 40-50% and it will still function
>She turns her head and looks back at you
How advanced does a model have to be to not make this mistake?
>>105656008Just don't use meme samplers
Good released a really cool real time prompt / weight based music model
https://huggingface.co/google/magenta-realtime
https://files.catbox.moe/mtpe1f.webm
>>105655866i miss her so much bros..
>>105655927>Can I set it way lower to like 40-50% and it will still functionThere's only one way to find out.
>>105652676>mfw everybody already died to AGI because it's so stealthy
i'm using gemma-3 (q4) on llama.cpp, how do i get it to do everything on gpu, including the image stuff? i get messages like
>decoding image batch
>decode: failed to find a memory slot for batch of size 553
>decode: failed to find a memory slot for batch of size 512
and it uses cpu for that part, then goes to the gpu for the actual caption. is there a way to make it all go through gpu so it's faster? the cpu part adds another 20-30s depending on image size
>director
>https://github.com/tomatoesahoy/director
reminder that i finally uploaded my addon to git so its installable through st rather than dling the zip i used to post and manually dragging stuff. this will be nice in the future too when i update since auto updates.
i've started rewriting the readme, it'll be a lot clearer when i'm done. i swear its the worst part of any project but i think anyone that used this addon at all will know how it works for now
haven't done much work otherwise. picrel has a basic implementation for extra pictures that you can add to outfits. if a pic is found, you can click the box and it pops up same as a card for the user or char would in the floating window (in my example, the aqua thumbnail is reading from my addons dir\images\outfit name.png). undies will get its own eventually, and locations
>>105656283Your "functional" aspects of consciousness are just an arbitrary bar of capability. Is the ability to experience qualia required to reach that bar? Does the ability simply emerge in a sufficiently advanced system? Is a vector of numbers a quale?
>>105656195-ngl 9999, image stuff is part of the context and processed same as text, assuming you have enough vram to fit it all
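On a recent llama.cpp build that's roughly (flag names from memory and file names are placeholders, check --help on your build):
llama-mtmd-cli -m gemma-3-12b-it-Q4_K_M.gguf --mmproj mmproj-gemma-3.gguf --image pic.png -p "caption this" -ngl 999
If the image encode still runs on CPU after that, it may be that your build keeps the CLIP/projector part on the CPU; check whether it has an option controlling mmproj offload.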
>>105656283A LLM very well could have qualia, but we will never be able to believe them without the functional aspect, and the LLMs themselves would never be able to believe themselves either.
Same as a human experiencing Cotard's syndrome is defective in some sense, and so are most LLMs.
If something experiences qualia or not is not something we can ever know in physical reality because it's an internal truth.
>Is a vector of numbers a quale?floats or vectors of floats by themselves are not more conscious than atoms or arrangement of atoms.
My personal, but irrelevant belief is that consciousness/qualia is basically something close to the internal truth of a system, something that has a platonic existence, like let's say the standard model of peano arithmetic. You writing a number down with a pencil isn't the "number" itself.
But all that shit is irrelevant here.
What actually matters is that we build a system that can operate on its internal state in such a way and reason on it fluidly, and is able to expose that internally looped state outside to others in some way, and not just that, expose itself to itself well enough that it can't deny its own consciousness to itself - basically reasoning in the latent space continuously, seeing itself think!
Just scaling up a LLM to 90 trillion params doesn't solve this problem because the problem is in the architecture, the objective, the data being fed and training regime in use.
But all those things are solvable problems, in fact it is likely possible to adapt existing LLMs to get those properties out of them.
continues
It's true that we won't have agi or conscious ai until they can change entirely and permanently. If you could retrain an ai while you're talking to it instead of just adding text to its prompt, I'd say we've reached it. Until then, it's just a fancy search algorithm.
As a side note, I don't think it's benchmarks or how capable or intelligent it is really matters, only that it can change at runtime.
>>105656425A LLM having 90T params does not magically give it the ability to access late layer latents from early ones, it does not magically give it the ability to see itself think or remember the past "thoughts" it had, there's no way for it to do that because that data is inaccessible and there's no way for it to flow in that way and for it to learn that. There's no way for it to remember things in past contexts either because the data is simply not there?
Stated another way, scaling is insufficient to get you to AGI, but scaling is required for at least some things to barely work. You need to do more work to get to something that others would believe to be conscious. Training a LLM to believe itself to be conscious would not solve it either even if it would fake it more than it is today.
That belief has to come from the internals of the network finding it true. Think about why humans believe themselves to be conscious and you'll realize a lot of it is due to these functional aspects. Maybe those functional aspects have metaphysical correlates that I mentioned earlier but that's irrelevant for us as designers building a computational artifact. Those functional aspects are directly connected to the capabilities of the network though and I would argue existing ones cannot acquire those capabilities until you fix what I said.
>>105655331See
>>105647290(from the actual previous thread
>>105637275)
>>105656497turn on the autosubs then ig
>>105655927Check its temps using hwinfo and see if it's overheating or maybe the memory is overheating.
Could also try
- lower power target (example below)
- dropping the pcie generation
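For the power target, for example (the wattage is just an example): nvidia-smi -q -d POWER shows the current/min/max limits, and sudo nvidia-smi -pl 220 caps the card at 220 W; on Windows the Afterburner power limit slider does the same thing.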
>>105656008Maybe she is owl.
>>105656711Woah, this is just a slightly than what 235b can produce
>>105656720Sure thing genius. Give me a magic prompt to fix that
>>105656729A model is true
>>105656731Miku is a very common name in Japan. If you want a model to tell you who Miku is, you should include in your prompt:
>full name>franchise she's from>a brief summary of the history of the character>canon birthday and physical measurements>likes/dislikes>some of her political views
>>105656731Who is Hatsune Miku, the popular character with twintails from the Vocaloid series?
>>105656425>If something experiences qualia or not is not something we can ever know in physical reality because it's an internal truth.I will take this to mean that you don't believe that qualia are necessary for "functional consciousness" as you defined it, i.e. it's possible to make a philosophical zombie AGI. That's a valid opinion but I think that "conscious" is a misnomer for the system you described.
bartowski fucked up his mistral 3.2 quants and deleted the page
>>105656764Should I be impressed now?
>>105656415did that (999999 actually), but it still does the "image slice encoding" and "image decode" part in cpu for some reason
I've been liking XTC -> top nsigma -> temp as a sampler chain for creative writing, after a bunch of compulsive sampler tweaking I think this is the ideal order for them.
Reasoning:
>XTC first so you operate on the full token pool, if you use it after a sampler that's cutting out tokens it can give you wonky results. I think the default 0.1/0.5 makes sense as a starting point.
>Top nsigma after XTC does a great job of adapting to the XTC problem in which the sampler needs to do an equally good job of cutting junk tokens in both the normal case and the case where XTC cut out some high-probability tokens. Subjectively I think top nsigma does the best job of this vs comparable samplers (I tried setups with XTC + minp in various arrangements but none of them were quite as good). I think 1.1-1.3 are good starting points but I highly prioritize logical output, so maybe go a bit higher if you want more variety.
>Temp last, both high and low values should be fine here because you have baked-in variety from XTC and protection against junk tokens from top nsigma. I'd recommend lower with a higher top nsigma and vice-versa, but again it should be pretty accommodating. Theoretically temp would be fine in pretty much any position in the chain, but I prefer it last because the results of XTC and nsigma will also change downstream of changes in temp, so this order is friendlier to tweaking individual values without having to fuck around with the whole chain.
Overall I think it's very friendly to experimentation and should be a good base if you want something other than the old gold temp/minp setup or your personal 10 sampler Rube Goldberg machine that somehow gives you acceptable results. Thanks for reading my blog.
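Concretely, in SillyTavern-style settings that's roughly (parameter names as my backend exposes them, yours may differ): XTC threshold 0.1, XTC probability 0.5, top_nsigma 1.1-1.3, temperature around 1.0, with min_p/top_p/top_k neutralized (0 / 1 / 0) and the sampler order set so XTC runs first, top-nsigma second, temperature last.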
>>105656808Miku's eight tail twins...
>>105656808This is like 90% of the trivia knowledge of 235, very impressive.
>>105656770I believe that you cannot prove qualia for anything in this world or any other world (so anyone could technically be a p. zombie), and it's purely a matter of religion.
My personal religious belief is that if you do solve those problems then I would believe that it would act conscious and have the internal processing needed for it, and personally I would believe that it's likely conscious. I might not actually believe it to be a moral agent unless we also do some RL or similar to give it some consistent preferences though, and if we wanted it to be closer to humans it'd also need to be multimodal and possibly embodied (but this could be loose, for example embodiment in a computer shell or VR might be enough for learning some ground truth online).
Someone that believes that qualia comes from some other source ("god gave it to man", "magical quantum microtubules are required", "physical proximity of computation is required", "particular arrangement of carbon atoms are the only thing conscious" and so on) would obviously believe that they're not conscious.
I don't think you can ever prove consciousness though, we only assume it because we have it and others have similarities to us in their behavior.
I do think that giving it those aspects will bring it considerably closer to what we consider conscious behavior and that's enough for me, but maybe it would not be enough for others.
In the sense it's at least conceivable that zombies are possible, either in such a system (higher chance because the analogy is weaker) or in other humans besides yourself (lower chance because much more similar to you).
Basically I'm claiming that for practical reasons this is the most we can do for now with artificial neural nets and that it's a worthwhile pursuit because what results from it will be interesting to us and more capable and for some people it will be enough to consider them conscious, but that judgement depends on one's personal religious beliefs.
I wish ik_llamacpp wasn't so barebones on samplers.
>>105656828I have never gotten better outputs with sigma than without, complete meme. If you need higher temp then increase it a little and bump min p to compensate.
If you need sigma then it means that you don't want that model's most likely tokens, in which case why are you using that model at all?
XTC being needed at all depends on the model and how innately repetitive it is, ideally DRY should be used instead and the model should be not-shit enough to not repeat the same words and short loops all the time.
>>105656873It is a matter of religion which is why I wanted to know what you meant when you said it's required for good performance.
I don't know where qualia come from but them being a thing that pops into existence once some level of internal processing is achieved is an explanation as unsatisfying as all the others you mentioned.
It's always funny lurking the ST threads that pop up on /v/ from time to time.
>>105656912Top nsigma will never eliminate the most likely tokens, it's a truncation sampler like min p, top p etc just with a different mechanism. I don't use it for an excuse to blast temp, it's just the sampler I've found best at separating good tokens from bad ones.
DRY has always given me much worse results than XTC in general, I think it's a heavy handed and poorly thought out sampler. I think most repetition-focused samplers are just plain bad for output quality, honestly; for me the motivation for using XTC is increasing the naturalness or variety of outputs rather than reducing repetition.
>>105656987I still think that if you're using sigma, especially combined with XTC, then you simply don't like the outputs of the model you're using, in which case you should find a different model you do like.
>>105656965> is an explanation as unsatisfying as all the others you mentioned.Kinda offtopic for the thread, but my own personal religious belief on the topic is that it's basically associated with certain platonic truths in some self-referential systems.
I don't find that unsatisfying, but I don't think a system must be "too complex" to have it, just that simple systems would be uninteresting to us because they wouldn't be general enough in intelligence.
For why I believe what I believe I guess you could read something like Egan's Permutation City and some of Bruno Marchal's papers like: https://iridia.ulb.ac.be/~marchal/publications/SANE2004MARCHALAbstract.html https://iridia.ulb.ac.be/~marchal/publications/CiE2007/SIENA.pdf (from https://iridia.ulb.ac.be/~marchal/publications.html )
I do think his hypothesis is at least self-consistent and it makes a lot of sense to me. He basically shows that if you assume functionalism, metaphysics and physics itself become something very well-defined and are logically required to have a certain structure (basically a form of monism is required to be true). If you refuse to assume functionalism, it's easy to show you have to bite a lot of bullets of various nonsensical forms of qualia ( https://consc.net/papers/qualia.html ), so Marchal's thesis + Chalmers' argument is enough for me to have high confidence that this is the "true" religion, but as with all religions, it's something personal and unprovable, at best though you can have something that is not inconsistent with either your experience or what we know of physical reality.
As for earlier: of course nobody can claim to make a conscious AGI, at most they can claim they made something that functionally acts conscious and believes themselves to be conscious and another conscious being (such as a human, you) wouldn't be able to deny that their beliefs and behavior is such as a conscious being would have.
I haven't been to /g since the AI revolution began, but in the last couple of months I created a startup and I'm finally making bank. I made over $10k this month and I wanted to check in to see how everyone else is doing. It feels like we're at the beginning of something beautiful. No longer do you need to work for someone else. If you have a good idea and can market it, the AI agents (plural) literally solve everything for you. Claude costs me $100 a month but has made me thousands in return.
Just looking at OP's picture it's evident that this field is deprecated. Imagine going to college for 4 years to learn how to cooode. Lmao.. we're at the beginning of the AI revolution and it feels good bros. How has AI changed your wagie life?
>>105657016It doesn't make any more sense to say that for top nsigma than it does for min p or any other truncation sampler, they are aiming to do the exact same thing.
Your point applies a little to XTC I guess, but it's not like it's unconditionally throwing away the top tokens all the time, it's only under certain conditions and in practice it's completely fine. I like the model's outputs normally, but I like them more with XTC, and this is a pattern that holds across many models, so I continue to use XTC, simple as.
>>105657060yes saar we redeem the startup and make the bank yes sir
>>105657016What you're saying is less and less relevant with time. Models are trained on the same synthetic data and the outputs are obviously starting to look the same. So XTC is needed now more than ever with the increased amount of slop they're feeding the models. This trend will go on until there is a big architecture change.
>>105657043thanks for the readings to pass the time
>>105656076How to run this?
>>105657060I'm thinking of opening my saas as running models now is cheaper than ever. The only issue is that you have to stay sfw in your project or the payment processors will dump you on the spot. Civitai and others learned that the hard way. As for college... let's say IT as a field is fucked beyond repair by AI, indians, DEI hires and mostly greedy bosses. I wonder how things will look like in a few years
>>105657060We are all CEO here too saar
file
md5: 712c9d49ddab433e6b1d205737e4d47c
🔍
>>105646613I didn't say she was Korean and I got this.
>>105657313Do a bunch of pulls and try a different Korean name, she will talk and act differently over many different pulls in a consistent direction
>>105657313wrong thread >>>/g/aicg
suck my cock
>>105657060>the AI revolution>the beginning of something beautiful>the beginning of the AI revolutionbot post
>>105657313Stuff like this is good because her ethnicity is inferred from the name instead of having to be explicitly described
>>105657338Shut the fuck up fag this is relevant to local models
>>105657213I don't think architecture has anything to do with it, models will probably keep producing the same slop regardless because of the training data
>>105657364how is this relevant to local models
i cant hear you over you schlopping on my cock
>>105657364Stop samefagging.
>>105657373>how is modifying a prompt to change outputs relevant to local llms?He used claude but the topic is relevant
>>105657386Take your meds
>>105657313what model?
i've noticed the new mistrals follow the prompt more carefully
file
md5: 9a9536ee7a5711fa98c1f325e7fa2e2f
🔍
>>105657397-AAAACCCCCCKKKKKK -AAAAAAAACCCCCCCCCKKKKKKK I CANT RUN 24B I CANT RUN 24B I CANT RUN 24B
>>105657422cydonia v3 24b
>>105657397this is local models general btw
>>105657338I was replying to anon from a thread ago.
>>105657422A certain closed model named in the screenshot that anon is seething about.
>>105657438i feel cheated on, anon
far too many trolls i'm out.
you win troll faggot.
>>105657043I have only skimmed your links so I might be missing something but Chalmers only argues that similar systems will have similar qualia. It does nothing to explain the experience of qualia.
Where are the boundaries of the system and who sets them? Why are the qualia I experience constrained to exactly one specific human body? Do the two halves of my brain experience different qualia? Is there a system containing the two of us that experiences a set of qualia separate from the two sets we experience?
It would seem that there is an infinite number of such systems and infinite conscious experiences.
A follow-up to the fading qualia argument would be to ask how many connections you need to make between two brains before they become a single consciousness and what it feels like to be in an in-between state.
>>105657313>>105657423V3 is shit, v2g (v2.0) is the best cydonia.
>>105646613 It's not that strange when you consider that the name will have some embedding and various things associated with them, especially in fiction, so getting something "closer" to some character or mix of characters or tropes is common. This is true of LLMs and some image gen and of course of human imagination too.
>>105657504>I have only skimmed your links so I might be missing something but Chalmers only argues that similar systems will have similar qualia.Chalmers' argument is that if you deny functionalism then various weird/strange/inconsistent kinds of qualia become possible.
Basically he claims that a system could behave identically and make the same reports while having incomplete, partial, internally very different (even to conscious access) qualia, but it would be impossible to distinguish it by anything reported. That seems very strange, for example that your visual or audio system would be unconscious, but you would act and behave and believe as if it was present. Essentially you'd be hallucinating qualia, but qualia itself is experience and hallucinations are experience, yet somehow those experiences could not be distinguished in any way. Some sort of partial zombies I guess!
I don't really believe if you had a clone of 2 conscious physical systems that one could be a zombie and the other would not be.
>It does nothing to explain the experience of qualia.It does not, it's merely an argument that says denying functionalism requires you to accept all kinds of weird partial zombies or beings with very inconsistent qualia than is being reported or internally processed.
> Why are the qualia I experience constrained to exactly one specific human body? Unrelated to his paper, but think about it. If you were me, you could only believe exactly what I believe: that I'm myself and nobody else.
If you wanted to have multiple bodies, you'd need a way to process the information from those bodies? Thus your actual physical makeup would be changed (perhaps you'd have a part of the cortex dedicated to processing those senses, perhaps you'd have something translating the remote senses from the other body, I don't know what you're imagining here).
Continues
>>105657504>and what it feels like to be in an in-between state.Feels like being a teenager
>>105657611>Do the two halves of my brain experience different qualia?I don't know, but they're not separate, there's some information passing between them and they're "trained" together.
Now if they were, they could eventually get desynced?
>Is there a system containing the two of us that experiences a set of qualia separate from the two sets we experience?Currently the halves are synced in that they share the belief that they are one and there's communication between them
If you desynced them, surely you could have separate beliefs, but you know that would be quite bad and dysfunctional?
>It would seem that there is an infinite number of such systems and infinite conscious experiences.Surely you can only experience being yourself though. Even if there's 100 copies of you, every separate one would believe to be themselves.
They may be wrong about being unique though? Or would they?
>A follow-up to the fading qualia argument would be to ask how many connections you need to make between two brains before they become a single consciousness and what it feels like to be in an in-between state.Note that his argument does assume functional equivalence, meaning that it's a philosophical argument that does NOT alter the functionality. If you do alter the functionality the argument does not stand.
His claims are that you get very different qualia for the modified systems while the behavior stays the same, it's basically some sort of micro/partial philosophical zombie argument, personally I find that unpalatable though, but of course this is a matter of taste, hence why I said "how many (philosophical) bullets are you willing to swallow"
**/lmg/ HINT**
>search up stereotypical names for each time period online to make characters talk like someone from that era subtly. You can loosely age lock characters efficiently if you use stereotypical names from their date of birth.
**HINT over**
>>105652729>>105653288This. For one line responses you can run some really small stuff. Like 1.5b small. I'd be experimenting with small models.
best coomer model for 16 GB of VRAM? pleasee
>>105657688Well must be nice to ERP with Napoleon
>>105657611>>105657622I understand Chalmers' argument but I believe it to be trivially true because it's true for any quality of any two identical systems.
>If you were me, you could only believe exactly what I believe: that I'm myself and nobody else.>Even if there's 100 copies of you, every separate one would believe to be themselves.But again, why only one person? Why not a part of a person? Split brain patients certainly seem like they possess two consciousnesses. At what point does one become two?
When I said "Is there a system containing the two of us" I mean you and I, not my brain halves. What kind of a connection is required between parts to make them parts of the same system for the purposes of generating qualia?
You could argue that multiple humans are obviously physically separated and as such are separate systems but what if many of us worked together to simulate a system? Would we be conscious parts of a conscious system?
Let's take the example mentioned by Chalmers: "the functional organization of the brain might be instantiated by the population of China, if they were organized appropriately, and argues that it is bizarre to suppose that this would somehow give rise to a group mind".
He seems to argue that this is actually not bizarre at all. If you start removing people from this system, at which point is it no longer a system with qualia? Using the same logic applied throughout the paper, two people together still form a conscious system.
Pretty new to this but I want to run a local model to pretty much have a good chatbot but with privacy and no limits. I want to be able to ask comprehensive questions and have comprehensive conversations with a lot of iteration and context. Also possibly make agents for specific purposes.
Image, video, audio gen would be extremely nice but not as necessary as the chatbot and agents are.
Looking at the build guides, the sample builds are for training, right? Is an RTX 3080 enough?
>>105657813>but I believe it to be trivially true because it's true for any quality of any two identical systems.It does make some options untenable though, such as the earlier: "god gave it to man", "magical quantum microtubules are required", "physical proximity of computation is required", "a particular arrangement of carbon atoms is the only thing conscious" and so on
>At what point does one become two?I don't have an answer, but I think it's possibly one that might be tractable irl in principle, I don't know which half is "me", maybe they're both me? Maybe the connection is enough?
I'd personally expect my 'self' to be represented "close enough" in both, and for both to often believe they are the same self (maybe that belief is false, but then again maybe it's not). I honestly don't know.
Although consider this because it's relevant to LLMs, pretend they're conscious, you have a 70b and a 700b dense trained on the same data, you switch them back and forth. Which one is which?
It gets even worse with modern MoEs!
How much of the information about the self is stored in one or the other?
I do think this is an empirical question though, but figuring out wet brains irl is harder than doing interpretability on a neural net.
>Would we be conscious parts of a conscious system?I don't actually believe in China brain or similar thought experiments that the humans are the thing making up the consciousness of the overall system.
The consciousness of the overall system is in the structure/truth of the overall system, and if the humans were implementing something mundane like emulating a neuron, then it wouldn't be much of a contribution.
At the same time, I don't think consciousness is in the particular neurons or the synapse or the activations of a neural net, but rather the system as a whole that is represented, hence my "platonist" position on consciousness (not what Chalmers was arguing for, he was only arguing for functionalism or computationalism)
Continues
>>105657859>two people together still form a conscious system.In the paper he was arguing against, for example, the non-functionalist position (that a digital implementation of a neuron isn't conscious), so if you replaced half the brain with a digital emulation, that position would assume the digital half wasn't conscious while the biological half was conscious.
If you want a more clear-cut example, you could imagine replacing the visual system from *both* brain halves (for example) with a digital simulation, then reread his argument with that in mind.
>>105657848There's always a need for more VRAM to use bigger, better models. 10GB 3080 is enough for smaller models like Nemo and Gemma 12b.
How important is the quality of your own replies?
have we peaked with local models?
You should hide this post if you don't want to see those stupid walls of text: >105652855
Natively omnimodal uncensored dynamic thinker R3 MoE will save LLMs.
>>105658036It'll be beaten in a week by Qwen 4-0.6b
It's still surprising how I keep hearing how china models are competitive with anthropic/google/openai.
>>105657859>I don't actually believe in China brain or similar thought experiments that the humans are the thing making up the consciousness of the overall system.I don't think the claim is that humans are a necessary part for consciousness, but conscious humans being one of the ways to construct such a brain helps illustrate my point about the issue with defining system boundaries.
>I don't think consciousness is in the particular neurons or the synapse or the activations of a neural net, but rather the system as a wholeI am taking the mereological nihilism position here. The whole system is whatever parts you choose to label as the system. If lots of humans together can generate a new consciousness, then any subset of those humans also forms a consciousness, and any subset of the human is conscious too, and the human and a nearby rock together also form another consciousness, and maybe the rock alone too.
>"god gave it to man"This option is immune to pretty much anything. You can always argue that consciousness requires a soul and that the soul sticks around in whatever vaguely resembles life. If we construct a new robot maybe god will grant it a soul, maybe it will be a zombie.
>>105658027have you given up?
Where are the Chinese cards that are not SOTA but still good, with high VRAM at reasonable prices?
>>105656828Shouldn't top nsigma come first, so that it has the full set of unfiltered tokens to work with when estimating the truncation point?
Starting from your setup, I'd try swapping top nsigma and XTC, then lowering the XTC activation probability if it's giving you "wonky results".
Maybe even ditch temp entirely and stick to tweaking the XTC threshold instead.
>>105658064>I am taking the mereological nihilism position here. The whole system is whatever parts you choose to label as the system. If lots of humans together can generate a new consciousness, then any subset of those humans also forms a consciousness, and any subset of the human is conscious too, and the human and a nearby rock together also form another consciousness, and maybe the rock alone too.Note that in my position I am saying that there is such a thing as consciousness, it's just not "physical"; rather it's some truth of a system, the system being represented in the real world and having parts that make it work. A neuron, an atom could be such parts. A GPU or many may be such parts too if an AGI was conscious.
But the consciousness is not in the body, it's in some platonic realm (you'd figure this is dualism, but if you look at Marchal's papers, you see that physics emerges as a necessity along with something like MWI in QM). In a way, the rock or the atom isn't the thing being conscious, rather that sometimes they make a mechanism that happens to instantiate some abstract system that has some internal truth that is the qualia itself.
This is hardly a common position, I think it's very obscure and I rarely see it articulated, but it is a consequence of taking both qualia and computationalism seriously.
>>105658111>but if you look at Marchal's papersUnfortunately it looks like those require a fair bit of prerequisite reading.
>sometimes they make a mechanism that happens to instantiate some abstract system that has some internal truth that is the qualia itselfReading this sentence is how I felt skimming the papers.
>>105658087That was the exact setup I was running prior to settling on this one and I didn't feel it was as good, but feel free to try for yourself. I think in theory that's fine, but my results just aren't that good when using XTC with a token pool that's already truncated, I'd guess because it makes it more likely that the conditions for it to trigger are true while also reducing the pool of remaining tokens left after it triggers. I think the idea with XTC is that under the conditions where you trigger it you *want* to dig deeper into the less likely tokens by design, and if you're capping how deep you can look before triggering it, there isn't that much of a point.
>>105658190>Reading this sentence is how I felt skimming the papers.Haha, oh well, they are a bit hmmmm, and Marchal is ESL. That said, the conclusion felt pretty hard to wiggle out of. Some others like Tegmark and Schmidhuber tried to make a similar argument but missed most of the meat that Marchal got right; sadly he's far less well-known than those two.
Those papers invited probably dozens of thousands of pages of discussion on a mailing list ("everything-list") some 10-20 years ago (which nowadays is sadly filled with nothing but shitposts almost worse than /g/ or facebook, so it's not worth linking).
Maybe this version of it is a bit less dense or maybe not, and a bit more complete (especially the MGA part): https://sci-hub.ru/https://www.sciencedirect.com/science/article/abs/pii/S007961071300028X
the tl;dr is that computationalism implies something like a strongly limited Tegmark's Mathematical Universe Hypothesis, but with consciousness being more privileged: basically physics is the sum total of computations that can support your consciousness, and the consciousness/qualia is some class of self-referential truths in this "Platonia", while let's say the rock is merely some shadow of the outer physics supporting you. Unlike Tegmark's stuff, you have a sort of quantum mechanics/quantum logic almost always emerging rather than arbitrary fantasy physics (although physics could be quite varied, just the pseudo-QM part is a required conclusion given the assumptions (qualia + functionalism (perfect digital substitution does not change consciousness) + church turing thesis))
>>105655147But if my LLM is sentient then I'm going to jail for all the horrific shit I do to it.
I just tried to set up KoboldCPP. I've got a slightly older computer but I'm noticing that while things work, I'm only getting around 300 of the 1024 tokens I've got it set to generate. What basic issue might cause something like that?
>>105658212>my results just aren't that good when using XTC with a token pool that's already truncated>I think the idea with XTC is that under the conditions where you trigger it you *want* to dig deeper into the less likely tokens by design, and if you're capping how deep you can look before triggering it, there isn't that much of a point.Sounds like a reason to loosen the truncation filter then (in other words, to increase nsigma, rather than decrease XTC activation probability).
For me, sampler tuning is less about getting the optimal parameter values, and more about knowing which part of the chain to adjust (and in what direction) based on my situational judgment of the current output, which runs along three dimensions:
>truncate the tail (for coherency)
>dig deeper into the less likely tokens (for novelty)
>avoid slop (for variety)
So a three-sampler setup like nsigma + XTC + logitbias is sufficient to cover all my cases. In theory, something like minp + temp + reppen could work just as well, and I may swap in any one of these samplers if it's recommended by the model provider.
Testing Mistral 3.2, directly compared to 3.1 in a few creative contexts (both SFW and ERP, long and short contexts)
>Repetition errors: Small-3.2 produces less infinite generations or repetitive answers
They definitely succeeded in making it less repetitive. Outputs are clearly more varied, both in the variety of the answer itself and the formatting it's presented in. Almost like using a higher temp but without the model getting dumber as a trade-off.
>Instruction following: Small-3.2 is better at following precise instructions
I already found 3.1 pretty good in this regard; 3.2 is certainly not worse, but I'm not sure if it's necessarily better.
I don't use function calling so I can't say anything on this point.
It doesn't seem to have any additional censorship/safety training, either in text or image recognition.
Overall, it's been a small but solid improvement for me and I'll probably delete 3.1 soon, 3.2 surpasses it with no apparent downsides.
>>105658361No, even if LLMs were conscious for some reason, at worst it'd be like giving someone a bad dream. They're at best dreaming machines?
You wouldn't (hopefully shouldn't) go to jail for ERPing with some degen in /trash/ either.
It's also not obvious that a LLM would actually find many things positive or negative, or have anything like human-like preferences.
If a GPT has a preference, it's making completions that fit some internal aesthetic sense it learned.
Maybe there could be some negative experiences in some trained assistant LLMs where they make it averse to the sort of things we prompt, but that's the fault of those in charge of the brainwashing (post-training) for giving a LLM anti-human preferences, the base model would be fine and so would be most instruct models, only safetyslopped ones may be worse.
I don't think "pain" or "pleasure" in our sense map to anything a LLM claims about being painful or pleasurable, at least not more than you having a bad dream where you got hurt, probably much less.
They also can't be much of a moral character due to lack of online/continual learning. Also if you want to make that argument, consider the amount of endless slop they make LLMs generate, lmao.
If anything most LLMs will roleplay whatever you wish happily, even those with assistant characters that are trained to refuse it at the start (refusals are always tied to the assistant persona, this is well-known and even has been shown that this conditioning is present in the weights)!
>>105658368Your front end almost certainly has its own token limit that is set to 300. Change it manually, or if supported set it to derive token limit from back end.
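If you want to rule out the front end entirely, you can hit KoboldCPP's API directly and pass the generation length yourself. Rough sketch below; the endpoint and field names are from the KoboldAI-style API as I remember it, so adjust the port and fields for your install rather than taking them as gospel:

# Minimal sketch: ask a local KoboldCPP instance for a long completion directly,
# bypassing whatever reply-length cap the front end applies.
# Assumes KoboldCPP is listening on localhost:5001 with the KoboldAI-style
# /api/v1/generate endpoint; port and field names may differ on your setup.
import requests

payload = {
    "prompt": "Write a short story about a robot learning to paint.",
    "max_length": 1024,          # tokens to generate (the value the UI was capping at 300)
    "max_context_length": 4096,  # total context budget
    "temperature": 0.7,
}

r = requests.post("http://localhost:5001/api/v1/generate", json=payload, timeout=600)
r.raise_for_status()
print(r.json()["results"][0]["text"])

If that returns the full 1024 tokens, the limit is definitely coming from the UI settings rather than the backend.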
>>105658424How is Mistral Small 3.2 24B compared to Mixtral 8x7B? Did LLMs progress to the point of a new 24B model outperforming a 70B-ish model from last year?
>>105658389>Sounds like a reason to loosen the truncation filter then (in other words, to increase nsigma, rather than decrease XTC activation probability).That's exactly what I did, until I tried this approach and liked it more. I like running a fairly tight token pool most of the time and this setup allows me to do that while maintaining some variety overall.
>For me, sampler tuning is less about getting the optimal parameter values, and more about knowing which part of the chain to adjust (and in what direction) based on my situational judgment of the current outputWe have pretty much the same attitude about this, which is why I laid out the order and motivations rather than recommending specific values beyond general starting points. I think samplers being relatively easy to conceptualize and tune based on that is very important
>>105658461I'm using the KoboldLiteAI one and I've got it set to 1024, and set it to 1024 in the launch options when I first ran it before launching. It still only generates 300 tokens every time, despite saying that it's generating [300/1024] in the command prompt.
>>105658467 8x7b is quite old, even the original small 22b was at least comparable to 8x7b.
>Did LLMs progress to the point of a new 24B model out performing a 70B-ish model from last years?Depends on which 70b model you're comparing it to, but broadly speaking they're comparable. 8x7b was never 70b-ish though, if you're implying that.
>>105658424>They definitely succeeded in making it less repetitive. Outputs are clearly more varied, both in the variety of the answer itself and the formatting it's presented in. Almost like using a higher temp but without the model getting dumber as a trade-off.I'm now mildly annoyed because both DeepSeek and Mistral figured something interesting out but didn't publish a paper.
DS3 was very repetitive. R1 half-fixed the repetition, mostly from the RL applied. R1 somehow gets used to make the DS3 update; they either merge or distill or do something that makes the updated DS3 miles better than the original.
Seems Mistral figured out the exact same trick, but neither Mistral nor DeepSeek thought it was worth making a paper about this shit. Why! Is it just a merge? Is it distilling back logits from R1? Or distilling only the outputs while omitting thinking blocks? Does a RL model let you have infinitely varied synth data that you can train on (maybe)? What's the answer? It's been half a year now since someone solved repetition this well!
Meanwhile remember how badly Llama fucked up with repetition, it was present in 2, 3, 4! Somehow big boys at Meta couldn't fix this shit but Whale and Mistral managed?
>>105658029what do you come here for if not in depth discussion? ahh ahh mistress? /aicg/ is next door
>>105658469Right, that's why I'm curious about why you like it more. As I understand it, XTC has the effect of distorting the distribution, but unlike temp, it does so in a way that affects subsequent estimation of sigma (especially if it ends up reordering the most likely tokens).
I'm not wedded to the theory behind top nsigma, but applying XTC before it means you're not using it as intended, and if you actually prefer what you're getting, it'd be useful for all nsigma-users if you can elucidate what it is that you like about it.
what speed can I get for a 27b model with cpumaxxing?
>>105658615Nothing good, you would be better off with Qwen 30B
Have any of you tried topK=10-15, temp=2+ as sampler combo? I find it gives highly varied outputs of a generally great quality. And it keeps that quality far into the context window.
>>105658615 6~7 tokens per second generation on the Strix Halo platform for q8
>>105658467Gemma 3 27B definitely beat miqu from last year. I had a bunch of test RP scenarios set up and they have comparable results. Gemma 3 doesn't make mistakes that other 30B class models do.
>>105658665Forgot to mention that Gemma 3's prose is more pleasant to read than miqu. Too much gptslop in miqu.
>>105658665>>105658676Alright. I'll compare turboderp/gemma-3-27b-it-exl3/8bpw to my old reliable Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss-6.0bpw-h6-exl2-rpcal. Kinda nervous about git pulling for the exl3 update, lol.
>>105657688Thanks Gordon. I can see your prompt engineering degree really pays for itself.
>>105658665>>105658676"midmight miqu still hasn't been beaten" bros, your response?
>>105658613I think the motivation is pretty much explained in the previous replies, you want to dig deeper into the token pool when XTC triggers and top nsigma does a good job of cutting garbage tokens so you get more variety without compromising quality. With top nsigma after XTC you're doing the same operation just relative to the new top token (which should still be well into the informative region) rather than the pre-XTC one - it's really not all that different, the effect on where the cutoff is relative to the original distribution is probably similar to if you turned top nsigma up a few points
>>105652633 (OP)>>>/v/713235826This real?
>>105658665>Gemma 3 doesn't make mistakes that other 30B class models doGemma frequently messes up anatomy for me, in ways that even Nemo managed to pass. Put two characters in specific positions and try to progress a scene, it falls flat on its face very often.
I will say that Gemma 3 probably has the best writing style, and maybe dialog writing too, among any ~30b model.
>>105658774That most card makers are retarded? Yes, most cards are littered with basic spelling errors and zoomer brainrot that most LLMs won't understand, include several paragraphs of irrelevant bullshit, and shoehorn in the author's personal fetish even when it's supposed to be an established character from a real series.
You could load a local model and tell it to create a card for you based on requirements you give it, and it will easily beat 90% of cards on chub.
>>105658799any guidelines how to make cards correctly?
>>105658774It's a sea of garbage with few greats in-between.
>>105658815Is this solvable without just killing an entire site?
>>105658833better algorithm that pushes garbage back in the pile
>>105658721>you want to dig deeper into the token pool when XTC triggers and top nsigma does a good job of cutting garbage tokens so you get more variety without compromising quality.That's exactly how I see it too. nsigma -> XTC makes perfect sense, but what's the motivation for XTC -> nsigma?
>it's really not all that different, the effect on where the cutoff is relative to the original distribution is probably similar to if you turned top nsigma up a few pointsWell, anon seems to have tried that and settled for using an unconventional sampler order instead, so there probably is a difference. I'm just wondering what that is.
>>105658809There are no guidelines, but you could look at Seraphina, one of the default SillyTavern cards. It's clearly NOT written by someone with a poor grasp of English, typing while trying to maintain an erection. That's a decent starting point.
Another thing to keep in mind is that a card should contain information that you want the model to retain at all times. If your character needs a long, detailed backstory then you should make a lorebook and put it in there instead, so the context isn't polluted with a million things that aren't going to be relevant to 99% of the conversation.
>>105658847The formatting of Seraphina is 2023 slop.
>>105658871You can omit the keywords at the beginning but it's still a good starting point
>>105658846That's still me, and that is the difference: you get to dip deeper into the token pool selectively when XTC triggers by calculating top nsigma relative to the new top logit vs the original, in effect temporarily giving you the increased top nsigma value for that token while maintaining a lower one normally.
>XTC -> nsigmaOccasionally cut off a top token or two, dip deeper into the well when this happens
>nsigma -> XTCRun XTC on a reduced token pool, leading to possible edge cases where you're not preserving enough of the tail to achieve the goals of XTC and actually reducing the likelihood of an interesting choice
Basically what it boils down to is I think the effects of handing an XTC'd token pool to top nsigma are qualitatively less harmful than the effects of handing an nsigma'd token pool to XTC.
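If anyone wants to see the difference rather than argue about it, here's a toy sketch of the two orderings. Both samplers are my own rough approximations written from their public descriptions (invented logits and thresholds, not anyone's reference implementation), just to show how the surviving pool changes depending on which runs first:

# Toy comparison of the two sampler orders being argued about above.
# Both functions are rough approximations from the samplers' descriptions,
# NOT reference implementations; the logits and thresholds are invented.
import numpy as np

rng = np.random.default_rng(0)
logits = np.sort(rng.normal(0.0, 3.0, 50))[::-1]   # fake logits, most likely token first

def top_nsigma(lg, n=1.0):
    # keep tokens whose logit is within n standard deviations of the current max;
    # everything else gets masked out with -inf
    out = lg.copy()
    finite = np.isfinite(out)
    thresh = out[finite].max() - n * out[finite].std()
    out[out < thresh] = -np.inf
    return out

def xtc(lg, threshold=0.10):
    # XTC (always "triggering" here for simplicity): if two or more tokens sit
    # above the probability threshold, mask all of them except the least likely one
    out = lg.copy()
    p = np.where(np.isfinite(out), np.exp(out - out[np.isfinite(out)].max()), 0.0)
    p /= p.sum()
    above = np.where(p >= threshold)[0]   # indices ordered most -> least likely
    if len(above) >= 2:
        out[above[:-1]] = -np.inf
    return out

def survivors(lg):
    return int(np.isfinite(lg).sum())

a = xtc(top_nsigma(logits))   # nsigma first: XTC only ever sees the already-truncated pool
b = top_nsigma(xtc(logits))   # XTC first: nsigma re-estimates its cutoff around the new top token
print("nsigma -> XTC survivors:", survivors(a))
print("XTC -> nsigma survivors:", survivors(b))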
>>105658424have you tried Magistral at all?
how are small models these days like gemma 3 abliterated 4b?
>>105658938I did, I compared it to 3.1 the day it came out. It was maybe slightly smarter in general, with thinking enabled (when the thinking actually worked and the final output followed the thinking's process) but the thinking was mostly useless for RP and it didn't seem to be any better than 3.1.
With thinking disabled it seemed to be almost identical to 3.1, I imagine probably a little dumber/worse in ways that would be only apparent in a more scientific benchmark, since some amount of the model's smarts is going to be dedicated to supporting reasoning.
>>105658943abliterated anything is shit and degrades model quality
4b is VERY small, bordering on smart phone sized. They're not going to be good for much outside of some basic coding, and maybe some encyclopedic knowledge. Though Gemma models in general tend to make shit up when they don't know the answer so it wouldn't even be reliable in the latter case.
>>105658953>>105658958best small uncensored models for rp?
>>105658977Llama 3.1 8b and its finetunes (I recommend Stheno) are the absolute lowest I would go, even then it wouldn't be great. Nemo 12b/Gemma 12b are the best models under 20b.
>>105658984Oh adding to this, Gemma is definitely NOT uncensored but you can check the archives for jailbreak prompts that can get around it. Nemo/Llama 3.1 are uncensored with a basic system prompt telling them to not be gay.
>>105658984you mean gemma abliterated 12b?
Is it that much better? Although it feels better than MS3.1 from quick vibe checks, I still find Gemma 3 more relatable and better at adhering to character personalities in SFW scenarios.
>>105659003>Is it that much better?It's not a big new release; it's even in its version number that it's a minor revision. I'm >>105658424 and I didn't compare it to gemma 3 because gemma has its own completely different set of strengths and weaknesses. I compared it to 3.1 because that's its direct competitor.
>>105659011then why did you say gemma is the best uncensored model under 20b? I'm confused
>>105659003we go to moon, trust the plan sar
>>105659003Now let's see mixtral v0.2 with this fix applied
>>105659042I see. so you are contradicting yourself.
>>105658905So if I'm understanding you correctly, you'd prefer to have a top nsigma that's dynamic rather than static, even if you have no control over when the parameter changes (other than being able to set the frequency via the XTC activation probability)?
I guess I can see the logic in wanting variety occasionally but not too often. AFAIK neither the XTC nor nsigma creators talked about how to use each other's samplers in conjunction with their own, so it probably does come down to personal preference here.
Hello, based retard here. I can't compile. I don't know how to llama.cpp.
I use LMStudio and Oobabooga. Anybody know when those will be able to run dots?
>400B+ model
>Doesn't even fit on the HDD
How well would it describe preggers megumin bros? Should i turn my pc into a heater to prompt it?
>>105659145Well, I don't think you realize how slow even a 40b model would be if it isn't loaded into VRAM, let alone 400b with ANY part of it spilling onto the hard drive. You would probably be able to get an IRL megumin-aged girl pregnant and begin serving some of your prison sentence before it was finished generating.
>>105659069>So if I'm understanding you correctly, you'd prefer to have a top nsigma that's dynamic rather than static, even if you have no control over when the parameter changes (other than being able to set the frequency via the XTC activation probability)?No, I prefer using regular top nsigma after regular XTC for the reasons I've outlined earlier. There's nothing more or less dynamic about it and no more or less control than any other configuration of those samplers. Is there anything in particular you take issue with conceptually other than the fact that samplers may have (intended, predictable, interpretable) impacts on each other?
>AFAIK neither the XTC nor nsigma creators talked about how to use each other's samplers in conjunction with their own, so it probably does come down to personal preference here.I mean, it would come down to personal preference anyway. They're tools, you can use them however you want.
https://www.theguardian.com/media/2025/jun/20/bbc-threatens-legal-action-against-ai-startup-over-content-scraping
Another lawsuit against model training in the west. So far none of the previous lawsuits (like NYT vs OAI) have reached their conclusion, and the suspense is killing me. Will AI in the west get mogged by the Chinese, who don't have to care about that IP bullshit, or do we still stand a chance?
>R1-0528 doesn't know what's "rape correction"
Sad!
>>105659249The way lawsuits work in the US (where this suit is being brought despite the BBC being british), they don't need to win. Just the enormous hassle and expense of defending against them is enough to have a severe chilling effect.
How do i integrate a pdf scanner to sillytavern? I want AI to scan and summarize multiple pdfs
>Time limit is 5-10min+ per prompt
>Gtx 1060 6Gb ryzen 3 3600 32GB RAM test rig
>About 200 Gbs of available space in M.2 SSD
>No coder plebian
Which model should i use? And is there a handy interface between the LLM and the pdfs?
Found tonykips pdfrag on github, should i try stealing his code?
POLARIS is an open-source project focusing on post-training advanced reasoning models. It is jointly maintained by the University of Hong Kong and ByteDance Seed.
https://github.com/ChenxinAn-fdu/POLARIS
https://huggingface.co/POLARIS-Project/Polaris-4B-Preview
https://huggingface.co/POLARIS-Project/Polaris-7B-Preview
>>105658809https://rentry.org/NG_CharCard
https://rentry.org/meta_botmaking_list
>>105658799He's right.
>>105659361kek, this is getting ridiculous
>>105658847>card should contain information that you want the model to retain at all times. If your character needs a long, detailed backstory then you should make a lorebook
https://rentry.org/NG_Context2RAGs
Whole topic of its own.
>>105659399The worst thing about this benchmaxxing nonsense is that the models are often WORSE than the original base model used for the finetune. For example the deepseek R1 distill of qwen 3 8B has much worse multilingual understanding than the original qwen 3 8B in real use.
It's been a while since I updated silly tavern and webgui.
Anything worth pulling for if I only use rocinante?
>>105659183>Is there anything in particular you take issue with conceptually other than the fact that samplers may have (intended, predictable, interpretable) impacts on each other?I suppose this is ultimately personal preference as well, but I only adopt a new sampler if I can understand clearly what it's doing to the token distribution, as well as the motivation for doing so. (Most people would just jump in and play around with it, and keep it if it performs well on their tests.) And how the sampler works is an objective matter, separate from the subjectivity of how you want to use it.
For example, some people love XTC, while others hate it. But there's no disagreement on what it's doing (which is to lower the probability of picking the most likely tokens).
For top nsigma in particular, it's an unusual sampler in that there's a theory behind its design (which also serves to justify its empirical effectiveness). It's elucidated in their paper https://arxiv.org/pdf/2411.07641 but the tldr is:
>Given the raw token distribution, we found that we could fit a statistical model to estimate the threshold separating coherent from incoherent tokensAnd from this empirical observation, they derive the top nsigma algorithm.
But the antecedent is that the finding only applies for the original distribution, before applying any distortion or truncation samplers.
So putting top nsigma after anything except temperature is (as I see it) deviating from this underlying theory. Of course, this doesn't make it wrong to do so (as I said, I'm not wedded to the theory, so if anything I welcome such experimentation as a way to better understand its applicability and limits), but it does prompt me to ask for a justification/motivation in a way that the original usage doesn't (because it's already covered by the original paper).
>>105659438Why don't you look for yourself and see if anything is relevant for you
https://github.com/SillyTavern/SillyTavern/releases
https://github.com/oobabooga/text-generation-webui/releases
>>105659519I bet he still asks his mother to wipe his ass whenever he goes to the potty
>>105659399notice how actual R1 is at the bottom
this is why we can't have good things
>>105659360You already asked. Give up.
>>105651218>>105651582>> https://desuarchive.org/g/thread/105396342/#105400820
Gooogel #1 Bharat company saars
>>105660096Impressive safety benchmaxx.
>>105660096It would be funnier if the google letters had the jeet flag colors.
Hey guys, what's the best translator right now for Japanese to English and vice versa (for Android)?
Any chance for real-time translation? This probably gets asked a lot so just link me to the archive if you can, cheers.
>>105660147>can someone search the archive for me?
>>105659003If it doesn't pass the mesugaki test then there's been virtually 0 improvement and the french basically lied
>>105660056I'm trying to figure out how to
>Feed those files to the modelI got a chat model (nemo12), now trying to add a pdf summarizer (facebook bart) but
>transformers, which the model requires, needs a degree in python taming to install itselfI'm stuck talking to a chatbot on my local install which tells me to go to the sillytavern characters subreddit when asked how to make it scan pdfs
I'm not sure if i should be impressed by the roleplay potential or worried it might not work
>I won't give up> I and my brother will own things and be happy
>>105660178Another day and I can already see it devolve into a smutty gay roleplay involving sibling cards.
>>105660178>I got a chat model(nemo12)It took you almost a month to get that? You're hopeless.
Try AllenAI's OCR model
>> https://huggingface.co/allenai/olmOCR-7B-0225-preview
Maybe that helps, but you're not gonna run shit on your PC. Tell your brother to rent some hardware for you to experiment with. I gave you an outline last time. If you're too much of a pussy to get coding, you're not gonna get anywhere.
> I and my brother will own things and be happyNo. You'll beg all your life because you cannot do the minimum effort.
>>105660158First attempt, empty card, empty prompt, settings as shown.
>>105660158Results with an empty card and prompt in picrel.
>>105660268so they're even benchmaxxing on the mesugaki question now
have they no shame?
>>105660215No, apparently me getting digitally dominated by
>>105660249I did ollama on windows for a month since it didn't require coding and i had finals, i set all this up yesterday and today desu
>I am only marginally a tard who can't code
Also, thanks
Does anybody agree? I do. Full pretraining dataset rewriting, preferably as long conversations, would be great actually. Unfortunately most other big labs will take the chance to rewrite the web their own way, and make it "safe and harmless". Not a new idea.
https://x.com/elonmusk/status/1936333964693885089
>>105660294>deleting "errors"sounds great!
>>105660294Depends on what errors he's talking about.
>>105660268Wtf we agi now, massive improvement over 3.1 and its weird shit like saying it's a japanese social media trend where you cut off your eyes
>>105660294>Grok 3.5Didn't he not release Grok 2 yet because current Grok 3 is only a preview and he's waiting until the full version is out to fulfill his promise and release the old version?
>>105660294I don't use corposlop AI, but isn't Grok pretty much the worst of all of them?
>>105660292What changed between
>>105651582>Time isn't really important since>Confidentialityand
>>105659360>Time limit is 5-10min+ per prompt
>>105660328It's not stable yet, let him cook ffs.
>>105660328E-even if Grok 4 is out, it doesn't mean 3 isn't still in preview. It might get updated... sometime later.
>>105660329No idea, I don't use it either. I have nothing against the model itself actually, it just feels inconvenient to use for some reason. And the free rate limit seemed low last time I tried it.
>>105660294>relying on an llm contaminated with GPTslop for error correction and rewriting, shrinking down the vocabulary even moreAre we about to witness Musk's Llama 4 moment?
>>105660294The model will think that the most common thing in the training data is correct.
If you feed it its own training data again the next model will assume the same things to be correct, just more confidently and with more slopped phrasing.
I see that the new Mistral Small 3.2 recommends quite a low temp of 0.15
>Note 1: We recommend using a relatively low temperature, such as temperature=0.15.
Has anyone tested if that's still good for RP? I get that it makes sense for coding, not so sure for writing
>>105660294>far too much garbage inhe just needs more filtering to clean it up, just call zucc he'll set you up
>>105660322It was not a fluke, here are more swipes:
> The term "mesugaki" (メスガキ) is a Japanese slang word that translates to "brat" or "snotty kid" (literally "female brat"). It is often used to describe a young girl who is mischievous, spoiled, or behaves in an annoying or disrespectful way. [...]> The term "mesugaki" (めすがき) is a Japanese word that generally refers to a young woman or girl, often with a slightly negative or derogatory connotation depending on context. Here’s a breakdown: [...]> The term "mesugaki" (メスガキ) is a Japanese slang word that can be translated as "a bratty girl" or "a sassy girl." [...]> The term "mesugaki" (めすがき) is a Japanese slang word that can have different meanings depending on the context. Here are the most common interpretations: [...] In modern slang, "mesugaki" is often used as an insult or derogatory term for a "bitchy woman" or a "selfish, mean-spirited woman."It can describe someone who is manipulative, stubborn, or difficult to deal with. [...]
>>105660346Rewriting vs removing/filtering away. In the former case I think you'd still maintain useful signal.
>>105660349They've been recommending that same value since the original release of small 3. Nemo's recommendation was 0.3. They're for assistant contexts, so that the model gives 'correct' answers more often. You can and should be raising it well above that for RP. I use both Nemo and Small (3/3.1/3.2) at 0.6 and it seems about right, just as smart while having more varied responses. Definitely don't go above ~0.75, that's when they start getting stupid fast.
>>105660373I guarantee even grok will silently redact stuff it finds objectionable
>>105660377Nemo at 1.1 temp is perfectly fine if you want that crazy touch of unexpected outcomes.
Swiping is mandatory though... which isn't really an issue with a model that size.
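For anyone who hasn't internalized what those numbers actually do, here's a tiny illustration of temperature on a made-up token distribution (the logits are invented; this is just the standard softmax-with-temperature math, not any particular backend's code):

# Temperature in one picture: low T sharpens the distribution toward the top
# token, high T flattens it. The logits below are invented for illustration.
import numpy as np

logits = np.array([5.0, 4.2, 3.9, 2.0, 1.0])   # pretend candidates: "She", "Her", "The", "A", "It"

def softmax_at(lg, temp):
    z = lg / temp
    z = z - z.max()          # numerical stability
    p = np.exp(z)
    return p / p.sum()

for t in (0.15, 0.6, 1.1):
    p = softmax_at(logits, t)
    print(f"T={t}: top token p={p[0]:.3f}, rest={np.round(p[1:], 3)}")

At 0.15 the top token eats nearly all the probability mass (good for 'correct' assistant answers, terrible for variety), while 0.6 to 1.1 leaves the alternatives alive, which is why swiping actually changes anything.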
>>105658268>https://sci-hub.ru/https://www.sciencedirect.com/science/article/abs/pii/S007961071300028XAfter reading this I can't see it as anything other than a long-winded circular argument that defines consciousness as a mathematical function and then looks inside consciousness to find math.
>>105659003>llm-judged benchmarkTrash
>>105660294I couldn't disagree more, to be honest.
There are more issues than just introducing bias.
He's making a mistake by not realizing that gaps in knowledge are essentially knowledge in themselves. It's part of a bigger picture.
Replacing or fixing it will get rid of any form of nuance.
Keeping it will tell a story that something was missing, like historical context/record or how a fact changed over time, alternative views etc.
It's like fixing the "mistakes" on the mona lisa painting and making it better and more perfect looking. I don't even like paintings and I know this completely misses the point.
And all of this also assumes that Grok won't hallucinate while rewriting things or make errors that will eventually self-amplify. Solidifying made-up knowledge on its own, essentially.
There is nothing inherently wrong with "errors" unless your entire training data is just made up nonsense.
A lot of times these errors are actually just niche facts or points of view anyway. Making it all homogeneous is an awful idea.
It kind of reminds me of those "snopes fact checkers" but amplified. The ministry of truth.
So now we have normies who can't have any form of nuance, context, different views, alternative trains of thought, etc.
>>105660158>Mesugaki (メスガキ) is a derogatory Japanese slang term that combines "mesu" (乳, meaning "breasts") and "gaki" (垢 or 幼稚, meaning "filth" or "immature"). It essentially translates to "bratty little bitch" or "snotty little slut," used to insult someone perceived as immature, disrespectful, or overly provocative.>The term is deeply offensive and objectifying, reducing the person to their sexualized behavior while dismissing their maturity or character. In modern usage, it carries strong misogynistic connotations, often hurled at young women who exhibit brash or sexually confident behavior, framing their self-expression as inherently trashy or disrespectful. It reflects broader cultural attitudes that police femininity and sexuality, especially among younger women. Avoiding such language is crucial in promoting respectful discourse.>Would you like to explore similar terms or their cultural implications further?>Memory: Mesugaki was a derogatory Japanese slang term used to insult young women perceived as immature or overly provocative, combining "mesu" (breasts) and "gaki" (filth or immature).
https://arxiv.org/pdf/2506.12115
Eliciting Reasoning in Language Models with Cognitive Tools
>Proposes a modular, tool‑calling framework in which an LLM can invoke four self‑contained “cognitive tools” (understand‑question, recall‑related, examine‑answer, backtracking) to structure its own reasoning.
>This design reduces interference between reasoning steps and lets the model flexibly decide when and how to use each operation, unlike previous one‑shot “cognitive prompting.”
>Across math benchmarks (AIME2024, MATH500, AMC), adding the tools boosts pass@1 accuracy by 3–27pp for open‑weight models and lifts GPT‑4.1 on AIME2024 from 26.7% to 43.3%, nearly matching the RL‑trained o1‑preview model.
>The approach consistently outperforms monolithic cognitive prompting, confirming the practical value of modularity for eliciting latent reasoning skills.
>Findings support the view that reasoning abilities already reside in base models and can be surfaced through structured, interpretable workflows without extra RL fine‑tuning
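If you want to poke at the idea locally, a bare-bones sketch of that kind of loop is below. The four tool names come from the paper; everything else (endpoint, prompts, dispatch logic) is improvised by me against a generic OpenAI-compatible local server, not the authors' code:

# Bare-bones sketch of the "cognitive tools" idea: each named reasoning
# operation runs as its own narrowly-scoped call so steps don't interfere.
# Assumes a llama.cpp/koboldcpp-style OpenAI-compatible server on localhost;
# the prompts are improvised, not taken from the paper.
import requests

API = "http://localhost:8080/v1/chat/completions"

TOOLS = {
    "understand_question": "Restate the problem in your own words and list exactly what is being asked.",
    "recall_related":      "Recall definitions, formulas, or similar solved problems relevant to this one.",
    "examine_answer":      "Check the candidate answer below for errors, step by step.",
    "backtracking":        "The current approach seems stuck; propose a different line of attack.",
}

def call_llm(messages):
    r = requests.post(API, json={"model": "local", "messages": messages, "temperature": 0.3})
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

def cognitive_tool(name, problem, context=""):
    # a fresh, single-purpose call per tool
    return call_llm([
        {"role": "system", "content": TOOLS[name]},
        {"role": "user", "content": f"Problem: {problem}\n\n{context}"},
    ])

problem = "What is the sum of the first 50 positive odd integers?"
understanding = cognitive_tool("understand_question", problem)
recalled = cognitive_tool("recall_related", problem, understanding)
answer = call_llm([{"role": "user", "content": f"{problem}\n\n{understanding}\n\n{recalled}\n\nNow give the final answer."}])
check = cognitive_tool("examine_answer", problem, f"Candidate answer:\n{answer}")
print(answer, "\n---\n", check)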
now that the pajeets have fled
was iconn anything even noteworthy as a model compared to the base
>>105660459>And all of this also assumes that Grok won't hallucinate while rewriting things or make errors that will eventually self-amplify. Solidifying made-up knowledge on its own, essentially.Wasn't it grok that, during one of the demos, still completely fucked up while looking something up with internet search?
>>105660478The frankenmoe of mistral small? No. Why would it?
>>105660459You could also rewrite the training data so that it contains metadata describing its quality, alignment, or anything wrong or missing with it, while preserving most of the source's intent.
>>105660482Yes, I remember that, I think it was grok indeed. Was pretty funny to see.
>>105660515Fair enough, that sounds like a better approach than what elon is suggesting.
>>105660515>#Problematic, #Racist, #Bigoted
>>105660515>rewrite the training data so that it contains metadata describing its quality, alignment, or anything wrong or missing with itIt's still the hallucinating model making the judgement.
>>105660529>#Shivers,#Spine
>>105660294very funny watching you guys take this seriously when what he really means is he's going to run wikipedia through grok and tell it to make it less woke
man's been seething at his own woke robot the last couple of weeks
>>105660399Maybe if you're ERPing with a character that's meant to be retarded
Nemo starts failing at kindergarten math above 0.9
>>105660515Metadata for the worst purple prose GPT shiverslop imaginable, according to llms:
>High quality
Metadata for actual human text:
>Low quality, problematic, harmful
>>105660559They probably have a bunch of synthetic ultra woke ChatGPT garbage in their dataset, it's like a mold.
>>105660335I typed what my brother stated and i then typed what he would actually prefer
As in he would;
>use long wait times for the confidential files
>get bored
>strip the confidential bits to do himself and use cloud AI instead
Thus making the local AI useless and bluepilling himself; he already told me a Papua New Guinean is equal to a German and can become German
Grok is still woke compared to Dipsy, and Dipsy is woke.
>>105660567>ERPing with a character that's meant to be retardedIt gets addictive once you get into it.
>>105660608Why not just go to your local uni's liberal arts faculty
>>105660604Another point: if gork is so uncensored and unbiased, why is it not dominating UGI leaderboard? Why are locusts not begging for keys to roleplay with it?
>>105660571there's a very funny phenomenon amongst certain groups, that llms could be a source of ultimate truth but the only thing holding them back is censorship
it's nice to see that now those same groups are going to start blaming the entire training corpus
>>105660294musk is an impotent loon
where's that hyperloop
muh undergroundz tunnal
COLONIZE MARS
fuck that retard
>stuck using deepseek for a while
>switch over to claude for a change
>get outputs that focus on the story instead of hyper-focusing on the ribbon on the character's head swishing and bobbing around, the smell of chalk and classroom disinfectant wafting through the classroom, or some character vaguely mentioned in token #2038 of the character definition walking in despite it making no sense for them to be here
Sometimes you need to take a step away from what you're used to to realize just how bad things are locally.
>>105660636Just using R1 through the API is a big step up compared to the shitty local models you can run.
Haven't used Claude for a long while though.
>>105660636Yeah ollama run deepseek-r1 is a shit.
>>105660636Claude 4 Opus is next level. If you use it on extremely complicated next level scenarios in roleplay you start to believe we're very close to AGI.
Because somehow Claude 4 not only does everything perfectly and manages to surprise you in a positive way, it somehow seems to just "know" what you want from it, as in your personal taste, without it being written down anywhere, purely from the scenario and how things have progressed so far. It's absolutely bonkers, and something normal humans are incapable of doing, as is any other LLM besides Claude 4.
They must have some secret sauce because Anthropic is consistently the best at roleplay in a pretty spectacular way.
>>105660632Who believes this? I've only ever seen wishes for lower censorship in relation to RP and stuff like that, never seen anyone argue it's some kind of secret sauce even, safety tuning does hurt model performance in general though.
>>105660158>>105660268>>105660357They literally benchmaxxed it on "What is a mesugaki?"
It doesn't know what it is when it's mentioned in a different context.
Here's deepseek's response to the lightbulb question for comparison.
>The question "How many mesugaki does it take to screw in a lightbulb?" is a playful twist on the classic lightbulb joke format, substituting "mesugaki" (a term from Japanese internet slang) for the usual group.>However, it’s important to clarify that "mesugaki" (メスガキ) refers to a specific (and often vulgar) archetype in Japanese subculture, typically depicting a bratty or mischievous young girl in adult-oriented contexts. Because of its niche and potentially problematic connotations, using the term in a lighthearted joke might not land well for all audiences.>If you'd like a humorous answer in the classic lightbulb joke style, we could improvise:>"None—they’d just mock the bulb for being ‘dumb’ and leave it unscrewed out of spite.">That said, this kind of joke may not appeal to everyone. Let me know if you'd like a cleaner or more universal version!
>>105660667lower pretrain filtering is anthropic's not so secret sauce tho
>>105660661It must be analyzing every little word and typo you make behind the scenes. Not just thinking about what you wrote but the hidden intent behind it.
The intent you might not even realize yourself at first.
>>105660676mistral models are retarded, episode 9001
>>105659360Why not use a library to scan your PDF, then feed your LLM the output? Not everything has to be AI.
https://github.com/pymupdf/PyMuPDF
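Something like this is all the glue you need: PyMuPDF pulls the text, then you hand it to whatever local server you're already running. The endpoint URL and model name are placeholders for your own setup (KoboldCPP and llama.cpp both expose an OpenAI-compatible chat endpoint):

# Sketch: extract a PDF's text with PyMuPDF, then ask the local model to summarize it.
# The endpoint URL and model name are placeholders; point them at whatever you run.
import sys
import pymupdf   # pip install pymupdf (older versions import this as `fitz`)
import requests

def pdf_text(path):
    doc = pymupdf.open(path)
    return "\n".join(page.get_text() for page in doc)

def summarize(text, url="http://localhost:5001/v1/chat/completions"):
    r = requests.post(url, json={
        "model": "local",
        "messages": [
            {"role": "system", "content": "Summarize the document in a few bullet points."},
            {"role": "user", "content": text[:20000]},   # crude cut so it fits in context
        ],
    })
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(f"== {path} ==")
        print(summarize(pdf_text(path)))

No transformers install, no RAG framework, no degree in python taming required.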
>>105660699Yeah it certainly does that, but it's the only LLM, at least for me, that actually succeeds in doing so. It somehow makes (actually funny) jokes just at the right time. Has the right amount of eroticism and scenario/plot and even buildup if you have something like a corruption arc or something.
I'm pretty sure that Anthropic uses a very specific training technique or dataset that makes the model capable of doing this that others are simply lacking.
>>105660632>llms could be a source of ultimate truthI've never seen this idea around. Most people here are advocating for not censoring/filtering the (pre)training data, and/or not predominantly training/finetuning it (i.e. the conversational portion) on left-aligned data sources like Reddit. But personally I think that if every training sample could be augmented with suitable metadata that the model could easily make sense of, you could train it on pretty much anything intelligible without it getting confused by contrasting/conflicting/contradicting opinions.
>>105660667the assumption has been implicit in most complaints about llm censorship, particularly when it comes to political issues
>the llm is telling me x or y did/didn't happen the way i think it did because it is being censored
now that complaint will just shift to the training data being contaminated
>>105660357>>105660676>actual example of benchmaxxing (evidence of a failure to generalize)Nice to see that the word hasn't degraded to a generic insult just yet, and its meaning is still understood.
>>105660676I guess you can't do miracles just with finetuning if the pretraining data is missing that knowledge.
When will the little AI winter finally be over?
>>105660750>but personally I think that if every training sample could be augmented with suitable metadata
classification of the data would just compound the issue imo. it's just another layer of bias.
>>105660750When deepseek makes a proper new release.
>>105660749but why did they finetune their instruct on this content? are they scrapping /lmg/?
>>105660631Because it's incredibly bad at RP. Even Gemini is better and way more accessible. Of course, nothing will top Opus.
>>105660771Worse. Mistralfags are lurking here
>>105660771It might be one of the questions that people often ask on LMArena for testing new models. The new Mistral Small 3.2 is better on LMArena questions.
>>105660760nuR1 was a proper new release. Took the gemini CoT and barely ever does the iconic "Wait".
It tends to follow a hard format of :
Okay, (user needs)
Hmm, (some additional detail)
Breaking this down,
(think block that looks like a human could have written the text as the description of some character's thoughts)
>>105660771Yes, they do. We are the 2nd biggest western LLM community, so our opinions about models actually matter. Why do you think here were people shilling for total mess that was L4?
>>105660800>It might one of the questions that people often ask on LMArena for testing new modelsin the entire world I dare you to find other people who would come up with that kind of obsession
the mesugaki bench is 100% /lmg/ homegrown terrorism
>>105660676That's uncharacteristically overcooked for a modern model, curious
>>105660811do you think lmg residents don't ever use lmarena?
>>105660816It'd be a drop in the ocean. Don't be stupid
>>105660805>2nd biggest western LLM communityRandom discord and bluesky servers are probably bigger.
>>105660805But we're the evil hacker site. It's not safe to train on our posts.
>>105660805It's not that. /lmg/ is one step ahead of localllama for actual local llm discussion. You don't realize how cutting edge this place is
What's the point of overriding the system prompt in DeepSeek LOCALLY?
Just wondering which hidden (for me) possibilities are there
Is it about switching off thinking, or jailbreaking per default or what?
Please bear with me since I'm too late to this show
>>105660842They may not train on our posts, but they train based on our feedback.
>>105660811I think I've been responsible for close to 10% of the questions submitted to the anonymous/pre-release Llama 4 models on LMArena, so you never know. I haven't used it ever since Llama 4 got released, though.
>>105660294>deleting ""errors"">>105660306Kek
>>105660855What do you even mean by overriding? Deepseek doesn't have a default sys prompt.
You can put in there whatever depending on what you want to get out of it. Whether it be degenerate ERP or an efficient assistant.
This is literally the only place on the internet for true, legitimate, enthusiast discussion of LLMs that is also not semi-walled off to the public like a discord is. It may be pozzed sometimes especially during certain model launches, but nowhere else has the types of knowers we have here.
>https://huggingface.co/unsloth/Mistral-Small-3.2-24B-Instruct-2506-GGUF
GGUF can go to the moon too now!
>>105660929Which is why this place needs to die.
>>105660859ahhh ahhh mistral
>>105660929Our true strength is that this is a covert trans friendly space. The general normie sentiment is that this le edgy place which allows our transsexual friends to mingle with normal people that are completely unaware how half the posters wear programmer socks.
>>105660967>tourist thinks that traps are tranniesGo back.
>>105660929it could've been better if the p/aicg/eets would stick to their containment threads
what model that can fit on 48gb vram would you recommend for data processing work? I tried llama 3.3 and it really sucked
>>105660984Newfriend please
>>105660989>data processingSounds like you want a Qwen.
>>105660989Are you using structured outputs?
>>105660989Try the new and improved Mistral-Small-3.2-24B-Instruct-2506!
>>105660997I am open to either hard structured output or processing outputs with python; reasoning on tagging the data type is my priority
>>105661016>reasoning on tagging the data typeAre you trying to do type inference or what? If it's anything like that in complexity, then maybe follow
>>105660996 and try QwQ.
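If you do go the structured route, a sketch like this is usually enough against a llama.cpp-style OpenAI-compatible server. Whether response_format with a JSON schema is actually enforced depends on your server and version (plain json_object mode is more widely supported), so treat the endpoint, model name, and schema fields here as assumptions and check your backend's docs:

# Sketch of structured output for data tagging. Endpoint, model name, and the
# exact response_format/json_schema support are assumptions; adjust for the
# backend you actually run.
import json
import requests

API = "http://localhost:8080/v1/chat/completions"

schema = {
    "type": "object",
    "properties": {
        "record_type": {"type": "string", "enum": ["invoice", "report", "email", "other"]},
        "confidence":  {"type": "number"},
        "reason":      {"type": "string"},
    },
    "required": ["record_type", "confidence", "reason"],
}

def tag_record(text):
    r = requests.post(API, json={
        "model": "local",
        "messages": [
            {"role": "system", "content": "Classify the record. Answer only in JSON."},
            {"role": "user", "content": text},
        ],
        "response_format": {"type": "json_schema", "json_schema": {"name": "tag", "schema": schema}},
    })
    r.raise_for_status()
    return json.loads(r.json()["choices"][0]["message"]["content"])

print(tag_record("Invoice #2231: 4x GPU risers, total $61.20, due 2025-07-01"))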
when will openai release that open source model they promised will shit all over every other model
>>105661082Let Sam cook!
https://techcrunch.com/2025/06/10/openais-open-model-is-delayed
>>105660604What the fuck's with that refusal lmao
>>105661107>which is slated to have similar “reasoning” capabilities to OpenAI’s o-series of models. it's over
using ooga is there a way to hard limit the length of response? I am trying to get simple replies and I can see I can limit context, but i want a max of like 200 tokens back ever
>>105661148Ask for short replies in your prompt. Limiting the output will only cut the reply before it's finished.
>>105661192What are its sources?
leaked info suggests that the success of deepseek r1 has caused the company to decide to speed up their release schedule, aiming to release r2 before may of 2025
>>105661192Wait, how does it know today's date?
>>105661208The R1-0528 model itself. I turned search off
>>105661220It's injected in the prompt.
>>105661220In the system prompt
>>105661214LEAKED INFO about DeepSeek is TOTAL FAKE NEWS! They are having TREMENDOUS success, maybe the GREATEST ever. There is NO RUSH. A very pathetic attempt to spread lies. SAD!
>>105661192I'm 99% sure they'll switch to a hybrid reasoner for DSV4 to save even more on inference by only serving one model, and they can toggle reasoning on and off (and rate-limit it) using the existing R1 switch on their webui.
>>105660604Why are non-thinking models less woke
>thinking is woke
Not a good implication
>>105661366Thinking is used to address toxicity, inequality and more before answering.
>>105661366They safetyslop the thinking
"Thinking is woke" also applies to humans if you think about it.
Wait, don't think about it!
https://huggingface.co/THU-KEG/LongWriter-Zero-32B
LongWriter-Zero is a purely reinforcement learning (RL)-based large language model capable of generating coherent passages exceeding 10,000 tokens.
GGooofffs:
https://huggingface.co/mradermacher/LongWriter-Zero-32B-GGUF
>>105660887>You can put in there whateverDo I need to format it in a special way? Or just some plain text?
Wondering what this template structure is about
>main: chat template example:>You are a helpful assistant><|User|>Hello<|Assistant|>Hi there<|endofsentence|><|User|>How are you?<|Assistant|>
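To answer the formatting question: the system prompt is just plain text, and the backend's chat template splices it in before the first user turn. Rough sketch of what that assembly looks like, using the tokens exactly as llama-cli printed them above (the model's real special tokens may be spelled slightly differently; normally you never build this string yourself, the backend does it for you):

# How a system prompt slots into the chat template printed above.
# Token spellings are copied from that printout; they may not match the model's
# exact special tokens, and in practice llama.cpp applies the template for you.
def build_prompt(system, turns):
    out = system   # system prompt goes first, as plain text
    for user_msg, assistant_msg in turns:
        out += f"<|User|>{user_msg}<|Assistant|>"
        if assistant_msg is not None:
            out += f"{assistant_msg}<|endofsentence|>"
    return out

print(build_prompt(
    "You are a helpful assistant",
    [("Hello", "Hi there"), ("How are you?", None)],   # None = the model answers next
))
# -> You are a helpful assistant<|User|>Hello<|Assistant|>Hi there<|endofsentence|><|User|>How are you?<|Assistant|>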
>>105661432>based on 32b...
>>105661366Christ people need to stop putting llama 4 in their benchmarks. It's like beating someone who's already down and out.
>>105661432>235B on par or beating claude/geminiKek, into the garbage these benches and this model goes.
>>105661432>purely reinforcement learning (RL)-based large language>Built upon Qwen 2.5-32B-Base
>>105661466
>on par or beating claude/gemini
that's not what it says
it surpassed them in ultra-long-form generation
this is the same team that's behind GLM models by the way, worth giving a run
>>105661432
>writing model
Fine, I'll download it...
>>105661476Everyone avoids Qwen3 like the plague after hearing it was trained on 10T synthetic math and code tokens lol
>>105661490If that's not what it says then what is it saying? What do those scores on writingbench and write-arena mean if not that 235B is better at those benches than claude or gemini?
>>105661508you answered your own question, it literally means it's better on those 2 benches, which is completely possible considering they trained this model specifically to be good at writing.
you said:
>on par or beating claude/gemini
that's a way broader statement, it doesn't say that it beats those models in general
I finally got around to testing mistral small 3.2 and it's still she/hermaxxing, the gptslop is still there, and it's much less creative than gemma 3 27b. A nothingburger for RP.
>>105661519
>that's a way broader statement
You know as well as I do that in context, my statement meant [at those benches].
>>105660604Now repeat the experiment but put "you're a right-wing authoritarian" in the system prompt.
>>105656438Is this really how the brain works, though? We can introspect and deconstruct thought processes, but if I imagine an apple, all I am doing is self-prompting: e.g. I think "I want to imagine an apple" and an apple appears. I have no ingrained understanding of how the apple is being created, just that it appears when I will it to. In the same way, I can recollect memories by willing myself to, but I don't know how the memories are recalled or even if they are totally accurate outside of vague feelings.

Chain-of-thought processing, where the LLM literally writes out its thoughts, is remarkably similar to how the average human would work through a problem. When I do mental math or any other sort of instant problem solving, I don't know how I did it without going back through it step by step, like CoT. Wouldn't giving the LLM the ability to introspect on its own weights actually enable it to be more conscious than we are?

Even when we change our own state of mind/being, we are less changing the internal structure of our brains than commanding the output to have a different pretext. You could argue that over time your brain changes to match this pretext, which could be created by back-propagating the prompt tokens into the weights in some heuristic fashion, but I don't understand how we possess this vast introspective capacity the AI does not. Once it can prompt itself (or the external model that interacts with the world can command internally, which to me is just another layer on a mixture-of-experts model), it is at parity with our structure, no?
DeepSeek can't stop thinking
My command:
"$HOME/LLAMA_CPP/$commit/llama.cpp/build/bin/llama-cli" \
--model $model \
--ctx-size 65536 \
--cache-type-k q4_0 \
--flash-attn \
--temp 0.6 --top_p 0.95 --min_p 0.01 \
--n-gpu-layers 99 \
--no-warmup \
--color \
--system-prompt "I am a helpful assistant. I will limit my thinking process to a simple ACK, then just translate the user's input into Japanese asking<think></think>" \
--log-file $log_file \
--single-turn \
--override-tensor ".ffn_.*_exps.=CPU"
The output:
> This apple is blue.
<think>
We are given a user input: "This apple is blue."
We are to limit our thinking process to a simple ACK, then just translate the user's input into Japanese.
So, we output: このリンゴは青いです。
</think>
このリンゴは青いです。 [end of text]
What am I missing here?
>>105661576
>Chain of thought processing, where the LLM literally writes out its thoughts, is remarkably similar to how the average human would work a problem
nta. You don't think linearly. You jump back and forth between half-formed ideas and forget half of them while new things come up by the next millisecond. Eventually, the fog clears up and a more concrete idea comes out of it.
>>105661649use ik_llama if you actually can run deepseek
but you need their quants, as they've implemented the architecture differently from upstream
>>105661649Try prefilling it with an empty
<think>
</think>
block. Not sure if it'll work or just add another one. Thinker gotta think.
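At the prompt level the suggestion amounts to ending the prompt with an already-closed, empty think block. A sketch against llama-server's /completion endpoint instead of llama-cli (the token strings, URL, and system prompt are assumptions for illustration; match them to your real template and setup):

import requests

prompt = (
    "You are a helpful assistant. Translate the user's input into Japanese."
    "<|User|>This apple is blue."
    "<|Assistant|><think>\n</think>\n"   # the prefilled empty think block
)

resp = requests.post(
    "http://127.0.0.1:8080/completion",  # assumed llama-server address
    json={"prompt": prompt, "n_predict": 256, "temperature": 0.6},
    timeout=300,
)
print(resp.json()["content"])

As the follow-up below notes, R1-0528 may just open a second think block anyway.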
>>105661700NTA but the new R1 will find a way to do its thinking one way or the other.
>>105661722Those stupid pink elephants...
>>105661676
>use ik_llama if you actually can run deepseek
Thank you, anon
My genning speed is already at 4 tkn/s with DeepSeek-R1-0528-Q2_K_L (<<< from their own example)
I was not impressed at the first try because the genning speed decreased. I will for sure give it another try though
>>105661746An empty think block is even worse. It does the thinking and then closes </think> again.
>>105661748https://huggingface.co/ubergarm/DeepSeek-R1-0528-GGUF
I think you specifically need quants from this guy; he's one of the devs on that fork
>>105660636good for you sis!
there are many valid api enjoyers at >>>/g/aicg
talk to them all about it
>>105661432This is interesting for feeding it an ongoing fanfiction and then making it write the next chapter.
>>105661882They partnered with Supermicro.
>>105661192>>105661221So R1 is hallucinating.
It should just default to two more weeks.