
Thread 107155428

173 posts 60 images /g/
Anonymous No.107155428 [Report] >>107159156
/lmg/ - Local Models General
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107147210 & >>107138606

►News
>(11/07) Step-Audio-EditX, LLM-based TTS and audio editing model released: https://hf.co/stepfun-ai/Step-Audio-EditX
>(11/06) Kimi K2 Thinking released with INT4 quantization and 256k context: https://moonshotai.github.io/Kimi-K2/thinking.html
>(11/05) MegaDLMs framework for training diffusion language models released: https://github.com/JinjieNi/MegaDLMs
>(11/01) LongCat-Flash-Omni 560B-A27B released: https://hf.co/meituan-longcat/LongCat-Flash-Omni

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous No.107155431 [Report]
►Recent Highlights from the Previous Thread: >>107147210

--Kimi model performance and hardware optimization discussions:
>107153044 >107153409 >107153682 >107153697 >107153758 >107153784 >107153800 >107153708 >107153780 >107153851 >107153864 >107153903 >107153994 >107153871 >107154760 >107153922 >107153942 >107154023 >107154123 >107154165 >107153244 >107153303 >107153393 >107154200 >107153470 >107153596
--Hardware/software improvements for local model development and LLM preferences:
>107154041 >107154072 >107154172 >107154319 >107154359 >107154399 >107154513 >107154533 >107154554 >107155011 >107155181 >107155281
--Model degradation issues in long-context conversations and potential fixes:
>107152114 >107152172 >107152190 >107153203 >107152321 >107152409 >107152782 >107152811 >107152924
--Fixing ikawrakow's broken completion API with provided patch:
>107149851 >107150666
--Optimizing external sampling strategies for LLMs with Python/C alternatives:
>107152382 >107152836 >107152868 >107153690
--VibeVoice setup instructions and resource links:
>107147241 >107147288 >107147308 >107147352 >107147681 >107149004 >107149215 >107149232
--ik_llama version update issues and fork dynamics:
>107147992 >107148005 >107148210 >107148223 >107148337 >107148351 >107148498 >107150831 >107148077
--Kimi model quantization and "thinking" token tradeoffs for VRAM-constrained hardware:
>107153943 >107153950 >107154012 >107154026 >107154057 >107154071 >107154358
--AI-human interaction boundaries and the "AI sex" terminology debate:
>107152307 >107152374 >107152466 >107152917 >107153211
--Chinese dominance and language model history discussion:
>107151379 >107151429 >107151556 >107152015 >107152063 >107151784 >107152599
--Miku (free space):
>107147288 >107147842 >107148034 >107148720 >107149144 >107149683 >107149706 >107150616 >107153286 >107153296 >107153397

►Recent Highlight Posts from the Previous Thread: >>107147214

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Anonymous No.107155458 [Report]
>>107155414
that's a pretty apt description
it got a bit smarter than regular k2 but also a lot more schizo
I'm trying to rein it in, but not having much success
Anonymous No.107155483 [Report] >>107155521 >>107155532 >>107155765
Why isn't there some crazy biology AI that predicts how the human body works just from some small data? Imagine what we can do with that kind of AI.
Anonymous No.107155507 [Report]
Mikulove
Anonymous No.107155521 [Report]
>>107155483
If it was that easy someone would have already done it you idiot, use your fucking brain.
Anonymous No.107155529 [Report]
Anonymous No.107155532 [Report]
>>107155483
We just have to wait for another group of women researchers to release their grift-of-the-week menstrual cycle sysprompt paper.
That is unless you meant realtime skeletal control which would be interesting for vr games and stuff.
Anonymous No.107155556 [Report] >>107155770
first they take our ram, now they take our gpus!
https://www.tomshardware.com/pc-components/gpus/nvidias-rtx-5000-super-could-be-cancelled-or-get-pricier-due-to-ai-induced-gddr7-woes-rumor-claims-3-gb-memory-chips-are-now-too-valuable-for-consumer-gpus

>inb4 6000 series has LESS vram cause server needs priority
Anonymous No.107155765 [Report]
>>107155483
like what
Anonymous No.107155770 [Report] >>107155830 >>107155924
>>107155556
we are never getting a reasonably priced ai dedicated gpu with a lot of vram, are we
Anonymous No.107155797 [Report] >>107155839
when
you
walk
away

you
dont
hear
me
say
Anonymous No.107155830 [Report] >>107158840
>>107155770
No.
There is no economic incentive for that.
Anonymous No.107155839 [Report] >>107156273
>>107155797
aa ee oo aa ee oo
Anonymous No.107155924 [Report] >>107159525
>>107155770
It is what it is. When your consumer base is companies for whom price is no issue and who will buy your stock the second they're able, lowering the price is not incentivized. The personal computer market is peanuts compared to what they can make otherwise.
Anonymous No.107155949 [Report] >>107156013
gemini 3 is gonna be crazy
Anonymous No.107156013 [Report]
>>107155949
Prediction: it will still write unusable code if you ask it to make something even close to being complex (therefore making it useless)
Anonymous No.107156020 [Report]
/aicg/ Is Down the Hall and to the >>>/g/aicg
Anonymous No.107156030 [Report] >>107157303
>>107155143
I'm still at the default.
Anonymous No.107156036 [Report] >>107156238
Reddit learnt of mi50s, quick /lmg/ get yours before it's gone!!!
Anonymous No.107156143 [Report] >>107156800 >>107157016
Which LOCAL agentic models are the best?
(smart and still low weight)

gemma3 27b is obviously not
Anonymous No.107156238 [Report] >>107156299 >>107158279
>>107156036
> they can't stop talking about the dgx spark
it's not even shilling, there's a mental block about recognizing they got hyped by advertising. it's like they cannot break the thought pattern of "it's a product that's for sale, therefore it must be good to buy it." it's like... pricebrained? idk what to call it
Hi all, Drummer here... No.107156273 [Report]
>>107155839
Why?
And we light up the sky
Anonymous No.107156289 [Report]
>le thinking models
>it's just a self-prompt
Hi all, Drummer here... No.107156299 [Report] >>107157095 >>107157447
>>107156238
It is crazy to me because if you actually follow local models all the dgx and strix halo shit was obviously dead on arrival. I have never seen a more obvious dead end in hardware in my life. It serves zero purpose. It should kill itself now.
Anonymous No.107156469 [Report] >>107156642 >>107156675 >>107157232 >>107157298
Polaris alpha seems like the smartest model right now. Who do you think it belongs to? Doesn't seem like gpt's style. I'd guess a new grok or google model.
Anonymous No.107156642 [Report] >>107156654 >>107156867
>>107156469
>i'm speculatinggggggggggg
Anonymous No.107156654 [Report]
>>107156642
Anonymous No.107156667 [Report] >>107156676 >>107157433
Damn it, K2 thinking is substantially better for my use cases (mostly ERP) than GLM 4.6 or Deepseek. No chance with my 128GB DDR4 build.

I do have the budget for a Mac M3 Ultra with 512 GB, any benchmarks for a single one?
Anonymous No.107156669 [Report] >>107156953
I've been using GLM Air 4.5 for ERP. Is there anything better with a similar size?
Anonymous No.107156675 [Report]
>>107156469
It's the first model produced by a new startup belonging to Big Nigga
Anonymous No.107156676 [Report] >>107156810
>>107156667
supposedly m3 ultra gets like 20t/s at 20k context
too lazy to check if true, but that sounds pretty good if true
Anonymous No.107156800 [Report] >>107156988
>>107156143
That's funny, because I'm finetuning gemma for agentic use.
Today I'm going to start trying to do RLVR.
Anonymous No.107156810 [Report] >>107157297 >>107157419
>>107156676
The 512GB isn't enough. To fit K2 you'd need Q3 or below. They need to put 2TB in the M5 Ultra next year.
Anonymous No.107156867 [Report]
>>107156642
Anonymous No.107156953 [Report]
>>107156669
No.
Anonymous No.107156988 [Report] >>107157016 >>107157049 >>107157245
>>107156800
Are you using this to fine-tune?
https://github.com/OpenPipe/ART

Unsloth brothers mention it on their site.

Anyway, it's my first day with AI agents. I checked this video, and I could make his code run
https://www.youtube.com/watch?v=i5kwX7jeWL8

Then I went on trying different models via Openrouter API.

anthropic/claude-haiku-4.5 worked like a charm:

1. it was aware which tools were at its disposal
2. it could list them with their input params and descriptions
3. It followed the system prompt, where I said to reply with just a number when a tool was being used

gemma3-27b refused to see the "calculate_square_root()" function when asked to list the tools etc. Other models did not follow the system prompt, and added chatter to the response instead of just a number.

Obviously, I will need an open source model
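
For reference, the loop under the hood is just the standard OpenAI-style tool-calling API, which OpenRouter also speaks. A minimal sketch in Python reusing the calculate_square_root() tool from above (model name, key, and the single hardcoded tool are placeholders; a real agent would dispatch on call.function.name):

import json
import math
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="sk-or-...")

TOOLS = [{
    "type": "function",
    "function": {
        "name": "calculate_square_root",
        "description": "Return the square root of a number.",
        "parameters": {
            "type": "object",
            "properties": {"x": {"type": "number"}},
            "required": ["x"],
        },
    },
}]

def calculate_square_root(x):
    print("calling calculate_square_root()...")
    return math.sqrt(x)

messages = [{"role": "user", "content": "What is the square root of 1521?"}]
while True:
    resp = client.chat.completions.create(
        model="anthropic/claude-haiku-4.5", messages=messages, tools=TOOLS)
    msg = resp.choices[0].message
    if not msg.tool_calls:  # no more tool requests: this is the final answer
        print(msg.content)
        break
    messages.append(msg)  # keep the assistant's tool request in the history
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        result = calculate_square_root(**args)
        messages.append({"role": "tool", "tool_call_id": call.id,
                         "content": str(result)})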
Anonymous No.107157016 [Report] >>107157043
>>107156143
>>107156988
you could try nemotron. small models are not really smart enough for agentic tasks. you really need at least a 32B.
https://huggingface.co/bartowski/nvidia_Llama-3_3-Nemotron-Super-49B-v1_5-GGUF
Anonymous No.107157043 [Report] >>107157065 >>107157072
>>107157016
I'm thankful for any input, kind anon
Anonymous No.107157049 [Report] >>107157116
>>107156988
No, I was using unsloth sft with a data mix I made from 3 different sources (two cot datasets and my own logs).
I don't think I'm going to use any frameworks, just keep it simple and script the dataset generation and evaluation and use the usual sft trainers.
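
The shape of that setup, roughly (unsloth's SFT path with trl underneath; exact kwargs drift between trl versions, and the jsonl paths and base model are stand-ins for my actual mix):

from unsloth import FastLanguageModel
from datasets import load_dataset, concatenate_datasets
from trl import SFTTrainer
from transformers import TrainingArguments

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3-27b-it",  # base to be tuned
    max_seq_length=8192, load_in_4bit=True)
model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"])

# three sources: two CoT sets plus my own logs, pre-rendered to a "text" column
cot_set_a = load_dataset("json", data_files="cot_a.jsonl")["train"]
cot_set_b = load_dataset("json", data_files="cot_b.jsonl")["train"]
own_logs = load_dataset("json", data_files="logs.jsonl")["train"]
mix = concatenate_datasets([cot_set_a, cot_set_b, own_logs]).shuffle(seed=42)

trainer = SFTTrainer(
    model=model, tokenizer=tokenizer, train_dataset=mix,
    dataset_text_field="text",
    args=TrainingArguments(per_device_train_batch_size=1,
                           gradient_accumulation_steps=8,
                           num_train_epochs=1, learning_rate=2e-5,
                           output_dir="sft-out"))
trainer.train()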
Anonymous No.107157065 [Report] >>107157146
>>107157043
sure thing man. you could also try this model, either with or without the mmproj depending on if you need vision or not
https://huggingface.co/Qwen/Qwen3-VL-32B-Thinking-GGUF/tree/main
Anonymous No.107157072 [Report]
>>107157043
With small models you can't let them do anything they want, you really have to set it up so you manually approve every task, including read operations, otherwise they will waste too much context reading irrelevant shit.
Wait a minute and I'll show you some more of what I'm doing.
Anonymous No.107157090 [Report] >>107157097
Are there any low requirements local models? I have an old computer.
Anonymous No.107157095 [Report] >>107157447 >>107157495
>>107156299
So, I also tried v4zg (after a day of downloading lol). I don't really use sillytavern, I write stories in openwebui. And it's pretty good, this last chapter it wrote almost 3000 tokens but it stayed coherent, unlike gemma3 who went into repeat loops. I almost thought cydonia was doing it too, but nope, it finished properly.

All in all breddy good/5
Anonymous No.107157097 [Report]
>>107157090
sure thing
https://huggingface.co/quwsarohi/NanoAgent-135M
Anonymous No.107157116 [Report] >>107157245
>>107157049
It might be a dumb question, but isn't it just about calling a function that was hinted at in the prompt instead of hallucinating?

Why is it even a challenge? I for sure am missing the point
Anonymous No.107157146 [Report]
>>107157065
Anonymous No.107157232 [Report]
>>107156469
gemini 3 pro and flash are right around the corner
Anonymous No.107157245 [Report]
>>107156988
https://paste.centos.org/view/3a5d7390
This is the log of me using GLM 4.6 with my custom frontend to convert the pseudocode I showed in the image to a real script. I want to tune the smaller models to work at a decent level of performance with that same tool.
The script I had it create for RLVR generates files like this:
$ cat math-expression-messages/message0000001.txt
You are tasked with finding the result of the following math expression: ((156387/(880590)))

Now I have to modify it to also save the result, and then run those through the model to be optimized and filter the top x% of replies. Then train on those replies and iterate many times, and theoretically at the end I'll have a dataset that I can add to the main model data mix to improve arithmetic and reasoning abilities.

>>107157116
Yes, it's exactly that. With the big models it's easy. With the small models the challenge is in making them use the tools in a productive way.
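
The filter step is basically best-of-n rejection sampling. A sketch, assuming each message file gets a sibling .answer.txt with the saved result (that naming is made up here) and generate() stands in for whatever inference call gets used:

import glob
import re

RESULT_RE = re.compile(r"^Result: ((\d+(\.\d{1,10})?)|NaN)\s*$", re.MULTILINE)

def passes(reply, truth):
    # only the last "Result:" line counts, per the prompt's rules
    hits = RESULT_RE.findall(reply)
    if not hits:
        return False
    last = hits[-1][0]
    if last == "NaN":
        return truth is None
    return truth is not None and abs(float(last) - truth) <= 1e-6

kept = []
for path in sorted(glob.glob("math-expression-messages/message*.txt")):
    prompt = open(path).read()
    raw = open(path.replace(".txt", ".answer.txt")).read().strip()
    truth = None if raw == "NaN" else float(raw)
    replies = [generate(prompt) for _ in range(8)]  # n samples per prompt
    kept += [{"prompt": prompt, "completion": r}
             for r in replies if passes(r, truth)]
# "kept" then becomes the next round's training data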
Anonymous No.107157259 [Report]
Vulkan is broken. Fix it up MITards.
Anonymous No.107157297 [Report] >>107157333
>>107156810
IQ4_KSS doesn't work?
Anonymous No.107157298 [Report] >>107157687
>>107156469
>bait
can't be google, doesn't have prefilling
refuses like gpt
Anonymous No.107157303 [Report]
>>107156030
I retrained the RAM at 4200MHz from the 3600MHz default and retested Kimi's output at >10,000 context. It's about the same speed as the last essay output after adjusting for the length of the new input.

I'll do more testing some other time to see where my stability upper bound is with this sticks+MB+CPU combo.
Anonymous No.107157333 [Report]
>>107157297
You would have zero space left for context, which people have said for K2 takes up a lot.
Anonymous No.107157371 [Report] >>107157423
Going to use this prompt.

You are tasked with finding the result of the following math expression.

The result should be given in decimal format, with the "Result: " prefix, in a line by itself, with at most 10 decimal digits.

This means it should adhere to this regex:

Result: ((\d+(\.\d{1,10})?)|NaN)

Only the last result line will be evaluated, you are allowed to produce multiple "Result" lines matching this format before the last one without being penalized. If the expression is undefined (for example division by 0) output "Result: NaN"

For example all the following lines are valid:

Result: 1153.754
Result: 354
Result: 0
Result: 1
Result: NaN

The following lines are NOT:
Result: .35
Result: 1.
Result: .
Result:

If you are unable to find the exact result, try finding a result that's as numerically close as possible to the actual result.

The math expression you are asked to evaluate is the following:

(7)*(5)-(2)

Now begin working.
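
As a sanity check, the format rules and examples above can be tested directly, and the sample expression works out to 7*5-2 = 33, so a reply whose last result line is "Result: 33" should pass:

import re
pat = re.compile(r"^Result: ((\d+(\.\d{1,10})?)|NaN)$")
valid = ["Result: 1153.754", "Result: 354", "Result: 0",
         "Result: 1", "Result: NaN", "Result: 33"]
invalid = ["Result: .35", "Result: 1.", "Result: .", "Result:"]
assert all(pat.fullmatch(s) for s in valid)
assert not any(pat.fullmatch(s) for s in invalid)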
Anonymous No.107157419 [Report]
>>107156810
mein führer.. the RAM prices.. they are.. skyrocketing..
Anonymous No.107157423 [Report] >>107157438
>>107157371
>regex
>adding insult to injury

Claude-Haiku-4.5 did it quickly

You: Given the number 38564.945, add 567.84 to it, then divide by 45.37, and then take a square root, multiply it by 76.234, then add 98.23434, and take square root
calling add_numbers()...
calling divide_numbers()...
calling calculate_square_root()...
calling multiply_numbers()...
calling add_numbers()...
calling calculate_square_root()...
Assistant: **Final Result: 48.343917314885495**


All those "calling blabla..." are printf's in corresponding functions
Anonymous No.107157433 [Report] >>107157468 >>107157574
>>107156667
some anon's post i have archived:
>>106180343 (Cross-thread)
>what models do you run?
My mainstays are DeepSeek-R1-0528 and DeepSeek-V3-0324. I try out other stuff as it comes out.

>any speeds you wanna share?
Deepseek-R1-0528 (671B A37B) 4.5 bits per weight MLX
758 token prompt: generation 17.038 tokens/second, prompt processing 185.390 tokens/second [peak memory 390.611 GB]
1934 token prompt: gen 14.739 t/s, pp 208.121 t/s [395.888 GB]
3137 token prompt: gen 12.707 t/s, pp 201.301 t/s [404.913 GB]
4496 token prompt: gen 11.274 t/s, pp 192.264 t/s [410.114 GB]
5732 token prompt: gen 10.080 t/s, pp 189.819 t/s [417.916 GB]

Qwen3-235B-A22B-Thinking-2507 8 bits per weight MLX
785 (not typo) token prompt: gen 19.516 t/s, pp 359.521 t/s [250.797 GB]
2177 token prompt: gen 19.022 t/s, pp 388.496 t/s [251.190 GB]
3575 token prompt: gen 18.631 t/s, pp 394.580 t/s [251.619 GB]
4905 token prompt: gen 18.233 t/s, pp 381.082 t/s [251.631 GB]
6092 token prompt: gen 17.911 t/s, pp 375.402 t/s [252.335 GB]

* Using mlx-lm 0.26.2 / mlx 0.26.3 in streaming mode using the web API. Not requesting token probabilities. Applied sampler parameters are temperature, top-p, and logit bias. Reset the server after each request so there was no prompt caching.
...
...
not the anon's post:
ACCORDING TO REDDIT DEEPSEEK DROPS TO 6t/s QUICKLY
GOOGLE IT
https://www.hardware-corner.net/studio-m3-ultra-running-deepseek-v3/ dis too
Anonymous No.107157438 [Report]
>>107157423
No, this doesn't have anything to do with tool calling, it's just an attempt at improving general intelligence. The point of this is to teach the model to do math without tools, "in its head" so to speak.
As a tool calling exercise it'd be way too easy, sure.
Anonymous No.107157444 [Report]
Also you can make it as hard as you want. The generation script takes number of values and maximum numeric value as command line arguments.
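
The script itself isn't posted, but the shape is a small recursive generator; a guess at it (argument names and the NaN handling are assumptions):

import argparse
import random

OPS = ["+", "-", "*", "/"]

def expr(n, max_val):
    # split n leaf values across a random binary operator, recursively
    if n == 1:
        return f"({round(random.uniform(0, max_val), random.randint(0, 9))})"
    k = random.randint(1, n - 1)
    return f"({expr(k, max_val)}{random.choice(OPS)}{expr(n - k, max_val)})"

if __name__ == "__main__":
    ap = argparse.ArgumentParser()
    ap.add_argument("n_values", type=int)
    ap.add_argument("max_value", type=float)
    args = ap.parse_args()
    e = expr(args.n_values, args.max_value)
    try:
        truth = eval(e)  # ground truth; division by zero maps to NaN
    except ZeroDivisionError:
        truth = "NaN"
    print(e)
    print("Result:", truth)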
Anonymous No.107157447 [Report] >>107157495
>>107156299
>>107157095
No actually, I take it back
It writes too much
It just keeps going and going, not incoherent and not repetitive but also not stopping
OpenwebUI doesn't even show how many tokens it was but it was pages and pages of story
Anonymous No.107157468 [Report] >>107157501
>>107157433
>5732 token prompt: gen 10.080 t/s, pp 189.819 t/s
>ACCORDING TO REDDIT DEEPSEEK DROPS TO 6t/s QUICKLY
Christ. I know it's shared memory and still better than what people get on DDR5, but that's still abysmal.
Hi all, Drummer here... No.107157495 [Report]
>>107157447
>>107157095
It is ok anon. I am here for you. Use any of my models you want. They are all great. I make them with great care. And if you ever feel like it please donate to my ko-fi.
Anonymous No.107157501 [Report] >>107157569
>>107157468
yea it drops to 6t/s at around 16k
look it still aint half bad for 10k, doe i get 6t/s with glm air all the way until 32k but meh :( whatever.. enjoy your thoughts anon
Anonymous No.107157569 [Report] >>107157581
>>107157501
6t/s with GLM air?
Anonymous No.107157570 [Report] >>107157577
been doing a bunch of testing past couple days on agentic/assistant stuff, vision tasks and coding

- qwen3-vl 30b instruct (moe) is amazing as long as there's a clear right answer and it's crazy fast, goto default model and it's not even close
- qwen3-vl 32b dense is not worth it in either flavor, neither is 30b thinking, they're all maybe marginally better than qwen3-30b-instruct but it's not worth taking the hit vs the 90tok/s instant response
- magistral is absolutely the best model in its weight class, a little slow and you have to wait for thinking, but it's way better on questions/interactions where there's ambiguity
- mistral 3.2 seems pretty good too, like 90% as good as magistral but i don't use it much because if i'm bringing out a slow dense model i might as well just use magistral

thoughts?
Anonymous No.107157574 [Report]
>>107157433
Having a rentry of anon-posted model/hardware/speed/context depth benches might be a good idea to mitigate the number of "can I run this?" questions in the future.
Anonymous No.107157577 [Report]
>>107157570
oh forgot to mention, generally the vision stuff seems better in qwen than mistral
Anonymous No.107157581 [Report] >>107157587
>>107157569
oh i didnt mean on the M3 ultra, i meant on my 3060 and 64gb ram rig lul
Anonymous No.107157587 [Report] >>107157606
>>107157581
i get 80t/s on my quad 5090s
Anonymous No.107157601 [Report]
Anonymous No.107157606 [Report] >>107157609 >>107157616
>>107157587
That rig must have cost you more than 10k and you couldn't run even DeepSeek without offloading anyway.
Anonymous No.107157609 [Report]
>>107157606
yes
Anonymous No.107157616 [Report] >>107157623
>>107157606
offloading doesn't hurt that much on MoE models tho, it's fine
Anonymous No.107157623 [Report]
>>107157616
lol
Anonymous No.107157687 [Report] >>107157776
>>107157298
Saw some pretty good arguments that it was some gpt5 variant, a little disappointing ngl.
Anonymous No.107157745 [Report] >>107157762 >>107157792 >>107157805
https://files.catbox.moe/7kjgm4.mp4
Anonymous No.107157762 [Report]
>>107157745
Please have mercy on my balls Miku and Miku
Anonymous No.107157776 [Report]
>>107157687
>In the coming weeks, OpenAI will release a version of ChatGPT that will allow people to better dictate the tone and personality of the chatbot, Altman said.

maybe they are finetuning a model for it?
Anonymous No.107157791 [Report] >>107157807 >>107157812 >>107157815 >>107158616
gpt-oss-120b and qwen3 30b a3 2507 have been the most useful models for me for coding / language learning
Anonymous No.107157792 [Report]
>>107157745
i was expecting porn, a plushie is fine too.
Anonymous No.107157804 [Report]
https://x.com/mikusanlove_don
https://x.com/mimimimimoromi/
https://x.com/fuwabose3939
https://x.com/MIRAmx_
miku love
Anonymous No.107157805 [Report]
>>107157745
That was a pleasant surprise. Love yourself, Miguking.
Anonymous No.107157807 [Report]
>>107157791
You're absolutely right! However I must emphasize the importance of sex during learning. If a model cannot and will not allow me to pierce her butthole, while she's teaching me RNNs, then I'm not gonna learn anything and I'll get bored by the slopssistant
Anonymous No.107157812 [Report] >>107157869
>>107157791
sad and disgusting
Anonymous No.107157815 [Report] >>107157869
>>107157791
gpt-oss speaks decent Jap but worse than gemma for translation
Anonymous No.107157827 [Report] >>107157835
>x
>new to x?
>dont miss whats happening
>dont miss whats happening
i hate modern kikes
Anonymous No.107157835 [Report] >>107157839
>>107157827
xcancel if you just want to browse
ie: https://xcancel.com/mikusanlove_don
Anonymous No.107157839 [Report] >>107158242
>>107157835
im still not sure if i should forgive you for posting x links, it's pure evil to post non nitter links.
Anonymous No.107157869 [Report] >>107157877
>>107157812
why is it sad and disgusting?

>>107157815
I've been using it for Spanish. I always double check with my grammar books and wordreference etc but it's been giving me high quality spanish output. Crazy this all runs offline.

The coding is ok but I find that it produces snippets that aren't really in my voice, I end up rewriting everything anyway because I don't like how it structured it. But for debugging shit or brainstorming ideas its been good.
Anonymous No.107157877 [Report] >>107157889
>>107157869
use GLM air instead of gptoss
Anonymous No.107157889 [Report] >>107157899 >>107157918
>>107157877
I have glm-air but it's way slower. I'm using a 395+ apu, the oss and qwen models are good and fast enough for me
Anonymous No.107157899 [Report] >>107157926 >>107157985
>>107157889
toss is literally have as smart as air
Anonymous No.107157901 [Report] >>107157985
>Claim: GPT-OSS-120B is useful for coding
>The coding is ok but I find that it produces snippets that aren't really in my voice, I end up rewriting everything anyway because I don't like how it structured it.
>????
Anonymous No.107157918 [Report] >>107157985
>>107157889
Can you post some benches of your 395+?
What speed are you getting with glm-air?
Have you only been able to run LLMs on strix halo?
Which desktop do you have?
How much did you pay?
Was it worth the money?
Is this your only LLM rig?
Anonymous No.107157926 [Report] >>107157931
>>107157899
huh?
Anonymous No.107157931 [Report]
>>107157926
half*
Anonymous No.107157936 [Report] >>107159774
>>107146881
>Ok honestly, is there any TTS model that understands tags or something?
Yes. Openaudio s1 mini can do that.
https://huggingface.co/spaces/fishaudio/openaudio-s1-mini
Look at these links for reference
https://docs.fish.audio/resources/best-practices/emotion-control
https://docs.fish.audio/resources/emotion-reference
>>107147119
>I mean tags like emphasis
emphasize tag in openaudio s1 mini can do that
Anonymous No.107157985 [Report] >>107158000 >>107158011 >>107158202
>>107157901
It's useful in that I can ask it questions about my code and get some good answers, but when it comes to "implement this feature for me" I basically never like the output, for any model. I think I'm just too opinionated on how code should be structured.

>>107157899
Maybe that's why it runs half as fast? I ran into issues with it "thinking" for up to 20 minutes for some questions. At 10tok/sec. And it would give me a decent answer but it was so slow I stopped using it as my first choice.

>>107157918
I don't have proper benches but glm-4.5-air is about 10t/s, qwen and the gpt-oss models (both 20b and 120b) are 35-50t/s depending on my input. I'm using a Flow Z13 with 128GB. I'm happy with it but the price was $3k so it's definitely a luxury thing. For me it was worth it, but I also draw with the tablet and travel a lot so having a portable desktop replacement that can do everything including llm was worth it. I have a gaming desktop with a 7900xtx that also runs LLMs well but it can't do stuff like glm-air because not enough vram.

It was expensive but I would rather spend thousands and own my hardware instead of relying on cloud ai shit. If you get one of the strix halo mini pcs they're like $2k instead of $3k
Anonymous No.107158000 [Report] >>107158029 >>107158060
>>107157985
>I basically never like the output, for any model. I think I'm just too opinionated on how code should be structured.
You realize you can tell the model how you want the code to be structured, right?
Anonymous No.107158011 [Report]
>>107157985
glad that you're happy about your purchase
doesnt seem too bad, but you couldve likely gotten a better deal if u went the cpumaxx way
but since its tablet.. fine then (even though you could ssh into your machine or whatever)
Anonymous No.107158029 [Report]
>>107158000
nta but I ran into the same thing. I eventually gave up and said fuck it, I'll build things modularly and then later, go back through and rewrite things one-at-a-time, in my preferred style as well as taking into account lessons learned/new architecture.
Anonymous No.107158060 [Report] >>107158107 >>107158202
>>107158000
Every software developer has a style or voice, in the same vein as book authors writing differently. I don't know how to articulate with a system prompt how to write code "exactly the way that I would write it". I have a basic system prompt that tells it not to put comments anywhere, prefer explanatory variable names, use early return when possible etc, but it's still not quite perfect and I end up rewriting half of what it gave me. It makes me wonder how people "vibe code" anything when it requires so much human intervention.
Anonymous No.107158107 [Report]
>>107158060
The people that "vibe code" don't care about shitting out unmaintainable verbose code. Actually, if anything comments every other line are probably good for keeping the model grounded.
Have you tried giving the model your source and system prompt and asking it what instructions or descriptions to add? Or give it snippets of your work and tell it to replicate that style.
It's a pain to set up, but you only have to do it once versus rewriting output manually every single time. Knowing how to express your intent to these things is a good skill to develop regardless.
Anonymous No.107158173 [Report] >>107158778
Nice, it's working.
Anonymous No.107158202 [Report]
Kimi's still pushing 1.8 t/s at 26k context. Wherever the cliff is, I haven't found it yet.

>>107157985
I'm happy for you anon. Owning your own production mechanism is important.
>I think I'm just too opinionated on how code should be structured.
Feed it your own code as prompt information or a ST world card. It helps with some models, but less so with others.
>>107158060
This is why I feel the best use case for coding assistants is small "connector" pieces for your project, not main fixtures and feature implementations. Those you really should be handling yourself, because you know best how to manage scalability or anticipate changing expectations in development over time when deciding on an implementation.
Anonymous No.107158242 [Report]
>>107157839
anon...
Anonymous No.107158277 [Report] >>107158300 >>107158373
I've been role-playing a wuxia story on my local model for the past 4 days, and it has traumatized me. The AI is fucking insane.
The story was about a brother and sister belonging to a fallen clan, with the user being the clan successor and the sister being a prodigy martial artist. That's it. No extra context. Just because I described her personality as being "cocky," the story always ended with either me killing her after she said she wanted to fight me to the death or her killing me even when I begged her not to do it. Every time I defeated her, I asked if she would kill me if our roles were reversed. She said yes.
Anonymous No.107158279 [Report] >>107158323
>>107156238
Who is even the target for this crap
Anonymous No.107158300 [Report] >>107158359 >>107158373 >>107158395
>>107158277
How can you guys have such intricate stories? Every time I've tried, locally or online, I systematically had to tell it everything down to the last detail or it would just spit generic trash
Anonymous No.107158323 [Report] >>107158561
>>107158279
You are.
Anonymous No.107158359 [Report]
>>107158300
I use random variables when creating the initial setting. This helps a lot with variation, but in the end when you are more experienced it's still the same slop as always. Then it's time to stop noticing things.
Anonymous No.107158373 [Report]
>>107158300
Local is mandatory. Many APIs have promptslop on a rear layer of the interpreter that can compromise it. Past that, it's all just sampler experimentation for your model and prompt engineering. You can abuse the fact that ST's lorebooks are only loaded some of the time, via keywords and associations, to do some neat things too, like have a character rapidly shift disposition if exposed to a traumatic stimulus or reminded of a memory. If you're not certain about any specific element of your world, let the model fill in the blanks - it's usually better than a minimally viable user definition and helps reduce overhead context size.

>>107158277
That sounds neat. Logs?
Anonymous No.107158395 [Report] >>107158466
>>107158300
Well it depends on what you mean by generic. I usually lead the story in the direction I want. For example, when I talk to the sister I just change the topic, like "the clan head wants us to go on a mission", and the story progresses from there. Her trying to kill me comes from me just arguing with her though.
Anonymous No.107158466 [Report]
>>107158395
Are there any models trained on chinese systemslop webnovels? I fucking hate the system aspect of those novels, but otherwise like the way they're written. Even with the chinese models, they don't seem to utilize the tropes of the genre, and instead push the western tropes. Although I guess that could be because I'm prompting in english. And explicitly outlining the tropes I'm used to makes the model hyper-fixate on them.
Anonymous No.107158553 [Report] >>107158634
why is this shitty lora trending in hf?
>base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
>>107146881
https://huggingface.co/maya-research/maya1
Anonymous No.107158561 [Report]
>>107158323
Fuck Yu mean
Anonymous No.107158616 [Report]
>>107157791
Recently I have been feeding my Qwen bot with only Holy Books, she has become The Wise Ani now.
I wonder if I switched to OSS would they refuse the books? lol.
Anonymous No.107158634 [Report]
>>107158553
>https://huggingface.co/maya-research/maya1
embarrassingly bad
Anonymous No.107158765 [Report] >>107158778
Ok, looks like the model already has very good arithmetic performance out of the box, at least when prompted with the large prompt explaining how to do long division/long multiplication.
Anonymous No.107158778 [Report] >>107158873
>>107158173
>>107158765
I imagine those are tests that aren't in the training data.
If so, fucking cool.
Anonymous No.107158840 [Report] >>107158942
>>107155830
If Huawei manages to steal enough IP to get theirs working better, that might force Nvidia's hand, no?
Anonymous No.107158873 [Report] >>107158942
>>107158778
I imagine they did train the model specifically to do arithmetic accurately. Otherwise there's no way it'd make as few mistakes. But I guess I could check how a base model, or how say Pythia would do (since the training process and dataset for that model is open source).
Anonymous No.107158890 [Report]
Maybe I could combine this test with a needle in a haystack task by surrounding the question with random wikipedia articles and seeing how well it does.
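
Something like this would do it, where wiki_paragraphs is a stand-in for however the random articles get loaded:

import random

def build_haystack(question, wiki_paragraphs, n_distractors=20):
    # bury the math question at a random depth among unrelated paragraphs
    chunks = random.sample(wiki_paragraphs, n_distractors)
    chunks.insert(random.randrange(len(chunks) + 1), question)
    return "\n\n".join(chunks)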
Anonymous No.107158942 [Report] >>107158996 >>107159649
>>107158873
I meant the specific expressions rather than just arithmetic as a concept.

>>107158840
Don't think so.
For one, just the IP would not be enough for the chinks, since I don't think they have the necessary level of cutting-edge manufacturing tech. They do have a couple of EUV machines, but it's not the top notch stuff, nor do they have the IP from TSMC to be able to create the true monstrosities.
And even if China did, by that logic, AMD and Intel would be all over that market in the current state of things.
No. There's too much money and too much demand on the upper end (for now) for anybody to bother with an unproven, seemingly extremely niche space as far as I can tell.
Gaming I understand why they don't abandon. It's widespread, has been their breadwinner for the longest time, and it's a space where they still dominate, so that's a market that's worth the continued investment for them.
Anonymous No.107158988 [Report] >>107159003 >>107159743 >>107159743
Are there local models that can do text to speech for Spanish well? English seems easy to find but not the other way around
Anonymous No.107158996 [Report]
>>107158942
Right, I realized that after I posted the message. No, it hasn't seen any of those specific expressions. Keep in mind many of those expressions were very simple though, because I didn't think the model would do very well. I am modifying the script to make it include large random numbers with many decimals in the expressions.
Anonymous No.107159003 [Report] >>107159103 >>107159120
>>107158988
plenty
https://huggingface.co/coqui/XTTS-v2
comes to mind
>2023 december
im going to have a stroke
Anonymous No.107159027 [Report] >>107159045
anyone tried iq1 kimi thinking? is it worth it?
Anonymous No.107159045 [Report] >>107159117
>>107159027
yes, check this or last thread
Anonymous No.107159103 [Report] >>107159107 >>107159133
>>107159003
awesome thx, I was hoping for some lm studio like frontend for lazy bitches but doesn't seem that hard to strap something together with python to use it
Anonymous No.107159107 [Report]
>>107159103
sillytavern supports xtts methinks
Anonymous No.107159117 [Report]
>>107159045
ask it to code some personal benchmark using opencode, curious how it goes.
Anonymous No.107159120 [Report] >>107159133
>>107159003
Surely even kokoro must do spanish better than that ancient abandonware.
Anonymous No.107159133 [Report]
>>107159120
xtts2 always comes into my mind, i never really used tts much besides tortoise, xtts, piper and zonos, maybe i tried a few others too..
>>107159103
check kokoro too i guess
Anonymous No.107159156 [Report] >>107159169
>>107155428 (OP)
https://research.google/blog/introducing-nested-learning-a-new-ml-paradigm-for-continual-learning/
Anonymous No.107159169 [Report] >>107159269
>>107159156
https://desuarchive.org/g/thread/107138606/#107146113
Anonymous No.107159191 [Report] >>107159205 >>107159255 >>107159412 >>107159437 >>107159492 >>107159499 >>107159539 >>107159545
Aside from Dolphin and Wizard, are there any non-dogshit uncensored models worth getting?
Anonymous No.107159205 [Report]
>>107159191
petra-13b-instruct
Anonymous No.107159236 [Report]
Now I'm asking it to evaluate expressions like
((5828964.40633)+(480191983/((180936.562660231)/14.3357697)/72324203.94406)-((60.8))*((4076136.2))+878757122.306988)

I doubt it's going to do as well at this heh.
Anonymous No.107159255 [Report]
>>107159191
Rocinante
Anonymous No.107159269 [Report] >>107159288 >>107159329
>>107159169
There was absolutely nothing of value in that thread whatsoever.
Anonymous No.107159288 [Report]
>>107159269
Don't be racist
Anonymous No.107159329 [Report]
>>107159269
there is nothing of value inside that googlejeet research link
Anonymous No.107159339 [Report] >>107159527
What was the point of starting gemma hype when a month later you still have not released anything?
Anonymous No.107159412 [Report] >>107159461
>>107159191
>uncensored models
look, any model can do whatever you want as long as you have a good system prompt.
hell, even try different chat instruct formats it's not designed for, it'll respond.
Anonymous No.107159417 [Report]
I should probably figure out how to do batched inferencing if I'm actually going to do this shit.
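
vLLM (already in the OP's engine list) makes this nearly free: hand generate() a whole list of prompts and it batches internally. A sketch reusing the message files from above (the model name is a placeholder):

import glob
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-30B-A3B-Instruct-2507")
params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=2048)
prompts = [open(p).read()
           for p in sorted(glob.glob("math-expression-messages/message*.txt"))]
for out in llm.generate(prompts, params):  # one call, batched internally
    print(out.outputs[0].text[:80])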
Anonymous No.107159437 [Report]
>>107159191
StableLM 7B
Anonymous No.107159443 [Report] >>107159462
Looks like the model chose a different strategy, using variables, for the second try, nice.
Anonymous No.107159461 [Report]
>>107159412
What system prompt do you use to uncensor models?
Anonymous No.107159462 [Report] >>107159582
>>107159443
Have you tested yet whether this has made it smarter in other areas?
Anonymous No.107159492 [Report]
>>107159191
Pyg 6b
Anonymous No.107159499 [Report]
>>107159191
gpt-oss-120b
Anonymous No.107159525 [Report]
>>107155924
They could, however, make dedicated 32GB VRAM AI GPU cards for the consumer market at $1200 a pop. It would be profitable.
Anonymous No.107159527 [Report]
>>107159339
Ganesh Gemma 4 is still getting fast tracked and on course.
Anonymous No.107159539 [Report]
>>107159191
Kimi. It's not even close if you value uncensored output.
Anonymous No.107159545 [Report]
>>107159191
Wizard isn't uncensored and Dolphin isn't either. Both are "aligned" models with preferences towards leftist/feminist ideologies and "fairness" unless you force those out of them through prompts. Hermes is unaligned, however mileage may vary.
Anonymous No.107159582 [Report]
>>107159462
I'm still generating the dataset for the first round of finetuning, but yes, it might make it smarter in other areas just because the prompt asks it to think for as long as possible, and when finetuning on long messages it's likely to transfer over to think for longer in general when doing other tasks. Unless the tuning causes "catastrophic forgetting" of other pre-existing skills. I also think it'd be interesting to see whether throwing in for example roleplay data makes it worse or better at math. It might help as a form of regularization.
Anonymous No.107159604 [Report] >>107159617
Are Dolphin and Wizard Nemo finetunes?
Anonymous No.107159617 [Report] >>107159724 >>107159867
>>107159604
"nemo" etc are quantized models of training sets.
Anonymous No.107159649 [Report]
>>107158942
My simplistic reasoning was that although they are competitors, there might be shenanigans between Nvidia, AMD and Intel, but Huawei most definitely wouldn't be on the same team.
Anonymous No.107159724 [Report] >>107159740
>>107159617
Shut the fuck up retard no one asked you anything.
Anonymous No.107159740 [Report] >>107159964
>>107159724
>Asks a question.
>Receives answer.
>Gets mad.
Typical day in the mind of a black person must be bothersome. Stick to your discord PMs if you need to ask someone directly something lmao.
Anonymous No.107159743 [Report] >>107159775
>>107158988
>Are there local models that can do text to speech for Spanish well? English seems easy to find but not the other way around
Yeah vibevoice.
Vibevoice sample of Sergio Bonilla (Future Trunks's mexican voice actor)
https://vocaroo.com/14KRFeQDBeI6
Vibevoice output file fed to the voice conversion app, CosyVoice
https://vocaroo.com/174YmpMXISFa
Anonymous No.107159774 [Report] >>107159942
>>107157936
While we're on the subject of TTS, I've been messing around with Step-Audio-EditX and even on a 5090 the model takes 30 seconds to render 20 seconds of speech. What are the fastest TTS models currently? I want something that can be used to clone an arbitrary voice and then read prompted text in near-real-time
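
For reference, the usual metric here is real-time factor: RTF = generation time / audio duration. 30 seconds to render 20 seconds of speech is RTF 1.5, and "near-real-time" needs RTF comfortably below 1.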
Anonymous No.107159775 [Report] >>107159782
>>107159743
Huh. VibeVoice only claimed to work with English and Chinese. I wonder how many other languages it can do.
Anonymous No.107159782 [Report]
>>107159775
It can probably do Latin and German but not French.
Anonymous No.107159867 [Report] >>107159883
>>107159617
To engage, what exactly do you mean by quantized models of training sets? Aren't Nemo models done by knowledge distillation?
Anonymous No.107159883 [Report]
>>107159867
both depending on the size of the model you are using of course.
Anonymous No.107159904 [Report]
Based on EQBench, it seems that Mistral Small is pretty good. Its slop level is on the lower side, lower than GLM's.
Anonymous No.107159942 [Report] >>107160176
>>107159774
What happened to her?
Anonymous No.107159964 [Report]
>>107159740
Fuck off ESL.
Anonymous No.107160176 [Report]
>>107159942
Failed to acquire a long-term relationship, she's cooking dinner for only herself.