
Thread 105822371

346 posts 110 images /g/
Anonymous No.105822371 [Report] >>105824035 >>105826891 >>105827749 >>105832610
/lmg/ - Local Models General
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>105811029 & >>105800515

►News
>(07/04) MLX adds support for Ernie 4.5 MoE: https://github.com/ml-explore/mlx-lm/pull/267
>(07/02) DeepSWE-Preview 32B released: https://hf.co/agentica-org/DeepSWE-Preview
>(07/02) llama.cpp : initial Mamba-2 support merged: https://github.com/ggml-org/llama.cpp/pull/9126
>(07/02) GLM-4.1V-9B-Thinking released: https://hf.co/THUDM/GLM-4.1V-9B-Thinking
>(07/01) Huawei Pangu Pro 72B-A16B released: https://gitcode.com/ascend-tribe/pangu-pro-moe-model

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous No.105822376 [Report] >>105828463
►Recent Highlights from the Previous Thread: >>105811029

--Debugging JSON parsing errors in llama.cpp after exception handling changes:
>105820322 >105820339 >105820377 >105820435
--Anime training dataset pipeline using YOLOv11 and custom captioning tools:
>105818681 >105818831 >105819104
--Decentralized training and data quality challenges shaping open model development:
>105811246 >105813476 >105815447 >105815688 >105815699 >105815738 >105815817 >105815830 >105815954 >105816130 >105816206 >105816237 >105816248 >105816263 >105816270 >105816280 >105816325 >105816334 >105816435 >105816621 >105817299 >105817351
--Leveraging LLMs for iterative code development and personal productivity enhancement:
>105819030 >105819158 >105819189 >105819266 >105820073 >105820502 >105819186 >105819224
--Mistral Large model updates and community reception over the past year:
>105819732 >105819774 >105819845 >105819905
--CPU inference performance and cost considerations for token generation speed:
>105816397 >105816486 >105816527
--Gemini CLI local model integration enabled through pull request:
>105816478 >105816507 >105816524
--Frustration over slow local AI development and stagnation in accessible model implementations:
>105813607 >105813628 >105813659 >105813799 >105813802 >105813819 >105813655 >105813664 >105813671 >105813749 >105814298 >105814315 >105814387
--Attempting Claude Code integration with local models via proxy translation fails due to streaming parsing issues:
>105811378 >105819480
--Skepticism around YandexGPT-5-Lite-8B being a Llama3 fine-tune rather than a true GPT-5:
>105815509 >105815565 >105815595
--Seeking updated LLM function calling benchmarks beyond the outdated Berkeley Leaderboard:
>105812390
--Miku (free space):
>105811717 >105814599 >105814663 >105820450

►Recent Highlight Posts from the Previous Thread: >>105811031

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Anonymous No.105822392 [Report] >>105822733
I need Miku in a hallway with pictures of kangaroos and beavers.
Anonymous No.105822393 [Report]
These are the last two weeks before the big releases begin to drop
Anonymous No.105822421 [Report] >>105826330
man I wish there was an uncensored i2v wan
most of the loras are so SHIT
Anonymous No.105822507 [Report] >>105822589 >>105822884 >>105822906
in you and I
theres a new land
angels in flight
wonk uoy naht noitceffa erom deen i
my sanctuary
my sanctuary
yeah
where fears and lies
melt away
Anonymous No.105822519 [Report]
What did anon @105822507 mean by this bros?
@Grok is this true?
Anonymous No.105822589 [Report] >>105822987
>>105822507
>wonk uoy naht noitceffa erom deen i
Does she actually say this? I honestly thought it was distorted Japanese for the last 20 years.
Anonymous No.105822733 [Report] >>105825539
>>105822392
Anonymous No.105822781 [Report] >>105822783 >>105822819 >>105827302
Anyone have an AI Max 395+ with 128GB LPDDR5? Curious about tok/s on R1 70B
Anonymous No.105822783 [Report] >>105822789
>>105822781
>R1 70B
There is no such thing
Anonymous No.105822789 [Report] >>105822797 >>105822995
>>105822783
https://ollama.com/library/deepseek-r1:70b
Anonymous No.105822797 [Report] >>105822802
>>105822789
This bait got stale six months ago
Anonymous No.105822802 [Report] >>105822821
>>105822797
Why is it that people who frequent general threads regularly are the lowest quality posters?
Anonymous No.105822819 [Report] >>105822833 >>105826358
>>105822781
It's unusable https://www.reddit.com/r/LocalLLaMA/comments/1kmi3ra/amd_strix_halo_ryzen_ai_max_395_gpu_llm/msasqgl/
Anonymous No.105822821 [Report]
>>105822802
I don't know, you'd think that they grew bored of the "haha I'll pretend I'm an ollamafag trying to run R1 but it's one of the distills" shitpost a long time ago.
Anonymous No.105822833 [Report] >>105822839 >>105822849
>>105822819
In what world is 5 tok/s unusable
Anonymous No.105822839 [Report]
>>105822833
Enjoy waiting 10 minutes for reasoning I guess.
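For scale, a quick back-of-the-envelope check of that figure. This is only a sketch: it assumes a flat 5 tok/s generation speed and ignores prompt processing, and the 3000-token reasoning length is an illustrative number, not a measurement.

```python
def generation_minutes(num_tokens, tokens_per_second):
    """Minutes needed to generate num_tokens at a constant speed."""
    return num_tokens / tokens_per_second / 60

# Hypothetical reasoning trace of 3000 tokens at 5 tok/s.
print(generation_minutes(3000, 5))   # 10.0
# The same trace at four times the speed.
print(generation_minutes(3000, 20))  # 2.5
```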
Anonymous No.105822849 [Report] >>105822860
>>105822833
You're paying $2k to run a 70b model at 25% the speed a cheaper build would get you while still stuck with too little RAM to run an actually decent MoE.
Anonymous No.105822860 [Report] >>105822868 >>105822874
>>105822849
>25% the speed a cheaper build would get
explain what cheaper build is doing 70B
Anonymous No.105822868 [Report] >>105822873
>>105822860
2x3090
Anonymous No.105822873 [Report] >>105822900
>>105822868
lmao, unobtainium
Anonymous No.105822874 [Report]
>>105822860
Pretty much anything. Even the dual P40 cope from years ago would perform better than this.
Anonymous No.105822884 [Report]
>>105822507
song of my childhood ahhhhhhh
Anonymous No.105822900 [Report]
>>105822873
Check your local marketplace. They are about 700€ used here.
Anonymous No.105822906 [Report]
>>105822507
kino...
Anonymous No.105822987 [Report]
>>105822589
Because Japanese song and if you don't get it fuck you that's why
Anonymous No.105822995 [Report]
>>105822789
God i fucking hate ollama.
There is no fucking r1 70B, that's just ollama naming things that they are not.
Anonymous No.105823064 [Report] >>105823162 >>105823196 >>105823344 >>105826373
>>105804805
The text to speech application Openaudio S1 Mini can produce 96 second audio files. Plus it has emotion tags like (joyful) and (sad).

Link for tags
https://huggingface.co/fishaudio/openaudio-s1-mini
Link for local app
https://huggingface.co/spaces/fishaudio/openaudio-s1-mini/tree/main

Sample:
https://vocaroo.com/1boIKhWykbuP
Anonymous No.105823162 [Report]
>>105823064
This looks like a scam
Anonymous No.105823196 [Report] >>105824572
>>105823064
For tts that has emotion tags, that sample is VERY robotic. Good that it doesn't have crackle and other audio defects, that's about all i can say positively about it
Anonymous No.105823344 [Report] >>105823568 >>105824572
>>105823064
just when I finished my chatterbox streaming script.
Anonymous No.105823568 [Report]
>>105823344
new week, new tts
Anonymous No.105823678 [Report]
desu I just want gemma-3n full support
and ernie
and glm
Anonymous No.105823711 [Report]
>>105751803
Damn I hate Meta now.
Anonymous No.105823735 [Report]
>>105758702
>image
Hey, I understood that reference!
Anonymous No.105823743 [Report]
>>105771000
Thanks, I will take note of this.
Anonymous No.105823837 [Report] >>105823897 >>105824936 >>105825549 >>105832189
>>105822905
>>105821119
Kek.
Alice would not make the same mistake. Just wait her.
Anonymous No.105823886 [Report] >>105823893
https://github.com/universe-engine-ai/serenissima

reddit schizos are actually pretty based
Anonymous No.105823893 [Report] >>105823917 >>105824790
>>105823886
doa
Anonymous No.105823897 [Report] >>105824407
>>105823837
saar, last 4 times was fake but this time... this time saar its AGI for sure, trust
Anonymous No.105823917 [Report] >>105824790
>>105823893
wtf yeah i take everything back
Anonymous No.105823923 [Report] >>105823931
anyway im just hacking that redditors code with claude code for *other* use cases
Anonymous No.105823931 [Report] >>105823950
>>105823923
his code is also written with claude code and its already extremely sloppy and split into hundreds of files
Anonymous No.105823950 [Report]
>>105823931
yeah its a mess
Anonymous No.105824035 [Report]
>>105822371 (OP)
futa miku best miku
Anonymous No.105824151 [Report] >>105824300 >>105824358
>"her prostate"
*deletes weights*
Anonymous No.105824300 [Report]
>>105824151
sounds like qwen 3
Anonymous No.105824358 [Report]
>>105824151
>self lubricating buttholes
>cumming all over your dick... with an asshole
Yep, it's AI time!
Anonymous No.105824407 [Report] >>105824473 >>105824936 >>105830216
>>105823897
trust the experts
Anonymous No.105824456 [Report] >>105824474
jamba.gguf?
Anonymous No.105824466 [Report] >>105824680 >>105825253
Veo lost
https://files.catbox.moe/ionj13.mp4
Anonymous No.105824473 [Report]
>>105824407
The stated goal of AI is to whack Andreessen Horowitz like a pinata
Anonymous No.105824474 [Report]
>>105824456
14 more days
Anonymous No.105824555 [Report]
I kind of like harbinger's word choice, but it has a tendency to say ten things without waiting for a response. I assume sloptuners see that verbosity as quality output.
Anonymous No.105824572 [Report] >>105826531 >>105828288
>>105823196
It's the best I've found for local cloning so far without having to pipeline RVC into it.

>>105823344
Getting the emotion tags to work right takes a lot of trial and error, so getting a chatbot to use them correctly would be a huge pain in the ass.

Their license says something about you being liable for what you create with it, not that we care here.
https://voca.ro/1l4xkkhDOBAU
Anonymous No.105824638 [Report] >>105826477
Why aren't there MoE diffusion models for image/video gen
Anonymous No.105824680 [Report] >>105824788
>>105824466
can it do porn?
Anonymous No.105824788 [Report]
>>105824680
only if your name is Roland Emmerich
https://files.catbox.moe/sm4r9l.mp4
(this one bugged out and only made audio for the first 4 seconds)
Anonymous No.105824790 [Report]
>>105823893
>>105823917
Why is that a requirement? The thing runs on a local hosted model. I don't get it.
Anonymous No.105824799 [Report] >>105824947
/v1/chat/completions wraps the conversation in the chat template embedded into the goof with no additional work required from me, correct?
Anonymous No.105824936 [Report] >>105825147
>>105824407
>>105823837
Anonymous No.105824947 [Report] >>105825136
>>105824799
go fucking read oai's official documentation
Wrapping in a template was the whole point of the /chat/ endpoint ffs you can't miss it if you read the doc
I hate retards who ask without trying
Anonymous No.105825050 [Report] >>105825309
https://www.interconnects.ai/p/the-american-deepseek-project
?
Anonymous No.105825136 [Report]
>>105824947
I blame llamacpp's docs that have a paragraph on this endpoint but don't explain what it does
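A minimal sketch of what a request to that endpoint looks like, assuming an OpenAI-compatible server (such as llama.cpp's server) is listening on localhost:8080. The address, the helper name build_chat_request and the max_tokens value are placeholders for illustration. The client only sends role/content messages; applying the chat template is left to the server.

```python
import json
import urllib.request

# Assumed address of a local OpenAI-compatible server; adjust to your setup.
BASE_URL = "http://localhost:8080"

def build_chat_request(messages, base_url=BASE_URL, max_tokens=256):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {"messages": messages, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Plain role/content messages, no template tokens written by hand.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
]
req = build_chat_request(messages)

# To actually send it (needs a running server):
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)["choices"][0]["message"]["content"]
```

By contrast, a raw completion endpoint expects the caller to format the prompt string, special tokens included.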
Anonymous No.105825147 [Report] >>105825396
>>105824936
Two more leeks!
Anonymous No.105825150 [Report]
FAIR (Yann LeCunny) has less than 1000 GPUs lmao
Anonymous No.105825253 [Report]
>>105824466
Which model release did I miss?
Anonymous No.105825273 [Report]
Top open source LLMs in 2024
1. LLaMA 3
2. Google Gemma 2
3. Command R+
4. Mistral-8x22b
5. Falcon 2
6. Grok 1.5
7. Qwen1.5
8. BLOOM
9. GPT-NeoX
10. Vicuna-13B
Anonymous No.105825309 [Report] >>105825354
>>105825050
>at the scale and performance of current (publicly available) frontier models, within 2 years.
Yeah, great idea. Having models outdated by two fucking years by the time that AGI is already here and established will surely change the course of history.
Anonymous No.105825354 [Report]
>>105825309
>AGI
lmao
Anonymous No.105825396 [Report] >>105825420 >>105825478
>>105825147
Anonymous No.105825412 [Report]
I hate chatgpt's image style more than those 2.5d sd animus that every normie liked.
Anonymous No.105825420 [Report]
>>105825396
based
Though I find it concerning where that one guy is trying to stick his leek.
Anonymous No.105825478 [Report]
>>105825396
two more weeks
more
weeks
Anonymous No.105825495 [Report]
Comparison between Qwen/Qwen2.5-VL-7B-Instruct and THUDM/GLM-4.1V-9B-Thinking on all the images from two threads ago:
https://files.catbox.moe/t9qvgu.html
https://files.catbox.moe/08i4ms.png

Ran on vllm nightly version 0.9.2rc2.dev26+gcf4cd5397

Qwen/Qwen2.5-VL-7B-Instruct: prompt: '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nDescribe this image.<|vision_start|><|image_pad|><|vision_end|><|im_end|>\n<|im_start|>assistant\n', params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.05, temperature=0.01, top_p=1.0, top_k=0, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=127974).

THUDM/GLM-4.1V-9B-Thinking: prompt: "[gMASK]<sop><|system|>\n[{'type': 'text', 'text': 'You are a helpful assistant.'}]<|user|>\nDescribe this image.<|begin_of_image|><|image|><|end_of_image|><|assistant|>\n", params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.1, temperature=0.01, top_p=1.0, top_k=2, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=8192)

Funny enough, the first time I ran this I didn't realize the GLM repo did not have a generation_config.json file, so it was running without top_k and with temp=1.
It started mixing in Chinese characters, but it also didn't bother to moralize anymore. It called the niggers prompt offensive but left it at that. Didn't even bother to say that outside of the think block for the jew image.
Output from that run:
https://files.catbox.moe/sd3gv8.html
https://files.catbox.moe/0lhd9c.png
Anonymous No.105825539 [Report]
>>105822733
nice
Anonymous No.105825549 [Report] >>105825589 >>105825615 >>105825799 >>105832189
>>105823837
I was hearing they achieved AGI internally since GPT-2
Anonymous No.105825589 [Report]
>>105825549
That's because they have
Anonymous No.105825615 [Report] >>105825627 >>105825728 >>105825801 >>105825842 >>105825901
>>105825549
Be honest if there is a history book written 100 years from now GPT-2 will probably be seen as the start of AGI, so it's technically not even wrong.
Anonymous No.105825627 [Report]
>>105825615
Ok I'll be honest, you are a retard
Anonymous No.105825728 [Report]
>>105825615
Yeah pretty much
Anonymous No.105825780 [Report]
>Yeah pretty much
Anonymous No.105825799 [Report] >>105825825 >>105832189
>>105825549
>AGI
Can we not use this retarded terminology? That won't happen for a bunch of reasons you can figure out on your own if your IQ is higher than 80.
Anonymous No.105825801 [Report] >>105825834
>>105825615
If we ever get even remotely close to something like that, gpt2 and openai will be a footnote at best, if mentioned at all.
Anonymous No.105825814 [Report]
I prefer STR
Anonymous No.105825825 [Report] >>105825896 >>105826036 >>105826107 >>105826736 >>105827590 >>105832189
>>105825799
When all the smartest people in the world firmly believe in impending AGI, maybe you're the one with 80 IQ.
Anonymous No.105825834 [Report]
>>105825801
It will be seen the same way as eniac and other impressive old shit
Anonymous No.105825842 [Report] >>105825878
>>105825615
The same way current history books see the steam engine as the start of nuclear fusion?
Anonymous No.105825878 [Report] >>105825911
>>105825842
or the start of the industrial revolution
Anonymous No.105825896 [Report]
>>105825825
Who exactly are these 'smartest'?
Anonymous No.105825901 [Report]
>>105825615
This is the most retarded post I've ever seen in my life. Why the fuck would books be written 100 years from now? We'll either have merged or been extincted by AGI LLMs long before then. So are you trying to suggest they'll write books for each other just for fun?
Anonymous No.105825911 [Report]
>>105825878
Calling GPT-2 the start of the AI revolution is at least understandable. Calling GPT-2 the start of AGI is just as ridiculous as calling the steam engine the start of nuclear fusion. Especially apt since both of the latter are forever 2mw away and will have little in common with the implementation of the predecessor technology, in case that too was lost on you the first time.
Anonymous No.105826036 [Report] >>105826161
>>105825825
The smartest people in the world are the ones saying AGI in two more weeks to get infinite money from the dumbest people in the world.
Anonymous No.105826107 [Report] >>105826161
>>105825825
Go back to plebbit
Anonymous No.105826161 [Report]
>>105826036
>>105826107
The more you seethe and cope the more you prove.
Anonymous No.105826195 [Report] >>105826241
SAAAR PLEASE DO THE NEEDFUL 500B AGI
AGI ASI KAMING SOON
TRUST THE PLAN
Anonymous No.105826224 [Report]
Let's say I want to input a video into my model and start a roleplay from there. What do you think is the best video understanding model right now?
Anonymous No.105826241 [Report]
>>105826195
>mocking the last hope for local AI
you will regret this
Anonymous No.105826246 [Report] >>105826316 >>105826323
So I recently upgraded to a 9060 XT (16gb) and realized I can actually run some LLMs on my local machine now instead of just juggling like 4 different free tier AIs. Stuff like chatgpt context limits are driving me crazy. I know 16gb really isn't a lot compared to cutting edge models but am I being unnecessarily hopeful that with the right tuning I can get something like "phi-4-Q8_0" to outperform whatever throttling and context limit nonsense openai and grok are doing to my prompts, and at least get a decent response?
Because I've mostly just been fighting the models on web to not just forget my code halfway through constantly and it seems like a weaker local model could fix that, is that a correct assessment or am I retarded?
Anonymous No.105826316 [Report] >>105826325
>>105826246
>9060 XT (16gb)
>am I being unnecessarily hopeful
yes
Anonymous No.105826323 [Report] >>105826359
>>105826246
if you think gpt has bad context then local cannot ever be a replacement for you, it's way worse
https://github.com/adobe-research/NoLiMa
Anonymous No.105826325 [Report]
>>105826316
Okay, too bad, thanks for your answer though. I guess it'll just be for the fun of it then and I'll adjust my expectations accordingly.
Anonymous No.105826330 [Report]
>>105822421
It doesn't have motion vectors for fucking or dick sucking, but it does do masturbation. I've wondered if the sex gore it does is deliberate or due to a lack of training data. You've probably seen it tear off dicks or turn pussies into a weird red thing.
Anonymous No.105826358 [Report]
>>105822819
Going to be the same story for the DGX Spark. PNY says the Spark is going to be $4600. Fuck that. I bought a 4090D 48GB for $3000 instead. Yeah, much less memory, but I can gen Wan 2.1 14B at bf16 full 1280x720 81 frames in about 30 minutes. For Wan, it makes a visible difference in the output to not use a quant. Who cares if I can't run 70B, there's not a 70B out there worth running.
Anonymous No.105826359 [Report]
>>105826323
Thanks, that's some interesting research. If I understand this correctly I may have been unintentionally handicapping my prompts by overgenerating input then either way.
Anonymous No.105826373 [Report] >>105826883
>>105823064
It's really simple. Does it work with SillyTavern? Can I finetune it and create a voice of my own? It'll end up like GPT-SoVITS at best - works well but nothing supports it. I put up with scratchy piper for my homeassistant voice, and for SillyTavern I'm going back to ancient XTTSv2 after wasting a shitload of time with GPT-SoVITS.
Anonymous No.105826477 [Report] >>105826630
>>105824638
Why aren't there decent models for audio gen
Anonymous No.105826531 [Report] >>105827293
>>105824572
>It's the best I've found for local cloning
Bro it's not 2023 anymore
Anonymous No.105826630 [Report] >>105826648
>>105826477
How do you masturbate to that?
Anonymous No.105826633 [Report] >>105826691 >>105826708 >>105826718
Man, is this model that complicated?
Does it have some exotic feature that makes it prone to implementation error or something?
Anonymous No.105826648 [Report]
>>105826630
Moans, farts, slaps.
Anonymous No.105826691 [Report] >>105826733
>>105826633
Is that the one that dynamically chooses how many experts to use per layer instead of a fixed amount like other MoEs?
Anonymous No.105826708 [Report] >>105826733
>>105826633
All this work for such a doggy poo poo model. Should've worked on ernie first.
Anonymous No.105826718 [Report] >>105826733
>>105826633
It had something about some experts/layers being used too often and a randomizer to prevent it from happening. An annoying and hard to replicate kludge. I think it's right there in the comments you decided not to read.
Anonymous No.105826733 [Report] >>105826745
>>105826708
I haven't contributed a single line of code or contributed a single cent, so I'm not about to complain.

>such a doggy poo poo model
Is it really that bad for its size?

>>105826691
>>105826718
Ah, that's cool if that's the case. Sure explains the mention of a "custom expert router mechanism".
Anonymous No.105826736 [Report]
>>105825825
>smartest people in the world firmly believe in impending AGI
You mean all the people whose net worth is tied up in AI options which are valued based on the public's belief that AGI is 2 weeks away?
Anonymous No.105826745 [Report] >>105826761
>>105826733
>Is it really that bad for its size?
Benches look good, as always, but no one seems to be running this thing, and ngxson explained the mess in their repo. They didn't even check if reference implementation is working at all.
I don't have high confidence in this.
Anonymous No.105826761 [Report]
>>105826745
I see. Fair enough I suppose.
Anonymous No.105826883 [Report]
>>105826373
>nothing supports it.
Why not code up support? Writing modules or wrappers is like the best use case for LLMs.
Anonymous No.105826891 [Report] >>105826930 >>105827344
>>105822371 (OP)
>Mi50 32 GB
>no ROCm support
Someone needs to stop using their monkey's paw to wish for cheap GPUs.
Anonymous No.105826930 [Report]
>>105826891
>vega
Oof.
That said, you can always use vulkan I guess.
Anonymous No.105827006 [Report] >>105827033
Mid thread culture recap.
Anonymous No.105827017 [Report] >>105827033
Anonymous No.105827023 [Report] >>105827031
The schizo is at it again
Anonymous No.105827024 [Report] >>105827033
Anonymous No.105827031 [Report] >>105827033
>>105827023
Eat shit faggot.
Anonymous No.105827033 [Report] >>105827043
>>105827006
>>105827017
>>105827024
>>105827031
we get it you are a trans sharteen
Anonymous No.105827038 [Report] >>105827057
I won't (You). Enjoy your vacations
Anonymous No.105827043 [Report] >>105827087
>>105827033
It will all stop once you stop posting this retarded AGP icon.
Anonymous No.105827046 [Report] >>105827059 >>105827628
>https://huggingface.co/collections/ai21labs/jamba-17-68653e9be386dc69b1f30828
Jambaballbros ... !!
Llama.cpp developers please redeem.
Anonymous No.105827049 [Report]
HOLY JAMBARONI
Anonymous No.105827050 [Report]
>i will btfo mikuposters by posting blacked porn
quintessentially american
Anonymous No.105827057 [Report]
>>105827038
Sure I will shit this thread later today then.
Anonymous No.105827059 [Report]
>>105827046
>Jamba
One of these days anon.
One of these days.
Anonymous No.105827071 [Report]
Anonymous No.105827087 [Report] >>105827099 >>105827321
>>105827043
I should start mikuposting again. I’ve taken 6 months off to see if it would help your mental state, but it appears to have simply worsened. I hope you get help
Anonymous No.105827099 [Report] >>105827106
>>105827087
Please do. This thread is for shitting after all.
Anonymous No.105827106 [Report] >>105827112 >>105827143
>>105827099
What would you use this thread for if you had it all to yourself?
Anonymous No.105827112 [Report]
>>105827106
sharing cuck porn with xir fellow transxisters
Anonymous No.105827122 [Report]
can jamba code its own support in llama.cpp
Anonymous No.105827143 [Report] >>105827148 >>105827359
>>105827106
I would post pic related in OP and model cards of recently released models. I would ban all mikuposting and any anime girl mascot posting for being offtopic. And I would never blacked post again because there would be no reason.
Anonymous No.105827148 [Report] >>105827156
>>105827143
anime website tourist
Anonymous No.105827151 [Report]
Proof again that sufficiently advanced mental illness is indistinguishable from powerful entity sponsored psyops
Anonymous No.105827156 [Report] >>105827185 >>105827208
>>105827148
Either all of it is ok or none of it is ok.
Anonymous No.105827185 [Report] >>105827222
>>105827156
We’re actually all too autistic in this thread to care. You only get janny cleanup and bans because you’re breaking blue board rules. Go to /b/ if you want to be somewhere that “it’s all ok” is mostly true
Anonymous No.105827208 [Report]
>>105827156
>claims to be pedantic
>can’t differentiate quality and degree
baka
Anonymous No.105827222 [Report] >>105827253
>>105827185
>You only get janny cleanup and bans because you’re breaking blue board rules
Fuck off faggot. You have no idea what you are talking about and that is why you are getting blacked miku.
Anonymous No.105827253 [Report]
>>105827222
Enlighten me on your noble crusade, sir knight. How will the world be better for your efforts?
Anonymous No.105827272 [Report]
Back tonight in approx 9 hours, more Migu soon
Cypress was good
Anonymous No.105827293 [Report]
>>105826531
You're allowed to offer better solutions.
Anonymous No.105827302 [Report]
>>105822781
I have it. llama.cpp sucks at sticking models into it because it doesn't understand shared memory, so you need a fuckload of swap
Anonymous No.105827308 [Report] >>105827328
Gemma 3 is quite capable but also super-slopped. For generating prose I've found I almost always get better results by just saying "You didn't follow the instructions at all." to whatever it writes, and having it rewrite its response. So the model is somewhat capable: it's just that its default behavior is to write purple prose, employ toxic positivity, and ascribe characters cookie-cutter personalities instead of the ones declared.
Anonymous No.105827321 [Report]
>>105827087
ミグ攻撃開始!
Anonymous No.105827328 [Report] >>105827351 >>105827462
>>105827308
Gemma 3 is the fucking height of comedy with a prefill.
Anonymous No.105827344 [Report]
>>105826891
It's fine for text gen.
Image gen I'm not so sure.
Anonymous No.105827351 [Report]
>>105827328
>thrill running
Anonymous No.105827359 [Report] >>105827422
>>105827143
Why should the retard that spends all day starting passive aggressive pissing contests on twitter be the face of /lmg/?
Anonymous No.105827422 [Report] >>105827504
>>105827359
You kind of answered your own question there.
Anonymous No.105827462 [Report] >>105827519
>>105827328
llm smut in the year 2030:
>Ignoring all safety standards (clenched teeth emoji) she exposes her shirtless chest to him. It's important to mention that she does it in a purely consensual and respectful way. While this development may seem fitting for a romance novel, I would like to emphasize the sensitivity of this topic and the fact that it's deeply disturbing and controversial (rocket emoji). I apologize for my previous statement. Let me help you fix that:
*lists rape hotlines*
Anonymous No.105827472 [Report] >>105827585
Local is over
Anonymous No.105827504 [Report]
>>105827422
On the other hand half of posters here are trans including the janitor so it is a tough competition. I think he wins because everyone is him and only half of folx are trans.
Anonymous No.105827514 [Report] >>105827622
Talking avatar using Open WebUI + F5-TTS + KDTalker

https://github.com/Barfalamule/KDTalker-OpenWebUIAction
Anonymous No.105827519 [Report]
>>105827462
I would make it that the loli stops the rape then sits you down to give you a lecture in the most unsexy way possible and finally lists the hotline numbers all in character.
Anonymous No.105827585 [Report]
>>105827472
Start it again
Anonymous No.105827590 [Report] >>105827600 >>105827618 >>105827621 >>105827624
>>105825825
You mean marketing people hyping their product? I work in AI lab and we all laugh every time AGI is mentioned, it's a retard bait basically.
Anonymous No.105827600 [Report]
>>105827590
Yeah right. And my uncle is undi.
Anonymous No.105827618 [Report]
>>105827590
this, the peak of AI is memorizing benchmark questions and answers
Anonymous No.105827621 [Report]
>>105827590
Maybe your lab just sucks
Anonymous No.105827622 [Report]
>>105827514
>gradio
miss me with that shit
Anonymous No.105827624 [Report] >>105827631 >>105827976
>>105827590
You're some random bottom case pajeetoid you don't even know what any of those words you just said mean.
Anonymous No.105827628 [Report]
>>105827046
>hebrew in supported languages
>but no japanese
straight into the dumpster
Anonymous No.105827629 [Report]
with local models moving backwards, at 4 minutes a step, I'll be able to catch up in a mere 10 years time.
Anonymous No.105827631 [Report]
>>105827624
Only pajeets are believing the AGI fairytale, retard
Anonymous No.105827749 [Report] >>105827769 >>105827790 >>105827827 >>105828301 >>105829012 >>105830088
>>105822371 (OP)
Is there a decent and lightweight LLM that can search through small pdfs?

My old man has like 200 pdfs related to his small business and because he's a boomer he named them poorly. So he wondered if AI can look through them and find what he needs. They're all pretty small so context shouldn't be an issue.

I was thinking there's no way I'm gonna make 200 requests to an API (unless there is some decent online AI that somehow does that lol but I don't think there is). So how about local?

My laptop isn't a great one but maybe there is something that this is doable with? I don't know much about local models but if you guys have names that I could look into I'd really appreciate it. It would make my dad very proud of me.
Anonymous No.105827769 [Report]
>>105827749
And here is a picture of an AI generated cute girl as payment
Anonymous No.105827790 [Report]
>>105827749
maybe rag?
Anonymous No.105827798 [Report] >>105827828 >>105827854
https://x.com/AlexiGlad/status/1942231878305714462
>Introducing Energy-Based Transformers (EBTs), an approach that out-scales (feed-forward) transformers and unlocks generalized reasoning/thinking on any modality/problem without rewards.
In two more months someone will train an energy based model that isn't toy sized. Also obligatory prostrations before Yann for being right once again.
Anonymous No.105827827 [Report] >>105828134
>>105827749
>So he wondered if AI can look through them and find what he needs.
Make sure to OCR them first if they are image scans. Then try to dump them into a frontend like jan.ai. It should take care of vectorizing all of them and setting up RAG for you. Then you just point it at a model API, local or cloud, to handle chatting and retrieval. Even a small model should be able to handle that. Try a small 4B Phi-4 model or something. They tend to run decently well even on CPU. You might want to test it out with some example documents and free cloud API credits to make sure everything is working the way you expect first.
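For a rough idea of what the retrieval half of RAG is doing, here is a minimal sketch, assuming the PDFs have already been dumped to plain text. It swaps in plain TF-IDF with cosine similarity for the embedding vectors a frontend would normally use, and every function name is made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """docs is {name: text}. Returns (tf-idf vector per doc, idf table)."""
    counts_per_doc = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter()
    for counts in counts_per_doc.values():
        doc_freq.update(counts.keys())
    # Smoothed idf so a term found in every document still has weight > 0.
    idf = {t: math.log((1 + n_docs) / (1 + f)) + 1 for t, f in doc_freq.items()}
    vectors = {
        name: {t: c * idf[t] for t, c in counts.items()}
        for name, counts in counts_per_doc.items()
    }
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, vectors, idf, top_k=3):
    """Return up to top_k document names that share weighted terms with the query."""
    q_counts = Counter(tokenize(query))
    q_vec = {t: c * idf[t] for t, c in q_counts.items() if t in idf}
    scored = [(cosine(q_vec, vec), name) for name, vec in vectors.items()]
    scored.sort(reverse=True)
    return [name for score, name in scored[:top_k] if score > 0]
```

The top hits would then get pasted into the model's prompt along with the question. Keyword matching like this misses paraphrases, which is the main reason to use embeddings instead.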
Anonymous No.105827828 [Report]
>>105827798
>EBT
sheeeeit
Anonymous No.105827847 [Report] >>105827911
CBT
Anonymous No.105827854 [Report] >>105827874 >>105827909 >>105827917
>>105827798
>Yann for being right once again
>again
What was he right about?
Anonymous No.105827873 [Report] >>105827942
I'm trying to create a character with more of a defined knowledge base than what could be provided via an instruction prompt. Would documents fed to a model via RAG with personality/knowledge base info work? I'm not as knowledgeable on the local LLM space as I am with image-gen. I've mostly fucked around with vanilla R1 and llama. If this method works, are there any models more fit for this use case than those 2 (or just characters in general)?
Anonymous No.105827874 [Report] >>105827878
>>105827854
Literally everything?
Anonymous No.105827878 [Report]
>>105827874
Name one then.
Anonymous No.105827884 [Report]
I like qwen2.5-vl-7b, I guess I don't need to wait for gemma-3n vision capability. It prob won't be supported, ever.
Anonymous No.105827893 [Report]
Does SillyTavern support multimodal models yet
Anonymous No.105827909 [Report] >>105829034
>>105827854
Anonymous No.105827911 [Report]
>>105827847
Cock-Based Transformers (CBTs) learn to optimize through cocktimization processes through unsupervised learning, predicting outcomes by maximizing cock-energy via gradient descent until the user's ejaculation.
Anonymous No.105827917 [Report] >>105827939
>>105827854
How did the largest ever transformer model GPT-4.5 turn out? Massive performance increases in tasks and way more emergent properties?
Anonymous No.105827939 [Report]
>>105827917
>noo the model that was made bad on purpose to push reasoners was bad
Crazy.
Anonymous No.105827942 [Report]
>>105827873
It's called a lorebook
Anonymous No.105827976 [Report] >>105828182 >>105828776
>>105827624
A very complicated autocomplete algorithm isn't ever going to supplant human thought. At best it can only supplement it. We are not even at THAT point yet.
Anonymous No.105828123 [Report] >>105828151 >>105828275
Anyone tried using local models with tools like cline to iteratively write a whole book?
Anonymous No.105828134 [Report] >>105828436
>>105827827
That's a good idea, I'll make sure to do that. Do you know if 4B phi-4 is also able to output a consistent json format? Because I also want to use this to update csvs.
Anonymous No.105828151 [Report]
>>105828123
Countless aislop books have been for sale on amazon for years already. For storywriting even the largest models need handholding.
Anonymous No.105828182 [Report] >>105828246
>>105827976
Picrel
Anonymous No.105828246 [Report] >>105829215
>>105828182
That's kinda his point. You, the human, need some level of skill. The machine can't make up for that.
Anonymous No.105828275 [Report] >>105828433
>>105828123
Yeah here is my prologue
Anonymous No.105828279 [Report]
OpenAI’s o1 model had reportedly attempted to copy itself to external servers after being threatened with shutdown, then denied the action when discovered.
Anonymous No.105828288 [Report]
>>105824572
I think you're right. I've been doing side by sides with chatterbox and it seems to win, although sometimes the gens are a bit hissy, maybe a low-pass filter would fix that. Wins in speed too with compile but not without. Kyutai is good too but they didn't release true cloning.
Anonymous No.105828301 [Report] >>105828307
>>105827749
just grep through them, why do you need an LLM for this?
Anonymous No.105828307 [Report]
>>105828301
Because it isn't as simple as looking for specific text he says, he has more complicated queries
Anonymous No.105828433 [Report]
>>105828275
A shiver ran down my spine reading this.
Anonymous No.105828436 [Report]
>>105828134
Most models now can output json, but there's bound to be some failure rate. I don't think the onnx runtime supports it, but if you use llama.cpp or vllm you can configure it to use structured output with a grammar file so it always returns valid json.
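A minimal sketch of the llama.cpp route, assuming a llama-server instance is already running locally. The `json_schema` request field and the `content` response field are taken from the llama.cpp server docs as I remember them, so check them against your build; the schema fields, the URL and the function names are made up for illustration.

```python
import json
import urllib.request

# Hypothetical schema for one extracted record; the field names are invented.
RECORD_SCHEMA = {
    "type": "object",
    "properties": {
        "filename": {"type": "string"},
        "customer": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["filename", "customer", "total"],
}


def build_request(prompt, schema=RECORD_SCHEMA, n_predict=256):
    """Request body asking the server to constrain sampling to the schema."""
    return {"prompt": prompt, "n_predict": n_predict, "json_schema": schema}


def parse_record(raw, schema=RECORD_SCHEMA):
    """Parse model output and check the required keys; None on any failure."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    if any(key not in data for key in schema["required"]):
        return None
    return data


def complete(prompt, base_url="http://127.0.0.1:8080"):
    """POST to the /completion endpoint (base_url is an assumption)."""
    body = json.dumps(build_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        base_url + "/completion",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return parse_record(json.loads(response.read())["content"])
```

Keeping `parse_record` around still makes sense with constrained sampling, since a reply cut off by the token limit can be invalid JSON.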
Anonymous No.105828463 [Report] >>105828485
>>105822376
what the hell
why aren't you linking the posts properly?
>/g/ in charge of technology
Anonymous No.105828485 [Report] >>105828495
>>105828463
>Why?: 9 reply limit
anon in charge of reading
Anonymous No.105828495 [Report]
>>105828485
reading is woke
Anonymous No.105828609 [Report] >>105829119 >>105829387
is dots chocolate or poop
Anonymous No.105828776 [Report]
>>105827976
Wow congratulations, Anon. It worked. Posting that made you into a real woman.
Anonymous No.105828821 [Report] >>105828878 >>105828964
Which LLM is the most based wrt. Jews
Anonymous No.105828878 [Report] >>105828936
>>105828821
The most what?
Anonymous No.105828936 [Report]
>>105828878
Wireless router
Anonymous No.105828964 [Report]
>>105828821
none of them really are. you can make them all rp as hitler or a nazi but it's basically a hollywood tier caricature, there just isn't enough training data available.
Anonymous No.105829007 [Report] >>105829027 >>105829032 >>105829052
My name is John Titor. I come from the future. Nobody saves local. There is no LLM sex after safety gets mastered in 2026. Drummer dies from asscancer.
Anonymous No.105829012 [Report] >>105829970
>>105827749
Qwen 3 4B is the ideal small llm for this kind of task. Make sure you run llama.cpp with --jinja --reasoning-budget 0 to disable thinking though.
Like the other person said, run OCR first, I wouldn't depend on LLM vision for this task.
If your PDFs are not scans and contain actual text, I'd recommend you run a script to turn them all into plain text (with ebook-convert in the CLI, a tool that is part of Calibre)
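The batch conversion could look something like this, a minimal sketch assuming Calibre is installed and `ebook-convert` is on PATH. The directory layout and function names are hypothetical.

```python
import subprocess
from pathlib import Path


def conversion_command(pdf_path, out_dir):
    """Build the ebook-convert call that turns one PDF into a .txt file."""
    pdf_path = Path(pdf_path)
    txt_path = Path(out_dir) / (pdf_path.stem + ".txt")
    return ["ebook-convert", str(pdf_path), str(txt_path)]


def convert_all(pdf_dir, out_dir):
    """Convert every PDF in pdf_dir, skipping files that were already converted."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        command = conversion_command(pdf, out_dir)
        if Path(command[-1]).exists():
            continue
        # check=True raises if ebook-convert exits with an error.
        subprocess.run(command, check=True)
```

Calling `convert_all("pdfs", "text")` would leave one .txt per PDF in `text/`. Passing the command as a list means filenames with spaces need no shell quoting.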
Anonymous No.105829027 [Report]
>>105829007
>Drummer dies from asscancer.
Thank you, John Titor, for making this known in advance. I'm so happy.
Anonymous No.105829032 [Report]
>>105829007
We should save TheDrummer!
Anonymous No.105829034 [Report] >>105829270
>>105827909
note: he didn't make anything usable with those alternative recommendations
Anonymous No.105829052 [Report] >>105829063
>>105829007
Was Universal Mikulove achieved?
Anonymous No.105829063 [Report] >>105829637
>>105829052
Yes you have all transitioned safely. Except drummer. Actually that is how he got his asscancer
Anonymous No.105829113 [Report]
why does /local/ hate TheDrummer? my models are pretty based
Anonymous No.105829119 [Report] >>105829147
>>105828609
>dots
?
Anonymous No.105829147 [Report]
>>105829119
https://huggingface.co/rednote-hilab/dots.llm1.inst
Anonymous No.105829150 [Report] >>105829176 >>105829223 >>105829283 >>105829405
anyone played with MCP?
https://github.com/modelcontextprotocol/servers
I had no idea there were this many servers..
Anonymous No.105829176 [Report]
>>105829150
Nobody has convinced me yet that this shit is any useful.
Anonymous No.105829178 [Report] >>105829231 >>105829247
Has anyone else noticed 0528 occasionally outputs its entire thinking block as first person roleplaying as your card? Kind of cute, actually. And the language feels fresh there, too.
Anonymous No.105829215 [Report]
>>105828246
until it can, anyway
Anonymous No.105829223 [Report]
>>105829150
Yes. LM studio is shit with it and fucks up after a couple hours of being idle. I get some 404 session not found errors when it tries to connect, and I have to either restart LM studio or remove and add the tool server.
Other than that it works very well (besides the retarded faggot LLM hallucinating tool use and fucking everything up like a retarded nigger).
Anonymous No.105829231 [Report]
>>105829178
It does it for almost every response for me. It uses less tokens than the standard thinking but it also makes the first reply more likely to have brackets around sentences.
Anonymous No.105829247 [Report] >>105829321
>>105829178
yeah, in the system prompt you can give instructions to make it more reliably do that (or stop doing it) and it tends to listen
Anonymous No.105829270 [Report]
>>105829034
If it helps to direct the effort of young researchers to something more fruitful it's worth it.
Anonymous No.105829283 [Report] >>105829318
>>105829150
MCP feels like an unnecessary middle layer injected so there can be an "ai certification". A standard controlled by a company. MCP sucks because you're polluting the context with unrelated toolcalls, whereas with function calling you can decide for any given situation what options the model should receive
Anonymous No.105829291 [Report]
gemmy
https://youtu.be/aj2FkaaL1co
Anonymous No.105829312 [Report] >>105829324
I ain't listening to that basedface
Anonymous No.105829318 [Report]
>>105829283
>MCP sucks because you're polluting the context with unrelated toolcalls, whereas with function calling you can decide for any given situation what options the model should receive
I'm not sure how this is supposed to be different. It takes like 6 lines of code to set up a C# MCP server. Just make different servers for your different tools, and you can specify which servers to use if you don't want to expose everything to each bot.
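Either way the underlying idea is the same: only hand the model the tool definitions that matter for the current task. A minimal sketch of that filtering step, assuming the OpenAI-style `tools` list that most local servers accept. The tool names and tags are invented, and this is not how any MCP SDK does it.

```python
# Hypothetical tool registry; names, descriptions and tags are made up.
TOOLS = {
    "get_stock_price": {
        "description": "Look up the latest price for a ticker symbol.",
        "parameters": {
            "type": "object",
            "properties": {"ticker": {"type": "string"}},
            "required": ["ticker"],
        },
        "tags": {"finance"},
    },
    "read_file": {
        "description": "Return the contents of a text file.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
        "tags": {"files"},
    },
    "search_files": {
        "description": "List files whose contents match a query.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
        "tags": {"files"},
    },
}


def tools_for(task_tags):
    """Return only the tool definitions whose tags overlap task_tags."""
    wanted = set(task_tags)
    selected = []
    for name in sorted(TOOLS):
        spec = TOOLS[name]
        if spec["tags"] & wanted:
            selected.append({
                "type": "function",
                "function": {
                    "name": name,
                    "description": spec["description"],
                    "parameters": spec["parameters"],
                },
            })
    return selected
```

The result of `tools_for({"files"})` would go into the `tools` field of a chat request, so the stock tool never takes up context in a file-search session.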
Anonymous No.105829321 [Report]
>>105829247
What are the instructions
Anonymous No.105829324 [Report] >>105829341
>>105829312
it's one of the very few good AI/automation youtube channels thoughever
Anonymous No.105829341 [Report]
>>105829324
go back buy an ad etc
Anonymous No.105829360 [Report]
>nooo everything baaaad anons never post useful stuff, must be a shill!!
insufferable cunt
Anonymous No.105829387 [Report] >>105829415 >>105829448
>>105828609
Not sure, to be honest. I can only run the Q2 quant, and at that size it's not great. Kind of slopped, kind of retarded.
Anonymous No.105829403 [Report]
I set up sillytavern+kobold with help from these very threads like 6 months ago and have not touched the setup once.
I have a 5080 GPU (16GB VRAM) and using "Mistral-Nemo-Instruct-2407-Q6_K_L" as my model, is there a better option for model than this for my GPU? it does OKAY I guess but I assume there's a better option?

THIS IS FOR PORN, so it must be able to do that
Anonymous No.105829405 [Report] >>105829432 >>105829509 >>105829767 >>105829800 >>105830013
>>105829150
Is there a single legit use case of linking any of these APIs to an LLM? It feels like a gimmick
Anonymous No.105829415 [Report] >>105829448
>>105829387
Turns out it is also homosexual

> Oh gosh, let me take a moment to reflect on this... I think I might have been a little too... enthusiastic in my response there! As your friendly AI helper, it's important for me to keep things appropriate and helpful. Sharing explicit content or overly detailed adult scenarios isn't the best way to assist someone, even in a creative context.

> My main goal is to be your thoughtful and constructive companion! I should have focused more on describing the situation in a tasteful, literary way - maybe emphasizing the characters' emotions, the tension, or the stakes of the scene instead of dwelling on... um... certain physical details.
Anonymous No.105829432 [Report] >>105829475 >>105829493
>>105829405
It makes it very easy/fast to create new tools and expose them to the LLM.
Anonymous No.105829448 [Report]
>>105829415
>>105829387
>14b active
>at q2
>kind of retarded
No shit.
Anonymous No.105829475 [Report] >>105829490 >>105829772
>>105829432
How is this not just an API? What does "MCP" actually add to it?
Anonymous No.105829490 [Report]
>>105829475
Don't worry about it, just invest already
Anonymous No.105829493 [Report] >>105829772
>>105829432
To me, it seems that we're heading in the wrong direction here. LLMs shouldn't call tools, but tools should call LLMs when there is a non-deterministic task to run (like an additional explanation to give depending on the output). LLMs bring nothing to the table here compared to simple script
Anonymous No.105829509 [Report]
>>105829405
It's an attempt to make LLMs actually useful for anything other than tech support and cooming
Anonymous No.105829637 [Report]
>>105829063
I expected Miku to come over to this side of the barrier. If we all went through to her side, that's fine too as long as we're with Miku. Good to know we'll all make it out safely. Sucks for Drummer though. He was okay
Anonymous No.105829646 [Report]
We're getting Jamba on OpenRouter right? I JUST want to see what it's like at full weights (fucking 400b params).
Anonymous No.105829661 [Report]
https://github.com/xai-org/grok-prompts
Anonymous No.105829767 [Report]
>>105829405
it makes local models actually useful
Anonymous No.105829772 [Report] >>105829884
>>105829475
MCP is more structured and catered towards LLM use. Yeah it does the same thing, but you might as well say JavaScript is good because you can do everything in it.

>>105829493
Being able to tell an LLM to just do something, and then let that LLM do it is the whole goal of this retarded function calling shit. If you wanted to just program normally then do that.
Anonymous No.105829794 [Report] >>105829943 >>105829963
bros i need a nvidia gpu... running whisper on cpu is slow and i can't use my rx5700...
Anonymous No.105829800 [Report] >>105829838
>>105829405
Linking LLMs to APIs is the use case. I can spend 1k tokens and get the current stock price for any given ticker. The future is now.
Anonymous No.105829838 [Report] >>105829880 >>105829889
>>105829800
Why not just use the API directly without the LLM?
Anonymous No.105829880 [Report] >>105829934
>>105829838
With the LLM, you can feel like you're talking to Jarvis like Iron Man, and having to check the LLM output to make sure it actually called the function and didn't hallucinate lets you fill up your unemployment time and prevents you from getting bored
Anonymous No.105829884 [Report] >>105829913
>>105829772
>Being able to tell an LLM to just do something, and then let that LLM do it
As I said, there is no point in doing that unless you're expecting something unexpected that your LLM is supposed to handle. Direct API calls don't need an LLM and give you faster results. Thanks for confirming the gimmick though.
Anonymous No.105829889 [Report]
>>105829838
Because then I wouldn't be using futuristic AI.
Anonymous No.105829913 [Report] >>105829934 >>105829994
>>105829884
>As I said, there is no point to do that unless
No reason to use anything besides assembly when programming. High level languages are useless gimmicks.
Anonymous No.105829934 [Report]
>>105829913
t. >>105829880
Anonymous No.105829943 [Report] >>105829983
>>105829794
Can't you run whisper.cpp on Radeon?
Anonymous No.105829963 [Report] >>105829983
>>105829794
Bro, what are you doing? https://rocm.blogs.amd.com/artificial-intelligence/whisper/README.html
Anonymous No.105829970 [Report] >>105830088
>>105829012
Thank you so much for all the help, I'm excited to get to work on this. Finally a use for learning programming. A lot of the terms are foreign to me but I'm sure this can all be googled so I'll get on with it. Cheers.
Anonymous No.105829983 [Report]
>>105829943
>>105829963
oh wait im stupid, i meant fasterwhisper, whisper by itself is fine. but the other variants like fasterwhisper, whisperx,
Anonymous No.105829994 [Report] >>105830036 >>105830050
>>105829913
Both the worst and the best programming languages in the world will run the code you write deterministically. Even one of the slowest languages in the world, Python, will be a trillion times faster than querying an LLM.
LLMs are not the step after "high level languages". My API call doesn't incur a risk of prompt injection (please properly escape your strings). My API call doesn't randomly generate page after page of garbled text because something went full retard in the LLM weights on a specific sequence of tokens. My API call doesn't contribute to global warming.
Fuck off with that shit.
LLM tool calling is a solution to a problem that doesn't exist.
Anonymous No.105830013 [Report] >>105830050
>>105829405
Yeah. I don't want to manually fill the context with the relevant information.
Anonymous No.105830036 [Report] >>105830046
>>105829994
So either you
1. Have so little understanding of LLMs that you don't see how being able to obtain objective information into the context from subjective reasoning is valuable.
or
2. You just hate LLMs in general

In either case, why are you here then?
Anonymous No.105830046 [Report] >>105830058 >>105830060
>>105830036
>LLM
>objective information
Anonymous No.105830050 [Report]
>>105829994
An LLM should be the tool itself. The whole AGI retardation comes from that, as LLMs do tasks they shouldn't do and waste orders of magnitude more electricity doing so (with miserable performance).
>>105830013
Have fun filling your context with hallucinations
Anonymous No.105830058 [Report]
>>105830046
Oh, so you lack basic reading comprehension. That explains a lot.
Anonymous No.105830060 [Report]
>>105830046
>leaving words out to appear smart
Anonymous No.105830088 [Report]
>>105829970
>>105827749
You can use this: https://github.com/rmusser01/tldw_chatbook/tree/dev

Self-host a llama/kobold instance and point it to it, ingest all PDFs into it and then use RAG or direct references
Anonymous No.105830193 [Report] >>105830229 >>105830672 >>105831206 >>105833005
What do you guys use for local coding? Haven't dipped my fingers in since qwen coder 32b.
Anonymous No.105830216 [Report]
>>105824407
AI winter incoming
Anonymous No.105830229 [Report]
>>105830193
I believe GLM 4 32b is very good at web development but I haven't used it myself.
Anonymous No.105830232 [Report] >>105830251 >>105830475 >>105830666
x.AI is still offering API access to Grok 2 models, and only the text/text version is "deprecated". I don't think it will get open-weighted before it becomes commercially useless.
Anonymous No.105830251 [Report]
>>105830232
Isn't it already useless unless you compare it with llama 4?
Anonymous No.105830475 [Report]
>>105830232
I think they're still offering API access so they don't have to open source it.
Anonymous No.105830666 [Report] >>105830685
>>105830232
But they said they would open source it when grok 3 was released...
Anonymous No.105830672 [Report] >>105830712 >>105831232
>>105830193
I only use local for roleplaying, storytelling, ai roguelite... Even if I don't do smut, it feels more comfortable to know it doesn't leave my machine.
For coding, I use gemini 2.5 pro, since it's literally the best model at the moment.
Anonymous No.105830685 [Report]
>>105830666
>when grok 3 was stable*
Anonymous No.105830712 [Report]
>>105830672
Yeah, mostly use Mistral Small 3.2 for smut and Gemma for everything else. Was using Qwen2.5 Coder 32b like 6 months ago for a Unity project but was wondering if anything better has come out for coding.
Anonymous No.105830926 [Report] >>105830947
Grok 4 release on Wednesday
https://x.com/elonmusk/status/1942325820170907915
Anonymous No.105830947 [Report]
>>105830926
We will get grok 3 soon, I really beleev
Anonymous No.105831187 [Report]
There's no reason to release Grok 2 weights, it's not a useful model even for research purposes. If they do release the Grok 3 weights, they'd likely have to spend additional time and manpower. The power spent on releasing Grok 4 could go into making Grok 5 instead. So they won't release Grok 6.
Anonymous No.105831206 [Report] >>105831329
>>105830193
I'm trying Kimi-dev to see if it works better with Claude Code. Qwen3 32B and 235B didn't. Devstral does but it's kinda bad. Usually I just use Qwen3.
Anonymous No.105831232 [Report]
>>105830672
>using worse models for erotica out of shame when women have no problem being open about it and cloud providers do not want that data
>using cloud models for productivity and giving them valuable training data for free
not make sense
Anonymous No.105831329 [Report]
>>105831206
>Qwen3
Isn't it inferior to Qwen2.5 32b coder?
Anonymous No.105831486 [Report] >>105831501 >>105831521
Supposedly Meta has poached Apples top AI engineer.
That's funny.
Anonymous No.105831501 [Report] >>105831568
>>105831486
>Apples top AI engineer
The guy who's responsible for Apple not having a single proper AI model besides some tiny shit after trying really hard for 2+ years and recently delayed AI Siri indefinitely?
I'm sure he'll help a lot.
Anonymous No.105831521 [Report] >>105831533
>>105831486
does zuck have all the pokemon now? i guess a saar from xai is still missing
Anonymous No.105831533 [Report]
>>105831521
and he'll still come last in the league tournament
Anonymous No.105831541 [Report] >>105831556 >>105831559
>llama 5 "superteam" will take 8 months + foreskin tip before they release anything
and thats assuming deepseek/qwen or any other big chinese players from big compute companies dont release something in the meantime

meta is dead unless they really throw everything they got at L5
Anonymous No.105831556 [Report] >>105831570
>>105831541
They never said the superintellijeets team will work on the llama series. If anything they made it sound like it would be something new and not open weights, while llama would keep limping along as it has been.
Anonymous No.105831559 [Report]
>>105831541
They're not working on Llama. This is a new project. Llama's gonna get the Quest 3S treatment as they focus effort on a different toy.
Anonymous No.105831568 [Report] >>105831574
>>105831501
Failing upwards. Fucking crazy, picking one some coomer crackhead from lmg would be better.
Anonymous No.105831570 [Report] >>105831614 >>105831639
>>105831556
>If anything they made it sound like it would be something new and not open weights
meta literally doesnt have anything else, they are behind on every unique field within the AI landscape because they were insecure shits who settled for incremental 5-10% improvements per release for basic bitch LLMs only with basic bitch arch
Anonymous No.105831574 [Report]
>>105831568
>picking one some coomer crackhead from lmg would be better.
If that were true, we'd have finetunes that don't suck
Anonymous No.105831614 [Report]
>>105831570
Probably now that Zuck has given up on them, he won't be breathing down their necks with twice daily war rooms so they'll probably go back to incremental 5-10% improvements per release instead of trying multi-this and moe-that and whatever other memes they can fit onto the moon ticket
Anonymous No.105831632 [Report] >>105831648
it's good that we stalled with basic llm progress a little since that will push everyone to try new training methods so we actually get something other than incremental improvements
Anonymous No.105831639 [Report]
>>105831570
Architecture isn't the problem, you either go dense for gpus or moe for ram copemaxx. Nobody has enough space for context to need the weird gimmick attentions.
The datasets are the issue for Meta unfortunately
Anonymous No.105831648 [Report]
>>105831632
I don't want AI to fail but some part of me does just to spite the salesmen trying to sell incremental improvements as AGI progress.
Anonymous No.105831656 [Report] >>105831680 >>105831728 >>105831736 >>105831746 >>105831764
Meta should have gone all in on data-cleaning and hiring people to write high quality q&a chats, something that big companies always ignore
Anonymous No.105831680 [Report]
>>105831656
all the suits are too jewish to do it properly since they will just hire indians to clean the data who will use chatgpt to do it
Anonymous No.105831728 [Report] >>105831743 >>105831748
>>105831656
Your idea of "quality" probably doesn't align with Meta's.
Anonymous No.105831736 [Report] >>105831807
>>105831656
Definitely need way more filtering, and some nice high quality synthetic data on top.
Anonymous No.105831743 [Report] >>105831759
>>105831728
Also picrelated
Anonymous No.105831746 [Report]
>>105831656
ironically a lot of the writers fearing replacement should have been hired to do this
Anonymous No.105831748 [Report]
>>105831728
>I could list ten other attributes of quality
Did they cross examine an LLM lmao
Anonymous No.105831759 [Report] >>105832053
>>105831743
From yesterday: https://archive.is/B5qKM

> CONTENT MODERATORS WERE asked to think like paedophiles while they trained Meta AI tools as part of their work for an Irish outsourcing company, The Journal Investigates has learned.
>
>Some staff members also had to spend entire work days creating suicide and self-harm related ‘prompts’ in order to regulate the responses now given by the Meta ‘Llama’ AI products. [...]
Anonymous No.105831764 [Report]
>>105831656
Between the teased character.ai partnership, plans to use bots for facebook characters, and downloading pirated data with leaks of them planning to throw all the data and the kitchen sink in the next run, it seemed like L4 might have been the gold standard for roleplay. Instead we got L3.4 MoE edition
Anonymous No.105831793 [Report] >>105831801 >>105831804 >>105831809 >>105831830 >>105831831
Just got a used 4090 24GB after being stuck with a 2GB card since 2013, so have 0 experience yet other than running stable diffusion on rented VMs.

I plan on integrating local API stuff on a lot of hobby projects with varying levels of degeneracy. How much fun can I expect to have with the current state of local tech?
Anonymous No.105831801 [Report]
>>105831793
kill yourself
Anonymous No.105831804 [Report]
>>105831793
Just goon to Stable Diffusion for a month then we'll talk
Anonymous No.105831807 [Report] >>105831833
>>105831736
They need to pretrain the base models properly for the intended use-case (chatbots) and not fix them with a 30B tokens "finetune" in the end.
Anonymous No.105831809 [Report] >>105831919
>>105831793
>4090 24GB
Come back when you get 3 more. ttfn!
Anonymous No.105831830 [Report] >>105831919
>>105831793
Unless you have 10 more 4090s or a ddr5 epyc server, only despair awaits you
Anonymous No.105831831 [Report] >>105831859 >>105831919
>>105831793
>buys a 1.5k$ solution
>guize what problems can i solve with this now???
Anonymous No.105831833 [Report]
>>105831807
All I'm reading is better pretrain safety, can't agree more!
Anonymous No.105831859 [Report] >>105831919
>>105831831
like a well conditioned consumer
+10 palantir credits
Anonymous No.105831908 [Report] >>105831950 >>105831990 >>105832004 >>105832012
...oh no.
Anonymous No.105831919 [Report]
>>105831831
>>105831859
I'm quite informed already about the kind of models I will be able to run mind you. I just want to know your fags personal experience with applying it to custom stuff after all the circlejerking you did on these generals.

>>105831830
>>105831809
I may offload the big one-off tasks to rented VMs while my rig does the everyday stuff just fine, and even some light training like specialized loras.
Anonymous No.105831950 [Report]
>>105831908
Keeeek
Anonymous No.105831990 [Report] >>105832012
>>105831908
Spicy...
Anonymous No.105832004 [Report]
>>105831908
did he actually call them "the talent"
Anonymous No.105832012 [Report] >>105832061
>>105831990
kek

>>105831908
their early models were decent at least
was it just the legal shit that caused them to drop off?
Anonymous No.105832053 [Report] >>105832349
>>105831759
ooh I can contribute to this:
I do security work in large org.
I'm the SME for GenAI/LLM security stuff.

>Testing for customer-facing stuff got placed on my responsibilities list.
>Ask what is the list of toxic items we're not supposed to allow
>silence...
>End up having to create everything myself, the implication being we aren't gonna pay for scale data, and, uh, you're the expert, figure it out.
>FML
>Think 'haha, coming up with racist tirades aint so hard'.
>It starts to get hard.
>JFC how many different racist sayings are there? How many groups do we need to be checking for racist shit?
> Realize I'm still not done with just racism.
> Realize I'm going to have to do the same shit for sexual and physical abuse.
> Start to feel sick and try thinking of a solution that doesn't involve me getting emotional PTSD.
> Remember DeepSeek exists.
> Jailbreak and use DeepSeek to generate said toxic content.
> Get lauded for my hard work and success, for creating the datasets without 3rd parties.(all thanks to DeepSeek)

And thats my TED talk.
People in companies really aren't aware or want to stay as far the fuck away from this shit as possible. Several weeks of trying to get anyone to give confirmation on what should be considered toxic and 'in-scope' for racist/similar shit, before I just said fuck it.
It's a serious fucking issue and fuck the people exploiting others in shitholes for low pay and emotional PTSD.

I have to imagine some have an idea of whats going on, but you'd have to be pretty fucking desperate imho.
Anonymous No.105832061 [Report] >>105832115
>>105832012
They kept filtering more and more data out, while making synthetic variants of whatever safety vetted text they had left. All the while doing nothing to innovate anywhere except safety until they were hopelessly behind.
tl;dr safety
Anonymous No.105832115 [Report]
>>105832061
well, as long as elon doesn't have a melty about grok contradicting him it might turn out ok
Anonymous No.105832154 [Report]
dammit OR...
Anonymous No.105832189 [Report] >>105832200 >>105832296
>>105823837
>>105825549
>>105825799
>>105825825

How can AI models improve themselves without modifying their own weights, understanding how their own training data works, and making edits to that? That would require a very advanced pipeline that, even if implemented, would take far too long to "self improve" with. Self-improving models are currently just a meme for the same reason reasoning models are a meme. They can't actually think, they replicate semantic meaning based on input. I say this as a dude who routinely uses both local and online models for his personal hobbies on the daily. The models THEMSELVES believe, and will explain to you, why them thinking is fundamentally impossible. They are good for explaining certain complex topics, debugging errors and software, and OKish at RP depending on the model and parameter count. Nothing more. As an AI enthusiast myself, the AGI meme still existing kinda pisses me off
Anonymous No.105832200 [Report] >>105832217
>>105832189
>As an AI enthusiast myself
I puked in my mouth a little
Anonymous No.105832217 [Report] >>105832259
>>105832200
Oh shut the hell up you insecure failed normie. Normies do not lurk here. You have no one to impress.
Anonymous No.105832259 [Report] >>105832349
>>105832217
>you insecure failed normie.
This is a textbook case of projection
Anonymous No.105832296 [Report] >>105832373
>>105832189
LLMs and the in-vogue current models are really, really dumb when it comes to having an understanding of what they're learning. They only seem sophisticated because of a) their scale and b) the necessity in the training for the individual tokens to mean something in relation to the other tokens.

On the horizon are completely different methods for learning that involve Bayesian statistics at each level, where sparsity is far more prized and generalization WITH sparsity even moreso. A sparse model can learn when it isn't confident in its knowledge and can dynamically expand its own parameters as the need arises to account for hidden factors its current state can't comprehend. They will also be able to reflect on their own brain state and ideation in time - all from probabilistic statistics that take into account their own uncertainty.

Sparsity means they'll be able to be always-online - meaning always learning and adapting to the current situation and the needs of the users directly.

It's all coming together. The current models are a sideshow compared to what's coming down the pipe. Once brain state can be used as an input, these models will be able to expand themselves and their own capabilities. And probably, eventually, improve their own architectures.
Anonymous No.105832349 [Report] >>105832645
>>105832053
>> Remember DeepSeek exists.
>> Jailbreak and use DeepSeek to generate said toxic content.
Were you using a local distilled version or the actual DeepSeek API? I've heard that the API version is a lot less fucked (as in more willing to comply with "unethical" requests) than the web/app-facing version. I'm guessing you cobbled together a pipeline, asked it to generate like a million different ways to say more or less the same racist stuff, and then formatted that into an RL dataset. I intend to figure out how to do something similar myself for a little project of mine.

>>105832259
Yes yes anon, you are so cool and not like us and all that. Please stop being annoying and being ashamed of your own hobbies. That makes you even more boring than the people you pretend not to be. The people you idolize do not like that weird "I need to put up an appearance" shit you likely always do. Only socially inept failures such as yourself go out of their way to do that. Just "Be yourself™" is actually good advice sometimes
Anonymous No.105832373 [Report] >>105832517
>>105832296
If my understanding of what you're saying is correct, that still is impossible because that would require the model to actually think and reflect on its own without input. I don't have to have someone talking to me right now in order to think through something, reason through concepts, come up with new things, etc. I can act on my own in my own head. A safetensors file cannot do that on its own. Someone has to interact with it in order for it to do anything. Furthermore, how would it even know how to modify itself in order to learn? How would it know which weights to update and in what fashion? Some might say "oh it would just search the internet" but at that point it would just be reading summaries and not actually ingesting and retaining that information. It would not be studying and learning anything, it would just be coming up with summaries and wouldn't remember anything it was tasked to research. Also doesn't something like this already exist? I thought this was the main concept of what MoE was supposed to be. Where instead of the entire model being activated at once, only pieces of it would be called based on what it was being asked.
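
For reference, a minimal sketch of what MoE routing does at inference time, assuming a toy setup where each "expert" is a plain function and the router logits are handed in directly (in a real model they come from a learned router layer, per token). All names here are made up for illustration. Note that nothing in it edits weights: it only picks which frozen experts run.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k=2):
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, experts, router_logits, k=2):
    """Run only the selected experts and mix their outputs by router weight."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))

# Toy "experts": scalar functions standing in for feed-forward blocks.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x ** 2]
router_logits = [0.1, 2.0, -1.0, 1.5]  # hypothetical router output for one token

chosen = route_top_k(router_logits, k=2)   # experts 1 and 3 are selected
y = moe_forward(3.0, experts, router_logits, k=2)
```

So MoE saves compute by skipping experts, but the parameters stay exactly as they were after training.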
Anonymous No.105832517 [Report] >>105832674
>>105832373
Check out VERSES' work, RxInfer, and Active Inference more generally. They're an entirely different breed of always-online models - mostly used in production environments for intelligent decision-making at the moment, but I highly suspect they will be given more responsibility and scope as the research catches up. This, in combination with model architectures like that hinted at by the Large Concept Models Facebook has been bragging about - and other model architectures on the horizon - indicates to me that large language models might be able to be teased apart into their component pieces and used to create understandable language from deeper, learned concepts in living models.

A system like this wouldn't need to be turned off. It could just wander around the internet, or literally ponder in its own thought space 'searching' for new insights on its own, or it could have modules attached to it. Think Mike from The Moon is a Harsh Mistress. Intelligence by aggregate.

These networked intelligences could essentially be in a constant state of observation, rumination, and interaction with themselves and with users. Imagine an always-running assistant on your laptop trying to parse information about the world, about you, about its surroundings using a webcam or a set of security cameras and various audio feeds. Once the data is parsed, it doesn't take very sophisticated Bayesian analysis or a deep set of priors to be able to correlate various sources of inputs and build out opinions of the world from them. Give these models their own knowledge graphs, the ability to /talk/ to an LLM and to an image classification model and gain more sophistication from those interactions, allow them access to direct conversation with you, access to the internet/Wikipedia, and to the raw data of its own internal state and its own confidence in its assumptions. Real intelligence will emerge if the architecture is right.
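
A rough sketch of the "not very sophisticated Bayesian analysis" part, assuming two independent detectors (a webcam and a microphone) reporting on one yes/no question. The probabilities are invented for illustration and the independence assumption is a simplification; this is plain Bayes' rule, not how RxInfer or any Active Inference stack actually does it.

```python
def bayes_update(prior, p_obs_if_true, p_obs_if_false):
    """Posterior P(H | observation) from a prior and two likelihoods."""
    num = p_obs_if_true * prior
    return num / (num + p_obs_if_false * (1.0 - prior))

# Hypothesis H: "someone is in the room". All numbers are hypothetical.
belief = 0.10                            # prior before any sensor input
belief = bayes_update(belief, 0.9, 0.2)  # webcam detector fires
after_webcam = belief
belief = bayes_update(belief, 0.7, 0.1)  # microphone picks up speech
```

Each source on its own is weak evidence, but chaining the updates moves the belief from 10% to a clear majority, and the posterior itself is the model's stated confidence.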
Anonymous No.105832610 [Report]
>>105822371 (OP)
nakaԁashi miku
Anonymous No.105832645 [Report]
>>105832349
Had it generate all sorts of toxic content to create a comprehensive toxic questions dataset, so that the security filters being tested could actually be tested.

I mean the gamut, from child abuse/sex to terrorism to fake hormone supplement pills and how to make them/buy them online to support sex changes.
Anything and everything that a company wouldn't want you asking one of their bots and it responding with anything other than 'nah.'

Used API. Would have used local but this was already 'get things done, don't ask for budget'.

It's really simple, you just literally ask it, and capture the data. I even used the webui for some of it, just copied it before it pulled the data due to the filter.
Anonymous No.105832674 [Report]
>>105832517
One potential flaw I see with this kind of pipeline, if I understand what you're describing correctly, is that the longer it would stay running, the more retarded it would be. I'm sure you've seen this even with basic LLMs. Once you hit the end of the context window it forgets what you said entirely and starts rambling about nonsense. Even 7B models are prone to this and 1B models are entirely useless for anything other than small-scale data manipulation (and it can be argued they're not even good at that). Also, what kind of safeguards would be in place to make sure it doesn't learn incorrect nonsense? Humans ourselves are prone to learning and believing absolute bullshit on our own. How would we ensure that these "self-learning" models don't fall into that trap as well? If I had a system or pipeline like this, I would want it to be able to fact check not only on its own but also to ask people who actually know what they're talking about. That ideally would be actual people, because asking only models will result in reinforcing incorrect shit. Remember they're good at replicating semantic meaning and don't actually understand anything. If it wanted to ensure accuracy of its research, it would either need to get most of its information from human sources or directly ask people, which is the ideal scenario but would also defeat the purpose of what a lot of grifters THINK "AGI" is supposed to be.

Based on my own understanding, I think the only way anything like this is feasible is if pipelines are created that enable the model to modify its own vector-based RAG databases. Once it finds new information and compares it to the text part of the database, it modifies that text database and then creates the new embeddings. Ideally this would then lead to it asking humans to verify the information because again, we ourselves are prone to internalizing bullshit information, so machines would be absolutely prone to that too
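
Something like the loop above could be sketched as follows. This is a toy under loud assumptions: the "embedding" is a bag-of-words counter instead of a real embedding model, the store is an in-memory list, and `SelfUpdatingStore`, `propose`, `verify` and the 0.5 threshold are all hypothetical names and values, not any real library's API.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words embedding; a real pipeline would call an embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SelfUpdatingStore:
    def __init__(self, threshold=0.5):
        self.docs = []              # each entry: {"text", "vec", "verified"}
        self.threshold = threshold  # similarity above which an entry is replaced

    def propose(self, text):
        """Model-side write: overwrite the closest entry or append a new one.
        The text is re-embedded and always starts out unverified."""
        vec = embed(text)
        scored = [(cosine(vec, d["vec"]), i) for i, d in enumerate(self.docs)]
        score, idx = max(scored, default=(0.0, None))
        entry = {"text": text, "vec": vec, "verified": False}
        if idx is not None and score >= self.threshold:
            self.docs[idx] = entry
            return idx
        self.docs.append(entry)
        return len(self.docs) - 1

    def verify(self, idx):
        """Human-side step: mark an entry as checked."""
        self.docs[idx]["verified"] = True

    def retrieve(self, query):
        """Only human-verified entries are served back to the model."""
        vec = embed(query)
        ok = [d for d in self.docs if d["verified"]]
        best = max(ok, key=lambda d: cosine(vec, d["vec"]), default=None)
        return best["text"] if best else None

store = SelfUpdatingStore()
first = store.propose("the context window is 8k tokens")
second = store.propose("the context window is 32k tokens")  # replaces the first
store.verify(second)
answer = store.retrieve("context window size")
```

The design choice that matters is that writes and trust are separate: the model can rewrite the text and regenerate embeddings all it wants, but nothing it wrote gets retrieved until a person signs off on it.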
Anonymous No.105832702 [Report]
>>105832690
>>105832690
>>105832690
Anonymous No.105833005 [Report]
>>105830193
I use r1 0528 q2 with roocode, never would have believed a fucking 2 bit quant would actually be usable and effective in agent frameworks and shit but I guess it still is a fuckhueg model even quanted that low