Search Results

Found 3 results for "cf8bf57614332b59dc9c40e7570009db" across all boards, searching by MD5.

Anonymous Albania /int/211865965#211886449
6/19/2025, 8:50:11 AM
>>211886226
they should try adopting a sense of responsibility and accountability for their actions instead of being overgrown manchildren and blaming the world for their problems(late stage capitalism made their funkopops more expensive, only communism can save them)
Anonymous /g/105611492#105611494
6/16/2025, 5:44:04 PM
►Recent Highlights from the Previous Thread: >>105601326

--Papers:
>105606869 >105606875
--Evaluation of dots.llm1 model performance and integration challenges in local inference pipelines:
>105601735 >105604736 >105604782 >105604857 >105604810 >105604838 >105605017 >105605319 >105605475 >105605551 >105605556 >105605609 >105605671 >105605701 >105605582 >105605670 >105605965
--llama-cli vs llama-server performance comparison showing speed differences and config inconsistencies:
>105601495 >105601540 >105601746 >105601830 >105601953 >105601967 >105602123 >105602170 >105602190 >105602380 >105601654
--Evaluating budget hardware options for local LLM deployment with portability and future model scaling in mind:
>105609676 >105609743 >105609808 >105609858 >105610000 >105610275 >105610095
--VideoPrism: A versatile video encoder achieving SOTA on 31 of 33 benchmarks:
>105610184
--Sugoi LLM 14B/32B released via Patreon with GGUF binaries and claimed benchmark leads:
>105606204 >105606305 >105606399 >105609562 >105609620
--Interleaving predictions from multiple LLMs via scripts or code modifications:
>105609453 >105609499 >105609500 >105609534
--Hailo-10H M.2 accelerator questioned for real-world AI application viability:
>105602205 >105602335
--Radeon Pro V620 GPU rejected due to driver issues and overheating in LLM use case:
>105603370 >105603394 >105603418 >105603454 >105603762 >105603893 >105604087
--Sycophantic tendencies in cloud models exposed through academic paper evaluation:
>105601903 >105602389 >105602410 >105602064 >105603398 >105603416
--MiniMax-M1, hybrid-attention reasoning models:
>105611241 >105611443
--Qwen3 models released in MLX format:
>105608806
--Miku (free space):
>105601934 >105603103 >105604354 >105604389 >105604736 >105605940 >105606009 >105606217 >105610016 >105610160 >105610284 >105610486 >105611108 >105611119

►Recent Highlight Posts from the Previous Thread: >>105601330

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Anonymous /g/105601326#105604736
6/15/2025, 11:39:45 PM
>>105601735
A QRD from some basic testing of Q3 and Q4 quants.

(E)RPs without a system prompt.
Passes a couple of basic tests: reciting classic texts and counting variants of 'r'.
Fails at defining "mesugaki" and "メスガキ". Potentially deliberate, since it prefers one or two specific explanations.
Q4_K_M is 80-90GB, meaning it's too big for 3x24GB GPU configurations.
A Q3 i-quant might fit in 72GB/a mikubox with context, but for whatever reason all the Q3 quants available are L and too (L)arge.
Prose is... OK, I guess. It might be somewhat better, but I can't say it's immediately obvious that it's free from synthetic data as claimed.
Uses 6 experts by default, so it's slower than expected. You can use "--override-kv dots1.expert_used_count=int:n" to lower this, but the speed gains are fairly minimal and the brain damage severe.
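For anyone who wants to try that anyway, the full invocation would look roughly like this (a sketch only; the binary choice, model filename, context size and expert count are placeholders, not settings from my runs):

llama-server -m dots1-Q4_K_M.gguf -c 8192 --override-kv dots1.expert_used_count=int:4

Lower the count for more speed, but as said above, the gains are small and the brain damage isn't.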

Overall, this model is in an odd spot. You can run it in VRAM with 96GB, but then you could just as well go for a deepseek or qwen quant with offloading instead for higher quality outputs. On paper it could be interesting for 64 to 128GB systems with a 16-24GB card, but it feels too slow compared to the alternatives without any obvious quality edge. Then again, I haven't tested anything serious like RAG or coding, so it's possible it might shine there.
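(To be clear, "offloading" above just means the usual llama.cpp partial offload, i.e. capping how many layers go to the GPU, something like:

llama-server -m <deepseek-or-qwen-quant>.gguf -ngl 30 -c 8192

where the -ngl layer count and context size are placeholders you'd tune to whatever fits your VRAM, not benchmarked numbers.)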

It might be an interesting option for RTX PRO 6000 users, since you can run the model fully in VRAM, and with exl3 it would be VERY fast and, potentially, the "smartest" choice in that bracket.

Current GGUFs might be broken and lack a stopping token, so they either ramble on or stop outputting with the GPUs still under load. This might differ between llama.cpp quant versions, so check the model page for overrides to fix it.
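If you'd rather band-aid the missing stop token yourself than wait for fixed quants, the usual trick is forcing the EOS id through the same metadata override mechanism, e.g.

--override-kv tokenizer.ggml.eos_token_id=int:<id>

with <id> taken from the model card's tokenizer config. I haven't verified this against these specific quants, so treat it as a guess until someone confirms.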