>>106502406
>llama-server -m Qwen3-30B-A3B-Instruct-2507-Q5_K_M.gguf --verbose --threads 8 --threads-batch 16 --batch-size 512 --ubatch-size 512 --n-cpu-moe 41 --gpu-layers 99 --override-kv qwen3moe.expert_used_count=int:10 -c 32000 -fa auto --no-mmap --cache-reuse 256 --offline
>7.5GB VRAM + 25ish GB RAM used
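For anyone wondering what the flags actually do, here's the same command broken out with comments. Flag meanings are per the llama.cpp / llama-server docs as I understand them; double-check against your build since these options move around between versions:

```shell
llama-server \
  -m Qwen3-30B-A3B-Instruct-2507-Q5_K_M.gguf \
  --verbose \
  --threads 8           `# CPU threads for token generation` \
  --threads-batch 16    `# CPU threads for prompt processing` \
  --batch-size 512      `# logical batch size` \
  --ubatch-size 512     `# physical (micro) batch size` \
  --n-cpu-moe 41        `# keep the MoE expert weights of the first 41 layers in system RAM` \
  --gpu-layers 99       `# offload everything else (attention, dense parts) to the GPU` \
  --override-kv qwen3moe.expert_used_count=int:10 \
                        `# active experts per token; this model defaults to 8, so 10 is a slight bump` \
  -c 32000              `# context window` \
  -fa auto              `# flash attention if the backend supports it` \
  --no-mmap             `# load weights into RAM instead of memory-mapping the file` \
  --cache-reuse 256     `# min chunk size for KV cache reuse via context shifting` \
  --offline             `# never touch the network (no model downloads)`
```

The interesting trick is --n-cpu-moe + --gpu-layers 99: the small always-active tensors sit in VRAM while the fat expert weights stay in RAM, which is why it fits in ~7.5GB VRAM.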
Yeah, alright.
The response was really good too.
I'll do more testing, but this will do.