>>107174665
update
got GLM Air working with
`llama-server -m "GLM-4.5-Air-Q6_K-00001-of-00003.gguf" --ctx-size 32384 -fa on -ub 4096 -b 4096 -ngl 999 -ncmoe 42`
anything I should tweak?
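in case anyone wants to compare, I was going to sweep a few -ncmoe values to find the fastest one, something like this (assuming your llama-bench build actually has the --n-cpu-moe option, it was added later than the server flag, so check `llama-bench --help` first):

```
# hypothetical sweep with the same batch settings as above
# --n-cpu-moe support in llama-bench is an assumption; verify with --help
llama-bench -m "GLM-4.5-Air-Q6_K-00001-of-00003.gguf" \
  -fa 1 -ub 4096 -b 4096 -ngl 999 \
  --n-cpu-moe 38,40,42,44 \
  -p 512 -n 128
```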
llama.cpp is quite a bit faster than LM Studio (which only managed 5.5 t/s), which is strange, I didn't expect this drastic a difference. Thanks to all the anons in the archives who explained the flags. There was some conflicting info, so I also fed the source file that handles the flags to Claude.
I hear the llama-server web UI keeps conversations in the browser's localStorage, is that stable or should I be regularly exporting them elsewhere?
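in the meantime I'm at least capturing the server's own stdout/stderr to a file, since that part doesn't depend on the browser at all (just plain shell redirection, nothing llama.cpp specific):

```
# append server output to a log file while still watching it in the terminal
llama-server -m "GLM-4.5-Air-Q6_K-00001-of-00003.gguf" --ctx-size 32384 \
  -fa on -ub 4096 -b 4096 -ngl 999 -ncmoe 42 2>&1 | tee -a llama-server.log
```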