How do you figure out which quantization to use, for example with llama.cpp and GLM 4.5 Air? Does the right choice differ based on the model or the backend?
My hardware: 16 GB VRAM, 64 GB RAM.
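For reference, here's a rough back-of-envelope sketch of the feasibility check I'm trying to do: estimated GGUF file size (total parameters × bits per weight) against the combined memory budget. The bits-per-weight figures are approximations for common llama.cpp quant types, the 106B total parameter count is GLM 4.5 Air's published size, and the overhead value is just a guess for KV cache and buffers:

```python
# Rough feasibility check: estimated GGUF size vs. available memory.
# BPW values are approximations for common llama.cpp quants, not exact;
# OVERHEAD_GB is an assumed headroom for KV cache, compute buffers, OS.

TOTAL_PARAMS = 106e9  # GLM 4.5 Air total params (MoE, ~12B active/token)

# Approximate effective bits per weight for common GGUF quant types.
BPW = {
    "Q8_0": 8.5,
    "Q6_K": 6.6,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.9,
    "IQ4_XS": 4.3,
    "Q3_K_M": 3.9,
}

VRAM_GB = 16
RAM_GB = 64
OVERHEAD_GB = 8  # assumed headroom, adjust for your context size

budget = VRAM_GB + RAM_GB - OVERHEAD_GB
for quant, bpw in BPW.items():
    size_gb = TOTAL_PARAMS * bpw / 8 / 1e9  # params * bits -> gigabytes
    verdict = "fits" if size_gb <= budget else "too big"
    print(f"{quant:8s} ~{size_gb:6.1f} GB  ({verdict} in ~{budget} GB budget)")
```

By that estimate a Q4-class quant (~65 GB) would just fit in RAM+VRAM combined, but nowhere near VRAM alone. My understanding is that since GLM 4.5 Air is MoE with only ~12B active parameters per token, keeping the experts in system RAM while the rest sits in VRAM can still give usable speed, but I'm not sure whether that changes which quant level people actually recommend.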