>>106318374
Yeah, if someone compares against another method in a paper, you should always assume it's the dumbest approach possible that still fits the name. They want to look good. If they compare to "INT4" it will be the most naive shitty uniform round-to-nearest quantization possible. Quant papers never compare against llama.cpp's quants.
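To make the gap concrete, here's a minimal sketch of what I assume those papers mean by "INT4" (one round-to-nearest scale for the whole tensor) next to block-wise absmax scaling in the spirit of llama.cpp's Q4_0 family. The data and block size are made up for illustration; a single outlier weight is enough to wreck the global scale:

```python
import numpy as np

def quant_rtn(w, scale):
    # Round-to-nearest onto the signed 4-bit grid [-8, 7], then dequantize.
    q = np.clip(np.round(w / scale), -8, 7)
    return q * scale

def naive_int4(w):
    # "Paper baseline": one absmax scale for the entire tensor.
    scale = max(np.abs(w).max() / 7, 1e-12)
    return quant_rtn(w, scale)

def blockwise_int4(w, block=32):
    # llama.cpp-style idea: an independent absmax scale per small block,
    # so an outlier only hurts its own block.
    out = np.empty_like(w)
    for i in range(0, w.size, block):
        blk = w[i:i + block]
        scale = max(np.abs(blk).max() / 7, 1e-12)
        out[i:i + block] = quant_rtn(blk, scale)
    return out

rng = np.random.default_rng(0)
w = rng.normal(0, 1, 1024)
w[5] = 50.0  # one outlier dominates the per-tensor scale

err_naive = np.mean((w - naive_int4(w)) ** 2)
err_block = np.mean((w - blockwise_int4(w)) ** 2)
print(err_naive, err_block)  # block-wise MSE is far lower
```

Same bit width, wildly different error, which is exactly why comparing against the naive version flatters your method.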

>>106316643
Sadly, RAM setups are ultimately cope, because doing matmuls on CPU will always be shit. It's the wrong tool for the job, like eating soup with a fork; the hardware isn't meant for it. Of course GPUs with tiny VRAM are even worse, but the real issue is that no consumer hardware is suitable for LLMs right now. Either we need better hardware, or a different architecture that needs either less space (for GPUs) or less compute (for CPUs).
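The back-of-envelope version of why RAM builds are cope: generating one token of a dense model has to stream every weight from memory once, so tokens/s is hard-capped at bandwidth / model size, no matter how fast the cores are. The bandwidth figures below are rough ballpark assumptions, not measurements:

```python
# Roofline-style cap on token generation speed for a dense model:
# each token reads all weights once, so tok/s <= bandwidth / model_size.

def max_tok_per_s(model_gb, bw_gb_s):
    return bw_gb_s / model_gb

model_gb = 40  # e.g. a ~70B model at 4-bit (assumed size)

# Bandwidth numbers are rough assumptions for illustration.
for name, bw in [("dual-channel DDR5 (~80 GB/s)", 80),
                 ("unified memory SoC (~400 GB/s)", 400),
                 ("high-end GPU VRAM (~1000 GB/s)", 1000)]:
    print(f"{name}: <= {max_tok_per_s(model_gb, bw):.1f} tok/s")
```

Under those assumptions a DDR5 desktop tops out around 2 tok/s on a 40 GB model before compute even enters the picture, which is the "wrong tool for the job" part in numbers.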