These guys are basically the ollama of quant makers. I have no idea what they're doing with this release, but for the same file size you could quantize all attention layers to at least 5 bpw, shared experts to 4.5 bpw, and routed experts to 2.1-2.3 bpw.
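The trade works because routed experts hold the vast majority of a big MoE's weights, so shaving a fraction of a bit off them frees enough budget to raise attention and shared experts by whole bits. A back-of-envelope sketch (parameter counts are hypothetical, just shaped like a large MoE; the bpw figures match the mix above):

```python
# Back-of-envelope check that a mixed bits-per-weight allocation can match
# the file size of a flatter one. Parameter counts are hypothetical,
# shaped roughly like a large MoE where routed experts dominate.

GIB_BITS = 8 * 1024**3  # bits per GiB

# Hypothetical parameter counts per tensor group.
params = {
    "attention":      8e9,
    "shared_experts": 4e9,
    "routed_experts": 220e9,
}

def size_gib(bpw):
    """Total quantized size in GiB for a bits-per-weight assignment."""
    return sum(params[g] * b for g, b in bpw.items()) / GIB_BITS

# A flat allocation vs. the mix suggested above: because routed experts
# hold ~95% of the weights here, shaving 0.1 bpw off them pays for
# +2 bpw on attention and +1.5 bpw on the shared experts.
flat  = {"attention": 3.0, "shared_experts": 3.0, "routed_experts": 2.4}
mixed = {"attention": 5.0, "shared_experts": 4.5, "routed_experts": 2.3}

print(f"flat : {size_gib(flat):.2f} GiB")   # same total size
print(f"mixed: {size_gib(mixed):.2f} GiB")  # same total size
```

With these (made-up) counts both allocations land on the exact same file size, while the mixed one keeps attention far less damaged. The real per-group counts depend on the model, but the imbalance is typical of large MoEs.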