>>106160296
>>106160304
The way the mxfp4 weights are encoded in llama.cpp/ggml is as quantized blocks of 4-bit integers with an FP8 (E8M0, exponent-only) scale per block.
Like with i-quants, the 4-bit integers are then used as indices into a table of 8-bit integers that can be used in the actual dot products.
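Roughly like this, as a minimal sketch (names, packing order, and the exact scale handling are illustrative assumptions, not ggml's actual code; the FP4/E2M1 value set and the 32-element block size come from the OCP MX spec):

```c
#include <stdint.h>
#include <math.h>

#define BLOCK_SIZE 32  // elements per mxfp4 block (OCP MX spec)

// Illustrative block layout: one E8M0 scale byte + 16 bytes of packed 4-bit codes.
typedef struct {
    uint8_t e;                    // E8M0 scale: 2^(e - 127)
    uint8_t qs[BLOCK_SIZE / 2];   // two 4-bit codes per byte
} block_mxfp4;

// FP4 (E2M1) values {0, 0.5, 1, 1.5, 2, 3, 4, 6} doubled so they fit in
// int8 without fractions; the top bit of the 4-bit code is the sign.
static const int8_t kvalues[16] = {
    0, 1, 2, 3, 4, 6, 8, 12,
    0, -1, -2, -3, -4, -6, -8, -12,
};

// Dequantize one block into 32 floats. The exponent is lowered by 1
// to undo the doubling baked into the lookup table.
static void dequant_block(const block_mxfp4 *b, float *out) {
    const float d = ldexpf(1.0f, (int) b->e - 127 - 1);
    for (int i = 0; i < BLOCK_SIZE / 2; ++i) {
        out[i]                  = d * kvalues[b->qs[i] & 0x0F];
        out[i + BLOCK_SIZE / 2] = d * kvalues[b->qs[i] >> 4];
    }
}
```
The point of the int8 table is that a dot-product kernel can stay in integer arithmetic over the looked-up values and apply the per-block scales once at the end, instead of dequantizing every element to float first.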