>>106246342
I admit I've been out of the loop for a while since Mistral fell off last year, but the benchmaxx scores on the 120b got my attention, so I compiled the latest llama.cpp to see if it was any good. It is very good. Fortunately I don't do ERP, and for what I do use it for I haven't hit a single refusal yet. On a 12900K and a single RTX 3090 I get something like 10 t/s, which isn't great, but it's a hell of a lot better than what I was getting out of llama 3 70b, which is what I was messing with last time. Only thing I can add is that for my use case the 120b is by far the closest thing to the "real thing" I can run locally. Defaulting it to high reasoning helps though; out of the box it wants medium.
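If anyone else wants to pin it to high instead of medium, here's roughly how I do it through llama-server's OpenAI-compatible endpoint. Treat it as a sketch, not gospel: I'm assuming the 120b here is gpt-oss-120b on a recent llama.cpp build, the launch flags in the comments are placeholders for whatever fits your rig, and the chat_template_kwargs / reasoning_effort names may not exist on older builds, so check yours.
[code]
# rough sketch: assumes gpt-oss-120b behind a recent llama-server build on
# localhost:8080, launched with something like
#   llama-server -m gpt-oss-120b.gguf -ngl 99 --n-cpu-moe 24 \
#       --chat-template-kwargs '{"reasoning_effort":"high"}'
# (model path and offload numbers are placeholders, tune for your own hardware)
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="none")

resp = client.chat.completions.create(
    model="gpt-oss-120b",  # llama-server serves whatever it loaded regardless
    messages=[{"role": "user", "content": "test prompt"}],
    # per-request override, if your build parses chat_template_kwargs;
    # otherwise the server-side flag above already pins it to high
    extra_body={"chat_template_kwargs": {"reasoning_effort": "high"}},
)
print(resp.choices[0].message.content)
[/code]
Either the launch flag or the per-request override should stick; I use the flag so every frontend hitting the server gets high without having to set anything.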