>>720055334
>>720055697
Yeah, lower quants are just dumber, more streamlined, fewer possibilities etc, but I would take a lobotomized 24B over an unquantized 7B any day, given the same 8GB filesize.
That's just my experience though. 7Bs are cool to tool around with, but they're also likely to have ethical objections to literally anything interesting you might want to do.
So if you're really stuck for processing power and want a snappy response, honestly even the 8GB Q2 model is way better than Silicon Maid or any other standard 7B recommendation, and it still gives very fast results.
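If you're wondering why a Q2 24B and a near-unquantized 7B land at roughly the same filesize, here's a quick back-of-the-envelope sketch. The bits-per-weight figures are rough GGUF-style averages (my assumption, actual quants vary a bit):

```python
# Rough on-disk size estimate for a quantized model:
# billions of params * average bits per weight / 8 = size in GB.
def approx_size_gb(params_b: float, bits_per_weight: float) -> float:
    return params_b * bits_per_weight / 8

# ~8.5 bpw is a rough average for Q8_0, ~2.6 for a Q2-class quant (assumed).
print(f"7B  @ ~8.5 bpw: {approx_size_gb(7, 8.5):.1f} GB")   # roughly 7.4 GB
print(f"24B @ ~2.6 bpw: {approx_size_gb(24, 2.6):.1f} GB")  # roughly 7.8 GB
```

Both come out in the same ~8GB ballpark, which is why it's a real tradeoff: same VRAM/disk budget, your pick of more parameters or more precision.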
Also stay away from MoE (Mixture of Experts) models. It's like a panel of lobotomized dipshits fighting for airtime, and none of them are worth listening to. Cool idea, shite results.