Newfag here, I'm trying to run fast local models for ERP conversations. What are some models on par with qwen-flash's speed? The 1-3 s delays from most LLMs are a huge turn-off for me. I'm aiming for roughly 300-500 ms with about 100-300 input tokens.
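For reference, here's roughly how I'm timing it, a minimal sketch assuming a llama.cpp server (or any OpenAI-compatible endpoint) on localhost:8080, with streaming on so I can catch time-to-first-token rather than total generation time:

import time
import requests

# Assumed local endpoint; llama.cpp's server exposes an
# OpenAI-compatible /v1/chat/completions route.
URL = "http://localhost:8080/v1/chat/completions"

payload = {
    "model": "local",  # placeholder name, server-dependent
    "messages": [{"role": "user", "content": "Hey, how's your evening going?"}],
    "max_tokens": 64,
    "stream": True,    # stream so the first chunk marks the first token
}

start = time.perf_counter()
with requests.post(URL, json=payload, stream=True, timeout=30) as resp:
    for line in resp.iter_lines():
        if line:  # first non-empty SSE chunk = first token arrived
            ttft = time.perf_counter() - start
            print(f"time to first token: {ttft * 1000:.0f} ms")
            break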
Also, in picrel, the numbers of gigabytes in parentheses are the memory needed, right? How tf are you supposed to have 200 GB in your local machine?
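If I've got the math right, the rule of thumb is memory ≈ parameter count × bytes per weight (plus some overhead for the KV cache), so here's my back-of-envelope check; the bytes-per-weight figures for the quant formats are my rough assumptions, not exact file sizes:

# Rough VRAM/RAM estimate: params * bytes_per_weight.
# Approximate bytes per weight for common formats (assumed values).
QUANT_BYTES = {"fp16": 2.0, "q8_0": 1.0, "q4_k_m": 0.5}

def approx_gb(params_billion: float, quant: str) -> float:
    # 1e9 params each taking QUANT_BYTES[quant] bytes, converted to GB
    return params_billion * 1e9 * QUANT_BYTES[quant] / 1e9

for quant in QUANT_BYTES:
    # a 100B-class model is roughly where a ~200 GB fp16 figure comes from
    print(f"100B @ {quant}: ~{approx_gb(100, quant):.0f} GB")

So if those parenthesized numbers are the unquantized (fp16) figures, a 4-bit quant of the same model needs roughly a quarter of that, which would explain how people run big models on 24-48 GB boxes.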