Newfag here, I'm trying to run fast local models for ERP conversations. What are some models on par with qwen-flash's speed? The 1-3 s delays from most LLMs are a huge turn-off for me. I'm aiming for roughly 300-500 ms with about 100-300 input tokens.
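For reference, here's roughly how I'm timing it, a minimal sketch assuming a llama.cpp server (or any OpenAI-compatible endpoint) on localhost:8080, with streaming on so I can catch time-to-first-token rather than total generation time:

import time
import requests

# Assumed local endpoint; llama.cpp's server exposes an
# OpenAI-compatible /v1/chat/completions route.
URL = "http://localhost:8080/v1/chat/completions"

payload = {
    "model": "local",  # placeholder name, server-dependent
    "messages": [{"role": "user", "content": "Hey, how's your evening going?"}],
    "max_tokens": 64,
    "stream": True,    # stream so the first chunk marks the first token
}

start = time.perf_counter()
with requests.post(URL, json=payload, stream=True, timeout=30) as resp:
    for line in resp.iter_lines():
        if line:  # first non-empty SSE chunk = first token arrived
            ttft = time.perf_counter() - start
            print(f"time to first token: {ttft * 1000:.0f} ms")
            break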
Also, in picrel, the numbers of gigabytes in parentheses are the memory needed, right? How tf are you supposed to have 200 GB in your local machine?
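If I've got the math right, the rule of thumb is memory ≈ parameter count × bytes per weight (plus some overhead for the KV cache), so here's my back-of-envelope check; the bytes-per-weight figures for the quant formats are my rough assumptions, not exact file sizes:

# Rough VRAM/RAM estimate: params * bytes_per_weight.
# Approximate bytes per weight for common formats (assumed values).
QUANT_BYTES = {"fp16": 2.0, "q8_0": 1.0, "q4_k_m": 0.5}

def approx_gb(params_billion: float, quant: str) -> float:
    # 1e9 params each taking QUANT_BYTES[quant] bytes, converted to GB
    return params_billion * 1e9 * QUANT_BYTES[quant] / 1e9

for quant in QUANT_BYTES:
    # a 100B-class model is roughly where a ~200 GB fp16 figure comes from
    print(f"100B @ {quant}: ~{approx_gb(100, quant):.0f} GB")

So if those parenthesized numbers are the unquantized (fp16) figures, a 4-bit quant of the same model needs roughly a quarter of that, which would explain how people run big models on 24-48 GB boxes.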