7/15/2025, 6:04:40 AM
>>105910857
Can anyone explain how it is possible to run a 1T model at 200-300 tokens/second without quantizing it to death? Even on LPUs.
(see >>105910860, it did actually make greentext, there was just markdown mode enabled)
7/15/2025, 6:02:03 AM
>>105910665
>>105910689
This is fucking illegal btw. And I tested it on my programming questions: it's basically the same as normal K2, they didn't quantize it much. That speed is insane, and on the API I'm seeing up to 350 tokens/s in some cases.