Anonymous
8/3/2025, 1:25:38 PM
No.106125632
>>106125391
>>106125381
Yeah, there's a similar difference for dense too (although not as pronounced as for MoE):
turboderp-Qwen3-8B-exl3-6.0bpw (qwen3, 8B, 6.5 GB) tabbyapi 056527c exllamav3: 0.0.4
3 Requests gen: 31.3 Tokens/sec Total: 782 processing: 3743.2 Tokens/sec Total: 12989
Qwen-Qwen3-8B-Q6_K.gguf (qwen3, 8B, 6.3 GB) llama.cpp 5937 (bf9087f5)
3 Requests gen: 36.5 Tokens/sec Total: 1536 processing: 4775.2 Tokens/sec Total: 13352
So on a 3090, exl2 is faster than lcpp for pp, but lcpp is faster than exl3.
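For anyone who wants the gap as a percentage, here is a quick sketch that only does arithmetic on the tokens/sec figures pasted above. The dict layout and the `speedup` helper are made up for illustration and are not part of tabbyapi or llama.cpp.

```python
# Tokens/sec copied from the two 3-request runs above (Qwen3-8B, ~6 bpw).
runs = {
    "exl3 6.0bpw": {"gen": 31.3, "pp": 3743.2},
    "llama.cpp Q6_K": {"gen": 36.5, "pp": 4775.2},
}


def speedup(a: float, b: float) -> float:
    """Return how much faster a is than b, in percent."""
    return (a / b - 1.0) * 100.0


lcpp = runs["llama.cpp Q6_K"]
exl3 = runs["exl3 6.0bpw"]

# Relative advantage of llama.cpp over exl3 for generation and prompt processing.
gen_gain = speedup(lcpp["gen"], exl3["gen"])
pp_gain = speedup(lcpp["pp"], exl3["pp"])

print(f"gen: llama.cpp +{gen_gain:.1f}% over exl3")
print(f"pp:  llama.cpp +{pp_gain:.1f}% over exl3")
```

That works out to roughly 17% for gen and 28% for pp in favor of lcpp. Keep in mind the generated token totals differ between the runs (782 vs 1536), so it's not a perfectly matched comparison.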