I've set up a server with Qwen3-Next-80B-A3B-Instruct for you anons, give it a try. 64K context. It's on vllm with pipeline parallelism so not the best, but it should support quite a few parallel requests. It's a bit of a frankenmix with a mixture of gpus, but it's running at 65t/s for me.

url in picrel

api-key: "sk-miku"
model-name: "Qwen/Qwen3-Next-80B-A3B-Instruct"

I tested with openwebui and it requires the /v1 suffix on the base URL; sillytavern does too for chat completions, but not for text completions.
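For anyone wiring it up by hand instead of through a frontend: a minimal sketch of hitting the OpenAI-compatible chat completions endpoint with stdlib only. The base URL is a placeholder since the real one is in picrel; the key and model name are the ones above.

```python
import json
import urllib.request

BASE_URL = "http://<url-from-picrel>"  # placeholder -- actual address is in the pic
API_KEY = "sk-miku"
MODEL = "Qwen/Qwen3-Next-80B-A3B-Instruct"

# vllm serves the OpenAI-compatible API under /v1, hence the suffix.
payload = {
    "model": MODEL,
    "messages": [{"role": "user", "content": "hi"}],
    "max_tokens": 64,
}
req = urllib.request.Request(
    BASE_URL + "/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)
# Uncomment once BASE_URL points at the real server:
# resp = urllib.request.urlopen(req)
# print(json.load(resp)["choices"][0]["message"]["content"])
```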

Someone do a cockbench