Search Results
spec: RTX 4070 Ti Super (16 GB)
wtf, this is actually true. With Ollama, gpt-oss 20B was taking up all my VRAM (the loaded model was ~15 GiB) and max speed was ~85 tok/s. I tried llama.cpp now (via LM Studio) and I get up to 130 tok/s (with flash attention enabled), and the model takes 12 GiB as seen in nvtop, so I have plenty of free space left for the browser and the rest. wtf...
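
(For reference, a minimal sketch of a similar setup using llama-cpp-python, the Python bindings for llama.cpp, with flash attention turned on. The model filename, quantization, context size, and offload settings below are assumptions for illustration, not the poster's exact LM Studio configuration.)

```python
# Sketch: load a quantized gpt-oss 20B GGUF through llama-cpp-python with
# flash attention enabled and all layers offloaded to the GPU.
from llama_cpp import Llama

llm = Llama(
    model_path="gpt-oss-20b.Q4_K_M.gguf",  # hypothetical quantized GGUF file
    n_gpu_layers=-1,   # offload every layer to the GPU
    n_ctx=8192,        # context window; adjust to available VRAM
    flash_attn=True,   # the flash-attention toggle mentioned above
)

out = llm("Explain flash attention in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```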