my local LLM is generating the code I asked for at half a token per second