>>106838343
API users pay for thinking tokens at the output-token price.
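A minimal sketch of the billing arithmetic. It assumes reasoning tokens are billed at exactly the same per-token rate as visible output. The function name and the $10-per-million price are hypothetical placeholders, not any provider's actual rates.

```python
def output_cost_usd(reasoning_tokens: int, visible_tokens: int,
                    price_per_million_output: float) -> float:
    # Assumption: thinking tokens are billed as ordinary output tokens,
    # so they are simply added to the visible tokens before pricing.
    billed_output_tokens = reasoning_tokens + visible_tokens
    return billed_output_tokens * price_per_million_output / 1_000_000


# Hypothetical price: $10 per 1M output tokens.
# 900 thinking + 100 visible tokens are billed as 1000 output tokens.
cost = output_cost_usd(900, 100, 10.0)
```

So a reply that shows you only 100 tokens of text can still bill you for 1000.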

Since reasoning counts toward output tokens, it can become a problem: the model reasons first, then writes the answer, and if the total output budget isn't high enough the response gets truncated. E.g. with a 1000-token output limit, if it spends 900 on thinking, you're left with at most 100 tokens of actual answer, which is often just cut off.
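A rough simulation of the truncation described above. It makes two simplifying assumptions: reasoning is generated entirely before the answer, and both draw from one shared output limit. It also assumes the model doesn't shorten its thinking to fit the limit. `split_output_budget` and `OutputSplit` are hypothetical names for illustration only.

```python
from dataclasses import dataclass


@dataclass
class OutputSplit:
    reasoning_tokens: int  # tokens spent on thinking
    visible_tokens: int    # tokens of actual answer the user sees
    truncated: bool        # True if the answer didn't fit in the budget


def split_output_budget(max_output_tokens: int, reasoning_needed: int,
                        answer_needed: int) -> OutputSplit:
    # Reasoning comes first and consumes the shared output budget.
    reasoning = min(reasoning_needed, max_output_tokens)
    # Whatever is left over is all the answer gets.
    remaining = max_output_tokens - reasoning
    visible = min(answer_needed, remaining)
    return OutputSplit(reasoning, visible, visible < answer_needed)


# 1000-token limit, 900 tokens of thinking, answer would need 400 tokens:
split = split_output_budget(1000, 900, 400)
# -> 900 reasoning, only 100 visible, truncated
```

If the thinking alone exceeds the limit, the visible answer can be empty. That is why a larger output budget helps when using reasoning models.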