>>106521925
> real benches
Fair. This was done with a standardized 4-round prompt set and short model, and my judgement is based on months of prior experience reading DS V3-0324 API outputs. I spent about 1 hour on the whole effort.
Every provider except GMICloud gave responses well within what I expect from the old DS API. I reran GMICloud and got 2 sets of "off" responses (basically one run-on paragraph with no formatting). Given there's little difference in cost between providers, they fall off the list as sus and that's it.
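For anyone who wants to repeat the "off" check without eyeballing every reply, here's a rough sketch of the heuristic in Python. It is not what I actually ran (I judged by reading); `looks_off` and the `min_chars` threshold are made-up names and numbers for illustration, so tune them to your own prompt set.

```python
import re

# Markdown-ish markers that normal long responses tend to contain:
# list bullets, headings, blockquotes, numbered items, bold, inline code.
# (Assumed marker set, not an exhaustive definition of "formatting".)
_MARKUP = re.compile(r"^\s*(?:[-*#>]|\d+\.)\s|\*\*|`", re.MULTILINE)

def looks_off(text: str, min_chars: int = 600) -> bool:
    """Flag a long response that is one run-on paragraph with no formatting.

    min_chars is a hypothetical cutoff: short replies are allowed to be
    a single plain paragraph.
    """
    stripped = text.strip()
    if len(stripped) < min_chars:
        return False
    has_paragraph_break = re.search(r"\n\s*\n", stripped) is not None
    has_markup = _MARKUP.search(stripped) is not None
    return not has_paragraph_break and not has_markup

if __name__ == "__main__":
    run_on = "word " * 200
    formatted = "Intro paragraph.\n\n- point one\n- point two\n\n" + "word " * 200
    print(looks_off(run_on), looks_off(formatted))
```

Run each provider's replies through it and count the flags. A provider that trips it on repeated runs is worth a manual look, nothing more.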
>>106521612
/wait/ used to get tons of "why isn't DS working / giving crummy outputs" posts, and after digging in, the root cause was always using OR and running the free versions. Looking at the outputs, they were 7B-tier responses, so go figure.
>>106521951
I've recently had issues with -chat breaking down. This got discussed here
>>106476967
tl;dr: use -reasoner for longer context.