>>106517457
Test sequence. Lock model and provider (OR can flex them; we don't want that for testing.)
Run same model, same 4 prompts. Look for refusals or any other oddities. I'm not going to try to lock the seed; we can do that next round, maybe.
We can short list that way.