>>106996665
>to iron out the bad habits learned during RL like cheating tests and generating fake ("simulated") data and placeholder code to make it look like it has achieved something when it hasn't.

I'll be very impressed if you manage to achieve this through fine-tuning, but I'd temper my expectations if I were you.