VLM1 is based on DeepSeek-V3 and is SOTA for vision outside of closed-source models. It shouldn't be that hard to make it goofable.