the model also stays mad coherent with many images in a single prompt
>24469 tokens