
>>106948126
According to the paper, image tokens can compress text tokens lossily, with good quality at a 1:10 ratio and fair quality at a 1:20 ratio.
In a way, I've noticed something along these lines with text-rich images in Gemma 3. Sometimes it seems to extract more information than should fit in the 256 visual tokens it encodes each image into, although I've never analyzed this in detail.
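For a rough sense of scale, here is the back-of-the-envelope arithmetic. It assumes the paper's ratios would carry over to Gemma 3's fixed 256-token image encoding, which is a guess on my part rather than something the paper claims. Under that assumption, one image's 256 tokens would stand in for roughly 2,560 text tokens at 1:10 and 5,120 at 1:20. The helper name `implied_text_capacity` is made up for illustration.

```python
# Hypothetical sketch: ASSUMES the paper's text:image token ratios transfer
# to Gemma 3's fixed per-image encoding. Not measured, just arithmetic.
VISUAL_TOKENS = 256  # visual tokens Gemma 3 uses per image

# Text tokens per visual token at each quality level mentioned in the paper
RATIOS = {"good": 10, "fair": 20}


def implied_text_capacity(visual_tokens: int, ratio: int) -> int:
    """Text tokens that `visual_tokens` could stand in for at a given ratio."""
    return visual_tokens * ratio


for quality, ratio in RATIOS.items():
    print(f"{quality} (1:{ratio}): ~{implied_text_capacity(VISUAL_TOKENS, ratio)} text tokens")
```

If Gemma 3 really gets close to those numbers on dense text, that would match the impression that it pulls out more than 256 tokens' worth of content.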
