>>106965435
Deepseek-OCR paper claims they discovered that 1000 text tokens can be represented in 100 visual tokens. And you get graceful degradation of context by reducing the resolution of the visual tokens. They explicitly call out the implication for huge context. Z.ai just published a paper claiming the same finding with their VLM.