Anonymous
6/12/2025, 8:26:41 AM
No.105568644
>>105568451
I don't see why it couldn't be used instead of a regular vision encoder in regular LLMs. It wouldn't be a plug-and-play modification, though, and the training resolution of the big one is just 384x384 pixels. You can't just "merge" it in any case.
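To make the "not plug-and-play" part concrete, here is a rough sketch of the two things that would have to line up if you swapped in a different encoder: how many visual tokens it emits at 384x384, and the shape of the projector that maps its features into the LLM's embedding space. The patch size and both hidden dimensions are made-up illustrative values, not the real numbers for the model being discussed, and the helper names are hypothetical.

```python
# All concrete numbers here are assumptions for illustration only.

def num_visual_tokens(image_size: int, patch_size: int) -> int:
    """Patch tokens a ViT-style encoder emits for a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image size must be divisible by patch size")
    side = image_size // patch_size
    return side * side


def projector_shape(encoder_dim: int, llm_dim: int) -> tuple[int, int]:
    """Weight shape of a linear projector from encoder features to LLM embeddings."""
    return (encoder_dim, llm_dim)


if __name__ == "__main__":
    # 384x384 is the training resolution mentioned above; 16 is an assumed patch size.
    print(num_visual_tokens(384, 16))
    # Assumed hidden sizes: 1024 for the encoder, 4096 for the LLM.
    print(projector_shape(1024, 4096))
```

The projector weights are tied to both dimensions and to the feature distribution of the specific encoder, so they would have to be retrained after a swap. That is why a weight "merge" doesn't apply here.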