Report Content - 4rchive

>>105568451
I don't see why it couldn't be used instead of a regular vision encoder in regular LLMs. It wouldn't be a plug-and-play modification, though, and the training resolution of the big one is just 384x384 pixels. You can't just "merge" it in any case.

Report

Post Preview