Anonymous
7/12/2025, 4:48:34 PM
No.105881738
>>105881247
The kv_cache has been undergoing significant rewrites recently.
It started with making SWA (sliding window attention) support less hacky and, most recently, better supporting state for recurrent layers.
I believe the MLA K-cache-only PR was held back because there's another PR open for a more general way to split the K and V caches.