7/2/2025, 9:06:23 PM
Summary of changes
Add support for Mamba2ForCausalLM (including the official Mamba-2 models, and Mamba-Codestral-7B-v0.1)
Note that config.json must contain "architectures": ["Mamba2ForCausalLM"] for the convert script to detect the architecture properly.
View Mamba-1 as having d_inner (aka 2 * n_embd) heads of size 1.
This simplifies the handling of shapes in ggml_ssm_scan.
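The config.json requirement noted above can be checked before running the convert script. The following is a minimal sketch, not part of this PR: the helper name `ensure_mamba2_architecture` is hypothetical, and it assumes config.json is a plain JSON object at the path you pass in.

```python
import json
from pathlib import Path

REQUIRED_ARCH = "Mamba2ForCausalLM"


def ensure_mamba2_architecture(config_path):
    """Hypothetical helper: make sure config.json lists Mamba2ForCausalLM.

    Returns True if the file was rewritten, False if it was already fine.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))

    # Nothing to do when the architecture is already declared.
    if REQUIRED_ARCH in config.get("architectures", []):
        return False

    # Set the key named in the note above and write the file back.
    config["architectures"] = [REQUIRED_ARCH]
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return True
```

Calling `ensure_mamba2_architecture("path/to/model/config.json")` once per model directory is enough; a second call leaves the file untouched.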
ggml
Implement Mamba-2's selective state update in ggml_ssm_scan.
This reuses the same operator as Mamba-1 because it is nearly the same operation (except for how ssm_a is broadcast).
Fuse the operation with ssm_d into ggml_ssm_scan.
Otherwise it would need to be transposed, because the dot-products are done head-wise.
Implement Mamba-2's SSM scan with GGML_SIMD.
This is possible because, unlike with Mamba-1, there is no element-wise expf in the state update.
Avoid state copies for the SSM state (both for Mamba-1 and Mamba-2) by passing state ids to ggml_ssm_scan.
Mamba-2 states are huge; without this, masking and copying took close to 10% of the CPU time according to perf.
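The ggml changes above can be illustrated with a small pure-Python reference of a one-token selective state update. This is only a sketch under stated assumptions, not the ggml implementation: the function name `ssm_scan_step`, the nested-list layout, a single B/C group, an already-positive `dt`, and the in-place update of the selected slot are all choices made for illustration. It shows how one routine can cover both models (a single `A` value per head for Mamba-2, one per state element for Mamba-1 with heads of size 1), how the `D` term can be folded into the output, and how state ids select a slot directly instead of copying states.

```python
import math


def ssm_scan_step(state_pool, ids, x, dt, A, B, C, D):
    """One-token selective state update per sequence (illustrative only).

    state_pool[slot][h][p][n]: recurrent states, one slot per cached sequence
    ids[s]:      slot that sequence s reads and updates (assumed in place)
    x[s][h][p]:  input, split into heads of size head_dim
    dt[s][h]:    positive step size per head (assumed already activated)
    A[h]:        list with 1 entry (Mamba-2) or d_state entries (Mamba-1)
    B[s][n], C[s][n]: input/output projections (single group assumed)
    D[h]:        skip weight per head, added to the output here
    """
    y = []
    for s, slot in enumerate(ids):
        state = state_pool[slot]  # direct indexing, no gather or copy
        d_state = len(B[s])
        y_s = []
        for h, head in enumerate(state):
            if len(A[h]) == 1:
                # Mamba-2: one decay per head, so exp is evaluated once.
                decay = [math.exp(dt[s][h] * A[h][0])] * d_state
            else:
                # Mamba-1: a separate exp for every state element.
                decay = [math.exp(dt[s][h] * a) for a in A[h]]
            y_h = []
            for p, row in enumerate(head):
                x_dt = x[s][h][p] * dt[s][h]
                acc = 0.0
                for n in range(d_state):
                    row[n] = row[n] * decay[n] + B[s][n] * x_dt
                    acc += row[n] * C[s][n]  # head-wise dot product with C
                y_h.append(acc + D[h] * x[s][h][p])  # fused skip term
            y_s.append(y_h)
        y.append(y_s)
    return y
```

In the Mamba-2 branch the inner loop over `n` is a plain multiply-add with a constant factor, which is the kind of loop that lends itself to SIMD; in the Mamba-1 branch each state element carries its own exponential.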