Search Results
7/17/2025, 9:37:48 PM
At least as far as I can tell, here are the most recent LORA training settings posted in the thread as .toml files:
https://files.catbox.moe/1zxhfw.toml
https://files.catbox.moe/g26bov.toml
https://files.catbox.moe/9u9jei.toml
https://files.catbox.moe/3kx4ms.toml
https://files.catbox.moe/72zp6j.toml
Looking at all of them, it seems like the vary a lot. Is there no general consensus on what works the best?
Just in regards to the optimizer, one uses AdamW, one uses AdamW8bit, two use Came (which I had to look up because it's not in my Kohya install by default, it's https://github.com/yangluo7/CAME/ ), and one uses something called "torchastic.Compass" which I couldn't even find trying to look it up.
It was my impression that AdamW was basically the default option, AdamW8bit was smaller if you needed faster training or less memory, and that algorithms which can change the learning rate dynamically like Prodigy were supposed to deliver better results.
Is that not the case, or are there other considerations? I see from https://github.com/yangluo7/CAME/ that CAME seems to be optimized for speed and lower memory usage, so is that just what people are valuing more?
https://files.catbox.moe/1zxhfw.toml
https://files.catbox.moe/g26bov.toml
https://files.catbox.moe/9u9jei.toml
https://files.catbox.moe/3kx4ms.toml
https://files.catbox.moe/72zp6j.toml
Looking at all of them, it seems like the vary a lot. Is there no general consensus on what works the best?
Just in regards to the optimizer, one uses AdamW, one uses AdamW8bit, two use Came (which I had to look up because it's not in my Kohya install by default, it's https://github.com/yangluo7/CAME/ ), and one uses something called "torchastic.Compass" which I couldn't even find trying to look it up.
It was my impression that AdamW was basically the default option, AdamW8bit was smaller if you needed faster training or less memory, and that algorithms which can change the learning rate dynamically like Prodigy were supposed to deliver better results.
Is that not the case, or are there other considerations? I see from https://github.com/yangluo7/CAME/ that CAME seems to be optimized for speed and lower memory usage, so is that just what people are valuing more?
Page 1