Anonymous
8/21/2025, 10:50:06 AM
No.106333721
What does "batch" refer to in training? For inference I can picture how batching works, but I don't see how the reverse of that would work for training. Do they use some hack to make it work, and is that why we're arguing that large batch sizes are bad?