Let's say we decide to put together an /lmg/ dataset for a 10–24B model. What should it include besides the baseline data, reasoning, and math that any functional model needs?
I can think of:
>Fandom wiki
>Literotica
>AO3
>light novels for weeb shit