>>106181368
synthetic data only works in niche areas where you can verify correctness like math, games, and leetcode.
if you add longform text that an LLM already generated, you are adding a tiny amount of real information (contained implicitly in the prompt) and a large amount of non-information, which will just bias the model towards its existing tendencies
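
rough sketch of what "verify correctness" buys you, in case it helps. everything here is made up for illustration: `noisy_generator` is a stand-in for an LLM, the task is toy integer addition, and `verify` is a hypothetical checker. the point is only that the filter adds information the generator did not have, which is exactly what you cannot do for longform text.

```python
import random

def make_problem(rng):
    # hypothetical task: integer addition, where the true answer is computable
    a, b = rng.randint(0, 99), rng.randint(0, 99)
    return {"prompt": f"{a} + {b} = ?", "a": a, "b": b}

def noisy_generator(problem, rng, error_rate=0.3):
    # stand-in for an LLM (assumption): usually right, sometimes off by a little
    truth = problem["a"] + problem["b"]
    if rng.random() < error_rate:
        return truth + rng.choice([-2, -1, 1, 2])
    return truth

def verify(problem, answer):
    # independent check that does not rely on the generator being right
    return answer == problem["a"] + problem["b"]

def build_dataset(n, seed=0):
    # keep only samples that pass the verifier, count the rest as rejected
    rng = random.Random(seed)
    kept, rejected = [], 0
    for _ in range(n):
        problem = make_problem(rng)
        answer = noisy_generator(problem, rng)
        if verify(problem, answer):
            kept.append((problem["prompt"], answer))
        else:
            rejected += 1
    return kept, rejected

if __name__ == "__main__":
    kept, rejected = build_dataset(200)
    print(f"kept {len(kept)} verified samples, rejected {rejected}")
```

with no `verify` step you would keep all 200 samples, errors included, and the model would just be trained on its own mistakes.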