Anonymous
6/24/2025, 4:44:27 PM
No.105690616
>>105690538
not the same anon, but dude, try to understand that there's a reason you don't see datasets in the wild.
compute is relatively cheap, but the time and effort it takes to curate a really good dataset does not scale. it's a massive labor of love, and datasets made by anons like us can be personally identifying. I'm not combing through 1000s of images to make sure EXIF and other metadata are scrubbed, a photo of my car didn't get mixed in, etc.
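for what it's worth, the metadata half of that can be partly scripted. rough sketch below, stdlib only, JPEG only, and the function names are made up for this post. it assumes the identifying stuff lives in the APP1 (EXIF/XMP), APP13 (Photoshop/IPTC) and COM segments and drops those while copying the image data untouched. it does nothing for PNG/WebP, and it obviously can't tell you a photo of your car got mixed in, so you still have to eyeball the images.

```python
from pathlib import Path

# Segment markers assumed to carry identifying metadata:
# 0xE1 = APP1 (EXIF, XMP), 0xED = APP13 (Photoshop/IPTC), 0xFE = COM (comments).
DROP_MARKERS = {0xE1, 0xED, 0xFE}


def strip_jpeg_metadata(data: bytes) -> bytes:
    """Return a copy of a JPEG with the DROP_MARKERS segments removed."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    out = bytearray(b"\xff\xd8")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a segment marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xDA:
            # Start of scan: everything after this is compressed image data,
            # so copy the rest of the file as-is and stop parsing.
            out += data[pos:]
            return bytes(out)
        # Segment length is big-endian and includes its own two bytes.
        length = int.from_bytes(data[pos + 2:pos + 4], "big")
        if length < 2:
            raise ValueError(f"bad segment length at offset {pos}")
        end = pos + 2 + length
        if marker not in DROP_MARKERS:
            out += data[pos:end]
        pos = end
    raise ValueError("no start-of-scan marker found (truncated or malformed file)")


def scrub_folder(src: Path, dst: Path) -> int:
    """Write scrubbed copies of every .jpg/.jpeg in src into dst; return the count."""
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(src.iterdir()):
        if path.suffix.lower() in {".jpg", ".jpeg"}:
            (dst / path.name).write_bytes(strip_jpeg_metadata(path.read_bytes()))
            count += 1
    return count
```

it writes copies into a separate folder instead of overwriting, so you can diff against the originals if something comes out broken. spot-check the output with a real metadata viewer before trusting it on anything you'd share.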
you can use a vision LLM (VLM) to try to auto-tag, and you can pull tags and metadata from certain places, but you'll still need to do a ton of work to make a really good base for training. expect to work on your first one for at least a few months if you have a full-time job.