New from 404 Media: Bluesky may have said it won't use user data to train generative AI, but someone else just published a dataset of one million Bluesky posts for "machine learning research".
-
New from 404 Media: Bluesky may have said it won't use user data to train generative AI, but someone else just published a dataset of one million Bluesky posts for "machine learning research". The dataset is already very popular, and your data may have been scraped: https://www.404media.co/someone-made-a-dataset-of-one-million-bluesky-posts-for-machine-learning-research/
-
@josephcox I guess you could also pull this together from Mastodon, but Bluesky is going to make the data readily available much faster.
-
@tehstu @josephcox I'm pretty critical of Bluesky (see my timeline) but I don't see why this would be any harder or slower to do from Mastodon/fedi
-
Evan Prodromou replied to Tom Walker
@tomw @tehstu @josephcox all Bluesky data is public. Many ActivityPub posts are private or followers-only.
-
@evan Small clarification as I know you’ve avoided the Bluesky literature: Bluesky DMs are not public because they’re not part of ATProto. They’re a separate service.
-
Evan Prodromou replied to amd
@amd thanks. Updated.