It's not completely clear to me how various zones of the Fediverse distinguish "scraping" from "non-Mastodon ActivityPub services functioning according to spec in ways I didn't expect."
-
Last thing before I re-submerge—I think big emotional/instinctive reactions are so interesting and worthy of examination and usually point to meaningful low-level structural problems or disjunctures even when they seem to be about something else. (By low-level here I mean lower than protocol. Social contract stuff.)
@CyberneticForests's look at similar things here is really interesting
-
@oliphant @lmorchard I think…there are several separate (though related) arguments about this class of behaviors that always get smashed together and become essentially unaddressable.
Like—technical questions, legal questions, many kinds of ethical questions, the right way to handle or even communicate conflicting social norms, assumptions about the inner lives of others. It's a lot.
-
@kissane Hmm. I mean, the structure of the Fediverse that allowed the former-OpenAI guy's data scraper to hoover up ppl's posts and profiles, then post that on their own website to feed to their AI and ChatBots with no notification or consent seeeeems like a bug?
Non-dev here, but if that's Fedi functioning to spec, it's by design then b/c of federation?
I don't know enough to be able to tell, but is the AI company's person credible when they say some of their actions here were unintentional?
-
@fembot I mean, this is an oversimplification, but the Fediverse is essentially a network of websites that make copies of information that originated on other websites and link them together into feeds, so whether a site using ActivityPub to ingest messages is "scraping" or not is…largely in the eye of the beholder.
Mastodon accounts also have RSS feeds, so distribution is really built in. But public information is also just always going to get indexed and scraped, here in 2024.
-
@fembot I think what we're dealing with every time this happens (several times a year, at the moment) is a clash between what's technically possible and some of the emergent social norms of the Fediverse. Which are simultaneously uncodified, hotly contested, and fiercely protected by their promoters.
-
@kissane Yeah. Clarity of both would probably be helpful. Agreement on both would be even better, its own process I'm sure, and may not even be a solid goal. It would be difficult, I think. (Not impossible? Over time? No idea if that's on any radar, just thinking out loud and appreciating how thoughtful you've been with it.)
-
@kissane I thought when the networked Fedi pulls info from one site to another it's b/c a request was initiated in-house, so to speak.
And when m*ven populated its site with Fedi profiles & posts (and fed them into an AI processor) it doesn't seem like it was fulfilling an ActivityPub service.
Like: barging into a stranger's house to raid their fridge (and post pics of them online) vs. having a meal with the fam at home.
Our profiles & posts have been scraped like this before? When?
-
@fembot I mean I don’t think it’s a *good* implementation of ActivityPub integration, just that “scraping” is hard to define except via relatively diffuse social norms, in this system. (I don’t have a list of previous incidents on my phone but this is a pattern that repeats.)
-
@fembot I’m just me but I think that although the odds of achieving consensus are vanishingly small, there’s room to turn a lot of diffuse and conflicting norms into at least a mutually intelligible patchwork of policies. I’m definitely thinking about it a lot.
-
Agreed, although I think it's actually mutually intelligible today: some people think that consent matters even for public posts, others don't. But the people with the loudest megaphones are in the "if it's public anybody can do anything they want with it" camp, and position the "consent matters" perspective as problematic, naive, wrong, and/or only held by a small minority. So it's very easy for people who aren't deeply familiar with fediverse dynamics and history to assume that the same norms as everywhere else on the internet apply ... and get taken by surprise that they don't.
At FediForum several guys expressed concern that these dynamics are giving the fediverse a bad reputation for developers ... which makes sense: it's clearly a minefield, who wants to venture in without a map? It was interesting, though, when I wrote up an article about it I got a lot of compliments from people in both camps ... but none of the guys who expressed concern about developers getting scared away actually shared it. Oh well. Of course, who knows, if the Maven people had read the article they might well have ignored it anyhow ... but at least they wouldn't be taken by surprise.
https://privacy.thenexus.today/consent-for-fediverse-developers/