It's not completely clear to me how various zones of the Fediverse distinguish "scraping" from "non-Mastodon ActivityPub services functioning according to spec in ways I didn't expect."
-
I think this formulation matches my own sense of why some things feel weird and others don't, and I'm really interested in pinning down what it is about some implementations that produces that impression.
(I think obviously it's more than one thing.)
Nora, Tech Aspect (@noracodes@tenforward.social)
fundamentally the difference between scraping posts from fedi to put on a closed platform and federating via ActivityPub is the difference between *exploiting* the communities here, as if they are a natural resource, and *participating* in the communities here.
Ten Forward (tenforward.social)
-
tx @noracodes for permission to include this, tagging her here instead of above to reduce unintentional reply-bombing
-
@kissane for me it is indeed the lack of linkbacks, the content is presented as being part of that platform, not part of a group of sites sharing a platform
-
@luis_in_brief @CyberneticForests
<whispervoice>I think protocols largely try to exclude vibes but they get through anyway, just in weird/uncanny ways</whispervoice>
-
@anniegreens Thank you!
The person representing Maven here said that the lack of linkbacks is unintentional, which makes me wonder about the effects of encountering half-built things. (We've seen this a few times just in the past few months.)
This is not a defense, just to be clear, just an open question.
-
@fraying @kissane Something I brought up a few times today is this idea of having a robots.txt file for each of our social media profiles.
Much like a website that gets to decide who can access its publicly available content, some people would also like to exercise this level of control.
EDIT: "gets to decide"
Yes, well, robots.txt isn't binding, but I hope I made my point.
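For readers less familiar with robots.txt, here is a minimal Python sketch of how a well-behaved crawler might consult such a file, using the standard library's `urllib.robotparser`. The rules, the `ExampleBot` user agent, and the `example.social` URLs are all invented for illustration; nothing in this thread establishes that a per-profile robots.txt convention exists on the Fediverse. And, as the post says, the check is voluntary: a crawler that never runs it is not stopped by it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules a profile owner might publish.
# This is an illustration, not an existing Fediverse convention.
RULES = """\
User-agent: ExampleBot
Disallow: /users/alice

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str, rules: str = RULES) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The named bot is asked to stay away from one profile; other agents are not.
    print(may_fetch("ExampleBot", "https://example.social/users/alice"))
    print(may_fetch("SomeOtherBot", "https://example.social/users/alice"))
```

The design point is that the decision lives in the crawler: the file only states a preference, and honoring it is up to whoever is fetching.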
-
Another POV I like a lot from Robert Gehl, who did a whole blog post:
Robert W. Gehl (@rwg@aoir.social)
Latest #FOSSAcademic post: "Maven Ain't So Mavenly": https://fossacademic.tech/2024/06/12/Maven.html In which I argue that #Maven, a new social media site, is not only breaking norms of the #fediverse by #scraping without consent -- they're ironically violating their own stated reason for existing in the first place. [Responses to this will appear as comments on my blog, unless you set privacy to followers-only or stronger. CWs will work]
Mastodon (aoir.social)
Maven Ain’t So Mavenly
The ever-alert Liaizon Wakest has informed the rest of us on the ActivityPub-based fediverse of a new social media site, Maven, which has ingested millions of posts from fediverse accounts, including mine. Multiple people have pointed out how this violates consent on the fediverse. In response, the CTO of Maven, Jimmy Secretran, has explained their reasoning:
“We are trying to connect up to the Fediverse, to allow interaction with other ActivityPub servers. This definitely seems to me to be within the spirit of what ActivityPub enables, but of course, I don’t want to have Maven connect to anybody who doesn’t want it.”
[Note that I normally do not quote fediverse posts without permission, but in this case, I am making an exception, for reasons that I think will be obvious.]
I replied in the thread, arguing that, no, they are not really abiding by the spirit of ActivityPub:
“This isn’t how this works. No one starts a fediverse (AP) server by ingesting a bunch of posts from others without their consent. They start servers and start federating with the rest of the network. Please stop ingesting posts from AoIR.social (I’m the admin, btw).”
and
“The custom is to start a server with a code of conduct, including clear moderation rules, so that the rest of us can make informed choices about federating. What you’ve done with Maven is a pretty massive violation of norms, and likely it will result in your being defederated from many other instances. It’s a poor way to start an ActivityPub implementation.”
To be fair to Secretran and Maven, they have since stopped scraping my posts and, I presume, those of others who have asked them to stop. Still, I eagerly await Maven’s full ActivityPub implementation so that we can block them effectively. This incident got me to thinking about norms and customs on the fediverse and how important they are.
FOSS Academic (fossacademic.tech)
-
tx @rwg for permission to include
-
@kissane Hmm.
-
@kissane One aspect that seems to get folks into trouble, from what I've seen:
If you're federating properly with AP servers, they generally push content at your server. You don't need to go out and fetch it. And those servers will stop pushing content if you get blocked. That's kind of an implementation of consent in the protocol.
If your server is going out and pulling in content to ingest to then remix & republish, that's where trouble starts. I think that's what folks colloquially call "scraping" - even if technically it comes from RSS & JSON feeds and not literally scraped from HTML or other sources.
I'm not sure which of these Maven in particular did
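The push/pull distinction described in this post can be sketched as a toy Python model. This is not an ActivityPub implementation: `OriginServer`, `RemoteServer`, `publish`, and `pull_public_feed` are hypothetical names invented here, and real servers may apply additional controls to fetches that this sketch ignores. It only illustrates why a block naturally stops pushed delivery while a reader of public feeds never passes through that check.

```python
from dataclasses import dataclass, field


@dataclass
class RemoteServer:
    """A toy receiving server: it only stores what is pushed to its inbox."""
    domain: str
    inbox: list[str] = field(default_factory=list)


@dataclass
class OriginServer:
    """A toy origin server that pushes posts to its followers' servers."""
    followers: list[RemoteServer] = field(default_factory=list)
    blocked_domains: set[str] = field(default_factory=set)
    public_feed: list[str] = field(default_factory=list)

    def publish(self, post: str) -> None:
        # Push model: the origin decides who receives the post,
        # so blocking a domain stops delivery to that domain.
        self.public_feed.append(post)
        for server in self.followers:
            if server.domain not in self.blocked_domains:
                server.inbox.append(post)


def pull_public_feed(origin: OriginServer) -> list[str]:
    # Pull model: the fetcher reads whatever is publicly visible.
    # In this toy model the block list is never consulted on this path.
    return list(origin.public_feed)


if __name__ == "__main__":
    friendly = RemoteServer("friendly.example")
    blocked = RemoteServer("blocked.example")
    origin = OriginServer(
        followers=[friendly, blocked],
        blocked_domains={"blocked.example"},
    )
    origin.publish("hello fediverse")
    print(friendly.inbox)            # received via push
    print(blocked.inbox)             # nothing was pushed here
    print(pull_public_feed(origin))  # the pull path sees the post regardless
```

In the push path, consent is enforced at the point of delivery; in the pull path, any enforcement has to be added separately, which is roughly where the "scraping" label tends to get applied.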
-
@lmorchard @kissane As I'm trying to point out, as soon as you mention AI ingesting art without permission, or crypto bros thinking they can scrape the entire Spotify catalog and re-sell it, people fundamentally understand this issue.
So long as it involves art or music, and big corporations with massive legal departments.
But if it's not covered by the DMCA it's fair game, they assume, I guess. Suddenly laws don't matter, ethics don't matter, it's the wild west.
-
@vivtek "Social protocol is harder than technical protocol" x ♾️
-
@kissane Woof, it really is, partly because debugging is so expensive. This is the first I'm hearing about this Maven thing. I find myself having two diametrically opposed kneejerk reactions and it's uncomfortable.
-
@vivtek "two diametrically opposed kneejerk reactions" is my whoooole zone
-
Last thing before I re-submerge—I think big emotional/instinctive reactions are so interesting and worthy of examination and usually point to meaningful low-level structural problems or disjunctures even when they seem to be about something else. (By low-level here I mean lower than protocol. Social contract stuff.)
@CyberneticForests's look at similar things here is really interesting
Context, Consent, and Control: The Three C’s of Data Participation in the Age of AI | TechPolicy.Press
Eryk Salvaggio says it is naive to believe tensions in AI policy development and norms of use could be resolved through a focus on copyright alone.
Tech Policy Press (www.techpolicy.press)
-
@oliphant @lmorchard I think…there are several separate (though related) arguments about this class of behaviors that always get smashed together and become essentially unaddressable.
Like—technical questions, legal questions, many kinds of ethical questions, the right way to handle or even communicate conflicting social norms, assumptions about the inner lives of others. It's a lot.
-
@kissane Hmm. I mean, the structure of the Fediverse that allowed the former-OpenAI guy's data scraper to hoover up ppl's posts and profiles, then post that on their own website to feed to their AI and ChatBots with no notification or consent seeeeems like a bug?
Non-dev here, but if that's Fedi functioning to spec, it's by design then b/c of federation?
I don't know enough to be able to tell, but is the AI company's person credible when they say some of their actions here were unintentional?
-
@fembot I mean, this is an oversimplification, but the Fediverse is essentially a network of websites that make copies of information that originated on other websites and link them together into feeds, so whether a site using ActivityPub to ingest messages is "scraping" or not is…largely in the eye of the beholder.
Mastodon accounts also have RSS feeds, so distribution is really built in—but also public information is just always going to get indexed and scraped, here in 2024.
-
@fembot I think what we're dealing with every time this happens (several times a year, at the moment) is a clash between what's technically possible and some of the emergent social norms of the Fediverse. Which are simultaneously uncodified, hotly contested, and fiercely protected by their promoters.
-
@kissane Yeah. Clarity of both would probably be helpful. Agreement on both would be even better, its own process I'm sure, and may not even be a solid goal. It would be difficult, I think. (Not impossible? Over time? No idea if that's on any radar, just thinking out loud and appreciating how thoughtful you've been with it.)