It's not completely clear to me how various zones of the Fediverse distinguish "scraping" from "non-Mastodon ActivityPub services functioning according to spec in ways I didn't expect."
Given how frequently protocol behaviors act as ethical markers ("if you *can* do it, it's fine"), this seems like fruitful territory to try to map…
(I say this as someone who has myself been surprised more than once by AP implementations that put Fedi posts into unfamiliar-to-me contexts, don't eat me.)
-
@kissane Given that the entire protocol copies things from server to server on purpose, the periodic freakouts about new services doing exactly that, while others are applauded for "implementing the protocol" (which copies things by default), are so mystifying to me.
-
@fraying I think whenever I get confused by this stuff I get *really interested* because it's usually a sign that there are cultural norms being obscured by technical capacities. And of course I love that shit.
-
@kissane in a discussion last week[1], someone explained a related vibe as "I used to create works, now I produce points in a corpus" and I've been thinking about that a lot as an efficient encapsulation of the current moment.
Among other thoughts, I'm not aware of any protocol, or implementation, or legal supplement to a protocol, that has terminology or other mechanisms to capture that vibe.
[1] Maybe it was @CyberneticForests ?
-
I think this formulation matches my own sense of why some things feel weird and others don't, and I'm really interested in pinning down what it is about some implementations that produces that impression.
(I think obviously it's more than one thing.)
Nora, Tech Aspect (@[email protected])
fundamentally the difference between scraping posts from fedi to put on a closed platform and federating via ActivityPub is the difference between *exploiting* the communities here, as if they are a natural resource, and *participating* in the communities here.
Ten Forward (tenforward.social)
-
tx @noracodes for permission to include this, tagging her here instead of above to reduce unintentional reply-bombing
-
@kissane for me it is indeed the lack of linkbacks: the content is presented as being part of that platform, not part of a group of sites sharing a platform
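A linkback here means the republished post points readers to its original home. As a minimal sketch, assuming the remote post arrives as an ActivityStreams `Note` decoded into a Python dict whose `url` (or, failing that, `id`) is a plain string, a renderer could always append a visible link to the origin. The function name `render_with_linkback` is hypothetical and not taken from Maven or any Fediverse codebase.

```python
import html


def render_with_linkback(note: dict) -> str:
    """Render a remote post with a visible link back to where it came from.

    Hypothetical helper. Assumes `note` is an ActivityStreams Note decoded
    from JSON, with `url` and `id` as plain strings. `content` is inserted
    as-is; a real renderer would sanitize that HTML first.
    """
    origin = note.get("url") or note["id"]  # prefer the human-facing page
    author = note.get("attributedTo", "unknown author")
    return (
        "<article>"
        f"{note['content']}"
        "<footer>Originally posted by "
        f"{html.escape(author)} at "
        f'<a href="{html.escape(origin, quote=True)}">{html.escape(origin)}</a>'
        "</footer>"
        "</article>"
    )
```

The point of the design is that attribution is not optional: every rendered post carries its origin, so readers can tell it belongs to a wider network rather than to the republishing site.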
-
@luis_in_brief @CyberneticForests
<whispervoice>I think protocols largely try to exclude vibes but they get through anyway, just in weird/uncanny ways</whispervoice>
-
@anniegreens Thank you!
The person representing Maven here said that the lack of linkbacks is unintentional, which makes me wonder about the effects of encountering half-built things. (We've seen this a few times just in the past few months.)
To be clear, this is not a defense, just an open question.
-
@fraying @kissane Something I brought up a few times today is this idea of having a robots.txt file for each of our social media profiles.
Much like a website that gets to decide who can access its publicly available content, some people would also like to exercise this level of control.
EDIT: "gets to decide"
Yes, well, robots.txt isn't binding, but I hope I made my point.
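No Fediverse software is assumed here to serve a per-profile robots.txt; the sketch below only shows what honoring one might look like, using Python's standard `urllib.robotparser`. The file contents, the `ExampleAggregator` user agent, the `/@alice/` path, and the `may_fetch` helper are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical per-profile robots.txt: blocks one named crawler from
# alice's profile while allowing every other agent.
PROFILE_ROBOTS = """\
User-agent: ExampleAggregator
Disallow: /@alice/

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Like robots.txt itself, this is advisory: nothing stops a crawler
    that simply never performs the check.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, `may_fetch(PROFILE_ROBOTS, "ExampleAggregator", "https://example.social/@alice/123")` is refused while any other agent is allowed. As the post notes, the mechanism only works for crawlers that choose to ask.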
-
Another POV I like a lot from Robert Gehl, who did a whole blog post:
Robert W. Gehl (@[email protected])
Latest #FOSSAcademic post: "Maven Ain't So Mavenly": https://fossacademic.tech/2024/06/12/Maven.html In which I argue that #Maven, a new social media site, is not only breaking norms of the #fediverse by #scraping without consent -- they're ironically violating their own stated reason for existing in the first place. [Responses to this will appear as comments on my blog, unless you set privacy to followers-only or stronger. CWs will work]
Mastodon (aoir.social)
Maven Ain’t So Mavenly
The ever-alert Liaizon Wakest has informed the rest of us on the ActivityPub-based fediverse of a new social media site, Maven, which has ingested millions of posts from fediverse accounts, including mine. Multiple people have pointed out how this violates consent on the fediverse. In response, the CTO of Maven, Jimmy Secretran, has explained their reasoning:
“We are trying to connect up to the Fediverse, to allow interaction with other ActivityPub servers. This definitely seems to me to be within the spirit of what ActivityPub enables, but of course, I don’t want to have Maven connect to anybody who doesn’t want it.”
[Note that I normally do not quote fediverse posts without permission, but in this case, I am making an exception, for reasons that I think will be obvious.]
I replied in the thread, arguing that, no, they are not really abiding by the spirit of ActivityPub:
“This isn’t how this works. No one starts a fediverse (AP) server by ingesting a bunch of posts from others without their consent. They start servers and start federating with the rest of the network. Please stop ingesting posts from AoIR.social (I’m the admin, btw).”
and
“The custom is to start a server with a code of conduct, including clear moderation rules, so that the rest of us can make informed choices about federating. What you’ve done with Maven is a pretty massive violation of norms, and likely it will result in your being defederated from many other instances. It’s a poor way to start an ActivityPub implementation.”
To be fair to Secretran and Maven, they have since stopped scraping my posts and, I presume, those of others who have asked them to stop. Still, I eagerly await Maven’s full ActivityPub implementation so that we can block them effectively. This incident got me to thinking about norms and customs on the fediverse and how important they are.
FOSS Academic (fossacademic.tech)
-
tx @rwg for permission to include
-
@kissane Hmm.
-
@kissane One aspect that seems to get folks into trouble, from what I've seen:
If you're federating properly with AP servers, they generally push content at your server. You don't need to go out and fetch it. And those servers will stop pushing content if you get blocked. That's kind of an implementation of consent in the protocol.
If your server is going out and pulling in content to ingest to then remix & republish, that's where trouble starts. I think that's what folks colloquially call "scraping" - even if technically it comes from RSS & JSON feeds and not literally scraped from HTML or other sources.
I'm not sure which of these Maven in particular did.
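The push-versus-pull distinction can be made concrete with a toy model. This is not ActivityPub code: the `Server` class, its fields, and the example domains are hypothetical, and domain blocking is modeled on what implementations such as Mastodon offer rather than on anything the ActivityPub spec requires. When content is pushed, the origin chooses which follower inboxes receive a new post, so a block takes effect there. When content is pulled by a requester that doesn't identify itself (as with an unauthenticated RSS or JSON fetch), the origin has nothing to check against its block list.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Hostname part of an inbox URL, e.g. 'friendly.example'."""
    return urlparse(url).hostname or ""


@dataclass
class Server:
    """Hypothetical toy origin server; not an ActivityPub implementation."""
    domain: str
    blocked_domains: set[str] = field(default_factory=set)
    follower_inboxes: list[str] = field(default_factory=list)

    def push_targets(self) -> list[str]:
        """Push model: the origin picks the inboxes that receive a new post.

        Inboxes on blocked domains are skipped, so a blocked server simply
        stops receiving new posts.
        """
        return [
            inbox
            for inbox in self.follower_inboxes
            if domain_of(inbox) not in self.blocked_domains
        ]

    def answers_pull(self, requester_domain: Optional[str]) -> bool:
        """Pull model: should a request for a public post be answered?

        requester_domain is None when the fetcher does not identify itself
        (for example, a plain feed fetch); the block list cannot apply then.
        """
        if requester_domain is None:
            return True
        return requester_domain not in self.blocked_domains
```

For an origin that blocks `scraper.example`, `push_targets()` leaves out that domain's inbox, yet `answers_pull(None)` still returns `True`. That asymmetry is one way to read why fetching-and-republishing feels different from federating.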
-
@lmorchard @kissane As I'm trying to point out, as soon as you mention AI ingesting art without permission, or crypto bros thinking they can scrape the entire Spotify catalog and re-sell it, people fundamentally understand this issue.
So long as it involves art or music, and big corporations with massive legal departments.
But if it's not covered by the DMCA it's fair game, they assume, I guess. Suddenly laws don't matter, ethics don't matter, it's the wild west.
-
@vivtek "Social protocol is harder than technical protocol" x ♾️
-
@kissane Woof, it really is, partly because debugging is so expensive. This is the first I'm hearing about this Maven thing. I find myself having two diametrically opposed kneejerk reactions and it's uncomfortable.
-
@vivtek "two diametrically opposed kneejerk reactions" is my whoooole zone
-
Last thing before I re-submerge—I think big emotional/instinctive reactions are so interesting and worthy of examination and usually point to meaningful low-level structural problems or disjunctures even when they seem to be about something else. (By low-level here I mean lower than protocol. Social contract stuff.)
@CyberneticForests's look at similar things here is really interesting
Context, Consent, and Control: The Three C’s of Data Participation in the Age of AI | TechPolicy.Press
Eryk Salvaggio says it is naive to believe tensions in AI policy development and norms of use could be resolved through a focus on copyright alone.
Tech Policy Press (www.techpolicy.press)
-
@oliphant @lmorchard I think…there are several separate (though related) arguments about this class of behaviors that always get smashed together and become essentially unaddressable.
Like—technical questions, legal questions, many kinds of ethical questions, the right way to handle or even communicate conflicting social norms, assumptions about the inner lives of others. It's a lot.