Would you share your Fediverse data with researchers?

Hiker

@evan
I still worry when data is to be “researched”. How do you define “researcher” here? For what purposes is it “researched”? What are the limits and who determines them? Too many questions for 4 options.

Evan Prodromou

@Hiker if it depends on different factors, choose "qualified". If it's default yes, choose "qualified yes". If it's default no, choose "qualified no".

Evan Prodromou

@tassoman so, you think everyone who makes a public post has consented to participate in any and all research projects?

Jason Pester

@evan If it was on your poll list, I would choose "Not without my permission." If you do provide data to researchers, give us the option to Opt-In. Providing an explanation of the research requesting the data might be helpful too. I think people enjoy the Fediverse because it's more transparent than other platforms... so keep it this way.

Manu :fediquebec:

@evan Researchers need cat pictures too! ...
Seriously: Qualified no.

Michael Vogel

@evan The moment someone uses your data to get information, you go from being a user to being a product. I don't have problems with things like the number of users on a server, the number of posts on a server and things like that. But I really don't like anything that analyses the social graph (who interacts with whom and so on).

Evan Prodromou

@heluecht even for academic research?

Evan Prodromou

@JasonPester if *you* share it, isn't that permission?

Evan Prodromou

@manu why qualified?

Michael Vogel

@evan To use a famous quote: "I have a bad feeling about this".

Every observation influences the object being observed. Knowing that my interactions may be observed and analysed is likely to make me overthink my actions, because I would automatically try to behave in a way that would influence the outcome.

Evan Prodromou

@heluecht last question: have you ever participated as a research subject before in another part of your life?

Jason Pester

@evan I don't view it that way. I choose to share content / data on platforms with an understanding of how that platform respects my rights to the content / data I post. Others have expressed a similar sentiment (see links below). I'm a straight male, but I think the author's point in the Medium article regarding LGBTQ+ outing through network data analysis is a good one.

(bmatb.medium.com)

More Mastodon Scraping Without Consent (Notes on Nobre et al 2022)

There’s a new paper out about Mastodon! But unfortunately, it’s a deeply problematic one. Nobre et al’s “More of the Same? A Study of Images Shared on Mastodon’s Federated Timeline” is a paper that is now published in proceedings from International Conference on Social Informatics. (Unfortunately, it’s not open access.) Because I’m currently researching the fediverse and blogging about that process, I thought I’d write up notes on this paper. Why this paper? Frankly, because I’m pretty certain it violates the community norms, as well as terms of service, of many Mastodon instances. It instantly reminded me of the controversial paper from Zignani et al, “Mastodon Content Warnings: Inappropriate Contents on a Microblogging Platform”, which resulted in a scathing open letter and the retraction of a dataset from the Harvard Dataverse. Nobre et al’s “More of the Same” is a study of image-sharing. The authors claim that it is about image-sharing on Mastodon, but really their focus is on images they culled from Mastodon.social’s federated timeline. They pulled 4M posts from 103K active users, of which 1M had images. Since they pulled posts from Mastodon.social’s federated timeline, they saw posts from 4K separate instances. The authors state that a “relevant number” of the images they found are “explicit.” They categorize the images as such after running them through Google’s Vision AI Safe Search system. They also run the images they find through Google’s image search to trace where the images came from and how they are shared on Mastodon. Ultimately, the authors don’t really make an argument, other than stating in passing that Mastodon needs better moderation, since people share explicit images. In some ways, “More of the Same” lives up to its title: it’s more of the same poor scholarship that can be seen in Zignani et al (in fact, Nobre et al cite that controversial paper). Here are my critiques:

FOSS Academic (fossacademic.tech)

Elasticsearch server actively scraping Mastodon user data; over 150,000 individuals exposed so far

If you’re a Twitter user, you’ve probably heard of Mastodon, a free open-source software with similar micro-blogging features.

Hot for Security (www.bitdefender.com)

Evan Prodromou

@JasonPester OK. I feel like the phrasing of the question suggests consent, but ok if you don't.

Kristian

@evan Would have voted "qualified no" if I actually could, at this point. Experiences with "researchers" on other platforms leave me very, very cautious and concerned here. Much more than these, however: _My_ Fediverse data itself is irrelevant. What seems to matter is data that somehow relates me to others, and I can't at all be sure most (or even all) of my contacts are willing for me to share any information on our interactions, communications, messages. Plus, it seems non-trivial to handle data from people that agreed _and_, from that set, to weed out data from people that have _not_ agreed without actually having at least this interaction information at hand, back then.
(On the other hand, most of my communication out here is public. I have learnt not to trust the Fediverse, and specifically ActivityPub, very much from a technical perspective with real "private" data, so I guess any researcher could probably go there and use some sort of web or AP crawler to get whichever public information is there without a second thought.)

Michael Vogel

@evan While writing about European privacy laws, I realised that any Fediverse research that uses personal data (even the public data) would have to be on a strict opt-in basis for European users.

When you subscribe to a social network (Fediverse-based or otherwise), you always have to agree to its terms of use, which also have to tell you something about how your data will be used. If there is no mention of (academic) research, then no one is allowed to use my data for that. Data like "who interacts with whom" (the social graph) is declared as very sensitive data, so it can never be used for anything other than the intended purpose (communication) unless I explicitly agree.

The fines are really high. Meta, for example, has had to pay fines of more than 2 billion euros in recent years, Amazon almost 800 million euros, and so on.

Evan Prodromou

@heluecht

So, in my mind, "Would you share your Fediverse data with researchers?" implies that you have agency to share or not, and that you can consent or not.

But I guess you're reading it a different way.

Evan Prodromou

@z428 Why don't you trust ActivityPub with private data? It's as good as email.

Kristian

@evan First of all, I don't trust e-mail with really "private" data either, due to its very nature (store-and-forward, unencrypted metadata, encryption mainly "just" done using PGP/GPG with long-lived private keys closely tied to my identity).

Plus, I think these things don't really compare. E-mail, by default, has access control, and whatever is in _my_ mailbox is supposed to be in _my_ mailbox. With maybe the exception of mailing lists, I usually don't have such a thing as an e-mail sent out to a "random public" - it's always addressing one specific recipient and usually supposed to end up in this person's own inbox, invisible to anyone else. Fediverse, to me, seems more like "the old WWW" here, where a lot of things are public by default and anything to reduce visibility is somewhat difficult to do right on top.

Adding to that, for ActivityPub things seem slightly more complex depending on how various implementations handle things. For example, I've seen a bunch of situations in which "private" or "follower-only" messages have made it to public views in Friendica. Not sure whether these issues arise from loopholes or weaknesses in ActivityPub as a spec or "just" from flaws in individual implementations, yet this makes me very, very cautious about how to make sure "private" messages actually remain "private".

cc @heluecht

whither and d'ye

@evan depends on whether we mean academic research or corporate data scientists

Evan Prodromou

@squinky would you say yes to either?