Hi Fediverse denizens. I've been working on a project I hope will help Fediverse devs make software that federates across ALL services, not just Mastodon-plus-a-few-others.
-
Deborah Pickett replied to Darius Kazemi
@darius How do you counteract poisoning of the data by bad actors? Do you have a blocklist of peers or addresses to discard data from? Can you remove poisoned anonymized datapoints if they’re discovered long after collection?
Does the relay work with peers who have AUTHORIZED_FETCH turned on? Most relays don’t.
-
The Nexus of Privacy replied to Darius Kazemi
@darius thanks for taking the time to write it up so thoroughly and get feedback! It seems like a great tool -- I was impressed by the demo at FediForum -- and I really appreciate you thinking so deeply about the privacy aspects and taking a consent-based approach. I certainly hope that this sets the bar for future projects.
Opt-in at the server level makes a lot of sense to me, and I like the specific approach you described in your reply to @djsundog ... it's a mechanism server admins are already familiar with. The discussion of how you can't leverage existing opt-in/opt-out signals makes it clear that trying to do so would compromise user privacy (and also exposes a limitation of the current design -- not your issue but something I hope developers think about).
Scrubbing the data is a great example of data minimization, and the example makes it easy to understand. The exceptions you list all seem very sensible.
A question about the additional opt-out mechanism ... does this do anything more than the admin undoing the opt-in by unsubscribing? If not, then it might be overkill ... although certainly nothing the matter with having an email-based opt-out as well.
-
Darius Kazemi replied to Deborah Pickett
@futzle yes to blocklist and yes to removing poisoned data manually
No it doesn't work with authorized fetch, and I assume that a server with that turned on doesn't want my tool anywhere near them anyway
-
@darius okay! this seems solid to us. we do suspect it errs on the side of not capturing quite enough data to solve real compatibility problems, but we're supportive of resolving that iteratively by looking at the data captured this way, then studying what else it's useful to capture one thing at a time.
-
Darius Kazemi replied to The Nexus of Privacy
@thenexusofprivacy @djsundog if I get any data from a server that's opted out via "announce leaking" I will instantly drop it instead of recording it
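A minimal sketch of that drop rule, assuming the tool keeps a set of opted-in hostnames and checks both the sender and the referenced object before recording anything. The names `OPTED_IN_HOSTS`, `host_of`, and `should_record` are hypothetical and are not taken from the project's code.

```python
from urllib.parse import urlparse

# Hypothetical allowlist of servers that have opted in (e.g. by subscribing).
OPTED_IN_HOSTS = {"example.social"}


def host_of(ref):
    """Return the hostname of an ActivityPub reference.

    `ref` may be a URL string or an embedded object carrying an "id".
    Returns None when no hostname can be determined.
    """
    if isinstance(ref, dict):
        ref = ref.get("id")
    if not isinstance(ref, str):
        return None
    return urlparse(ref).hostname


def should_record(activity, opted_in=OPTED_IN_HOSTS):
    """Record only when the sender and the referenced object are both opted in.

    An Announce sent by an opted-in server can point at a post hosted on a
    server that never opted in; that case is dropped rather than recorded.
    """
    if host_of(activity.get("actor")) not in opted_in:
        return False
    obj = activity.get("object")
    if obj is not None and host_of(obj) not in opted_in:
        return False
    return True


leaked = {
    "type": "Announce",
    "actor": "https://example.social/users/a",
    "object": "https://other.example/notes/1",
}
local = {
    "type": "Create",
    "actor": "https://example.social/users/a",
    "object": {"id": "https://example.social/notes/2", "type": "Note"},
}
print(should_record(leaked), should_record(local))
```

Anything whose origin cannot be determined is treated as not opted in, which errs on the side of dropping data.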
-
The Nexus of Privacy replied to Darius Kazemi
@darius got it. that makes sense then.
-
The Nexus of Privacy replied to The Nexus of Privacy
@darius also with my pedantic hat on, your approach very much aligns with the principles in https://www.cell.com/patterns/fulltext/S2666-3899(23)00323-9, which is good! I'll update my original post to mention that.
-
Mike [SEC=OFFICIAL] replied to Darius Kazemi
@darius I'd love to read the blog post but light text, black background is a massive accessibility problem for a lot of people. If you're able to incorporate some detection of user colour theme requirements into the styles for the project and its documentation that'd be extremely helpful. Thanks.
-
@darius @nuintari For example, Lemmy uses Announce totally differently and with a different meaning and intent than Mastodon does. That information cannot be seen by looking at the JSON (the "results" that @nuintari was referring to).
As another example - in Lemmy a Note is a comment on a post and posts are shared over the wire using a Page object.
Still, a great tool and sorely needed. Thank you!
-
Steffo :deadlock_dynamo: replied to Darius Kazemi
@darius opposite question!
will a relay explicitly for opting in be available?
some people may not want to join a public relay due to how heavy they are on server load, but may want to provide data to your project nonetheless…
-
@[email protected] Hello! I am glad I found this post again, it got lost in my feed. I emailed you a couple of questions. I see you already answered the authorized fetch one. I bring the following counter argument though:
I run a queer instance, and I feel pretty confident with what you’ve shown off so far when it comes to your data “deanonymization” because you’re stripping everything and just changing it to a skeleton highlighting types and the various nested structures. I’m okay with this, but authorized fetch is a protection against instances that pose a threat to my community. I can’t exactly disable it (even though I am completely okay with the collection method you outline) just to opt in. Would you consider looking into that in the future? (I don’t believe you need an entire instance implementation, just something that can sign the requests so my instance is happy).
-
Pumpkin Amber replied to Steffo :deadlock_dynamo:
@[email protected] @[email protected] this is one of the reasons why I suggested perhaps intercepting an instance's job queue. There are certain things that can’t be collected by a relay such as forwarded reports which are still just as important to handle.
-
Jen :TransButterfly: :3hearts: :Green: replied to Darius Kazemi
@darius Honestly, you could have just left this at "I'm not assuming that I am entitled to the content of every public post on the federated network" and I'd have already known you were more on the level than most scrapers and bridges. But you seem to have put a lot of effort into not only not hanging on to the content of every post on the network, but into being transparent about the how and why. I'd give you a thumbs-up react, but most Masto servers don't support it. (So obviously I agree with your stated goal as well. :neobot_giggle:)
One thing that occurred to me is when a fedi instance uses custom forks. For example, Anarres.family uses a customized fork of Glitch Social (itself a fork of Mastodon), and the version number reported on our web interface, and presumably what your software will see, is v4.4.0-alpha.1+glitch.anarres.family, which is a pretty common practice for these forks, though not all of them include the full URL of the instance like ours does. So with your example of "we saw n polls from [software] x.y.z," in our case it would not actually anonymize us because it would display "50 polls by Masto...anarres.family." Not terribly critical since you're planning on scrubbing post data, so there's not much that could be shared that we'd likely want anonymized.
You could get around that by truncating or obfuscating the version, and possibly rolling it in with the equivalent Masto or Glitch version, but that's relevant data; one of the things our fork adds is the emoji reaction code from Chuckya, so our instance is capable of sending and receiving AP messages that are not supported by other Glitch instances.
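A rough sketch of the truncation option, assuming the fork suffix follows a `+` in the style of semantic-versioning build metadata and that its first dot-separated token names the fork. Both assumptions are heuristics, and the helper names `split_version` and `anonymized_version` are hypothetical.

```python
def split_version(version):
    """Split a reported version into (core, fork).

    Everything after the first "+" is treated as fork/build metadata, and
    only its first dot-separated token is kept as the fork label.
    """
    core, _, build = version.lstrip("v").partition("+")
    fork = build.split(".")[0] if build else None
    return core, fork


def anonymized_version(version):
    """Return the version with any instance-specific suffix removed."""
    core, fork = split_version(version)
    return f"{core}+{fork}" if fork else core


print(anonymized_version("v4.4.0-alpha.1+glitch.anarres.family"))
print(anonymized_version("4.3.2"))
```

This keeps the upstream version and the fork family but discards the hostname, which also discards the signal that a particular fork has extra capabilities such as emoji reactions. That is the trade-off described above.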
-
NeoDB Open Source Software replied to Darius Kazemi
@darius nice idea. consider opt in with https://eggplant.place
meanwhile it would be nice to share your code url, host name, and nodeinfo once you are ready to open your service. much as you'd like to research others' schema, other admin/dev may want to know more about yours, by looking into nodeinfo and code.
-
@darius cool! I didn't dig in too deeply, so maybe you cover this, but a related helpful feature/tool might be knowing which software (libraries, implementations) would be able to parse data/schemas. getting into https://caniuse.com/ territory, or things like ACID.
also often helpful if folks (eg, devs) can paste in data (JSON) and see how it validates, using the exact code being used to "observe", without that being captured and included in database.
-
Darius Kazemi replied to The Nexus of Privacy
@thenexusofprivacy thank you! I've been talking with Rob a lot
-
@ireneista 100% correct! That's the idea. Release a version that's very safe, see how useful it is, figure out the gaps, figure out how to safely address those gaps, do another public comment period, rinse and repeat
-
Darius Kazemi replied to Steffo :deadlock_dynamo:
@steffo yes actually I'm turning this server into a relay itself! So you can just join the relay in order to opt in. It'll be a kind of fake relay where data flows in (and gets scrubbed) but no data flows out
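A minimal sketch of what the scrubbing step might look like, based on the description earlier in the thread of reducing each message to a skeleton of types and nested structure. It assumes that `type` values are the only strings kept verbatim; the project's actual list of exceptions is not reproduced here, and the function name `scrub` is hypothetical.

```python
def scrub(value, key=None):
    """Replace every leaf value with the name of its Python type.

    Dict keys and list nesting are preserved so the shape of the message
    survives. String values stored under a "type" key are kept as-is
    (an assumption made for this sketch).
    """
    if isinstance(value, dict):
        return {k: scrub(v, k) for k, v in value.items()}
    if isinstance(value, list):
        return [scrub(v, key) for v in value]
    if key == "type" and isinstance(value, str):
        return value
    return type(value).__name__


note = {
    "type": "Note",
    "content": "hello",
    "to": ["https://a.example/users/b"],
    "replies": {"type": "Collection", "totalItems": 3},
}
print(scrub(note))
```

Key names are kept unchanged in this sketch, so a real implementation would also need to decide how to treat keys that could themselves carry identifying text.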
-
@puppygirlhornypost2 @steffo I'm open to that for future iterations if I can figure out how to do that as a safe opt in. But I'm also big on incremental development so I'm building this part first