Hi Fediverse denizens. I've been working on a project I hope will help Fediverse devs make software that federates across ALL services, not just Mastodon-plus-a-few-others.

Scott Feeney

@darius please hand off Hometown to another maintainer! It's cool you're moving on to these new, bigger projects, but it's sadly ironic that those of us who liked your ideas about social media the most — and therefore started Hometown servers — are left stuck without updates or bug fixes.

Firecat

@darius great tool, but the Mastodon developers do not want to comply with other rules for different federations and ActivityPub standards. Mastodon will only work in Mastodon. It’s the reason why you won’t see Misskey posts or Pixelfed photos in Mastodon; the Mastodon developers deliberately ignore everything outside the Mastodon community.

Hugh

@darius @djsundog Is this how a new software would be added? e.g. if I wanted the Observatory to include BookWyrm, I'd convince a BookWyrm server admin to join the relay?

Darius Kazemi

@unchartedworlds means a lot coming from someone on the scicomm server!!!

Darius Kazemi

@graue yes, I want to do this and have plans for it, but I needed to get this out first

Darius Kazemi

@hugh @djsundog yes! Though bookwyrm would also need to support relays

Deborah Pickett

@darius How do you counteract poisoning of the data by bad actors? Do you have a blocklist of peers or addresses to discard data from? Can you remove poisoned anonymized datapoints if they’re discovered long after collection?

Does the relay work with peers who have AUTHORIZED_FETCH turned on? Most relays don’t.

The Nexus of Privacy

@darius thanks for taking the time to write it up so thoroughly and get feedback! It seems like a great tool -- I was impressed by the demo at FediForum -- and I really appreciate you thinking so deeply about the privacy aspects and taking a consent-based approach. I certainly hope that this sets the bar for future projects.

Opt-in at the server level makes a lot of sense to me, and I like the specific approach you described in your reply to @djsundog ... it's a mechanism server admins are already familiar with. The discussion of how you can't leverage existing opt-in/opt-out signals makes it clear that trying to do so would compromise user privacy (and also exposes a limitation of the current design -- not your issue but something I hope developers think about).

Scrubbing the data is a great example of data minimization, and the example makes it easy to understand. The exceptions you list all seem very sensible.

A question about the additional opt-out mechanism ... does this do anything more than the admin undoing the opt-in by unsubscribing? If not, then it might be overkill ... although certainly nothing the matter with having an email-based opt-out as well.

Darius Kazemi

@futzle yes to blocklist and yes to removing poisoned data manually

No it doesn't work with authorized fetch, and I assume that a server with that turned on doesn't want my tool anywhere near them anyway

Irenes (many)

@darius okay! this seems solid to us. we do suspect it errs on the side of not capturing quite enough data to solve real compatibility problems, but we're supportive of resolving that iteratively by looking at the data captured this way, then studying what else it's useful to capture one thing at a time.

Darius Kazemi

@thenexusofprivacy @djsundog if I get any data from a server that's opted out via "announce leaking" I will instantly drop it instead of recording it

The Nexus of Privacy

@darius got it. that makes sense then.

The Nexus of Privacy

@darius also with my pedantic hat on, your approach very much aligns with the principles in https://www.cell.com/patterns/fulltext/S2666-3899(23)00323-9, which is good! I'll update my original post to mention that.

Mike [SEC=OFFICIAL]

@darius I'd love to read the blog post but light text, black background is a massive accessibility problem for a lot of people. If you're able to incorporate some detection of user colour theme requirements into the styles for the project and its documentation that'd be extremely helpful. Thanks.

Rimu

@darius @nuintari For example, Lemmy uses Announce totally differently and with a different meaning and intent than Mastodon does. That information cannot be seen by looking at the JSON (the "results" that @nuintari was referring to).

As another example - in Lemmy a Note is a comment on a post and posts are shared over the wire using a Page object.

Still, a great tool and sorely needed. Thank you!

Steffo

@darius opposite question!

will a relay explicitly for opting in be available?

some people may not want to join a public relay due to how heavy they are on server load, but may want to provide data to your project nonetheless…

Amber

@[email protected] Hello! I am glad I found this post again; it got lost in my feed. I emailed you a couple of questions. I see you already answered the authorized fetch one. I'd offer the following counterargument, though:

I run a queer instance, and I feel pretty confident with what you’ve shown off so far when it comes to your data “deanonymization” because you’re stripping everything and just changing it to a skeleton highlighting types and the various nested structures. I’m okay with this, but authorized fetch is a protection against instances that pose a threat to my community. I can’t exactly disable it just to opt in (even though I am completely okay with the collection method you outline). Would you consider looking into that in the future? (I don’t believe you need an entire instance implementation, just something that can sign the requests so my instance is happy.)
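As a rough illustration of the "skeleton" idea described here (keep the keys, the nesting, and the object types; throw away the values), a minimal Python sketch follows. It is not the project's actual scrubbing code: the function name `skeletonize` and the choice to keep only the `type` property verbatim are assumptions made for this example.

```python
import json
from typing import Any, Optional

# Assumption: only "type" values (Create, Note, Page, Announce, ...) are kept
# verbatim, because they are what compatibility research needs.
KEEP_VERBATIM = {"type"}


def skeletonize(value: Any, key: Optional[str] = None) -> Any:
    """Replace every value with its Python type name, preserving structure."""
    if isinstance(value, dict):
        return {k: skeletonize(v, k) for k, v in value.items()}
    if isinstance(value, list):
        # Pass the parent key down so a list of types stays readable.
        return [skeletonize(v, key) for v in value]
    if key in KEEP_VERBATIM and isinstance(value, str):
        return value
    return type(value).__name__  # e.g. "str", "bool", "int", "NoneType"


example = {
    "type": "Create",
    "actor": "https://example.social/users/alice",
    "object": {
        "type": "Note",
        "content": "<p>hello</p>",
        "tag": [{"type": "Hashtag", "name": "#fedi"}],
        "sensitive": False,
    },
}

if __name__ == "__main__":
    print(json.dumps(skeletonize(example), indent=2))
```

For the example above, URLs and post content collapse to `"str"` and `sensitive` to `"bool"`, while `Create`, `Note`, and `Hashtag` survive, so the shape of the activity is visible without its content.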

Amber

@[email protected] @[email protected] this is one of the reasons why I suggested perhaps intercepting an instance's job queue. There are certain things that can’t be collected by a relay such as forwarded reports which are still just as important to handle.

Jen :TransButterfly: :3hearts: :Green:

@darius Honestly, you could have just left this at "I'm not assuming that I am entitled to the content of every public post on the federated network" and I'd have already known you were more on the level than most scrapers and bridges. But you seem to have put a lot of effort not only into not hanging on to the content of every post on the network, but also into being transparent about the how and why. I'd give you a thumbs-up react, but most Masto servers don't support it. (So obviously I agree with your stated goal as well. :neobot_giggle:)

One thing that occurred to me is when a fedi instance uses custom forks. For example, Anarres.family uses a customized fork of Glitch Social (itself a fork of Mastodon), and the version number reported on our web interface, and presumably what your software will see, is v4.4.0-alpha.1+glitch.anarres.family, which is a pretty common practice for these forks, though not all of them include the full URL of the instance like ours does. So with your example of "we saw n polls from [software] x.y.z," in our case it would not actually anonymize us because it would display "50 polls by Masto...anarres.family." Not terribly critical since you're planning on scrubbing post data, so there's not much that could be shared that we'd likely want anonymized.

You could get around that by truncating or obfuscating the version, and possibly rolling it in with the equivalent Masto or Glitch version, but that's relevant data; one of the things our fork adds is the emoji reaction code from Chuckya, so our instance is capable of sending and receiving AP messages that are not supported by other Glitch instances.
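One way to do the truncation described here is to treat the version as SemVer and cut the build metadata (everything after `+`) down to its first identifier, so `v4.4.0-alpha.1+glitch.anarres.family` becomes `4.4.0-alpha.1+glitch`. This is only a sketch: the helper name `coarsen_version`, the `keep_build_ids` parameter, and the assumption that the fork name comes first in the build metadata are all hypothetical, and as noted above, coarsening also hides fork-specific capabilities such as the Chuckya emoji reactions.

```python
import re

# SemVer core plus optional pre-release, then optional "+build" metadata.
# Assumption: a leading "v" may appear and is dropped.
_VERSION = re.compile(
    r"^v?(?P<core>\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?)"
    r"(?:\+(?P<build>[0-9A-Za-z.-]+))?$"
)


def coarsen_version(raw: str, keep_build_ids: int = 1) -> str:
    """Keep the release version and only the first `keep_build_ids`
    build-metadata identifiers (hypothetically the fork name),
    dropping anything that might identify a single instance."""
    m = _VERSION.match(raw.strip())
    if m is None:
        return "unknown"  # never record an unparseable string verbatim
    core, build = m.group("core"), m.group("build")
    if not build or keep_build_ids <= 0:
        return core
    return core + "+" + ".".join(build.split(".")[:keep_build_ids])


if __name__ == "__main__":
    print(coarsen_version("v4.4.0-alpha.1+glitch.anarres.family"))
```

Setting `keep_build_ids=0` would roll a fork in with the upstream release it is based on, trading away fork-level detail for stronger anonymity.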

NeoDB Open Source Software

@darius nice idea. Consider opting in with https://eggplant.place

Meanwhile, it would be nice to share your code URL, host name, and nodeinfo once you are ready to open your service. Much as you'd like to research others' schemas, other admins and devs may want to know more about yours by looking into its nodeinfo and code.