Hi Fediverse denizens. I've been working on a project I hope will help Fediverse devs make software that federates across ALL services, not just Mastodon-plus-a-few-others.
-
@[email protected] @[email protected] this is one of the reasons why I suggested perhaps intercepting an instance's job queue. There are certain things that can’t be collected by a relay such as forwarded reports which are still just as important to handle.
-
Jen :TransButterfly: :3hearts: :Green: replied to Darius Kazemi
@darius Honestly, you could have just left this at "I'm not assuming that I am entitled to the content of every public post on the federated network" and I'd have already known you were more on the level than most scrapers and bridges. But you seem to have put a lot of effort not only into not hanging on to the content of every post on the network, but into being transparent about the how and why. I'd give you a thumbs-up react, but most Masto servers don't support it. (So obviously I agree with your stated goal as well. :neobot_giggle:)
One thing that occurred to me is when a fedi instance uses custom forks. For example, Anarres.family uses a customized fork of Glitch Social (itself a fork of Mastodon), and the version number reported on our web interface, and presumably what your software will see, is v4.4.0-alpha.1+glitch.anarres.family, which is a pretty common practice for these forks, though not all of them include the full URL of the instance like ours does. So with your example of "we saw n polls from [software] x.y.z," in our case it would not actually anonymize us because it would display "50 polls by Masto...anarres.family." Not terribly critical since you're planning on scrubbing post data, so there's not much that could be shared that we'd likely want anonymized. You could get around that by truncating or obfuscating the version, and possibly rolling it in with the equivalent Masto or Glitch version, but that's relevant data; one of the things our fork adds is the emoji reaction code from Chuckya, so our instance is capable of sending and receiving AP messages that are not supported by other Glitch instances.
-
NeoDB Open Source Software replied to Darius Kazemi
@darius nice idea. consider opt in with https://eggplant.place
meanwhile it would be nice to share your code url, host name, and nodeinfo once you are ready to open your service. much as you'd like to research others' schema, other admin/dev may want to know more about yours, by looking into nodeinfo and code.
-
@darius cool! I didn't dig in too deeply, so maybe you cover this, but a related helpful feature/tool might be knowing which software (libraries, implementations) would be able to parse data/schemas. getting in to https://caniuse.com/ territory, or things like ACID.
also often helpful if folks (eg, devs) can paste in data (JSON) and see how it validates, using the exact code being used to "observe", without that being captured and included in database.
-
Darius Kazemi replied to The Nexus of Privacy
@thenexusofprivacy thank you! I've been talking with Rob a lot
-
@ireneista 100% correct! That's the idea. Release a version that's very safe, see how useful it is, figure out the gaps, figure out how to safely address those gaps, do another public comment period, rinse and repeat
-
@steffo yes actually I'm turning this server into a relay itself! So you can just join the relay in order to opt in. It'll be a kind of fake relay where data flows in (and gets scrubbed) but no data flows out
-
@puppygirlhornypost2 @steffo I'm open to that for future iterations if I can figure out how to do that as a safe opt in. But I'm also big on incremental development so I'm building this part first
-
@bnewbold yeah for sure, if I can create something like caniuse that would be huge! That's been a goal of mine the whole time really though I'm not there yet
-
Darius Kazemi replied to NeoDB Open Source Software
@neodb correct and I will do so (I already have nodeinfo working)
-
Darius Kazemi replied to Jen :TransButterfly: :3hearts: :Green:
@SymTrkl yeah I've already noticed that! I considered scrubbing stuff past the + in the semver (so no additional data like most recent commit hash etc) but also kind of the point here is to find patterns from edge cases.
I wonder though, maybe what I'll do is anonymize extra metadata in the semver for enumerated software for a given schema until there are at least N servers emitting that schema.
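A minimal sketch of that thresholding idea, not the project's actual code: the names `MIN_SERVERS`, `strip_build_metadata`, and `anonymize_versions`, the shape of the observation tuples, and the threshold value are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical threshold: the "N servers" from the post; the value is arbitrary.
MIN_SERVERS = 5


def strip_build_metadata(version: str) -> str:
    """Drop everything from the first '+' onward (semver build metadata)."""
    return version.split("+", 1)[0]


def anonymize_versions(observations, min_servers=MIN_SERVERS):
    """Decide which version strings to publish for each schema.

    observations: iterable of (schema_id, server_host, version) tuples.
    Returns {schema_id: sorted list of version strings}. Build metadata is
    kept only when at least `min_servers` distinct hosts emitted the schema.
    """
    servers_by_schema = defaultdict(set)
    versions_by_schema = defaultdict(set)
    for schema_id, host, version in observations:
        servers_by_schema[schema_id].add(host)
        versions_by_schema[schema_id].add(version)

    published = {}
    for schema_id, versions in versions_by_schema.items():
        if len(servers_by_schema[schema_id]) >= min_servers:
            shown = versions
        else:
            # Too few servers: publish versions without the part after '+'.
            shown = {strip_build_metadata(v) for v in versions}
        published[schema_id] = sorted(shown)
    return published
```

One caveat with this sketch: it counts servers per schema, so a version string that embeds a hostname would still identify that server once the threshold is reached. Counting distinct servers per version string instead might be the safer variant.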
-
@puppygirlhornypost2 I might be able to make it work as an opt in relay that DOES work with authorized fetch. We'll see
-
Veronika Cheplygina replied to Darius Kazemi
@darius I think I support this but the scrolling thing on the side makes it difficult to read the thing
-
Darius Kazemi replied to Mike [SEC=OFFICIAL]
@mike sorry, I don't control the lab website and I'm salty about some of our choices. Will pass this on to the people in charge
-
Darius Kazemi replied to Veronika Cheplygina
@DrVeronikaCH I would like it to go away too. I agree with you
-
@DrVeronikaCH there is a pause button on the lower right of the page that might help?
-
@darius There seems to be an overlap between this and https://funfedi.dev/support_tables/
Perhaps you can contribute to that project? It seems to be privacy-respecting because samples are generated locally
-
@silverpill yes! It came onto my radar last week and I hope to contribute to their cc0 data as well
-
@darius What safeguards are there to ensure someone can't maliciously insert code that reveals the contents of posts and other PII? Trust?
-
@fembot yeah I'm running the code and you have to trust that I'm not putting evil stuff into it. Just like any other Fediverse server that way (I'm currently trusting that this message isn't getting used by your server to malicious ends)