Excited to announce that I will be at #fediforum today speed demo-ing my latest project: an ActivityPub data observatory!
-
Emelia 👸🏻 replied to Darius Kazemi
@darius oooh, I see, <uri> isn't a placeholder for an actual value, it's just an indicator of the value type
-
Emelia 👸🏻 replied to Darius Kazemi
@darius I think capabilities comes from either pixelfed or gotosocial?
-
@darius it would probably be very useful to also run the data through a JSON-LD processor and flag e.g. URIs being serialised as strings, undefined properties, etc
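
A rough sketch of the kind of flagging suggested here. It is not a real JSON-LD processor: the TERMS table is a hypothetical, hand-written stand-in for a resolved @context, and "relatedLink" is a made-up term used only for illustration. A real check would expand the document with an actual JSON-LD library instead.

```python
# Hypothetical stand-in for a resolved @context (an assumption, not real data):
# term -> True if the context types the term's values as IRIs ("@type": "@id").
TERMS = {
    "id": True,
    "type": True,
    "attributedTo": True,
    "inReplyTo": True,
    "content": False,
    "published": False,
    "relatedLink": False,  # made-up term, defined but not typed as an IRI
}


def looks_like_uri(value):
    """Very loose check: an http(s) string is treated as a URI."""
    return isinstance(value, str) and value.startswith(("http://", "https://"))


def lint(obj, terms=TERMS):
    """Return (property, problem) pairs for one top-level object."""
    findings = []
    for key, value in obj.items():
        if key.startswith("@"):
            continue  # JSON-LD keywords such as @context are not properties
        if key not in terms:
            findings.append((key, "undefined property"))
        elif not terms[key] and looks_like_uri(value):
            findings.append((key, "URI serialised as plain string"))
    return findings


example = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "id": "https://example.social/notes/1",
    "type": "Note",
    "content": "hello",
    "relatedLink": "https://example.social/about",
    "capabilities": {},
}
for prop, problem in lint(example):
    print(prop, "->", problem)
```

The sketch only inspects top-level keys; nested objects and arrays would need a recursive walk.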
-
Emelia 👸🏻 replied to Darius Kazemi
@darius that's why there's the new security context in mastodon: https://github.com/mastodon/mastodon/pull/31871
I think that's a backport candidate, as the context was used but not actually present in the @context's object
-
Evan Prodromou replied to Darius Kazemi
@darius can you compare to browser.pub?
-
Emelia 👸🏻 replied to Darius Kazemi
@darius given it's just collecting the unique shapes of data, I think it's perfectly fine to be opt-out, since there's no user data at all. (assuming you never store the raw data anywhere)
-
@erincandescent yeah! any pointers to things that can help me infer more stuff would be great.
-
Darius Kazemi replied to Evan Prodromou
@evan yeah. on browser.pub I can say "hey help me take a look at these particular messages I know about". This observatory will surface information about stuff floating around the fedi that I don't even know about. For example I am already learning about server software I've never even heard of, and I would not have put that into browser.pub because I wouldn't have known it existed
-
Evan Prodromou replied to Darius Kazemi
@darius Ah, OK, interesting. Where does your network tap plug in?
-
Darius Kazemi replied to Emelia 👸🏻
@thisismissem yes I am just storing the inferred type! I use https://github.com/triggerdotdev/schema-infer
-
Darius Kazemi replied to Evan Prodromou
@evan still figuring it out. Right now I am subscribing to a public relay as that is the most software-neutral source I could think of, but I am looking at other ingestion methods too. Importantly I want to ingest AP only... I'm not going to hit proprietary API endpoints like most scrapers do
-
Evan Prodromou replied to Darius Kazemi
@darius barf, no
-
the bus lane enthusiast replied to Darius Kazemi
@darius Hm. Am I reading it right that you would be logging that person x made a post with URL y on date z?
that might interfere with some people's wish to not have their posts seen off fedi; that info could be used against someone even if they delete it later. "why're you posting while on the clock" fer a basic example.
the "in reply to" field as well might expose the shape of who you talk to in a concerning way
edit: it's clear that I don't get it but will try again after coffee
-
@darius Are you tracking #goblin? https://indieweb.social/@goblin@goblin.band
-
@tchambers never heard of it!
-
Darius Kazemi replied to the bus lane enthusiast
@t54r4n1 no, I am logging that "some person somewhere, but I don't know who or where because I threw away that data, made a post with a "URL" field that contains some kind of URL in it but I don't know what because I threw away that data"
I'm not even logging the time something was posted! Just "there is a time field in this and it contains a time but I don't know what time"
-
@t54r4n1 like the second screenshot is the literal data I am recording, so like I am recording the word "<date-time>" instead of an actual date and time
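
To make the idea concrete, here is a minimal sketch of reducing a message to its shape by replacing every value with a type placeholder. It is an illustration only, not the observatory's code (which, per the thread, uses schema-infer); the function name shape_of, the placeholder spellings other than "<uri>" and "<date-time>", and the example Note are assumptions.

```python
import re
from urllib.parse import urlparse

# RFC 3339 style timestamp, e.g. 2024-10-01T12:00:00Z
_DATE_TIME = re.compile(
    r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$"
)


def _is_http_uri(text):
    """True if the string parses as an http(s) URL with a host."""
    try:
        parsed = urlparse(text)
    except ValueError:
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def shape_of(value):
    """Return the value with every leaf replaced by a type placeholder."""
    if isinstance(value, dict):
        return {key: shape_of(item) for key, item in sorted(value.items())}
    if isinstance(value, list):
        shapes = []
        for item in value:
            shape = shape_of(item)
            if shape not in shapes:  # keep one copy of each distinct shape
                shapes.append(shape)
        return shapes
    if value is None:
        return "<null>"
    if isinstance(value, bool):  # must come before the number check
        return "<boolean>"
    if isinstance(value, (int, float)):
        return "<number>"
    if isinstance(value, str):
        if _DATE_TIME.match(value):
            return "<date-time>"
        if _is_http_uri(value):
            return "<uri>"
        return "<string>"
    raise TypeError(f"not a JSON value: {type(value).__name__}")


note = {
    "type": "Note",
    "id": "https://example.social/notes/1",
    "published": "2024-10-01T12:00:00Z",
    "content": "hello",
}
print(shape_of(note))
# {'content': '<string>', 'id': '<uri>', 'published': '<date-time>', 'type': '<string>'}
```

Only the output of shape_of would be kept; the original values, including who posted and when, never leave the function.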
-
@darius love this, it might even be more useful than a test suite!
-
Jenniferplusplus replied to Darius Kazemi
@darius how do you collect it? Do you just follow a bunch of actors on different software?