The current state of context resolution

julian

tl;dr — conversation backfill and synchronization via resolvable context; potential FEP.

This topic is an extension of an earlier discussion: How do you use context (if at all)?

We came out of May's ForumWG meeting with a sense that pursuing formalisation of the context property was a step in the right direction. I later built out a resolvable context collection as part of this effort.

Currently, if you are given a standalone ActivityPub object, you might have only part of the conversation surrounding it, or none at all. That's part and parcel of ActivityPub's design (content is pushed to various federated instances rather than held by one centralized authority), but it is a source of concern: end users continually remark that different instances show different reply sets and, worse yet, that even the original site may not have the entire conversation.

I can hear Evan now:

"ActivityPub is a push and pull-based API!!" — Evan Prodromou

Agreed! Although, while you can pull public objects via ActivityPub, you can't pull said objects if you don't know they exist. Here are your options for building/resolving any single object's conversational context:

- You may opt to do nothing (and the object is standalone; not ideal).
- You may traverse up the inReplyTo chain and build out one direct thread of replies (better).
  - N.B. for security, it is best to limit the traversal to an arbitrary maximum depth.
- New: you may query the object's context property and, if it resolves to an (Ordered)Collection, build out the entire conversational context, including all conversational sub-trees, in one fell swoop.
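As a rough illustration of the second option, here is a minimal sketch of a capped inReplyTo traversal. The function name, the `fetch` callback, and the cap of 10 are assumptions made for this example and are not taken from NodeBB; it also assumes `inReplyTo` is a single URL string, although in practice it can be an embedded object or a list.

```python
from __future__ import annotations

from typing import Callable, Optional

MAX_DEPTH = 10  # arbitrary cap (assumption); pick whatever suits your server


def walk_reply_chain(
    start: dict,
    fetch: Callable[[str], Optional[dict]],
    max_depth: int = MAX_DEPTH,
) -> list[dict]:
    """Follow inReplyTo upward from `start` and return the chain, root first.

    `fetch` is a hypothetical callback that resolves an object id to its
    JSON representation, or returns None if it cannot be retrieved.
    """
    chain = [start]
    seen = {start.get("id")}
    current = start
    for _ in range(max_depth):
        parent_id = current.get("inReplyTo")
        # Stop at the root, on non-string values, or if a loop is detected.
        if not isinstance(parent_id, str) or parent_id in seen:
            break
        parent = fetch(parent_id)
        if parent is None:
            break
        chain.append(parent)
        seen.add(parent_id)
        current = parent
    chain.reverse()
    return chain
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP client; a real implementation would also need signature handling and error handling around the request.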

New this week is a proof-of-concept implementation of a "context synchronization" mechanic. Using mechanics similar to Mastodon's FEP-8fcf (Followers collection synchronization across servers), I propose that servers compute a digest for a context collection from its object ids and serve it in the common ETag header. Recipients may calculate their own digest and begin backfill on a digest mismatch. Optionally, a recipient can send its digest in the If-None-Match header, allowing the origin server to respond with an even simpler 304 Not Modified.
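To make the digest idea concrete, here is a minimal sketch. The exact digest algorithm is not spelled out in this post, so the sketch borrows the FEP-8fcf approach (XOR of the SHA-256 hashes of each id, hex-encoded) as an assumption; the function names are hypothetical.

```python
from __future__ import annotations

import hashlib
from typing import Iterable, Optional


def context_digest(object_ids: Iterable[str]) -> str:
    """XOR together the SHA-256 digest of every object id (FEP-8fcf style).

    XOR makes the result independent of the order in which ids are listed.
    """
    acc = bytes(32)
    for object_id in object_ids:
        hashed = hashlib.sha256(object_id.encode("utf-8")).digest()
        acc = bytes(a ^ b for a, b in zip(acc, hashed))
    return acc.hex()


def needs_backfill(local_ids: Iterable[str], remote_etag: Optional[str]) -> bool:
    """Compare a locally computed digest with the ETag served by the origin."""
    if remote_etag is None:
        return True  # nothing to compare against, so assume we are stale
    tag = remote_etag.strip()
    if tag.startswith("W/"):  # tolerate a weak-validator prefix
        tag = tag[2:]
    tag = tag.strip('"')  # ETag values are quoted strings
    return tag != context_digest(local_ids)
```

A recipient could also send its own digest as a quoted value in `If-None-Match`, in which case a 304 Not Modified response means no backfill is needed.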

Technical details re: topic synchronization.

Backfill and sync both still have limited availability; only NodeBB supports them currently. However, I'm working with Angus (building out the Discourse AP integration) to expand support, and I'd like to eventually publish an FEP and SocialCG report to make this all pseudo-official.

We intend to discuss our research at this month's ForumWG (August 1st; 1300 EDT). Join us and let's see where this goes!

Jon

@julian very interesting work, it's certainly addressing an important problem!

silverpill

@julian

>and serve them using the common ETag header

Nice. Does it eliminate the need for a custom Collection-Synchronization header?

(initially replied from SocialHub but my reply didn't federate)

julian

@silverpill it wasn't readily apparent in FEP-8fcf why a bespoke header was used instead of ETag. If I had to conjure up a rationale, it would be that an ETag is explicitly tied to a resource, but follower synchronization digests differ depending on the calling user (different follower sets, etc.)

In this case however, since topic synchronization deals only with publicly addressable content, and is tied to the context, we sidestep that complication and I opted to use the commonly seen ETag header.

Also thanks for the heads up re: socialhub federation (or lack thereof), yay for regressions!

Michael Foster

@julian This looks really interesting! Great you are addressing this. cc @newsmast

silverpill

@julian

I implemented fetching of context (manual) - my server simply reads the latest N items from the collection.
While working on that I realized that synchronization can be done differently. If the context collection contains the activities that modify it (such as Add and Remove), in reverse chronological order, the client can reconstruct the current state by fetching them and applying them one by one. There is no need to compute digests with this approach; remembering the latest activity ID would be enough.
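A minimal sketch of that replay idea, under stated assumptions: activities arrive newest first, each has an `id`, a `type` of Add or Remove, and an `object` that is either an id string or an embedded object. The function name and the in-memory `set` are hypothetical and only for illustration.

```python
from __future__ import annotations

from typing import Optional


def apply_new_activities(
    state: set[str],
    activities_newest_first: list[dict],
    last_seen_id: Optional[str],
) -> Optional[str]:
    """Apply unseen Add/Remove activities to `state`; return the newest id."""
    # Collect everything newer than the activity we processed last time.
    pending = []
    for activity in activities_newest_first:
        if last_seen_id is not None and activity.get("id") == last_seen_id:
            break
        pending.append(activity)

    # Replay oldest first so that later activities win.
    for activity in reversed(pending):
        obj = activity.get("object")
        obj_id = obj.get("id") if isinstance(obj, dict) else obj
        if activity.get("type") == "Add":
            state.add(obj_id)
        elif activity.get("type") == "Remove":
            state.discard(obj_id)

    return pending[0].get("id") if pending else last_seen_id
```

The returned id is what the client would remember for the next sync run.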

silverpill

@julian Even if context contains objects and not activities, synchronization can be done by requesting all activities where Activity.target == Object.context (from the outbox perhaps?)

Alex mehr

So, I set up a way to fetch the latest stuff from the server by just grabbing the last N items from the collection. But then I thought of a cooler way to sync things. If your context collection has actions that change things (like Add or Remove), the client can rebuild the current state by getting these actions in reverse order and applying them one by one. You don’t even need to mess with digests—just keep track of the latest activity ID and you’re good to go.

julian

@silverpill @Alex-mehr You both seem to have come up with similar solutions at roughly the same time!

Some thoughts:

- There's no guarantee that a collection would present items in chronological vs. reverse chronological order. Are you checking the timestamps and reversing as needed?
- Wouldn't you need to paginate through the entire collection anyway?
- My context contains only objects (e.g. as:Note, as:Article, etc.) so there wouldn't be any activities for you to actually consume.
  - However, this is not set in stone. @[email protected] advocated for objects in the context collection, but activities in the context outbox, which could also work.
- The idea behind serving a digest in the ETag header is simply to provide a means to quickly determine whether your collection of objects is up-to-date. The header is served when the context is requested, and the hashing can be done locally. If a match is found, then you avoid any additional network calls.

How the context is synchronized is actually implementor dependent. So if what works for you is to look at the activities and re-construct based on ID, then that's great (assuming activities are even provided by the context)! If you'd prefer to just re-iterate through the entire collection, that's great too.

julian

@silverpill @Alex-mehr Admittedly I have a blind spot when it comes to activities.

NodeBB doesn't actually track activities it receives; it only processes objects, and activity IDs are really just generated on the fly. I think that informs why I set up topic synchronization in this manner, and why my idea of context collections contains only objects; to me, activities don't really mean much at all.

silverpill

@julian @alex-mehr @trwnh

>There's no guarantee that a collection would present items in chronological vs. reverse chronological order — are you checking the timestamps and reversing as needed?

The ordering can be specified by some property of the Collection.

>Wouldn't you need to paginate through the entire collection anyway?

The client will fetch pages until it finds an item that has already been processed.
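A sketch of that early-exit paging, assuming the collection is ordered newest first, pages expose `orderedItems` and a `next` URL string, and the caller supplies two hypothetical callbacks (`fetch_page` and `is_known`). The page cap is an arbitrary safety limit.

```python
from __future__ import annotations

from typing import Callable, Optional


def fetch_unseen_items(
    first_page_url: Optional[str],
    fetch_page: Callable[[str], Optional[dict]],
    is_known: Callable[[str], bool],
    max_pages: int = 50,
) -> list[str]:
    """Walk collection pages and return item ids until a known one is hit."""
    new_items: list[str] = []
    url = first_page_url
    for _ in range(max_pages):
        if url is None:
            break
        page = fetch_page(url)
        if page is None:
            break
        for item in page.get("orderedItems", []):
            item_id = item.get("id") if isinstance(item, dict) else item
            if is_known(item_id):
                # Everything past this point was processed on an earlier run.
                return new_items
            new_items.append(item_id)
        url = page.get("next")
    return new_items
```

This only avoids a full traversal if the ordering assumption holds; with an unordered collection the early exit could skip unseen items.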

> I think that informs why I set up topic synchronization in this manner, and why my idea of context collections contains only objects; to me, activities don't really mean much at all.

I'd prefer context to be a collection of objects too, as long as there's a way to retrieve activity history.

Activity-based sync seems more natural to me. I think ActivityPub can be better understood not as a protocol for social networking, but as a distributed database where nodes sync datasets by sending messages over the network. Messages are activities, datasets are collections. When I send a Follow activity and your server responds with an Accept, the followers and following collections are updated on both sides (or their equivalents if you don't store activities and collections). More generally, any activity delivery can be viewed as a synchronization of the outbox collection.

I think such a change of perspective can greatly improve DX and provide a solid foundation for further protocol extensions.