Periodically I see the absolute monster that is the Akkoma timeline query

@erincandescent mine used to be worse btw. lots of joins and stuff

Erin 💽✨

@gabboman I think the best approach would just be to create an inbox table for each user where each row contained (user_id, activity_id, object_id, reasons_post_might_be_hidden_maybe) that it could just slam down and then JOIN in the activity/object data it needs for the frontend

Jenniferplusplus

@erincandescent
me: that's not SO bad
me: wait, there's like 4 tables referenced in that SELECT statement, where are those coming from?
me: *sees read more link*
me: *clicks*
me: *gasps*

Erin 💽✨

@jenniferplusplus I’m sure once upon a time this was an entirely reasonable query

…

It is no longer an entirely reasonable query

Jenniferplusplus

@erincandescent
FWIW, I'm actually pretty convinced that timelines are an obvious use case for a timeseries database, and based on that, Letterbook can partially pre-compute them, with fast insertion and queries. I don't have the actual SQL this generates handy, but it closely resembles the source LINQ expression

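(from post in _feeds.Timelines
where post.Time <= before
where audienceKeys.Contains(post.AudienceId) // LINQ spelling of `where ... in`
where includeShared != (post.SharedBy == null)
select post)
.Distinct() // de-duplicate first, so Take can still return up to `limit` posts
.OrderByDescending(post => post.Time) // Take without an ordering is nondeterministic
.Take(limit);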

Jenniferplusplus

@erincandescent At some point, I'll likely need to complicate this a little bit, because that where ... in is less performant than a join, so I'll need to find a break point where it makes more sense to create a temporary join table rather than include a big list of keys. but still
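The break point described above could look roughly like this: below a cutoff, pass the audience keys as bound parameters in an `IN (...)` list; above it, stage them in a temporary table and join. This is a sketch against SQLite with a hypothetical `timelines(post_id, audience_id, time)` table and a made-up cutoff of 500. The real break point would have to be measured on the actual database.

```python
import sqlite3

IN_LIST_CUTOFF = 500  # hypothetical; also stays under SQLite's older default of 999 bound parameters

def timeline_for_audiences(conn, audience_keys, before, limit):
    keys = list(audience_keys)
    if not keys:
        return []
    if len(keys) <= IN_LIST_CUTOFF:
        # Small key sets: one placeholder per key inside IN (...).
        placeholders = ",".join("?" * len(keys))
        sql = ("SELECT DISTINCT post_id, time FROM timelines "
               f"WHERE time <= ? AND audience_id IN ({placeholders}) "
               "ORDER BY time DESC LIMIT ?")
        return conn.execute(sql, [before, *keys, limit]).fetchall()
    # Large key sets: load them into a temp table and let the planner join.
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS audience_keys (id TEXT PRIMARY KEY)")
    conn.execute("DELETE FROM audience_keys")
    conn.executemany("INSERT OR IGNORE INTO audience_keys VALUES (?)", [(k,) for k in keys])
    return conn.execute("""
        SELECT DISTINCT t.post_id, t.time
        FROM timelines t JOIN audience_keys k ON k.id = t.audience_id
        WHERE t.time <= ?
        ORDER BY t.time DESC LIMIT ?
    """, (before, limit)).fetchall()
```

Both branches return the same rows; only the query shape changes, so the cutoff can be tuned without touching callers.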

Erin 💽✨

@jenniferplusplus I kinda think you want to precompute each user's own timeline, because otherwise you will just end up dealing with so much complexity etc. for blocking

(obviously this does mean that you e.g. need to garbage collect entries when someone blocks someone else)
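With precomputed per-user timelines, a block has to scrub entries that were already delivered. A minimal sketch against a hypothetical SQLite `inbox` table that also records each entry's author; treating a block as hiding posts in both directions is an assumption, not something stated in the thread.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE inbox (
        user_id TEXT NOT NULL,
        activity_id INTEGER NOT NULL,
        author TEXT NOT NULL,
        PRIMARY KEY (user_id, activity_id))""")
    # Lets the block-time DELETE find (recipient, author) pairs without a full scan.
    conn.execute("CREATE INDEX inbox_by_author ON inbox (user_id, author)")
    return conn

def on_block(conn, blocker, blocked):
    # Assumed bidirectional: drop the blocked account's posts from the blocker's
    # inbox and the blocker's posts from the blocked account's inbox.
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "DELETE FROM inbox WHERE (user_id = ? AND author = ?) OR (user_id = ? AND author = ?)",
            (blocker, blocked, blocked, blocker))
    return cur.rowcount  # number of inbox entries garbage-collected
```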

Jenniferplusplus

@erincandescent that's handled via the audience mechanism, so blocking someone removes them from your followers audience, and thus their next timeline query will exclude your posts.

The audience memberships are queried separately, prior to this timeline query.
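In the flow described here, audience membership is looked up first and the timeline query only ever sees audience keys, so a block edits membership rather than any timeline rows. An in-memory sketch of that two-phase flow; the `Feeds` class, the `"<profile>/followers"` key format, and the method names are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Post:
    post_id: int
    time: int
    audience_id: str

@dataclass
class Feeds:
    # audience_id -> member profile ids, e.g. "alice/followers" -> {"bob"}
    audiences: dict[str, set[str]] = field(default_factory=dict)
    timeline_rows: list[Post] = field(default_factory=list)

    def audience_keys_for(self, profile: str) -> set[str]:
        # Phase 1: which audiences is this profile currently a member of?
        return {aid for aid, members in self.audiences.items() if profile in members}

    def timeline(self, profile: str, before: int, limit: int) -> list[int]:
        keys = self.audience_keys_for(profile)
        # Phase 2: filter timeline rows by those keys, newest first, de-duplicated.
        rows = sorted((p for p in self.timeline_rows
                       if p.time <= before and p.audience_id in keys),
                      key=lambda p: p.time, reverse=True)
        seen: set[int] = set()
        out: list[int] = []
        for p in rows:
            if len(out) >= limit:
                break
            if p.post_id not in seen:
                seen.add(p.post_id)
                out.append(p.post_id)
        return out

    def block(self, blocker: str, blocked: str) -> None:
        # Blocking only removes the blocked profile from the blocker's followers audience.
        self.audiences.get(f"{blocker}/followers", set()).discard(blocked)
```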

Erin 💽✨

@jenniferplusplus yes, but this layout means (structurally) you’re scanning everyone you follow’s outbox, which becomes an increasingly big gather op as someone follows more people

Alternatively you can do the fan-out at delivery time and then it just becomes one dense index scan
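The structural difference can be sketched with plain lists. Reading by gather merges every followed account's outbox, so the work grows with the number of follows; fan-out at delivery leaves one pre-sorted per-user list to slice. The data shapes are hypothetical, and `heapq.merge` stands in for a database merging several index scans.

```python
import heapq
from itertools import islice

# Each outbox/inbox is a list of (timestamp, post_id), sorted newest first.
Outbox = list[tuple[int, int]]

def gather_timeline(outboxes: dict[str, Outbox], follows: list[str], limit: int) -> list[int]:
    # Pull model: one stream per followed account, k-way merged at read time.
    streams = [outboxes.get(a, []) for a in follows]
    merged = heapq.merge(*streams, key=lambda e: e[0], reverse=True)
    return [post_id for _, post_id in islice(merged, limit)]

def fan_out(inboxes: dict[str, Outbox], followers: list[str], entry: tuple[int, int]) -> None:
    # Push model: copy the entry into every follower's inbox at delivery time.
    for f in followers:
        inbox = inboxes.setdefault(f, [])
        inbox.append(entry)
        inbox.sort(key=lambda e: e[0], reverse=True)  # a real store would maintain an index

def inbox_timeline(inboxes: dict[str, Outbox], user: str, limit: int) -> list[int]:
    # Read path is one contiguous slice of a single list.
    return [post_id for _, post_id in inboxes.get(user, [])[:limit]]
```

Both produce the same timeline; the push model just moves the merge cost from read time to delivery time.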

Jenniferplusplus

@erincandescent the tsdb that powers timelines isn't the canonical source of truth for the app. It only handles timelines and notifications, and everything gets denormalized at write time. Single origin scenarios like populating an outbox use the canonical db.

Jenniferplusplus

@erincandescent There are a few reasons I don't want to precompute explicit individual timelines for each user. A big one is that I want inactive users to cost as close to zero as possible. So with the exception of cases where the inactive user is the only follower of some remote actor, zero compute is spent on their timelines.
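A back-of-the-envelope way to see the inactive-user argument: with fan-out on write, every post costs one write per local follower whether or not they ever log in, while a read-time query only costs something when a user actually opens a timeline. These hypothetical functions just count operations; they ignore the ingest cost that every instance pays once per remote post.

```python
def fan_out_writes(posts_per_author: dict[str, int], followers: dict[str, list[str]]) -> int:
    # Push model: every post is copied to every follower, active or not.
    return sum(n * len(followers.get(author, [])) for author, n in posts_per_author.items())

def wasted_fan_out_writes(posts_per_author: dict[str, int],
                          followers: dict[str, list[str]],
                          active_users: set[str]) -> int:
    # The part of the push-model work that lands in inboxes nobody reads.
    return sum(n for author, n in posts_per_author.items()
               for f in followers.get(author, []) if f not in active_users)

def read_time_queries(timeline_opens: dict[str, int]) -> int:
    # Pull model: work happens only when a user loads a timeline.
    return sum(timeline_opens.values())
```

For example, 10 posts to 3 followers of whom only one is active means 30 push writes, 20 of them wasted, versus one query per timeline load by the active user.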

λ Natty :butterfly_::neofox_lesbian:

@erincandescent scared to ask where the part is that populates it with all the data the frontend needs

Jenniferplusplus

@erincandescent You're right that following more people makes for larger gathers. But it's still just a single inner join at query time, because of the denormalization.
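Because timeline rows are denormalized at write time (each row already carries what the feed needs, keyed by audience), the read can be a single inner join between the reader's audience memberships and the timeline table. A SQLite sketch with hypothetical table and column names; the thread does not show Letterbook's actual time-series SQL.

```python
import sqlite3

SCHEMA = """
CREATE TABLE audience_members (audience_id TEXT, profile_id TEXT,
                               PRIMARY KEY (audience_id, profile_id));
CREATE INDEX members_by_profile ON audience_members (profile_id);
CREATE TABLE timelines (audience_id TEXT, post_id INTEGER, time INTEGER,
                        author TEXT, preview TEXT);  -- denormalized copy per audience
CREATE INDEX timelines_by_audience_time ON timelines (audience_id, time);
"""

def timeline(conn, profile_id, before, limit):
    # One inner join: memberships -> timeline rows; no joins to post or profile tables.
    return conn.execute("""
        SELECT DISTINCT t.post_id, t.time, t.author, t.preview
        FROM audience_members m
        JOIN timelines t ON t.audience_id = m.audience_id
        WHERE m.profile_id = ? AND t.time <= ?
        ORDER BY t.time DESC
        LIMIT ?
    """, (profile_id, before, limit)).fetchall()
```

The gather still grows with the number of audiences a reader belongs to, which is the trade-off raised above; the denormalization only keeps it to one join.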

Erin 💽✨

@natty the entire thing is

marius

@jenniferplusplus I think you're falling prey to corpo-think where the desired outcome for your service is an eternally increasing number of users.

In my opinion that's a wrong way to design services for communities of the fediverse.

If you limit the number then you need to care less about the inactive users because you don't need to worry about eternally increasing compute time.

@erincandescent

Erin 💽✨

@mariusor @jenniferplusplus even on small instances you need to think about scaling because social media is just an enormous amount of traffic; people way underestimate this

Jenniferplusplus

@erincandescent @mariusor
This.

But even if that didn't have an effect, supporting large instances is an explicit goal for the project. There's like 5 billion people who use social media, and approximately all of them use "corpo" services, which I think we generally agree is bad. So if that is ever to change, even community services will need to operate at internet scale. The vast majority of people cannot and will not operate their own communication infrastructure.

marius

@jenniferplusplus good luck with that (not in a dismissive way) but building a billion-user software stack as one person is pure hubris. (I'll be glad to eat my words though)

marius

@erincandescent

I think that network traffic should not equate 1-to-1 with the compute power needed to build a timeline for a user, because the two tasks should be somewhat independent: ingestion of activities (the traffic that you mentioned) -> server; building the timeline (from normalized data) -> the client through which the user interacts with the server.

@jenniferplusplus

Jenniferplusplus

@mariusor I hope and intend that this will not be a 1 woman operation forever. But I also don't expect that the social web would become a monoculture. I really just mean that some very large nodes must necessarily exist, and I plan for letterbook to support that.