Periodically I see the absolute monster that is the Akkoma timeline query
-
Gabbo the wafrn guy :neocat_floof_devil_256: (not a vampire) replied to Gabbo the wafrn guy :neocat_floof_devil_256: (not a vampire) last edited by
@erincandescent mine used to be worse btw. lots of joins and stuff
-
Erin 💽✨ replied to Gabbo the wafrn guy :neocat_floof_devil_256: (not a vampire) last edited by
@gabboman I think the best approach would just be to create an inbox table for each user where each row contained
(user_id, activity_id, object_id, reasons_post_might_be_hidden_maybe)
that it could just slam down and then JOIN in the activity/object data it needs for the frontend
-
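The inbox-table idea Erin describes above could look roughly like the following. This is a minimal sketch using Python's `sqlite3`, not Akkoma's schema: the `activities` and `objects` tables, their columns, and the function names are hypothetical, and it assumes activity IDs increase over time so they can stand in for a sort key.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical inbox table."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE activities (id INTEGER PRIMARY KEY, type TEXT, actor TEXT);
        CREATE TABLE objects (id INTEGER PRIMARY KEY, content TEXT);
        -- One row per (recipient, activity), with the columns named in the post.
        CREATE TABLE inbox (
            user_id INTEGER NOT NULL,
            activity_id INTEGER NOT NULL REFERENCES activities(id),
            object_id INTEGER NOT NULL REFERENCES objects(id),
            reasons_post_might_be_hidden_maybe TEXT,  -- NULL: nothing hides it
            PRIMARY KEY (user_id, activity_id)
        );
    """)
    db.executemany(
        "INSERT INTO activities VALUES (?, ?, ?)",
        [(1, "Create", "alice"), (2, "Create", "bob"), (3, "Announce", "carol")],
    )
    db.executemany(
        "INSERT INTO objects VALUES (?, ?)",
        [(10, "hello"), (11, "world"), (12, "boosted")],
    )
    db.executemany(
        "INSERT INTO inbox VALUES (?, ?, ?, ?)",
        [(100, 1, 10, None), (100, 2, 11, "muted"), (100, 3, 12, None),
         (200, 2, 11, None)],
    )
    return db


def timeline(db: sqlite3.Connection, user_id: int, limit: int = 20):
    """Scan one user's inbox newest-first, then JOIN in the display data."""
    return db.execute(
        """
        SELECT a.id, a.actor, o.content
        FROM inbox AS i
        JOIN activities AS a ON a.id = i.activity_id
        JOIN objects AS o ON o.id = i.object_id
        WHERE i.user_id = ?
          AND i.reasons_post_might_be_hidden_maybe IS NULL
        ORDER BY i.activity_id DESC
        LIMIT ?
        """,
        (user_id, limit),
    ).fetchall()
```

Because the primary key leads with `user_id`, the read touches only that user's rows; the two joins exist purely to fetch what the frontend displays.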
Jenniferplusplus replied to Erin 💽✨ last edited by
@erincandescent
me: that's not SO bad
me: wait, there's like 4 tables referenced in that SELECT statement, where are those coming from?
me: *sees read more link*
me: *clicks*
me: *gasps*
-
Erin 💽✨ replied to Jenniferplusplus last edited by
@jenniferplusplus I'm sure once upon a time this was an entirely reasonable query
…
It is no longer an entirely reasonable query
-
Jenniferplusplus replied to Erin 💽✨ last edited by
@erincandescent
FWIW, I'm actually pretty convinced that timelines are an obvious use case for a timeseries database, and based on that, Letterbook can partially pre-compute them, with fast insertion and queries. I don't have the actual SQL this generates handy, but it closely resembles the source LINQ expression:
(from post in _feeds.Timelines
    where post.Time <= before
    where post.AudienceId in audienceKeys
    where includeShared != (post.SharedBy == null)
    select post)
    .Take(limit)
    .Distinct();
-
Jenniferplusplus replied to Jenniferplusplus last edited by
@erincandescent At some point, I'll likely need to complicate this a little bit, because that where ... in is less performant than a join, so I'll need to find a break point where it makes more sense to create a temporary join table rather than include a big list of keys. but still
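The break-point idea could be sketched as follows, using Python's `sqlite3` rather than Letterbook's actual stack. Everything here is an assumption for illustration: the `timelines` table and its columns, the `tmp_audience_keys` temp table, and the threshold of 100, which would have to be found by measurement.

```python
import sqlite3

# Hypothetical threshold; the real break point would come from benchmarking.
IN_LIST_BREAK_POINT = 100


def query_timeline(db, audience_keys, before, limit,
                   break_point=IN_LIST_BREAK_POINT):
    """Use an inline IN list for few keys and a temp join table for many."""
    keys = list(audience_keys)
    if not keys:
        return []

    if len(keys) <= break_point:
        # Small key set: bind each key as its own parameter.
        placeholders = ",".join("?" * len(keys))
        sql = f"""
            SELECT id, audience_id, time FROM timelines
            WHERE time <= ? AND audience_id IN ({placeholders})
            ORDER BY time DESC, id DESC
            LIMIT ?
        """
        return db.execute(sql, [before, *keys, limit]).fetchall()

    # Large key set: load the keys into a temp table and join against it.
    with db:
        db.execute(
            "CREATE TEMP TABLE IF NOT EXISTS tmp_audience_keys "
            "(audience_id TEXT PRIMARY KEY)"
        )
        db.execute("DELETE FROM tmp_audience_keys")
        db.executemany(
            "INSERT OR IGNORE INTO tmp_audience_keys VALUES (?)",
            [(k,) for k in keys],
        )
    return db.execute(
        """
        SELECT t.id, t.audience_id, t.time
        FROM timelines AS t
        JOIN tmp_audience_keys AS k ON k.audience_id = t.audience_id
        WHERE t.time <= ?
        ORDER BY t.time DESC, t.id DESC
        LIMIT ?
        """,
        (before, limit),
    ).fetchall()
```

Both paths return the same rows; only the way the key set reaches the database differs, which is what makes the switch safe to tune later.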
-
Erin 💽✨ replied to Jenniferplusplus last edited by
@jenniferplusplus I kinda think you want to precompute each user their own timeline because otherwise you will just end up dealing with so much complexity etc for blocking
(obviously this does mean that you e.g. need to garbage collect entries when someone blocks someone else)
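The garbage-collection step for precomputed timelines might look something like this. It is only a sketch with Python's `sqlite3`: the `inbox` table, its denormalized `author_id` column, and the choice to purge in both directions are assumptions, not something stated in the thread.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a hypothetical precomputed-timeline table."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE inbox (
            user_id TEXT NOT NULL,    -- whose precomputed timeline this row is in
            author_id TEXT NOT NULL,  -- hypothetical denormalized author column
            post_id TEXT NOT NULL,
            PRIMARY KEY (user_id, post_id)
        );
        -- Lets the purge below find rows without scanning a whole inbox.
        CREATE INDEX inbox_by_author ON inbox (user_id, author_id);
    """)
    return db


def purge_after_block(db: sqlite3.Connection, blocker: str, blocked: str) -> int:
    """Delete already-delivered entries between the two accounts.

    Assumption: a block hides posts in both directions, so both inboxes
    are cleaned. Returns the number of rows removed.
    """
    with db:  # commit on success, roll back on error
        cur = db.execute(
            """
            DELETE FROM inbox
            WHERE (user_id = ? AND author_id = ?)
               OR (user_id = ? AND author_id = ?)
            """,
            (blocker, blocked, blocked, blocker),
        )
    return cur.rowcount
```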
-
Jenniferplusplus replied to Erin 💽✨ last edited by
@erincandescent that's handled via the audience mechanism, so blocking someone removes them from your followers audience, and thus their next timeline query will exclude your posts.
The audience memberships are queried separately, prior to this timeline query.
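As a rough in-memory illustration of that two-step flow (memberships first, then the timeline filter), here is a sketch in Python. It is not Letterbook's code: the `Feeds` class, the `followers:<author>` key format, and the one-row-per-audience layout are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimelineRow:
    time: int
    post_id: str
    audience_id: str


class Feeds:
    """Hypothetical audience-keyed timeline store."""

    def __init__(self):
        self.memberships = {}  # audience_id -> set of member ids
        self.rows = []         # denormalized timeline rows

    def follow(self, follower, author):
        self.memberships.setdefault(f"followers:{author}", set()).add(follower)

    def block(self, blocker, blocked):
        # Blocking removes the blocked account from the blocker's audience.
        self.memberships.get(f"followers:{blocker}", set()).discard(blocked)

    def publish(self, time, post_id, author):
        # Assumed layout: one row per audience, however many followers it has.
        self.rows.append(TimelineRow(time, post_id, f"followers:{author}"))

    def audience_keys(self, member):
        # Step 1: look up which audiences this member belongs to.
        return {a for a, members in self.memberships.items() if member in members}

    def timeline(self, member, before, limit):
        # Step 2: filter timeline rows by those audience keys.
        keys = self.audience_keys(member)
        matches = [r for r in self.rows
                   if r.time <= before and r.audience_id in keys]
        matches.sort(key=lambda r: r.time, reverse=True)
        return [r.post_id for r in matches[:limit]]
```

In this sketch a block needs no cleanup of stored rows: the blocked account simply stops holding the audience key, so its next query no longer matches the blocker's posts.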
-
Erin 💽✨ replied to Jenniferplusplus last edited by
@jenniferplusplus yes, but this layout means (structurally) you're scanning everyone you follow's outbox, which becomes an increasingly big gather op as someone follows more people
Alternatively you can do the fan-out at delivery time and then it just becomes one dense index scan
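To make the contrast concrete, here is a small in-memory Python sketch of both read strategies. The `Store` class and its method names are hypothetical, and it assumes posts are published in ascending time order so each list stays sorted.

```python
import heapq
from collections import defaultdict
from itertools import islice


class Store:
    """Hypothetical store that supports both timeline strategies."""

    def __init__(self):
        self.following = defaultdict(set)  # reader -> authors they follow
        self.followers = defaultdict(set)  # author -> readers following them
        self.outbox = defaultdict(list)    # author -> [(time, post_id)], ascending
        self.inbox = defaultdict(list)     # reader -> [(time, post_id)], ascending

    def follow(self, reader, author):
        self.following[reader].add(author)
        self.followers[author].add(reader)

    def publish(self, author, time, post_id):
        self.outbox[author].append((time, post_id))
        # Fan-out at delivery time: one write per current follower.
        for reader in self.followers[author]:
            self.inbox[reader].append((time, post_id))

    def timeline_gather(self, reader, limit):
        # Read-time gather: merge one stream per followed author, so the
        # work grows with the number of accounts the reader follows.
        streams = [reversed(self.outbox[a]) for a in self.following[reader]]
        merged = heapq.merge(*streams, reverse=True)
        return [post_id for _, post_id in islice(merged, limit)]

    def timeline_fanout(self, reader, limit):
        # Precomputed inbox: a single contiguous scan from the newest entry.
        newest_first = reversed(self.inbox[reader])
        return [post_id for _, post_id in islice(newest_first, limit)]
```

The trade-off is visible in `publish`: fan-out moves the per-follower cost to write time, while the gather pays a per-followee cost on every read.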
-
Jenniferplusplus replied to Erin 💽✨ last edited by
@erincandescent the tsdb that powers timelines isn't the canonical source of truth for the app. It only handles timelines and notifications, and everything gets denormalized at write time. Single origin scenarios like populating an outbox use the canonical db.
-
Jenniferplusplus replied to Jenniferplusplus last edited by
@erincandescent There are a few reasons I don't want to precompute explicit individual timelines for each user. A big one is that I want inactive users to cost as close to zero as possible. So with the exception of cases where the inactive user is the only follower of some remote actor, zero compute is spent on their timelines.
-
λ Natty :butterfly_::neofox_lesbian: replied to Erin 💽✨ last edited by
@[email protected] scared to ask where's the part that populates it with all the data the frontend needs
-
Jenniferplusplus replied to Jenniferplusplus last edited by
@erincandescent You're right that following more people makes for larger gathers. But it's still just a single inner join at query time, because of the denormalization.
-
Erin 💽✨ replied to λ Natty :butterfly_::neofox_lesbian: last edited by
@natty the entire thing is
-
@jenniferplusplus I think you're falling prey to corpo-think where the desired outcome for your service is an eternally increasing number of users.
In my opinion that's a wrong way to design services for communities of the fediverse.
If you limit the number then you need to care less about the inactive users because you don't need to worry about eternally increasing compute time.
-
@mariusor @jenniferplusplus even on small instances you need to think about scaling because social media is just an enormous amount of traffic; people way underestimate this
-
Jenniferplusplus replied to Erin 💽✨ last edited by
@erincandescent @mariusor
This. But even if that didn't have an effect, supporting large instances is an explicit goal for the project. There's like 5 billion people who use social media, and approximately all of them use "corpo" services, which I think we generally agree is bad. So if that is ever to change, even community services will need to operate at internet scale. The vast majority of people cannot and will not operate their own communication infrastructure.
-
@jenniferplusplus good luck with that (not in a dismissive way) but building a billion users software stack as one person is pure hubris. (I'll be glad to eat my words though)
-
I think that network traffic should not equate 1-to-1 with the compute power to build a timeline for a user because the two tasks should be somewhat independent: ingestion of activities (the traffic that you mentioned) -> server, building timeline (based on normalized data) -> the client through which the user interacts with the server.
-
@mariusor I hope and intend that this will not be a 1 woman operation forever. But I also don't expect that the social web would become a monoculture. I really just mean that some very large nodes must necessarily exist, and I plan for letterbook to support that.