So does anyone have a good guide on what I am actually looking at in the Sharkey dashboard?
-
So does anyone have a good guide on what I am actually looking at in the Sharkey dashboard? I'm assuming all the red numbers getting bigger over time mean something, but I can't find anything anywhere about it.
-
Amber ๐ธ replied to Aurora 🏳️‍🌈 last edited by
@[email protected] show me what you mean
-
Aurora 🏳️‍🌈 replied to Amber ๐ธ last edited by
@[email protected] The Delayed numbers on the Inbox queue and the Deliver queue keep going up slowly over time, but idk if that means something is wrong or if my server is just a little slow or something.
-
Amber ๐ธ replied to Aurora 🏳️‍🌈 last edited by [email protected]
@[email protected] That's complicated. It depends on the context, really. 1,000 delayed jobs? For a single-user instance that's horrifying. For transfem.social that's a bit on the lower end, because we have such wide federation that there's always a ton of small to big servers offline, which prevents deliver jobs from being sent out or inbox jobs from processing. It also depends on how quickly it is rising: in the case of our DoS I noticed it rising pretty damn fast, up to 80,000 delayed jobs within the span of an hour. That's pretty fucking scary even with as much traffic as we have.
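(Aside for people skimming this thread: those dashboard numbers are job counts from the Redis-backed BullMQ queues Sharkey uses. A minimal sketch of reading the same counts yourself; the queue names and connection settings here are assumptions, so check them against your own config.)

```ts
import { Queue } from 'bullmq';

// Assumed defaults: local Redis on the standard port, stock queue names.
const connection = { host: '127.0.0.1', port: 6379 };

for (const name of ['inbox', 'deliver']) {
  const queue = new Queue(name, { connection });
  // The same buckets the dashboard charts: waiting, active, delayed, failed.
  console.log(name, await queue.getJobCounts('waiting', 'active', 'delayed', 'failed'));
  await queue.close();
}
```

(Something like `npx tsx queue-counts.ts` will run it, pointed at the same Redis your Sharkey uses.)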
-
Aurora 🏳️‍🌈 replied to Amber ๐ธ last edited by
@[email protected] Could it be because we did our huge migration and some of that stuff is still settling out? It seems like that was the biggest traffic we got by far.
I'm not sure what the bottleneck would be here; my CPU is running at ~10% and I've got plenty of RAM.
I was running the server off wifi until recently, maybe that contributed to the issue?
It's not rising very much now, maybe 100 added today.
-
Amber ๐ธ replied to Aurora 🏳️‍🌈 last edited by
@[email protected] I'd have to see your queues.
-
Aurora 🏳️‍🌈 replied to Amber ๐ธ last edited by
@[email protected] I posted the one image above, do you need a screenshot of something else? I'm not sure what's sensitive to send openly on Fedi tbh, but I doubt analytics are.
-
Amber ๐ธ replied to Aurora 🏳️‍🌈 last edited by
@[email protected] That's perfectly fine, just got the edit. That's absolutely normal.
-
Aurora 🏳️‍🌈 replied to Amber ๐ธ last edited by
@[email protected] Oh okay, great. It does seem to be going down now too, only ~600 delayed as of a couple of minutes ago.
-
Amber ๐ธ replied to Aurora 🏳️‍🌈 last edited by
@[email protected] the job queues are hard to wrap your head around because it's a bunch of independent variables.
1) Worker configuration (yes)
2) Database latency (yes, including concurrent connections, because too many connections overloads the database and it responds more slowly)
3) Other servers (outbound jobs build up when delivery fails, so even if everything is good on your side, a remote instance might be unable to accept that job and it'll be queued to retry later; see the sketch after this list)
4) Again, remote instances: if you have an activity that's a reply to another note, the instance wants to fetch the original note, and if it can't then the job will be delayed. So instance A might be fine, but instance B, which made the note that a user on instance A replied to, is stopping the job from being processed
5) CPU, although that's the least likely. Jobs can be gridlocked by other instance tasks such as the media proxy, which does some on-the-fly server-side work to process images. It's how the second DoS attack on transfem.social worked: they spammed our media proxy, forcing the instance to process the same image over and over again, which sent 8 CPU cores to 100%. There is no Sharkey-side caching for that, so to account for it before the patches we made HAProxy cache the media proxy responses so the conversion wasn't done hundreds of times per second for the same image.
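(To make point 3 concrete: the Delayed column is mostly retry backoff. Below is a hedged sketch of how a deliver-style job lands in the delayed set when a remote inbox is down; the attempt count, backoff timing, and payload shape are invented for illustration and are not Sharkey's actual settings, even though its queues are BullMQ under the hood.)

```ts
import { Queue, Worker } from 'bullmq';

const connection = { host: '127.0.0.1', port: 6379 };
const deliver = new Queue('deliver', { connection });

// Enqueue an outbound activity. If the worker throws (remote inbox offline),
// BullMQ reschedules the job with backoff, and that rescheduled copy is what
// the dashboard counts as "Delayed".
await deliver.add(
  'deliver',
  { inboxUrl: 'https://remote.example/inbox', activity: {} }, // illustrative payload
  {
    attempts: 8,                                     // several chances for flaky remotes
    backoff: { type: 'exponential', delay: 60_000 }, // 1 min, 2 min, 4 min, ...
    removeOnComplete: true,
  },
);

// A worker that fails while the remote is unreachable, which triggers the retry path.
new Worker(
  'deliver',
  async (job) => {
    const res = await fetch(job.data.inboxUrl, {
      method: 'POST',
      body: JSON.stringify(job.data.activity),
    });
    if (!res.ok) throw new Error(`delivery failed: ${res.status}`);
  },
  { connection },
);
```
-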
@[email protected] Source? I'm the head administrator (and one of the main sysadmins) of transfem.social and a Sharkey developer/contributor (although I mostly serve an administrative role)
-
@[email protected] (just for people seeing this thread in passing lol)
-
Aurora 🏳️‍🌈 replied to Amber ๐ธ last edited by
@[email protected] Yeah I'm not worried about any source for you lmao. You could tell me the delay is because I don't have enough blahaj in proximity to the server and I would probably try it.
-
@[email protected] oh also, the API itself can gridlock the instance but you don't see that until you're this size. If we ran Sharkey in the unified configuration (default, db workers and frontend+API in the same process) it'd have imploded in on itself.
-
ash nova :neocat_flag_genderfluid: replied to Aurora 🏳️‍🌈 last edited by
@[email protected] @[email protected] hehe that sounds like good advice actually, make sure the server is comfy
-
Aurora 🏳️‍🌈 replied to ash nova :neocat_flag_genderfluid: last edited by
@[email protected] @[email protected] Server is very comfy don't worry, it has its own blanket keeping it nice and warm. I even activated the RGB for an extra 10% RAM just in case.
-
Amber ๐ธ replied to Amber ๐ธ last edited by [email protected]
@[email protected] and that gridlocking isn't bypassable by throwing more CPUs at it, I already tried that. You have to put on a big girl face and set the variables MK_SERVER_ONLY (iirc? I'd have to double check) and MK_WORKER_ONLY.
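(Rough illustration of the split, not Sharkey's actual entry point, and the env var names are exactly as unverified as stated above: one process answers HTTP, the other only drains queues.)

```ts
// Hypothetical stand-in for the real boot code: the flag names follow the
// thread above and may be wrong; the two start functions are stubs.
async function startWebServer(): Promise<void> {
  console.log('booting frontend + API + /streaming websockets');
}

async function startQueueWorkers(): Promise<void> {
  console.log('booting BullMQ workers for the inbox/deliver/db queues');
}

const serverOnly = process.env.MK_SERVER_ONLY === '1';
const workerOnly = process.env.MK_WORKER_ONLY === '1';

if (!workerOnly) await startWebServer();    // MK_WORKER_ONLY=1 skips the web half
if (!serverOnly) await startQueueWorkers(); // MK_SERVER_ONLY=1 skips the workers
```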
-
ash nova :neocat_flag_genderfluid: replied to Amber ๐ธ last edited by
@[email protected] @[email protected] incidentally mine is also split and a bit overscaled, but that's mostly because I can, not because I strictly need it
-
Amber ๐ธ replied to ash nova :neocat_flag_genderfluid: last edited by
@[email protected] @[email protected] there's another level where you use HAProxy to send websocket traffic to its own MK_SERVER_ONLY node based on /streaming & the HTTP/1.1 -> websocket upgrade negotiation. This isn't possible with Sharkey config alone, but you can do it with middleware. There's also doing the same thing but matching on the "Accept: application/ld+json" header (and other content types like activity+json) to route federation traffic to its own node... (rough sketch below)
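(Amber's version of this lives in HAProxy ACLs; the sketch below is a Node/TypeScript stand-in just to show the two matching rules, with made-up ports and the npm http-proxy package playing the role of the load balancer.)

```ts
import http from 'node:http';
import httpProxy from 'http-proxy'; // npm: http-proxy (+ @types/http-proxy)

// Made-up backends: in the HAProxy setup these would be separate MK_SERVER_ONLY nodes.
const MAIN = { target: 'http://127.0.0.1:3000' };
const STREAMING = { target: 'http://127.0.0.1:3001' };
const FEDERATION = { target: 'http://127.0.0.1:3002' };

const proxy = httpProxy.createProxyServer({});

const server = http.createServer((req, res) => {
  // ActivityPub fetches negotiate Accept: application/activity+json or ld+json,
  // so they can be peeled off to a dedicated federation node.
  const accept = req.headers.accept ?? '';
  const backend = /application\/(activity|ld)\+json/.test(accept) ? FEDERATION : MAIN;
  proxy.web(req, res, backend);
});

// HTTP/1.1 Upgrade -> websocket negotiation on /streaming goes to its own node.
server.on('upgrade', (req, socket, head) => {
  const backend = (req.url ?? '').startsWith('/streaming') ? STREAMING : MAIN;
  proxy.ws(req, socket, head, backend);
});

server.listen(8080);
```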
-
Gwen, the kween fops :neofox_flag_trans: :sheher: replied to Aurora 🏳️‍🌈 last edited by
@[email protected] @[email protected] @[email protected] I can't get over putting a blanket on your server to keep it warm lmaaaaao