I'm a little surprised I can't find more on the mathematics of #SRE.
-
I'm a little surprised I can't find more on the mathematics of #SRE. Things like analysis of service models, the QPS required for a valid SLO, applications of the Pareto distribution to decide on the size of a fault, etc.
Is there some deep resource I'm missing here? I've found a few articles that discuss it superficially, but nothing that goes into significant depth.
-
@hrefna I've been thinking about this for a long time. We need a lot more practical manuals in our field. We've got a lot of introductory material and reference material. What's missing is technical manuals that describe a specific thing and how to do it.
-
@hrefna
Maybe texts on reliability more generally? There is definitely a history of work here. It had become more specialized with the introduction of component-level and link-level correction. This definitely feels like something ripe for a textbook treatment at this point. It's sort of a mix of statistics, signal processing, and software architecture, so far as source fields go. Maybe some diffeq for the queueing sections, if they're needed.
-
@smolwaffle Yeah, I have a lot of texts and papers that tie in peripherally.
There's systems reliability like the kind you analyze for physical engineering projects or safety, but they don't quite work the same way (on an online basis).
There's DSP, of course, which has a lot of tie-ins.
There's queuing theory and there's the background in both statistics and probabilistic modeling (things like MCMC).
But no equivalent to the Concrete Math textbook and no cross-disc synthesis I can find.
-
@smolwaffle There is a fair bit of research into _reliability_ generally, but usually not drawing a directed line to SRE-style systems that I can find.
-
@hrefna check out:
- https://www.heinrichhartmann.com/srecon-emea-2023/ (Statistics for Engineers)
- https://www.heinrichhartmann.com/sampling/ (Sampling error calculator)
-
@bocytko Appreciate the links! I am looking for something about 5 levels above that, however, if you know of anything?
I have a firm background in statistics and modeling, I can and have applied MCMC to these domains previously, and I have nontrivial coursework in DSP. So I'm looking for something beyond descriptive statistics and calculating SLO burn.
More like: "given these parameters what is the minimum QPS for this SLO to make sense? What kind of false positive rate can we expect?" etc.
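To make that kind of question concrete, here is a rough sketch of one way it could be framed. It is not taken from any of the resources mentioned in this thread. It assumes requests fail independently (a binomial model), that an alert fires whenever the observed error fraction in a window exceeds the error budget, and that the service's true "healthy" error rate is known. The names `false_positive_rate`, `min_requests`, `error_budget`, `healthy_rate`, and `alpha` are all hypothetical, and the independence assumption is usually optimistic for real traffic.

```python
import math

def binom_pmf(n, i, p):
    """Binomial pmf, computed in log space so large n does not overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
               + i * math.log(p) + (n - i) * math.log1p(-p))
    return math.exp(log_pmf)

def false_positive_rate(n, error_budget, healthy_rate):
    """P(observed error fraction > error_budget) when the true rate is healthy_rate.

    Assumption: n independent requests per window, alert on any breach.
    """
    # Largest error count that still does NOT trigger the alert.
    k = math.floor(n * error_budget + 1e-9)
    cdf = sum(binom_pmf(n, i, healthy_rate) for i in range(k + 1))
    return max(0.0, 1.0 - cdf)

def min_requests(error_budget, healthy_rate, alpha, max_k=10_000):
    """Smallest scanned window size n whose false positive rate is <= alpha.

    Only window sizes where n * error_budget is a whole number of allowed
    errors are scanned; returns None if nothing up to max_k qualifies.
    """
    for k in range(1, max_k + 1):
        n = round(k / error_budget)
        if false_positive_rate(n, error_budget, healthy_rate) <= alpha:
            return n
    return None

# Example: 99.9% SLO (budget 0.001), true error rate at half the budget,
# at most a 1% chance of a spurious alert per 5-minute window.
n = min_requests(error_budget=0.001, healthy_rate=0.0005, alpha=0.01)
window_seconds = 300
print(f"requests per window: {n}, implied QPS: {n / window_seconds:.1f}")
```

Under these assumptions the false positive rate is driven by the number of requests in the window, so the minimum QPS falls out of dividing by the window length. Correlated failures, bursty traffic, or multi-window burn-rate alerts would all need a different model.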
-
@hrefna
Yeah, the systems-level stuff is just for older unreliable computers. The modern work is generally lower level AFAIK, and I did years of research in the area of reliability for novel microarchitectures. Of course, I was also intentionally focused on the lower-level stuff. Lots of older techniques get rediscovered and touted as new. It's very easy to miss the prior work when it's from the 50s and both poorly cited and poorly indexed.
-
@smolwaffle In that genre I am finding some minimally entertaining bits of history like this one: https://ntrs.nasa.gov/api/citations/19730003768/downloads/19730003768.pdf
-