500 restarts of the kube-apiserver in the span of 5 hours definitely explains the intermittent breakage in the CI system this morning.
-
500 restarts of the kube-apiserver in the span of 5 hours definitely explains the intermittent breakage in the CI system this morning.
-
Johannes Kastlreplied to Scott Williams 🐧 last edited by
@vwbusguy Ouch.
-
Scott Williams 🐧replied to Johannes Kastl last edited by [email protected]
@johanneskastl It was one control plane node in need of cert rotation. Easy to fix, but just kept barely sneaking under the radar of alertmanager and the reprovisioner, which is the worst. I spent all morning looking at Java stack traces thinking it was a client side issue with Jenkins because other stuff kept running. Jenkins wasn't the problem - just got stuck with the unfortunate job of being the messenger where it shouldn't have had to be.
-
Scott Williams 🐧replied to Scott Williams 🐧 last edited by
@johanneskastl It's not a total wasted effort though. In the process of debugging, I was also able to disable a bunch of Jenkins plugins we're no longer using, so at least that will ever so slightly lighten the burden of technical debt.