[14/01/2014] Outage in sf-1 server


  • GNU/Linux Admin

    Status: Resolved


    Please be advised that NodeBB instances hosted on the sf-1 server are currently offline.

    We are investigating the root cause of the problem, and will restore services as soon as possible.

    Update (07:26 EDT): Services have been temporarily restored for the NodeBBs hosted on sf-1. Instance administration via NodeBB.org has been suspended. Root cause investigation is underway.

    Update (07:50 EDT): The source of the problem has been identified, and a fix is being applied.

    Update (08:25 EDT): Client software has been patched and administration via NodeBB.org has been re-activated.


    Post Mortem (09:38 EDT)

    A recent update to our backend services triggered a restart loop, which made the server consume progressively more memory as it attempted to maintain instance uptime.

    This loop was triggered because we did not account for client Redis databases being switched offline when a NodeBB instance was toggled off.

    Because the zombie processes were never cleaned up, the server eventually ran out of memory and became unresponsive.
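
    As an illustration only, and not the fix the NodeBB team applied, the restart loop described above can be avoided with a bounded restart policy: skip restarts for instances that were deliberately toggled off, cap the number of attempts, and back off between them. The names `RestartPolicy`, `next_delay`, and `toggled_off` are hypothetical and chosen for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RestartPolicy:
    """Hypothetical policy: cap restart attempts and back off between them."""

    max_restarts: int = 5     # give up after this many attempts
    base_delay: float = 1.0   # seconds to wait before the first retry
    max_delay: float = 60.0   # upper bound on any single wait

    def next_delay(self, attempt: int, toggled_off: bool) -> Optional[float]:
        """Return the seconds to wait before restart number `attempt`
        (0-based), or None if the instance should be left stopped."""
        if toggled_off:
            # The instance was switched off on purpose, so its database may
            # be offline too; a restart would only fail again.
            return None
        if attempt >= self.max_restarts:
            # Stop retrying instead of looping forever.
            return None
        # Exponential backoff: 1s, 2s, 4s, ... capped at max_delay.
        return min(self.base_delay * 2 ** attempt, self.max_delay)
```

    A supervisor using such a policy would sleep for the returned delay before each restart and stop (and alert an operator) when it receives `None`, rather than respawning immediately.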

    The following changes have been or will be made:

    • The NodeBB team will be re-evaluating our handling of NodeBB instances as they are toggled on and off.
    • We will now maintain an outages & alerts forum for these notices.
    • We will be applying a credit to affected instances for the time that services were offline.

    We apologize for the inconvenience this has caused!

    The NodeBB Team



  • Best to find these bugs in the early stages, when things are still quiet. You do not need to apply any credit to my account; I understand this is still in beta, so I expect things to be a bit wonky at times. I must admit I did go a bit white this morning when it said my instance was not there anymore 😞


  • GNU/Linux Admin

    No worries @StuartH -- glad to have your business, even if it's as rocky as it is!

