Unplanned outage

Incident Report for OfficeRnD

Resolved

What Happened
On 23 March 2018, we experienced an unplanned outage of the platform.
It was caused by a combination of factors: increased usage of the platform and a system failure during the daily backup process. Together, these events depleted the memory of one of our servers, which resulted in the outage.

Corrective Measures
To ensure minimal disruption and immediate incident resolution, we took the following measures:
* We've extended our system monitoring and outage notification systems to better detect downtime and to immediately alert the incident response team.
* We've optimized our backend systems to minimize data and processing load over the server infrastructure.
* We've adjusted the backup procedure schedule to avoid the periods of heavier system usage.
* We've outlined the infrastructure and process enhancements that should be added to the OfficeRnD backend implementation to significantly improve system resilience and availability. We will be working on those over the coming months and releases.

Posted Mar 23, 2018 - 14:00 EET