On 6 September 2018, we experienced an unplanned outage of the platform lasting roughly 30 minutes.
It was caused by a combination of factors: increased usage of the platform and memory exhaustion on one of our servers.
The platform was up and running again in about half an hour.
After analyzing the cause of the issue, we have planned the following improvements to prevent similar incidents:
* Improve the platform's server infrastructure and monitoring so that it is more resilient and reports problems in greater detail
* Ensure heavy operations are deferred and executed in smaller batches