SaaS disruption
Incident Report for TOPdesk SaaS Status page
Postmortem

Timeline
- 11:20 TOPdesk Support received phone calls about TOPdesk being slow or unreachable.
- 11:21 The SaaS team found an issue with the authentication service.
- 11:25 The SaaS team mitigated the issue by scaling up parts of the authentication service.

Root cause
Due to high load on NL3 and a lack of available connections in the authentication service, new connections to various parts of the service became very unreliable. The system attempted automated restarts of parts of the service. These restarts succeeded, but the restarted parts could not immediately be used for new connections. The parts that did not come back up left the rest of the system overloaded, which caused the problems on the user side.

Points of improvement
We want to be alerted when critical parts crash or when health/readiness endpoints fail. The investigation afterwards showed that the authentication service monitoring had already registered problems earlier that morning, but automated restarts fixed these problems before the monitoring surfaced them. An internal incident has been created to adjust the monitoring.
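
As an illustration only, the kind of check described above could look roughly like the sketch below. It is not our actual monitoring: the class name, the thresholds, and the injected probe and alert functions are all assumptions made for the example. The idea it shows is counting failed readiness checks over a time window, so that a part which keeps failing and being restarted still triggers an alert even if it looks healthy in between.

```python
import urllib.error
import urllib.request
from collections import deque
from typing import Callable, Deque, Dict


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers with HTTP 2xx within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Connection errors, timeouts, HTTP errors and malformed URLs
        # all count as a failed check.
        return False


class ReadinessMonitor:
    """Hypothetical monitor: alerts on repeated failures inside a time window."""

    def __init__(
        self,
        probe: Callable[[str], bool],
        alert: Callable[[str], None],
        max_failures: int = 3,          # assumed threshold
        window_seconds: float = 600.0,  # assumed window
    ) -> None:
        self.probe = probe
        self.alert = alert
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._failures: Dict[str, Deque[float]] = {}

    def check(self, name: str, url: str, now: float) -> bool:
        """Probe one endpoint, record a failure if any, and alert if needed."""
        healthy = self.probe(url)
        failures = self._failures.setdefault(name, deque())

        # Forget failures that are older than the window.
        while failures and now - failures[0] > self.window_seconds:
            failures.popleft()

        if not healthy:
            failures.append(now)
            # Successful checks in between do not reset the count, so a part
            # that is repeatedly restarted still reaches the threshold.
            if len(failures) >= self.max_failures:
                self.alert(
                    f"{name}: {len(failures)} failed readiness checks "
                    f"in the last {self.window_seconds:.0f}s"
                )
        return healthy
```

In such a setup, a scheduler would call `check` periodically with `http_probe` as the probe, the current time as `now`, and a function that notifies the on-call team as `alert`. The endpoint URL for each part would be whatever readiness address that part exposes.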

Posted Nov 12, 2019 - 19:42 CET

Resolved
All TOPdesk environments have been available since 11:28 CEST.

If you still experience any problems while working in TOPdesk, please contact TOPdesk Support.
Posted Oct 07, 2019 - 12:32 CEST
Update
We are continuing to monitor for any further issues.
Posted Oct 07, 2019 - 11:35 CEST
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Oct 07, 2019 - 11:34 CEST
Investigating
We are currently investigating an issue at one of our SaaS hosting locations. As a result of this issue, your TOPdesk environment may not be available.
We are aware of the problem and working on a solution.

Our apologies for the inconvenience. We aim to update this status page at least every 30 minutes until the issue has been resolved.

If you are affected by this issue, please visit https://extranet.topdesk.com/tas/public/ssp/ to indicate you are affected.
Please refer to incident 19 10 1850.
Posted Oct 07, 2019 - 11:33 CEST
This incident affected: NL3 SaaS hosting location.