SaaS disruption
Incident Report for TOPdesk SaaS Status page
Postmortem

Timeline

On February 11th, 2020, between 20:44 CET and 21:13 CET, several environments in Canada and the United Kingdom were unavailable for several minutes.

On February 13th, 2020, between 16:00 CET and 16:37 CET, and again between 19:51 CET and 19:56 CET, several environments in the United States were unavailable for several minutes.

In both cases, no issues could be identified in the TOPdesk SaaS network, so we continued to investigate the issue together with our Content Delivery Network (CDN) provider.

On February 13th, 2020, at 21:20 CET, we identified that several connections (so-called Argo tunnels) between the CDN provider and our hosting locations had been in a critical state and had been rebuilt by existing tooling. Rebuilding several tunnels at the same time placed a high load on the remaining tunnels, so traffic did not reach its destination. As a result, customers found TOPdesk environments slow to respond or, in some cases, unreachable.

On February 14th, we updated the software used to manage the tunnels to a recently released version. Since deploying this update, we have not seen several tunnels being rebuilt at the same time. However, the underlying tunnel rebuild issue still occurs sporadically.

The CDN provider has not seen similar connection issues before and is working with us to investigate the problem. Because there have been no further cases of several tunnels failing at the same time, customers no longer experience connection issues.

Follow-up actions

We’ve improved our general documentation on troubleshooting connection issues and are updating the documentation on troubleshooting issues with the CDN provider connections.

Additional logging has been enabled on all proxy servers that receive connections from the CDN to help troubleshoot connection issues.

The software used to manage the connections with the CDN provider has been updated, which has reduced how often the problem occurs. We have seen no indication of connection issues affecting customers since this update, but we’re still working with the CDN provider to eliminate the issue altogether.

We’re working to improve our monitoring system to allow us to better pinpoint the root cause of connection issues in the future.

Posted Feb 18, 2020 - 13:53 CET

Resolved
The connection issues have been resolved.
Posted Feb 11, 2020 - 22:30 CET
Monitoring
The connectivity issues appear to be resolved. We are currently monitoring the situation.
Posted Feb 11, 2020 - 22:18 CET
Update
We are continuing to investigate the issue with our suppliers.
Posted Feb 11, 2020 - 22:08 CET
Investigating
We are currently investigating an issue at one of our SaaS hosting locations. As a result of this issue, your TOPdesk environment may not be available.
We are aware of the problem and working on a solution.

Our apologies for the inconvenience. We aim to update this status page at least every 30 minutes until the issue has been resolved.

If you are affected by this issue, please visit https://my.topdesk.com/tas/public/ssp/ to indicate you are affected.
Please refer to incident 20 02 3485.
Posted Feb 11, 2020 - 21:38 CET
This incident affected: CA1 SaaS hosting location.