SaaS disruption
Incident Report for TOPdesk SaaS Status page
Postmortem

Timeline

On February 11th, 2020, between 20:44 and 21:13 CET, several environments in Canada and the United Kingdom were unavailable for several minutes.

On February 13th, 2020, between 16:00 and 16:37 CET, and again between 19:51 and 19:56 CET, several environments in the United States were unavailable for several minutes.

In both cases, no issues could be identified in the TOPdesk SaaS network, so we continued the investigation together with our Content Delivery Network (CDN) provider.

At 21:20 on February 13th, 2020, we identified that several connections (so-called Argo tunnels) between the CDN provider and our hosting locations had entered a critical state and were rebuilt by existing tooling. Rebuilding several tunnels at the same time placed a high load on the remaining tunnels, so traffic did not reach its destination. As a result, customers found that their TOPdesk environments were slow to respond or, in some cases, unreachable.

On February 14th, we updated the software used to manage the tunnels to a recently released version. Since deploying this update, we have not seen several tunnels being rebuilt at the same time. However, tunnel rebuilds do still occur sporadically.

The CDN provider had not encountered similar connection issues before and is working with us to investigate the problem. Because there have been no further cases of several tunnels failing at the same time, customers are no longer experiencing connection issues.

Follow-up actions

We’ve improved our general documentation on troubleshooting connection issues and are working on updating our documentation on troubleshooting issues with the CDN provider connections.

Additional logging has been enabled on all proxy servers that receive connections from the CDN network to help troubleshoot connection issues.

The software used to manage the connections with the CDN provider has been updated, which has reduced how often the problem occurs. Since this update, we have seen no indication of connection issues affecting the customer experience, but we’re still working with the CDN provider to eliminate the issue altogether.

We’re improving our monitoring system so that we can pinpoint the root cause of future connection issues more precisely.

Posted Feb 18, 2020 - 13:53 CET

Resolved
Connections have been stable since 19:47 UTC.

We are now closing this major incident. A full root cause analysis will be posted in due course.
Posted Feb 13, 2020 - 22:04 CET
Update
The investigation with our Content Delivery Network provider is still ongoing.

Since 18:49 UTC, there has been only one small drop in connections, at 19:47 UTC. The situation has been stable since then.
Posted Feb 13, 2020 - 21:22 CET
Update
Further investigation indicates that connections from our Content Delivery Network to TOPdesk are being dropped, resulting in slow and/or unreachable environments.

We have seen these drops at 15:00, 17:38 and 18:49 UTC, each lasting 1 to 2 minutes.

The investigation is ongoing; updates will follow.
Posted Feb 13, 2020 - 20:37 CET
Investigating
We are currently investigating an issue at one of our SaaS hosting locations. As a result of this issue, your TOPdesk environment may be unavailable.
We are aware of the problem and are working on a solution.

Our apologies for the inconvenience. We aim to update this status page at least every 30 minutes until the issue has been resolved.

If you are affected by this issue, please visit https://my.topdesk.com/tas/public/ssp/ to indicate that you are affected.
Please refer to incident 20 02 4468.
Posted Feb 13, 2020 - 20:08 CET
This incident affected: US1 SaaS hosting location.