Disruption in NL3 hosting location
Incident Report for TOPdesk SaaS Status page
Postmortem

On June 30th at 11:22 (11:22 AM) CEST, the first customer reported performance issues while working in their TOPdesk environment. An hour later, several customers had reported performance issues, and the incident was escalated to investigate a common root cause.

The TOPdesk SaaS hosting team identified hardware issues as a possible cause and contacted the hosting provider to verify this theory. At 13:05 (01:05 PM), the hosting provider confirmed that issues with the router for the NL3 hosting location were causing the performance issues.

The problem was communicated to customers via our Status page and My TOPdesk. As there were previous indications the router was nearing its maximum capacity, a replacement router had already been ordered and was ready to be taken into production. Together with engineers at the hosting provider, we scheduled emergency maintenance to replace the router.

The router still had to be configured, so we decided to schedule the maintenance at the end of the next day, to allow the hosting provider to properly test the new configuration. Several measures were taken to reduce the load on the router in the meantime. We chose a maintenance time with minimal customer impact where engineers from both teams were available to troubleshoot any issues.

The router was replaced at 22:00 (10:00 PM) on Wednesday, July 1st. Engineers were available during a late shift on Wednesday and an early shift on Thursday to remediate any remaining issues, but no further issues were found or reported. During the router replacement, customers may have experienced connection issues for a few seconds.

Root cause

The performance issues in the NL3 datacenter were caused by a router that was at its maximum capacity. Replacing the router had already been scheduled, but had to be carried out sooner. The replacement router had already been ordered, but still had to be configured and taken into production.

Action points

The replacement router will not have any similar capacity issues in the foreseeable future. We’ve scheduled the creation of another hosting location in Europe to further reduce the load on the NL3 hosting location infrastructure.

We identified several possible improvements in the way our internal investigation incident was initially created and communicated. We’re scheduling refresher training for all teams that might create and publish major incidents in the future.

Posted Jul 08, 2020 - 14:44 CEST

Resolved
There were no reports of performance issues during the busiest hours on Wednesday. The replacement router is ready and will be installed at 22:00 (10:00 PM) CEST today.
Posted Jul 01, 2020 - 15:18 CEST
Update
We've identified the problem and started remediating actions.

Emergency maintenance to resolve the root cause will be scheduled for Wednesday evening. We do not expect the issue to reoccur before that time, and we'll continue to monitor the situation.
Posted Jun 30, 2020 - 16:22 CEST
Monitoring
The performance issues in the NL3 datacenter are caused by a router that is at its maximum capacity. Replacing the router was already scheduled, but will now be carried out sooner.

The replacement router was already ordered but still has to be configured. This is currently being executed, and we expect the router to be configured and tested tomorrow.
 
Wednesdays are typically less busy than Tuesdays, and our hosting team will do what they can to further reduce the load on the current router until it has been replaced. The load on the router has already decreased, and we do not expect today's slowness to reoccur tomorrow. Please contact TOPdesk Support if you still notice performance issues.

The router load will be monitored while we prepare to replace the router as soon as possible. Our current plan is to replace the router Wednesday evening. The exact time will be announced in a separate post on our status page.
Posted Jun 30, 2020 - 16:10 CEST
Update
The network issues are still ongoing.

We are still investigating this with our supplier.
Posted Jun 30, 2020 - 15:05 CEST
Update
We are continuing to investigate this issue.
Posted Jun 30, 2020 - 14:22 CEST
Investigating
Customers are experiencing slowness in their TOPdesk SaaS environment.

To inform TOPdesk that you are affected by this issue, please visit https://my.topdesk.com/tas/public/ssp/ and refer to incident 20 06 8592.
Posted Jun 30, 2020 - 14:22 CEST
This incident affected: NL3 SaaS hosting location.