RESOLVED: SaaS disruption NL3
Incident Report for TOPdesk SaaS Status page
Postmortem

On the morning of July 22nd, 2024, at approximately 08:25 CEST, we began receiving reports from customers that their TOPdesk environments were unavailable or unreachable. An initial check by our Support department indicated that all of these reports originated from customer environments hosted in the NL3 datacenter, specifically within containers 8 and 26.

Upon further investigation, our engineers suspected that the issues were caused by a misbehaving host machine. We promptly contacted our provider, Leaseweb, to confirm this suspicion. In the meantime, at approximately 09:15 CEST, all environments within containers 8 and 26 were failed over to backup infrastructure to mitigate the impact on our customers.

In collaboration with Leaseweb, the faulty host machine was placed into maintenance mode. At around 10:15 CEST, after careful consideration, we decided to move customers back to the main infrastructure in batches while closely monitoring the situation. The final batch was completed at approximately 15:30 CEST.

We are pleased to report that the system remained stable throughout the restoration process, and the disruption was resolved by 16:01 CEST. From the initial report to the completion of the restoration process, the disruption lasted approximately 7 hours.

Moving forward, we will continue to work closely with our provider, Leaseweb. Their prompt response during this incident was crucial in mitigating the disruption, and we will maintain this level of collaboration so that any future disruptions can be addressed swiftly.

Your understanding and patience are greatly appreciated as we continue to work towards providing a more reliable service.

Posted Aug 07, 2024 - 10:15 CEST

Resolved
We are pleased to report that the issues impacting the NL3 hosting location have been fully resolved, as confirmed by our SQL server metrics.

Our team is currently conducting an in-depth review of this incident. Once this internal analysis is complete, we will publish a comprehensive postmortem on our status page.

If you happen to encounter any further issues, please don't hesitate to get in touch with our support team. We deeply appreciate your patience throughout this disruption and sincerely apologize for any inconvenience it may have caused.
Posted Jul 22, 2024 - 16:01 CEST
Monitoring
We are pleased to announce that the restoration process has completed successfully. The metrics on our SQL server indicate that the system is stable.

For added assurance, we will extend our monitoring for an additional 30 minutes. If you encounter any performance issues during this period, please feel free to contact our Support department.
Posted Jul 22, 2024 - 15:29 CEST
Update
The restoration process is continuing as planned. Based on our monitoring, the system appears stable for now. We will continue to monitor the situation throughout this process.

Another update will follow as soon as we have more information.
Posted Jul 22, 2024 - 13:39 CEST
Update
Our engineers are currently moving environments back to the main infrastructure in batches, as mentioned in our previous update.

We will keep monitoring the situation during this restore process.

Another update will follow as soon as we have more information.
Posted Jul 22, 2024 - 11:21 CEST
Update
Our engineers have identified that the issue stems from a misbehaving host. We have therefore contacted our provider, Leaseweb, and one of the affected machines has since been disabled.

In the meantime, we have applied a temporary change by failing over to backup infrastructure.

We are now working on restoring normal operation by moving customers back to the main infrastructure in batches while keeping a close eye on our monitoring.

We will post another update as soon as possible.
Posted Jul 22, 2024 - 10:17 CEST
Update
Our engineers are investigating the root cause of the unresponsiveness of TOPdesk environments on NL3 and have applied measures to mitigate the impact by re-allocating resources.

The measures implemented indicate that the situation is improving. We will keep monitoring the situation and post another update as soon as possible.
Posted Jul 22, 2024 - 09:33 CEST
Investigating
We are currently experiencing problems at our NL3 hosting location. As a result, your TOPdesk environment may not be available.

We are aware of the problem and are working on a solution.

Our apologies for the inconvenience. At the time of writing, we are not able to give an estimate of when your environment will be available. We aim to update this status page every 30 minutes until the issue has been resolved.

E-mail updates will be sent when the issue has been resolved. You can subscribe on the status page (https://status.topdesk.com) for additional updates.

To let TOPdesk know that you are affected by this issue, please visit https://my.topdesk.com/tas/public/ssp/ and refer to incident TDR24 07 6037.
Posted Jul 22, 2024 - 09:10 CEST
This incident affected: NL3 SaaS hosting location.