On Tuesday, August 25th at 15:13 (03:13 PM) CEST, our monitoring system alerted the engineer on duty that some environments in the US1 hosting location had become unavailable for our customers. The engineer on duty investigated the alerts and noticed several environments reporting as unhealthy.
The SaaS operations team investigated the failing health checks, which were caused by connectivity issues with the relational database setup, which was still recovering from a short network disruption. At 15:25 (03:25 PM) CEST the database service was responding as expected and most TOPdesk environments were functioning normally again. However, TOPdesk Support received a few reports from customers who were still seeing error pages or timeouts when trying to open pages inside their TOPdesk environment. We investigated and found that these environments were still experiencing timeouts connecting to the database service. A restart of the TOPdesk application was required to resolve the issue.
At 16:20 (04:20 PM) CEST all affected environments were available again, and we verified that the issue was resolved.
Root cause
A brief disruption in network routing caused several TOPdesk environments to lose connectivity to their database service, which was slow to recover after the disruption. After the database service recovered, several environments had to be restarted to restore database connectivity.