RESOLVED: EU1 automated actions delay
Incident Report for TOPdesk SaaS Status page
Postmortem

Incident summary

On the morning of the incident, at approximately 8:30 AM, we began receiving reports from customers that their action sequences were delayed or not being processed at all. Our investigation found that two action sequences from one instance had become caught in a continuous loop. This built up a queue of actions waiting to be executed, which caused the delays reported by several customers hosted in the same data center, EU1.

After a risk assessment, we decided to disable the problematic action sequences at 9:54 AM. This proved effective: the queue began to clear shortly afterwards.

We received confirmation of the improvement when multiple customers reported that their actions were being processed as anticipated.

The major incident was resolved at 10:55 AM. The total disruption time, from the first report to the fix, was approximately 1 hour and 25 minutes.

Aftercare

Although our team reacted quickly by disabling the problematic action sequences, the incident unfortunately had a negative impact on other customers hosted in the same data center.

We have reached out to the customer concerned to investigate the action sequences and ensure that such an issue does not recur.

Additionally, the team responsible for the action service has already started developing a fairness mechanism designed to prevent a recurrence of such issues. This mechanism will ensure that the actions of one customer do not adversely affect others. Please note that this mechanism is still in the development phase. Once development is concluded and the mechanism is deemed effective, it will be rolled out across all our hosting locations. At this moment, we do not have an estimate for the release date of this mechanism.
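
To illustrate the general idea of a fairness mechanism, here is a minimal sketch of one common approach: round-robin scheduling across per-customer queues. It is not the design the action service team is building, which has not been published. The class `FairActionQueue`, its methods, and the customer and action names are all hypothetical and exist only for this example.

```python
from collections import OrderedDict, deque


class FairActionQueue:
    """Hypothetical round-robin queue: each customer gets one action per turn."""

    def __init__(self):
        # Maps a customer id to that customer's own FIFO of pending actions.
        # The dictionary order is the rotation order.
        self._queues = OrderedDict()

    def submit(self, customer, action):
        """Add an action to the back of the given customer's own queue."""
        self._queues.setdefault(customer, deque()).append(action)

    def next_action(self):
        """Return (customer, action) for the next turn, or None if idle."""
        if not self._queues:
            return None
        # Take the customer at the front of the rotation.
        customer, pending = self._queues.popitem(last=False)
        action = pending.popleft()
        if pending:
            # Customer still has work: move them to the back of the rotation.
            self._queues[customer] = pending
        return customer, action


if __name__ == "__main__":
    queue = FairActionQueue()
    # One customer floods the queue, as a looping action sequence might.
    for i in range(1000):
        queue.submit("looping-customer", f"loop-{i}")
    queue.submit("other-customer", "send-email")
    for _ in range(3):
        print(queue.next_action())
```

In a single shared first-in-first-out queue, the other customer's action would wait behind all 1000 looping actions. With the per-customer rotation in this sketch, it is served on the second turn, so a flood from one customer slows only that customer.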

Closing remark

Following this incident, we are all the more committed to implementing measures that will prevent similar occurrences in the future. We appreciate your understanding and patience as we continue to work towards providing a more reliable service.

Posted Feb 08, 2024 - 13:25 CET

Resolved
After monitoring the effect of the measures taken on our systems, we are pleased to confirm that the automated action execution has returned to normal.
The disruption was caused by an overload in our automated actions queue, which delayed the execution of actions.

All previously delayed automated actions have now been executed successfully and our systems have returned to a normal operational state. 
We apologize for any inconvenience this disruption might have caused during this period.

We will investigate the root cause and post the Root Cause Analysis (RCA) on the Status Page once this major incident has been evaluated by our team.
Posted Jan 31, 2024 - 10:55 CET
Update
We have identified the cause and have taken the necessary measures to minimize the impact on the action service.

The action sequence queue should be clearing soon.

We will keep monitoring the situation and post another update in about 30 min.
Posted Jan 31, 2024 - 10:09 CET
Investigating
We are currently experiencing problems with our action service on our EU1 hosting location.

As a result there may be a delay in the automated actions.

We are aware of the problem and are working on a solution.

Our apologies for the inconvenience. At the time of writing, we are not able to give you an estimate of when your environment will be available. We aim to update this status page every 30 minutes until the issue has been resolved.

E-mail updates will be sent when the issue has been resolved. You can subscribe on the status page (https://status.topdesk.com) for additional updates.

To inform TOPdesk you are affected by this issue, please visit https://my.topdesk.com/tas/public/ssp/ . Please refer to incident TDR24 01 9300.
Posted Jan 31, 2024 - 09:20 CET
This incident affected: EU1 SaaS hosting location.