Incident summary
On the morning of the incident, at approximately 8:30 AM, customers began reporting that their action sequences were delayed or not being processed at all. Our investigation found that two action sequences originating from one instance had become entangled in a continuous loop. The loop filled the queue of actions waiting to be executed, which caused the delays reported by several customers hosted in the same data center, EU1.
After a risk assessment, we decided to disable the problematic action sequences at 9:54 AM. This proved effective: the queue began to clear shortly afterwards.
We received confirmation of the improvement when multiple customers reported that their actions were being processed as anticipated.
The major incident was resolved at 10:55 AM. The disruption between the first report (approximately 8:30 AM) and the fix (9:54 AM) lasted approximately 1 hour and 25 minutes.
Aftercare
Although our team reacted quickly in disabling the problematic action sequences, the incident regrettably had a negative impact on other customers hosted in the same data center.
We have reached out to the customer whose instance ran these action sequences to investigate them together and ensure that such an issue does not recur.
Additionally, the team responsible for the action service has already started developing a fairness mechanism designed to prevent a recurrence of such issues. This mechanism will ensure that the actions of one customer do not adversely affect others. Please note that the mechanism is still in the development phase. Once development is complete and the mechanism has been shown to be effective, it will be rolled out across all our hosting locations. At this moment, we do not have an estimated release date.
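To illustrate the general idea behind a fairness mechanism, the sketch below shows one common approach: a round-robin queue that serves one action per customer per turn, so a customer with a very long backlog cannot starve the others. This is only an illustration under assumptions and not the design our team is building; the class name `FairActionQueue`, its methods, and the customer identifiers are all hypothetical.

```python
from collections import OrderedDict, deque


class FairActionQueue:
    """Hypothetical round-robin queue: one action per customer per turn."""

    def __init__(self):
        # Maps customer id -> FIFO of that customer's pending actions.
        # The dictionary order is the current rotation order.
        self._queues = OrderedDict()

    def enqueue(self, customer_id, action):
        # A customer joins the back of the rotation on their first pending action.
        self._queues.setdefault(customer_id, deque()).append(action)

    def dequeue(self):
        """Return (customer_id, action) for the next customer in rotation, or None."""
        if not self._queues:
            return None
        customer_id, pending = next(iter(self._queues.items()))
        action = pending.popleft()
        if pending:
            # Customer still has work: send them to the back of the rotation.
            self._queues.move_to_end(customer_id)
        else:
            # Nothing left for this customer: drop them from the rotation.
            del self._queues[customer_id]
        return customer_id, action


queue = FairActionQueue()
for i in range(1000):
    queue.enqueue("customer_a", f"looping-action-{i}")  # runaway backlog
queue.enqueue("customer_b", "send-email")

first = queue.dequeue()
second = queue.dequeue()
```

In this example, `customer_b` is served on the second turn even though `customer_a` queued 1,000 actions first. With a single shared first-in, first-out queue, `customer_b` would have waited behind the entire backlog.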
Closing remark
Following this incident, we are all the more committed to implementing measures that prevent similar occurrences in the future. We appreciate your understanding and patience as we continue working towards a more reliable service.