Timeline (CEST)
09:15 Received the first tickets.
10:02 Created a ticket with our infrastructure supplier.
10:15 The issue appeared to be caused by two physical hosts.
10:30 Migrated the first set of VMs to a different physical host.
10:35 Monitoring indicated that the host names of the TOPdesk environments were reachable.
10:47 Our infrastructure supplier confirmed possible hardware issues.
11:05 Migrated the second set of VMs to a different physical host.
11:30 Migrated the firewall node to a different host server.
14:30 Finished migrating the first set of VMs back to the original physical host.
15:50 Finished migrating the second set of VMs back to the original physical host.
16:24 All physical hosts were available.
Root cause
Memory issues were found on two physical hosts. The memory modules were replaced by our infrastructure supplier.
Identified points of improvement
Investigate whether our internal major incident procedure can be improved.
Work with our infrastructure supplier to determine whether monitoring for failing memory modules can be improved.