IBM’s cloud crisis deepens: 54 services disrupted in latest outage

IBM Cloud suffered its second major outage this week on Wednesday, once again disrupting essential services and leaving customers worldwide unable to log in or manage their resources.

The incident, which lasted for over four hours, began at 8:55 AM UTC and was resolved by 1:20 PM UTC on Wednesday. This time, it disrupted 54 services, including critical components like IBM Cloud itself, AI Assistant, Account Management, Activity Tracker Event Routing, Cloud Monitoring, Security and Compliance Center, and Watson Discovery.

Once again, IBM Cloud users were locked out, unable to log in via the console, the CLI, or the API, which left them unable to manage or provision cloud resources. The incident also caused IAM authentication failures and disrupted access to the support portal, making it impossible to open or track support cases.

According to IBM’s status update, the company began investigating and mitigating the issue, and by 12:54 PM UTC most impacted services had fully recovered, with the exception of Cloud Object Storage, Secrets Manager, and Container Registry. By 1:50 PM UTC, IBM reported that all of its cloud services had been recovered in a controlled manner.

Deeper flaws and security concerns

This recent incident follows two other significant outages in quick succession. A May 20 incident lasted two hours and ten minutes, affecting 14 services. On June 2, another IBM Cloud outage occurred, which lasted much longer and disrupted 41 services.

“Repeated security and availability issues in IBM Cloud suggest deeper flaws in its security architecture and incident response protocols. The recurrence points to unresolved root causes and possible gaps in resilience design, such as inadequate failover systems and weak infrastructure segmentation. Persistent vulnerabilities may stem from poor patch management, misconfigurations, and weak threat detection,” said Manish Rawat, analyst, TechInsights.

Rawat said IBM’s incident response appears slow and ineffective, hinting at procedural or resource limitations. The situation also raises concerns about IBM Cloud’s adherence to zero trust principles, its automation in threat response, and the overall enforcement of security controls.

“The recent IBM Cloud outages are part of a broader pattern of modern cloud dependencies being over-consolidated, under-observed, and poorly decoupled. Most enterprises, and regulators, tend to scrutinise cloud strategies through the lens of data sovereignty, compute availability, and regional storage compliance. Yet it is often the non-data-plane services (identity resolution, DNS routing, orchestration control) that introduce systemic exposure,” said Sanchit Vir Gogia, chief analyst and CEO at Greyhound Research.

Gogia said this blind spot is not unique to IBM. Similar disruptions across other hyperscalers — ranging from IAM outages at Google Cloud to DNS failures at Azure — illustrate the same lesson: resilience must include architectural clarity and blast radius discipline for every layer that enables platform operability.

Such frequent outages can trigger immediate compliance alarms and lead to reassessments in tightly regulated industries like banking, healthcare, telecommunications, and energy, where even brief disruptions carry serious risks.

IBM did not immediately respond to a request for comment.

Adding to the concerns, IBM issued a security bulletin stating that its QRadar Software Suite, its threat detection and response solution, had multiple security vulnerabilities. These included a failure to invalidate sessions after logout, which could lead to user impersonation, and a weakness allowing an authenticated user to cause a denial of service due to improper validation of API data input. IBM advised customers to update their systems promptly.
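
For readers unfamiliar with the session flaw mentioned in the bulletin, the general idea can be sketched in a few lines of Python. This is a generic illustration, not IBM's or QRadar's code. The `SessionStore` class and its method names are hypothetical. The point it shows is that logout should delete the server-side session record, so a token copied before logout can no longer be used to impersonate the user.

```python
import secrets


class SessionStore:
    """Hypothetical in-memory session store, for illustration only."""

    def __init__(self):
        # Maps session token -> username (assumed layout, not from any real product)
        self._sessions = {}

    def login(self, username: str) -> str:
        # Issue an unguessable token and remember who it belongs to
        token = secrets.token_urlsafe(32)
        self._sessions[token] = username
        return token

    def current_user(self, token: str):
        # Returns None when the token is unknown or was invalidated
        return self._sessions.get(token)

    def logout(self, token: str) -> None:
        # Delete the server-side record so a copied token stops working
        self._sessions.pop(token, None)


store = SessionStore()
token = store.login("alice")
assert store.current_user(token) == "alice"

store.logout(token)
# After logout the old token no longer resolves to a user
assert store.current_user(token) is None
```

A system with the kind of weakness described would, in effect, skip the deletion in `logout`, leaving the old token valid until it expired on its own.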

Source: Network World