Post-Mortem on the Outage of May 12, 2026
Dear Sir or Madam,
As you will certainly have noticed, all Stackfield systems were unavailable for most of the period between 7:15 AM and 11:20 PM yesterday. We are fully aware of the significant impact such an outage has on your daily work, and we deeply regret the inconvenience caused.
With a duration of approximately 16 hours, this was by far the longest outage in our company’s 14-year history.
What happened?
We began investigating internally at 7:16 AM and quickly determined that the issue originated with our infrastructure provider, IONOS. After contacting them, it became clear that there was a problem at the storage layer. IONOS was initially able to resolve the issue through software-level changes, and Stackfield was fully back online by 11:20 AM. At around 1:05 PM, however, further disruptions occurred; these proved to be significantly more severe and resulted in the complete shutdown of the cluster at the Karlsruhe location.
Based on the information currently available to us, the cause was an extensive hardware failure affecting the entire cluster rather than the failure of a single component, making comprehensive recovery work necessary. After these recovery efforts were completed, our services became fully available again at 11:20 PM.
All of our systems are designed with multiple layers of redundancy: hard drives are mirrored four times across different fire zones, and individual machines are replicated both locally and at a secondary location. Under normal circumstances, this ensures that hardware failures are not noticeable to users. In this case, however, the hardware failure had widespread effects across the infrastructure, making the outage unavoidable.
The primary issue was not the failure of isolated components, but rather the failure of the entire storage cluster, which required 14 of our servers to be shut down. We are currently working closely with IONOS to fully investigate the exact root causes of the incident. The complexity of the recovery process is also reflected in the official IONOS incident report: https://status.ionos.cloud/incidents/p6pjqxzgkh1g
Unfortunately, proactive email notifications were not possible: our platform systems are closely interconnected, and the email service itself was affected by the outage. During the incident, we were therefore only able to provide updates through our support team via phone and email, as well as through our status page at https://status.stackfield.com. We are committed to improving this for the future.
What measures will we take?
Together with IONOS, we will investigate this incident down to the smallest detail. An outage of this kind is absolutely unacceptable to us, and we will draw clear conclusions from it. In addition, we will improve the visibility of our status page and further enhance the email systems connected to the platform so that we can respond more effectively in similar situations.
At this point, I would also like to emphasize that at no time was there any risk of data loss, and that our entire team spent the whole day working to resolve the incident.
Incidents like this are incredibly frustrating, but events of this complexity are extremely rare. This is precisely why we maintain a comprehensive security and redundancy strategy to assess and mitigate potential risks. As numerous incidents across our industry have shown, however, it is unfortunately impossible to protect against every conceivable scenario, which is why outages lasting many hours or even days do occur (such as Microsoft’s 25-hour outage some time ago).
For us, it is therefore essential to ensure that such problems do not recur and to implement targeted measures that continuously minimize downtime and improve communication with you.
I would like to sincerely apologize for the inconvenience caused and thank you for your understanding.
Kind regards,
Cristian Mudure, CEO