Resolved -
# Post-Mortem: Service Disruption - Stockholm Data Center
Date: 2026-01-20

Duration: 3 hours 26 minutes (08:53 - 12:19 CET)

Impact: Multiple websites in the Stockholm data center experienced downtime
## Summary
On 20 January 2026, an operational error on one of our shared hosting servers in Stockholm, in which critical shared system files were inadvertently removed, disrupted service for multiple customer websites. All services were fully restored by 12:19 CET with no data loss.
## Timeline (All times in CET)
* 08:53 - Incident began; monitoring systems detected service disruptions
* 09:04 - Status page updated; investigation initiated
* 09:13 - Root cause identified
* 11:44 - Partial restoration completed; subset of websites back online
* 12:19 - All services fully restored and operational
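The timeline is given in CET, while the status updates at the end of this report are stamped in UTC. As a minimal sketch for cross-checking the two, the snippet below converts each timeline entry to UTC and derives the elapsed time from incident start. It assumes a fixed UTC+1 offset, which holds for CET in January; the helper names `parse_cet` and `fmt_elapsed` are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# CET is UTC+1. A fixed offset is enough here because the incident
# happened in January, outside daylight saving time.
CET = timezone(timedelta(hours=1), name="CET")

# Timeline entries copied from this report (local CET times on 2026-01-20).
TIMELINE = [
    ("08:53", "Incident began"),
    ("09:04", "Status page updated"),
    ("09:13", "Root cause identified"),
    ("11:44", "Partial restoration completed"),
    ("12:19", "All services fully restored"),
]


def parse_cet(hhmm: str) -> datetime:
    """Turn an HH:MM string into a timezone-aware CET datetime."""
    hour, minute = map(int, hhmm.split(":"))
    return datetime(2026, 1, 20, hour, minute, tzinfo=CET)


def fmt_elapsed(delta: timedelta) -> str:
    """Format a timedelta as 'Hh MMm'."""
    minutes = int(delta.total_seconds() // 60)
    return f"{minutes // 60}h {minutes % 60:02d}m"


events = [(parse_cet(t), label) for t, label in TIMELINE]
start = events[0][0]
duration = events[-1][0] - start

for when, label in events:
    utc = when.astimezone(timezone.utc)
    print(f"{when:%H:%M} CET = {utc:%H:%M} UTC  (+{fmt_elapsed(when - start)})  {label}")

print(f"Total duration: {fmt_elapsed(duration)} "
      f"({int(duration.total_seconds() // 60)} minutes)")
```

The total works out to 3 hours 26 minutes, or 206 minutes, matching the duration stated in this report.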
## What Happened
During routine operations, critical shared system files were inadvertently removed from our server "ashien" in the Stockholm region. These files are essential for the proper functioning of customer containers in our hosting environment. The deletion immediately impacted multiple customer websites hosted on this server. To recover, we safely shut down the server to access and repair the affected file system. All necessary files were successfully restored from our backup systems.
## Impact
* Affected customers: Multiple websites on our Stockholm server
* Service degradation: Complete unavailability for affected sites
* Data loss: None - all customer data remained intact and was fully recoverable
* Duration: 3 hours 26 minutes from incident start to full resolution
## Root Cause
Human error during routine administrative tasks: critical shared system files were inadvertently removed. System administrators require elevated privileges to perform necessary maintenance and support operations, and those privileges inherently carry risk when handling critical system files.
## What Went Well
* Rapid detection: Issue identified immediately through monitoring
* Quick diagnosis: Root cause determined within 20 minutes of incident start (08:53 to 09:13 CET)
* Transparent communication: Status page updated within 11 minutes of incident start, followed by roughly hourly progress updates
* Successful recovery: All data recovered without loss
* Effective execution: Recovery procedures completed smoothly
## Actions Being Taken
To reduce the likelihood of similar incidents, we are implementing the following measures:
1. Enhanced operational procedures: Reviewing and strengthening our change management processes with additional verification steps for operations affecting shared system components
2. Improved safeguards: Evaluating technical controls and confirmation mechanisms for high-risk operations
3. Training reinforcement: Conducting focused sessions on critical file handling and risk awareness for all team members with elevated system access
4. Documentation updates: Enhancing our runbooks with clearer guidelines for operations on production shared hosting infrastructure
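To illustrate what a confirmation mechanism for high-risk operations could look like, here is a minimal sketch of a pre-deletion check. It is not the safeguard actually deployed: the protected directories, `is_protected`, and `may_delete` are hypothetical names chosen for this example, and the real shared system paths are not given in this report.

```python
import posixpath
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical protected locations, chosen for illustration only.
PROTECTED_ROOTS = (
    PurePosixPath("/opt/shared-runtime"),
    PurePosixPath("/etc"),
)


def normalize(target: str) -> PurePosixPath:
    """Collapse '.' and '..' segments. Symlinks are NOT resolved here."""
    return PurePosixPath(posixpath.normpath(target))


def is_protected(target: str) -> bool:
    """True if target is a protected root or lies anywhere beneath one."""
    path = normalize(target)
    return any(path == root or root in path.parents for root in PROTECTED_ROOTS)


def may_delete(target: str, typed_confirmation: Optional[str] = None) -> bool:
    """Decide whether a deletion request should be allowed to proceed.

    Unprotected paths pass straight through. Protected paths require the
    operator to retype the exact normalized path as confirmation.
    """
    if not is_protected(target):
        return True
    return typed_confirmation == str(normalize(target))


if __name__ == "__main__":
    print(may_delete("/home/site1/tmp/cache"))           # unprotected path
    print(may_delete("/opt/shared-runtime/lib"))         # protected, unconfirmed
    print(may_delete("/opt/shared-runtime/lib",
                     "/opt/shared-runtime/lib"))         # protected, confirmed
```

The check only returns a decision and never deletes anything itself, so it can sit in front of whatever tooling performs the removal. A production version would also need to handle symlinks and relative paths, which this sketch deliberately leaves out.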
We recognize that despite best efforts, the nature of system administration requires privileged access that will always carry inherent risk. Our focus is on implementing multiple layers of protection - procedural, technical, and human - to minimize the probability of similar incidents.
## Customer Compensation
This incident resulted in 206 minutes of unplanned downtime, bringing our availability for this month to 99.54% - below our guaranteed 99.95% SLA commitment.
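The availability figure can be reproduced with a short calculation. The sketch below assumes availability is measured over the calendar month (January 2026, 31 days) and that this incident was the only downtime in that period; the exact SLA measurement method is not stated in this report, and the function names are illustrative.

```python
import calendar


def minutes_in_month(year: int, month: int) -> int:
    """Total number of minutes in the given calendar month."""
    days = calendar.monthrange(year, month)[1]
    return days * 24 * 60


def monthly_availability(downtime_minutes: float, year: int, month: int) -> float:
    """Availability in percent, given total downtime in the month."""
    total = minutes_in_month(year, month)
    return 100.0 * (total - downtime_minutes) / total


def allowed_downtime_minutes(sla_percent: float, year: int, month: int) -> float:
    """Downtime budget in minutes that still meets the SLA for the month."""
    return minutes_in_month(year, month) * (100.0 - sla_percent) / 100.0


availability = monthly_availability(206, 2026, 1)
budget = allowed_downtime_minutes(99.95, 2026, 1)

print(f"Minutes in January 2026: {minutes_in_month(2026, 1)}")
print(f"Availability:            {availability:.2f}%")
print(f"Downtime budget at SLA:  {budget:.2f} minutes")
```

Under these assumptions, 206 minutes of downtime out of 44,640 minutes gives 99.54%, while a 99.95% commitment leaves a budget of roughly 22 minutes for the month.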
We are proactively compensating all affected customers.
Our team will contact each affected customer individually within the next few days with specific details about the credit that will be applied to a future invoice.

We believe in taking responsibility when we fall short of our commitments. While our terms allow customers to request availability credits, we've chosen to proactively compensate everyone affected as a demonstration of our commitment to your trust and business.

If you have any questions about compensation or this incident, please contact our support team at support@templ.io.
## Closing
We sincerely apologize for the disruption this incident caused. We understand that website availability is critical to your business, and we take this responsibility seriously. We remain committed to providing reliable hosting services and continuously improving our operations to prevent future incidents. If you have any questions about this incident, please don't hesitate to contact our support team.
Jan 20, 13:51 UTC
Monitoring -
All affected websites in our Stockholm data center have been restored and are now operational. We are actively monitoring all services to ensure stability. A detailed post-mortem analysis will be published once we have completed our review. We apologize for the inconvenience and thank you for your patience.
Jan 20, 11:25 UTC
Update -
Some of the affected websites have now been restored. We are working on restoring all websites as soon as possible.
Jan 20, 10:46 UTC
Update -
Recovery efforts are still ongoing for our Stockholm data center. We continue to make progress and will provide another update within 1 hour.
Jan 20, 10:18 UTC
Update -
Recovery efforts are ongoing for our Stockholm data center. We're making progress and will provide another update within 1 hour.
Jan 20, 09:12 UTC
Identified -
We have identified the root cause of the outage affecting websites in our Stockholm data center. Our team is now actively working to restore services.
We will provide another update within 1 hour with our progress.
Jan 20, 08:18 UTC
Investigating -
Some websites hosted in our Stockholm data center may experience partial outages. We are currently investigating the issue.
Jan 20, 08:04 UTC