Partial outage

Incident Report for Templ

Postmortem

Post-Mortem: Service Disruption - Stockholm Data Center

Date: 2026-01-20
Duration: 3 hours 26 minutes (08:53 - 12:19 CET)
Impact: Multiple websites in our Stockholm data center experienced downtime

Summary

On 20 January 2026, critical shared system files were inadvertently deleted from one of our shared hosting servers in Stockholm, causing downtime for multiple customer websites. All services were fully restored by 12:19 CET with no data loss.

Timeline (All times in CET)

  • 08:53 - Incident began; monitoring systems detected service disruptions
  • 09:04 - Status page updated; investigation initiated
  • 09:13 - Root cause identified
  • 11:44 - Partial restoration completed; subset of websites back online
  • 12:19 - All services fully restored and operational
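
The elapsed times implied by this timeline can be checked with a short calculation. The sketch below is illustrative only: the dictionary keys are labels chosen for this example, and the times are copied from the entries above.

```python
from datetime import datetime

# Timeline entries copied from the report (all times CET, 2026-01-20).
# The keys are labels invented for this example.
TIMELINE = {
    "incident_began": "08:53",
    "status_page_updated": "09:04",
    "root_cause_identified": "09:13",
    "partial_restoration": "11:44",
    "fully_restored": "12:19",
}


def minutes_since_start(label: str) -> int:
    """Whole minutes between the incident start and the given timeline entry."""
    start = datetime.strptime(TIMELINE["incident_began"], "%H:%M")
    event = datetime.strptime(TIMELINE[label], "%H:%M")
    return int((event - start).total_seconds() // 60)


elapsed = {label: minutes_since_start(label) for label in TIMELINE}

for label, minutes in elapsed.items():
    print(f"{label}: +{minutes} min")
```

The final entry works out to 206 minutes, which matches the stated duration of 3 hours 26 minutes.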

What Happened

During routine operations, critical shared system files were inadvertently removed from our server "ashien" in the Stockholm region. These files are essential for the proper functioning of customer containers in our hosting environment.
The deletion immediately impacted multiple customer websites hosted on this server. To recover, we safely shut down the server to access and repair the affected file system. All necessary files were successfully restored from our backup systems.

Impact

  • Affected customers: Multiple websites on our Stockholm server
  • Service degradation: Complete unavailability for affected sites
  • Data loss: None - all customer data remained intact and was fully recoverable
  • Duration: 3 hours 26 minutes from incident start to full resolution

Root Cause

The root cause was human error during a routine administrative task. System administrators need elevated privileges to perform maintenance and support operations, and those privileges carry inherent risk when critical system files are handled.

What Went Well

  • Rapid detection: Issue identified immediately through monitoring
  • Quick diagnosis: Root cause identified at 09:13, 20 minutes after the incident began
  • Transparent communication: Status page updated 11 minutes after the incident began, with progress updates roughly every hour
  • Successful recovery: All data recovered without loss
  • Effective execution: Recovery procedures completed smoothly

Actions Being Taken

To reduce the likelihood of similar incidents, we are implementing the following measures:

  1. Enhanced operational procedures: Reviewing and strengthening our change management processes with additional verification steps for operations affecting shared system components
  2. Improved safeguards: Evaluating technical controls and confirmation mechanisms for high-risk operations
  3. Training reinforcement: Conducting focused sessions on critical file handling and risk awareness for all team members with elevated system access
  4. Documentation updates: Enhancing our runbooks with clearer guidelines for operations on production shared hosting infrastructure

We recognize that despite best efforts, the nature of system administration requires privileged access that will always carry inherent risk. Our focus is on implementing multiple layers of protection - procedural, technical, and human - to minimize the probability of similar incidents.
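
As a purely illustrative example of the kind of confirmation mechanism mentioned in item 2, the sketch below refuses to approve a deletion that touches a protected directory unless the operator retypes the exact path. It is not a description of Templ's tooling: the protected path `/srv/shared-system` and the function names are hypothetical, and the check works on path strings only, so it does not resolve symlinks or `..` segments.

```python
from pathlib import PurePosixPath

# Hypothetical directory holding shared system files (not taken from the report).
PROTECTED_ROOTS = (PurePosixPath("/srv/shared-system"),)


def is_protected(target, protected_roots=PROTECTED_ROOTS) -> bool:
    """True if target is a protected root, lies inside one, or contains one."""
    target = PurePosixPath(target)
    for root in protected_roots:
        if target == root or root in target.parents or target in root.parents:
            return True
    return False


def may_delete(target, typed_confirmation: str, protected_roots=PROTECTED_ROOTS) -> bool:
    """Approve unprotected paths; protected ones require the exact path to be retyped."""
    if not is_protected(target, protected_roots):
        return True
    return typed_confirmation == str(PurePosixPath(target))


# A generic "yes" is not enough for a protected path.
print(may_delete("/srv/shared-system", "yes"))                 # False
print(may_delete("/srv/shared-system", "/srv/shared-system"))  # True
print(may_delete("/home/customer/site", ""))                   # True
```

Requiring the full path to be retyped, rather than a yes/no prompt, is one common way to make an operator look at the target before a destructive command runs.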

Customer Compensation

This incident resulted in 206 minutes of unplanned downtime, bringing our availability for this month to 99.54% - below our guaranteed 99.95% SLA commitment.
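
The availability figure can be reproduced with the sketch below. It assumes that availability is measured over the calendar month of January (31 days) and that this incident is the only downtime counted; the report does not spell out the exact calculation method, so both are assumptions.

```python
# Assumption: availability is measured over calendar January 2026 (31 days).
MINUTES_IN_MONTH = 31 * 24 * 60
DOWNTIME_MINUTES = 206   # from the report
SLA_TARGET_PERCENT = 99.95  # from the report


def availability_percent(downtime_minutes: float, total_minutes: float) -> float:
    """Share of the period during which the service was up, as a percentage."""
    return 100.0 * (1 - downtime_minutes / total_minutes)


availability = availability_percent(DOWNTIME_MINUTES, MINUTES_IN_MONTH)
allowed_downtime = MINUTES_IN_MONTH * (1 - SLA_TARGET_PERCENT / 100)

print(f"Availability: {availability:.2f}%")
print(f"Downtime allowed by the SLA: {allowed_downtime:.2f} minutes")
```

Under these assumptions the month comes out at about 99.54%, while a 99.95% target leaves room for roughly 22 minutes of downtime in a 31-day month.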
We are proactively compensating all affected customers. Our team will contact each affected customer individually within the next few days with specific details about the credit that will be applied to a future invoice.
We believe in taking responsibility when we fall short of our commitments. While our terms allow customers to request availability credits, we've chosen to proactively compensate everyone affected as a demonstration of our commitment to your trust and business.
If you have any questions about compensation or this incident, please contact our support team at support@templ.io.

Closing

We sincerely apologize for the disruption this incident caused. We understand that website availability is critical to your business, and we take this responsibility seriously.
We remain committed to providing reliable hosting services and continuously improving our operations to prevent future incidents.
If you have any questions about this incident, please don't hesitate to contact our support team.

Posted Jan 20, 2026 - 13:52 UTC

Resolved

Posted Jan 20, 2026 - 13:51 UTC

Monitoring

All affected websites in our Stockholm data center have been restored and are now operational. We are actively monitoring all services to ensure stability. A detailed post-mortem analysis will be published once we have completed our review. We apologize for the inconvenience and thank you for your patience.
Posted Jan 20, 2026 - 11:25 UTC

Update

Some of the affected websites have now been restored. We are working on restoring all websites as soon as possible.
Posted Jan 20, 2026 - 10:46 UTC

Update

Recovery efforts are still ongoing for our Stockholm data center. We continue to make progress and will provide another update within 1 hour.
Posted Jan 20, 2026 - 10:18 UTC

Update

Recovery efforts are ongoing for our Stockholm data center. We're making progress and will provide another update within 1 hour.
Posted Jan 20, 2026 - 09:12 UTC

Identified

We have identified the root cause of the outage affecting websites in our Stockholm data center. Our team is now actively working to restore services.

We will provide another update within 1 hour with our progress.
Posted Jan 20, 2026 - 08:18 UTC

Investigating

Some websites hosted in our Stockholm data center may experience partial outages. We are currently investigating the issue.
Posted Jan 20, 2026 - 08:04 UTC
This incident affected: Stockholm data center.