
lastplayed.io · Incident report

Status / June 21, 2026

SEV-1 · Resolved

lastplayed.io 503 outage — image registry unreachable during a node fault

A fault on a single cluster node simultaneously killed lastplayed.io's app pods and the private image registry they needed to restart; the site returned 503 for ~43 minutes until it was served from a cached image. The user-facing outage is over; a durable node fix is still pending.

Started: 2026-06-21 15:44 UTC
Detected: 2026-06-21 16:06 UTC
Resolved: 2026-06-21 16:27 UTC (service restored)
Impact: ~43 min
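The ~43 min impact figure follows directly from the timestamps above. A minimal sketch of that arithmetic, assuming all three timestamps are UTC as stated; the helper names `parse_utc` and `minutes_between` are illustrative and not part of any lastplayed.io tooling:

```python
from datetime import datetime, timezone

FMT = "%Y-%m-%d %H:%M"

def parse_utc(stamp: str) -> datetime:
    """Parse a 'YYYY-MM-DD HH:MM' string and mark it as UTC."""
    return datetime.strptime(stamp, FMT).replace(tzinfo=timezone.utc)

def minutes_between(start: str, end: str) -> int:
    """Whole minutes elapsed from start to end."""
    return int((parse_utc(end) - parse_utc(start)).total_seconds() // 60)

# Timestamps copied from the incident summary.
STARTED = "2026-06-21 15:44"
DETECTED = "2026-06-21 16:06"
RESOLVED = "2026-06-21 16:27"

time_to_detect = minutes_between(STARTED, DETECTED)    # outage start to detection
detect_to_restore = minutes_between(DETECTED, RESOLVED)  # detection to restoration
total_impact = minutes_between(STARTED, RESOLVED)      # full user-facing outage

print(time_to_detect, detect_to_restore, total_impact)
```

By this calculation, roughly half of the outage (22 of 43 minutes) elapsed before detection, which is consistent with the timeline note that no automated alert fired.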

Affected components

lastplayed.io · web app · operational
Alternate host · same app, second hostname · operational
Container registry / CI deploys · private image registry · degraded
Sign-in (SSO) · identity service · operational
Cluster node · overlay-network fault · degraded

Timeline

monitoring · 16:45 UTC
Sign-in (SSO) restored with the same cached-image stopgap; deploy tooling unblocked. The underlying node remains fragile and is being monitored.
resolved · 16:27 UTC
lastplayed.io restored, served from a cached image; both replicas healthy, HTTP 200.
identified · 16:18 UTC
Root cause: an overlay-network fault on a single cluster node simultaneously killed the app pods and downed the private registry they needed to restart.
investigating · 16:06 UTC
503s confirmed at the edge with no healthy backends behind them; investigation began.
started · 15:44 UTC
Both app pods were killed by a health-check timeout and could not re-pull their image. The outage began here; no automated alert fired.
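The timeline notes that no automated alert fired while the edge returned 503s. The report does not describe the monitoring setup, so the following is only a hypothetical sketch of one common rule that could catch this class of failure: alert once an external probe sees several consecutive 5xx responses. The function name `should_alert` and the threshold of 3 are assumptions, not details from the incident.

```python
from typing import Iterable

def should_alert(status_codes: Iterable[int], threshold: int = 3) -> bool:
    """Return True once `threshold` consecutive probe results are 5xx.

    `status_codes` is the ordered sequence of HTTP status codes returned
    by an external probe. Any non-5xx response resets the streak.
    """
    streak = 0
    for code in status_codes:
        if 500 <= code <= 599:
            streak += 1
            if streak >= threshold:
                return True
        else:
            streak = 0  # a healthy response clears the streak
    return False

# Hypothetical probe histories, for illustration only.
sustained_outage = [200, 200, 503, 503, 503]
brief_blip = [200, 503, 200, 503, 200]

print(should_alert(sustained_outage), should_alert(brief_blip))
```

Requiring consecutive failures trades a short detection delay for fewer false pages on a single transient error. The right threshold and probe interval depend on the service and are not specified in this report.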
Read the full RCA (PDF) →
