LavaTech Incidents

Live-updated incident logs from LavaTech

On June 15, we had roughly 64 minutes of downtime (between 01:05 UTC and 02:09 UTC) on our US servers, which took down a majority of our services.
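As a quick sanity check on the figure above, the duration can be recomputed from the two timestamps. This is only a minimal sketch: the helper name minutes_between is made up for illustration, and it assumes both times fall on the same day.

```python
from datetime import datetime


def minutes_between(start_hhmm: str, end_hhmm: str) -> int:
    """Whole minutes between two same-day HH:MM times (hypothetical helper)."""
    fmt = "%H:%M"
    delta = datetime.strptime(end_hhmm, fmt) - datetime.strptime(start_hhmm, fmt)
    return int(delta.total_seconds() // 60)


# Start and end of the outage as stated in the report (UTC).
print(minutes_between("01:05", "02:09"))  # 64
```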

This was caused by a power outage on one of the legs (“A-side”) of the rack that hosts those services. Other people at the same location have also reported the same issue.

The power provided to the rack comes in two legs (“A-side” and “B-side”), and most of the equipment on the rack is connected to both legs for redundancy. The router is one of the few pieces of equipment on the rack without a redundant PSU (and definitely the most critical one), and at the time of the power cut it was connected to the leg that went down.

Indeed, when the power cut happened, most of the servers actually stayed up, but without a working network connection. This was also visible afterwards in the energy use graphs for our B-side PDU:

B-side PDU graphs showing a significant spike from ~9A to ~16.5A for roughly an hour

As this issue affected our services solely because the router lacks a redundant PSU, we are currently researching options for replacing it. More updates will be provided once it is replaced.

This is an incident report on our blog page. The page isn't refreshed automatically, so please hit F5 yourself. We do post live updates on the elixire Discord guild.

Result

This move was successful.

Total upload downtime was ~27 minutes, and total access downtime was ~14 minutes. Some domains are facing more downtime due to Cloudflare.

This is an incident report on our blog page. The page isn't refreshed automatically, so please hit F5 yourself. We do post live updates on our Discord guild: https://discord.gg/urgYG9S

Result

This maintenance was successful. Downtime ended at 8:02PM GMT. As of 8:30PM GMT, extended backups are also enabled, and as of 8:38PM GMT, automated MAM clearing was deployed; neither of these required downtime.

Total downtime was 2 hours and 12 minutes. Total maintenance period (excluding time between announcement and downtime) was 2 hours and 48 minutes.