The Biggest Website Outages of All Time.

Written by Laura Clayton | Verified by Alex Ioannides | 2,343 words | 12 min read | Updated Feb 2, 2026

Major website outages don’t start as “historic.” They begin with a small failure that compounds: a misconfiguration, a dependency timeout, a change that slips through review. The scale only becomes obvious once users can’t log in, check out, or load anything at all.

This article looks at some of the biggest website outages and what actually caused them. Not rumors or headlines, but the technical breakdowns teams shared after the fact, including where detection lagged and why recovery took longer than expected.

You’ll see common failure patterns repeat across companies and stacks, plus the early warning signs that were missed. If you want outages to be shorter and less surprising, these incidents are worth studying.

Facebook, October 2021 

On October 4th, 2021, Facebook experienced a massive outage that also took down other Meta services such as WhatsApp, Instagram, Messenger, and Oculus Quest (the VR headset platform that lets users stream TV, movies, and videos).

The outage lasted for approximately six hours, and it was caused by what Facebook later called a boring error when “configuration changes on the backbone routers that coordinate network traffic between our data centers caused issues that interrupted this communication.” 

It might not sound like much, but for a social network that also provides an authentication mechanism for other companies, six hours is an eternity. Even worse, because the outage also took down the tools used to reset the routers handling traffic, employees had to manually restart all systems. 

And in what sounds like one of those movies where everything goes wrong, employees could not, at first, even access the building where the data centers were located to debug the issue. 

A report issued by Facebook the day after the outage explained that “this took time, because these facilities are designed with high levels of physical and system security in mind. They’re hard to get into, and once you’re inside, the hardware and routers are designed to be difficult to modify even when you have physical access to them” (as reported via Market Watch). By the time all was said and done, the outage had cost Facebook roughly $60 million in ad revenue and $47.3 billion in lost market cap. According to Bloomberg, Zuckerberg’s personal wealth dropped by $6 billion in the few hours Facebook was down.

SOURCE: Market Watch

Fastly, June 8th, 2021

The name Fastly might not carry the same weight as Facebook or Amazon for most people out there, but this cloud-based Content Delivery Network (CDN) provider is behind the content delivery of major companies around the world.

When the 1-hour outage hit in 2021, CNBC reported that high-traffic websites and online services such as Amazon, Reddit, The New York Times, Shopify, Twitter, Spotify, and even the UK government’s official website all went down, displayed error messages, or experienced on-and-off difficulties.

The outage was caused by a software bug within the company’s CDN configuration. Funnily enough, the bug wasn’t triggered at the server or even on the company’s side – it happened when a single customer changed their own setting, accidentally “waking up” the bug (already in the system but dormant) that led to the outage. According to Fastly’s summary of the event, that single, innocent change caused 85% of their network to return errors.

According to The Wolfcast, the “Outage may have cost digital platforms up to $150 million in lost sales” – for just one hour of offline time. 

British Airways, May 28th, 2017 

Downtime doesn’t only affect shopping and entertainment sites. Perhaps even scarier is the fact that it can also disrupt transportation. This happened in 2017, when an outage took down many of British Airways’ systems and operations.

When an engineer accidentally disconnected the power supply to British Airways’ data center, a major outage followed, causing disruption to BA’s global operations.

According to The Guardian, over 1,000 flights were grounded, terminals in London overflowed with 75,000 stranded passengers, and the booking and baggage handling systems were affected.

A month later, British Airways’ owner estimated the data center outage cost the company about $102 million between lost revenue and the expense of compensating thousands of passengers, according to Data Center Knowledge.

Google, December 14th, 2020

When a giant like Google goes down, the entire world feels the impact. Google’s outage in 2020 only lasted 45 minutes, but it’s considered one of the biggest outages to ever hit the internet, The Guardian reported.  

The outage took down Google services, including Gmail, Google Drive, YouTube, Google Calendar, Google Home apps, and Google Maps.

The cause of the crash? A lack of storage space in Google’s authentication tools (what Google later called “an internal storage quota issue”). The system failed to make more space available automatically, and the resulting error caused it to crash.

Can you guess the damage that little 45-minute crash caused? Google lost $1.7M in ad revenue during the YouTube outage, according to Fox Business.  

SOURCE: Fox Business

Dyn, October 21st, 2016

Between 2001 and 2017, Dyn was an Internet performance management and web application security company that handled things like data traffic management and acted as a Domain Name System (DNS) provider.

When the outage happened in 2016, many major companies – everybody from Twitter and Spotify to Netflix, Airbnb, Amazon, eBay, and the PlayStation Network – were using Dyn as their DNS provider, and they all went down with it.

The cause behind the outage? One of the biggest distributed denial of service (DDoS) attacks to ever hit the internet. Wired called it the DDoS attack “that took down a big chunk of the Internet for most of the Eastern seaboard.” The attack overwhelmed the company’s servers with traffic from malware-infected devices, including basic equipment like printers and IP cameras. Later reports identified the attack as the largest of its kind in history (via The Guardian).

In the end, the Dyn outage cost the business millions in lost revenue. Although there aren’t specific numbers available about the losses, CoverLink points out that “organizations spend an average of $2.5 million recovering from DDoS attacks” and because Dyn’s outage was so widespread, it likely cost a lot more than that.  

Spotify and Discord, March 8, 2022

Both Spotify and Discord suffered interruptions in service in March 2022. At the time, Mashable reported that it started with smaller issues around 1 pm, with users unable to log in and support pages glitching.  

Within half an hour, things were deteriorating, with API failures and further glitches complicating matters. As The Verge reported later, it took about two hours before things started to come back online, just as Google Cloud (the service provider both Spotify and Discord operate on) announced they had their own glitch due to a malfunctioning component that required a reroute.

Twitter and Instagram, July 14, 2022

Another two-for-one outage hit Twitter and Instagram on the same day in July 2022, but with a twist – the outages weren’t actually connected.

Twitter went down for 40 minutes in the early morning of July 14th, 2022. Within minutes, over half a million users were reporting issues uploading tweets and logging into the service. 

Twitter had already suffered two outages earlier in February for what the company called “a technical bug that briefly impacted how Tweets were loading,” so when it happened again in July, people were less than happy (via The Verge).  

About an hour later, however, Twitter was back up and running, attributing the outage to “some trouble with internal systems.”

Just a few hours later, CNET reported that Instagram went down too, with people reporting issues accessing the service, sending DMs, or seeing the app crash as soon as they tried to open it. And here’s some irony for you – Instagram users flocked to Twitter to report Instagram outages as soon as they started.

Instagram was up again within a couple of hours, only to suffer another major outage in October 2022. This time, the problem went beyond crashes and difficulty accessing the app: accounts were accidentally locked and suspended because of a bug. By the time Instagram was back up, many large accounts had lost millions of followers, reports Lifestyle Asia.

Amazon Web Services, 2017 and 2021

Amazon has had its share of outages over the years. And because of its size, no other company out there loses more money every time its website goes down. 

Amazon Web Services (AWS) had a major outage in 2017, during which millions of cloud service and website users lost access to the sites and services they relied on.

The main issue with Amazon going down is that Amazon’s S3 web-based storage service provides cloud services for a lot of other sites out there. So when Amazon Web Services fails, other sites go down with it. In this case, that means everybody from Apple to Venmo to Slack suffered the consequences.

This is what happened in late February 2017, when a simple human error “broke the internet.” According to Data Center Knowledge, an engineer was debugging an issue when he accidentally mistyped a command. That was it – a single mistyped command took down the cloud for several hours and caused headaches for many companies. That “oops” resulted in over $150 million in losses for the companies involved.

Amazon Web Services (AWS) experienced another hours-long outage in December 2021, this time taking down Disney, Netflix, and Spotify as well. Even Alexa and iRobot reported glitches and connectivity issues. 

According to CNBC, the effects extended beyond commercial sites – many colleges in the U.S. had to cancel exams as they couldn’t access the platforms where the exams were hosted.

Even worse, by the time the outage hit on December 22nd, Amazon was still recovering from two other major outages from earlier in the month. The December 22nd incident was traced to a power outage at one of its data centers.

SOURCE: TFIR

Because of its widespread reach, it’s almost impossible to calculate the losses caused by the December 22nd outage, but analysts believe it could potentially have cost “at least a billion dollars in economic loss to companies that depend on AWS,” TFIR says.  

If there’s anything we can learn from these examples, it’s that downtime can affect everybody – even the big players have to deal with outages and the financial losses, reputation damage, and customer dissatisfaction that come with them.

And while there’s no doubt that some outages may be unavoidable, it’s important for any company, no matter the size, to invest in proactive measures like downtime monitoring to prevent and respond to them as quickly and effectively as possible.  

What Major Website Outages Have in Common

Big website outages look dramatic from the outside, but the root causes are usually familiar. When you line them up, patterns repeat more often than not. The scale is different; the failure mode is not.

Configuration changes are a top offender. Many large outages start with a small change that behaved differently in production than expected. A misapplied config, a partial rollout, or a rollback that did not fully revert state can cascade quickly at scale. The systems did exactly what they were told, just not what was intended.

Dependency failures show up again and again. DNS providers, cloud platforms, CDNs, auth services, and internal control planes all act as shared points of failure. When one goes down, thousands of sites follow. The lesson is not “avoid dependencies,” but “assume they will fail and plan for it.”
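
One common way to "assume they will fail and plan for it" is a circuit breaker: after repeated failures, stop calling the dependency for a while and serve a fallback instead. The sketch below is illustrative only. The class name, thresholds, and the idea of a cached fallback are assumptions, not something taken from the incidents above.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Stops calling a failing dependency for a cool-down period."""

    def __init__(
        self,
        max_failures: int = 3,        # hypothetical threshold
        reset_after: float = 30.0,    # hypothetical cool-down, in seconds
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, primary: Callable[[], T], fallback: Callable[[], T]) -> T:
        # While the circuit is open, skip the dependency entirely.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()
            # Cool-down elapsed: allow a single trial call ("half-open").
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = primary()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        # A successful call resets the failure count.
        self.failures = 0
        return result
```

In use, `primary` would wrap the real call (a DNS lookup, an auth request, an API call) and `fallback` would return something degraded but usable, such as a cached response.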

Automation amplifies impact. The same tooling that enables fast deploys and global rollouts also spreads mistakes instantly. An automated change with no effective guardrails can take down everything before humans have time to react. Speed without limits increases blast radius.
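
A guardrail against this kind of instant, global spread is a staged rollout: apply the change to a small slice of hosts, check health, and only then widen. The following is a minimal sketch under stated assumptions. The function name, the stage fractions, and the `apply_change` and `is_healthy` callbacks are hypothetical placeholders for whatever deploy tooling and health checks a team actually uses.

```python
import math
from typing import Callable, List, Sequence, Tuple


def staged_rollout(
    hosts: Sequence[str],
    apply_change: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    stages: Sequence[float] = (0.01, 0.10, 0.50, 1.0),  # hypothetical waves
) -> Tuple[bool, List[str]]:
    """Apply a change in growing waves and halt on the first failed check.

    Returns (completed, hosts_changed).
    """
    changed: List[str] = []
    for fraction in stages:
        # Number of hosts that should be updated once this stage completes.
        target = max(1, math.ceil(len(hosts) * fraction))
        for host in hosts[len(changed):target]:
            apply_change(host)
            changed.append(host)
        # Guardrail: verify every host touched so far before widening.
        if not all(is_healthy(h) for h in changed):
            return False, changed
    return True, changed
```

With ten hosts and a change that breaks health checks, only the first host is touched before the rollout stops, which keeps the blast radius to one machine instead of the whole fleet.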

Monitoring gaps make outages last longer. In many incidents, detection lagged behind failure. Alerts fired late, dashboards looked green, or signals were ignored because they were noisy in the past. The outage was not just the failure, but the delay in recognizing it.

Recovery paths often fail too. Several high-profile incidents worsened because rollback systems depended on the same broken components. If you cannot undo a change when core services are degraded, recovery time grows fast.

One consistent theme is that redundancy alone is not enough. Many affected systems had backups, failovers, or secondary regions. Those protections failed because they shared assumptions, configurations, or control planes with the primary system.

The practical takeaway is simple. Large outages are rarely caused by unknown risks. They come from known failure modes interacting under pressure. Config changes, dependencies, automation, and monitoring are always in play.

Studying big outages is useful not because your site is as large, but because the same patterns apply at any scale. Smaller systems just hit them less often.

FAQs

What qualifies as a major website outage?

A major website outage is an incident that causes widespread service unavailability or severe degradation for a large number of users. These outages usually affect core functionality, last longer than a few minutes, and often make the news. Scale and impact matter more than the exact duration.

What are the most common causes behind major outages?

Most major outages are caused by configuration errors, failed deployments, or cascading failures in dependencies. Network issues and DNS misconfigurations are also frequent triggers. Human error during routine changes is a recurring theme.

Why do small changes sometimes cause massive outages?

Modern systems are highly interconnected, so small changes can have outsized effects. A minor config update, expired certificate, or incorrect DNS record can propagate quickly across regions. Without proper safeguards, failures cascade faster than teams can react.
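
As one example of a simple safeguard, an expiring certificate can be caught well before it causes trouble. The sketch below computes the days remaining from a certificate's `notAfter` string, in the format Python's `ssl` module reports it. The function name and any alert threshold you pair it with are assumptions for illustration.

```python
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter timestamp.

    `not_after` uses the format found in the 'notAfter' field returned by
    ssl.SSLSocket.getpeercert(), e.g. "Jan  5 09:34:43 2018 GMT".
    A negative result means the certificate has already expired.
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / 86400
```

A scheduled job could call this for each monitored certificate and raise an alert when the result drops below a chosen number of days.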

Are cloud providers usually responsible for big outages?

Not always. While cloud provider outages do happen, many incidents originate from customer-side misconfiguration or application-level issues. Even when the cloud is involved, outages often result from how services are designed or deployed on top of it.

How do monitoring gaps make outages worse?

Without proper monitoring, teams may not detect issues until users report them. Single-location or shallow checks can miss partial or regional failures. Delayed detection increases downtime and customer impact.
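
To illustrate why multi-location checks matter, the sketch below combines per-location results into a single status, so a regional failure shows up as "partial" instead of being hidden by one healthy probe. The function name and the location labels in the usage note are hypothetical.

```python
from typing import Mapping


def classify(results: Mapping[str, bool]) -> str:
    """Summarize per-location check results as 'up', 'partial', or 'down'."""
    if not results:
        raise ValueError("no check results")
    failures = [location for location, ok in results.items() if not ok]
    if not failures:
        return "up"
    if len(failures) == len(results):
        return "down"
    # Some locations fail while others succeed: a regional or partial outage.
    return "partial"
```

For example, `classify({"eu-west": True, "us-east": False})` reports a partial outage that a single check from the EU location alone would have missed.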
