AWS Outages in December 2021 Highlight Need for Multi-Cloud Infrastructure
Amazon Web Services (AWS), the leading provider of cloud infrastructure for businesses of all sizes, experienced a series of outages in December 2021. The December 7 outage was the most significant: it lasted five hours and disrupted Amazon’s retail business and third-party online services on the East Coast. Popular websites and heavily used services, including Disney+ and Netflix, were knocked offline. Roomba vacuums, Amazon’s Ring security cameras, and other internet-connected devices such as smart cat litter boxes and app-connected ceiling fans were also taken down by the outage.
Amazon’s own retail operations were also brought to a standstill in some parts of the US – internal apps used by Amazon’s warehouse and delivery workforce rely on AWS, so employees could not scan packages or access delivery routes during the outage. Third-party sellers also couldn’t access a site used to manage customer orders. Since Amazon’s Support Contact Center also runs on the AWS network, customers couldn’t create support cases for seven hours during the outage.
So, What Exactly Happened?
According to AWS’s explanation of what went wrong, the source of the December 7 outage was a glitch in its internal network, which hosts “foundational services” such as application/service monitoring, the AWS internal Domain Name System (DNS), authorization, and parts of the EC2 network control plane.
An automated activity to scale capacity of one of the AWS services hosted on the main AWS network triggered unexpected behavior from many clients inside the internal network, resulting in a large surge of connection activity that overwhelmed the networking devices between the internal network and the main AWS network. This caused delays in communication between these networks and led to persistent congestion and performance issues with the devices connecting the two networks.
The second outage, which occurred on December 15, lasted less than an hour and impacted the West Coast. It affected numerous AWS-based business services, such as Zoom, Slack, and the two-factor authentication and endpoint security service Duo. Entertainment services, including Hulu, Xbox Live, and Halo, also went down. AWS attributed the outage to network congestion: internal engineering work incorrectly moved more traffic than expected onto parts of the AWS backbone, which affected connectivity.
The third outage, on December 22, disrupted major companies and services, including Slack, Fortnite, Imgur, Epic Games Store, and Coinbase. AWS blamed the outage on a loss of power within a data center.
A Multi-Cloud Approach to Mitigate Outages
The latest AWS outages highlight why it’s so critical for businesses to design their technology infrastructure for resilience, with no single point of failure. One of the most effective ways to protect yourself from cloud outages is to store your data in more than one provider’s cloud. Multi-cloud means you use two or more public cloud providers, such as AWS and Azure, or Azure and Google Cloud, or perhaps even all three.
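To make the idea of storing data with more than one provider concrete, here is a minimal Python sketch. It is an illustration only: InMemoryStore, ProviderUnavailable, and replicated_put are hypothetical names invented for this example, and the in-memory dictionaries stand in for each provider's real storage service rather than calling any actual cloud SDK.

```python
from dataclasses import dataclass, field


class ProviderUnavailable(Exception):
    """Raised by a store when its cloud provider cannot be reached."""


@dataclass
class InMemoryStore:
    """Stand-in for one provider's object storage (hypothetical, not a real SDK)."""
    name: str
    available: bool = True
    objects: dict = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> None:
        if not self.available:
            raise ProviderUnavailable(self.name)
        self.objects[key] = data


def replicated_put(stores, key, data):
    """Write to every store and return the names of those that accepted the write."""
    written = []
    for store in stores:
        try:
            store.put(key, data)
            written.append(store.name)
        except ProviderUnavailable:
            # One provider being down should not block the other copies.
            continue
    if not written:
        raise ProviderUnavailable("no provider accepted the write")
    return written


# Example: cloud_a is suffering an outage, so only cloud_b stores the object.
cloud_a = InMemoryStore("cloud_a", available=False)
cloud_b = InMemoryStore("cloud_b")
accepted = replicated_put([cloud_a, cloud_b], "invoice-001", b"example payload")
```

In a real deployment, a copy that was skipped during an outage would also need to be re-synchronized once the provider recovers; the sketch leaves that step out.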
A multi-cloud strategy assumes it’s unlikely that multiple cloud providers will fail at once. So, when one provider goes down, you simply fail over your cloud-based application from the primary cloud provider to a secondary cloud provider and then return to the primary after the outage ends. By not putting all your eggs into one cloud basket, you lower the risk that your systems could be taken out by a single cloud outage, reducing or even eliminating costly downtime.
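The failover pattern described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: ProviderDown, make_endpoint, and serve are hypothetical names, and the fake endpoints simulate provider health with a flag instead of making real network calls.

```python
class ProviderDown(Exception):
    """Raised when a cloud provider's endpoint cannot serve the request."""


def make_endpoint(name, status):
    """Build a fake endpoint; status['up'] simulates the provider's health."""
    def handle(request):
        if not status["up"]:
            raise ProviderDown(name)
        return f"{name} handled {request}"
    return handle


def serve(request, primary, secondary):
    """Try the primary first; fall back to the secondary only if it is down."""
    try:
        return primary(request)
    except ProviderDown:
        return secondary(request)


primary_status = {"up": True}
primary = make_endpoint("primary-cloud", primary_status)
secondary = make_endpoint("secondary-cloud", {"up": True})

before = serve("order-1", primary, secondary)   # served by the primary
primary_status["up"] = False                    # simulated outage begins
during = serve("order-2", primary, secondary)   # served by the secondary
primary_status["up"] = True                     # outage ends
after = serve("order-3", primary, secondary)    # back on the primary
```

Because every request tries the primary first, traffic returns to it automatically once the outage ends. Production setups more commonly switch over using health checks and DNS or load-balancer changes, but the decision logic is the same.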
However, managing multiple cloud environments can be complex. Each cloud provider has different features and functions for storage, databases, computing, security, governance, and more. If you want to use cloud-native features on both clouds, which is typically preferred, the differences become even more problematic. You must also carefully match applications to cloud capabilities and monitor performance and costs. Ultimately, you’ll pay operating costs twice, and you’ll pay roughly twice as much to customize the application and data for a second cloud, especially considering the need for specialized development, database, and administration skills for each. This makes multi-cloud expensive and out of reach for many businesses.
Regardless of how you do it, it’s extremely important to prepare for cloud outages by creating a redundant data access system that won’t go down if one cloud provider does. For more updates like these and information on our IT services, contact Alvarez Technology Group today!