AWS EBS Volumes Aren't Safe from Failure, So Back Up to S3

EBS (Elastic Block Store) is a block storage service offered by AWS. If you’re running an EC2 instance, you’re almost certainly using it, since it serves as the storage drive for your server. However, it isn’t immune to failure, and you should always take regular backups.

“Fault tolerant” does not mean safe

Of course, EBS is quite fault tolerant on the backend. AWS isn’t a bunch of amateurs running a JBOD array. Each volume is replicated within its Availability Zone, so a single failed drive won’t take down your server.

However, EBS failures can and do happen. EBS volumes are designed for an annual failure rate (AFR) of 0.1% to 0.2%. That isn’t much, and it’s far lower than the roughly 4% AFR of a single commodity hard drive, but it isn’t zero. Any one volume is unlikely to fail on you, but if you run a large fleet of them, you should expect to lose a few over time.
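The following sketch puts "a large fleet" into numbers. It assumes that volumes fail independently at the quoted AFR. That is a simplification, because real failures can be correlated, as the outage described below shows. The 1,000-volume fleet is a hypothetical example.

```python
# Expected EBS volume failures per year for a fleet, ASSUMING each volume
# fails independently at the quoted annual failure rate (AFR).


def expected_failures(num_volumes: int, afr: float) -> float:
    """Average number of volumes expected to fail in one year."""
    return num_volumes * afr


def prob_at_least_one_failure(num_volumes: int, afr: float) -> float:
    """Chance that at least one volume in the fleet fails within a year."""
    return 1.0 - (1.0 - afr) ** num_volumes


if __name__ == "__main__":
    # Hypothetical fleet of 1,000 volumes at both ends of the 0.1%-0.2% range.
    for afr in (0.001, 0.002):
        print(f"AFR {afr:.1%}: "
              f"expected={expected_failures(1000, afr):.1f}, "
              f"P(>=1)={prob_at_least_one_failure(1000, afr):.1%}")
```

Under these assumptions, a 1,000-volume fleet averages one or two dead volumes a year. The chance of losing at least one volume is about 63% at a 0.1% AFR and about 86% at a 0.2% AFR.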

The easy solution, of course, is to make backups, and EBS provides a great tool for this: snapshots. A snapshot is a point-in-time backup of a volume stored in S3, which is much more durable. If an EBS volume fails, you can restore it from a snapshot. You don’t need to automate this yourself, because Amazon Data Lifecycle Manager can manage it for you, but it isn’t enabled by default. You will, of course, pay for the extra storage the snapshots use in S3, but this is usually cheaper than EBS volume storage.
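Lifecycle Manager is the hands-off route, but you can also take a single snapshot from code. The following is a minimal sketch using boto3 (the AWS SDK for Python). It assumes credentials are already configured. The `CreatedBy` tag, the volume ID, and the helper function names are hypothetical. The request is built separately from the API call, so you can inspect it without touching AWS.

```python
# Minimal sketch: take a one-off EBS snapshot with boto3.
# Helper names and the CreatedBy tag are HYPOTHETICAL, not part of any AWS API.
from typing import Any, Dict


def build_snapshot_request(volume_id: str, description: str) -> Dict[str, Any]:
    """Keyword arguments for EC2 CreateSnapshot, tagging the new snapshot."""
    return {
        "VolumeId": volume_id,
        "Description": description,
        "TagSpecifications": [{
            "ResourceType": "snapshot",
            "Tags": [{"Key": "CreatedBy", "Value": "manual-backup"}],
        }],
    }


def snapshot_volume(ec2_client: Any, volume_id: str, description: str) -> str:
    """Start a snapshot and return its ID.

    ec2_client is assumed to be boto3.client("ec2"). The snapshot is created
    asynchronously and starts out in the "pending" state.
    """
    response = ec2_client.create_snapshot(**build_snapshot_request(volume_id, description))
    return response["SnapshotId"]
```

Against a real account, you would call something like `snapshot_volume(boto3.client("ec2"), "vol-0123456789abcdef0", "nightly backup")`. If you need to block until the snapshot finishes, use the client’s `snapshot_completed` waiter.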

AWS doesn’t hide this fact and recommends taking regular snapshots. Most people will also tell you to keep backups in general, but it’s easy to get carried away by the magic of the cloud and forget. At the end of the day, the cloud is just someone else’s computer, and it can crash like any other. An extreme example came at the end of August 2019. A data center in AWS’s US-EAST-1 region lost power and its backup generators failed, wiping out EBS servers and the data stored on them.

Amazon AWS experienced a power outage, its backup generators failed, which killed their EBS servers, which took all of our data with it. Then it took them four days to figure it out and tell us about it.

Reminder: the cloud is just a Reston computer with poor power.

– Andy Hunt (@PragmaticAndy) September 3, 2019

The whole point of high-availability architecture, and of cloud computing in general, is to make sure that isolated failures don’t take down the entire application when they inevitably occur. You should still take steps to prevent failures in the first place. Sometimes, though, as with hard drives, the failure is a hardware problem rather than something you can fix with code.

S3, on the other hand, is extremely durable. It is designed for 99.999999999% durability (that’s eleven nines). If you store 10,000,000 objects in S3, you can expect to lose a single object once every 10,000 years on average. The reason is replication. EBS replicates data only within a single Availability Zone. S3’s standard storage classes store data redundantly across at least three Availability Zones, quickly detecting and repairing any lost redundancy. Even if an entire data center catches fire, your S3 data, including your EBS snapshots, should still be safe.
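The "once every 10,000 years" figure follows directly from the eleven nines. The sketch below treats the durability target as an annual loss probability of 1e-11 per object, which is a simplification of a design goal rather than a guarantee.

```python
# Back-of-the-envelope S3 durability math. ASSUMES "eleven nines" means an
# annual loss probability of 1e-11 per object (a design target, not a promise).
ANNUAL_OBJECT_LOSS_RATE = 1e-11  # 100% - 99.999999999%


def expected_objects_lost_per_year(num_objects: int) -> float:
    """Average number of objects lost per year."""
    return num_objects * ANNUAL_OBJECT_LOSS_RATE


def years_per_lost_object(num_objects: int) -> float:
    """Average number of years between single-object losses."""
    return 1.0 / expected_objects_lost_per_year(num_objects)


print(years_per_lost_object(10_000_000))  # roughly 10,000 years
```
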

How do EBS snapshots work?

EBS snapshots are incremental backups. Each subsequent snapshot stores only the blocks that have changed since the previous one, so you won’t rack up insane storage costs by taking regular snapshots.
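The following conceptual model shows the idea. A volume is represented as a map of block indexes to contents. Each snapshot keeps only the blocks that differ from the state captured so far, and a restore replays the chain in order. This illustrates the principle only. It is not how EBS is implemented internally, and it ignores details such as how AWS retains blocks still needed when you delete an older snapshot.

```python
# Conceptual model of incremental snapshots (NOT EBS's real implementation):
# each snapshot stores only changed blocks; a restore replays the chain.
from typing import Dict, List

Volume = Dict[int, bytes]  # block index -> block contents


def restore(chain: List[Volume]) -> Volume:
    """Rebuild the full volume by applying each incremental snapshot in order."""
    state: Volume = {}
    for delta in chain:
        state.update(delta)
    return state


def take_snapshot(volume: Volume, chain: List[Volume]) -> Volume:
    """Append a snapshot holding only blocks that differ from the latest state."""
    previous = restore(chain)
    delta = {i: data for i, data in volume.items() if previous.get(i) != data}
    chain.append(delta)
    return delta


chain: List[Volume] = []
disk: Volume = {0: b"boot", 1: b"data-v1", 2: b"logs"}
take_snapshot(disk, chain)   # first snapshot stores all 3 blocks
disk[1] = b"data-v2"
take_snapshot(disk, chain)   # second snapshot stores only block 1
print([len(d) for d in chain])  # [3, 1]
```
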

Setting them up is quite simple. In the EC2 console, go to Elastic Block Store > Lifecycle Manager in the sidebar and create a new policy.

You will need to specify the tag that this policy applies to. This can be the Name tag of a single EBS volume or a more general tag that you put on every volume you want backed up.
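Because the policy selects volumes by tag, it’s worth checking which volumes a tag actually matches before you rely on it. The following sketch queries EC2 for volumes carrying a given tag using boto3’s `describe_volumes` paginator. The tag key and value you pass in (for example `Backup=daily`) are hypothetical, and `ec2_client` is assumed to be `boto3.client("ec2")`.

```python
# Sketch: preview which volumes a tag-targeted policy would cover by asking
# EC2 for volumes with the same tag. Tag key/value are HYPOTHETICAL examples.
from typing import Any, Dict, List


def tag_filter(key: str, value: str) -> List[Dict[str, Any]]:
    """DescribeVolumes filter matching volumes tagged key=value."""
    return [{"Name": f"tag:{key}", "Values": [value]}]


def volumes_with_tag(ec2_client: Any, key: str, value: str) -> List[str]:
    """Return the IDs of all volumes tagged key=value.

    ec2_client is assumed to be boto3.client("ec2"). A paginator is used
    because DescribeVolumes can return its results across several pages.
    """
    paginator = ec2_client.get_paginator("describe_volumes")
    volume_ids: List[str] = []
    for page in paginator.paginate(Filters=tag_filter(key, value)):
        volume_ids.extend(v["VolumeId"] for v in page["Volumes"])
    return volume_ids
```

If the list comes back empty, the policy will have nothing to snapshot, which is an easy mistake to miss in the console.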

You can set the schedule for this policy as well as its retention rule. You usually don’t need an extensive history of backups, so keeping a handful, depending on how often you take snapshots, should be enough.
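You can express the same policy in code through Amazon Data Lifecycle Manager’s `CreateLifecyclePolicy` API, using boto3’s `dlm` client. The following is a minimal sketch that takes one snapshot every 24 hours and keeps the newest N. The schedule name, the 03:00 UTC start time, and the helper names are hypothetical. The execution role is assumed to be one you already have, such as the `AWSDataLifecycleManagerDefaultRole` that the console can create.

```python
# Sketch: a daily, tag-targeted snapshot policy for Amazon Data Lifecycle
# Manager. Schedule name, start time, and helper names are HYPOTHETICAL.
from typing import Any, Dict


def build_daily_snapshot_policy(tag_key: str, tag_value: str,
                                retain_count: int) -> Dict[str, Any]:
    """PolicyDetails for one snapshot every 24 hours, keeping the newest N."""
    return {
        "PolicyType": "EBS_SNAPSHOT_MANAGEMENT",
        "ResourceTypes": ["VOLUME"],
        "TargetTags": [{"Key": tag_key, "Value": tag_value}],
        "Schedules": [{
            "Name": "DailySnapshots",
            "CreateRule": {"Interval": 24, "IntervalUnit": "HOURS",
                           "Times": ["03:00"]},  # start time, in UTC
            "RetainRule": {"Count": retain_count},  # older snapshots are deleted
            "CopyTags": True,  # copy the volume's tags onto each snapshot
        }],
    }


def create_policy(dlm_client: Any, role_arn: str, details: Dict[str, Any]) -> str:
    """Create and enable the policy; dlm_client is assumed to be boto3.client("dlm")."""
    response = dlm_client.create_lifecycle_policy(
        ExecutionRoleArn=role_arn,
        Description="Daily EBS snapshots",
        State="ENABLED",
        PolicyDetails=details,
    )
    return response["PolicyId"]
```

A count-based retention rule keeps costs predictable. With daily snapshots and a count of 7, you always have roughly a week of restore points, and because snapshots are incremental you pay mostly for the blocks that changed.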

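If you are serious about high availability, you can also enable Fast Snapshot Restore. Volumes created from a snapshot normally load their data from S3 lazily on first access. With Fast Snapshot Restore, they deliver their full performance immediately. However, it is quite expensive, so it isn’t something everyone should turn on.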
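Fast Snapshot Restore is enabled per snapshot and per Availability Zone, and it is billed for each zone where it stays enabled. It therefore pays to be explicit about where you turn it on. The following sketch uses EC2’s `enable_fast_snapshot_restores` call through boto3. The snapshot ID, zone names, and helper names are hypothetical.

```python
# Sketch: enable Fast Snapshot Restore (FSR) for one snapshot in chosen
# Availability Zones. Snapshot ID, zones, and helper names are HYPOTHETICAL.
from typing import Any, Dict, List


def build_fsr_request(snapshot_id: str, zones: List[str]) -> Dict[str, Any]:
    """Keyword arguments for EnableFastSnapshotRestores, with zones deduplicated."""
    if not zones:
        raise ValueError("specify at least one Availability Zone")
    return {"AvailabilityZones": sorted(set(zones)),
            "SourceSnapshotIds": [snapshot_id]}


def enable_fsr(ec2_client: Any, snapshot_id: str, zones: List[str]) -> List[str]:
    """Enable FSR; return the zones where the request succeeded.

    ec2_client is assumed to be boto3.client("ec2"). Zones that AWS rejects
    are reported under "Unsuccessful" in the response and are not returned.
    """
    response = ec2_client.enable_fast_snapshot_restores(
        **build_fsr_request(snapshot_id, zones))
    return [item["AvailabilityZone"] for item in response.get("Successful", [])]
```

Limiting FSR to the one or two zones where you would actually restore, and only for your most recent critical snapshot, keeps the cost in check.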
