Write-ahead logs (WAL) for Amazon EMR - Amazon EMR

Write-ahead logs (WAL) for Amazon EMR

With Amazon EMR 6.15 and higher, you can write your Apache HBase write-ahead logs (WAL) to the Amazon EMR WAL. With earlier Amazon EMR releases, when you create a cluster with the HBase on Amazon S3 option, the WAL is the only Apache HBase component stored on the cluster's local disk. The other components, such as the root directory, store files (HFiles), table metadata, and data, can be stored on Amazon S3.

You can use Amazon EMR WAL to recover data that was not flushed to Amazon S3. To fully back up your HBase clusters, opt in to the Amazon EMR WAL service. Behind the scenes, the RegionServer writes your HBase write-ahead logs to the Amazon EMR WAL.

If your cluster or its Availability Zone (AZ) becomes unhealthy or unavailable, you can create a new cluster, point it to the same S3 root directory and Amazon EMR WAL workspace, and automatically recover the data in the WAL within a few minutes. For more information, see Restoring from Amazon EMR WAL.
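
The recovery pattern above depends on the replacement cluster using exactly the same S3 root directory and WAL workspace as the original. The following Python sketch illustrates that idea by building one cluster configuration and reusing it for both clusters. The property names `hbase.emr.wal.enabled` and `emr.wal.workspace` are assumptions here and should be checked against the Amazon EMR WAL setup documentation. The bucket path, workspace name, and the helper `wal_configurations` are hypothetical.

```python
import json


def wal_configurations(root_dir, workspace):
    """Build a cluster configuration list for HBase on S3 with EMR WAL.

    Property names marked "assumed" are not taken from this page;
    verify them against the Amazon EMR WAL setup documentation.
    """
    return [
        {
            "Classification": "hbase",
            "Properties": {
                "hbase.emr.storageMode": "s3",
                "hbase.emr.wal.enabled": "true",  # assumed property name
            },
        },
        {
            "Classification": "hbase-site",
            "Properties": {
                "hbase.rootdir": root_dir,
                "emr.wal.workspace": workspace,  # assumed property name
            },
        },
    ]


# Hypothetical values for illustration only.
ROOT_DIR = "s3://amzn-s3-demo-bucket/hbase-root"
WORKSPACE = "my-wal-workspace"

# The replacement cluster reuses the same root directory and workspace,
# so its configuration is identical to the original cluster's.
original_cluster = wal_configurations(ROOT_DIR, WORKSPACE)
replacement_cluster = wal_configurations(ROOT_DIR, WORKSPACE)

print(json.dumps(replacement_cluster, indent=2))
```

Keeping the root directory and workspace in one place, rather than retyping them for the new cluster, reduces the chance that the replacement cluster points at a different location and finds no WAL data to replay.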

Note

Amazon EMR retains your write-ahead log and its data for 30 days from the time you create your cluster. After 30 days, Amazon EMR automatically deletes your Amazon EMR WAL and its data. However, if you launch a new WAL-enabled cluster from the same S3 root directory, you can extend the use of your WAL for 30 days from the launch time of the new cluster. Amazon EMR will still clean up any WAL data from the first cluster after the initial 30-day period. For more information, see Restoring from Amazon EMR WAL.
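
The retention arithmetic in this note can be sketched in Python. The launch dates below are hypothetical, and the helper `wal_expiry` is illustrative only. It models the 30-day window as described above and is not an Amazon EMR API.

```python
from datetime import datetime, timedelta, timezone

# Retention window stated in the note above.
WAL_RETENTION = timedelta(days=30)


def wal_expiry(cluster_created_at):
    """Return the time at which WAL data tied to this cluster launch expires."""
    return cluster_created_at + WAL_RETENTION


# Hypothetical launch times for two clusters sharing one S3 root directory.
first_launch = datetime(2024, 1, 1, tzinfo=timezone.utc)
second_launch = datetime(2024, 1, 20, tzinfo=timezone.utc)

# WAL data from the first cluster is still cleaned up 30 days after
# the first launch, even though a second cluster now uses the WAL.
first_data_expiry = wal_expiry(first_launch)

# The WAL itself remains usable for 30 days from the second launch.
extended_expiry = wal_expiry(second_launch)

print("First cluster's WAL data expires:", first_data_expiry.date())
print("WAL usable until:", extended_expiry.date())
```

With these dates, the first cluster's WAL data expires on 2024-01-31, while the WAL stays usable until 2024-02-19 because of the second launch.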

The following sections describe how to set up and use Amazon EMR WAL with your HBase-enabled EMR cluster.