Using HBase snapshots
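HBase uses built-in snapshot functionality to create lightweight backups of tables. You can run the commands in this topic interactively, or as a step through command-runner.jar with either the Amazon CLI or Amazon SDK for Java. For more information about other types of HBase backups, see HBase backup in the HBase documentation.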
Create a snapshot using a table
hbase snapshot create -n snapshotName -t tableName
Using command-runner.jar from the Amazon CLI:
aws emr add-steps --cluster-id j-2AXXXXXXGAPLF \
--steps Name="HBase Shell Step",Jar="command-runner.jar",\
Args=["hbase","snapshot","create","-n","snapshotName","-t","tableName"]
Amazon SDK for Java:
HadoopJarStepConfig hbaseSnapshotConf = new HadoopJarStepConfig()
    .withJar("command-runner.jar")
    .withArgs("hbase","snapshot","create","-n","snapshotName","-t","tableName");
Note
If your snapshot name is not unique, the create operation fails with a return code of -1 or 255, but you may not see an error message that states what went wrong. To reuse a snapshot name, delete the existing snapshot and then re-create it.
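Because a duplicate name fails without a clear message, one way to avoid collisions is to derive each snapshot name from the table name and a timestamp. The following is a minimal sketch, not part of HBase or Amazon EMR: the function name `unique_snapshot_name` and the naming scheme are assumptions made for illustration, and two calls within the same second would still produce the same name.

```shell
#!/bin/sh
# Hypothetical helper: builds a snapshot name from a table name and a UTC
# timestamp so that repeated runs do not reuse an existing snapshot name.
unique_snapshot_name() {
    table="$1"
    # An optional second argument supplies a fixed timestamp (handy for testing).
    stamp="${2:-$(date -u +%Y%m%dT%H%M%SZ)}"
    printf '%s-snapshot-%s\n' "$table" "$stamp"
}

# Dry run: print the create command instead of executing it.
name="$(unique_snapshot_name tableName)"
echo "hbase snapshot create -n $name -t tableName"
```

The script only prints the command, so it can be reviewed before it is run on the cluster or passed to a step.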
Delete a snapshot
hbase shell
>> delete_snapshot 'snapshotName'
View snapshot info
hbase snapshot info -snapshot snapshotName
Export a snapshot to Amazon S3
Important
If you do not specify a -mappers value when exporting a snapshot, HBase uses an arbitrary calculation to determine the number of mappers. This value can be very large depending on your table size, which negatively affects running jobs during the export. For this reason, we recommend that you specify the -mappers parameter, the -bandwidth parameter (which specifies the bandwidth consumption in megabytes per second), or both to limit the cluster resources used by the export operation. Alternatively, you can run the export snapshot operation during a period of low usage.
hbase snapshot export -snapshot snapshotName \
-copy-to s3://amzn-s3-demo-bucket/folder -mappers 2
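To make sure an export never runs without resource limits, the command can be assembled by a small wrapper that always sets both -mappers and -bandwidth. This is a sketch under stated assumptions: `build_export_command` is a hypothetical helper, and its defaults (2 mappers, 50 MB per second) simply reuse the values from the examples in this topic rather than recommending them.

```shell
#!/bin/sh
# Hypothetical helper: prints an export command that always carries explicit
# -mappers and -bandwidth values. The defaults reuse the example values from
# this topic and are not tuning advice.
build_export_command() {
    snapshot="$1"
    destination="$2"
    mappers="${3:-2}"
    bandwidth="${4:-50}"
    printf 'hbase snapshot export -snapshot %s -copy-to %s -mappers %s -bandwidth %s\n' \
        "$snapshot" "$destination" "$mappers" "$bandwidth"
}

# Dry run: print the command for review before running it on the cluster.
build_export_command snapshotName s3://amzn-s3-demo-bucket/folder
```

Printing the command rather than executing it keeps the sketch safe to try anywhere. Suitable values depend on the table size and on what else is running on the cluster.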
Using command-runner.jar from the Amazon CLI:
aws emr add-steps --cluster-id j-2AXXXXXXGAPLF \
--steps Name="HBase Shell Step",Jar="command-runner.jar",\
Args=["hbase","snapshot","export","-snapshot","snapshotName","-copy-to","s3://amzn-s3-demo-bucket/folder","-mappers","2","-bandwidth","50"]
Amazon SDK for Java:
HadoopJarStepConfig hbaseExportSnapshotConf = new HadoopJarStepConfig()
    .withJar("command-runner.jar")
    .withArgs("hbase","snapshot","export",
        "-snapshot","snapshotName","-copy-to",
        "s3://bucketName/folder",
        "-mappers","2","-bandwidth","50");
Import snapshot from Amazon S3
Although this is an import, the HBase option used here is still export.
sudo -u hbase hbase snapshot export \
-D hbase.rootdir=s3://amzn-s3-demo-bucket/folder \
-snapshot snapshotName \
-copy-to hdfs://masterPublicDNSName:8020/user/hbase \
-mappers 2
Using command-runner.jar from the Amazon CLI:
aws emr add-steps --cluster-id j-2AXXXXXXGAPLF \
--steps Name="HBase Shell Step",Jar="command-runner.jar",\
Args=["sudo","-u","hbase","hbase","snapshot","export","-D","hbase.rootdir=s3://amzn-s3-demo-bucket/folder","-snapshot","snapshotName","-copy-to","hdfs://masterPublicDNSName:8020/user/hbase","-mappers","2","-chmod","700"]
Amazon SDK for Java:
HadoopJarStepConfig hbaseImportSnapshotConf = new HadoopJarStepConfig()
    .withJar("command-runner.jar")
    .withArgs("sudo","-u","hbase","hbase","snapshot","export",
        "-D","hbase.rootdir=s3://path/to/snapshot",
        "-snapshot","snapshotName","-copy-to",
        "hdfs://masterPublicDNSName:8020/user/hbase",
        "-mappers","2","-chuser","hbase");
Restore a table from snapshots within the HBase shell
hbase shell
>> disable 'tableName'
>> restore_snapshot 'snapshotName'
>> enable 'tableName'
The hbase command-line tool currently does not support all snapshot commands found in the HBase shell. For example, there is no HBase command-line option to restore a snapshot, so you must restore it within a shell. This means that command-runner.jar must run a Bash command.
Note
Because the command used here is echo, it is possible that your shell command will still fail even if the command run by Amazon EMR returns a 0 exit code. Check the step logs if you choose to run a shell command as a step.
echo $'disable "tableName"; restore_snapshot "snapshotName"; enable "tableName"' | hbase shell
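One way to reduce the risk of a silent failure is to run the HBase shell in non-interactive mode (`-n`), which the HBase reference guide describes as passing the success or failure of the shell commands back as the exit status. The sketch below assumes that the installed HBase version supports `-n`. The function names and the `HBASE_CMD` override are hypothetical and exist only so that the example can be tried without a cluster.

```shell
#!/bin/sh
# Prints the HBase shell commands that restore a table from a snapshot.
restore_commands() {
    table="$1"
    snapshot="$2"
    printf "disable '%s'\nrestore_snapshot '%s'\nenable '%s'\n" \
        "$table" "$snapshot" "$table"
}

# Pipes the commands to the HBase shell in non-interactive mode, so the
# function's exit status is that of the shell rather than that of echo.
# HBASE_CMD is a hypothetical override (for example HBASE_CMD=cat) that
# replaces the real command during a dry run.
restore_table() {
    restore_commands "$1" "$2" | ${HBASE_CMD:-hbase shell -n}
}

# Dry run: show what would be sent to the HBase shell.
restore_commands tableName snapshotName
```

On a cluster, `restore_table tableName snapshotName || echo "restore failed"` would then report a failed restore. Checking the step logs is still advisable.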
Here is the step using the Amazon CLI. First, create the following snapshot.json file:
[
  {
    "Name": "restore",
    "Args": ["bash", "-c", "echo $'disable \"tableName\"; restore_snapshot \"snapshotName\"; enable \"tableName\"' | hbase shell"],
    "Jar": "command-runner.jar",
    "ActionOnFailure": "CONTINUE",
    "Type": "CUSTOM_JAR"
  }
]
aws emr add-steps --cluster-id j-2AXXXXXXGAPLF \
--steps file://./snapshot.json
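The escaped quotes in snapshot.json are easy to get wrong when edited by hand, so the file can instead be generated from the table and snapshot names. The following is a minimal sketch: `write_restore_step` is a hypothetical helper, and it assumes that the names contain no quotes, backslashes, or dollar signs, because they are inserted without further escaping.

```shell
#!/bin/sh
# Hypothetical helper: prints a step definition in the same shape as the
# snapshot.json file for the given table and snapshot names.
write_restore_step() {
    table="$1"
    snapshot="$2"
    # In an unquoted here-document, \$ yields a literal $ and \" is kept
    # as-is, so the output carries the JSON-escaped quotes the step needs.
    cat <<EOF
[
  {
    "Name": "restore",
    "Args": ["bash", "-c", "echo \$'disable \"$table\"; restore_snapshot \"$snapshot\"; enable \"$table\"' | hbase shell"],
    "Jar": "command-runner.jar",
    "ActionOnFailure": "CONTINUE",
    "Type": "CUSTOM_JAR"
  }
]
EOF
}

# Print the step definition; redirect the output to snapshot.json to use it.
write_restore_step tableName snapshotName
```

Redirecting the output, for example `write_restore_step tableName snapshotName > snapshot.json`, produces a file that the add-steps command can read with `file://./snapshot.json`.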
Amazon SDK for Java:
HadoopJarStepConfig hbaseRestoreSnapshotConf = new HadoopJarStepConfig()
    .withJar("command-runner.jar")
    .withArgs("bash","-c","echo $'disable \"tableName\"; restore_snapshot \"snapshotName\"; enable \"tableName\"' | hbase shell");