Tutorial: Working with Amazon DynamoDB and Apache Hive
In this tutorial, you will launch an Amazon EMR cluster, and then use Apache Hive to process data stored in a DynamoDB table.
Hive is a data warehouse application for Hadoop that allows you to process and analyze data from multiple sources. Hive provides a SQL-like language, HiveQL, that lets you work with data stored locally in the Amazon EMR cluster or in an external data source (such as Amazon DynamoDB).
For more information, see the Hive Tutorial.
Before you begin
For this tutorial, you will need the following:
- An Amazon account. If you do not have one, see Signing up for Amazon.
- An SSH (Secure Shell) client. You use the SSH client to connect to the leader node of the Amazon EMR cluster and run interactive commands. SSH clients are available by default on most Linux, Unix, and Mac OS X installations. Windows users can download and install the PuTTY client, which has SSH support.
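As a rough sketch of the SSH prerequisite, the following shell snippet checks whether an SSH client is on your PATH and prints the general form of the command you would later use to reach the leader node. The key file name and the DNS name are hypothetical placeholders, not values from this tutorial; substitute your own once the key pair and cluster exist. The snippet only prints the command and does not open a connection.

```shell
# Check whether an SSH client is available on the PATH.
if command -v ssh >/dev/null 2>&1; then
  echo "SSH client found: $(command -v ssh)"
else
  echo "No SSH client found; install OpenSSH (or PuTTY on Windows)."
fi

# Hypothetical placeholders (assumptions, not values from this tutorial):
KEY_FILE="$HOME/mykeypair.pem"       # private key file of your EC2 key pair
LEADER_DNS="leader-public-dns-name"  # public DNS name of the cluster's leader node

# Compose the connection command. "hadoop" is the default login user
# on Amazon EMR cluster nodes.
SSH_CMD="ssh -i $KEY_FILE hadoop@$LEADER_DNS"

# Print the command rather than running it, since no cluster exists yet.
echo "$SSH_CMD"
```

On Windows with PuTTY, the equivalent settings (host name, user name, and private key) are entered in the PuTTY configuration dialog instead of on a command line.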
Next step
Step 1: Create an Amazon EC2 key pair