Considerations with Sqoop on Amazon EMR
Consider the following items when you run Sqoop on Amazon EMR.
Using Sqoop with HCatalog integration
Sqoop on Amazon EMR supports Sqoop-HCatalog integrationmapred.output.direct.NativeS3FileSystem
and mapred.output.direct.EmrFileSystem
properties to false
. For more information, see Using HCatalog. You can use the
Hadoop -D mapred.output.direct.NativeS3FileSystem=false
and -D
mapred.output.direct.EmrFileSystem=false
commands. If you don't disable
direct write, no error occurs, but the table is created in Amazon S3 and no data is
written.
Sqoop JDBC and database support
By default, Sqoop has a MariaDB and PostgreSQL driver installed. The PostgreSQL
driver installed for Sqoop only works for PostgreSQL 8.4. To install an alternate
set of JDBC connectors for Sqoop, connect to the cluster master node and install
them in /usr/lib/sqoop/lib
. The following are links for various
JDBC connectors:
-
MariaDB: About MariaDB Connector/J
. -
PostgreSQL: PostgreSQL JDBC driver
. -
SQLServer: Download Microsoft JDBC driver for SQL Server
. -
MySQL: Download Connector/J
-
Oracle: Get Oracle JDBC drivers and UCP from the Oracle Maven repository
The supported databases for Sqoop are listed at the following url,
http://sqoop.apache.org/docs/version
/SqoopUserGuide.html#_supported_databases,
where version
is the version of Sqoop you are using, for
example 1.4.6. If the JDBC connect string does not match those in this list, you
must specify a driver.
For example, you can export to an Amazon Redshift database table with the following command (for JDBC 4.1):
sqoop export --connect jdbc:redshift://$
MYREDSHIFTHOST
:5439/mydb
--tablemysqoopexport
--export-dir s3://amzn-s3-demo-bucket/myinputfiles/
--driver com.amazon.redshift.jdbc41.Driver --usernamemaster
--passwordMymasterpass1
You can use both the MariaDB and MySQL connection strings but if you specify the MariaDB connection string, you need to specify the driver:
sqoop export --connect jdbc:mariadb://$
HOSTNAME
:3306/mydb
--tablemysqoopexport
--export-dir s3://amzn-s3-demo-bucket/myinputfiles/
--driver org.mariadb.jdbc.Driver --usernamemaster
--passwordMymasterpass1
If you are using Secure Socket Layer encryption to access your database, you need to use a JDBC URI like in the following Sqoop export example:
sqoop export --connect jdbc:mariadb://$
HOSTNAME
:3306/mydb
?verifyServerCertificate=false&useSSL=true&requireSSL=true --tablemysqoopexport
--export-dir s3://amzn-s3-demo-bucket/myinputfiles/
--driver org.mariadb.jdbc.Driver --usernamemaster
--passwordMymasterpass1
For more information about SSL encryption in RDS, see Using SSL to encrypt a connection to a DB instance in the Amazon RDS User Guide.
For more information, see the Apache
Sqoop
Securing your password
There are several methods that you might choose from to securely pass your password: