End-to-end Amazon EMR Java source code sample

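Developers can call the Amazon EMR API from custom Java code to perform the same operations that are possible with the Amazon EMR console or CLI. This section provides the end-to-end steps needed to install the Amazon Toolkit for Eclipse and run a fully functional Java source code sample that adds steps to an Amazon EMR cluster.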

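Note

This sample focuses on Java, but Amazon EMR also supports several other programming languages through a collection of Amazon EMR SDKs. For more information, see Use SDKs to call Amazon EMR APIs.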

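This Java source code sample demonstrates how to perform the following tasks using the Amazon EMR API:

Retrieve Amazon credentials and send them to Amazon EMR to make API calls
Configure a new custom step and a new predefined step
Add the new steps to an existing Amazon EMR cluster
Retrieve cluster step IDs from a running cluster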

Note

This sample demonstrates how to add steps to an existing cluster, so it requires an active cluster in your account.

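Before you begin, install a version of the Eclipse IDE for Java EE Developers that matches your computer platform. For more information, go to Eclipse downloads.

Next, install the Database Development plug-in for Eclipse.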

To install the Database Development Eclipse plug-in

Open the Eclipse IDE.
Choose Help, then Install New Software.
In the Work with: field, type http://download.eclipse.org/releases/kepler or the path that matches the version number of your Eclipse IDE.
In the items list, choose Database Development, then Finish.
Restart Eclipse when prompted.

Next, install the Toolkit for Eclipse, which provides helpful, preconfigured source code project templates.

To install the Toolkit for Eclipse

Open the Eclipse IDE.
Choose Help, then Install New Software.
In the Work with: field, enter https://aws.amazon.com/eclipse.
In the items list, choose Amazon Toolkit for Eclipse, then Finish.
Restart Eclipse when prompted.

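Next, create a new Amazon Java project and run the sample Java source code.

To create a new Amazon Java project

Open the Eclipse IDE.
Choose File, New, and then Other.
In the Select a wizard dialog, choose Amazon Java Project, then Next.
In the New Amazon Java Project dialog, in the Project name: field, enter a name for the new project, for example EMR-sample-code.
Choose Configure Amazon accounts..., enter your public and private access keys (the access key ID and the secret access key), and then choose Finish. For more information about creating access keys, see How do I get security credentials? in the Amazon Web Services General Reference.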

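Note
Do not embed access keys directly in code. The Amazon EMR SDK lets you put access keys in a known location so that they do not have to be kept in your code.
In the new Java project, right-click the src folder, then choose New and Class.
In the Java Class dialog, in the Name field, enter a name for the new class, for example Main. The name must match the public class declared in the source code.
In the Which method stubs would you like to create? section, choose public static void main(String[] args), then Finish.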

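Enter the Java source code in your new class and add the appropriate import statements for the classes and methods used in the sample. For your convenience, the full source code listing follows.

Note

In the following sample code, replace the example cluster ID (JobFlowId), j-xxxxxxxxxxxx, with a valid cluster ID in your account. You can find cluster IDs in the Amazon Web Services Management Console or by running the following Amazon CLI command: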


aws emr list-clusters --active | grep "Id"

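In addition, replace the example Amazon S3 path, s3://path/to/my/jarfolder, with the valid path to your JAR. Finally, replace the example class name, com.my.Main1, with the correct name of the class in your JAR, if applicable.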


import com.amazonaws.AmazonClientException;
import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduce;
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClientBuilder;
import com.amazonaws.services.elasticmapreduce.model.*;
import com.amazonaws.services.elasticmapreduce.util.StepFactory;

public class Main {

	public static void main(String[] args) {
		AWSCredentials credentials_profile = null;
		try {
			credentials_profile = new ProfileCredentialsProvider("default").getCredentials();
		} catch (Exception e) {
			throw new AmazonClientException(
					"Cannot load credentials from .aws/credentials file. " +
							"Make sure that the credentials file exists and the profile name is specified within it.",
					e);
		}

		AmazonElasticMapReduce emr = AmazonElasticMapReduceClientBuilder.standard()
				.withCredentials(new AWSStaticCredentialsProvider(credentials_profile))
				.withRegion(Regions.US_WEST_1)
				.build();

		// Run a bash script using a predefined step in the StepFactory helper class
		StepFactory stepFactory = new StepFactory();
		StepConfig runBashScript = new StepConfig()
				.withName("Run a bash script")
				.withHadoopJarStep(stepFactory.newScriptRunnerStep("s3://jeffgoll/emr-scripts/create_users.sh"))
				.withActionOnFailure("CONTINUE");

		// Run a custom jar file as a step
		HadoopJarStepConfig hadoopConfig1 = new HadoopJarStepConfig()
				.withJar("s3://path/to/my/jarfolder") // replace with the location of the jar to run as a step
				.withMainClass("com.my.Main1") // optional main class, this can be omitted if jar above has a manifest
				.withArgs("--verbose"); // optional list of arguments to pass to the jar
		StepConfig myCustomJarStep = new StepConfig("RunHadoopJar", hadoopConfig1);

		AddJobFlowStepsResult result = emr.addJobFlowSteps(new AddJobFlowStepsRequest()
				.withJobFlowId("j-xxxxxxxxxxxx") // replace with cluster id to run the steps
				.withSteps(runBashScript, myCustomJarStep));

		System.out.println(result.getStepIds());

	}
}
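
Before running the listing, it can help to confirm that the three sample values named in the note (the cluster ID, the JAR path, and the main class) were actually replaced. The following is a minimal sketch of such a check using only the Java standard library. The class name PlaceholderCheck, the method findProblems, and the cluster ID and bucket name passed in main are hypothetical and are not part of the Amazon EMR SDK.

```java
import java.util.ArrayList;
import java.util.List;

public class PlaceholderCheck {

    // Placeholder values copied from the sample listing above.
    static final String SAMPLE_CLUSTER_ID = "j-xxxxxxxxxxxx";
    static final String SAMPLE_JAR_PATH = "s3://path/to/my/jarfolder";
    static final String SAMPLE_MAIN_CLASS = "com.my.Main1";

    /** Returns one message for each value that is missing or still equal to its sample placeholder. */
    public static List<String> findProblems(String clusterId, String jarPath, String mainClass) {
        List<String> problems = new ArrayList<>();
        if (clusterId == null || clusterId.equals(SAMPLE_CLUSTER_ID)) {
            problems.add("cluster ID is still the sample value");
        }
        if (jarPath == null || jarPath.equals(SAMPLE_JAR_PATH)) {
            problems.add("JAR path is still the sample value");
        }
        // The main class is optional, so null is accepted here.
        if (SAMPLE_MAIN_CLASS.equals(mainClass)) {
            problems.add("main class is still the sample value");
        }
        return problems;
    }

    public static void main(String[] args) {
        // Untouched sample values: every check reports a problem.
        System.out.println(findProblems(SAMPLE_CLUSTER_ID, SAMPLE_JAR_PATH, SAMPLE_MAIN_CLASS));
        // Hypothetical replaced values: no problems are reported.
        System.out.println(findProblems("j-2AXXXXXXGAPLF", "s3://amzn-s3-demo-bucket/jars/my-job.jar", null));
    }
}
```

The check only compares against the literal placeholders. It does not verify that the cluster exists or that the JAR is reachable, which only the API call itself can confirm.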

Choose Run, Run As, and then Java Application.
If the sample runs correctly, a list of IDs for the new steps appears in the Eclipse IDE console window. The correct output is similar to the following:
```
[s-39BLQZRJB2E5E, s-1L6A4ZU2SAURC]
```
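
The listing loads credentials from the profile named default in the .aws/credentials file, which is the "known location" that keeps access keys out of the code. To illustrate what that file contains, the following sketch reads one profile from INI-style text using only the Java standard library. It is a simplified illustration, not the SDK's own parser. The class name CredentialsFileSketch, the method readProfile, and the key values are hypothetical placeholders.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CredentialsFileSketch {

    /**
     * Collects the key = value pairs listed under [profileName] in INI-style text.
     * Simplified for illustration; the SDK's real parser handles more cases.
     */
    public static Map<String, String> readProfile(String fileText, String profileName) {
        Map<String, String> values = new LinkedHashMap<>();
        boolean inProfile = false;
        for (String rawLine : fileText.split("\\R")) {
            String line = rawLine.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            if (line.startsWith("[") && line.endsWith("]")) {
                // A section header such as [default] starts a new profile.
                String header = line.substring(1, line.length() - 1).trim();
                inProfile = header.equals(profileName);
                continue;
            }
            int eq = line.indexOf('=');
            if (inProfile && eq > 0) {
                values.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Placeholder values only; real keys belong in the file, never in source code.
        String sample = String.join("\n",
                "[default]",
                "aws_access_key_id = EXAMPLE_ACCESS_KEY_ID",
                "aws_secret_access_key = EXAMPLE_SECRET_ACCESS_KEY",
                "",
                "[other]",
                "aws_access_key_id = OTHER_KEY_ID");
        Map<String, String> profile = readProfile(sample, "default");
        System.out.println(profile.keySet());
    }
}
```

If the listing throws the "Cannot load credentials" exception, a likely cause is that the file has no [default] section or that the two key names are misspelled.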
