Manage an Amazon EMR Job - AWS Step Functions

Manage an Amazon EMR Job

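This sample project demonstrates the integration between Amazon EMR and AWS Step Functions.

It shows how to create an Amazon EMR cluster, add multiple steps and run them, and then terminate the cluster.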

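Important

Amazon EMR does not have a free pricing tier. Running the sample project will incur costs. You can find pricing information on the Amazon EMR pricing page. The availability of the Amazon EMR service integration depends on the availability of the Amazon EMR APIs. Because of this, the sample project might not work correctly in some AWS Regions. See the Amazon EMR documentation for limitations in special Regions.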

Create the State Machine and Provision Resources

  1. Open the Step Functions console, and then choose Create a state machine.

  2. Choose Sample Projects, and then choose Manage an EMR Job.

    The state machine Code and Visual Workflow are displayed.

    
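  3. Choose Next.

    The Deploy resources page is displayed, listing the resources that will be created. For this sample project, the resources include an Amazon S3 bucket.

  4. Choose Deploy Resources.

    Note

    It can take up to 10 minutes to create these resources and the related AWS Identity and Access Management (IAM) permissions. While the Deploy resources page is displayed, you can open the Stack ID link to see which resources are being provisioned.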

Start a New Execution

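  1. On the New execution page, enter an execution name (optional), and then choose Start Execution.

  2. (Optional) To help identify your execution, you can specify an ID for it in the Enter an execution name box. If you don't enter an ID, Step Functions automatically generates a unique ID.

    Note

    Step Functions lets you create state machine, execution, and activity names that contain non-ASCII characters. These non-ASCII names don't work with Amazon CloudWatch. To make sure that you can track CloudWatch metrics, choose a name that uses only ASCII characters.

  3. (Optional) You can go to the newly created state machine on the Step Functions Dashboard, and then choose New execution.

  4. When the execution is complete, you can select states on the Visual workflow, and browse the Input and Output under Step details.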
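The naming note in step 2 can be checked before an execution is started. The following is a minimal sketch, assuming Python 3.7 or later; the helper name `is_cloudwatch_safe_name` and the sample names are hypothetical and are not part of the sample project. It checks only the ASCII requirement and does not validate any other naming rules that Step Functions may apply.

```python
def is_cloudwatch_safe_name(name: str) -> bool:
    """Return True when the name is non-empty and uses only ASCII characters.

    Hypothetical helper: it covers only the ASCII requirement from the note,
    not any other Step Functions naming constraints.
    """
    return bool(name) and name.isascii()


if __name__ == "__main__":
    # Sample names are illustrative only.
    for candidate in ("emr-job-run-001", "emr-作业-001"):
        verdict = "ok" if is_cloudwatch_safe_name(candidate) else "contains non-ASCII characters"
        print(f"{candidate}: {verdict}")
```

A check like this could run before the execution name is submitted, so that a name that would not be trackable in CloudWatch metrics is caught early.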

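Example State Machine Code

The state machine in this sample project integrates with Amazon EMR by passing parameters directly to those resources. Browse through this example state machine to see how Step Functions uses a state machine to call the Amazon EMR task synchronously, wait for the task to succeed or fail, and terminate the cluster.

For more information about how AWS Step Functions can control other AWS services, see Service Integrations with AWS Step Functions.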
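Each Task state in this sample names its Amazon EMR integration through a `Resource` string such as `arn:<PARTITION>:states:::elasticmapreduce:createCluster.sync`, where the `.sync` suffix marks a call that Step Functions waits on until the job finishes. The sketch below splits such a string into its parts to make that visible. The `Integration` type and `parse_integration_resource` function are hypothetical names introduced for illustration, and the parser assumes the seven-field form used in this sample rather than every possible `Resource` value.

```python
from typing import NamedTuple


class Integration(NamedTuple):
    """Parts of a Step Functions service integration Resource string."""
    service: str
    action: str
    waits_for_completion: bool


def parse_integration_resource(resource: str) -> Integration:
    """Split 'arn:<partition>:states:::<service>:<action>[.sync]' into parts.

    Hypothetical helper: it only handles the seven-field form shown in this
    sample project.
    """
    parts = resource.split(":")
    if len(parts) != 7 or parts[0] != "arn" or parts[2] != "states":
        raise ValueError(f"not a service integration resource: {resource!r}")
    # The last field is '<action>' or '<action>.<suffix>'.
    action, _, suffix = parts[6].partition(".")
    return Integration(service=parts[5], action=action,
                       waits_for_completion=(suffix == "sync"))


if __name__ == "__main__":
    for res in (
        "arn:<PARTITION>:states:::elasticmapreduce:createCluster.sync",
        "arn:<PARTITION>:states:::elasticmapreduce:addStep.sync",
        "arn:<PARTITION>:states:::elasticmapreduce:terminateCluster",
    ):
        info = parse_integration_resource(res)
        print(info.service, info.action, info.waits_for_completion)
```

In this sample, `createCluster.sync` and `addStep.sync` are the calls that wait for completion, while `terminateCluster` carries no suffix and returns once the request has been made.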

{
  "Comment": "An example of the Amazon States Language for running jobs on Amazon EMR",
  "StartAt": "Create an EMR cluster",
  "States": {
    "Create an EMR cluster": {
      "Type": "Task",
      "Resource": "arn:<PARTITION>:states:::elasticmapreduce:createCluster.sync",
      "Parameters": {
        "Name": "ExampleCluster",
        "VisibleToAllUsers": true,
        "ReleaseLabel": "emr-5.26.0",
        "Applications": [
          { "Name": "Hive" }
        ],
        "ServiceRole": "<EMR_SERVICE_ROLE>",
        "JobFlowRole": "<EMR_EC2_INSTANCE_PROFILE>",
        "LogUri": "s3://<EMR_LOG_S3_BUCKET>/logs/",
        "Instances": {
          "KeepJobFlowAliveWhenNoSteps": true,
          "InstanceFleets": [
            {
              "Name": "MyMasterFleet",
              "InstanceFleetType": "MASTER",
              "TargetOnDemandCapacity": 1,
              "InstanceTypeConfigs": [
                { "InstanceType": "m5.xlarge" }
              ]
            },
            {
              "Name": "MyCoreFleet",
              "InstanceFleetType": "CORE",
              "TargetOnDemandCapacity": 1,
              "InstanceTypeConfigs": [
                { "InstanceType": "m5.xlarge" }
              ]
            }
          ]
        }
      },
      "ResultPath": "$.cluster",
      "Next": "Run first step"
    },
    "Run first step": {
      "Type": "Task",
      "Resource": "arn:<PARTITION>:states:::elasticmapreduce:addStep.sync",
      "Parameters": {
        "ClusterId.$": "$.cluster.ClusterId",
        "Step": {
          "Name": "My first EMR step",
          "ActionOnFailure": "CONTINUE",
          "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["<COMMAND_ARGUMENTS>"]
          }
        }
      },
      "Retry": [
        {
          "ErrorEquals": [ "States.ALL" ],
          "IntervalSeconds": 1,
          "MaxAttempts": 3,
          "BackoffRate": 2.0
        }
      ],
      "ResultPath": "$.firstStep",
      "Next": "Run second step"
    },
    "Run second step": {
      "Type": "Task",
      "Resource": "arn:<PARTITION>:states:::elasticmapreduce:addStep.sync",
      "Parameters": {
        "ClusterId.$": "$.cluster.ClusterId",
        "Step": {
          "Name": "My second EMR step",
          "ActionOnFailure": "CONTINUE",
          "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["<COMMAND_ARGUMENTS>"]
          }
        }
      },
      "Retry": [
        {
          "ErrorEquals": [ "States.ALL" ],
          "IntervalSeconds": 1,
          "MaxAttempts": 3,
          "BackoffRate": 2.0
        }
      ],
      "ResultPath": "$.secondStep",
      "Next": "Terminate Cluster"
    },
    "Terminate Cluster": {
      "Type": "Task",
      "Resource": "arn:<PARTITION>:states:::elasticmapreduce:terminateCluster",
      "Parameters": {
        "ClusterId.$": "$.cluster.ClusterId"
      },
      "End": true
    }
  }
}

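Both step states in the definition above carry the same `Retry` block (`IntervalSeconds` 1, `MaxAttempts` 3, `BackoffRate` 2.0). As a rough illustration of what those values mean, the sketch below computes the wait before each retry, assuming the Amazon States Language rule that the first retry waits `IntervalSeconds` and each later wait is multiplied by `BackoffRate`. The function name `retry_delays` is hypothetical, and any retrier fields other than these three are ignored.

```python
from typing import Dict, List


def retry_delays(retrier: Dict) -> List[float]:
    """Return the wait in seconds before each retry attempt.

    Hypothetical helper. Missing fields fall back to the Amazon States
    Language defaults (IntervalSeconds=1, MaxAttempts=3, BackoffRate=2.0).
    """
    interval = retrier.get("IntervalSeconds", 1)
    attempts = retrier.get("MaxAttempts", 3)
    rate = retrier.get("BackoffRate", 2.0)
    # Retry n (counting from 0) waits interval * rate**n seconds.
    return [float(interval * rate ** n) for n in range(attempts)]


if __name__ == "__main__":
    # Retrier copied from the "Run first step" state of the sample.
    retrier = {
        "ErrorEquals": ["States.ALL"],
        "IntervalSeconds": 1,
        "MaxAttempts": 3,
        "BackoffRate": 2.0,
    }
    delays = retry_delays(retrier)
    print(delays)       # [1.0, 2.0, 4.0]
    print(sum(delays))  # 7.0
```

Under these assumptions, a failing step is retried up to three times with roughly 7 seconds of total waiting, not counting the run time of each attempt, before the error propagates.
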
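IAM Example

This example AWS Identity and Access Management (IAM) policy generated by the sample project includes the least privilege necessary to execute the state machine and related resources. As a best practice, include only the permissions that are necessary in your IAM policies.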

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "elasticmapreduce:RunJobFlow",
        "elasticmapreduce:DescribeCluster",
        "elasticmapreduce:TerminateJobFlows"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": [
        "arn:aws-cn:iam::123456789012:role/StepFunctionsSample-EMRJobManagement-EMRServiceRole-ANPAJ2UCCR6DPCEXAMPLE",
        "arn:aws-cn:iam::123456789012:role/StepFunctionsSample-EMRJobManagementWJALRXUTNFEMI-ANPAJ2UCCR6DPCEXAMPLE-EMREc2InstanceProfile-1ANPAJ2UCCR6DPCEXAMPLE"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "events:PutTargets",
        "events:PutRule",
        "events:DescribeRule"
      ],
      "Resource": [
        "arn:aws-cn:events:sa-east-1:123456789012:rule/StepFunctionsGetEventForEMRRunJobFlowRule"
      ]
    }
  ]
}

The following policy ensures that addStep has sufficient permissions.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "elasticmapreduce:AddJobFlowSteps",
        "elasticmapreduce:DescribeStep",
        "elasticmapreduce:CancelSteps"
      ],
      "Resource": "arn:aws:elasticmapreduce:*:*:cluster/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "events:PutTargets",
        "events:PutRule",
        "events:DescribeRule"
      ],
      "Resource": [
        "arn:aws-cn:events:sa-east-1:123456789012:rule/StepFunctionsGetEventForEMRAddJobFlowStepsRule"
      ]
    }
  ]
}
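When adapting these policies, it can help to confirm that a policy document still lists the Amazon EMR actions that the addStep policy above grants. The following is a simplified sketch, not an IAM evaluator: the function names `allowed_actions` and `missing_actions` are hypothetical, `Statement` is assumed to be a list, and the check compares literal action names only, ignoring wildcards, `Resource` scoping, conditions, and `Deny` statements.

```python
from typing import Dict, Iterable, List, Set

# Amazon EMR actions listed in the addStep policy above.
ADD_STEP_ACTIONS = (
    "elasticmapreduce:AddJobFlowSteps",
    "elasticmapreduce:DescribeStep",
    "elasticmapreduce:CancelSteps",
)


def allowed_actions(policy: Dict) -> Set[str]:
    """Collect action names from every Allow statement (literal match only)."""
    actions: Set[str] = set()
    for statement in policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        action = statement.get("Action", [])
        # "Action" may be a single string or a list of strings.
        actions.update([action] if isinstance(action, str) else action)
    return actions


def missing_actions(policy: Dict, required: Iterable[str]) -> List[str]:
    """Return the required actions that the policy does not list, sorted."""
    return sorted(set(required) - allowed_actions(policy))


if __name__ == "__main__":
    # Trimmed policy for illustration; it omits CancelSteps on purpose.
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "elasticmapreduce:AddJobFlowSteps",
                    "elasticmapreduce:DescribeStep",
                ],
                "Resource": "arn:aws:elasticmapreduce:*:*:cluster/*",
            }
        ],
    }
    print(missing_actions(policy, ADD_STEP_ACTIONS))
    # ['elasticmapreduce:CancelSteps']
```

An empty result only means the action names are present; whether a request is actually authorized still depends on the full IAM evaluation.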

For information about how to configure IAM when using Step Functions with other AWS services, see IAM Policies for Integrated Services.