管理 Amazon EMR Job - Amazon Step Functions
Amazon Web Services 文档中描述的 Amazon Web Services 服务或功能可能因区域而异。要查看适用于中国区域的差异,请参阅中国的 Amazon Web Services 服务入门

本文属于机器翻译版本。若本译文内容与英语原文存在差异,则一律以英文原文为准。

管理 Amazon EMR Job

此示例项目演示 Amazon EMR 和Amazon Step Functions集成。

它展示如何创建 Amazon EMR 集群、添加多个步骤并运行它们,然后终止集群。

重要

Amazon EMR 没有免费定价套餐。运行示例项目将产生成本。您可以在Amazon EMR 定价页. Amazon EMR 服务集成的可用性取决于 Amazon EMR API 的可用性。因此,此示例项目可能无法在某些 Amazon 区域正常工作。请参阅Amazon EMR文档,了解特殊区域的限制。

创建状态机并预置资源

  1. 打开Step Functions 控制台,然后选择创建状态机

  2. 选择 Sample Projects (示例项目),然后选择 Manage an EMR Job (管理 EMR 任务)

    此时将显示状态机 Code (代码)Visual Workflow (可视工作流程)

    
          容器任务通知工作流。
  3. 选择 Next

    此时将显示 Deploy resources (部署资源) 页面,其中列出了将创建的资源。对于本示例项目,资源包括 Amazon S3 存储桶。

  4. 选择 Deploy Resources (部署资源)

    注意

    对于这些资源和相关的资源,可能需要长达 10 分钟的时间。Amazon Identity and Access Management(IAM) 要创建的权限。当显示 Deploy resources (部署资源) 页面时,您可打开 Stack ID (堆栈 ID) 链接以查看正在预置的资源。

启动新的执行

  1. New execution 页面上,输入执行名称 (可选),然后选择 Start Execution (开始执行)

  2. (可选)为了帮助识别您的执行,您可以在输入执行名称。如果未输入 ID,Step Functions 将自动生成一个唯一 ID。

    注意

    Step Functions 允许您创建包含非 ASCII 字符的状态机、执行和活动名称。这些非 ASCII 名称不适用 Amazon CloudWatch。要确保您可以跟踪 CloudWatch 指标,请选择仅使用 ASCII 字符的名称。

  3. (可选)您可以转到 Step Functions 上新创建的状态机控制面板,然后选择新执行

  4. 执行完成后,您可以在 Visual workflow (可视工作流) 上选择状态,并浏览 Step details (步骤详细信息) 下的 Input (输入)Output (输出)

示例状态机代码

此示例项目中的状态机通过将参数直接传递给这些资源来与 Amazon EMR 集成。浏览此状态机示例,了解 Step Functions 如何使用状态机同步调用 Amazon EMR 任务,等待任务成功或失败,并终止集群。

有关 Amazon Step Functions 如何控制其他 Amazon 服务的更多信息,请参阅将 Amazon Step Functions 与其他服务一起使用

{ "Comment": "An example of the Amazon States Language for running jobs on Amazon EMR", "StartAt": "Create an EMR cluster", "States": { "Create an EMR cluster": { "Type": "Task", "Resource": "arn:<PARTITION>:states:::elasticmapreduce:createCluster.sync", "Parameters": { "Name": "ExampleCluster", "VisibleToAllUsers": true, "ReleaseLabel": "emr-5.26.0", "Applications": [ { "Name": "Hive" } ], "ServiceRole": "<EMR_SERVICE_ROLE>", "JobFlowRole": "<EMR_EC2_INSTANCE_PROFILE>", "LogUri": "s3://<EMR_LOG_S3_BUCKET>/logs/", "Instances": { "KeepJobFlowAliveWhenNoSteps": true, "InstanceFleets": [ { "Name": "MyMasterFleet", "InstanceFleetType": "MASTER", "TargetOnDemandCapacity": 1, "InstanceTypeConfigs": [ { "InstanceType": "m5.xlarge" } ] }, { "Name": "MyCoreFleet", "InstanceFleetType": "CORE", "TargetOnDemandCapacity": 1, "InstanceTypeConfigs": [ { "InstanceType": "m5.xlarge" } ] } ] } }, "ResultPath": "$.cluster", "Next": "Run first step" }, "Run first step": { "Type": "Task", "Resource": "arn:<PARTITION>:states:::elasticmapreduce:addStep.sync", "Parameters": { "ClusterId.$": "$.cluster.ClusterId", "Step": { "Name": "My first EMR step", "ActionOnFailure": "CONTINUE", "HadoopJarStep": { "Jar": "command-runner.jar", "Args": ["<COMMAND_ARGUMENTS>"] } } }, "Retry" : [ { "ErrorEquals": [ "States.ALL" ], "IntervalSeconds": 1, "MaxAttempts": 3, "BackoffRate": 2.0 } ], "ResultPath": "$.firstStep", "Next": "Run second step" }, "Run second step": { "Type": "Task", "Resource": "arn:<PARTITION>:states:::elasticmapreduce:addStep.sync", "Parameters": { "ClusterId.$": "$.cluster.ClusterId", "Step": { "Name": "My second EMR step", "ActionOnFailure": "CONTINUE", "HadoopJarStep": { "Jar": "command-runner.jar", "Args": ["<COMMAND_ARGUMENTS>"] } } }, "Retry" : [ { "ErrorEquals": [ "States.ALL" ], "IntervalSeconds": 1, "MaxAttempts": 3, "BackoffRate": 2.0 } ], "ResultPath": "$.secondStep", "Next": "Terminate Cluster" }, "Terminate Cluster": { "Type": "Task", "Resource": "arn:<PARTITION>:states:::elasticmapreduce:terminateCluster", "Parameters": { "ClusterId.$": "$.cluster.ClusterId" }, "End": true } } }

IAM 示例

此示例Amazon Identity and Access Management(IAM) 策略包括执行状态机和相关资源所需的最小权限。最佳实践是在 IAM 策略仅包含这些必需的权限。

{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "elasticmapreduce:RunJobFlow", "elasticmapreduce:DescribeCluster", "elasticmapreduce:TerminateJobFlows" ], "Resource": "*" }, { "Effect": "Allow", "Action": "iam:PassRole", "Resource": [ "arn:aws:iam::123456789012:role/StepFunctionsSample-EMRJobManagement-EMRServiceRole-ANPAJ2UCCR6DPCEXAMPLE", "arn:aws:iam::123456789012:role/StepFunctionsSample-EMRJobManagementWJALRXUTNFEMI-ANPAJ2UCCR6DPCEXAMPLE-EMREc2InstanceProfile-1ANPAJ2UCCR6DPCEXAMPLE" ] }, { "Effect": "Allow", "Action": [ "events:PutTargets", "events:PutRule", "events:DescribeRule" ], "Resource": [ "arn:aws:events:sa-east-1:123456789012:rule/StepFunctionsGetEventForEMRRunJobFlowRule" ] } ] }

以下策略可确保 addStep 具有足够的权限。

{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "elasticmapreduce:AddJobFlowSteps", "elasticmapreduce:DescribeStep", "elasticmapreduce:CancelSteps" ], "Resource": "arn:aws:elasticmapreduce:*:*:cluster/*" }, { "Effect": "Allow", "Action": [ "events:PutTargets", "events:PutRule", "events:DescribeRule" ], "Resource": [ "arn:aws:events:sa-east-1:123456789012:rule/StepFunctionsGetEventForEMRAddJobFlowStepsRule" ] } ] } }

有关在将 Step Functions 与其他Amazon服务,请参阅集成服务的 IAM 策略