为 Amazon EMR 配置托管扩展 - Amazon EMR
Amazon Web Services 文档中描述的 Amazon Web Services 服务或功能可能因区域而异。要查看适用于中国区域的差异,请参阅 中国的 Amazon Web Services 服务入门 (PDF)

本文属于机器翻译版本。若本译文内容与英语原文存在差异,则一律以英文原文为准。

为 Amazon EMR 配置托管扩展

以下各节介绍如何启动使用托管扩展的 EMR 集群 Amazon Web Services Management Console Amazon SDK for Java、或。 Amazon Command Line Interface

使用 Amazon Web Services Management Console 来配置托管扩展

您可以在创建集群时使用 Amazon EMR 控制台配置托管式扩缩,也可更改正在运行的集群的托管式扩缩策略。

Console
使用控制台创建集群时配置托管扩展
  1. 登录 Amazon Web Services Management Console,然后在 /emr 上打开亚马逊 EMR 控制台。https://console.aws.amazon.com

  2. EC2在左侧导航窗格的 EMR on 下,选择集群,然后选择创建集群。

  3. 选择 Amazon EMR 发行版 emr-5.30.0 或更高版本(emr-6.0.0 版本除外)。

  4. Cluster scaling and provisioning option(集群扩展和预置选项)下,选择 Use EMR-managed scaling(使用 EMR 托管扩展)。指定最小最大实例数、最大核心节点实例数和最大按需型实例数。

  5. 选择适用于集群的任何其他选项。

  6. 要启动集群,选择 Create cluster(创建集群)。

使用控制台在现有集群上配置托管扩展
  1. 登录 Amazon Web Services Management Console,然后在 /emr 上打开亚马逊 EMR 控制台。https://console.aws.amazon.com

  2. EC2在左侧导航窗格的 EMR on 下,选择集群,然后选择要更新的集群。

  3. 在集群详细信息页面的 Instances(实例)选项卡上,找到 Instance group settings(实例组设置)部分。选择编辑集群扩展,为最小最大实例数以及按需限制指定新值。

使用 Amazon CLI 来配置托管扩展

创建集群时,您可以使用 Amazon EMR 的 Amazon CLI 命令来配置托管扩展。您可以使用速记语法 (可在相关命令中指定内联 JSON 配置)。也可以引用包含配置 JSON 的文件。您也可以将托管扩展策略应用于现有集群,并删除以前应用的托管扩展策略。此外,您可以从正在运行的集群中检索扩展策略配置的详细信息。

在集群启动期间启用托管扩展

您可以在集群启动期间启用托管扩展,如以下示例所示。

aws emr create-cluster \ --service-role EMR_DefaultRole \ --release-label emr-7.12.0 \ --name EMR_Managed_Scaling_Enabled_Cluster \ --applications Name=Spark Name=Hbase \ --ec2-attributes KeyName=keyName,InstanceProfile=EMR_EC2_DefaultRole \ --instance-groups InstanceType=m4.xlarge,InstanceGroupType=MASTER,InstanceCount=1 InstanceType=m4.xlarge,InstanceGroupType=CORE,InstanceCount=2 \ --region us-east-1 \ --managed-scaling-policy ComputeLimits='{MinimumCapacityUnits=2,MaximumCapacityUnits=4,UnitType=Instances}'

使用时,也可以使用--managed-scaling-policy 选项指定托管策略配置create-cluster

将托管扩展策略应用于现有集群

您可以将托管扩展策略应用于现有集群,如以下示例所示。

aws emr put-managed-scaling-policy --cluster-id j-123456 --managed-scaling-policy ComputeLimits='{MinimumCapacityUnits=1, MaximumCapacityUnits=10, MaximumOnDemandCapacityUnits=10, UnitType=Instances}'

也可以使用 aws emr put-managed-scaling-policy 命令将托管扩展策略应用于现有集群。以下示例使用对 JSON 文件 managedscaleconfig.json 的引用,该文件指定托管扩展策略配置。

aws emr put-managed-scaling-policy --cluster-id j-123456 --managed-scaling-policy file://./managedscaleconfig.json

以下示例显示 managedscaleconfig.json 文件的内容,该文件定义托管扩展策略。

{ "ComputeLimits": { "UnitType": "Instances", "MinimumCapacityUnits": 1, "MaximumCapacityUnits": 10, "MaximumOnDemandCapacityUnits": 10 } }

检索托管扩展策略配置

GetManagedScalingPolicy 命令检索策略配置。例如,以下命令检索集群 ID 为 j-123456 的集群的配置。

aws emr get-managed-scaling-policy --cluster-id j-123456

该命令生成以下示例输出。

{ "ManagedScalingPolicy": { "ComputeLimits": { "MinimumCapacityUnits": 1, "MaximumOnDemandCapacityUnits": 10, "MaximumCapacityUnits": 10, "UnitType": "Instances" } } }

有关在中使用 Amazon EMR 命令的更多信息 Amazon CLI,请参阅。https://docs.amazonaws.cn/cli/latest/reference/emr

删除托管扩展策略

RemoveManagedScalingPolicy 命令可删除策略配置。例如,以下命令删除集群 ID 为 j-123456 的集群的配置。

aws emr remove-managed-scaling-policy --cluster-id j-123456

用于配置 Amazon SDK for Java 托管扩展

以下程序摘要说明如何使用 Amazon SDK for Java配置托管扩展:

package com.amazonaws.emr.sample; import java.util.ArrayList; import java.util.List; import com.amazonaws.AmazonClientException; import com.amazonaws.auth.AWSCredentials; import com.amazonaws.auth.AWSStaticCredentialsProvider; import com.amazonaws.auth.profile.ProfileCredentialsProvider; import com.amazonaws.regions.Regions; import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduce; import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClientBuilder; import com.amazonaws.services.elasticmapreduce.model.Application; import com.amazonaws.services.elasticmapreduce.model.ComputeLimits; import com.amazonaws.services.elasticmapreduce.model.ComputeLimitsUnitType; import com.amazonaws.services.elasticmapreduce.model.InstanceGroupConfig; import com.amazonaws.services.elasticmapreduce.model.JobFlowInstancesConfig; import com.amazonaws.services.elasticmapreduce.model.ManagedScalingPolicy; import com.amazonaws.services.elasticmapreduce.model.RunJobFlowRequest; import com.amazonaws.services.elasticmapreduce.model.RunJobFlowResult; public class CreateClusterWithManagedScalingWithIG { public static void main(String[] args) { AWSCredentials credentialsFromProfile = getCreadentials("AWS-Profile-Name-Here"); /** * Create an Amazon EMR client with the credentials and region specified in order to create the cluster */ AmazonElasticMapReduce emr = AmazonElasticMapReduceClientBuilder.standard() .withCredentials(new AWSStaticCredentialsProvider(credentialsFromProfile)) .withRegion(Regions.US_EAST_1) .build(); /** * Create Instance Groups - Primary, Core, Task */ InstanceGroupConfig instanceGroupConfigMaster = new InstanceGroupConfig() .withInstanceCount(1) .withInstanceRole("MASTER") .withInstanceType("m4.large") .withMarket("ON_DEMAND"); InstanceGroupConfig instanceGroupConfigCore = new InstanceGroupConfig() .withInstanceCount(4) .withInstanceRole("CORE") .withInstanceType("m4.large") .withMarket("ON_DEMAND"); InstanceGroupConfig instanceGroupConfigTask = new InstanceGroupConfig() .withInstanceCount(5) .withInstanceRole("TASK") .withInstanceType("m4.large") .withMarket("ON_DEMAND"); List<InstanceGroupConfig> igConfigs = new ArrayList<>(); igConfigs.add(instanceGroupConfigMaster); igConfigs.add(instanceGroupConfigCore); igConfigs.add(instanceGroupConfigTask); /** * specify applications to be installed and configured when Amazon EMR creates the cluster */ Application hive = new Application().withName("Hive"); Application spark = new Application().withName("Spark"); Application ganglia = new Application().withName("Ganglia"); Application zeppelin = new Application().withName("Zeppelin"); /** * Managed Scaling Configuration - * Using UnitType=Instances for clusters composed of instance groups * * Other options are: * UnitType = VCPU ( for clusters composed of instance groups) * UnitType = InstanceFleetUnits ( for clusters composed of instance fleets) **/ ComputeLimits computeLimits = new ComputeLimits() .withMinimumCapacityUnits(1) .withMaximumCapacityUnits(20) .withUnitType(ComputeLimitsUnitType.Instances); ManagedScalingPolicy managedScalingPolicy = new ManagedScalingPolicy(); managedScalingPolicy.setComputeLimits(computeLimits); // create the cluster with a managed scaling policy RunJobFlowRequest request = new RunJobFlowRequest() .withName("EMR_Managed_Scaling_TestCluster") .withReleaseLabel("emr-7.12.0") // Specifies the version label for the Amazon EMR release; we recommend the latest release .withApplications(hive,spark,ganglia,zeppelin) .withLogUri("s3://path/to/my/emr/logs") // A URI in S3 for log files is required when debugging is enabled. .withServiceRole("EMR_DefaultRole") // If you use a custom IAM service role, replace the default role with the custom role. .withJobFlowRole("EMR_EC2_DefaultRole") // If you use a custom Amazon EMR role for EC2 instance profile, replace the default role with the custom Amazon EMR role. .withInstances(new JobFlowInstancesConfig().withInstanceGroups(igConfigs) .withEc2SubnetId("subnet-123456789012345") .withEc2KeyName("my-ec2-key-name") .withKeepJobFlowAliveWhenNoSteps(true)) .withManagedScalingPolicy(managedScalingPolicy); RunJobFlowResult result = emr.runJobFlow(request); System.out.println("The cluster ID is " + result.toString()); } public static AWSCredentials getCredentials(String profileName) { // specifies any named profile in .aws/credentials as the credentials provider try { return new ProfileCredentialsProvider("AWS-Profile-Name-Here") .getCredentials(); } catch (Exception e) { throw new AmazonClientException( "Cannot load credentials from .aws/credentials file. " + "Make sure that the credentials file exists and that the profile name is defined within it.", e); } } public CreateClusterWithManagedScalingWithIG() { } }