Use YuniKorn as a custom scheduler for Apache Spark on Amazon EMR on EKS - Amazon EMR

Use YuniKorn as a custom scheduler for Apache Spark on Amazon EMR on EKS

You can use the Spark Operator or spark-submit to run Spark jobs with Kubernetes custom schedulers on Amazon EMR on EKS. This tutorial covers how to run Spark jobs with a YuniKorn scheduler on a custom queue and with gang scheduling.

Overview

Apache YuniKorn helps manage Spark scheduling with app-aware scheduling, so that you have fine-grained control over resource quotas and priorities. With gang scheduling, YuniKorn schedules an application only when the application's minimal resource request can be satisfied. For more information, see What is gang scheduling on the Apache YuniKorn documentation site.
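
For orientation, gang scheduling in YuniKorn is driven entirely by pod annotations. The following sketch is condensed from the full definitions used later in this tutorial, showing the two annotations involved:

    annotations:
      yunikorn.apache.org/schedulingPolicyParameters: "placeholderTimeoutSeconds=30 gangSchedulingStyle=Hard"
      yunikorn.apache.org/task-groups: |-
        [{ "name": "spark-driver", "minMember": 1, "minResource": { "cpu": "1200m", "memory": "1Gi" } }]

Here, schedulingPolicyParameters controls how long YuniKorn reserves capacity with placeholder pods and whether scheduling fails outright (Hard) or falls back to regular scheduling when the gang cannot be placed, while task-groups declares the minimum members and resources that must be reservable before any pod starts.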

Create your cluster and set up YuniKorn

Use the following steps to deploy an Amazon EKS cluster. You can change the Amazon Web Services Region (region) and the Availability Zones (availabilityZones).

  1. Define an Amazon EKS cluster:

    cat <<EOF >eks-cluster.yaml
    ---
    apiVersion: eksctl.io/v1alpha5
    kind: ClusterConfig
    metadata:
      name: emr-eks-cluster
      region: eu-west-1
    vpc:
      clusterEndpoints:
        publicAccess: true
        privateAccess: true
    iam:
      withOIDC: true
    nodeGroups:
      - name: spark-jobs
        labels: { app: spark }
        instanceType: m5.xlarge
        desiredCapacity: 2
        minSize: 2
        maxSize: 3
        availabilityZones: ["eu-west-1a"]
    EOF
  2. Create the cluster:

    eksctl create cluster -f eks-cluster.yaml
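
    Cluster creation can take several minutes. As an optional sanity check (not part of the original procedure), you can confirm that the spark-jobs node group is up by listing nodes with the app: spark label from the definition above:

    kubectl get nodes --selector app=spark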
  3. Create a namespace spark-job to run your Spark jobs in:

    kubectl create namespace spark-job
  4. Next, create a Kubernetes role and role binding. This is required for the service account that runs the Spark jobs.

    1. Define the service account, role, and role binding for your Spark jobs.

      cat <<EOF >emr-job-execution-rbac.yaml
      ---
      apiVersion: v1
      kind: ServiceAccount
      metadata:
        name: spark-sa
        namespace: spark-job
      automountServiceAccountToken: false
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: Role
      metadata:
        name: spark-role
        namespace: spark-job
      rules:
        - apiGroups: ["", "batch", "extensions"]
          resources: ["configmaps", "serviceaccounts", "events", "pods", "pods/exec", "pods/log", "pods/portforward", "secrets", "services", "persistentvolumeclaims"]
          verbs: ["create", "delete", "get", "list", "patch", "update", "watch"]
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: RoleBinding
      metadata:
        name: spark-sa-rb
        namespace: spark-job
      roleRef:
        apiGroup: rbac.authorization.k8s.io
        kind: Role
        name: spark-role
      subjects:
        - kind: ServiceAccount
          name: spark-sa
          namespace: spark-job
      EOF
    2. Apply the Kubernetes role and role binding definitions with the following command:

      kubectl apply -f emr-job-execution-rbac.yaml
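
      Optionally, you can confirm that the role binding grants the spark-sa service account the expected permissions with kubectl auth can-i:

      kubectl auth can-i create pods \
        --as=system:serviceaccount:spark-job:spark-sa \
        --namespace spark-job

      The command prints yes once the definitions above are in effect.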

Install and set up YuniKorn

  1. Create a namespace yunikorn to deploy the YuniKorn scheduler with the following kubectl command:

    kubectl create namespace yunikorn
  2. To install the scheduler, execute the following Helm commands:

    helm repo add yunikorn https://apache.github.io/yunikorn-release
    helm repo update
    helm install yunikorn yunikorn/yunikorn --namespace yunikorn
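
    To confirm the installation, you can check that the scheduler pods in the yunikorn namespace are running:

    kubectl get pods --namespace yunikorn

    The output should show the YuniKorn scheduler (and, depending on chart defaults, an admission controller) in the Running state.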

Run a Spark application with the YuniKorn scheduler with the Spark operator

  1. Complete the setup in the following sections if you haven't already done so:

    1. Create your cluster and set up YuniKorn

    2. Install and set up YuniKorn

    3. Set up the Spark operator for Amazon EMR on EKS

    4. Install the Spark operator

      Include the following arguments when you run the helm install spark-operator-demo command, as shown in the sketch that follows:

      --set batchScheduler.enable=true --set webhook.enable=true
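
      As a sketch only, the complete command could look like the following, where spark-operator-chart is a placeholder for the chart reference and spark-operator for the namespace, both of which come from the Spark operator setup section:

      helm install spark-operator-demo spark-operator-chart \
        --set batchScheduler.enable=true \
        --set webhook.enable=true \
        --namespace spark-operator

      Setting batchScheduler.enable=true lets the operator delegate pod scheduling to a batch scheduler such as YuniKorn, and webhook.enable=true turns on the operator's mutating webhook, which is needed to apply these pod-level customizations.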
  2. Create a SparkApplication definition file spark-pi.yaml:

    To use YuniKorn as a scheduler for your jobs, you must add certain annotations and labels to your application definition. The annotations and labels specify the queue for your job and the scheduling strategy that you want to use.

    In the following example, the annotation schedulingPolicyParameters sets up gang scheduling for the application. Then, the example creates task groups, or "gangs" of tasks, to specify the minimum capacity that must be available before the pods are scheduled to start the job execution. And finally, the task group definition specifies the use of node groups with the "app": "spark" label, as described in the Create your cluster and set up YuniKorn section.

    apiVersion: "sparkoperator.k8s.io/v1beta2" kind: SparkApplication metadata: name: spark-pi namespace: spark-job spec: type: Scala mode: cluster image: "895885662937.dkr.ecr.us-west-2.amazonaws.com/spark/emr-6.10.0:latest" imagePullPolicy: Always mainClass: org.apache.spark.examples.SparkPi mainApplicationFile: "local:///usr/lib/spark/examples/jars/spark-examples.jar" sparkVersion: "3.3.1" restartPolicy: type: Never volumes: - name: "test-volume" hostPath: path: "/tmp" type: Directory driver: cores: 1 coreLimit: "1200m" memory: "512m" labels: version: 3.3.1 annotations: yunikorn.apache.org/schedulingPolicyParameters: "placeholderTimeoutSeconds=30 gangSchedulingStyle=Hard" yunikorn.apache.org/task-group-name: "spark-driver" yunikorn.apache.org/task-groups: |- [{ "name": "spark-driver", "minMember": 1, "minResource": { "cpu": "1200m", "memory": "1Gi" }, "nodeSelector": { "app": "spark" } }, { "name": "spark-executor", "minMember": 1, "minResource": { "cpu": "1200m", "memory": "1Gi" }, "nodeSelector": { "app": "spark" } }] serviceAccount: spark-sa volumeMounts: - name: "test-volume" mountPath: "/tmp" executor: cores: 1 instances: 1 memory: "512m" labels: version: 3.3.1 annotations: yunikorn.apache.org/task-group-name: "spark-executor" volumeMounts: - name: "test-volume" mountPath: "/tmp"
  3. Submit the Spark application with the following command. This also creates a SparkApplication object called spark-pi:

    kubectl apply -f spark-pi.yaml
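
    With gangSchedulingStyle=Hard, YuniKorn first reserves the requested capacity with placeholder pods before the real driver and executor pods start. You can watch this happen (placeholder pod names typically include the task group name):

    kubectl get pods --namespace spark-job --watch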
  4. Check the events for the SparkApplication object with the following command:

    kubectl describe sparkapplication spark-pi --namespace spark-job

    The first pod event shows that YuniKorn has scheduled the pods:

    Type    Reason             Age    From      Message
    ----    ------             ----   ----      -------
    Normal  Scheduling         3m12s  yunikorn  spark-operator/org-apache-spark-examples-sparkpi-2a777a88b98b8a95-driver is queued and waiting for allocation
    Normal  GangScheduling     3m12s  yunikorn  Pod belongs to the taskGroup spark-driver, it will be scheduled as a gang member
    Normal  Scheduled          3m10s  yunikorn  Successfully assigned spark
    Normal  PodBindSuccessful  3m10s  yunikorn  Pod spark-operator/
    Normal  TaskCompleted      2m3s   yunikorn  Task spark-operator/
    Normal  Pulling            3m10s  kubelet   Pulling
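
    You can also inspect queues and applications in the YuniKorn web UI. Port-forward the scheduler service and then open http://localhost:9889 (yunikorn-service and port 9889 are the defaults in the YuniKorn Helm chart; adjust if your installation differs):

    kubectl port-forward svc/yunikorn-service 9889:9889 --namespace yunikorn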

Run a Spark application with the YuniKorn scheduler with spark-submit

  1. First, complete the steps in the Set up spark-submit for Amazon EMR on EKS section.

  2. Set the values for the following environment variables:

    export SPARK_HOME=spark-home
    export MASTER_URL=k8s://Amazon-EKS-cluster-endpoint
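
    Here, spark-home and Amazon-EKS-cluster-endpoint are placeholders. If needed, you can look up the cluster endpoint with the Amazon CLI (assuming the cluster name emr-eks-cluster from the definition earlier in this tutorial):

    aws eks describe-cluster --name emr-eks-cluster \
      --query "cluster.endpoint" --output text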
  3. Submit the Spark application with the following command:

    In the following example, the annotation schedulingPolicyParameters sets up gang scheduling for the application. Then, the example creates task groups, or "gangs" of tasks, to specify the minimum capacity that must be available before the pods are scheduled to start the job execution. And finally, the task group definition specifies the use of node groups with the "app": "spark" label, as described in the Create your cluster and set up YuniKorn section.

    $SPARK_HOME/bin/spark-submit \
      --class org.apache.spark.examples.SparkPi \
      --master $MASTER_URL \
      --conf spark.kubernetes.container.image=895885662937.dkr.ecr.us-west-2.amazonaws.com/spark/emr-6.10.0:latest \
      --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-sa \
      --deploy-mode cluster \
      --conf spark.kubernetes.namespace=spark-job \
      --conf spark.kubernetes.scheduler.name=yunikorn \
      --conf spark.kubernetes.driver.annotation.yunikorn.apache.org/schedulingPolicyParameters="placeholderTimeoutSeconds=30 gangSchedulingStyle=Hard" \
      --conf spark.kubernetes.driver.annotation.yunikorn.apache.org/task-group-name="spark-driver" \
      --conf spark.kubernetes.executor.annotation.yunikorn.apache.org/task-group-name="spark-executor" \
      --conf spark.kubernetes.driver.annotation.yunikorn.apache.org/task-groups='[{
          "name": "spark-driver",
          "minMember": 1,
          "minResource": {
            "cpu": "1200m",
            "memory": "1Gi"
          },
          "nodeSelector": {
            "app": "spark"
          }
        },
        {
          "name": "spark-executor",
          "minMember": 1,
          "minResource": {
            "cpu": "1200m",
            "memory": "1Gi"
          },
          "nodeSelector": {
            "app": "spark"
          }
        }]' \
      local:///usr/lib/spark/examples/jars/spark-examples.jar 20
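
    Because spark-submit in cluster mode generates the driver pod name, you can locate it through the spark-role=driver label that Spark on Kubernetes applies to driver pods, and then use that name in the next step:

    kubectl get pods --namespace spark-job --selector spark-role=driver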
  4. Check the events for the Spark driver pod with the following command:

    kubectl describe pod spark-driver-pod --namespace spark-job

    The first pod event shows that YuniKorn has scheduled the pods:

    Type    Reason             Age    From      Message
    ----    ------             ----   ----      -------
    Normal  Scheduling         3m12s  yunikorn  spark-operator/org-apache-spark-examples-sparkpi-2a777a88b98b8a95-driver is queued and waiting for allocation
    Normal  GangScheduling     3m12s  yunikorn  Pod belongs to the taskGroup spark-driver, it will be scheduled as a gang member
    Normal  Scheduled          3m10s  yunikorn  Successfully assigned spark
    Normal  PodBindSuccessful  3m10s  yunikorn  Pod spark-operator/
    Normal  TaskCompleted      2m3s   yunikorn  Task spark-operator/
    Normal  Pulling            3m10s  kubelet   Pulling