Use YuniKorn as a custom scheduler for Apache Spark on Amazon EMR on EKS
With Amazon EMR on EKS, you can use the Spark operator or spark-submit to run Spark jobs with Kubernetes custom schedulers. This tutorial covers how to run Spark jobs with the YuniKorn scheduler on a custom queue and with gang scheduling.
Overview

Apache YuniKorn is a resource scheduler for Kubernetes that provides app-aware scheduling, so you get fine-grained control over resource queues, quotas, and gang scheduling for Spark workloads.
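Queues are central to how YuniKorn divides cluster capacity. For reference, the sketch below shows a minimal YuniKorn queue configuration (the queues.yaml section of the scheduler's ConfigMap). The spark queue name and its limits are illustrative assumptions only; nothing later in this tutorial depends on them.

```yaml
# Minimal illustrative queues.yaml for the YuniKorn scheduler ConfigMap.
# The "spark" leaf queue and its limits are assumptions for demonstration.
partitions:
  - name: default
    queues:
      - name: root
        submitacl: "*"        # who may submit applications
        queues:
          - name: spark       # a leaf queue that Spark jobs could target
            resources:
              max:
                memory: 16Gi
                vcore: 8
```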
Create your cluster and set up YuniKorn
Follow these steps to deploy an Amazon EKS cluster. You can change the Amazon Web Services Region (region) and Availability Zones (availabilityZones).
- Define the Amazon EKS cluster:

  ```yaml
  cat <<EOF >eks-cluster.yaml
  ---
  apiVersion: eksctl.io/v1alpha5
  kind: ClusterConfig
  metadata:
    name: emr-eks-cluster
    region: eu-west-1
  vpc:
    clusterEndpoints:
      publicAccess: true
      privateAccess: true
  iam:
    withOIDC: true
  nodeGroups:
    - name: spark-jobs
      labels: { app: spark }
      instanceType: m5.xlarge
      desiredCapacity: 2
      minSize: 2
      maxSize: 3
      availabilityZones: ["eu-west-1a"]
  EOF
  ```

- Create the cluster:

  ```bash
  eksctl create cluster -f eks-cluster.yaml
  ```
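  Once eksctl finishes, you can confirm that the labeled node group is ready. This is an optional check, not part of the original steps:

  ```bash
  # List only the nodes carrying the app=spark label from the node group above
  kubectl get nodes --selector app=spark
  ```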
- Create a namespace spark-job where you will run the Spark jobs:

  ```bash
  kubectl create namespace spark-job
  ```
- Next, create a Kubernetes role and role binding. These are required for the service account that the Spark job runs with.

  - Define the service account, role, and role binding for Spark jobs:

    ```yaml
    cat <<EOF >emr-job-execution-rbac.yaml
    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: spark-sa
      namespace: spark-job
    automountServiceAccountToken: false
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: spark-role
      namespace: spark-job
    rules:
      - apiGroups: ["", "batch", "extensions"]
        resources: ["configmaps", "serviceaccounts", "events", "pods", "pods/exec", "pods/log", "pods/portforward", "secrets", "services", "persistentvolumeclaims"]
        verbs: ["create", "delete", "get", "list", "patch", "update", "watch"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: spark-sa-rb
      namespace: spark-job
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: Role
      name: spark-role
    subjects:
      - kind: ServiceAccount
        name: spark-sa
        namespace: spark-job
    EOF
    ```

  - Apply the Kubernetes role and role binding definition with the following command:

    ```bash
    kubectl apply -f emr-job-execution-rbac.yaml
    ```
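  To confirm that the objects were created, a quick check (illustrative, not part of the original steps):

  ```bash
  # The service account, role, and role binding should all appear in spark-job
  kubectl get serviceaccount,role,rolebinding --namespace spark-job
  ```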
Install and set up YuniKorn
- Create a Kubernetes namespace yunikorn to deploy the YuniKorn scheduler:

  ```bash
  kubectl create namespace yunikorn
  ```
- To install the scheduler, execute the following Helm commands:

  ```bash
  helm repo add yunikorn https://apache.github.io/yunikorn-release
  helm repo update
  helm install yunikorn yunikorn/yunikorn --namespace yunikorn
  ```
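  Before moving on, you can verify that the scheduler came up. This is an optional check, not part of the original steps:

  ```bash
  # The yunikorn-scheduler and admission-controller pods should be Running
  kubectl get pods --namespace yunikorn
  ```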
Run a Spark application with the YuniKorn scheduler using the Spark operator
- If you haven't already done so, complete the setup steps in the sections above.
- When you run the helm install spark-operator-demo command, include the following arguments:

  ```bash
  --set batchScheduler.enable=true
  --set webhook.enable=true
  ```
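  Put together, the install command might look like the sketch below. The chart reference and namespace are placeholders for illustration; substitute the chart location and release settings from your Spark operator setup:

  ```bash
  # Illustrative only: replace <spark-operator-chart> with the chart reference
  # from your Spark operator setup. The two --set flags turn on batch-scheduler
  # integration and the operator's mutating webhook, which the YuniKorn
  # annotations in the next step rely on.
  helm install spark-operator-demo <spark-operator-chart> \
      --namespace spark-operator \
      --set batchScheduler.enable=true \
      --set webhook.enable=true
  ```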
- Create a SparkApplication definition file spark-pi.yaml. To use YuniKorn as the scheduler for your jobs, you must add certain annotations and labels to the application definition. The annotations and labels specify the queue for your job and the scheduling strategy that you want to use.

  In the following example, the schedulingPolicyParameters annotation sets up gang scheduling for the application. The example then creates task groups, or gangs of tasks, to specify the minimum capacity that must be available before the pods are scheduled and job execution starts. Finally, the task group definition specifies use of the node group with the "app": "spark" label, as described in the Create your cluster and set up YuniKorn section.

  ```yaml
  apiVersion: "sparkoperator.k8s.io/v1beta2"
  kind: SparkApplication
  metadata:
    name: spark-pi
    namespace: spark-job
  spec:
    type: Scala
    mode: cluster
    image: "895885662937.dkr.ecr.us-west-2.amazonaws.com/spark/emr-6.10.0:latest"
    imagePullPolicy: Always
    mainClass: org.apache.spark.examples.SparkPi
    mainApplicationFile: "local:///usr/lib/spark/examples/jars/spark-examples.jar"
    sparkVersion: "3.3.1"
    restartPolicy:
      type: Never
    volumes:
      - name: "test-volume"
        hostPath:
          path: "/tmp"
          type: Directory
    driver:
      cores: 1
      coreLimit: "1200m"
      memory: "512m"
      labels:
        version: 3.3.1
      annotations:
        yunikorn.apache.org/schedulingPolicyParameters: "placeholderTimeoutSeconds=30 gangSchedulingStyle=Hard"
        yunikorn.apache.org/task-group-name: "spark-driver"
        yunikorn.apache.org/task-groups: |-
          [{
            "name": "spark-driver",
            "minMember": 1,
            "minResource": {
              "cpu": "1200m",
              "memory": "1Gi"
            },
            "nodeSelector": {
              "app": "spark"
            }
          },
          {
            "name": "spark-executor",
            "minMember": 1,
            "minResource": {
              "cpu": "1200m",
              "memory": "1Gi"
            },
            "nodeSelector": {
              "app": "spark"
            }
          }]
      serviceAccount: spark-sa
      volumeMounts:
        - name: "test-volume"
          mountPath: "/tmp"
    executor:
      cores: 1
      instances: 1
      memory: "512m"
      labels:
        version: 3.3.1
      annotations:
        yunikorn.apache.org/task-group-name: "spark-executor"
      volumeMounts:
        - name: "test-volume"
          mountPath: "/tmp"
  ```
- Submit the Spark application with the following command. This also creates a SparkApplication object called spark-pi:

  ```bash
  kubectl apply -f spark-pi.yaml
  ```
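  While the application starts up, you can watch the namespace to see gang scheduling at work. This is an optional check: with gangSchedulingStyle=Hard, YuniKorn first creates placeholder pods that reserve the gang's capacity, and their exact names vary by YuniKorn version.

  ```bash
  # Placeholder pods appear first, then the driver and executor pods replace them
  kubectl get pods --namespace spark-job --watch
  ```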
- Check the events for the SparkApplication object with the following command:

  ```bash
  kubectl describe sparkapplication spark-pi --namespace spark-job
  ```

  The first pod event shows that YuniKorn has scheduled the pods:

  ```
  Type    Reason             Age    From      Message
  ----    ------             ----   ----      -------
  Normal  Scheduling         3m12s  yunikorn  spark-operator/org-apache-spark-examples-sparkpi-2a777a88b98b8a95-driver is queued and waiting for allocation
  Normal  GangScheduling     3m12s  yunikorn  Pod belongs to the taskGroup spark-driver, it will be scheduled as a gang member
  Normal  Scheduled          3m10s  yunikorn  Successfully assigned spark
  Normal  PodBindSuccessful  3m10s  yunikorn  Pod spark-operator/
  Normal  TaskCompleted      2m3s   yunikorn  Task spark-operator/
  Normal  Pulling            3m10s  kubelet   Pulling
  ```
Run a Spark application with the YuniKorn scheduler using spark-submit
- First, complete the steps in the Set up spark-submit for Amazon EMR on EKS section.
- Set the values for the following environment variables:

  ```bash
  export SPARK_HOME=spark-home
  export MASTER_URL=k8s://Amazon-EKS-cluster-endpoint
  ```
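  Both values above are placeholders. If you need the cluster endpoint for MASTER_URL, one way to look it up is with the AWS CLI, assuming the emr-eks-cluster name from the setup section:

  ```bash
  # Prints the HTTPS endpoint of the EKS control plane; prefix it with k8s://
  aws eks describe-cluster --name emr-eks-cluster \
      --query "cluster.endpoint" --output text
  ```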
- Submit the Spark application with the following command.

  In the following example, the schedulingPolicyParameters annotation sets up gang scheduling for the application. The example then creates task groups, or gangs of tasks, to specify the minimum capacity that must be available before the pods are scheduled and job execution starts. Finally, the task group definition specifies use of the node group with the "app": "spark" label, as described in the Create your cluster and set up YuniKorn section.

  ```bash
  $SPARK_HOME/bin/spark-submit \
   --class org.apache.spark.examples.SparkPi \
   --master $MASTER_URL \
   --conf spark.kubernetes.container.image=895885662937.dkr.ecr.us-west-2.amazonaws.com/spark/emr-6.10.0:latest \
   --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-sa \
   --deploy-mode cluster \
   --conf spark.kubernetes.namespace=spark-job \
   --conf spark.kubernetes.scheduler.name=yunikorn \
   --conf spark.kubernetes.driver.annotation.yunikorn.apache.org/schedulingPolicyParameters="placeholderTimeoutSeconds=30 gangSchedulingStyle=Hard" \
   --conf spark.kubernetes.driver.annotation.yunikorn.apache.org/task-group-name="spark-driver" \
   --conf spark.kubernetes.executor.annotation.yunikorn.apache.org/task-group-name="spark-executor" \
   --conf spark.kubernetes.driver.annotation.yunikorn.apache.org/task-groups='[{
       "name": "spark-driver",
       "minMember": 1,
       "minResource": {
         "cpu": "1200m",
         "memory": "1Gi"
       },
       "nodeSelector": {
         "app": "spark"
       }
     },
     {
       "name": "spark-executor",
       "minMember": 1,
       "minResource": {
         "cpu": "1200m",
         "memory": "1Gi"
       },
       "nodeSelector": {
         "app": "spark"
       }
     }]' \
   local:///usr/lib/spark/examples/jars/spark-examples.jar 20
  ```
使用以下命令检查
SparkApplication
对象的事件:kubectl describe pod
spark-driver-pod
--namespace spark-job第一个 pod 事件将显示 YuniKorn 已调度 pod:
Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduling 3m12s yunikorn spark-operator/org-apache-spark-examples-sparkpi-2a777a88b98b8a95-driver is queued and waiting for allocation Normal GangScheduling 3m12s yunikorn Pod belongs to the taskGroup spark-driver, it will be scheduled as a gang member Normal Scheduled 3m10s yunikorn Successfully assigned spark Normal PodBindSuccessful 3m10s yunikorn Pod spark-operator/ Normal TaskCompleted 2m3s yunikorn Task spark-operator/ Normal Pulling 3m10s kubelet Pulling
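  After the job finishes, you can confirm the result from the driver log. Here spark-driver-pod is the same placeholder as above; the SparkPi example prints a line like "Pi is roughly 3.14...":

  ```bash
  # Print the driver log to verify that the example computed Pi
  kubectl logs spark-driver-pod --namespace spark-job
  ```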