Use YuniKorn as a custom scheduler for Apache Spark on Amazon EMR on EKS - Amazon EMR

Use YuniKorn as a custom scheduler for Apache Spark on Amazon EMR on EKS

You can use the Spark Operator or spark-submit to run Spark jobs with Kubernetes custom schedulers on Amazon EMR on EKS. This tutorial covers how to run Spark jobs with a YuniKorn scheduler on a custom queue and with gang scheduling.

Overview

Apache YuniKorn helps manage Spark scheduling with app-aware scheduling, so that you have fine-grained control over resource quotas and priorities. With gang scheduling, YuniKorn schedules an application only when the application's minimal resource request can be satisfied. For more information, see What is gang scheduling on the Apache YuniKorn documentation site.
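
For orientation, gang scheduling in YuniKorn is driven entirely by pod annotations. The following sketch is condensed from the full definitions used later in this tutorial, showing the two annotations involved:

    annotations:
      yunikorn.apache.org/schedulingPolicyParameters: "placeholderTimeoutSeconds=30 gangSchedulingStyle=Hard"
      yunikorn.apache.org/task-groups: |-
        [{ "name": "spark-driver", "minMember": 1, "minResource": { "cpu": "1200m", "memory": "1Gi" } }]

Here, schedulingPolicyParameters controls how long YuniKorn reserves capacity with placeholder pods and whether scheduling fails outright (Hard) or falls back to regular scheduling when the gang cannot be placed, while task-groups declares the minimum members and resources that must be reservable before any pod starts.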

Create your cluster and set up YuniKorn

Use the following steps to deploy an Amazon EKS cluster. You can change the Amazon Web Services Region (region) and the Availability Zones (availabilityZones).

  1. Define an Amazon EKS cluster:

    cat <<EOF >eks-cluster.yaml
    ---
    apiVersion: eksctl.io/v1alpha5
    kind: ClusterConfig
    metadata:
      name: emr-eks-cluster
      region: eu-west-1
    vpc:
      clusterEndpoints:
        publicAccess: true
        privateAccess: true
    iam:
      withOIDC: true
    nodeGroups:
      - name: spark-jobs
        labels: { app: spark }
        instanceType: m5.xlarge
        desiredCapacity: 2
        minSize: 2
        maxSize: 3
        availabilityZones: ["eu-west-1a"]
    EOF
  2. Create the cluster:

    eksctl create cluster -f eks-cluster.yaml
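
    Cluster creation can take several minutes. As an optional sanity check (not part of the original procedure), you can confirm that the spark-jobs node group is up by listing nodes with the app: spark label from the definition above:

    kubectl get nodes --selector app=spark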
  3. Create a namespace spark-job to run your Spark jobs in:

    kubectl create namespace spark-job
  4. Next, create a Kubernetes role and role binding. This is required for the service account that runs the Spark jobs.

    1. Define the service account, role, and role binding for your Spark jobs.

      cat <<EOF >emr-job-execution-rbac.yaml
      ---
      apiVersion: v1
      kind: ServiceAccount
      metadata:
        name: spark-sa
        namespace: spark-job
      automountServiceAccountToken: false
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: Role
      metadata:
        name: spark-role
        namespace: spark-job
      rules:
        - apiGroups: ["", "batch", "extensions"]
          resources: ["configmaps", "serviceaccounts", "events", "pods", "pods/exec", "pods/log", "pods/portforward", "secrets", "services", "persistentvolumeclaims"]
          verbs: ["create", "delete", "get", "list", "patch", "update", "watch"]
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: RoleBinding
      metadata:
        name: spark-sa-rb
        namespace: spark-job
      roleRef:
        apiGroup: rbac.authorization.k8s.io
        kind: Role
        name: spark-role
      subjects:
        - kind: ServiceAccount
          name: spark-sa
          namespace: spark-job
      EOF
    2. Apply the Kubernetes role and role binding definitions with the following command:

      kubectl apply -f emr-job-execution-rbac.yaml
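
      Optionally, you can confirm that the role binding grants the spark-sa service account the expected permissions with kubectl auth can-i:

      kubectl auth can-i create pods \
        --as=system:serviceaccount:spark-job:spark-sa \
        --namespace spark-job

      The command prints yes once the definitions above are in effect.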

Install and set up YuniKorn

  1. Create a namespace yunikorn to deploy the YuniKorn scheduler with the following kubectl command:

    kubectl create namespace yunikorn
  2. To install the scheduler, execute the following Helm commands:

    helm repo add yunikorn https://apache.github.io/yunikorn-release
    helm repo update
    helm install yunikorn yunikorn/yunikorn --namespace yunikorn
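
    To confirm the installation, you can check that the scheduler pods in the yunikorn namespace are running:

    kubectl get pods --namespace yunikorn

    The output should show the YuniKorn scheduler (and, depending on chart defaults, an admission controller) in the Running state.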

Run a Spark application with the YuniKorn scheduler with the Spark operator

  1. Complete the setup in the following sections if you haven't already done so:

    1. Create your cluster and set up YuniKorn

    2. Install and set up YuniKorn

    3. Set up the Spark operator for Amazon EMR on EKS

    4. Install the Spark operator

      Include the following arguments when you run the helm install spark-operator-demo command, as shown in the sketch that follows:

      --set batchScheduler.enable=true --set webhook.enable=true
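
      As a sketch only, the complete command could look like the following, where spark-operator-chart is a placeholder for the chart reference and spark-operator for the namespace, both of which come from the Spark operator setup section:

      helm install spark-operator-demo spark-operator-chart \
        --set batchScheduler.enable=true \
        --set webhook.enable=true \
        --namespace spark-operator

      Setting batchScheduler.enable=true lets the operator delegate pod scheduling to a batch scheduler such as YuniKorn, and webhook.enable=true turns on the operator's mutating webhook, which is needed to apply these pod-level customizations.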
  2. Create a SparkApplication definition file spark-pi.yaml:

    To use YuniKorn as a scheduler for your jobs, you must add certain annotations and labels to your application definition. The annotations and labels specify the queue for your job and the scheduling strategy that you want to use.

    In the following example, the annotation schedulingPolicyParameters sets up gang scheduling for the application. Then, the example creates task groups, or "gangs" of tasks, to specify the minimum capacity that must be available before the pods are scheduled to start the job execution. And finally, the task group definition specifies the use of node groups with the "app": "spark" label, as described in the Create your cluster and set up YuniKorn section.

    apiVersion: "sparkoperator.k8s.io/v1beta2" kind: SparkApplication metadata: name: spark-pi namespace: spark-job spec: type: Scala mode: cluster image: "895885662937.dkr.ecr.us-west-2.amazonaws.com/spark/emr-6.10.0:latest" imagePullPolicy: Always mainClass: org.apache.spark.examples.SparkPi mainApplicationFile: "local:///usr/lib/spark/examples/jars/spark-examples.jar" sparkVersion: "3.3.1" restartPolicy: type: Never volumes: - name: "test-volume" hostPath: path: "/tmp" type: Directory driver: cores: 1 coreLimit: "1200m" memory: "512m" labels: version: 3.3.1 annotations: yunikorn.apache.org/schedulingPolicyParameters: "placeholderTimeoutSeconds=30 gangSchedulingStyle=Hard" yunikorn.apache.org/task-group-name: "spark-driver" yunikorn.apache.org/task-groups: |- [{ "name": "spark-driver", "minMember": 1, "minResource": { "cpu": "1200m", "memory": "1Gi" }, "nodeSelector": { "app": "spark" } }, { "name": "spark-executor", "minMember": 1, "minResource": { "cpu": "1200m", "memory": "1Gi" }, "nodeSelector": { "app": "spark" } }] serviceAccount: spark-sa volumeMounts: - name: "test-volume" mountPath: "/tmp" executor: cores: 1 instances: 1 memory: "512m" labels: version: 3.3.1 annotations: yunikorn.apache.org/task-group-name: "spark-executor" volumeMounts: - name: "test-volume" mountPath: "/tmp"
  3. Submit the Spark application with the following command. This also creates a SparkApplication object called spark-pi:

    kubectl apply -f spark-pi.yaml
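
    With gangSchedulingStyle=Hard, YuniKorn first reserves the requested capacity with placeholder pods before the real driver and executor pods start. You can watch this happen (placeholder pod names typically include the task group name):

    kubectl get pods --namespace spark-job --watch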
  4. Check the events for the SparkApplication object with the following command:

    kubectl describe sparkapplication spark-pi --namespace spark-job

    The first pod event shows that YuniKorn has scheduled the pods:

    Type    Reason             Age    From      Message
    ----    ------             ----   ----      -------
    Normal  Scheduling         3m12s  yunikorn  spark-operator/org-apache-spark-examples-sparkpi-2a777a88b98b8a95-driver is queued and waiting for allocation
    Normal  GangScheduling     3m12s  yunikorn  Pod belongs to the taskGroup spark-driver, it will be scheduled as a gang member
    Normal  Scheduled          3m10s  yunikorn  Successfully assigned spark
    Normal  PodBindSuccessful  3m10s  yunikorn  Pod spark-operator/
    Normal  TaskCompleted      2m3s   yunikorn  Task spark-operator/
    Normal  Pulling            3m10s  kubelet   Pulling
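
    You can also inspect queues and applications in the YuniKorn web UI. Port-forward the scheduler service and then open http://localhost:9889 (yunikorn-service and port 9889 are the defaults in the YuniKorn Helm chart; adjust if your installation differs):

    kubectl port-forward svc/yunikorn-service 9889:9889 --namespace yunikorn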

Run a Spark application with the YuniKorn scheduler with spark-submit

  1. First, complete the steps in the Set up spark-submit for Amazon EMR on EKS section.

  2. Set the values for the following environment variables:

    export SPARK_HOME=spark-home
    export MASTER_URL=k8s://Amazon-EKS-cluster-endpoint
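
    Here, spark-home and Amazon-EKS-cluster-endpoint are placeholders. If needed, you can look up the cluster endpoint with the Amazon CLI (assuming the cluster name emr-eks-cluster from the definition earlier in this tutorial):

    aws eks describe-cluster --name emr-eks-cluster \
      --query "cluster.endpoint" --output text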
  3. Submit the Spark application with the following command:

    In the following example, the annotation schedulingPolicyParameters sets up gang scheduling for the application. Then, the example creates task groups, or "gangs" of tasks, to specify the minimum capacity that must be available before the pods are scheduled to start the job execution. And finally, the task group definition specifies the use of node groups with the "app": "spark" label, as described in the Create your cluster and set up YuniKorn section.

    $SPARK_HOME/bin/spark-submit \
      --class org.apache.spark.examples.SparkPi \
      --master $MASTER_URL \
      --conf spark.kubernetes.container.image=895885662937.dkr.ecr.us-west-2.amazonaws.com/spark/emr-6.10.0:latest \
      --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-sa \
      --deploy-mode cluster \
      --conf spark.kubernetes.namespace=spark-job \
      --conf spark.kubernetes.scheduler.name=yunikorn \
      --conf spark.kubernetes.driver.annotation.yunikorn.apache.org/schedulingPolicyParameters="placeholderTimeoutSeconds=30 gangSchedulingStyle=Hard" \
      --conf spark.kubernetes.driver.annotation.yunikorn.apache.org/task-group-name="spark-driver" \
      --conf spark.kubernetes.executor.annotation.yunikorn.apache.org/task-group-name="spark-executor" \
      --conf spark.kubernetes.driver.annotation.yunikorn.apache.org/task-groups='[{
          "name": "spark-driver",
          "minMember": 1,
          "minResource": {
            "cpu": "1200m",
            "memory": "1Gi"
          },
          "nodeSelector": {
            "app": "spark"
          }
        },
        {
          "name": "spark-executor",
          "minMember": 1,
          "minResource": {
            "cpu": "1200m",
            "memory": "1Gi"
          },
          "nodeSelector": {
            "app": "spark"
          }
        }]' \
      local:///usr/lib/spark/examples/jars/spark-examples.jar 20
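
    Because spark-submit in cluster mode generates the driver pod name, you can locate it through the spark-role=driver label that Spark on Kubernetes applies to driver pods, and then use that name in the next step:

    kubectl get pods --namespace spark-job --selector spark-role=driver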
  4. Check the events for the Spark driver pod with the following command:

    kubectl describe pod spark-driver-pod --namespace spark-job

    The first pod event shows that YuniKorn has scheduled the pods:

    Type    Reason             Age    From      Message
    ----    ------             ----   ----      -------
    Normal  Scheduling         3m12s  yunikorn  spark-operator/org-apache-spark-examples-sparkpi-2a777a88b98b8a95-driver is queued and waiting for allocation
    Normal  GangScheduling     3m12s  yunikorn  Pod belongs to the taskGroup spark-driver, it will be scheduled as a gang member
    Normal  Scheduled          3m10s  yunikorn  Successfully assigned spark
    Normal  PodBindSuccessful  3m10s  yunikorn  Pod spark-operator/
    Normal  TaskCompleted      2m3s   yunikorn  Task spark-operator/
    Normal  Pulling            3m10s  kubelet   Pulling