
Setting up cluster access with IAM roles for service accounts (IRSA)

This section uses an example to demonstrate how to configure a Kubernetes service account to assume an Amazon Identity and Access Management (IAM) role. Pods that use the service account can then access any Amazon service that the role has permissions to access.

The following example runs a Spark application to count the words in a file in Amazon S3. To do this, you set up IAM roles for service accounts (IRSA) to authenticate and authorize Kubernetes service accounts.

Note

This example uses the "spark-operator" namespace for the Spark operator and for the namespace where you submit the Spark application.
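With IRSA, eksctl maps the Kubernetes service account to the IAM role by adding an eks.amazonaws.com/role-arn annotation; pods that use the account then receive a projected web identity token that they can exchange for credentials of that role. As a minimal sketch of the end state (the ARN combines the example account ID 111122223333 and the role example-role that the steps below create), the annotated service account looks roughly like this:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: driver-account-sa
  namespace: spark-operator
  annotations:
    # Added by eksctl; EKS uses it to decide which IAM role
    # pods running under this service account can assume.
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/example-role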

Prerequisites

Before you try the example on this page, complete the following prerequisites:

  1. Set up the Spark operator for Amazon EMR on EKS on an Amazon EKS cluster.
  2. Install eksctl and kubectl, and configure the Amazon CLI with credentials for your account.
  3. Upload a text file, such as poem.txt, to an Amazon S3 bucket that you own. The steps below assume the bucket name my-pod-bucket; an example upload command follows this list.
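For the last prerequisite, assuming the bucket name my-pod-bucket and the file name poem.txt that the rest of this example uses, you could upload the sample file with the Amazon CLI like this:

aws s3 cp poem.txt s3://my-pod-bucket/poem.txt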

Configure a Kubernetes service account to assume an IAM role

Use the following steps to configure a Kubernetes service account to assume an IAM role that pods can use to access Amazon services that the role has permissions to access.

  1. After you complete the Prerequisites, use the Amazon Command Line Interface to create a file named example-policy.json that allows read-only access to the file you uploaded to Amazon S3:

    cat >example-policy.json <<EOF
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "s3:GetObject",
            "s3:ListBucket"
          ],
          "Resource": [
            "arn:aws:s3:::my-pod-bucket",
            "arn:aws:s3:::my-pod-bucket/*"
          ]
        }
      ]
    }
    EOF
  2. Then, create the IAM policy example-policy:

    aws iam create-policy --policy-name example-policy --policy-document file://example-policy.json
  3. Next, create an IAM role example-role and associate it with a Kubernetes service account for the Spark driver:

    eksctl create iamserviceaccount --name driver-account-sa --namespace spark-operator \
        --cluster my-cluster --role-name "example-role" \
        --attach-policy-arn arn:aws:iam::111122223333:policy/example-policy --approve
  4. Create a YAML file with the cluster role binding that is required for the Spark driver service account:

    cat >spark-rbac.yaml <<EOF
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: driver-account-sa
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: spark-role
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: edit
    subjects:
      - kind: ServiceAccount
        name: driver-account-sa
        namespace: spark-operator
    EOF
  5. Apply the cluster role binding configuration:

    kubectl apply -f spark-rbac.yaml

The kubectl command should confirm the successful creation of the account:

serviceaccount/driver-account-sa created
clusterrolebinding.rbac.authorization.k8s.io/spark-role configured
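Optionally, verify that eksctl annotated the service account with the ARN of example-role; this annotation is what lets pods that run under the account assume the role:

kubectl describe serviceaccount driver-account-sa -n spark-operator

The output should include an eks.amazonaws.com/role-arn annotation that points to example-role in your account.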

Run an application with the Spark operator

Now that you have configured the Kubernetes service account, you can run a Spark application that counts the number of words in the text file that you uploaded as part of the Prerequisites.

  1. Create a new file word-count.yaml with a SparkApplication definition for the word-count application.

    cat >word-count.yaml <<EOF
    apiVersion: "sparkoperator.k8s.io/v1beta2"
    kind: SparkApplication
    metadata:
      name: word-count
      namespace: spark-operator
    spec:
      type: Java
      mode: cluster
      image: "895885662937.dkr.ecr.us-west-2.amazonaws.com/spark/emr-6.10.0:latest"
      imagePullPolicy: Always
      mainClass: org.apache.spark.examples.JavaWordCount
      mainApplicationFile: local:///usr/lib/spark/examples/jars/spark-examples.jar
      arguments:
        - s3://my-pod-bucket/poem.txt
      hadoopConf:
        # EMRFS filesystem
        fs.s3.customAWSCredentialsProvider: com.amazonaws.auth.WebIdentityTokenCredentialsProvider
        fs.s3.impl: com.amazon.ws.emr.hadoop.fs.EmrFileSystem
        fs.AbstractFileSystem.s3.impl: org.apache.hadoop.fs.s3.EMRFSDelegate
        fs.s3.buffer.dir: /mnt/s3
        fs.s3.getObject.initialSocketTimeoutMilliseconds: "2000"
        mapreduce.fileoutputcommitter.algorithm.version.emr_internal_use_only.EmrFileSystem: "2"
        mapreduce.fileoutputcommitter.cleanup-failures.ignored.emr_internal_use_only.EmrFileSystem: "true"
      sparkConf:
        # Required for EMR Runtime
        spark.driver.extraClassPath: /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*
        spark.driver.extraLibraryPath: /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
        spark.executor.extraClassPath: /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*
        spark.executor.extraLibraryPath: /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
      sparkVersion: "3.3.1"
      restartPolicy:
        type: Never
      driver:
        cores: 1
        coreLimit: "1200m"
        memory: "512m"
        labels:
          version: 3.3.1
        serviceAccount: driver-account-sa
      executor:
        cores: 1
        instances: 1
        memory: "512m"
        labels:
          version: 3.3.1
    EOF
  2. Submit the Spark application.

    kubectl apply -f word-count.yaml

    The kubectl command should return confirmation that you successfully created a SparkApplication object named word-count.

    sparkapplication.sparkoperator.k8s.io/word-count configured
  3. To check events for the SparkApplication object, run the following command:

    kubectl describe sparkapplication word-count -n spark-operator

    The kubectl command should return the description of the SparkApplication with the events:

    Events:
      Type     Reason                               Age                    From            Message
      ----     ------                               ----                   ----            -------
      Normal   SparkApplicationSpecUpdateProcessed  3m2s (x2 over 17h)     spark-operator  Successfully processed spec update for SparkApplication word-count
      Warning  SparkApplicationPendingRerun         3m2s (x2 over 17h)     spark-operator  SparkApplication word-count is pending rerun
      Normal   SparkApplicationSubmitted            2m58s (x2 over 17h)    spark-operator  SparkApplication word-count was submitted successfully
      Normal   SparkDriverRunning                   2m56s (x2 over 17h)    spark-operator  Driver word-count-driver is running
      Normal   SparkExecutorPending                 2m50s                  spark-operator  Executor [javawordcount-fdd1698807392c66-exec-1] is pending
      Normal   SparkExecutorRunning                 2m48s                  spark-operator  Executor [javawordcount-fdd1698807392c66-exec-1] is running
      Normal   SparkDriverCompleted                 2m31s (x2 over 17h)    spark-operator  Driver word-count-driver completed
      Normal   SparkApplicationCompleted            2m31s (x2 over 17h)    spark-operator  SparkApplication word-count completed
      Normal   SparkExecutorCompleted               2m31s (x2 over 2m31s)  spark-operator  Executor [javawordcount-fdd1698807392c66-exec-1] completed
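You can also query the application's state directly. The jsonpath below assumes the status fields that the Kubernetes operator for Apache Spark publishes on SparkApplication objects; when the run succeeds, the command prints COMPLETED:

kubectl get sparkapplication word-count -n spark-operator -o jsonpath='{.status.applicationState.state}'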

The application is now counting the words in your S3 file. To find the word counts, refer to the log files for your driver pod:

kubectl logs pod/word-count-driver -n spark-operator

The kubectl command should return the contents of the log file with the results of your word-count application.

INFO DAGScheduler: Job 0 finished: collect at JavaWordCount.java:53, took 5.146519 s
Software: 1
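When you finish experimenting, you can optionally remove the resources that this example created. A minimal cleanup sketch, assuming the names used above (my-cluster, example-policy, and account ID 111122223333); note that eksctl deletes the IAM role asynchronously, so the policy deletion may need to wait until the role is gone:

# Remove the Spark application and the RBAC objects
kubectl delete -f word-count.yaml
kubectl delete -f spark-rbac.yaml

# Remove the IAM role and service account mapping that eksctl created
eksctl delete iamserviceaccount --name driver-account-sa --namespace spark-operator --cluster my-cluster

# Remove the IAM policy once the role no longer references it
aws iam delete-policy --policy-arn arn:aws:iam::111122223333:policy/example-policy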

For more information on how to submit applications to Spark through the Spark operator, see Using a SparkApplication in the Kubernetes Operator for Apache Spark (spark-on-k8s-operator) documentation on GitHub.