Amazon EMR
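Management Guide
AWS services or features described in AWS documentation may vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon AWS.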

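Configure an Output Location

The most common output format of an Amazon EMR cluster is text files, either compressed or uncompressed. Typically, these files are written to an Amazon S3 bucket. This bucket must exist before you launch the cluster. When you launch the cluster, you specify the S3 bucket as the output location.

For more information, see the following topics:

Create and Configure an Amazon S3 Bucket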

Amazon EMR uses Amazon S3 to store input data, log files, and output data. Amazon S3 refers to these storage locations as buckets. Bucket names and configurations are subject to restrictions that keep them compatible with Amazon S3 and DNS requirements. For more information, go to Bucket Restrictions and Limitations in the Amazon Simple Storage Service Developer Guide.
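As an illustration of the DNS-style naming restrictions mentioned above, the following minimal sketch checks a candidate bucket name before you try to create it. It covers only a subset of the rules (3 to 63 characters; lowercase letters, digits, periods, and hyphens; a letter or digit at each end; no adjacent periods; not formatted as an IP address). The function name is hypothetical and is not part of any AWS SDK; Bucket Restrictions and Limitations remains the authoritative list.

```python
import ipaddress
import re

# Lowercase letters, digits, periods, and hyphens; 3 to 63 characters;
# must start and end with a letter or digit.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_dns_compliant_bucket_name(name: str) -> bool:
    """Return True if name passes the subset of naming checks sketched here."""
    if not _BUCKET_NAME.fullmatch(name):
        return False  # wrong length or disallowed characters
    if ".." in name:
        return False  # adjacent periods are not allowed
    try:
        ipaddress.IPv4Address(name)
    except ValueError:
        return True  # not an IP address, so the name is acceptable
    return False  # names formatted like an IP address are rejected
```

For example, is_dns_compliant_bucket_name("myawsbucket") passes these checks, while a name containing uppercase letters does not. Passing the checks does not guarantee that the name is still available.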

This section shows you how to use the Amazon S3 console in the AWS Management Console to create an Amazon S3 bucket and then set permissions on it. You can also do both using the Amazon S3 API or the third-party Curl command line tool. For information about Curl, go to Amazon S3 Authentication Tool for Curl. For information about using the Amazon S3 API to create and configure an Amazon S3 bucket, go to the Amazon Simple Storage Service API Reference.

To create an Amazon S3 bucket using the console

  1. Sign in to the AWS Management Console and open the Amazon S3 console at https://console.amazonaws.cn/s3/.

  2. Choose Create Bucket.

    The Create a Bucket dialog box opens.

  3. Enter a bucket name, such as myawsbucket.

    This name must be globally unique; it cannot match the name of any existing bucket.

  4. Select the Region for your bucket. To avoid paying cross-region bandwidth charges, create the Amazon S3 bucket in the same region as your cluster.

    Refer to Choose an AWS Region for guidance on choosing a Region.

  5. Choose Create.

You have created a bucket with the URI s3://myawsbucket/.

Note

If you enable logging in the Create a Bucket wizard, it enables only bucket access logs, not Amazon EMR cluster logs.

Note

For more information on specifying Region-specific buckets, refer to Buckets and Regions in the Amazon Simple Storage Service Developer Guide and Available Region Endpoints for the AWS SDKs.
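The console procedure above can also be approximated with an AWS SDK. The following is a minimal sketch, assuming the boto3 library is installed and AWS credentials are configured; the helper names, the bucket name, and the Region cn-north-1 are illustrative assumptions rather than values prescribed by this guide. The sketch passes the Region as a LocationConstraint so that the bucket lands in the same Region as the cluster.

```python
def build_create_bucket_args(bucket: str, region: str) -> dict:
    """Build keyword arguments for the S3 CreateBucket call (hypothetical helper)."""
    args = {"Bucket": bucket}
    # us-east-1 is the S3 default and must not be sent as a LocationConstraint.
    if region != "us-east-1":
        args["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return args


def create_bucket(bucket: str, region: str) -> None:
    """Create the bucket in the given Region. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the helper above works without boto3

    s3 = boto3.client("s3", region_name=region)
    s3.create_bucket(**build_create_bucket_args(bucket, region))
```

Calling create_bucket("myawsbucket", "cn-north-1") would then correspond roughly to steps 2 through 5 of the console procedure.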

After you create your bucket, you can set the appropriate permissions on it. Typically, you give yourself (the owner) read and write access and give authenticated users read access.

To set permissions on an Amazon S3 bucket using the console

  1. Sign in to the AWS Management Console and open the Amazon S3 console at https://console.amazonaws.cn/s3/.

  2. In the Buckets pane, open the context (right-click) menu for the bucket you just created.

  3. Select Properties.

  4. In the Properties pane, select the Permissions tab.

  5. Choose Add more permissions.

  6. Select Authenticated Users in the Grantee field.

  7. To the right of the Grantee drop-down list, select List.

  8. Choose Save.

You have created a bucket and restricted permissions to authenticated users.
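The same grant can be sketched programmatically. The example below is a rough equivalent of the console steps, assuming boto3 is installed, AWS credentials are configured, and ACLs are enabled on the bucket. The function names are hypothetical, and the group URI shown is the one used for Authenticated Users in the standard AWS partition; treat it as an assumption and verify it for your Region. The sketch reads the current ACL and appends the new grant so that existing grants, such as the owner's, are preserved.

```python
import copy

# Assumed group URI for "Authenticated Users" (standard AWS partition).
AUTHENTICATED_USERS_URI = "http://acs.amazonaws.com/groups/global/AuthenticatedUsers"


def with_authenticated_list(acl: dict) -> dict:
    """Return a copy of an ACL with READ (list) granted to authenticated users."""
    grant = {
        "Grantee": {"Type": "Group", "URI": AUTHENTICATED_USERS_URI},
        "Permission": "READ",  # on a bucket, READ allows listing its objects
    }
    policy = {
        "Owner": copy.deepcopy(acl["Owner"]),
        "Grants": copy.deepcopy(acl["Grants"]),
    }
    if grant not in policy["Grants"]:
        policy["Grants"].append(grant)  # keep existing grants, add the new one
    return policy


def grant_list_to_authenticated_users(bucket: str) -> None:
    """Apply the grant with boto3. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the helper above works without boto3

    s3 = boto3.client("s3")
    current = s3.get_bucket_acl(Bucket=bucket)
    s3.put_bucket_acl(
        Bucket=bucket,
        AccessControlPolicy=with_authenticated_list(current),
    )
```

Because with_authenticated_list returns a new policy and skips a grant that is already present, applying it more than once leaves the ACL unchanged.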

Required Amazon S3 buckets must exist before you can create a cluster. You must upload any required scripts or data referenced in the cluster to Amazon S3. The following table describes example data, scripts, and log file locations.

Information          Example Location on Amazon S3
script or program    s3://myawsbucket/script/MapperScript.py
log files            s3://myawsbucket/logs
input data           s3://myawsbucket/input
output data          s3://myawsbucket/output
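To keep these locations consistent when you reference them in scripts, you could derive them from the bucket name in one place. The following minimal sketch reproduces the example layout from the table; the function name and dictionary keys are hypothetical, and the paths are only the examples shown above, not locations that Amazon EMR requires.

```python
def cluster_locations(bucket: str) -> dict:
    """Return the example S3 locations from the table for the given bucket."""
    base = f"s3://{bucket}"
    return {
        "script": f"{base}/script/MapperScript.py",  # script or program
        "logs": f"{base}/logs",                      # log files
        "input": f"{base}/input",                    # input data
        "output": f"{base}/output",                  # output data
    }
```

For the bucket used in this section, cluster_locations("myawsbucket")["output"] evaluates to s3://myawsbucket/output, which is the value you would specify as the output location when launching the cluster.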