Configuring a crawler to populate the Amazon Glue Data Catalog - Amazon IoT SiteWise

Amazon Glue crawlers scan data stores and populate tables in the Amazon Glue Data Catalog. In this procedure, you create and run an Amazon Glue crawler for the S3 bucket that contains your exported asset data. The crawler creates one table for asset property updates and one table for asset metadata. You can then run SQL queries against these tables with Amazon Athena. For more information, see Populating the Amazon Glue Data Catalog and Defining crawlers in the Amazon Glue Developer Guide.

To create an Amazon Glue crawler
  1. Navigate to the Amazon Glue console.

  2. In the navigation pane, choose Crawlers.

  3. Choose Add crawler.

  4. On the Add crawler page, do the following:

    1. Enter a name for your crawler, such as IoTSiteWiseDataCrawler, and then choose Next.

    2. For Crawler source type, choose Data stores, and then choose Next.

    3. On the Add a data store page, do the following:

      1. For Choose a data store, choose S3.

      2. In Include path, enter s3://DOC-EXAMPLE-BUCKET1 to add your asset data bucket as a data store. Replace DOC-EXAMPLE-BUCKET1 with the bucket name that you chose when you created the stack.

      3. Choose Next.

        
        Amazon Glue crawler "Add a data store" screenshot.

    4. On the Add another data store page, choose No, and then choose Next.

    5. On the Choose an IAM role page, do the following:

      1. To create a new service role that allows Amazon Glue to access the S3 bucket, choose Create an IAM role.

      2. Enter a suffix for your role's name, such as IoTSiteWiseDataCrawler.

      3. Choose Next.

    6. For Frequency, choose Hourly, and then choose Next. The crawler updates the tables with new data each time it runs, so you can choose any frequency that fits your use case.

    7. On the Configure the crawler's output page, do the following:

      1. Choose Add database to create an Amazon Glue database for your asset data.

      2. Enter a name for the database, such as iot_sitewise_asset_database.

      3. Choose Create.

      4. Choose Next.

    8. Review the crawler details, and then choose Finish.

      
      Amazon Glue crawler "Review crawler details" screenshot.
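
The console steps above can also be approximated with the AWS SDK for Python (boto3). The following is a minimal sketch, not the procedure that this guide prescribes. It makes several assumptions: the helper names build_crawler_request and create_crawler are hypothetical, the IAM role already exists (the API call does not create one, unlike the console wizard), the role is named with the AWSGlueServiceRole- prefix followed by the suffix from the procedure, and the Hourly frequency is expressed as a cron schedule that fires at minute 0 of every hour.

```python
def build_crawler_request(bucket,
                          role_suffix="IoTSiteWiseDataCrawler",
                          crawler_name="IoTSiteWiseDataCrawler",
                          database_name="iot_sitewise_asset_database"):
    """Return keyword arguments for the Glue CreateCrawler API call."""
    return {
        "Name": crawler_name,
        # Assumption: the role was created beforehand and follows the
        # "AWSGlueServiceRole-<suffix>" naming pattern.
        "Role": f"AWSGlueServiceRole-{role_suffix}",
        "DatabaseName": database_name,
        # The include path is the bucket that holds the exported asset data.
        "Targets": {"S3Targets": [{"Path": f"s3://{bucket}"}]},
        # Assumption: "Hourly" is modeled as minute 0 of every hour (UTC).
        "Schedule": "cron(0 * * * ? *)",
    }


def create_crawler(glue, request):
    """Create the database if it is missing, then create the crawler.

    `glue` is expected to be a boto3 Glue client.
    """
    try:
        glue.create_database(DatabaseInput={"Name": request["DatabaseName"]})
    except glue.exceptions.AlreadyExistsException:
        pass  # The database already exists, so reuse it.
    glue.create_crawler(**request)


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   create_crawler(boto3.client("glue"),
#                  build_crawler_request("DOC-EXAMPLE-BUCKET1"))
```

Keeping the request builder separate from the API call makes it possible to inspect or adjust the settings, such as the schedule, before anything is created in your account.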

By default, your new crawler doesn't run immediately. You must run it manually or wait until it runs on its configured schedule.

To run a crawler
  1. On the Crawlers page, select the check box for your new crawler, and then choose Run crawler.

    
    Amazon Glue "Crawlers" screenshot with "Run crawler" highlighted.

  2. Wait until the crawler finishes and has a status of Ready.

    The crawler can take several minutes to run, and its status updates automatically.

  3. In the navigation pane, choose Tables.

    You should see two new tables: asset_metadata and asset_property_updates.
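
The run-and-verify steps above can be sketched in Python as well. This is an illustrative example under stated assumptions, not an official procedure: the function name run_crawler_and_list_tables and its polling parameters are hypothetical, the glue argument is expected to be a boto3 Glue client, and the sketch assumes that the crawler leaves the READY state as soon as start_crawler returns.

```python
import time


def run_crawler_and_list_tables(glue, crawler_name, database_name,
                                poll_seconds=15, timeout_seconds=1800):
    """Start the crawler, wait for the READY state, then return table names."""
    glue.start_crawler(Name=crawler_name)
    deadline = time.monotonic() + timeout_seconds
    while True:
        # The crawler state is RUNNING or STOPPING while it works and
        # returns to READY when the run has finished.
        state = glue.get_crawler(Name=crawler_name)["Crawler"]["State"]
        if state == "READY":
            break
        if time.monotonic() > deadline:
            raise TimeoutError(f"Crawler {crawler_name} is still {state}")
        time.sleep(poll_seconds)
    # get_tables is paginated; this sketch reads only the first page.
    tables = glue.get_tables(DatabaseName=database_name)["TableList"]
    return sorted(table["Name"] for table in tables)


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   names = run_crawler_and_list_tables(boto3.client("glue"),
#                                       "IoTSiteWiseDataCrawler",
#                                       "iot_sitewise_asset_database")
```

If the crawl succeeds, the returned list should include asset_metadata and asset_property_updates, matching the tables shown on the Tables page.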