AWS::Kendra::DataSource WebCrawlerSeedUrlConfiguration - Amazon CloudFormation

AWS::Kendra::DataSource WebCrawlerSeedUrlConfiguration

Provides the configuration information of the seed or starting point URLs to crawl.

When selecting websites to index, you must adhere to the Amazon Acceptable Use Policy and all other Amazon terms. Use the Amazon Kendra web crawler only to index your own webpages or webpages that you have authorization to index.

Syntax

To declare this entity in your Amazon CloudFormation template, use the following syntax:

JSON

{
  "SeedUrls" : [ String, ... ],
  "WebCrawlerMode" : String
}

YAML

SeedUrls:
  - String
WebCrawlerMode: String

Properties

SeedUrls

The list of seed or starting point URLs of the websites you want to crawl.

The list can include a maximum of 100 seed URLs.

Required: Yes

Type: Array of String

Minimum: 0

Maximum: 100

Update requires: No interruption
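
As an illustration, a filled-in SeedUrls list might look like the following sketch. The URLs are placeholders, not values taken from this reference; replace them with websites that you own or are authorized to index.

```yaml
# Hypothetical seed URLs for illustration only (example.com hosts are placeholders).
# The list can hold at most 100 entries.
SeedUrls:
  - https://docs.example.com/
  - https://blog.example.com/
```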

WebCrawlerMode

You can choose one of the following modes:

  • HOST_ONLY: crawl only the host names of the seed URLs. For example, if the seed URL is "abc.example.com", then only URLs with host name "abc.example.com" are crawled.

  • SUBDOMAINS: crawl the host names of the seed URLs and their subdomains. For example, if the seed URL is "abc.example.com", then "a.abc.example.com" and "b.abc.example.com" are also crawled.

  • EVERYTHING: crawl the host names of the seed URLs, their subdomains, and other domains that the webpages link to.

The default mode is HOST_ONLY.

Required: No

Type: String

Allowed values: HOST_ONLY | SUBDOMAINS | EVERYTHING

Update requires: No interruption
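
To show where this property type sits in a template, the following is a minimal sketch of a web crawler data source that sets both properties. The logical ID, data source name, seed URL, and the referenced ExampleIndex and ExampleCrawlerRole resources are hypothetical and assumed to be defined elsewhere in the same template. Treat it as a starting point under those assumptions, not a complete configuration.

```yaml
Resources:
  # Hypothetical logical ID, chosen for illustration.
  ExampleWebCrawlerDataSource:
    Type: AWS::Kendra::DataSource
    Properties:
      Name: example-web-crawler                # placeholder name
      Type: WEBCRAWLER
      # Assumes an AWS::Kendra::Index with logical ID ExampleIndex in this template.
      IndexId: !Ref ExampleIndex
      # Assumes an IAM role with logical ID ExampleCrawlerRole in this template.
      RoleArn: !GetAtt ExampleCrawlerRole.Arn
      DataSourceConfiguration:
        WebCrawlerConfiguration:
          Urls:
            SeedUrlConfiguration:
              SeedUrls:
                - https://abc.example.com/     # placeholder seed URL
              # Also crawls subdomains such as a.abc.example.com.
              WebCrawlerMode: SUBDOMAINS
```

Omitting WebCrawlerMode leaves the crawler in the default HOST_ONLY mode, which is the narrowest scope and a reasonable choice when only the seed hosts themselves should be indexed.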