AWS::Kendra::DataSource WebCrawlerSeedUrlConfiguration
Provides the configuration information of the seed or starting point URLs to crawl.
When selecting websites to index, you must adhere to the Amazon Acceptable Use Policy.
Syntax
To declare this entity in your Amazon CloudFormation template, use the following syntax:
JSON
{
  "SeedUrls" : [ String, ... ],
  "WebCrawlerMode" : String
}
YAML
SeedUrls:
  - String
WebCrawlerMode: String
Properties
SeedUrls
The list of seed or starting point URLs of the websites you want to crawl.
The list can include a maximum of 100 seed URLs.
Required: Yes
Type: Array of String
Minimum: 0
Maximum: 100
Update requires: No interruption
WebCrawlerMode
You can choose one of the following modes:
- HOST_ONLY: crawl only the website host names. For example, if the seed URL is "abc.example.com", then only URLs with host name "abc.example.com" are crawled.
- SUBDOMAINS: crawl the website host names with subdomains. For example, if the seed URL is "abc.example.com", then "a.abc.example.com" and "b.abc.example.com" are also crawled.
- EVERYTHING: crawl the website host names with subdomains and other domains that the web pages link to.
The default mode is HOST_ONLY.
Required: No
Type: String
Allowed values:
HOST_ONLY | SUBDOMAINS | EVERYTHING
Update requires: No interruption
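Example
The following template fragment is a minimal sketch of where this property type might sit inside an AWS::Kendra::DataSource resource. The logical ID ExampleWebCrawlerDataSource, the data source name, the two parameters (KendraIndexId, KendraRoleArn), and the https:// scheme on the seed URL are illustrative assumptions, not values taken from this page; substitute your own index, role, and URLs.

```yaml
Parameters:
  # Hypothetical parameters: ID of an existing Kendra index and the ARN of
  # an IAM role that the data source can assume.
  KendraIndexId:
    Type: String
  KendraRoleArn:
    Type: String

Resources:
  # Hypothetical logical ID, chosen for illustration only.
  ExampleWebCrawlerDataSource:
    Type: AWS::Kendra::DataSource
    Properties:
      IndexId: !Ref KendraIndexId
      Name: example-web-crawler
      Type: WEBCRAWLER
      RoleArn: !Ref KendraRoleArn
      DataSourceConfiguration:
        WebCrawlerConfiguration:
          Urls:
            SeedUrlConfiguration:
              # Up to 100 seed URLs are allowed.
              SeedUrls:
                - https://abc.example.com
              # Also crawl hosts such as a.abc.example.com;
              # omit this line to fall back to the default, HOST_ONLY.
              WebCrawlerMode: SUBDOMAINS
```

With SUBDOMAINS, the crawl stays within abc.example.com and its subdomains. Switching to EVERYTHING would also follow links to other domains, which is worth weighing against the Acceptable Use Policy requirement noted at the top of this page.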