Class CfnDataSource.WebCrawlerUrlsProperty
Specifies the seed or starting point URLs of the websites or the sitemap URLs of the websites you want to crawl.
Inheritance
System.Object
CfnDataSource.WebCrawlerUrlsProperty
Implements
CfnDataSource.IWebCrawlerUrlsProperty
Namespace: Amazon.CDK.AWS.Kendra
Assembly: Amazon.CDK.Lib.dll
Syntax (csharp)
public class WebCrawlerUrlsProperty : Object, CfnDataSource.IWebCrawlerUrlsProperty
Syntax (vb)
Public Class WebCrawlerUrlsProperty
Inherits Object
Implements CfnDataSource.IWebCrawlerUrlsProperty
Remarks
You can include website subdomains. You can list up to 100 seed URLs and up to three sitemap URLs.
You can crawl only websites that use the secure communication protocol, Hypertext Transfer Protocol Secure (HTTPS). If you receive an error when crawling a website, the website might be blocked from crawling.
When selecting websites to index, you must adhere to the Amazon Acceptable Use Policy and all other Amazon terms. Remember that you must only use the Amazon Kendra web crawler to index your own webpages, or webpages that you have authorization to index.
Examples
// The code below shows an example of how to instantiate this type.
// The values are placeholders you should change.
using Amazon.CDK.AWS.Kendra;

// The property types are nested in CfnDataSource, so they are qualified here.
var webCrawlerUrlsProperty = new CfnDataSource.WebCrawlerUrlsProperty {
    SeedUrlConfiguration = new CfnDataSource.WebCrawlerSeedUrlConfigurationProperty {
        SeedUrls = new [] { "seedUrls" },

        // the properties below are optional
        WebCrawlerMode = "webCrawlerMode"
    },
    SiteMapsConfiguration = new CfnDataSource.WebCrawlerSiteMapsConfigurationProperty {
        SiteMaps = new [] { "siteMaps" }
    }
};
Synopsis
Constructors
WebCrawlerUrlsProperty() |
Properties
SeedUrlConfiguration | Configuration of the seed or starting point URLs of the websites you want to crawl. |
SiteMapsConfiguration | Configuration of the sitemap URLs of the websites you want to crawl. |
Constructors
WebCrawlerUrlsProperty()
public WebCrawlerUrlsProperty()
Properties
SeedUrlConfiguration
Configuration of the seed or starting point URLs of the websites you want to crawl.
public object SeedUrlConfiguration { get; set; }
Property Value
System.Object
Remarks
You can choose to crawl only the website host names, the website host names with subdomains, or the website host names with subdomains and other domains that the web pages link to.
You can list up to 100 seed URLs.
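As a rough sketch of how the crawl scope described above might be set, the fragment below builds a seed URL configuration and attaches it to the parent property. The URL is a placeholder. The mode string is assumed to be one of the values the Amazon Kendra API documents for this setting (HOST_ONLY, SUBDOMAINS, or EVERYTHING), so confirm it against the service documentation before relying on it.

```csharp
using Amazon.CDK.AWS.Kendra;

// Assumption: "https://example.com" is a placeholder seed URL, not a real target.
var seedUrlConfiguration = new CfnDataSource.WebCrawlerSeedUrlConfigurationProperty {
    // Seed URLs must use HTTPS; up to 100 can be listed.
    SeedUrls = new [] { "https://example.com" },

    // Assumed mode value: crawl the host names together with their subdomains.
    WebCrawlerMode = "SUBDOMAINS"
};

var webCrawlerUrls = new CfnDataSource.WebCrawlerUrlsProperty {
    SeedUrlConfiguration = seedUrlConfiguration
};
```

Leaving WebCrawlerMode unset is also possible, since the property is optional.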
SiteMapsConfiguration
Configuration of the sitemap URLs of the websites you want to crawl.
public object SiteMapsConfiguration { get; set; }
Property Value
System.Object
Remarks
Only URLs belonging to the same website host names are crawled. You can list up to three sitemap URLs.
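A sitemap-based configuration might look like the following minimal sketch. The sitemap URL is a placeholder chosen for illustration. The list must stay within the three-URL limit noted above.

```csharp
using Amazon.CDK.AWS.Kendra;

// Assumption: the sitemap URL below is a placeholder for a site you are authorized to index.
var siteMapsConfiguration = new CfnDataSource.WebCrawlerSiteMapsConfigurationProperty {
    // Up to three HTTPS sitemap URLs can be listed.
    SiteMaps = new [] { "https://example.com/sitemap.xml" }
};

var webCrawlerUrls = new CfnDataSource.WebCrawlerUrlsProperty {
    SiteMapsConfiguration = siteMapsConfiguration
};
```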