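Amazon Glue examples using Amazon SDK for .NET - Amazon SDK for .NET
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China.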

Amazon Glue examples using Amazon SDK for .NET

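The following code examples show how to perform actions and implement common scenarios by using the Amazon SDK for .NET with Amazon Glue.

Actions are code excerpts that show how to call individual Amazon Glue functions.

Scenarios are code examples that show how to accomplish a specific task by calling multiple Amazon Glue functions.

Each example includes a link to GitHub, where you can find instructions on how to set up and run the code in context.

Actions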

The following code example shows how to create an Amazon Glue crawler.

Amazon SDK for .NET
Note

To learn how to set up and run this example, see GitHub.

/// <summary>
/// Creates an AWS Glue crawler.
/// </summary>
/// <param name="glueClient">The initialized AWS Glue client.</param>
/// <param name="iam">The Amazon Resource Name (ARN) of the IAM role
/// that is used by the crawler.</param>
/// <param name="s3Path">The path to the Amazon S3 bucket where
/// data is stored.</param>
/// <param name="cron">The cron expression that sets the schedule on
/// which the crawler runs.</param>
/// <param name="dbName">The name of the database.</param>
/// <param name="crawlerName">The name of the AWS Glue crawler.</param>
/// <returns>A Boolean value indicating whether the AWS Glue crawler was
/// created successfully.</returns>
public static async Task<bool> CreateGlueCrawlerAsync(
    AmazonGlueClient glueClient,
    string iam,
    string s3Path,
    string cron,
    string dbName,
    string crawlerName)
{
    var s3Target = new S3Target
    {
        Path = s3Path,
    };

    var targetList = new List<S3Target>
    {
        s3Target,
    };

    var targets = new CrawlerTargets
    {
        S3Targets = targetList,
    };

    var crawlerRequest = new CreateCrawlerRequest
    {
        DatabaseName = dbName,
        Name = crawlerName,
        Description = "Created by the AWS Glue .NET API",
        Targets = targets,
        Role = iam,
        Schedule = cron,
    };

    var response = await glueClient.CreateCrawlerAsync(crawlerRequest);

    if (response.HttpStatusCode == System.Net.HttpStatusCode.OK)
    {
        Console.WriteLine($"{crawlerName} was successfully created");
        return true;
    }
    else
    {
        Console.WriteLine($"Could not create {crawlerName}.");
        return false;
    }
}
  • For API details, see CreateCrawler in the Amazon SDK for .NET API Reference.
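
The `cron` argument of `CreateGlueCrawlerAsync` holds a schedule expression. Amazon Glue schedules use six fields wrapped in `cron(...)`: minutes, hours, day of month, month, day of week, and year, evaluated in UTC. The following is a minimal sketch of a helper that builds a daily schedule. `GlueSchedule` and `Daily` are hypothetical names introduced here for illustration; they are not part of the SDK.

```csharp
using System;

// Hypothetical helper (not part of the SDK) that builds schedule strings
// in the cron(...) form used for crawler schedules.
public static class GlueSchedule
{
    // Returns an expression that fires every day at the given UTC time.
    // Field order: minutes hours day-of-month month day-of-week year.
    public static string Daily(int hourUtc, int minuteUtc)
    {
        if (hourUtc < 0 || hourUtc > 23)
        {
            throw new ArgumentOutOfRangeException(nameof(hourUtc));
        }

        if (minuteUtc < 0 || minuteUtc > 59)
        {
            throw new ArgumentOutOfRangeException(nameof(minuteUtc));
        }

        // "?" leaves day-of-week unspecified because day-of-month is "*".
        return $"cron({minuteUtc} {hourUtc} * * ? *)";
    }
}
```

For example, `GlueSchedule.Daily(12, 15)` returns `cron(15 12 * * ? *)`, the schedule value that the scenario on this page passes to the crawler.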

The following code example shows how to create an Amazon Glue job definition.

Amazon SDK for .NET
Note

To learn how to set up and run this example, see GitHub.

/// <summary>
/// Creates an AWS Glue job.
/// </summary>
/// <param name="glueClient">The initialized AWS Glue client.</param>
/// <param name="jobName">The name of the job to create.</param>
/// <param name="iam">The Amazon Resource Name (ARN) of the IAM role
/// that will be used by the job.</param>
/// <param name="scriptLocation">The location where the script is stored.</param>
/// <returns>A Boolean value indicating whether the AWS Glue job was
/// created successfully.</returns>
public static async Task<bool> CreateJobAsync(
    AmazonGlueClient glueClient,
    string jobName,
    string iam,
    string scriptLocation)
{
    var command = new JobCommand
    {
        PythonVersion = "3",
        Name = "MyJob1",
        ScriptLocation = scriptLocation,
    };

    var jobRequest = new CreateJobRequest
    {
        Description = "A Job created by using the AWS SDK for .NET",
        GlueVersion = "2.0",
        WorkerType = WorkerType.G1X,
        NumberOfWorkers = 10,
        Name = jobName,
        Role = iam,
        Command = command,
    };

    var response = await glueClient.CreateJobAsync(jobRequest);

    if (response.HttpStatusCode == System.Net.HttpStatusCode.OK)
    {
        Console.WriteLine($"{jobName} was successfully created.");
        return true;
    }

    Console.WriteLine($"{jobName} could not be created.");
    return false;
}
  • For API details, see CreateJob in the Amazon SDK for .NET API Reference.

The following code example shows how to delete an Amazon Glue crawler.

Amazon SDK for .NET
Note

To learn how to set up and run this example, see GitHub.

/// <summary>
/// Deletes the named AWS Glue crawler.
/// </summary>
/// <param name="glueClient">The initialized AWS Glue client.</param>
/// <param name="crawlerName">The name of the crawler to delete.</param>
/// <returns>A Boolean value indicating whether the AWS Glue crawler was
/// deleted successfully.</returns>
public static async Task<bool> DeleteSpecificCrawlerAsync(
    AmazonGlueClient glueClient,
    string crawlerName)
{
    var deleteCrawlerRequest = new DeleteCrawlerRequest
    {
        Name = crawlerName,
    };

    var response = await glueClient.DeleteCrawlerAsync(deleteCrawlerRequest);

    if (response.HttpStatusCode == System.Net.HttpStatusCode.OK)
    {
        Console.WriteLine($"{crawlerName} was deleted");
        return true;
    }

    Console.WriteLine($"Could not delete {crawlerName}.");
    return false;
}
  • For API details, see DeleteCrawler in the Amazon SDK for .NET API Reference.

The following code example shows how to delete a database from the Amazon Glue Data Catalog.

Amazon SDK for .NET
Note

To learn how to set up and run this example, see GitHub.

/// <summary>
/// Deletes an AWS Glue database.
/// </summary>
/// <param name="glueClient">The initialized AWS Glue client.</param>
/// <param name="databaseName">The name of the database to delete.</param>
/// <returns>A Boolean value indicating whether the AWS Glue database was
/// deleted successfully.</returns>
public static async Task<bool> DeleteDatabaseAsync(
    AmazonGlueClient glueClient,
    string databaseName)
{
    var request = new DeleteDatabaseRequest
    {
        Name = databaseName,
    };

    var response = await glueClient.DeleteDatabaseAsync(request);

    if (response.HttpStatusCode == System.Net.HttpStatusCode.OK)
    {
        Console.WriteLine($"{databaseName} was successfully deleted");
        return true;
    }

    Console.WriteLine($"{databaseName} could not be deleted.");
    return false;
}
  • For API details, see DeleteDatabase in the Amazon SDK for .NET API Reference.

The following code example shows how to delete an Amazon Glue job definition and all associated runs.

Amazon SDK for .NET
Note

To learn how to set up and run this example, see GitHub.

/// <summary>
/// Deletes the named job.
/// </summary>
/// <param name="glueClient">The initialized AWS Glue client.</param>
/// <param name="jobName">The name of the job to delete.</param>
/// <returns>A Boolean value indicating whether the AWS Glue job was
/// deleted successfully.</returns>
public static async Task<bool> DeleteJobAsync(
    AmazonGlueClient glueClient,
    string jobName)
{
    var jobRequest = new DeleteJobRequest
    {
        JobName = jobName,
    };

    var response = await glueClient.DeleteJobAsync(jobRequest);

    if (response.HttpStatusCode == System.Net.HttpStatusCode.OK)
    {
        Console.WriteLine($"{jobName} was successfully deleted");
        return true;
    }

    Console.WriteLine($"{jobName} could not be deleted.");
    return false;
}
  • For API details, see DeleteJob in the Amazon SDK for .NET API Reference.

The following code example shows how to get an Amazon Glue crawler.

Amazon SDK for .NET
Note

To learn how to set up and run this example, see GitHub.

/// <summary>
/// Retrieves information about a specific AWS Glue crawler.
/// </summary>
/// <param name="glueClient">The initialized AWS Glue client.</param>
/// <param name="crawlerName">The name of the crawler.</param>
/// <returns>A Boolean value indicating whether information about
/// the AWS Glue crawler was retrieved successfully.</returns>
public static async Task<bool> GetSpecificCrawlerAsync(
    AmazonGlueClient glueClient,
    string crawlerName)
{
    var crawlerRequest = new GetCrawlerRequest
    {
        Name = crawlerName,
    };

    var response = await glueClient.GetCrawlerAsync(crawlerRequest);

    if (response.HttpStatusCode == System.Net.HttpStatusCode.OK)
    {
        var databaseName = response.Crawler.DatabaseName;
        Console.WriteLine($"{crawlerName} has the database {databaseName}");
        return true;
    }

    Console.WriteLine($"No information regarding {crawlerName} could be found.");
    return false;
}
  • For API details, see GetCrawler in the Amazon SDK for .NET API Reference.

The following code example shows how to get a database from the Amazon Glue Data Catalog.

Amazon SDK for .NET
Note

To learn how to set up and run this example, see GitHub.

/// <summary>
/// Gets information about the database created for this Glue
/// example.
/// </summary>
/// <param name="glueClient">The initialized Glue client.</param>
/// <param name="databaseName">The name of the AWS Glue database.</param>
/// <returns>A Boolean value indicating whether information about
/// the AWS Glue database was retrieved successfully.</returns>
public static async Task<bool> GetSpecificDatabaseAsync(
    AmazonGlueClient glueClient,
    string databaseName)
{
    var databasesRequest = new GetDatabaseRequest
    {
        Name = databaseName,
    };

    var response = await glueClient.GetDatabaseAsync(databasesRequest);

    if (response.HttpStatusCode == System.Net.HttpStatusCode.OK)
    {
        Console.WriteLine($"The Create Time is {response.Database.CreateTime}");
        return true;
    }

    Console.WriteLine($"No information about {databaseName}.");
    return false;
}
  • For API details, see GetDatabase in the Amazon SDK for .NET API Reference.

The following code example shows how to get the runs of an Amazon Glue job.

Amazon SDK for .NET
Note

To learn how to set up and run this example, see GitHub.

/// <summary>
/// Retrieves information about an AWS Glue job.
/// </summary>
/// <param name="glueClient">The initialized AWS Glue client.</param>
/// <param name="jobName">The AWS Glue object for which to retrieve run
/// information.</param>
/// <returns>A Boolean value indicating whether information about
/// the AWS Glue job runs was retrieved successfully.</returns>
public static async Task<bool> GetJobRunsAsync(
    AmazonGlueClient glueClient,
    string jobName)
{
    var runsRequest = new GetJobRunsRequest
    {
        JobName = jobName,
        MaxResults = 20,
    };

    var response = await glueClient.GetJobRunsAsync(runsRequest);
    var jobRuns = response.JobRuns;

    if (jobRuns.Count > 0)
    {
        foreach (JobRun jobRun in jobRuns)
        {
            Console.WriteLine($"Job run state is {jobRun.JobRunState}");
            Console.WriteLine($"Job run Id is {jobRun.Id}");
            Console.WriteLine($"The Glue version is {jobRun.GlueVersion}");
        }

        return true;
    }
    else
    {
        Console.WriteLine("No job runs found.");
        return false;
    }
}
  • For API details, see GetJobRuns in the Amazon SDK for .NET API Reference.

The following code example shows how to get tables from a database in the Amazon Glue Data Catalog.

Amazon SDK for .NET
Note

To learn how to set up and run this example, see GitHub.

/// <summary>
/// Gets the tables used by the database for an AWS Glue crawler.
/// </summary>
/// <param name="glueClient">The initialized AWS Glue client.</param>
/// <param name="dbName">The name of the database.</param>
/// <returns>A Boolean value indicating whether information about
/// the AWS Glue tables was retrieved successfully.</returns>
public static async Task<bool> GetGlueTablesAsync(
    AmazonGlueClient glueClient,
    string dbName)
{
    var tableRequest = new GetTablesRequest
    {
        DatabaseName = dbName,
    };

    // Get the list of tables in the AWS Glue database.
    var response = await glueClient.GetTablesAsync(tableRequest);
    var tables = response.TableList;

    if (tables.Count > 0)
    {
        // Display the list of table names.
        tables.ForEach(table =>
        {
            Console.WriteLine($"Table name is: {table.Name}");
        });

        return true;
    }
    else
    {
        Console.WriteLine("No tables found.");
        return false;
    }
}
  • For API details, see GetTables in the Amazon SDK for .NET API Reference.
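
`GetTables` returns results in pages: when more tables remain, the response carries a `NextToken` that the next request passes back. The example above reads only the first page. The following sketch shows the paging loop in isolation. `Paging`, `CollectAllAsync`, and the `fetchPage` delegate are hypothetical names introduced for illustration; the delegate stands in for a call such as `GetTablesAsync` with `NextToken` set on the request.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper (not part of the SDK) that follows continuation
// tokens until the service reports that no pages remain.
public static class Paging
{
    // fetchPage receives the token from the previous page (null for the
    // first page) and returns that page's items plus the next token.
    public static async Task<List<T>> CollectAllAsync<T>(
        Func<string?, Task<(IReadOnlyList<T> Items, string? NextToken)>> fetchPage)
    {
        var all = new List<T>();
        string? token = null;

        do
        {
            var (items, nextToken) = await fetchPage(token);
            all.AddRange(items);
            token = nextToken;
        }
        while (!string.IsNullOrEmpty(token));

        return all;
    }
}
```

With the SDK, the delegate could build a `GetTablesRequest` with `DatabaseName` and `NextToken` set, call `GetTablesAsync`, and return the response's `TableList` and `NextToken`. Keeping the loop separate from the service call is a design choice made here so that the loop can be exercised without credentials.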

The following code example shows how to start an Amazon Glue crawler.

Amazon SDK for .NET
Note

To learn how to set up and run this example, see GitHub.

/// <summary>
/// Starts the named AWS Glue crawler.
/// </summary>
/// <param name="glueClient">The initialized AWS Glue client.</param>
/// <param name="crawlerName">The name of the crawler to start.</param>
/// <returns>A Boolean value indicating whether the AWS Glue crawler
/// was started successfully.</returns>
public static async Task<bool> StartSpecificCrawlerAsync(
    AmazonGlueClient glueClient,
    string crawlerName)
{
    var crawlerRequest = new StartCrawlerRequest
    {
        Name = crawlerName,
    };

    var response = await glueClient.StartCrawlerAsync(crawlerRequest);

    if (response.HttpStatusCode == System.Net.HttpStatusCode.OK)
    {
        Console.WriteLine($"{crawlerName} was successfully started!");
        return true;
    }

    Console.WriteLine($"Could not start AWS Glue crawler, {crawlerName}.");
    return false;
}
  • For API details, see StartCrawler in the Amazon SDK for .NET API Reference.

The following code example shows how to start an Amazon Glue job run.

Amazon SDK for .NET
Note

To learn how to set up and run this example, see GitHub.

/// <summary>
/// Starts an AWS Glue job.
/// </summary>
/// <param name="glueClient">The initialized Glue client.</param>
/// <param name="jobName">The name of the AWS Glue job to start.</param>
/// <returns>A Boolean value indicating whether the AWS Glue job
/// was started successfully.</returns>
public static async Task<bool> StartJobAsync(
    AmazonGlueClient glueClient,
    string jobName)
{
    var runRequest = new StartJobRunRequest
    {
        WorkerType = WorkerType.G1X,
        NumberOfWorkers = 10,
        JobName = jobName,
    };

    var response = await glueClient.StartJobRunAsync(runRequest);

    if (response.HttpStatusCode == System.Net.HttpStatusCode.OK)
    {
        Console.WriteLine($"{jobName} successfully started. The job run id is {response.JobRunId}.");
        return true;
    }

    Console.WriteLine($"Could not start {jobName}.");
    return false;
}
  • For API details, see StartJobRun in the Amazon SDK for .NET API Reference.

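Scenarios

The following code example shows how to:

  • Create and run a crawler that crawls a public Amazon Simple Storage Service (Amazon S3) bucket and generates a metadata database that describes the CSV-formatted data it finds.

  • List information about the databases and tables in your Amazon Glue Data Catalog.

  • Create and run a job that extracts CSV data from the source Amazon S3 bucket, transforms it by removing and renaming fields, and loads JSON-formatted output into another Amazon S3 bucket.

  • List information about job runs and view some of the transformed data.

  • Delete all resources created by the demo.

For more information, see Tutorial: Getting started with Amazon Glue Studio.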

Amazon SDK for .NET
Note

To learn how to set up and run this example, see GitHub.

Create a class that wraps the Amazon Glue functions that are used in the scenario.

namespace Glue_Basics
{
    /// <summary>
    /// Methods for working with AWS Glue by using the AWS SDK for .NET (v3.7).
    /// </summary>
    public static class GlueMethods
    {
        /// <summary>
        /// Creates a database for use by an AWS Glue crawler.
        /// </summary>
        /// <param name="glueClient">The initialized AWS Glue client.</param>
        /// <param name="dbName">The name of the new database.</param>
        /// <param name="locationUri">The location of scripts that will be
        /// used by the AWS Glue crawler.</param>
        /// <returns>A Boolean value indicating whether the AWS Glue database
        /// was created successfully.</returns>
        public static async Task<bool> CreateDatabaseAsync(
            AmazonGlueClient glueClient,
            string dbName,
            string locationUri)
        {
            try
            {
                var dataBaseInput = new DatabaseInput
                {
                    Description = "Built with the AWS SDK for .NET (v3)",
                    Name = dbName,
                    LocationUri = locationUri,
                };

                var request = new CreateDatabaseRequest
                {
                    DatabaseInput = dataBaseInput,
                };

                var response = await glueClient.CreateDatabaseAsync(request);

                if (response.HttpStatusCode == System.Net.HttpStatusCode.OK)
                {
                    Console.WriteLine("The database was successfully created");
                    return true;
                }
                else
                {
                    Console.WriteLine("Could not create the database.");
                    return false;
                }
            }
            catch (AmazonGlueException ex)
            {
                Console.WriteLine($"Error occurred: '{ex.Message}'");
                return false;
            }
        }

        /// <summary>
        /// Creates an AWS Glue crawler.
        /// </summary>
        /// <param name="glueClient">The initialized AWS Glue client.</param>
        /// <param name="iam">The Amazon Resource Name (ARN) of the IAM role
        /// that is used by the crawler.</param>
        /// <param name="s3Path">The path to the Amazon S3 bucket where
        /// data is stored.</param>
        /// <param name="cron">The cron expression that sets the schedule on
        /// which the crawler runs.</param>
        /// <param name="dbName">The name of the database.</param>
        /// <param name="crawlerName">The name of the AWS Glue crawler.</param>
        /// <returns>A Boolean value indicating whether the AWS Glue crawler was
        /// created successfully.</returns>
        public static async Task<bool> CreateGlueCrawlerAsync(
            AmazonGlueClient glueClient,
            string iam,
            string s3Path,
            string cron,
            string dbName,
            string crawlerName)
        {
            var s3Target = new S3Target
            {
                Path = s3Path,
            };

            var targetList = new List<S3Target>
            {
                s3Target,
            };

            var targets = new CrawlerTargets
            {
                S3Targets = targetList,
            };

            var crawlerRequest = new CreateCrawlerRequest
            {
                DatabaseName = dbName,
                Name = crawlerName,
                Description = "Created by the AWS Glue .NET API",
                Targets = targets,
                Role = iam,
                Schedule = cron,
            };

            var response = await glueClient.CreateCrawlerAsync(crawlerRequest);

            if (response.HttpStatusCode == System.Net.HttpStatusCode.OK)
            {
                Console.WriteLine($"{crawlerName} was successfully created");
                return true;
            }
            else
            {
                Console.WriteLine($"Could not create {crawlerName}.");
                return false;
            }
        }

        /// <summary>
        /// Creates an AWS Glue job.
        /// </summary>
        /// <param name="glueClient">The initialized AWS Glue client.</param>
        /// <param name="jobName">The name of the job to create.</param>
        /// <param name="iam">The Amazon Resource Name (ARN) of the IAM role
        /// that will be used by the job.</param>
        /// <param name="scriptLocation">The location where the script is stored.</param>
        /// <returns>A Boolean value indicating whether the AWS Glue job was
        /// created successfully.</returns>
        public static async Task<bool> CreateJobAsync(
            AmazonGlueClient glueClient,
            string jobName,
            string iam,
            string scriptLocation)
        {
            var command = new JobCommand
            {
                PythonVersion = "3",
                Name = "MyJob1",
                ScriptLocation = scriptLocation,
            };

            var jobRequest = new CreateJobRequest
            {
                Description = "A Job created by using the AWS SDK for .NET",
                GlueVersion = "2.0",
                WorkerType = WorkerType.G1X,
                NumberOfWorkers = 10,
                Name = jobName,
                Role = iam,
                Command = command,
            };

            var response = await glueClient.CreateJobAsync(jobRequest);

            if (response.HttpStatusCode == System.Net.HttpStatusCode.OK)
            {
                Console.WriteLine($"{jobName} was successfully created.");
                return true;
            }

            Console.WriteLine($"{jobName} could not be created.");
            return false;
        }

        /// <summary>
        /// Deletes the named AWS Glue crawler.
        /// </summary>
        /// <param name="glueClient">The initialized AWS Glue client.</param>
        /// <param name="crawlerName">The name of the crawler to delete.</param>
        /// <returns>A Boolean value indicating whether the AWS Glue crawler was
        /// deleted successfully.</returns>
        public static async Task<bool> DeleteSpecificCrawlerAsync(
            AmazonGlueClient glueClient,
            string crawlerName)
        {
            var deleteCrawlerRequest = new DeleteCrawlerRequest
            {
                Name = crawlerName,
            };

            var response = await glueClient.DeleteCrawlerAsync(deleteCrawlerRequest);

            if (response.HttpStatusCode == System.Net.HttpStatusCode.OK)
            {
                Console.WriteLine($"{crawlerName} was deleted");
                return true;
            }

            Console.WriteLine($"Could not delete {crawlerName}.");
            return false;
        }

        /// <summary>
        /// Deletes an AWS Glue database.
        /// </summary>
        /// <param name="glueClient">The initialized AWS Glue client.</param>
        /// <param name="databaseName">The name of the database to delete.</param>
        /// <returns>A Boolean value indicating whether the AWS Glue database was
        /// deleted successfully.</returns>
        public static async Task<bool> DeleteDatabaseAsync(
            AmazonGlueClient glueClient,
            string databaseName)
        {
            var request = new DeleteDatabaseRequest
            {
                Name = databaseName,
            };

            var response = await glueClient.DeleteDatabaseAsync(request);

            if (response.HttpStatusCode == System.Net.HttpStatusCode.OK)
            {
                Console.WriteLine($"{databaseName} was successfully deleted");
                return true;
            }

            Console.WriteLine($"{databaseName} could not be deleted.");
            return false;
        }

        /// <summary>
        /// Deletes the named job.
        /// </summary>
        /// <param name="glueClient">The initialized AWS Glue client.</param>
        /// <param name="jobName">The name of the job to delete.</param>
        /// <returns>A Boolean value indicating whether the AWS Glue job was
        /// deleted successfully.</returns>
        public static async Task<bool> DeleteJobAsync(
            AmazonGlueClient glueClient,
            string jobName)
        {
            var jobRequest = new DeleteJobRequest
            {
                JobName = jobName,
            };

            var response = await glueClient.DeleteJobAsync(jobRequest);

            if (response.HttpStatusCode == System.Net.HttpStatusCode.OK)
            {
                Console.WriteLine($"{jobName} was successfully deleted");
                return true;
            }

            Console.WriteLine($"{jobName} could not be deleted.");
            return false;
        }

        /// <summary>
        /// Gets a list of AWS Glue jobs.
        /// </summary>
        /// <param name="glueClient">The initialized AWS Glue client.</param>
        /// <returns>A Boolean value indicating whether information about the
        /// AWS Glue jobs was retrieved successfully.</returns>
        public static async Task<bool> GetAllJobsAsync(AmazonGlueClient glueClient)
        {
            var jobsRequest = new GetJobsRequest
            {
                MaxResults = 10,
            };

            var response = await glueClient.GetJobsAsync(jobsRequest);
            var jobs = response.Jobs;

            if (jobs.Count > 0)
            {
                jobs.ForEach(job =>
                {
                    Console.WriteLine($"The job name is: {job.Name}");
                });

                return true;
            }
            else
            {
                Console.WriteLine("Didn't find any jobs.");
                return false;
            }
        }

        /// <summary>
        /// Gets the tables used by the database for an AWS Glue crawler.
        /// </summary>
        /// <param name="glueClient">The initialized AWS Glue client.</param>
        /// <param name="dbName">The name of the database.</param>
        /// <returns>A Boolean value indicating whether information about
        /// the AWS Glue tables was retrieved successfully.</returns>
        public static async Task<bool> GetGlueTablesAsync(
            AmazonGlueClient glueClient,
            string dbName)
        {
            var tableRequest = new GetTablesRequest
            {
                DatabaseName = dbName,
            };

            // Get the list of tables in the AWS Glue database.
            var response = await glueClient.GetTablesAsync(tableRequest);
            var tables = response.TableList;

            if (tables.Count > 0)
            {
                // Display the list of table names.
                tables.ForEach(table =>
                {
                    Console.WriteLine($"Table name is: {table.Name}");
                });

                return true;
            }
            else
            {
                Console.WriteLine("No tables found.");
                return false;
            }
        }

        /// <summary>
        /// Retrieves information about an AWS Glue job.
        /// </summary>
        /// <param name="glueClient">The initialized AWS Glue client.</param>
        /// <param name="jobName">The AWS Glue object for which to retrieve run
        /// information.</param>
        /// <returns>A Boolean value indicating whether information about
        /// the AWS Glue job runs was retrieved successfully.</returns>
        public static async Task<bool> GetJobRunsAsync(
            AmazonGlueClient glueClient,
            string jobName)
        {
            var runsRequest = new GetJobRunsRequest
            {
                JobName = jobName,
                MaxResults = 20,
            };

            var response = await glueClient.GetJobRunsAsync(runsRequest);
            var jobRuns = response.JobRuns;

            if (jobRuns.Count > 0)
            {
                foreach (JobRun jobRun in jobRuns)
                {
                    Console.WriteLine($"Job run state is {jobRun.JobRunState}");
                    Console.WriteLine($"Job run Id is {jobRun.Id}");
                    Console.WriteLine($"The Glue version is {jobRun.GlueVersion}");
                }

                return true;
            }
            else
            {
                Console.WriteLine("No job runs found.");
                return false;
            }
        }

        /// <summary>
        /// Retrieves information about a specific AWS Glue crawler.
        /// </summary>
        /// <param name="glueClient">The initialized AWS Glue client.</param>
        /// <param name="crawlerName">The name of the crawler.</param>
        /// <returns>A Boolean value indicating whether information about
        /// the AWS Glue crawler was retrieved successfully.</returns>
        public static async Task<bool> GetSpecificCrawlerAsync(
            AmazonGlueClient glueClient,
            string crawlerName)
        {
            var crawlerRequest = new GetCrawlerRequest
            {
                Name = crawlerName,
            };

            var response = await glueClient.GetCrawlerAsync(crawlerRequest);

            if (response.HttpStatusCode == System.Net.HttpStatusCode.OK)
            {
                var databaseName = response.Crawler.DatabaseName;
                Console.WriteLine($"{crawlerName} has the database {databaseName}");
                return true;
            }

            Console.WriteLine($"No information regarding {crawlerName} could be found.");
            return false;
        }

        /// <summary>
        /// Gets information about the database created for this Glue
        /// example.
        /// </summary>
        /// <param name="glueClient">The initialized Glue client.</param>
        /// <param name="databaseName">The name of the AWS Glue database.</param>
        /// <returns>A Boolean value indicating whether information about
        /// the AWS Glue database was retrieved successfully.</returns>
        public static async Task<bool> GetSpecificDatabaseAsync(
            AmazonGlueClient glueClient,
            string databaseName)
        {
            var databasesRequest = new GetDatabaseRequest
            {
                Name = databaseName,
            };

            var response = await glueClient.GetDatabaseAsync(databasesRequest);

            if (response.HttpStatusCode == System.Net.HttpStatusCode.OK)
            {
                Console.WriteLine($"The Create Time is {response.Database.CreateTime}");
                return true;
            }

            Console.WriteLine($"No information about {databaseName}.");
            return false;
        }

        /// <summary>
        /// Starts an AWS Glue job.
        /// </summary>
        /// <param name="glueClient">The initialized Glue client.</param>
        /// <param name="jobName">The name of the AWS Glue job to start.</param>
        /// <returns>A Boolean value indicating whether the AWS Glue job
        /// was started successfully.</returns>
        public static async Task<bool> StartJobAsync(
            AmazonGlueClient glueClient,
            string jobName)
        {
            var runRequest = new StartJobRunRequest
            {
                WorkerType = WorkerType.G1X,
                NumberOfWorkers = 10,
                JobName = jobName,
            };

            var response = await glueClient.StartJobRunAsync(runRequest);

            if (response.HttpStatusCode == System.Net.HttpStatusCode.OK)
            {
                Console.WriteLine($"{jobName} successfully started. The job run id is {response.JobRunId}.");
                return true;
            }

            Console.WriteLine($"Could not start {jobName}.");
            return false;
        }

        /// <summary>
        /// Starts the named AWS Glue crawler.
        /// </summary>
        /// <param name="glueClient">The initialized AWS Glue client.</param>
        /// <param name="crawlerName">The name of the crawler to start.</param>
        /// <returns>A Boolean value indicating whether the AWS Glue crawler
        /// was started successfully.</returns>
        public static async Task<bool> StartSpecificCrawlerAsync(
            AmazonGlueClient glueClient,
            string crawlerName)
        {
            var crawlerRequest = new StartCrawlerRequest
            {
                Name = crawlerName,
            };

            var response = await glueClient.StartCrawlerAsync(crawlerRequest);

            if (response.HttpStatusCode == System.Net.HttpStatusCode.OK)
            {
                Console.WriteLine($"{crawlerName} was successfully started!");
                return true;
            }

            Console.WriteLine($"Could not start AWS Glue crawler, {crawlerName}.");
            return false;
        }
    }
}

Create a class that runs the scenario.

global using Amazon.Glue;
global using Amazon.Glue.Model;
global using Glue_Basics;

// This example uses .NET 6 and the AWS SDK for .NET (v3.7)
// Before running the code, set up your development environment,
// including your credentials. For more information, see the
// following topic:
//    https://docs.aws.amazon.com/sdk-for-net/v3/developer-guide/net-dg-config.html
//
// To set up the resources you need, see the following topic:
//    https://docs.aws.amazon.com/glue/latest/ug/tutorial-add-crawler.html
//
// This example performs the following tasks:
//    1. CreateDatabase
//    2. CreateCrawler
//    3. GetCrawler
//    4. StartCrawler
//    5. GetDatabase
//    6. GetTables
//    7. CreateJob
//    8. StartJobRun
//    9. GetJobs
//   10. GetJobRuns
//   11. DeleteJob
//   12. DeleteDatabase
//   13. DeleteCrawler

// Initialize the values that we need for the scenario.
// The Amazon Resource Name (ARN) of the service role used by the crawler.
var iam = "arn:aws:iam::012345678901:role/AWSGlueServiceRole-CrawlerTutorial";

// The path to the Amazon S3 bucket where the comma-delimited file is stored.
var s3Path = "s3://crawler-public-us-east-1/flight/2016/csv";

// The schedule on which the crawler runs, as a cron expression.
var cron = "cron(15 12 * * ? *)";

// The name of the database used by the crawler.
var dbName = "example-flights-db";

var crawlerName = "Flight Data Crawler";
var jobName = "glue-job34";
var scriptLocation = "s3://aws-glue-scripts-012345678901-us-west-1/GlueDemoUser";
var locationUri = "s3://crawler-public-us-east-1/flight/2016/csv/";

var glueClient = new AmazonGlueClient();

await GlueMethods.DeleteDatabaseAsync(glueClient, dbName);

Console.WriteLine("Creating the database and crawler for the AWS Glue example.");
var success = await GlueMethods.CreateDatabaseAsync(glueClient, dbName, locationUri);
success = await GlueMethods.CreateGlueCrawlerAsync(glueClient, iam, s3Path, cron, dbName, crawlerName);

// Get information about the AWS Glue crawler.
Console.WriteLine("Showing information about the newly created AWS Glue crawler.");
success = await GlueMethods.GetSpecificCrawlerAsync(glueClient, crawlerName);

Console.WriteLine("Starting the new AWS Glue crawler.");
success = await GlueMethods.StartSpecificCrawlerAsync(glueClient, crawlerName);

Console.WriteLine("Displaying information about the database used by the crawler.");
success = await GlueMethods.GetSpecificDatabaseAsync(glueClient, dbName);
success = await GlueMethods.GetGlueTablesAsync(glueClient, dbName);

Console.WriteLine("Creating a new AWS Glue job.");
success = await GlueMethods.CreateJobAsync(glueClient, jobName, iam, scriptLocation);

Console.WriteLine("Starting the new AWS Glue job.");
success = await GlueMethods.StartJobAsync(glueClient, jobName);

Console.WriteLine("Getting information about the AWS Glue job.");
success = await GlueMethods.GetAllJobsAsync(glueClient);
success = await GlueMethods.GetJobRunsAsync(glueClient, jobName);

Console.WriteLine("Deleting the AWS Glue job used by the example.");
success = await GlueMethods.DeleteJobAsync(glueClient, jobName);

// Give the crawler time to finish before deleting its resources.
Console.WriteLine("\n*** Waiting 5 MIN for the " + crawlerName + " to stop. ***");
System.Threading.Thread.Sleep(300000);

Console.WriteLine("Clean up the resources created for the example.");
success = await GlueMethods.DeleteDatabaseAsync(glueClient, dbName);
success = await GlueMethods.DeleteSpecificCrawlerAsync(glueClient, crawlerName);

Console.WriteLine("Successfully completed the AWS Glue Scenario ");
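
The scenario pauses for a fixed five minutes with `Thread.Sleep` before it deletes the crawler. One alternative is to poll the crawler's state until it is idle. The following sketch shows that loop in isolation. `Poller`, `WaitForStateAsync`, and the `getState` delegate are hypothetical names introduced for illustration; with the SDK, `getState` could wrap `GetCrawlerAsync` and return `response.Crawler.State.Value`, which is expected to read `READY` when the crawler is not running.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper (not part of the SDK) that polls a state source
// until it reports the target state or the timeout elapses.
public static class Poller
{
    // Returns true if targetState was observed, false on timeout.
    public static async Task<bool> WaitForStateAsync(
        Func<Task<string>> getState,
        string targetState,
        TimeSpan interval,
        TimeSpan timeout)
    {
        var deadline = DateTime.UtcNow + timeout;

        while (true)
        {
            var state = await getState();
            if (string.Equals(state, targetState, StringComparison.OrdinalIgnoreCase))
            {
                return true;
            }

            // Stop if another wait would run past the deadline.
            if (DateTime.UtcNow + interval > deadline)
            {
                return false;
            }

            await Task.Delay(interval);
        }
    }
}
```

A call such as `await Poller.WaitForStateAsync(getState, "READY", TimeSpan.FromSeconds(15), TimeSpan.FromMinutes(10))` would return as soon as the crawler is idle; the interval and timeout values here are placeholders rather than recommendations.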