Querying Lineage Entities - Amazon SageMaker AI
Querying Lineage Entities

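Amazon SageMaker AI automatically generates graphs of lineage entities as you use them. You can query this data to answer a variety of questions. The following sections describe how to query this data with the SDK for Python.

For information on how to view the lineage of a registered model in Amazon SageMaker Studio, see View model lineage details in Studio.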

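You can query lineage entities to do the following:

  • Retrieve all datasets that went into creating a model.

  • Retrieve all jobs that went into creating an endpoint.

  • Retrieve all models that use a dataset.

  • Retrieve all endpoints that use a model.

  • Retrieve which endpoints are derived from a certain dataset.

  • Retrieve the pipeline execution that created a training job.

  • Retrieve the relationships between entities for investigation, governance, and reproducibility.

  • Retrieve all downstream trials that use the artifact.

  • Retrieve all upstream trials that use the artifact.

  • Retrieve a list of artifacts that use the provided S3 URI.

  • Retrieve upstream artifacts that use the dataset artifact.

  • Retrieve downstream artifacts that use the dataset artifact.

  • Retrieve datasets that use the image artifact.

  • Retrieve actions that use the context.

  • Retrieve processing jobs that use the endpoint.

  • Retrieve transform jobs that use the endpoint.

  • Retrieve trial components that use the endpoint.

  • Retrieve the ARN for the pipeline execution associated with the model package group.

  • Retrieve all artifacts that use the action.

  • Retrieve all upstream datasets that use the model package approval action.

  • Retrieve the model package from the model package approval action.

  • Retrieve downstream endpoint contexts that use the endpoint.

  • Retrieve the ARN for the pipeline execution associated with the trial component.

  • Retrieve datasets that use the trial component.

  • Retrieve models that use the trial component.

  • Explore your lineage for visualization.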

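Limitations
  • Lineage querying is not available in the following Regions:

    • Africa (Cape Town) - af-south

    • Asia Pacific (Jakarta) - ap-southeast-3

    • Asia Pacific (Osaka) - ap-northeast-3

    • Europe (Milan) - eu-south-1

    • Europe (Spain) - eu-south-2

    • Israel (Tel Aviv) - il-central-1

  • The maximum depth of relationships to discover is currently limited to 10.

  • Filtering is limited to the following properties: last modified date, created date, type, and lineage entity type.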
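The limits above can be checked before a query is sent. The following is a minimal sketch, not part of the SDK: `build_lineage_request` is a hypothetical helper that only assembles and validates a dictionary shaped like the keyword arguments of the boto3 `query_lineage` call (`StartArns`, `Direction`, `IncludeEdges`, `MaxDepth`, `Filters`). It makes no AWS call, and the ARN and type values in the usage lines are made-up placeholders.

```python
from datetime import datetime, timezone

# Depth limit taken from the limitations listed above.
MAX_QUERY_DEPTH = 10
VALID_DIRECTIONS = {"Ascendants", "Descendants", "Both"}


def build_lineage_request(start_arns, direction="Both", max_depth=MAX_QUERY_DEPTH,
                          types=None, lineage_types=None,
                          created_after=None, modified_after=None):
    """Assemble keyword arguments shaped like a boto3 `query_lineage` request.

    Hypothetical helper: it builds and validates a dict and nothing else.
    """
    if direction not in VALID_DIRECTIONS:
        raise ValueError("direction must be one of {}".format(sorted(VALID_DIRECTIONS)))
    if not 0 <= max_depth <= MAX_QUERY_DEPTH:
        raise ValueError("max_depth must be between 0 and {}".format(MAX_QUERY_DEPTH))

    # Only the filterable properties named above: dates, type, lineage entity type.
    filters = {}
    if types:
        filters["Types"] = list(types)
    if lineage_types:
        filters["LineageTypes"] = list(lineage_types)
    if created_after:
        filters["CreatedAfter"] = created_after
    if modified_after:
        filters["ModifiedAfter"] = modified_after

    request = {
        "StartArns": list(start_arns),
        "Direction": direction,
        "IncludeEdges": False,
        "MaxDepth": max_depth,
    }
    if filters:
        request["Filters"] = filters
    return request


# Placeholder ARN and filter values, for illustration only.
example_request = build_lineage_request(
    ["arn:aws:sagemaker:us-east-1:111122223333:artifact/example"],
    direction="Ascendants",
    types=["DataSet"],
    lineage_types=["Artifact"],
    created_after=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
print(example_request)
```

The resulting dictionary could then be passed as `sm_client.query_lineage(**example_request)`, assuming `sm_client` is a boto3 SageMaker client in a Region where lineage querying is available.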

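Getting started with querying lineage entities

The easiest way to get started is with the SDK for Python.

The following examples show how to use the LineageQuery and LineageFilter APIs to construct queries that answer questions about the lineage graph and extract entity relationships for a few use cases.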

Example: Using the LineageQuery API to find entity associations
import pprint

import sagemaker
from sagemaker.lineage.context import Context, EndpointContext
from sagemaker.lineage.action import Action
from sagemaker.lineage.association import Association
from sagemaker.lineage.artifact import Artifact, ModelArtifact, DatasetArtifact
from sagemaker.lineage.query import (
    LineageQuery,
    LineageFilter,
    LineageSourceEnum,
    LineageEntityEnum,
    LineageQueryDirectionEnum,
)

# Session and pretty-printer used by the examples that follow.
sagemaker_session = sagemaker.session.Session()
pp = pprint.PrettyPrinter()

# Find the endpoint context that should be used for the lineage queries.
# `endpoint_arn` is assumed to hold the ARN of an existing endpoint.
contexts = Context.list(source_uri=endpoint_arn)
context_name = list(contexts)[0].context_name
endpoint_context = EndpointContext.load(context_name=context_name)
Example: Finding all the datasets associated with an endpoint
# Define the LineageFilter to look for entities of type `ARTIFACT` and the source of type `DATASET`.
query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT], sources=[LineageSourceEnum.DATASET]
)

# Providing this `LineageFilter` to the `LineageQuery` constructs a query that traverses
# through the given context `endpoint_context` and finds all datasets.
query_result = LineageQuery(sagemaker_session).query(
    start_arns=[endpoint_context.context_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

# Parse through the query results to get the lineage objects corresponding to the datasets.
dataset_artifacts = []
for vertex in query_result.vertices:
    dataset_artifacts.append(vertex.to_lineage_object().source.source_uri)

pp.pprint(dataset_artifacts)
Example: Finding the models associated with an endpoint
# Define the LineageFilter to look for entities of type `ARTIFACT` and the source of type `MODEL`.
query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT], sources=[LineageSourceEnum.MODEL]
)

# Providing this `LineageFilter` to the `LineageQuery` constructs a query that traverses
# through the given context `endpoint_context` and finds the associated models.
query_result = LineageQuery(sagemaker_session).query(
    start_arns=[endpoint_context.context_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

# Parse through the query results to get the lineage objects corresponding to the model.
model_artifacts = []
for vertex in query_result.vertices:
    model_artifacts.append(vertex.to_lineage_object().source.source_uri)

# The results of the `LineageQuery` API call return the ARN of the model deployed to the
# endpoint along with the S3 URI to the model.tar.gz file associated with the model.
pp.pprint(model_artifacts)
Example: Finding the trial components associated with an endpoint
# Define the LineageFilter to look for entities of type `TRIAL_COMPONENT` and the source of type `TRAINING_JOB`.
query_filter = LineageFilter(
    entities=[LineageEntityEnum.TRIAL_COMPONENT],
    sources=[LineageSourceEnum.TRAINING_JOB],
)

# Providing this `LineageFilter` to the `LineageQuery` constructs a query that traverses
# through the given context `endpoint_context` and finds the training job trial components.
query_result = LineageQuery(sagemaker_session).query(
    start_arns=[endpoint_context.context_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

# Parse through the query results to get the ARNs of the training jobs associated with this endpoint.
trial_components = []
for vertex in query_result.vertices:
    trial_components.append(vertex.arn)

pp.pprint(trial_components)
Example: Changing the focal point of lineage

The LineageQuery can be modified to have different start_arns, which changes the focal point of lineage. In addition, the LineageFilter can take multiple sources and entities to expand the scope of the query.

In the following, we use the model as the lineage focal point and find the endpoints and datasets associated with it.

# Get the ModelArtifact.
# `model_package_arn` is assumed to hold the ARN of a registered model package.
model_artifact_summary = list(Artifact.list(source_uri=model_package_arn))[0]
model_artifact = ModelArtifact.load(artifact_arn=model_artifact_summary.artifact_arn)

query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT],
    sources=[LineageSourceEnum.ENDPOINT, LineageSourceEnum.DATASET],
)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],  # Model is the starting artifact
    query_filter=query_filter,
    # Find all the entities that descend from the model, i.e. the endpoint
    direction=LineageQueryDirectionEnum.DESCENDANTS,
    include_edges=False,
)

associations = []
for vertex in query_result.vertices:
    associations.append(vertex.to_lineage_object().source.source_uri)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],  # Model is the starting artifact
    query_filter=query_filter,
    # Find all the entities that ascend from the model, i.e. the datasets
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

for vertex in query_result.vertices:
    associations.append(vertex.to_lineage_object().source.source_uri)

pp.pprint(associations)
Example: Using LineageQueryDirectionEnum.BOTH to find ascendant and descendant relationships

When the direction is set to BOTH, the query traverses the graph to find both ascendant and descendant relationships. This traversal takes place not only from the starting node, but from each node that is visited. For example, if a training job is run twice and both models generated by the training job are deployed to endpoints, the result of a query with direction set to BOTH shows both endpoints. This is because the same image is used for training and deploying the models. Since the image is common to both models, the start_arn and both endpoints appear in the query result.

query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT],
    sources=[LineageSourceEnum.ENDPOINT, LineageSourceEnum.DATASET],
)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],  # Model is the starting artifact
    query_filter=query_filter,
    # Look for associations both ascending and descending from the start
    direction=LineageQueryDirectionEnum.BOTH,
    include_edges=False,
)

associations = []
for vertex in query_result.vertices:
    associations.append(vertex.to_lineage_object().source.source_uri)

pp.pprint(associations)
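To see why BOTH can return more than the union of an ASCENDANTS query and a DESCENDANTS query, the following toy model may help. It is a sketch under stated assumptions, not the service's implementation: the graph, the node names, and the `reachable` function are invented for illustration, and no filter is applied to the results.

```python
from collections import deque

# Toy graph: one shared image, two models built from it, one endpoint per model.
# Each pair is (upstream entity, downstream entity).
EDGES = [
    ("image", "model-a"),
    ("image", "model-b"),
    ("model-a", "endpoint-a"),
    ("model-b", "endpoint-b"),
]


def reachable(start, follow_up, follow_down, edges=EDGES):
    """Breadth-first search that follows edges upstream, downstream, or both."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        neighbors = []
        if follow_down:
            neighbors += [dst for src, dst in edges if src == node]
        if follow_up:
            neighbors += [src for src, dst in edges if dst == node]
        for neighbor in neighbors:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen - {start}


# Two one-directional traversals from model-a, combined.
one_way = reachable("model-a", True, False) | reachable("model-a", False, True)
# A single traversal that may change direction at every visited node.
both = reachable("model-a", True, True)

print(sorted(one_way))  # ['endpoint-a', 'image']
print(sorted(both))     # ['endpoint-a', 'endpoint-b', 'image', 'model-b']
```

In this toy graph the second endpoint is reached only because the traversal goes up to the shared image and then back down, which mirrors the shared-image case described above.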
Directions in LineageQuery - ASCENDANTS vs. DESCENDANTS

To understand the direction in the lineage graph, take the following entity relationship graph: Dataset -> Training Job -> Model -> Endpoint

The endpoint is a descendant of the model, and the model is a descendant of the dataset. Similarly, the model is an ascendant of the endpoint. The direction parameter specifies whether the query should return entities that are descendants or ascendants of the entities in start_arns. If start_arns contains a model and the direction is DESCENDANTS, the query returns the endpoint. If the direction is ASCENDANTS, the query returns the dataset.

# In this example, we'll look at the impact of specifying the direction as ASCENDANTS or DESCENDANTS in a `LineageQuery`.
query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT],
    sources=[
        LineageSourceEnum.ENDPOINT,
        LineageSourceEnum.MODEL,
        LineageSourceEnum.DATASET,
        LineageSourceEnum.TRAINING_JOB,
    ],
)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

ascendant_artifacts = []

# The lineage entity returned for the Training Job is a TrialComponent which can't be converted to a
# lineage object using the method `to_lineage_object()` so we extract the TrialComponent ARN.
for vertex in query_result.vertices:
    try:
        ascendant_artifacts.append(vertex.to_lineage_object().source.source_uri)
    except Exception:
        ascendant_artifacts.append(vertex.arn)

print("Ascendant artifacts : ")
pp.pprint(ascendant_artifacts)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.DESCENDANTS,
    include_edges=False,
)

descendant_artifacts = []
for vertex in query_result.vertices:
    try:
        descendant_artifacts.append(vertex.to_lineage_object().source.source_uri)
    except Exception:
        # Handling TrialComponents.
        descendant_artifacts.append(vertex.arn)

print("Descendant artifacts : ")
pp.pprint(descendant_artifacts)
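The direction semantics can also be reproduced on a toy version of the Dataset -> Training Job -> Model -> Endpoint chain without calling any AWS API. The sketch below is illustrative only: the node names and the `traverse` function are assumptions made for this example, and unlike a filtered LineageQuery it returns every entity it reaches.

```python
from collections import deque

# Each edge points from an upstream entity to the entity derived from it.
EDGES = [
    ("dataset", "training-job"),
    ("training-job", "model"),
    ("model", "endpoint"),
]


def traverse(start, direction, edges=EDGES):
    """Return the entities reachable from `start` in one direction, in visit order."""
    if direction == "DESCENDANTS":
        def neighbors(node):
            return [dst for src, dst in edges if src == node]
    elif direction == "ASCENDANTS":
        def neighbors(node):
            return [src for src, dst in edges if dst == node]
    else:
        raise ValueError("direction must be ASCENDANTS or DESCENDANTS")

    seen = {start}
    queue = deque([start])
    visited = []
    while queue:
        node = queue.popleft()
        for neighbor in neighbors(node):
            if neighbor not in seen:
                seen.add(neighbor)
                visited.append(neighbor)
                queue.append(neighbor)
    return visited


print(traverse("model", "DESCENDANTS"))  # ['endpoint']
print(traverse("model", "ASCENDANTS"))   # ['training-job', 'dataset']
```

Starting from the model, DESCENDANTS reaches only the endpoint, while ASCENDANTS walks back through the training job to the dataset.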
Example: SDK helper functions that simplify lineage queries

The classes EndpointContext, ModelArtifact, and DatasetArtifact have helper functions that are wrappers over the LineageQuery API and make certain lineage queries easier to use. The following example shows how to use these helper functions.

# Find all the datasets associated with this endpoint
datasets = []
dataset_artifacts = endpoint_context.dataset_artifacts()
for dataset in dataset_artifacts:
    datasets.append(dataset.source.source_uri)
print("Datasets : ", datasets)

# Find the training jobs associated with the endpoint
training_job_artifacts = endpoint_context.training_job_arns()
training_jobs = []
for training_job in training_job_artifacts:
    training_jobs.append(training_job)
print("Training Jobs : ", training_jobs)

# Get the ARN for the pipeline execution associated with this endpoint (if any).
# The helper is expected to return a single ARN string, so it is printed directly.
pipeline_execution_arn = endpoint_context.pipeline_execution_arn()
if pipeline_execution_arn:
    print(pipeline_execution_arn)

# Here we use the `ModelArtifact` class to find all the datasets and endpoints associated with the model
dataset_artifacts = model_artifact.dataset_artifacts()
endpoint_contexts = model_artifact.endpoint_contexts()

datasets = [dataset.source.source_uri for dataset in dataset_artifacts]
endpoints = [endpoint.source.source_uri for endpoint in endpoint_contexts]

print("Datasets associated with this model : ")
pp.pprint(datasets)

print("Endpoints associated with this model : ")
pp.pprint(endpoints)

# Here we use the `DatasetArtifact` class to find all the endpoints hosting models
# that were trained with a particular dataset.
# Find the artifact associated with the dataset.
# `training_data` is assumed to hold the S3 URI of the training dataset.
dataset_artifact_arn = list(Artifact.list(source_uri=training_data))[0].artifact_arn
dataset_artifact = DatasetArtifact.load(artifact_arn=dataset_artifact_arn)

# Find the endpoints that used this training dataset
endpoint_contexts = dataset_artifact.endpoint_contexts()
endpoints = [endpoint.source.source_uri for endpoint in endpoint_contexts]

print("Endpoints associated with the training dataset {}".format(training_data))
pp.pprint(endpoints)
Example: Getting a lineage graph visualization

A helper class, Visualizer, is provided in the sample notebook file visualizer.py to help plot the lineage graph. When the query response is rendered, a graph with the lineage relationships from the StartArns is displayed. Starting from the StartArns, the visualization shows the relationships with the other lineage entities returned by the query_lineage API action.

# Graph APIs
# Here we use the boto3 `query_lineage` API to generate the query response to plot.
import boto3
from visualizer import Visualizer

sm_client = boto3.client("sagemaker")

query_response = sm_client.query_lineage(
    StartArns=[endpoint_context.context_arn], Direction="Ascendants", IncludeEdges=True
)

viz = Visualizer()
viz.render(query_response, "Endpoint")

query_response = sm_client.query_lineage(
    StartArns=[model_artifact.artifact_arn], Direction="Ascendants", IncludeEdges=True
)
viz.render(query_response, "Model")
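If visualizer.py is not at hand, a response of the same shape can still be turned into something plottable. The following sketch is not the Visualizer class; `lineage_to_dot` is a hypothetical function that assumes the response carries a `Vertices` list (with `Arn` and `Type`) and an `Edges` list (with `SourceArn`, `DestinationArn`, and `AssociationType`), and it emits Graphviz DOT text. The sample response and its ARNs are made up.

```python
def lineage_to_dot(query_response, title="Lineage"):
    """Render the vertices and edges of a query_lineage-style response as DOT text."""
    lines = ['digraph "{}" {{'.format(title)]
    for vertex in query_response.get("Vertices", []):
        arn = vertex["Arn"]
        # Label each node with its type and the last segment of its ARN.
        label = "{}: {}".format(vertex.get("Type", "Unknown"), arn.split("/")[-1])
        lines.append('  "{}" [label="{}"];'.format(arn, label))
    for edge in query_response.get("Edges", []):
        lines.append('  "{}" -> "{}" [label="{}"];'.format(
            edge["SourceArn"], edge["DestinationArn"], edge.get("AssociationType", "")
        ))
    lines.append("}")
    return "\n".join(lines)


# Made-up ARNs and response, for illustration only.
MODEL_ARN = "arn:aws:sagemaker:us-east-1:111122223333:artifact/example-model"
ENDPOINT_ARN = "arn:aws:sagemaker:us-east-1:111122223333:context/example-endpoint"
sample_response = {
    "Vertices": [
        {"Arn": MODEL_ARN, "Type": "Model", "LineageType": "Artifact"},
        {"Arn": ENDPOINT_ARN, "Type": "Endpoint", "LineageType": "Context"},
    ],
    "Edges": [
        {"SourceArn": MODEL_ARN, "DestinationArn": ENDPOINT_ARN,
         "AssociationType": "ContributedTo"},
    ],
}

dot_text = lineage_to_dot(sample_response, "Model")
print(dot_text)
```

The DOT text can be pasted into any Graphviz renderer. Setting IncludeEdges=True in the query matters here, since without edges only unconnected nodes would be drawn.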