
Custom reward functions in your Amazon environment

Custom reward functions in your Amazon environment support single-turn RFT only. This trains the model on tasks where a single prompt receives a single response that is evaluated independently: the model receives a prompt and generates a response, your reward function scores it, and there is no back-and-forth conversation. This contrasts with multi-turn RFT, where the model interacts with an environment or user over several turns before receiving a final reward.

Architecture overview

The architecture consists of two main components:

Training VPC:

  • Rollout: loads the dataset and model, sends rollouts to the reward function, and receives rewards

  • Trainer: receives rollouts from the Rollout component, performs the forward and backward passes, and updates the model weights

Customer VPC:

  • Reward Lambda: a customer-implemented reward function that evaluates model responses and returns a reward score

Workflow:

  1. The Rollout component loads the dataset and model

  2. Rollout generates model responses and invokes the reward Lambda to obtain rewards

  3. The Lambda returns reward scores

  4. Rollout sends the rollouts to the Trainer

  5. The Trainer updates the policy weights based on the rewards

Recipe configuration

Use this recipe when your reward function completes its processing within 15 minutes.

## Nova Lite RLVR Training (PEFT)
run:
  name: my-rft-run
  model_type: amazon.nova-2-lite-v1:0:256k
  model_name_or_path: nova-lite-2/prod
  data_s3_path: s3://example-bucket/train.jsonl
  output_s3_path: ""
  replicas: 2                  # Number of compute instances for training. All supported values: {2, 4, 8, 16}
  generation_replicas: 2       # LLM inference replicas
  rollout_worker_replicas: 1
  # Lambda functions for RFT
  reward_lambda_arn: ""

## Training config - essential fields for all services
training_config:
  max_length: 10240
  global_batch_size: 256
  reasoning_effort: high
  data:
    shuffle: false
  rollout:
    rollout_strategy:
      type: off_policy_async
      age_tolerance: 2
    advantage_strategy:
      number_generation: 8
    generator:
      max_new_tokens: 8192
      set_random_seed: true
      temperature: 1
      top_k: 0
  rewards:
    api_endpoint:
      lambda_arn: ${run.reward_lambda_arn}
      lambda_concurrency_limit: 100    # Lambda should be able to handle (rollout_worker_replicas * 64) requests

  # Training configuration
  trainer:
    max_steps: 100
    save_steps: 5
    save_top_k: 5

    # RL parameters
    refit_freq: 4
    clip_ratio_high: 0.2
    ent_coeff: 0.001
    loss_scale: 1

    optim_config:              # Optimizer settings
      lr: 7e-7                 # Learning rate
      weight_decay: 0.0        # L2 regularization strength (0.0–1.0)
      adam_beta1: 0.9
      adam_beta2: 0.95

    peft:                      # Parameter-efficient fine-tuning (LoRA)
      peft_scheme: "lora"      # Enable LoRA for PEFT
      lora_tuning:
        alpha: 32
        lora_plus_lr_ratio: 64.0   # LoRA+ learning rate scaling factor (0.0–100.0)
## Nova Lite RLVR Training
run:
  name: my-rft-run
  model_type: amazon.nova-2-lite-v1:0:256k
  model_name_or_path: nova-lite-2/prod
  data_s3_path: s3://example-bucket/train.jsonl
  output_s3_path: ""
  replicas: 2                  # Number of compute instances for training. All supported values: {2, 4, 8, 16}
  generation_replicas: 2       # LLM inference replicas
  rollout_worker_replicas: 1
  # Lambda functions for RFT
  reward_lambda_arn: ""

## Training config - essential fields for all services
training_config:
  max_length: 10240
  global_batch_size: 256
  reasoning_effort: high
  data:
    shuffle: false
  rollout:
    rollout_strategy:
      type: off_policy_async
      age_tolerance: 2
    advantage_strategy:
      number_generation: 8
    generator:
      max_new_tokens: 8192
      set_random_seed: true
      temperature: 1
      top_k: 0
  rewards:
    api_endpoint:
      lambda_arn: ${run.reward_lambda_arn}
      lambda_concurrency_limit: 100    # Lambda should be able to handle (rollout_worker_replicas * 64) requests

  # Training configuration
  trainer:
    max_steps: 100
    save_steps: 5
    save_top_k: 5

    # RL parameters
    refit_freq: 4
    clip_ratio_high: 0.2
    ent_coeff: 0.001
    loss_scale: 1

    optim_config:              # Optimizer settings
      lr: 7e-7                 # Learning rate
      weight_decay: 0.0        # L2 regularization strength (0.0–1.0)
      adam_beta1: 0.9
      adam_beta2: 0.95

    peft:                      # Parameter-efficient fine-tuning (LoRA)
      peft_scheme: "null"      # Disable LoRA for PEFT

Recipe parameters

  • max_steps: the number of gradient updates applied to the model. Each update uses global_batch_size × refit_freq samples, and each sample corresponds to one model generation. Total training samples = max_steps × global_batch_size (see the arithmetic sketch after this list).

  • max_seq_length: the maximum context length, in tokens, that the model processes during training. It should accommodate the input prompt length plus the generated response length. Setting it too short causes training errors; setting it too large wastes GPU memory and slows training. Available presets: 8K (default), 16K, 32K.

  • global_batch_size: the number of samples per gradient update. Larger values give more stable gradients but require more memory. Note that each sample corresponds to a model generation, not a prompt: a single prompt is used to create number_generation samples. Recommended: 64-4096, in powers of 2.

  • refit_freq: how often the generation model's weights are refreshed. The number of samples per refit is refit_freq * global_batch_size. Higher values increase the effective batch size and improve learning stability; lower values speed up training but increase variance. As refit_freq increases, more of the data becomes off-policy. Recommended: 4 (min: 1, max: 4).

  • rollout_strategy.off_policy_async: makes model updates off-policy, meaning the generations used to compute the loss may come from an earlier version of the model rather than the current one. Enabling off-policy training speeds up training, but it can become unstable if age_tolerance is set too high. Recommended: true (options: true, false).

  • rollout_strategy.age_tolerance: only takes effect when off_policy_async is enabled. Keeps only data from model versions that are at most age_tolerance versions older than the current model. Lower values discard more data; higher values include more data from earlier model versions. Recommended: 2 (min: 1, max: 20).

  • clip_ratio_high: clipping helps prevent large policy updates that can destabilize training. Larger values encourage updates that correct model errors but can destabilize training; smaller values mean less learning. Recommended: 0.3 (min: 0.1, max: 10).

  • ent_coeff: short for "entropy coefficient"; encourages exploration during training by adding an entropy bonus to the loss function. Higher values promote more diverse/exploratory behavior, while lower values focus on exploiting current knowledge. Recommended: 0.0 (min: 0, max: 0.1).
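
To make the relationship between these parameters concrete, the short sketch below works through the arithmetic for the example recipe above (max_steps = 100, global_batch_size = 256, number_generation = 8, rollout_worker_replicas = 1), using the formulas stated in the parameter descriptions; the helper itself is illustrative only, not part of the product.

# Back-of-the-envelope training-scale arithmetic for an RFT recipe (illustrative only).
def training_scale(max_steps: int, global_batch_size: int,
                   number_generation: int, rollout_worker_replicas: int) -> dict:
    total_samples = max_steps * global_batch_size        # total generations consumed by training
    total_prompts = total_samples // number_generation   # each prompt yields number_generation samples
    peak_lambda_requests = rollout_worker_replicas * 64  # concurrency the reward Lambda must handle
    return {
        "total_samples": total_samples,
        "total_prompts": total_prompts,
        "peak_lambda_requests": peak_lambda_requests,
    }

print(training_scale(max_steps=100, global_batch_size=256,
                     number_generation=8, rollout_worker_replicas=1))
# {'total_samples': 25600, 'total_prompts': 3200, 'peak_lambda_requests': 64}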

Reasoning mode selection

Choose from the three reasoning effort levels based on the complexity of your task:

Reasoning effort | Use case | Cost/latency trade-off
Omit the field (no reasoning) | Simple factual queries, classification | Optimized for speed and cost
low | Moderate complexity requiring some reasoning | Balanced performance and efficiency
high | Complex analytical tasks, multi-step problems | Maximum reasoning capability

Default behavior: if reasoning_effort is specified without a value, it defaults to high.

Guidelines:

  • Use high for complex analytical tasks (math, logic, code debugging) where step-by-step thinking adds value

  • Use low for moderately complex tasks that require some reasoning

  • Omit the field entirely for straightforward factual queries, simple classification, and when optimizing for speed and cost

Important

Higher reasoning modes improve performance on tasks that require logical analysis and complex reasoning, but they increase cost and latency during both training and deployment. They do not help with simple factual queries such as "What is the capital of France?"

Reward function implementation

The reward function (also called a scorer or grader) is the core component that evaluates model responses and provides the feedback signal for training. It must be implemented as a Lambda function that accepts model responses and returns reward scores.

Prerequisites

Make sure that your Lambda function and SQS queue follow the required naming formats and that your execution role has the necessary permissions.

Lambda ARN naming:

The Lambda ARN must follow this naming format:

arn:aws:lambda:*:*:function:*SageMaker*

SQS naming (applies only to remote reward functions in your own Amazon environment):

  • Ensure that the execution role created for the HyperPod cluster has SQS permissions

  • The SQS ARN must match one of the following naming formats:

    arn:aws:sqs:*:*:*SageMaker*
    arn:aws:sqs:*:*:*Sagemaker*
    arn:aws:sqs:*:*:*sagemaker*

  • In your SQS client, use an endpoint override (--endpoint https://sqs.us-west-2.amazonaws.com), because the legacy SQS service endpoint is not available through the VPC endpoint; see the boto3 sketch below
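
For example, with boto3 the override can be supplied when the SQS client is created (the region and the commented-out queue URL below are placeholders):

import boto3

# Point the SQS client at the regional endpoint explicitly; the default endpoint
# resolution is not available through the VPC endpoint.
sqs = boto3.client(
    "sqs",
    region_name="us-west-2",
    endpoint_url="https://sqs.us-west-2.amazonaws.com",
)

# Placeholder queue URL; the queue name must match one of the SageMaker naming patterns above.
# messages = sqs.receive_message(QueueUrl="https://sqs.us-west-2.amazonaws.com/<account>/<queue-name-with-SageMaker>")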

IAM policy for the execution role:

{ "Action": "lambda:InvokeFunction", "Resource": [ "arn:aws:lambda:*:*:function:*SageMaker*" ], "Effect": "Allow" }, { "Action": [ "sqs:DeleteMessage", "sqs:ReceiveMessage", "sqs:SendMessage" ], "Resource": [ "arn:aws:sqs:*:*:*SageMaker*" ], "Effect": "Allow" }

VPC endpoints:

For the HyperPod cluster to invoke your Lambda function, you must:

  • Create a VPC endpoint for the Lambda service in the HyperPod cluster's VPC

  • Associate the endpoint with the cluster's security group

  • Ensure that the VPC endpoint policy allows the lambda:InvokeFunction action

Confirm that you can see the Lambda endpoint attached to EKS in your VPC.
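
A minimal boto3 sketch of these steps is shown below; the VPC, subnet, and security group IDs are placeholders, and you can equally create the endpoint from the console or with infrastructure as code.

import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")

# Create an interface VPC endpoint for the Lambda service in the HyperPod cluster's VPC
# (all IDs below are placeholders).
response = ec2.create_vpc_endpoint(
    VpcId="vpc-0123456789abcdef0",
    VpcEndpointType="Interface",
    ServiceName="com.amazonaws.us-west-2.lambda",
    SubnetIds=["subnet-0123456789abcdef0"],
    SecurityGroupIds=["sg-0123456789abcdef0"],   # the cluster's security group
    PrivateDnsEnabled=True,
)
print(response["VpcEndpoint"]["VpcEndpointId"])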

Interface format

Your reward function must accept and return data in the following formats.

Sample training input:

[{ "messages": [ { "role": "user", "content": "Do you have a dedicated security team?" } ], "metadata": { "reference_answer": { "compliant": "No", "explanation": "As an AI developed by Company, I do not have a traditional security team..." }, "my_key": "sample-001" } }]

Example payload sent to the reward Lambda:

The system appends the assistant turn (the generated response) as the last turn of the messages field and adds a unique id:

[{ "id": "123", "messages": [ { "role": "user", "content": "Do you have a dedicated security team?" }, { "role": "assistant", "content": "As an AI developed by Amazon, I do not have a dedicated security team..." } ], "metadata": { "reference_answer": { "compliant": "No", "explanation": "As an AI developed by Company, I do not have a traditional security team..." }, "my_key": "sample-001" } }]

Reward Lambda contract:

def lambda_handler(event, context):
    return lambda_grader(event)


def lambda_grader(samples: list[dict]) -> list[dict]:
    """
    Args:
        samples: List of dictionaries in OpenAI format
        Example input (List of such sample):
        {
            "id": "123",
            "messages": [
                {
                    "role": "user",
                    "content": "Do you have a dedicated security team?"
                },
                {
                    "role": "assistant",
                    "content": "As an AI developed by Company, I do not have a dedicated security team..."
                }
            ],
            "metadata": {
                "reference_answer": {
                    "compliant": "No",
                    "explanation": "As an AI developed by Company, I do not have a traditional security team..."
                },
                "my_key": "sample-001"
            }
        }

    Returns:
        List of dictionaries with reward scores:
        {
            "id": str,                         # Same id as input sample
            "aggregate_reward_score": float,   # Overall score for the sample
            "metrics_list": [                  # OPTIONAL: Component scores
                {
                    "name": str,               # Name of the component score
                    "value": float,            # Value of the component score
                    "type": str                # "Reward" or "Metric"
                }
            ]
        }
    """

Input fields:

Field | Description | Notes
id | Unique identifier for the sample | Echoed back in the output; string
messages | Ordered chat history in OpenAI format | Array of message objects
messages[].role | Speaker of the message | Common values: "user", "assistant", "system"
messages[].content | Text content of the message | Plain string
metadata | Free-form information that helps with scoring | Object; optional field passed through from the training data

Output fields:

Field | Description | Notes
id | Same identifier as the input sample | Must match the input
aggregate_reward_score | Overall score for the sample | Float (for example, 0.0-1.0 or a task-defined range)
metrics_list | Component scores that make up the aggregate | Array of metric objects
metrics_list[].name | Name of the component metric/reward | String (for example, "accuracy", "policy_reward")
metrics_list[].value | Value of the component metric/reward | Float
metrics_list[].type | Category of the component | String: "Reward" or "Metric"
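
Putting these fields together, a reward Lambda response for the earlier example payload might look like the following; the score and metric name are illustrative, not returned by any real grader in this guide.

[{
    "id": "123",
    "aggregate_reward_score": 0.0,
    "metrics_list": [
        {
            "name": "accuracy",
            "value": 0.0,
            "type": "Reward"
        }
    ]
}]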

Technical constraints

  • Timeout limit: maximum execution time of 15 minutes per Lambda invocation

  • Concurrency: must handle rollout_worker_replicas × 64 concurrent requests

  • Reliability: must implement proper error handling and return valid scores consistently

  • Performance: optimize for fast execution (seconds, not minutes) to keep training efficient

Best practices:

  • Minimize external API calls

  • Use efficient algorithms and data structures

  • Implement retry logic for transient failures (a sketch follows this list)

  • Cache computations that can be reused

  • Test thoroughly before training to ensure error-free execution
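
As a sketch of the retry guidance above, a small helper with exponential backoff might look like the following; the wrapped call is a placeholder for whatever external dependency your grader uses, and the helper is illustrative rather than part of any service API.

import time

def call_with_retries(fn, max_retries: int = 3, base_delay: float = 1.0):
    """Retry a call that may fail transiently, with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:                      # narrow this to the transient errors you expect
            if attempt == max_retries - 1:
                raise                          # give up after the final attempt
            time.sleep(base_delay * (2 ** attempt))

# Usage (placeholder callable):
# score = call_with_retries(lambda: external_scoring_api(sample))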

Using custom reward functions

Implement a custom reward function when you have task-specific evaluation criteria:

  1. Define evaluation criteria: determine what makes a good response for your task

  2. Implement the Lambda function: create a Lambda function that follows the interface format

  3. Test locally: verify that your function returns the correct scores for sample inputs (see the local test sketch after the example Lambda below)

  4. Deploy to Amazon: deploy your Lambda and note its ARN

  5. Configure the recipe: add the Lambda ARN to the reward_lambda_arn field in your recipe

  6. Test with a small dataset: run RFT with minimal data to verify the integration

Example Lambda function

This example validates the input format and compares the model output with the reference answer. Replace the scoring logic with your actual evaluation criteria.

from typing import List
import json
from dataclasses import asdict, dataclass


@dataclass
class RewardOutput:
    """Reward service output."""
    id: str
    aggregate_reward_score: float


def lambda_handler(event, context):
    """Main lambda handler"""
    return lambda_grader(event)


def lambda_grader(samples: list[dict]) -> list[dict]:
    """Core grader function"""
    scores: List[RewardOutput] = []
    for sample in samples:
        # Extract components
        idx = sample["id"]
        ground_truth = sample.get("metadata", {}).get("reference_answer")

        if "messages" not in sample:
            print(f"Messages is None/empty for id: {idx}")
            scores.append(RewardOutput(id=idx, aggregate_reward_score=0.0))
            continue

        if ground_truth is None:
            print(f"No answer found in ground truth for id: {idx}")
            scores.append(RewardOutput(id=idx, aggregate_reward_score=0.0))
            continue

        # Get model's response (last turn is assistant turn)
        last_message = sample["messages"][-1]
        assert last_message["role"] == "assistant", "Last message must be from assistant"
        model_text = last_message["content"]

        ground_truth_text = _extract_ground_truth_text(ground_truth)
        if model_text.lower() == ground_truth_text.lower():
            score = 1.0
        else:
            score = 0.0

        ro = RewardOutput(id=idx, aggregate_reward_score=score)
        scores.append(ro)

    # Convert to dict format for JSON serialization
    return [asdict(score) for score in scores]


def _extract_ground_truth_text(ground_truth) -> str:
    """Turn the `ground_truth` field into a plain string."""
    if isinstance(ground_truth, str):
        return ground_truth
    if isinstance(ground_truth, dict):
        # Common patterns: { "explanation": "...", "answer": "..." }
        if "explanation" in ground_truth and isinstance(ground_truth["explanation"], str):
            return ground_truth["explanation"]
        if "answer" in ground_truth and isinstance(ground_truth["answer"], str):
            return ground_truth["answer"]
        # Fallback: stringify the whole dict
        return json.dumps(ground_truth, ensure_ascii=False)
    # Fallback: stringify anything else
    return str(ground_truth)
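
To test locally (step 3 above) before deploying, you can call the handler directly with a payload shaped like the documented example. A minimal sketch follows, assuming the code above is saved in a module named reward_lambda; the module name is an assumption for illustration.

# Local smoke test for the example grader; run outside Lambda.
from reward_lambda import lambda_handler   # assumes the code above is saved as reward_lambda.py

sample_event = [{
    "id": "123",
    "messages": [
        {"role": "user", "content": "Do you have a dedicated security team?"},
        {"role": "assistant", "content": "As an AI developed by Company, I do not have a traditional security team..."}
    ],
    "metadata": {
        "reference_answer": {
            "compliant": "No",
            "explanation": "As an AI developed by Company, I do not have a traditional security team..."
        },
        "my_key": "sample-001"
    }
}]

print(lambda_handler(sample_event, context=None))
# Expected output: [{'id': '123', 'aggregate_reward_score': 1.0}]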

Using an LLM as a judge for the reward function

Large language models (LLMs) are increasingly used as judges in reinforcement fine-tuning (RFT) workflows, providing automated reward signals that guide model optimization. In this approach, an LLM evaluates model outputs against specified criteria (whether assessing correctness, quality, style adherence, or semantic equivalence) and assigns the rewards that drive the reinforcement learning process.

This is especially valuable for tasks where a traditional reward function is difficult to define programmatically, such as determining whether different representations (for example, "1/3", "0.333", and "one third") are semantically equivalent, or assessing nuanced qualities such as coherence and relevance. By using an LLM-based judge as the reward function, you can extend RFT to complex domains without extensive human annotation, enabling rapid iteration and continuous model improvement across use cases that go beyond traditional alignment problems.

Before deploying LLM-as-a-Judge in production, verify that the judge model's evaluations align with human judgment. This includes measuring the agreement rate between the LLM judge and human evaluators on a representative sample of your task, ideally confirming that LLM-human agreement meets or exceeds the human-human agreement rate. This validation step helps identify potential biases, ensures that the reward signal steers your model in the intended direction, and builds confidence that the automated evaluation process will produce models that meet your production quality standards.

Using LLM-as-a-Judge is a straightforward extension of reinforcement learning with verifiable rewards (RLVR) using Lambda functions: inside the Lambda function, you call one of the models hosted in Amazon Bedrock. To ensure that training and evaluation work well with the judge model, make sure the Amazon Bedrock model you use has sufficient throughput quota.

Configure your Lambda function with a long timeout, up to the 15-minute maximum. The Lambda default is 3 seconds, and you must change the timeout in the Lambda configuration to account for Amazon Bedrock model responses taking longer than logic-based reward functions. Lambdas are also invoked in parallel during training, so increase the concurrency to make full use of the available throughput. Note that the concurrency limit must be set in both the Lambda configuration and the training job recipe; a boto3 sketch follows.
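
For example, the timeout and reserved concurrency can be raised with boto3; the function name and values below are placeholders, and the same settings are available in the Lambda console.

import boto3

lambda_client = boto3.client("lambda", region_name="us-east-1")

# Raise the timeout toward the 15-minute maximum (900 seconds) to absorb judge-model latency.
lambda_client.update_function_configuration(
    FunctionName="my-SageMaker-reward-judge",    # placeholder; must match the *SageMaker* naming pattern
    Timeout=900,
)

# Reserve enough concurrency for parallel invocations during training;
# keep this consistent with lambda_concurrency_limit in the recipe.
lambda_client.put_function_concurrency(
    FunctionName="my-SageMaker-reward-judge",
    ReservedConcurrentExecutions=12,
)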

Example training recipe:

display_name: "Nova Lite V2 LoRA RLVR SMTJ training on GPU" version: "1.0" instance_types: ["ml.p5.48xlarge", "ml.p5en.48xlarge"] run: name: <experiment_name> model_type: amazon.nova-2-lite-v1:0:256k model_name_or_path: "nova-lite-2/prod" data_s3_path: s3://<path>/<training_data>.jsonl replicas: 4 reward_lambda_arn: arn:aws:lambda:<region>:<account>:function:<lambda-name> ## SMTJ RFT Training specific configs training_config: max_length: 1200 # Context window (tokens) for inputs+prompt global_batch_size: 64 # Total samples per optimizer step across all replicas (16/32/64/128/256) reasoning_effort: high # Enables reasoning mode High / Low / or null for non-reasoning test_freq: 10 rollout: # How responses are generated for GRPO/advantage calc advantage_strategy: number_generation: 4 # N samples per prompt to estimate advantages (variance vs cost) generator: max_new_tokens: 1024 # Cap on tokens generated per sample set_random_seed: true # Seed generation for reproducibility across runs temperature: 1 # Softmax temperature top_k: 1 # Sample only from top-K logits rewards: preset_reward_function: null # Usage of reward functions built into Verl [exact_match, code_executions, math_answers] api_endpoint: lambda_arn: arn:aws:lambda:<region>:<account>:function:<lambda-name> lambda_concurrency_limit: 12 # Max concurrent Lambda invocations (throughput vs. throttling) trainer: max_steps: 100 # Steps to train for. One Step = global_batch_size save_steps: 20 test_freq:10 # RL parameters ent_coeff: 0.0 # A bonus added to the policy loss that rewards higher-output entropy kl_loss_coef: 0.0 # Weight on the KL penalty between the actor (trainable policy) and a frozen reference model optim_config: # Optimizer settings lr: 1e-6 # Learning rate weight_decay: 0.0 # L2 regularization strength (0.0–1.0) adam_beta1: 0.9 adam_beta2: 0.95

Example Lambda:

This Lambda function implements an LLM-as-a-Judge reward scoring system for reinforcement fine-tuning. It processes batches of model-generated responses by extracting answers from well-formatted output (looking for \boxed{} notation), then uses Claude Haiku as the judge model to rate the semantic similarity between the extracted answer and the ground-truth reference answer on a 0.0-1.0 scale. The judge compares the answers to determine whether they are semantically equivalent (even when represented differently, such as "1/3" versus "0.333"), handling cases where answers may be formatted in various ways. The function includes retry logic for throttling, validates the message structure, and returns a list of reward scores that can be used as the training signal during the reinforcement learning process, with a score of 0.0 when an answer cannot be extracted or validation fails.

import json
import re
import time
from dataclasses import asdict, dataclass
from typing import List, Optional

import boto3


def extract_solution_nova(solution_str: str, method: str = "strict") -> Optional[str]:
    """
    Extract solution from Nova-formatted response.

    Args:
        solution_str: The solution text from Nova model
        method: "strict" or "flexible" extraction method

    Returns:
        Extracted numerical answer or None
    """
    boxed_matches = re.findall(r'\\boxed\{([^}]+)\}', solution_str)
    if boxed_matches:
        final_answer = boxed_matches[-1].replace(",", "").replace("$", "")
        return final_answer
    return None


bedrock_runtime = boto3.client('bedrock-runtime', region_name='us-east-1')
JUDGE_MODEL_ID = "global.anthropic.claude-haiku-4-5-20251001-v1:0"

SYSTEM_PROMPT = "You must output ONLY a number between 0.0 and 1.0. No explanations, no text, just the number."

JUDGE_PROMPT_TEMPLATE = """Compare the following two responses and rate how similar they are on a scale of 0.0 to 1.0, where:
- 1.0 means the responses are semantically equivalent (same meaning, even if worded differently)
- 0.5 means the responses are partially similar
- 0.0 means the responses are completely different or contradictory

Response A: {response_a}

Response B: {response_b}

Output ONLY a number between 0.0 and 1.0. No explanations."""


def lambda_graded(id: str, response_a: str, response_b: str, max_retries: int = 50) -> float:
    """Call Bedrock to compare responses and return similarity score."""
    prompt = JUDGE_PROMPT_TEMPLATE.format(response_a=response_a, response_b=response_b)
    print(f"Calling judge: {JUDGE_MODEL_ID}")
    for attempt in range(max_retries):
        try:
            print(f"Attempt: {attempt}")
            response = bedrock_runtime.converse(
                modelId=JUDGE_MODEL_ID,
                messages=[{"role": "user", "content": [{"text": prompt}]}],
                system=[{"text": SYSTEM_PROMPT}],
                inferenceConfig={"temperature": 0.0, "maxTokens": 10}
            )
            print(f"Bedrock call successful: {response}")
            output = response['output']['message']['content'][0]['text'].strip()
            score = float(output)
            print(f"Score parsed: {score}")
            return max(0.0, min(1.0, score))
        except Exception as e:
            if "ThrottlingException" in str(e) and attempt < max_retries - 1:
                time.sleep(2 ** attempt)
                print(f"Throttling {id}")
            else:
                print(f"Bedrock call failed: {e}")
                return 0.0
    print("Max retries reached. Unable to complete the request.")
    return 0.0


def compute_score(id: str, solution_str: str, ground_truth: str, method: str = "strict",
                  format_score: float = 0.0, score: float = 1.0,
                  data_source: str = 'dataset_name',
                  extra_info: Optional[dict] = None) -> float:
    """
    The scoring function for PandaLM with Nova format.

    Args:
        solution_str: The solution text from Nova model
        ground_truth: JSON string containing the ground truth answer
        method: The method to extract the solution, choices are 'strict' and 'flexible'
        format_score: The score for format compliance
        score: The score for correct answer
        data_source: Should match the data_source in the given dataset
        extra_info: Optional dict with additional fields. Required in function signature.

    Returns:
        Score between 0 and 1
    """
    answer = extract_solution_nova(solution_str=solution_str, method=method)
    if answer is None:
        return 0.0

    print(f"Answer: {str(answer)}, Reference: {str(ground_truth)}")

    # Clean both answers for comparison
    clean_answer = str(answer)
    clean_ground_truth = str(ground_truth)

    score = lambda_graded(id, response_a=clean_answer, response_b=clean_ground_truth)
    print(f"Raw score: {score}")
    return score


@dataclass
class RewardOutput:
    """Reward service."""
    id: str
    aggregate_reward_score: float


def lambda_handler(event, context):
    scores: List[RewardOutput] = []
    samples = event
    print(len(samples))

    for sample in samples:
        print("Sample: ", json.dumps(sample, indent=2))

        idx = "no id"
        if "id" not in sample:
            print(f"ID is None/empty for sample: {sample}")
        else:
            idx = sample["id"]

        if "messages" not in sample:
            print(f"Messages is None/empty for id: {idx}")
            scores.append(RewardOutput(id=idx, aggregate_reward_score=0.0))
            continue

        # Extract the ground-truth reference answer from the sample metadata
        ground_truth = sample.get("metadata", {}).get("reference_answer")
        if ground_truth is None:
            print(f"No answer found in ground truth for id: {idx}")
            scores.append(RewardOutput(id=idx, aggregate_reward_score=0.0))
            continue

        # Get completion from last message (assistant message)
        last_message = sample["messages"][-1]
        if last_message["role"] not in ["assistant", "nova_assistant"]:
            print(f"Last message is not from assistant for id: {idx}")
            scores.append(RewardOutput(id=idx, aggregate_reward_score=0.0))
            continue

        if "content" not in last_message:
            print(f"Completion text is empty for id: {idx}")
            scores.append(RewardOutput(id=idx, aggregate_reward_score=0.0))
            continue
        completion_text = last_message["content"]

        judge_score = compute_score(id=idx, solution_str=completion_text, ground_truth=ground_truth)
        ro = RewardOutput(id=idx, aggregate_reward_score=judge_score)
        print(f"Response for id: {idx} is {ro}")
        scores.append(ro)

    return [asdict(score) for score in scores]