Reasoning model evaluation - Amazon SageMaker AI

Reasoning model evaluation

Overview

Reasoning model support lets you evaluate reasoning-capable Nova models, which perform explicit internal reasoning before generating a final response. You control this behavior at the API level through the reasoning_effort parameter. It enables or disables reasoning without changing the model, and enabling it can improve response quality on complex analytical tasks.

Supported models

  • amazon.nova-2-lite-v1:0:256k

Recipe configuration

Enable reasoning by adding the reasoning_effort parameter to the inference section of your recipe:

run:
  name: reasoning-eval-job-name  # [MODIFIABLE] Unique identifier for your evaluation job
  model_type: amazon.nova-2-lite-v1:0:256k  # [FIXED] Must be a reasoning-supported model
  model_name_or_path: nova-lite-2/prod  # [FIXED] Path to model checkpoint or identifier
  replicas: 1  # [MODIFIABLE] Number of replicas for SageMaker Training job
  data_s3_path: ""  # [MODIFIABLE] Leave empty for SageMaker Training job; optional for SageMaker HyperPod job
  output_s3_path: ""  # [MODIFIABLE] Output path for SageMaker HyperPod job (not compatible with SageMaker Training jobs)

evaluation:
  task: mmlu  # [MODIFIABLE] Evaluation task
  strategy: zs_cot  # [MODIFIABLE] Evaluation strategy
  metric: accuracy  # [MODIFIABLE] Metric calculation method

inference:
  reasoning_effort: high  # [MODIFIABLE] Enables reasoning mode; options: low/high or null to disable
  max_new_tokens: 32768  # [MODIFIABLE] Maximum tokens to generate, recommended value when reasoning_effort set to high
  top_k: -1  # [MODIFIABLE] Top-k sampling parameter
  top_p: 1.0  # [MODIFIABLE] Nucleus sampling parameter
  temperature: 0  # [MODIFIABLE] Sampling temperature (0 = deterministic)
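If you generate recipes programmatically, for example to run the same evaluation once per reasoning setting and compare results, a minimal sketch might copy the sample recipe and override only run.name and inference.reasoning_effort. It assumes the recipe has already been loaded into a Python dict with the same shape as the YAML above; recipe_variant, BASE_RECIPE, and ALLOWED_EFFORTS are hypothetical names, not part of any SageMaker API. All other fields, including max_new_tokens, are copied unchanged, so you may want to revisit max_new_tokens for the low and null settings, since the sample value is the one recommended for high.

```python
import copy

# Hypothetical dict mirroring the sample recipe above (assumes a YAML loader
# would produce this same nested structure from the recipe file).
BASE_RECIPE = {
    "run": {
        "name": "reasoning-eval-job-name",
        "model_type": "amazon.nova-2-lite-v1:0:256k",
        "model_name_or_path": "nova-lite-2/prod",
        "replicas": 1,
        "data_s3_path": "",
        "output_s3_path": "",
    },
    "evaluation": {"task": "mmlu", "strategy": "zs_cot", "metric": "accuracy"},
    "inference": {
        "reasoning_effort": "high",
        "max_new_tokens": 32768,
        "top_k": -1,
        "top_p": 1.0,
        "temperature": 0,
    },
}

# None corresponds to null in the YAML recipe (reasoning disabled).
ALLOWED_EFFORTS = (None, "low", "high")


def recipe_variant(base, job_name, reasoning_effort):
    """Return a deep copy of `base` with a new job name and reasoning_effort.

    The base recipe is never mutated, so it can be reused for every variant.
    """
    if reasoning_effort not in ALLOWED_EFFORTS:
        raise ValueError(f"reasoning_effort must be one of {ALLOWED_EFFORTS}")
    variant = copy.deepcopy(base)
    variant["run"]["name"] = job_name  # job names should stay unique
    variant["inference"]["reasoning_effort"] = reasoning_effort
    return variant


if __name__ == "__main__":
    # Build one variant per setting, each with its own job name.
    for effort in ALLOWED_EFFORTS:
        v = recipe_variant(BASE_RECIPE, f"mmlu-reasoning-{effort or 'off'}", effort)
        print(v["run"]["name"], v["inference"]["reasoning_effort"])
```

Deep-copying keeps each variant independent, so changing one job's settings cannot leak into another.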

Using the reasoning_effort parameter

The reasoning_effort parameter controls the reasoning behavior for reasoning-capable models.

Prerequisites

  • Model compatibility – Set reasoning_effort only when model_type specifies a reasoning-capable model (currently amazon.nova-2-lite-v1:0:256k)

  • Error handling – Using reasoning_effort with unsupported models will fail with ConfigValidationError: "Reasoning mode is enabled but model '{model_type}' does not support reasoning. Please use a reasoning-capable model or disable reasoning mode."
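The compatibility rule above can be checked locally before you submit a job. The following is a minimal sketch, not the evaluation tooling's actual validator: ConfigValidationError here is a local stand-in class that only reuses the documented name, validate_reasoning_config is a hypothetical helper, and the recipe is assumed to be a dict with the same shape as the YAML recipe. The error text reproduces the documented message.

```python
# Reasoning-capable models listed on this page.
REASONING_CAPABLE_MODELS = {"amazon.nova-2-lite-v1:0:256k"}


class ConfigValidationError(Exception):
    """Local stand-in; not imported from the SageMaker evaluation tooling."""


def validate_reasoning_config(recipe):
    """Raise ConfigValidationError if reasoning is enabled on an unsupported model.

    A non-null inference.reasoning_effort is only valid when run.model_type
    names a reasoning-capable model; null (None) is always accepted.
    """
    model_type = recipe["run"]["model_type"]
    effort = (recipe.get("inference") or {}).get("reasoning_effort")
    if effort is not None and model_type not in REASONING_CAPABLE_MODELS:
        raise ConfigValidationError(
            f"Reasoning mode is enabled but model '{model_type}' does not support "
            "reasoning. Please use a reasoning-capable model or disable reasoning mode."
        )
```

Running a check like this in a CI step or notebook surfaces the mismatch before any compute is provisioned.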

Available options

| Option | Behavior | Internal reasoning token limit | Use case |
|---|---|---|---|
| null (default) | Disables reasoning mode | N/A | Standard evaluation without reasoning overhead |
| low | Enables reasoning with constraints | 4,000 tokens | Scenarios requiring concise reasoning; optimizes for speed and cost |
| high | Enables reasoning without constraints | No limit | Complex problems requiring extensive analysis and step-by-step reasoning |

The available options and the field that sets them differ by workflow:

| Workflow | Available options | How to configure |
|---|---|---|
| SFT (Supervised Fine-Tuning) | High or Off only | Use reasoning_enabled: true (high) or reasoning_enabled: false (off). |
| RFT (Reinforcement Fine-Tuning) | Low, High, or Off | Use reasoning_effort: low or reasoning_effort: high. Omit the field to disable. |
| Evaluation | Low, High, or Off | Use reasoning_effort: low or reasoning_effort: high. Use null to disable. |
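Because each workflow uses a different field or convention to turn reasoning off, a small lookup can keep the mapping in one place. This is a sketch of the table above only: reasoning_fields is a hypothetical helper, and it returns just the key/value pair to set, without claiming where that field sits in SFT or RFT recipes (this page only documents its placement under inference for evaluation).

```python
def reasoning_fields(workflow, level):
    """Return the recipe field(s) that select a reasoning level.

    workflow: "sft", "rft", or "evaluation"
    level: "off", "low", or "high"
    Mapping follows the per-workflow table; section placement is not handled.
    """
    if level not in ("off", "low", "high"):
        raise ValueError(f"unknown level: {level!r}")
    if workflow == "sft":
        # SFT supports only High or Off, via a boolean flag.
        if level == "low":
            raise ValueError("SFT supports only 'high' or 'off'")
        return {"reasoning_enabled": level == "high"}
    if workflow == "rft":
        # RFT disables reasoning by omitting the field entirely.
        return {} if level == "off" else {"reasoning_effort": level}
    if workflow == "evaluation":
        # Evaluation disables reasoning with an explicit null (None in Python).
        return {"reasoning_effort": None if level == "off" else level}
    raise ValueError(f"unknown workflow: {workflow!r}")
```

Raising on an unsupported combination, such as low for SFT, is a deliberate choice: silently mapping it to high or off would hide a configuration mistake.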

When to enable reasoning

Use reasoning mode (low or high) for:

  • Complex problem-solving tasks (mathematics, logic puzzles, coding)

  • Multi-step analytical questions requiring intermediate reasoning

  • Tasks where detailed explanations or step-by-step thinking improve accuracy

  • Scenarios where response quality is prioritized over speed

Use non-reasoning mode (null or omit the parameter) for:

  • Simple Q&A or factual queries

  • Creative writing tasks

  • When faster response times are critical

  • Performance benchmarking where reasoning overhead should be excluded

  • Cost optimization when reasoning doesn't improve task performance
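If you select settings for many evaluation tasks at once, the guidance above can be encoded as a simple rule. This is an illustrative sketch, not an official policy: choose_reasoning_effort and its three boolean inputs are hypothetical, and the precedence (speed requirements first, then depth of analysis) is one possible reading of the lists above. It returns None for null, "high" for tasks that need extensive step-by-step analysis, and "low" for multi-step tasks where concise reasoning should suffice.

```python
from typing import Optional


def choose_reasoning_effort(needs_step_by_step: bool,
                            extensive_analysis: bool,
                            speed_critical: bool) -> Optional[str]:
    """Map task traits to a reasoning_effort value (None means null/disabled).

    Assumed precedence: speed-critical or simple tasks skip reasoning;
    otherwise depth of analysis decides between "high" and "low".
    """
    if speed_critical or not needs_step_by_step:
        return None  # simple Q&A, creative writing, latency-sensitive runs
    if extensive_analysis:
        return "high"  # no cap on internal reasoning tokens
    return "low"  # reasoning capped at 4,000 tokens; cheaper and faster
```

If your own results show that low reasoning is fast enough for latency-sensitive tasks, you could reorder the checks so that speed_critical maps to "low" instead of None.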

Troubleshooting

Error: "Reasoning mode is enabled but model does not support reasoning"

Cause: The reasoning_effort parameter is set to a non-null value, but the specified model_type doesn't support reasoning.

Resolution:

  • Verify that model_type is set to amazon.nova-2-lite-v1:0:256k

  • If using a different model, either switch to a reasoning-capable model or remove the reasoning_effort parameter from your recipe