View X-Ray traces in Step Functions - Amazon Step Functions
View X-Ray traces in Step Functions

In this tutorial, you use Amazon X-Ray to trace errors that occur when a state machine runs. X-Ray lets you visualize the components of your state machine, identify performance bottlenecks, and troubleshoot requests that resulted in an error. You will create several Lambda functions that randomly produce errors, then trace and analyze those errors with X-Ray.

The Creating a Step Functions state machine that uses Lambda tutorial walks you through creating a state machine that calls a Lambda function. If you have completed that tutorial, skip to Step 2 and use the Amazon Identity and Access Management (IAM) role that you previously created.

Step 1: Create an IAM role for Lambda

Both Amazon Lambda and Amazon Step Functions can execute code and access Amazon resources (for example, data stored in Amazon S3 buckets). To maintain security, you must grant Lambda and Step Functions access to these resources.

Lambda requires you to assign an Amazon Identity and Access Management (IAM) role when you create a Lambda function, in the same way Step Functions requires you to assign an IAM role when you create a state machine.

You use the IAM console to create a service role for Lambda.

To create a role (console)
  1. Sign in to the Amazon Web Services Management Console and open the IAM console at https://console.amazonaws.cn/iam/.

  2. In the navigation pane of the IAM console, choose Roles. Then choose Create role.

  3. Choose the Amazon Service role type, and then choose Lambda.

  4. Choose the Lambda use case. Use cases are defined by the service to include the trust policy required by the service. Then choose Next: Permissions.

  5. Choose one or more permissions policies to attach to the role (for example, AWSLambdaBasicExecutionRole). See Amazon Lambda Permissions Model.

    Select the box next to the policy that assigns the permissions that you want the role to have, and then choose Next: Review.

  6. Enter a Role name.

  7. (Optional) For Role description, edit the description for the new role.

  8. Review the role, and then choose Create role.
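For reference, choosing the Lambda use case attaches a trust policy that lets the Lambda service assume the role. The following sketch builds an equivalent policy document as a plain object so you can compare it with what the console generates. The helper name `buildTrustPolicy` is hypothetical, and the service principal `lambda.amazonaws.com` is an assumption that you should verify against the trust policy shown in your own account.

```javascript
// Hypothetical helper: builds a trust policy that allows one service
// principal to assume the role.
function buildTrustPolicy(servicePrincipal) {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Principal: { Service: servicePrincipal },
        Action: "sts:AssumeRole",
      },
    ],
  };
}

// Assumption: Lambda's service principal is lambda.amazonaws.com.
const lambdaTrustPolicy = buildTrustPolicy("lambda.amazonaws.com");
console.log(JSON.stringify(lambdaTrustPolicy, null, 2));
```

The permissions policy that you attach in step 5 (for example, AWSLambdaBasicExecutionRole) is separate from this trust policy: the trust policy controls who can assume the role, and the permissions policy controls what the role can do.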

Step 2: Create a Lambda function

Your Lambda function will randomly throw errors or time out, producing example data to view in X-Ray.

Important

Ensure that your Lambda function is under the same Amazon account and Amazon Region as your state machine.

  1. Open the Lambda console and choose Create function.

  2. In the Create function section, choose Author from scratch.

  3. In the Basic information section, configure your Lambda function:

    1. For Function name, enter TestFunction1.

    2. For Runtime, choose Node.js 18.x.

    3. For Role, select Choose an existing role.

    4. For Existing role, select the Lambda role that you created earlier.

      Note

      If the IAM role that you created doesn't appear in the list, the role might still need a few minutes to propagate to Lambda.

    5. Choose Create function.

      When your Lambda function is created, note its Amazon Resource Name (ARN) in the upper-right corner of the page. For example:

      arn:aws-cn:lambda:cn-north-1:123456789012:function:TestFunction1
  4. Copy the following code for the Lambda function into the Function code section of the TestFunction1 page.

    function getRandomSeconds(max) {
      return Math.floor(Math.random() * Math.floor(max)) * 1000;
    }

    function sleep(ms) {
      return new Promise(resolve => setTimeout(resolve, ms));
    }

    export const handler = async (event) => {
      if (getRandomSeconds(4) === 0) {
        throw new Error("Something went wrong!");
      }
      let wait_time = getRandomSeconds(5);
      await sleep(wait_time);
      return { 'response': true };
    };

    This code creates randomly timed failures, which will be used to generate example errors in your state machine that can be viewed and analyzed using X-Ray traces.

  5. Choose Save.
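To get a feel for how often the function fails before you deploy it, you can model the handler's branches locally. The following sketch reuses the tutorial's `getRandomSeconds` logic with an injectable random source. The `random` parameter, the `classifyInvocation` and `tally` helpers, and the 3000 ms timeout are assumptions made for illustration; the timeout value matches Lambda's default of 3 seconds and only applies if you did not change it.

```javascript
// Same arithmetic as the tutorial's helper. The `random` parameter is an
// addition for testing and defaults to Math.random.
function getRandomSeconds(max, random = Math.random) {
  return Math.floor(random() * Math.floor(max)) * 1000;
}

// Hypothetical helper: predicts what one invocation would do, without sleeping.
// timeoutMs is an assumption (Lambda's default timeout is 3 seconds).
function classifyInvocation(random = Math.random, timeoutMs = 3000) {
  if (getRandomSeconds(4, random) === 0) {
    return { outcome: "error", waitMs: 0 };
  }
  const waitMs = getRandomSeconds(5, random);
  // A sleep at or beyond the timeout cannot finish in time.
  return { outcome: waitMs >= timeoutMs ? "timeout" : "success", waitMs };
}

// Hypothetical helper: counts outcomes over many simulated invocations.
function tally(runs, random = Math.random) {
  const counts = { error: 0, timeout: 0, success: 0 };
  for (let i = 0; i < runs; i++) {
    counts[classifyInvocation(random).outcome] += 1;
  }
  return counts;
}

console.log(tally(10000));
```

Under these assumptions, roughly a quarter of invocations throw an error, and the remaining ones time out whenever the random wait is 3 or 4 seconds. The exact counts vary from run to run.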

Step 3: Create two more Lambda functions

Create two more Lambda functions.

  1. Repeat Step 2 to create two more Lambda functions. For the next function, in Function name, enter TestFunction2. For the last function, in Function name, enter TestFunction3.

  2. In the Lambda console, check that you now have three Lambda functions, TestFunction1, TestFunction2, and TestFunction3.
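You need the ARN of each function when you define the state machine in the next step. Because the three functions differ only by name, their ARNs follow one pattern. The following sketch assembles them from their parts. The helper name `lambdaFunctionArn` is hypothetical, and the partition, Region, and account ID are placeholders; copy the real values from the Lambda console.

```javascript
// Hypothetical helper: assembles a Lambda function ARN from its parts.
function lambdaFunctionArn({ partition, region, accountId, functionName }) {
  return `arn:${partition}:lambda:${region}:${accountId}:function:${functionName}`;
}

// Placeholder values; substitute your own partition, Region, and account ID.
const base = { partition: "aws-cn", region: "cn-north-1", accountId: "123456789012" };

const functionArns = ["TestFunction1", "TestFunction2", "TestFunction3"].map(
  (functionName) => lambdaFunctionArn({ ...base, functionName })
);

console.log(functionArns.join("\n"));
```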

Step 4: Create a state machine

In this step, you'll use the Step Functions console to create a state machine with three Task states. Each Task state references one of your three Lambda functions.

  1. Open the Step Functions console and choose Create state machine.

    Important

    Make sure that your state machine is under the same Amazon account and Region as the Lambda functions you created earlier in Step 2 and Step 3.

  2. In the Choose a template dialog box, select Blank.

  3. Choose Select. This opens Workflow Studio in Design mode.

  4. For this tutorial, you'll write the Amazon States Language (ASL) definition of your state machine in the Code editor. To do this, choose Code.

  5. Remove the existing boilerplate code and paste the following code. In each Task state definition, replace the example ARN with the ARN of the corresponding Lambda function you created.

    {
      "StartAt": "CallTestFunction1",
      "States": {
        "CallTestFunction1": {
          "Type": "Task",
          "Resource": "arn:aws-cn:lambda:cn-north-1:123456789012:function:TestFunction1",
          "Catch": [
            {
              "ErrorEquals": [ "States.TaskFailed" ],
              "Next": "AfterTaskFailed"
            }
          ],
          "Next": "CallTestFunction2"
        },
        "CallTestFunction2": {
          "Type": "Task",
          "Resource": "arn:aws-cn:lambda:cn-north-1:123456789012:function:TestFunction2",
          "Catch": [
            {
              "ErrorEquals": [ "States.TaskFailed" ],
              "Next": "AfterTaskFailed"
            }
          ],
          "Next": "CallTestFunction3"
        },
        "CallTestFunction3": {
          "Type": "Task",
          "Resource": "arn:aws-cn:lambda:cn-north-1:123456789012:function:TestFunction3",
          "TimeoutSeconds": 5,
          "Catch": [
            {
              "ErrorEquals": [ "States.Timeout" ],
              "Next": "AfterTimeout"
            },
            {
              "ErrorEquals": [ "States.TaskFailed" ],
              "Next": "AfterTaskFailed"
            }
          ],
          "Next": "Succeed"
        },
        "Succeed": {
          "Type": "Succeed"
        },
        "AfterTimeout": {
          "Type": "Fail"
        },
        "AfterTaskFailed": {
          "Type": "Fail"
        }
      }
    }

    This is a description of your state machine using the Amazon States Language. It defines three Task states named CallTestFunction1, CallTestFunction2 and CallTestFunction3. Each calls one of your three Lambda functions. For more information, see State Machine Structure.

  6. Specify a name for your state machine. To do this, choose the edit icon next to the default state machine name of MyStateMachine. Then, in State machine configuration, specify a name in the State machine name box.

    For this tutorial, enter the name TraceFunctions.

  7. (Optional) In State machine configuration, specify other workflow settings, such as state machine type and its execution role.

    For this tutorial, under Additional configuration, choose Enable X-Ray tracing. Keep all the other default selections in State machine settings.

    If you've previously created an IAM role with the correct permissions for your state machine and want to use it, in Permissions, select Choose an existing role, and then select a role from the list. Or select Enter a role ARN and then provide an ARN for that IAM role.

  8. In the Confirm role creation dialog box, choose Confirm to continue.

    You can also choose View role settings to go back to State machine configuration.

    Note

    If you delete the IAM role that Step Functions creates, Step Functions can't recreate it later. Similarly, if you modify the role (for example, by removing Step Functions from the principals in the IAM policy), Step Functions can't restore its original settings later.
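To see how the Catch fields in the definition route an execution, you can walk the state machine locally. The following sketch is a simplified model, not the Step Functions runtime: it handles only Task, Succeed, and Fail states, ignores retries, and assumes that `States.TaskFailed` matches any error name except `States.Timeout` (check this against the Amazon States Language specification). The `simulate` and `catcherMatches` helpers and the ARN prefix are hypothetical.

```javascript
// Placeholder ARN prefix; replace with your own function ARNs.
const ARN_PREFIX = "arn:aws-cn:lambda:cn-north-1:123456789012:function:";
const onTaskFailed = { ErrorEquals: ["States.TaskFailed"], Next: "AfterTaskFailed" };
const onTimeout = { ErrorEquals: ["States.Timeout"], Next: "AfterTimeout" };

// Same shape as the tutorial's definition, written compactly.
const definition = {
  StartAt: "CallTestFunction1",
  States: {
    CallTestFunction1: { Type: "Task", Resource: ARN_PREFIX + "TestFunction1",
      Catch: [onTaskFailed], Next: "CallTestFunction2" },
    CallTestFunction2: { Type: "Task", Resource: ARN_PREFIX + "TestFunction2",
      Catch: [onTaskFailed], Next: "CallTestFunction3" },
    CallTestFunction3: { Type: "Task", Resource: ARN_PREFIX + "TestFunction3",
      TimeoutSeconds: 5, Catch: [onTimeout, onTaskFailed], Next: "Succeed" },
    Succeed: { Type: "Succeed" },
    AfterTimeout: { Type: "Fail" },
    AfterTaskFailed: { Type: "Fail" },
  },
};

// Assumed matching rule: exact name, States.ALL, or States.TaskFailed for
// anything other than States.Timeout.
function catcherMatches(errorEquals, error) {
  return errorEquals.some((name) =>
    name === error ||
    name === "States.ALL" ||
    (name === "States.TaskFailed" && error !== "States.Timeout"));
}

// outcomes maps a state name to the error it raises; missing means success.
function simulate(def, outcomes = {}) {
  const path = [];
  let current = def.StartAt;
  for (let steps = 0; steps < 100; steps++) {
    const state = def.States[current];
    if (!state) throw new Error(`Unknown state: ${current}`);
    path.push(current);
    if (state.Type === "Succeed") return { path, status: "SUCCEEDED" };
    if (state.Type === "Fail") return { path, status: "FAILED" };
    const error = outcomes[current] ?? null;
    if (error === null) {
      if (state.End) return { path, status: "SUCCEEDED" };
      current = state.Next;
      continue;
    }
    // First matching catcher wins; an uncaught error fails the execution.
    const catcher = (state.Catch ?? []).find((c) => catcherMatches(c.ErrorEquals, error));
    if (!catcher) return { path, status: "FAILED" };
    current = catcher.Next;
  }
  throw new Error("Too many transitions");
}

console.log(simulate(definition).path.join(" -> "));
console.log(simulate(definition, { CallTestFunction2: "Error" }).path.join(" -> "));
console.log(simulate(definition, { CallTestFunction3: "States.Timeout" }).path.join(" -> "));
```

Because a failure in an earlier Task state ends the execution, later functions are never reached in that run, which is why the functions receive different amounts of traffic in the X-Ray service map.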

Step 5: Run the state machine

State machine executions are instances where you run your workflow to perform tasks.
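If you prefer to start several executions from a script instead of the console, you first need one set of request parameters per execution. The following sketch only builds and validates those parameters; sending them (for example, with an AWS SDK client) is left out. The field names follow the StartExecution API request, while the helper names, the state machine ARN, and the `trace-run-` name prefix are hypothetical. The ASCII check covers only the CloudWatch concern described in the note in the steps that follow; Step Functions applies additional naming rules that this sketch does not enforce.

```javascript
// Returns true when the name contains only printable ASCII characters.
function isAsciiName(name) {
  return /^[\x20-\x7E]+$/.test(name);
}

// Hypothetical helper: builds the request parameters for one execution.
function buildStartExecutionParams(stateMachineArn, name, input = {}) {
  if (!isAsciiName(name)) {
    throw new Error(`Execution name is not ASCII-only: ${name}`);
  }
  // The API expects the execution input as a JSON string.
  return { stateMachineArn, name, input: JSON.stringify(input) };
}

// Placeholder ARN; copy the real one from the Step Functions console.
const stateMachineArn =
  "arn:aws-cn:states:cn-north-1:123456789012:stateMachine:TraceFunctions";

// Three executions, matching the tutorial's "at least three" suggestion.
const batch = [1, 2, 3].map((n) =>
  buildStartExecutionParams(stateMachineArn, `trace-run-${n}`)
);

console.log(batch);
```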

  1. On the TraceFunctions page, choose Start execution.

    The New execution page is displayed.

  2. In the Start execution dialog box, do the following:

    1. (Optional) To identify your execution, you can specify a name for it in the Name box. By default, Step Functions generates a unique execution name automatically.

      Note

      Step Functions accepts names for state machines, executions, activities, and labels that contain non-ASCII characters. These non-ASCII names don't work with Amazon CloudWatch. To ensure that you can track CloudWatch metrics, choose a name that uses only ASCII characters.

    2. Choose Start execution.

    3. The Step Functions console directs you to a page that's titled with your execution ID. This page is known as the Execution Details page. On this page, you can review the execution results as the execution progresses or after it's complete.

      To review the execution results, choose individual states on the Graph view, and then choose the individual tabs on the Step details pane to view each state's details, including its input, output, and definition. For details about the execution information you can view on the Execution Details page, see Execution Details page – Interface overview.

      Run several (at least three) executions.

  3. After the executions have finished, follow the X-Ray trace map link. You can view the trace while an execution is still running, but you may want to see the execution results before viewing the X-Ray trace map.

    
  4. View the service map to identify where errors are occurring, connections with high latency, or traces for requests that were unsuccessful. In this example, you can see how much traffic each function is receiving. TestFunction2 was called more often than TestFunction3, and TestFunction1 was called more than twice as often as TestFunction2.

    The service map indicates the health of each node by coloring it based on the ratio of successful calls to errors and faults:

    • Green for successful calls

    • Red for server faults (500 series errors)

    • Yellow for client errors (400 series errors)

    • Purple for throttling errors (429 Too Many Requests)

    
    You can also choose a service node to view requests for that node, or an edge between two nodes to view requests that traveled that connection.

  5. View the X-Ray trace map to see what has happened for each execution. The Timeline view shows a hierarchy of segments and subsegments. The first entry in the list is the segment, which represents all data recorded by the service for a single request. Below the segment are subsegments. This example shows subsegments recorded by the Lambda functions.

    
    For more information on understanding X-Ray traces and using X-Ray with Step Functions, see Amazon X-Ray and Step Functions.
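The color rules that the service map uses for node health can be written as a small lookup. The following sketch maps an HTTP status code to the color category listed in the service map step and summarizes a list of calls. The helper names `colorForStatus` and `summarizeNode` are hypothetical, and treating every other status code as green is a simplification made for this example.

```javascript
// Hypothetical helper: maps an HTTP status code to a service map color.
function colorForStatus(statusCode) {
  if (statusCode === 429) return "purple";                      // throttling
  if (statusCode >= 500 && statusCode <= 599) return "red";     // server faults
  if (statusCode >= 400 && statusCode <= 499) return "yellow";  // client errors
  return "green"; // simplification: everything else counts as successful
}

// Hypothetical helper: returns the share of calls in each color for one node.
function summarizeNode(statusCodes) {
  const shares = { green: 0, red: 0, yellow: 0, purple: 0 };
  for (const code of statusCodes) {
    shares[colorForStatus(code)] += 1;
  }
  for (const color of Object.keys(shares)) {
    shares[color] = statusCodes.length ? shares[color] / statusCodes.length : 0;
  }
  return shares;
}

console.log(summarizeNode([200, 200, 500, 429]));
```

Check 429 before the general 400 series range, because a throttling error would otherwise be counted as an ordinary client error.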