Monitor an Elastic Fabric Adapter on Amazon EC2 - Amazon Elastic Compute Cloud
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Monitor an Elastic Fabric Adapter on Amazon EC2

You can use the following features to monitor the performance of your Elastic Fabric Adapters.

EFA driver metrics for an Amazon EC2 instance

The Elastic Fabric Adapter (EFA) driver publishes multiple metrics from the instances that have EFA interfaces attached, in real time. You can use these metrics to troubleshoot application performance and networking issues, choose the right cluster size for a workload, plan scaling activities proactively, and benchmark applications to determine whether they maximize the EFA performance available on an instance.

Available EFA driver metrics

The EFA driver publishes the following metrics to the instance in real time. They provide the cumulative number of errors, connection events, and packets or bytes sent, received, retransmitted, or dropped by the attached EFA devices since instance launch or the last driver reset.

Metric Description Supported instance types
tx_bytes

The number of bytes transmitted.

Unit: bytes

All instance types that support EFA
rx_bytes

The number of bytes received.

Unit: bytes

All instance types that support EFA
tx_pkts

The number of packets transmitted.

Unit: count

All instance types that support EFA
rx_pkts

The number of packets received.

Unit: count

All instance types that support EFA
rx_drops

The number of packets that were received and then dropped.

Unit: count

All instance types that support EFA
send_bytes

The number of bytes sent using send operations.

Unit: bytes

All instance types that support EFA
recv_bytes

The number of bytes received by send operations.

Unit: bytes

All instance types that support EFA
send_wrs

The number of packets sent using send operations.

Unit: count

All instance types that support EFA
recv_wrs

The number of packets received by send operations.

Unit: count

All instance types that support EFA
rdma_write_wrs

The number of completed rdma write operations.

Unit: count

All instance types that support EFA
rdma_read_wrs

The number of completed rdma read operations.

Unit: count

All instance types that support EFA
rdma_write_bytes

The number of bytes written to it by other instances using rdma write operations.

Unit: bytes

All instance types that support EFA
rdma_read_bytes

The number of bytes received using rdma read operations.

Unit: bytes

All instance types that support EFA
rdma_write_wr_err

The number of rdma write operations that had local or remote errors.

Unit: count

All instance types that support EFA
rdma_read_wr_err

The number of rdma read operations that had local or remote errors.

Unit: count

All instance types that support EFA
rdma_read_resp_bytes

The number of bytes sent in response to rdma read operations.

Unit: bytes

All instance types that support EFA
rdma_write_recv_bytes

The number of bytes received by rdma write operations.

Unit: bytes

All instance types that support EFA
retrans_bytes

The number of EFA SRD bytes retransmitted.

Unit: count

Nitro v4 and later instance types that support EFA
retrans_pkts

The number of EFA SRD packets retransmitted.

Unit: bytes

Nitro v4 and later instance types that support EFA
retrans_timeout_events

The number of times EFA SRD traffic timed out and resulted in a network path change.

Unit: count

Nitro v4 and later instance types that support EFA
impaired_remote_conn_events

The number of times EFA SRD connections entered an impaired state, resulting in a reduced throughput rate limit.

Unit: count

Nitro v4 and later instance types that support EFA
unresponsive_remote_events

The number of times an EFA SRD remote connection was unresponsive.

Unit: count

Nitro v4 and later instance types that support EFA

For more information about the instance types that support EFA, see Supported instance types.

Retrieve EFA driver metrics for your instance

You can use the rdma-tool command line tool to retrieve the metrics for all EFA interfaces attached to an instance as follows:

$ rdma -p statistic show link rdmap0s31/1 tx_bytes 0 tx_pkts 0 rx_bytes 0 rx_pkts 0 rx_drops 0 send_bytes 0 send_wrs 0 recv_bytes 0 recv_wrs 0 rdma_read_wrs 0 rdma_read_bytes 0 rdma_read_wr_err 0 rdma_read_resp_bytes 0 rdma_write_wrs 0 rdma_write_bytes 0 rdma_write_wr_err 0 retrans_bytes 0 retrans_pkts 0 retrans_timeout_events 0 unresponsive_remote_events 0 impaired_remote_conn_events 0

Alternatively, you can retrieve the metrics for each EFA interface attached to an instance from the sys files using the following command.

$ more /sys/class/infiniband/device_number/ports/port_number/hw_counters/* | cat

For example

$ more /sys/class/infiniband/rdmap0s31/ports/1/hw_counters/* | cat :::::::::::::: /sys/class/infiniband/rdmap0s31/ports/1/hw_counters/lifespan :::::::::::::: 12 :::::::::::::: /sys/class/infiniband/rdmap0s31/ports/1/hw_counters/rdma_read_bytes :::::::::::::: 0 :::::::::::::: /sys/class/infiniband/rdmap0s31/ports/1/hw_counters/rdma_read_resp_bytes :::::::::::::: 0 :::::::::::::: /sys/class/infiniband/rdmap0s31/ports/1/hw_counters/rdma_read_wr_err :::::::::::::: 0 :::::::::::::: /sys/class/infiniband/rdmap0s31/ports/1/hw_counters/rdma_read_wrs :::::::::::::: 0 :::::::::::::: /sys/class/infiniband/rdmap0s31/ports/1/hw_counters/rdma_write_bytes :::::::::::::: 0 :::::::::::::: /sys/class/infiniband/rdmap0s31/ports/1/hw_counters/rdma_write_recv_bytes :::::::::::::: 0 :::::::::::::: /sys/class/infiniband/rdmap0s31/ports/1/hw_counters/rdma_write_wr_err :::::::::::::: 0 :::::::::::::: /sys/class/infiniband/rdmap0s31/ports/1/hw_counters/rdma_write_wrs :::::::::::::: 0 :::::::::::::: /sys/class/infiniband/rdmap0s31/ports/1/hw_counters/recv_bytes :::::::::::::: 0 :::::::::::::: /sys/class/infiniband/rdmap0s31/ports/1/hw_counters/recv_wrs :::::::::::::: 0 :::::::::::::: /sys/class/infiniband/rdmap0s31/ports/1/hw_counters/rx_bytes :::::::::::::: 0 :::::::::::::: /sys/class/infiniband/rdmap0s31/ports/1/hw_counters/rx_drops :::::::::::::: 0 :::::::::::::: /sys/class/infiniband/rdmap0s31/ports/1/hw_counters/rx_pkts :::::::::::::: 0 :::::::::::::: /sys/class/infiniband/rdmap0s31/ports/1/hw_counters/send_bytes :::::::::::::: 0 :::::::::::::: /sys/class/infiniband/rdmap0s31/ports/1/hw_counters/send_wrs :::::::::::::: 0 :::::::::::::: /sys/class/infiniband/rdmap0s31/ports/1/hw_counters/tx_bytes :::::::::::::: 0 :::::::::::::: /sys/class/infiniband/rdmap0s31/ports/1/hw_counters/tx_pkts :::::::::::::: 0 :::::::::::::: /sys/class/infiniband/rdmap0s31/ports/1/hw_counters/retrans_bytes :::::::::::::: 0 /sys/class/infiniband/rdmap0s31/ports/1/hw_counters/retrans_pkts :::::::::::::: 0 /sys/class/infiniband/rdmap0s31/ports/1/hw_counters/retrans_timeout_events :::::::::::::: 0 /sys/class/infiniband/rdmap0s31/ports/1/hw_counters/unresponsive_remote_events :::::::::::::: 0 /sys/class/infiniband/rdmap0s31/ports/1/hw_counters/impaired_remote_conn_events :::::::::::::: 0

Amazon VPC flow logs

You can create an Amazon VPC Flow Log to capture information about the traffic going to and from an EFA. Flow log data can be published to Amazon CloudWatch Logs and Amazon S3. After you create a flow log, you can retrieve and view its data in the chosen destination. For more information, see VPC Flow Logs in the Amazon VPC User Guide.

You create a flow log for an EFA in the same way that you create a flow log for an elastic network interface. For more information, see Create a flow log in the Amazon VPC User Guide.

In the flow log entries, EFA traffic is identified by the srcAddress and destAddress, which are both formatted as MAC addresses, as shown in the following example.

version accountId eniId srcAddress destAddress sourcePort destPort protocol packets bytes start end action log-status 2 3794735123 eni-10000001 01:23:45:67:89:ab 05:23:45:67:89:ab - - - 9 5689 1521232534 1524512343 ACCEPT OK

Amazon CloudWatch

If you are using EFA in an Amazon EKS cluster, you can monitor your EFAs using CloudWatch Container Insights. Amazon CloudWatch Container Insights supports all of the EFA driver metrics, except: retrans_bytes, retrans_pkts, retrans_timeout_events, unresponsive_remote_events, and impaired_remote_conn_events.

For more information, see Amazon EKS and Kubernetes Container Insights metrics in the Amazon CloudWatch User Guide.