Amazon EKS and Kubernetes Container Insights metrics
The following tables list the metrics and dimensions that Container Insights collects
for Amazon EKS and Kubernetes. These metrics are in the ContainerInsights
namespace.
For more information, see Metrics.
If you do not see any Container Insights metrics in your console, make sure that you have completed the setup of Container Insights. Metrics do not appear until Container Insights has been set up completely. For more information, see Setting up Container Insights.
Metric name | Dimensions | Description |
---|---|---|
|
|
The number of failed worker nodes in the cluster. A node is considered failed
if it is suffering from any node conditions. For more
information, see Conditions |
|
|
The total number of worker nodes in the cluster. |
|
|
The number of pods running per namespace in the resource that is specified by the dimensions that you're using. |
|
|
The maximum number of CPU units that can be assigned to a single node in this cluster. |
|
|
The percentage of CPU units that are reserved for node components, such as kubelet, kube-proxy, and Docker. Formula: Note
|
|
|
The number of CPU units being used on the nodes in the cluster. |
|
|
The total percentage of CPU units being used on the nodes in the cluster. Formula: |
|
|
The total number of GPUs available on the node. |
|
|
The number of GPUs being used by the running pods on the node. |
|
|
The percentage of GPU currently being reserved on the node. The formula is, Note
|
|
|
The total percentage of file system capacity being used on nodes in the cluster. Formula: Note
|
|
|
The maximum amount of memory, in bytes, that can be assigned to a single node in this cluster. |
|
|
The percentage of memory currently being used on the nodes in the cluster. Formula: Note
|
|
|
The percentage of memory currently being used by the node or nodes, calculated as node memory usage divided by the node memory limit. Formula: |
|
|
The amount of memory, in bytes, being used in the working set of the nodes in the cluster. |
|
|
The total number of bytes per second transmitted and received over the network per node in a cluster. Formula: Note
|
|
|
The number of running containers per node in a cluster. |
|
|
The number of running pods per node in a cluster. |
|
|
The CPU capacity that is reserved per pod in a cluster. Formula: Note
|
|
|
The percentage of CPU units being used by pods. Formula: |
|
|
The percentage of CPU units being used by pods relative to the pod limit. Formula: |
|
|
The GPU requests for the pod. This value must always be equal to |
|
|
The maximum number of GPUs that can be assigned to the pod in a node. |
|
|
The number of GPUs allocated to the pod. |
|
|
The percentage of GPU currently being reserved for the pod. Formula: pod_gpu_request / node_gpu_reserved_capacity. |
|
|
The percentage of memory that is reserved for pods. Formula: Note
|
|
|
The percentage of memory currently being used by the pod or pods. Formula: |
|
|
The percentage of memory that is being used by pods relative to the pod limit. If any containers in the pod don't have a memory limit defined, this metric doesn't appear. Formula: |
|
|
The number of bytes per second being received over the network by the pod. Formula: Note
|
|
|
The number of bytes per second being transmitted over the network by the pod. Formula: Note
|
|
|
The total number of container restarts in a pod. |
|
|
The number of pods running the service or services in the cluster. |
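
The one formula that the table above spells out in full, pod_gpu_request / node_gpu_reserved_capacity, can be sketched in Python as follows. This is only an illustration that takes the two names in the formula at face value as plain numeric inputs. The function name, the scaling by 100 to produce a percentage, and the handling of a zero denominator are assumptions, not behavior documented for Container Insights.

```python
from typing import Optional


def pod_gpu_reserved_percent(
    pod_gpu_request: float,
    node_gpu_reserved_capacity: float,
) -> Optional[float]:
    """Apply the table's ratio and scale it to a percentage.

    The multiplication by 100 is an assumption made for this sketch;
    the table only states the ratio itself.
    """
    if node_gpu_reserved_capacity <= 0:
        # Nothing to divide by, so this sketch reports no value.
        return None
    return 100.0 * pod_gpu_request / node_gpu_reserved_capacity


if __name__ == "__main__":
    # Hypothetical inputs chosen only to show the arithmetic.
    print(pod_gpu_reserved_percent(1, 4))
```

Returning None for a zero denominator is a design choice for the sketch, so that callers can tell "no data" apart from a real value of zero.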
Kueue metrics
Beginning with version v2.4.0-eksbuild.1
of the CloudWatch Observability EKS add-on, Container Insights for Amazon EKS supports
collecting Kueue metrics from Amazon EKS clusters. For more information about the add-on, see
Install the CloudWatch agent with the Amazon CloudWatch Observability EKS add-on or the Helm chart.
For information about enabling the metrics, see Enable Kueue metrics.
The Kueue metrics that are collected are listed in the following table. These metrics are published into the
ContainerInsights/Prometheus namespace in CloudWatch. Some of these metrics use the following dimensions:
- ClusterQueue is the name of the ClusterQueue.
- The possible values of Status are active and inadmissible.
- The possible values of Reason are Preempted, PodsReadyTimeout, AdmissionCheck, ClusterQueueStopped, and InactiveWorkload.
- Flavor is the referenced flavor.
- Resource refers to cluster computer resources, such as cpu, memory, gpu, and so on.
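
As a rough illustration of how the namespace and dimensions described above fit together, the following Python sketch builds one query entry in the shape that the CloudWatch GetMetricData API expects in MetricDataQueries. It does not call AWS. The function name, the metric name `example_kueue_metric`, and the queue name `team-a` are hypothetical placeholders, and the validation of Status and Reason values is an added convenience based only on the values listed above.

```python
from typing import Any, Dict

KUEUE_NAMESPACE = "ContainerInsights/Prometheus"

# Allowed values taken from the dimension descriptions above.
ALLOWED_VALUES = {
    "Status": {"active", "inadmissible"},
    "Reason": {
        "Preempted",
        "PodsReadyTimeout",
        "AdmissionCheck",
        "ClusterQueueStopped",
        "InactiveWorkload",
    },
}


def kueue_metric_query(
    metric_name: str,
    dimensions: Dict[str, str],
    query_id: str = "q1",
    period: int = 300,
    stat: str = "Average",
) -> Dict[str, Any]:
    """Build one MetricDataQueries entry for a Kueue metric (hypothetical helper)."""
    for name, value in dimensions.items():
        allowed = ALLOWED_VALUES.get(name)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}, got {value!r}")
    return {
        "Id": query_id,
        "MetricStat": {
            "Metric": {
                "Namespace": KUEUE_NAMESPACE,
                "MetricName": metric_name,
                "Dimensions": [
                    {"Name": n, "Value": v} for n, v in dimensions.items()
                ],
            },
            "Period": period,
            "Stat": stat,
        },
        "ReturnData": True,
    }


if __name__ == "__main__":
    # "example_kueue_metric" and "team-a" are placeholders, not real names.
    print(kueue_metric_query(
        "example_kueue_metric",
        {"ClusterQueue": "team-a", "Status": "active"},
    ))
```

The resulting dictionary could then be passed in a list as the MetricDataQueries argument of a boto3 CloudWatch client's get_metric_data call, together with StartTime and EndTime, substituting a metric name and the dimensions from the table below.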
Metric name | Dimensions | Description |
---|---|---|
|
|
The number of pending workloads. |
|
|
The total number of evicted workloads. |
|
|
The number of admitted workloads that are active (unsuspended and not finished). |
|
|
Reports the total resource usage of the ClusterQueue. |
|
|
Reports the resource quota of the ClusterQueue. |