Help improve this page
Want to contribute to this user guide? Choose the Edit this page on GitHub link that is located in the right pane of every page. Your contributions will help make our user guide better for everyone.
View the health status of your nodes
This topic explains the tools and methods available for monitoring node health status in Amazon EKS clusters. The information covers node conditions, events, and detection cases that help you identify and diagnose node-level issues. Use the commands and patterns described here to inspect node health resources, interpret status conditions, and analyze node events for operational troubleshooting.
You can get some node health information with Kubernetes commands for all nodes. And if you use the node monitoring agent through Amazon EKS Auto Mode or the Amazon EKS managed add-on, you will get a wider variety of node signals to help troubleshoot. Descriptions of detected health issues by the node monitoring agent are also made available in the observability dashboard. For more information, see Enable node auto repair and investigate node health issues.
Node conditions
Node conditions represent terminal issues requiring remediation actions like instance replacement or reboot.
To get conditions for all nodes:
kubectl get nodes -o 'custom-columns=NAME:.metadata.name,CONDITIONS:.status.conditions[*].type,STATUS:.status.conditions[*].status'
To get detailed conditions for a specific node
kubectl describe node
node-name
Example condition output of a healthy node:
- lastHeartbeatTime: "2024-11-21T19:07:40Z" lastTransitionTime: "2024-11-08T03:57:40Z" message: Monitoring for the Networking system is active reason: NetworkingIsReady status: "True" type: NetworkingReady
Example condition of a unhealthy node with a networking problem:
- lastHeartbeatTime: "2024-11-21T19:12:29Z" lastTransitionTime: "2024-11-08T17:04:17Z" message: IPAM-D has failed to connect to API Server which could be an issue with IPTable rules or any other network configuration. reason: IPAMDNotReady status: "False" type: NetworkingReady
Node events
Node events indicate temporary issues or sub-optimal configurations.
To get all events reported by the node monitoring agent
When the node monitoring agent is available, you can run the following command.
kubectl get events --field-selector=reportingComponent=eks-node-monitoring-agent
Sample output:
LAST SEEN TYPE REASON OBJECT MESSAGE 4s Warning SoftLockup node/ip-192-168-71-251.us-west-2.compute.internal CPU stuck for 23s
To get events for all nodes
kubectl get events --field-selector involvedObject.kind=Node
To get events for a specific node
kubectl get events --field-selector involvedObject.kind=Node,involvedObject.name=
node-name
To watch events in real-time
kubectl get events -w --field-selector involvedObject.kind=Node
Example event output:
LAST SEEN TYPE REASON OBJECT MESSAGE 2m Warning MemoryPressure Node/node-1 Node experiencing memory pressure 5m Normal NodeReady Node/node-1 Node became ready
Common troubleshooting commands
# Get comprehensive node status kubectl get node
node-name
-o yaml # Watch node status changes kubectl get nodes -w # Get node metrics kubectl top node