Troubleshooting Container Insights
The following sections can help if you're having issues with Container Insights.
Failed deployment on Amazon EKS or Kubernetes
If the agent doesn't deploy correctly on a Kubernetes cluster, try the following:
- Run the following command to get the list of pods:
  kubectl get pods -n amazon-cloudwatch
- Run the following command and check the events at the bottom of the output:
  kubectl describe pod pod-name -n amazon-cloudwatch
- Run the following command to check the logs:
  kubectl logs pod-name -n amazon-cloudwatch
Unauthorized panic: Cannot retrieve cadvisor data from kubelet
If your deployment fails with the error Unauthorized panic: Cannot retrieve cadvisor data from kubelet, your kubelet might not have Webhook authorization mode enabled. This mode is required for Container Insights. For more information, see Verify prerequisites.
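One way to check the kubelet's authorization mode is to query its configz endpoint through the Kubernetes API server. This is a minimal sketch; the node name is a placeholder, and the exact shape of the configz response can vary by Kubernetes version:
# Placeholder node name; list yours with: kubectl get nodes
NODE_NAME=ip-192-0-2-0.us-west-2.compute.internal
# Proxy the Kubernetes API server to localhost in the background
kubectl proxy --port=8001 &
# Read the node's kubelet configuration; Container Insights requires
# the authorization mode to be "Webhook"
curl -sSL "http://localhost:8001/api/v1/nodes/${NODE_NAME}/proxy/configz" | grep -o '"mode":"[^"]*"'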
Deploying Container Insights on a deleted and re-created cluster on Amazon ECS
If you delete an existing Amazon ECS cluster that does not have Container Insights enabled, and you re-create it with the same name, you can't enable Container Insights on this new cluster at the time you re-create it. You can enable it after the re-creation completes by entering the following command:
aws ecs update-cluster-settings --cluster myCICluster --settings name=containerInsights,value=enabled
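To confirm that the setting took effect, you can describe the cluster and check that containerInsights is enabled. A minimal sketch, using the same placeholder cluster name:
aws ecs describe-clusters --clusters myCICluster --include SETTINGS --query 'clusters[0].settings'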
Invalid endpoint error
If you see an error message similar to the following, make sure that you replaced all the placeholders, such as cluster-name and region-name, in the commands that you are using with the correct information for your deployment.
"log": "2020-04-02T08:36:16Z E! cloudwatchlogs: code: InvalidEndpointURL, message: invalid endpoint uri, original error: &url.Error{Op:\"parse\", URL:\"https://logs.{{region_name}}.amazonaws.com/\", Err:\"{\"}, &awserr.baseError{code:\"InvalidEndpointURL\", message:\"invalid endpoint uri\", errs:[]error{(*url.Error)(0xc0008723c0)}}\n",
Metrics don't appear in the console
If you don't see any Container Insights metrics in the Amazon Web Services Management Console, be sure that you have completed the setup of Container Insights. Metrics don't appear before Container Insights has been set up completely. For more information, see Setting up Container Insights.
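You can also check from the command line whether any metrics have arrived yet. A minimal sketch, assuming an Amazon EKS or Kubernetes cluster (Amazon ECS clusters publish to the ECS/ContainerInsights namespace instead); cluster-name and region-name are placeholders:
aws cloudwatch list-metrics --namespace ContainerInsights --dimensions Name=ClusterName,Value=cluster-name --region region-name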
Pod metrics missing on Amazon EKS or Kubernetes after upgrading cluster
This section might be useful if all or some pod metrics are missing after you deploy the CloudWatch agent as a DaemonSet on a new or upgraded cluster, or if you see an error log with the message W! No pod metric collected.
These errors can be caused by changes in the container runtime, such as containerd or the Docker systemd cgroup driver. You can usually solve this by updating your deployment manifest so that the containerd socket from the host is mounted into the container. See the following example:
# For full example see https://github.com/aws-samples/amazon-cloudwatch-container-insights/blob/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/cwagent/cwagent-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cloudwatch-agent
  namespace: amazon-cloudwatch
spec:
  template:
    spec:
      containers:
        - name: cloudwatch-agent
          # ...
          # Don't change the mountPath
          volumeMounts:
            # ...
            - name: dockersock
              mountPath: /var/run/docker.sock
              readOnly: true
            - name: varlibdocker
              mountPath: /var/lib/docker
              readOnly: true
            - name: containerdsock # NEW mount
              mountPath: /run/containerd/containerd.sock
              readOnly: true
      # ...
      volumes:
        # ...
        - name: dockersock
          hostPath:
            path: /var/run/docker.sock
        - name: varlibdocker
          hostPath:
            path: /var/lib/docker
        - name: containerdsock # NEW volume
          hostPath:
            path: /run/containerd/containerd.sock
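After you update the manifest, re-apply it and confirm that the warning is gone. A sketch, assuming the manifest is saved as cwagent-daemonset.yaml and the agent pods carry the name=cloudwatch-agent label used in the sample manifest:
# Re-apply the updated DaemonSet and wait for the pods to roll out
kubectl apply -f cwagent-daemonset.yaml
kubectl rollout status daemonset/cloudwatch-agent -n amazon-cloudwatch
# The warning should no longer appear in the agent logs
kubectl logs -n amazon-cloudwatch -l name=cloudwatch-agent | grep "No pod metric collected"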
No pod metrics when using Bottlerocket for Amazon EKS
Bottlerocket is a Linux-based open-source operating system that is purpose-built by Amazon for running containers. Bottlerocket uses a different containerd path on the host, so you need to change the volumes to its location. If you don't, you see an error in the logs that includes W! No pod metric collected. See the following example:
volumes:
  # ...
  - name: containerdsock
    hostPath:
      # path: /run/containerd/containerd.sock
      # bottlerocket does not mount containerd sock at normal place
      # https://github.com/bottlerocket-os/bottlerocket/commit/91810c85b83ff4c3660b496e243ef8b55df0973b
      path: /run/dockershim.sock
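If you're not sure whether your nodes run Bottlerocket, you can check the OS image that each node reports before changing the manifest. A minimal sketch:
# Bottlerocket nodes report an osImage such as "Bottlerocket OS 1.x"
kubectl get nodes -o custom-columns='NAME:.metadata.name,OS:.status.nodeInfo.osImage'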
No container filesystem metrics when using the containerd runtime for Amazon EKS or Kubernetes
This is a known issue that community contributors are working on. For more information, see Disk usage metric for containerd.
Unexpected log volume increase from CloudWatch agent when collecting Prometheus metrics
This was a regression introduced in version 1.247347.6b250880 of the CloudWatch agent. It has already been fixed in more recent versions of the agent. Its impact was limited to scenarios where customers collected the logs of the CloudWatch agent itself and also used Prometheus. For more information, see [prometheus] agent is printing all the scraped metrics in log.
Latest Docker image mentioned in release notes not found on Docker Hub
We update the release notes and tags on GitHub before we start the actual release internally. It usually takes 1-2 weeks for the latest Docker image to appear on registries after we bump the version number on GitHub. There is no nightly release for the CloudWatch agent container image. You can build the image directly from source at the following location:
https://github.com/aws/amazon-cloudwatch-agent/tree/main/amazon-cloudwatch-container-insights/cloudwatch-agent-dockerfile
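A minimal sketch of a local build; the Dockerfile name and build context here are assumptions, so check the README at the linked path for the exact layout:
git clone https://github.com/aws/amazon-cloudwatch-agent.git
cd amazon-cloudwatch-agent
# The Dockerfile path below is an assumption; the repository layout may change
docker build -t cloudwatch-agent:local -f amazon-cloudwatch-container-insights/cloudwatch-agent-dockerfile/Dockerfile .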
CrashLoopBackOff error on the CloudWatch agent
If you see a CrashLoopBackOff error for the CloudWatch agent, make sure that your IAM permissions are set correctly. For more information, see Verify prerequisites.
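As a sketch of where to look, you can inspect the crashed container's logs and confirm that the policy the prerequisites call for is attached; the pod name and MyNodeInstanceRole are placeholders for your own agent pod and worker node role:
# Show the logs from the crashed (previous) container
kubectl logs cloudwatch-agent-85ppg -n amazon-cloudwatch --previous
# Confirm that CloudWatchAgentServerPolicy is attached to the node role
aws iam list-attached-role-policies --role-name MyNodeInstanceRole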
CloudWatch agent or Fluentd pod stuck in pending
If you have a CloudWatch agent or Fluentd pod stuck in Pending status or with a FailedScheduling error, determine whether your nodes have enough compute resources based on the number of cores and amount of RAM required by the agents. Enter the following command to describe the pod:
kubectl describe pod cloudwatch-agent-85ppg -n amazon-cloudwatch
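You can then compare the pod's resource requests against what each node still has available to allocate. A minimal sketch:
# Show each node's allocatable capacity and currently allocated requests
kubectl describe nodes | grep -A 7 "Allocated resources"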