App Mesh Kubernetes troubleshooting - Amazon App Mesh

This topic details common issues that you may experience when you use App Mesh with Kubernetes.

App Mesh resources created in Kubernetes cannot be found in App Mesh

Symptoms

You have created the App Mesh resources using the Kubernetes custom resource definition (CRD), but the resources that you created are not visible in App Mesh when you use the Amazon Web Services Management Console or APIs.

Resolution

The likely cause is an error in the Kubernetes controller for App Mesh. For more information, see Troubleshooting on GitHub. Check the controller logs for any errors or warnings indicating that the controller could not create any resources.

kubectl logs -n appmesh-system -f \
    $(kubectl get pods -n appmesh-system -o name | grep controller)
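
Controller logs can be long, so it may help to narrow them to likely problem lines first. The following is a minimal sketch, not official tooling. The function name filter_problems is hypothetical, and it assumes that relevant lines contain the word "error" or "warn" in some letter case, which may not hold for every controller version.

```shell
# Hypothetical helper: keep only log lines that mention an error or a warning.
# Assumption: problem lines contain "error" or "warn" (any letter case).
filter_problems() {
    grep -iE 'error|warn' || true   # "|| true": finding no matches is not a failure
}

# Usage sketch (requires access to the cluster):
#   kubectl logs -n appmesh-system \
#       $(kubectl get pods -n appmesh-system -o name | grep controller) \
#       | filter_problems
```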

If your issue is still not resolved, then consider opening a GitHub issue or contacting Amazon Support.

Pods are failing readiness and liveness checks after the Envoy sidecar is injected

Symptoms

Pods for your application were previously running successfully, but after the Envoy sidecar is injected into a pod, its readiness and liveness checks begin failing.

Resolution

Make sure that the Envoy container that was injected into the pod has bootstrapped with App Mesh's Envoy management service. To interpret any errors, look up the error codes in the topic "Envoy disconnected from App Mesh Envoy management service with error text". You can use the following command to inspect Envoy logs for the relevant pod.

# Assumes the injected sidecar container is named envoy; pod-name and namespace are placeholders.
kubectl logs pod-name -n namespace -c envoy \
    | grep "gRPC config stream closed"
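
When probes fail after injection, it can also help to see which container in the pod is reporting as not ready. The following is a minimal sketch under that assumption. The helper name not_ready_containers is hypothetical, and pod-name and namespace are placeholders.

```shell
# Hypothetical helper: read "name<TAB>ready" lines and print the names of
# containers whose ready flag is anything other than "true".
not_ready_containers() {
    awk -F '\t' '$2 != "true" { print $1 }'
}

# Usage sketch (pod-name and namespace are placeholders):
#   kubectl get pod pod-name -n namespace \
#       -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.ready}{"\n"}{end}' \
#       | not_ready_containers
```

If only the sidecar is listed, the Envoy bootstrap is a likely place to look. If only the application container is listed, the probe configuration itself may need attention.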

If your issue is still not resolved, then consider opening a GitHub issue or contacting Amazon Support.

Pods not registering or deregistering as Amazon Cloud Map instances

Symptoms

Your Kubernetes pods are not being registered in or deregistered from Amazon Cloud Map as part of their life cycle. A pod may start successfully and be ready to serve traffic, but not receive any. When a pod is terminated, clients may still retain its IP address and attempt to send traffic to it, and those requests fail.

Resolution

This is a known issue. For more information, see the Pods don't get auto registered/deregistered in Kubernetes with Amazon Cloud Map GitHub issue. Due to the relationship between pods, App Mesh virtual nodes, and Amazon Cloud Map resources, the App Mesh controller for Kubernetes may become desynchronized and lose resources. For example, this can happen if a virtual node resource is deleted from Kubernetes before terminating its associated pods.

To mitigate this issue:

  • Make sure that you are running the latest version of the App Mesh controller for Kubernetes.

  • Make sure that the Amazon Cloud Map namespaceName and serviceName are correct in your virtual node definition.

  • Make sure that you delete any associated pods prior to deleting your virtual node definition. If you need help identifying which pods are associated with a virtual node, see Cannot determine where a pod for an App Mesh resource is running.

  • If your issue persists, run the following command to inspect your controller logs for errors that may help reveal the underlying issue.

    kubectl logs -n appmesh-system \
        $(kubectl get pods -n appmesh-system -o name | grep appmesh-controller)

  • Consider using the following command to restart your controller pods. This may fix synchronization issues.

    kubectl delete -n appmesh-system \
        $(kubectl get pods -n appmesh-system -o name | grep appmesh-controller)
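
Before deleting a virtual node definition, it may be worth confirming that no pods still reference it. The following sketch is one way to do that without jq. The helper name pods_for_virtual_node is hypothetical, and it relies on the appmesh.k8s.aws/virtualNode pod annotation described in Cannot determine where a pod for an App Mesh resource is running.

```shell
# Hypothetical helper: read "namespace<TAB>pod<TAB>virtualNode" lines and print
# "namespace/pod" for every pod annotated with the given virtual node name.
pods_for_virtual_node() {
    awk -F '\t' -v vn="$1" '$3 == vn { print $1 "/" $2 }'
}

# Usage sketch (virtual-node-name is a placeholder). An empty result suggests
# that no pods reference the virtual node any more.
#   kubectl get pods --all-namespaces \
#       -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.metadata.annotations.appmesh\.k8s\.aws/virtualNode}{"\n"}{end}' \
#       | pods_for_virtual_node virtual-node-name
```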

If your issue is still not resolved, then consider opening a GitHub issue or contacting Amazon Support.

Cannot determine where a pod for an App Mesh resource is running

Symptoms

When you run App Mesh on a Kubernetes cluster, an operator cannot determine where a workload, or pod, is running for a given App Mesh resource.

Resolution

Kubernetes pod resources are annotated with the mesh and virtual node that they are associated with. You can query which pods are running for a given virtual node name with the following command.

kubectl get pods --all-namespaces -o json | \
    jq '.items[] | { metadata } | select(.metadata.annotations."appmesh.k8s.aws/virtualNode" == "virtual-node-name")'

If your issue is still not resolved, then consider opening a GitHub issue or contacting Amazon Support.

Cannot determine what App Mesh resource a pod is running as

Symptoms

When running App Mesh on a Kubernetes cluster, an operator cannot determine what App Mesh resource a given pod is running as.

Resolution

Kubernetes pod resources are annotated with the mesh and virtual node that they are associated with. You can output the mesh and virtual node names by querying the pod directly using the following command.

kubectl get pod pod-name -n namespace -o json | \
    jq '{ "mesh": .metadata.annotations."appmesh.k8s.aws/mesh", "virtualNode": .metadata.annotations."appmesh.k8s.aws/virtualNode" }'

If your issue is still not resolved, then consider opening a GitHub issue or contacting Amazon Support.

Client Envoys are not able to communicate with App Mesh Envoy Management Service with IMDSv1 disabled

Symptoms

When IMDSv1 is disabled, client Envoys aren't able to communicate with the App Mesh control plane (Envoy Management Service). IMDSv2 support is not available in App Mesh Envoy versions earlier than v1.24.0.0-prod.

Resolution

To resolve this issue, you can do one of these three things.

  • Upgrade to App Mesh Envoy version v1.24.0.0-prod or later, which has IMDSv2 support.

  • Re-enable IMDSv1 on the instance where Envoy is running. For instructions on restoring IMDSv1, see Configure the instance metadata options.

  • If your services are running on Amazon EKS, we recommend using IAM roles for service accounts (IRSA) to fetch credentials. For instructions on enabling IRSA, see IAM roles for service accounts.
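
To decide whether an upgrade is needed, you can compare the Envoy image tag in use against v1.24.0.0-prod. The following is a minimal sketch, not an official check. The helper name supports_imdsv2 is hypothetical. It assumes that tags follow the v<major>.<minor>.<patch>.<build>-prod pattern, that the image is referenced by tag rather than by digest, that the injected container is named envoy, and that a sort with the -V option is available.

```shell
# Hypothetical helper: succeed if an App Mesh Envoy image tag is at or above
# v1.24.0.0-prod, the first version with IMDSv2 support according to this topic.
# Assumption: tags look like "v<major>.<minor>.<patch>.<build>-prod".
supports_imdsv2() {
    tag=${1##*:}            # drop the repository part, if an image URI was given
    version=${tag#v}        # drop the leading "v"
    version=${version%%-*}  # drop the "-prod" suffix
    # Version-sort the candidate with the threshold; the threshold must sort first.
    lowest=$(printf '%s\n' "$version" 1.24.0.0 | sort -V | head -n 1)
    [ "$lowest" = "1.24.0.0" ]
}

# Usage sketch (pod-name and namespace are placeholders; the container name
# "envoy" is an assumption):
#   image=$(kubectl get pod pod-name -n namespace \
#       -o jsonpath='{.spec.containers[?(@.name=="envoy")].image}')
#   if supports_imdsv2 "$image"; then echo "IMDSv2 supported"; else echo "upgrade Envoy"; fi
```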

If your issue is still not resolved, then consider opening a GitHub issue or contacting Amazon Support.

IRSA does not work on application container when App Mesh is enabled and Envoy is injected

Symptoms

When App Mesh is enabled on an Amazon EKS cluster through the App Mesh controller for Amazon EKS, Envoy and proxyinit containers are injected into the application pod. The application cannot assume its IRSA role and assumes the node role instead. Describing the pod shows that the AWS_WEB_IDENTITY_TOKEN_FILE or AWS_ROLE_ARN environment variable is missing from the application container.

Resolution

If either the AWS_WEB_IDENTITY_TOKEN_FILE or the AWS_ROLE_ARN environment variable is already defined, the webhook skips the pod. Do not set either variable yourself; the webhook injects them for you. The following excerpt shows the check for these reserved keys.

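// Environment variables that are treated as reserved.
reservedKeys := map[string]string{
    "AWS_ROLE_ARN":                "",
    "AWS_WEB_IDENTITY_TOKEN_FILE": "",
}
...
// Flag the container if it already defines any reserved key.
for _, env := range container.Env {
    if _, ok := reservedKeys[env.Name]; ok {
        reservedKeysDefined = true
    }
}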
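
To check whether a workload manifest sets one of these reserved keys itself, you can list the environment variable names in its pod template. The following is a minimal sketch that assumes the workload is a Deployment. The helper name reserved_keys_defined is hypothetical, and deployment-name and namespace are placeholders.

```shell
# Hypothetical helper: read environment variable names (whitespace separated)
# and print any that exactly match the reserved keys from the excerpt above.
# The exit status is 0 when at least one reserved key is found.
reserved_keys_defined() {
    tr -s ' \t' '\n' | grep -xE 'AWS_ROLE_ARN|AWS_WEB_IDENTITY_TOKEN_FILE'
}

# Usage sketch (deployment-name and namespace are placeholders):
#   kubectl get deployment deployment-name -n namespace \
#       -o jsonpath='{.spec.template.spec.containers[*].env[*].name}' \
#       | reserved_keys_defined && echo "remove these variables from the manifest"
```

The sketch inspects the pod template rather than a running pod, because a running pod that was mutated successfully also carries these variables.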

If your issue is still not resolved, then consider opening a GitHub issue or contacting Amazon Support.