Prometheus metrics
Important
Amazon Managed Service for Prometheus isn't available in this Amazon Web Services Region.
Prometheus
Amazon Managed Service for Prometheus is a Prometheus-compatible monitoring and alerting service that makes it easy to monitor containerized applications and infrastructure at scale. It is a fully-managed service that automatically scales the ingestion, storage, querying, and alerting of your metrics. It also integrates with Amazon security services to enable fast and secure access to your data. You can use the open-source PromQL query language to query your metrics and alert on them.
For more information about how to use the Prometheus metrics after you turn them on, see the Amazon Managed Service for Prometheus User Guide.
Turn on Prometheus metrics when creating a cluster
Important
Amazon Managed Service for Prometheus resources are outside of the cluster lifecycle and need to be maintained independent of the cluster. When you delete your cluster, make sure to also delete any applicable scrapers to stop applicable costs. For more information, see Find and delete scrapers in the Amazon Managed Service for Prometheus User Guide.
When you create a new cluster, you can turn on the option to send metrics to Prometheus. In the Amazon Web Services Management Console, this option is in the Configure observability step of creating a new cluster. For more information, see Creating an Amazon EKS cluster.
Prometheus discovers and collects metrics from your cluster through a pull-based model called scraping. Scrapers are set up to gather data from your cluster infrastructure and containerized applications.
When you turn on the option to send Prometheus metrics, Amazon Managed Service for Prometheus provides a fully managed agentless scraper. Use the following Advanced configuration options to customize the default scraper as needed.
- Scraper alias
-
(Optional) Enter a unique alias for the scraper.
- Destination
-
Choose an Amazon Managed Service for Prometheus workspace. A workspace is a logical space dedicated to the storage and querying of Prometheus metrics. With this workspace, you will be able to view Prometheus metrics across the accounts that have access to it. The Create new workspace option tells Amazon EKS to create a workspace on your behalf using the Workspace alias you provide. With the Select existing workspace option, you can select an existing workspace from a dropdown list. For more information about workspaces, see Managing workspaces in the Amazon Managed Service for Prometheus User Guide.
- Service access
-
This section summarizes the permissions you grant when sending Prometheus metrics:
-
Allow Amazon Managed Service for Prometheus to describe the scraped Amazon EKS cluster
-
Allow remote writing to the Amazon Managed Prometheus workspace
If the
AmazonManagedScraperRole
already exists, the scraper uses it. Choose theAmazonManagedScraperRole
link to see the Permission details. If theAmazonManagedScraperRole
doesn’t exist already, choose the View permission details link to see the specific permissions you are granting by sending Prometheus metrics. -
- Subnets
-
View the subnets that the scraper will inherit. If you need to change them, go back to the create cluster Specify networking step.
- Security groups
-
View the security groups that the scraper will inherit. If you need to change them, go back to the create cluster Specify networking step.
- Scraper configuration
-
Modify the scraper configuration in YAML format as needed. To do so, use the form or upload a replacement YAML file. For more information, see Scraper configuration in the Amazon Managed Service for Prometheus User Guide.
Amazon Managed Service for Prometheus refers to the agentless scraper that is created alongside the cluster as an Amazon managed collector. For more information about Amazon managed collectors, see Amazon managed collectors in the Amazon Managed Service for Prometheus User Guide.
Important
You must set up your aws-auth
ConfigMap
to give the scraper in-cluster permissions. For more
information, see Configuring your Amazon EKS cluster in the Amazon Managed Service for Prometheus User
Guide.
Viewing Prometheus scraper details
After creating a cluster with the Prometheus metrics option turned on, you can view your Prometheus scraper details. When viewing your cluster in the Amazon Web Services Management Console, choose the Observability tab. A table shows a list of scrapers for the cluster, including information such as the scraper ID, alias, status, and creation date.
To see more details about the scraper, choose a scraper ID link. For example, you can
view the scraper configuration, Amazon Resource Name (ARN), remote write URL, and networking information.
You can use the scraper ID as input to Amazon Managed Service for Prometheus API operations like
DescribeScraper
and DeleteScraper
. You can also use the
API to create more scrapers.
For more information on using the Prometheus API, see the Amazon Managed Service for Prometheus API Reference.
Deploying Prometheus using Helm
Alternatively, you can deploy Prometheus into your cluster with
Helm V3. If you already have Helm installed, you can
check your version with the helm version
command. Helm is a
package manager for Kubernetes clusters. For more information about Helm and
how to install it, see Using Helm with Amazon EKS.
After you configure Helm for your Amazon EKS cluster, you can use it to deploy Prometheus with the following steps.
To deploy Prometheus using Helm
-
Create a Prometheus namespace.
kubectl create namespace prometheus
-
Add the
prometheus-community
chart repository.helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
-
Deploy Prometheus.
helm upgrade -i prometheus prometheus-community/prometheus \ --namespace prometheus \ --set alertmanager.persistentVolume.storageClass="gp2",server.persistentVolume.storageClass="gp2"
Note
If you get the error
Error: failed to download "stable/prometheus" (hint: running `helm repo update` may help)
when executing this command, runhelm repo update prometheus-community
, and then try running the Step 2 command again.If you get the error
Error: rendered manifests contain a resource that already exists
, runhelm uninstall your-release-name -n namespace
, then try running the Step 3 command again. -
Verify that all of the Pods in the
prometheus
namespace are in theREADY
state.kubectl get pods -n prometheus
An example output is as follows.
NAME READY STATUS RESTARTS AGE prometheus-alertmanager-59b4c8c744-r7bgp 1/2 Running 0 48s prometheus-kube-state-metrics-7cfd87cf99-jkz2f 1/1 Running 0 48s prometheus-node-exporter-jcjqz 1/1 Running 0 48s prometheus-node-exporter-jxv2h 1/1 Running 0 48s prometheus-node-exporter-vbdks 1/1 Running 0 48s prometheus-pushgateway-76c444b68c-82tnw 1/1 Running 0 48s prometheus-server-775957f748-mmht9 1/2 Running 0 48s
-
Use
kubectl
to port forward the Prometheus console to your local machine.kubectl --namespace=prometheus port-forward deploy/prometheus-server 9090
-
Point a web browser to
http://localhost:9090
to view the Prometheus console. -
Choose a metric from the - insert metric at cursor menu, then choose Execute. Choose the Graph tab to show the metric over time. The following image shows
container_memory_usage_bytes
over time. -
From the top navigation bar, choose Status, then Targets.
All of the Kubernetes endpoints that are connected to Prometheus using service discovery are displayed.
Viewing the control plane raw metrics
As an alternative to deploying Prometheus, the Kubernetes API server exposes
a number of metrics that are represented in a Prometheus format/metrics
HTTP API. Like other endpoints, this endpoint is
exposed on the Amazon EKS control plane. This endpoint is primarily useful for looking at a
specific metric. To analyze metrics over time, we recommend deploying
Prometheus.
To view the raw metrics output, use kubectl
with the --raw
flag. This
command allows you to pass any HTTP path and returns the raw response.
kubectl get --raw /metrics
An example output is as follows.
[...] # HELP rest_client_requests_total Number of HTTP requests, partitioned by status code, method, and host. # TYPE rest_client_requests_total counter rest_client_requests_total{code="200",host="127.0.0.1:21362",method="POST"} 4994 rest_client_requests_total{code="200",host="127.0.0.1:443",method="DELETE"} 1 rest_client_requests_total{code="200",host="127.0.0.1:443",method="GET"} 1.326086e+06 rest_client_requests_total{code="200",host="127.0.0.1:443",method="PUT"} 862173 rest_client_requests_total{code="404",host="127.0.0.1:443",method="GET"} 2 rest_client_requests_total{code="409",host="127.0.0.1:443",method="POST"} 3 rest_client_requests_total{code="409",host="127.0.0.1:443",method="PUT"} 8 # HELP ssh_tunnel_open_count Counter of ssh tunnel total open attempts # TYPE ssh_tunnel_open_count counter ssh_tunnel_open_count 0 # HELP ssh_tunnel_open_fail_count Counter of ssh tunnel failed open attempts # TYPE ssh_tunnel_open_fail_count counter ssh_tunnel_open_fail_count 0
This raw output returns verbatim what the API server exposes. The different metrics are listed by line, with each line including a metric name, tags, and a value.
metric_name
{"tag
"="value
"[,...
]}value