Monitoring Amazon MSK Connect
Monitoring is an important part of maintaining the reliability, availability, and performance of MSK Connect and your other Amazon solutions. Amazon CloudWatch monitors your Amazon resources and the applications that you run on Amazon in real time. You can collect and track metrics, create customized dashboards, and set alarms that notify you or take actions when a metric reaches a threshold that you specify. For example, you can have CloudWatch track CPU usage or other metrics of your connector, so that you can increase its capacity if needed. For more information, see the Amazon CloudWatch User Guide.
You can use the following API operations to monitor connector updates:
- DescribeConnectorOperation: Monitor the status of connector update operations.
- ListConnectorOperations: Track previous updates run on your connector.
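
The two operations above can be combined to list a connector's past operations and look up the state of each. The following is a minimal sketch, assuming a boto3 `kafkaconnect` client is passed in and that the response field names (`connectorOperations`, `connectorOperationArn`, `connectorOperationType`, `connectorOperationState`, `nextToken`) match the API reference; verify them against your SDK version. The function name `summarize_connector_operations` is hypothetical.

```python
from typing import Any, Dict, List


def summarize_connector_operations(client: Any, connector_arn: str) -> List[Dict[str, Any]]:
    """Return ARN, type, and state for each recorded operation of one connector.

    `client` is expected to behave like boto3.client("kafkaconnect").
    Field names below are assumptions based on the MSK Connect API reference.
    """
    summaries: List[Dict[str, Any]] = []
    next_token = None
    while True:
        kwargs: Dict[str, Any] = {"connectorArn": connector_arn}
        if next_token:
            kwargs["nextToken"] = next_token
        # ListConnectorOperations: one page of previous operations.
        page = client.list_connector_operations(**kwargs)
        for op in page.get("connectorOperations", []):
            # DescribeConnectorOperation: details for a single operation.
            detail = client.describe_connector_operation(
                connectorOperationArn=op["connectorOperationArn"]
            )
            summaries.append(
                {
                    "arn": op["connectorOperationArn"],
                    "type": detail.get("connectorOperationType"),
                    "state": detail.get("connectorOperationState"),
                }
            )
        next_token = page.get("nextToken")
        if not next_token:
            break
    return summaries


# Possible usage (requires boto3 and AWS credentials; the ARN is a placeholder):
#   import boto3
#   ops = summarize_connector_operations(boto3.client("kafkaconnect"), "<connector-arn>")
```

Passing the client in as a parameter keeps the function easy to exercise with a stub before pointing it at a real account.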
The following table shows the metrics that MSK Connect sends to CloudWatch under the ConnectorName dimension. MSK Connect delivers these metrics by default and at no additional cost. CloudWatch keeps these metrics for 15 months, so that you can access historical information and gain a better perspective on how your connectors are performing. You can also set alarms that watch for certain thresholds, and send notifications or take actions when those thresholds are met. For more information, see the Amazon CloudWatch User Guide.
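
As an illustration of reading one of these metrics back, the sketch below builds the arguments for the CloudWatch `GetMetricStatistics` call, filtered by the ConnectorName dimension. The namespace `AWS/KafkaConnect` is an assumption not stated on this page, and the helper name `connector_metric_query` is hypothetical; confirm the namespace in the CloudWatch console before relying on it.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional

# Assumption: MSK Connect publishes its metrics under this namespace.
NAMESPACE = "AWS/KafkaConnect"


def connector_metric_query(
    connector_name: str,
    metric_name: str,
    hours: int = 1,
    period_seconds: int = 300,
    now: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": NAMESPACE,
        "MetricName": metric_name,
        # Metrics are reported under the ConnectorName dimension.
        "Dimensions": [{"Name": "ConnectorName", "Value": connector_name}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period_seconds,
        "Statistics": ["Average", "Maximum"],
    }


# Possible usage (requires boto3 and AWS credentials):
#   import boto3
#   query = connector_metric_query("my-connector", "CpuUtilization")
#   datapoints = boto3.client("cloudwatch").get_metric_statistics(**query)["Datapoints"]
```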
Metric name | Description |
---|---|
CpuUtilization | The percentage of CPU consumption by system and user. |
ErroredTaskCount | The number of tasks that have errored out. |
MemoryUtilization | The percentage of the total memory on a worker instance, not just the Java virtual machine (JVM) heap memory currently in use. The JVM doesn't typically release memory back to the operating system. So, JVM heap size (MemoryUtilization) usually starts at a minimum heap size and incrementally increases to a stable maximum of about 80-90%. JVM heap usage might increase or decrease as the connector's actual memory usage changes. |
RebalanceCompletedTotal | The total number of rebalances completed by this connector. |
RebalanceTimeAvg | The average time in milliseconds spent by the connector on rebalancing. |
RebalanceTimeMax | The maximum time in milliseconds spent by the connector on rebalancing. |
RebalanceTimeSinceLast | The time in milliseconds since this connector completed the most recent rebalance. |
RunningTaskCount | The number of running tasks in the connector. |
SinkConsumerByteRate | The average number of bytes consumed per second by the Kafka Connect framework's Sink consumer before any transformations are applied to data. |
SinkRecordReadRate | The average per-second number of records read from the Apache Kafka or Amazon MSK cluster. |
SinkRecordSendRate | The average per-second number of records that are output from the transformations and sent to the destination. This number doesn't include filtered records. |
SourceRecordPollRate | The average per-second number of records produced or polled. |
SourceProducerByteRate | The average number of bytes produced per second by the Kafka Connect framework's Source producer after any transformations are applied to data. |
SourceRecordWriteRate | The average per-second number of records output from the transformations and written to the Apache Kafka or Amazon MSK cluster. |
TaskStartupAttemptsTotal | The total number of task startups that the connector has attempted. You can use this metric to identify anomalies in task startup attempts. |
TaskStartupSuccessPercentage | The average percentage of successful task starts for the connector. You can use this metric to identify anomalies in task startup attempts. |
WorkerCount | The number of workers that are running in the connector. |
BytesInPerSec | Metadata bytes transferred to the Kafka Connect framework for communication between workers. |
BytesOutPerSec | Metadata bytes transferred from the Kafka Connect framework for communication between workers. |
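
As one way to act on these metrics, the sketch below builds the arguments for a CloudWatch `PutMetricAlarm` call that fires when ErroredTaskCount rises above zero. The namespace `AWS/KafkaConnect`, the alarm name pattern, the 5-minute period, and the helper name `errored_task_alarm` are all assumptions chosen for illustration, not values prescribed by MSK Connect.

```python
from typing import Any, Dict, Optional


def errored_task_alarm(
    connector_name: str, sns_topic_arn: Optional[str] = None
) -> Dict[str, Any]:
    """Build keyword arguments for CloudWatch put_metric_alarm."""
    params: Dict[str, Any] = {
        "AlarmName": f"{connector_name}-errored-tasks",  # hypothetical naming scheme
        "Namespace": "AWS/KafkaConnect",  # assumption: namespace used by MSK Connect
        "MetricName": "ErroredTaskCount",
        "Dimensions": [{"Name": "ConnectorName", "Value": connector_name}],
        "Statistic": "Maximum",
        "Period": 300,  # evaluate in 5-minute windows
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        # Treat gaps in the data as healthy rather than alarming.
        "TreatMissingData": "notBreaching",
    }
    if sns_topic_arn:
        # Notify an SNS topic when the alarm enters the ALARM state.
        params["AlarmActions"] = [sns_topic_arn]
    return params


# Possible usage (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**errored_task_alarm("my-connector"))
```

A similar alarm on MemoryUtilization would likely need a higher threshold than intuition suggests, because that metric tends to settle at about 80-90% in normal operation.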