Throughput is Too Slow - Managed Service for Apache Flink

Amazon Managed Service for Apache Flink was previously known as Amazon Kinesis Data Analytics for Apache Flink.

Throughput is Too Slow

If your application is not processing incoming streaming data quickly enough, it will perform poorly and become unstable. This section describes symptoms and troubleshooting steps for this condition.

Symptoms

This condition can have the following symptoms:

  • If the data source for your application is a Kinesis data stream, the millisBehindLatest metric for the stream continually increases.

  • If the data source for your application is an Amazon MSK cluster, the cluster's consumer lag metrics continually increase. For more information, see Consumer-Lag Monitoring in the Amazon MSK Developer Guide.

  • If the data source for your application is a different service or source, check any consumer lag metrics or other lag data that the source exposes.
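If your source only exposes raw offsets, you can derive consumer lag yourself and check whether it keeps growing. The following is a minimal sketch. It assumes you already collect per-partition end offsets and committed offsets (for example, with a Kafka admin client). The functions `offset_lag` and `lag_keeps_growing` are hypothetical helpers, not part of any AWS or Kafka API.

```python
from typing import Dict, List, Mapping, Tuple

# Hypothetical key type: (topic name, partition number).
Partition = Tuple[str, int]


def offset_lag(end_offsets: Mapping[Partition, int],
               committed_offsets: Mapping[Partition, int]) -> Dict[Partition, int]:
    """Per-partition lag: records written to the partition but not yet consumed.

    Assumption for illustration: a partition with no committed offset is
    treated as if its committed offset were 0.
    """
    return {
        tp: max(end - committed_offsets.get(tp, 0), 0)
        for tp, end in end_offsets.items()
    }


def lag_keeps_growing(totals: List[float]) -> bool:
    """True if every snapshot's lag is strictly greater than the previous one.

    Works for summed offset lag or for time-based values such as millisBehindLatest.
    """
    return len(totals) >= 2 and all(b > a for a, b in zip(totals, totals[1:]))
```

For example, you could take a snapshot every minute and append `sum(offset_lag(end, committed).values())` to a list. If `lag_keeps_growing` stays true over a long window, you are likely seeing the symptom described above.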

Causes and Solutions

There can be many causes for slow application throughput. If your application is not keeping up with input, check the following:

  • If throughput lag is spiking and then tapering off, check if the application is restarting. Your application will stop processing input while it restarts, causing lag to spike. For information about application failures, see Application is Restarting.

  • If throughput lag is consistent, check whether your application is optimized for performance. For information on optimizing your application's performance, see Troubleshooting Performance.

  • If throughput lag is not spiking but continuously increasing, and your application is optimized for performance, you must increase your application resources. For information on increasing application resources, see Scaling.

  • If your application reads from a Kafka cluster in a different Region and FlinkKafkaConsumer or KafkaSource is mostly idle (high idleTimeMsPerSecond or low CPUUtilization) despite high consumer lag, you can increase the value of receive.buffer.bytes, for example to 2097152 (2 MiB). For more information, see the high latency environment section in Custom MSK configurations.
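The checks above amount to a small decision procedure, sketched below. This is a minimal, non-authoritative sketch. The pattern names, the thresholds (`STEADY_TOLERANCE`, `IDLE_MS_PER_SECOND`, `LOW_CPU_PERCENT`), and the functions `classify_lag` and `recommend` are all hypothetical. AWS does not define these values, so tune them against your own metrics.

```python
from enum import Enum
from typing import Optional, Sequence

# Illustrative thresholds (assumptions, not AWS-defined values).
STEADY_TOLERANCE = 0.1          # spread allowed around the mean for "consistent" lag
IDLE_MS_PER_SECOND = 800.0      # idleTimeMsPerSecond above this counts as "mostly idle"
LOW_CPU_PERCENT = 20.0          # CPUUtilization below this counts as "mostly idle"
RECEIVE_BUFFER_BYTES = 2097152  # example value from the text (2 MiB)


class LagPattern(Enum):
    NO_LAG = "no_lag"
    SPIKE_THEN_TAPER = "spike_then_taper"
    STEADY = "steady"
    CONTINUOUSLY_INCREASING = "continuously_increasing"
    UNKNOWN = "unknown"


def classify_lag(samples: Sequence[float]) -> LagPattern:
    """Classify chronological lag samples, for example millisBehindLatest values."""
    values = list(samples)
    if len(values) < 3:
        return LagPattern.UNKNOWN
    peak = max(values)
    if peak == 0:
        return LagPattern.NO_LAG
    # Never decreases and ends higher than it started.
    if all(b >= a for a, b in zip(values, values[1:])) and values[-1] > values[0]:
        return LagPattern.CONTINUOUSLY_INCREASING
    # Peak strictly inside the window and clearly above both ends.
    peak_index = values.index(peak)
    ends = max(values[0], values[-1])
    if 0 < peak_index < len(values) - 1 and peak > ends * (1 + STEADY_TOLERANCE):
        return LagPattern.SPIKE_THEN_TAPER
    # Lag stays within a narrow band around its mean.
    mean = sum(values) / len(values)
    if peak - min(values) <= STEADY_TOLERANCE * mean:
        return LagPattern.STEADY
    return LagPattern.UNKNOWN


def recommend(samples: Sequence[float], optimized: bool,
              cross_region_kafka: bool = False,
              idle_ms_per_second: Optional[float] = None,
              cpu_percent: Optional[float] = None) -> str:
    """Map a lag pattern plus a few inputs to one of the checks listed above."""
    pattern = classify_lag(samples)
    source_idle = (
        (idle_ms_per_second is not None and idle_ms_per_second > IDLE_MS_PER_SECOND)
        or (cpu_percent is not None and cpu_percent < LOW_CPU_PERCENT)
    )
    lagging = pattern in (LagPattern.STEADY, LagPattern.CONTINUOUSLY_INCREASING)
    # Checked first: an idle source despite lag suggests trying the buffer
    # setting before adding resources.
    if cross_region_kafka and source_idle and lagging:
        return f"increase receive.buffer.bytes, e.g. to {RECEIVE_BUFFER_BYTES}"
    if pattern is LagPattern.SPIKE_THEN_TAPER:
        return "check whether the application is restarting"
    if pattern is LagPattern.STEADY:
        return "optimize application performance"
    if pattern is LagPattern.CONTINUOUSLY_INCREASING:
        return "scale application resources" if optimized else "optimize application performance"
    if pattern is LagPattern.NO_LAG:
        return "no action needed"
    return "collect more lag samples"
```

The cross-Region check runs first because it is the most specific condition. A lagging but idle source matches both it and the "continuously increasing" case, and the list above suggests the buffer change for that situation.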

For troubleshooting steps for slow throughput or increasing consumer lag in the application source, see Troubleshooting Performance.