Flink 1.15 Async Sink Deadlock - Managed Service for Apache Flink

Amazon Managed Service for Apache Flink was previously known as Amazon Kinesis Data Analytics for Apache Flink.

Flink 1.15 Async Sink Deadlock

There is a known issue with Amazon connectors for Apache Flink that implement the AsyncSink interface. It affects applications using Flink 1.15 with the following connectors:

  • For Java applications:

    • KinesisStreamsSink – org.apache.flink:flink-connector-kinesis

    • KinesisStreamsSink – org.apache.flink:flink-connector-aws-kinesis-streams

    • KinesisFirehoseSink – org.apache.flink:flink-connector-aws-kinesis-firehose

    • DynamoDbSink – org.apache.flink:flink-connector-dynamodb

  • For Flink SQL, Table API, and Python applications:

    • kinesis – org.apache.flink:flink-sql-connector-kinesis

    • kinesis – org.apache.flink:flink-sql-connector-aws-kinesis-streams

    • firehose – org.apache.flink:flink-sql-connector-aws-kinesis-firehose

    • dynamodb – org.apache.flink:flink-sql-connector-dynamodb

Affected applications will experience the following symptoms:

  • The Flink job is in the RUNNING state but is not processing data.

  • There are no job restarts.

  • Checkpoints are timing out.

The issue is caused by a bug in the Amazon SDK: when the asynchronous HTTP client is used, certain errors are not surfaced to the caller. As a result, the sink waits indefinitely for an "in-flight request" to complete during a checkpoint flush operation.
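To make the failure mode concrete, here is a minimal, self-contained Java sketch. It assumes a greatly simplified sink model (the hypothetical class `InFlightFlushDemo`, not Flink's actual `AsyncSinkWriter`) in which each request increments an in-flight counter that is decremented only when the response future completes, and a flush waits for that counter to reach zero. The bounded timeout exists only so the demo terminates; per the description above, the affected sink waits without such a bound, so the checkpoint times out instead.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical, simplified model of the failure mode: a sink that tracks
 * in-flight requests and, on flush, waits for all of them to complete.
 * This is NOT the Flink AsyncSinkWriter implementation.
 */
final class InFlightFlushDemo {

    private final AtomicInteger inFlight = new AtomicInteger();

    /** Sends a request; the future stands in for the async HTTP client's response. */
    void send(CompletableFuture<Void> response) {
        inFlight.incrementAndGet();
        // The counter is decremented only when the future completes, successfully
        // or exceptionally. If the client never completes it, the count stays above zero.
        response.whenComplete((ok, err) -> inFlight.decrementAndGet());
    }

    /** Models a checkpoint flush: wait until nothing is in flight, giving up after timeoutMs. */
    boolean flush(long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (inFlight.get() > 0) {
            if (System.nanoTime() >= deadline) {
                return false; // demo-only bound; the affected sink keeps waiting
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Case 1: the client surfaces the error by completing the future exceptionally.
        InFlightFlushDemo healthy = new InFlightFlushDemo();
        CompletableFuture<Void> failed = new CompletableFuture<>();
        healthy.send(failed);
        failed.completeExceptionally(new RuntimeException("simulated service error"));
        System.out.println("error surfaced, flush completed: " + healthy.flush(200));

        // Case 2: the error is swallowed, so the future never completes.
        InFlightFlushDemo stuck = new InFlightFlushDemo();
        stuck.send(new CompletableFuture<>());
        System.out.println("error swallowed, flush completed: " + stuck.flush(200));
    }
}
```

Running `main` prints `true` for the first case and `false` for the second. When the error is surfaced, the future completes exceptionally and the flush can finish; when it is swallowed, the request stays "in flight" forever, which matches the symptoms listed above.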

This issue has been fixed in Amazon SDK version 2.20.144 and later.
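Before and after updating, it can help to confirm which Amazon SDK version your application actually resolves, for example by inspecting the output of `mvn dependency:tree` for `software.amazon.awssdk` artifacts. The following sketch is a hypothetical helper (`SdkVersionCheck.isFixed` is an illustrative name, not part of any SDK or connector) that compares a plain numeric version string against 2.20.144, the first version stated above to contain the fix.

```java
/**
 * Hypothetical helper: checks whether an Amazon SDK version string
 * (for example "2.20.144") is at or above the first version containing the fix.
 */
final class SdkVersionCheck {

    // First SDK version stated to contain the fix: 2.20.144.
    private static final int[] FIXED = {2, 20, 144};

    static boolean isFixed(String version) {
        // Ignore any qualifier suffix such as "-SNAPSHOT", then split into numeric parts.
        String[] parts = version.split("-", 2)[0].split("\\.");
        for (int i = 0; i < FIXED.length; i++) {
            // Missing trailing parts are treated as 0 (for example "2.21" -> 2.21.0).
            int part = i < parts.length ? Integer.parseInt(parts[i]) : 0;
            if (part != FIXED[i]) {
                return part > FIXED[i];
            }
        }
        return true; // exactly 2.20.144
    }

    public static void main(String[] args) {
        for (String v : new String[] {"2.20.143", "2.20.144", "2.21.0", "2.17.247"}) {
            System.out.println(v + " -> " + (isFixed(v) ? "contains fix" : "affected"));
        }
    }
}
```

Comparing the numeric parts one at a time avoids the pitfall of comparing version strings lexicographically, where "2.20.99" would incorrectly sort after "2.20.144".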

The following instructions describe how to update the affected connectors in your applications to use the fixed version of the Amazon SDK: