Data Freshness Metric Increasing or Not Emitted - Amazon Kinesis Data Firehose
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Data Freshness Metric Increasing or Not Emitted

Data freshness is a measure of how current your data is within your delivery stream. It is the age of the oldest data record in the delivery stream, measured from the time that Kinesis Data Firehose ingested the data to the present time. Kinesis Data Firehose provides metrics that you can use to monitor data freshness. To identify the data-freshness metric for a given destination, see Monitoring Kinesis Data Firehose Using CloudWatch Metrics.

If you enable backup for all events or all documents, monitor two separate data-freshness metrics: one for the main destination and one for the backup.

If the data-freshness metric isn't being emitted, this means that there is no active delivery for the delivery stream. This happens when data delivery is completely blocked or when there's no incoming data.

If the data-freshness metric is constantly increasing, this means that data delivery is falling behind. This can happen for one of the following reasons.

  • The destination can't handle the rate of delivery. If Kinesis Data Firehose encounters transient errors due to high traffic, then the delivery might fall behind. This can happen for destinations other than Amazon S3 (it can happen for OpenSearch Service, Amazon Redshift, or Splunk). Ensure that your destination has enough capacity to handle the incoming traffic.

  • The destination is slow. Data delivery might fall behind if Kinesis Data Firehose encounters high latency. Monitor the destination's latency metric.

  • The Lambda function is slow. This might lead to a data delivery rate that is less than the data ingestion rate for the delivery stream. If possible, improve the efficiency of the Lambda function. For instance, if the function does network IO, use multiple threads or asynchronous IO to increase parallelism. Also, consider increasing the memory size of the Lambda function so that the CPU allocation can increase accordingly. This might lead to faster Lambda invocations. For information about configuring Lambda functions, see Configuring Amazon Lambda Functions.

  • There are failures during data delivery. For information about how to monitor errors using Amazon CloudWatch Logs, see Monitoring Kinesis Data Firehose Using CloudWatch Logs.

  • If the data source of the delivery stream is a Kinesis data stream, throttling might be happening. Check the ThrottledGetRecords, ThrottledGetShardIterator, and ThrottledDescribeStream metrics. If there are multiple consumers attached to the Kinesis data stream, consider the following:

    • If the ThrottledGetRecords and ThrottledGetShardIterator metrics are high, we recommend you increase the number of shards provisioned for the data stream.

    • If the ThrottledDescribeStream is high, we recommend you add the kinesis:listshards permission to the role configured in KinesisStreamSourceConfiguration.

  • Low buffering hints for the destination. This might increase the number of round trips that Kinesis Data Firehose needs to make to the destination, which might cause delivery to fall behind. Consider increasing the value of the buffering hints. For more information, see BufferingHints.

  • A high retry duration might cause delivery to fall behind when the errors are frequent. Consider reducing the retry duration. Also, monitor the errors and try to reduce them. For information about how to monitor errors using Amazon CloudWatch Logs, see Monitoring Kinesis Data Firehose Using CloudWatch Logs.

  • If the destination is Splunk and DeliveryToSplunk.DataFreshness is high but DeliveryToSplunk.Success looks good, the Splunk cluster might be busy. Free the Splunk cluster if possible. Alternatively, contact Amazon Support and request an increase in the number of channels that Kinesis Data Firehose is using to communicate with the Splunk cluster.