
Amazon Data Firehose was previously known as Amazon Kinesis Data Firehose.

Writing to Amazon Data Firehose Using CloudWatch Logs

CloudWatch Logs events can be sent to Firehose using CloudWatch subscription filters. For more information, see Subscription filters with Amazon Data Firehose.
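
If you prefer to create the subscription filter programmatically, the following is a minimal sketch using the Amazon SDK for Python (Boto3). The log group, filter, delivery stream, and role names are hypothetical placeholders:

import boto3

logs = boto3.client("logs")

# Send all events from a log group to a Firehose stream. An empty
# filterPattern matches every log event in the group.
logs.put_subscription_filter(
    logGroupName="/example/app-logs",    # hypothetical log group
    filterName="firehose-subscription",  # hypothetical filter name
    filterPattern="",
    destinationArn="arn:aws-cn:firehose:cn-north-1:111111111111:deliverystream/example-cwlogs-stream",
    roleArn="arn:aws-cn:iam::111111111111:role/CWLtoFirehoseRole",  # role that lets CloudWatch Logs write to Firehose
)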

CloudWatch Logs events are sent to Firehose in gzip-compressed format. If you want to deliver decompressed log events to Firehose destinations, you can use the decompression feature in Firehose to decompress CloudWatch Logs automatically.

Important

Amazon Data Firehose currently does not support delivery of CloudWatch Logs to an Amazon OpenSearch Service destination, because Amazon CloudWatch combines multiple log events into one Firehose record and Amazon OpenSearch Service cannot accept multiple log events in one record. As an alternative, you can consider using the subscription filter for Amazon OpenSearch Service in CloudWatch Logs.

Decompression of CloudWatch Logs

If you are using Firehose to deliver CloudWatch Logs and want to deliver decompressed data to your delivery stream destination, or if you want to use Firehose features that require decompressed input, such as Data Format Conversion (Parquet, ORC) or dynamic partitioning, you must enable decompression for your Firehose delivery stream.

You can enable decompression using the Amazon Web Services Management Console, the Amazon Command Line Interface, or the Amazon SDKs.
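
As an illustration, here is a minimal sketch of enabling decompression (and optional message extraction) when creating a stream with the Amazon SDK for Python (Boto3). The stream, role, and bucket names are hypothetical, and you should confirm the processor type and parameter names against the current Firehose API reference:

import boto3

firehose = boto3.client("firehose")

firehose.create_delivery_stream(
    DeliveryStreamName="example-cwlogs-stream",  # hypothetical name
    DeliveryStreamType="DirectPut",
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws-cn:iam::111111111111:role/FirehoseS3Role",  # hypothetical role
        "BucketARN": "arn:aws-cn:s3:::example-destination-bucket",      # hypothetical bucket
        "ProcessingConfiguration": {
            "Enabled": True,
            "Processors": [
                # Decompress the gzip-compressed CloudWatch Logs records.
                {
                    "Type": "Decompression",
                    "Parameters": [
                        {"ParameterName": "CompressionFormat", "ParameterValue": "GZIP"}
                    ],
                },
                # Optional: deliver only the contents of the message field.
                {
                    "Type": "CloudWatchLogProcessing",
                    "Parameters": [
                        {"ParameterName": "DataMessageExtraction", "ParameterValue": "true"}
                    ],
                },
            ],
        },
    },
)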

Note

If you enable the decompression feature on a stream, use that stream exclusively for CloudWatch Logs subscription filters, and not for Vended Logs. If you enable the decompression feature on a stream that is used to ingest both CloudWatch Logs and Vended Logs, the Vended Logs ingestion to Firehose fails. This decompression feature is only for CloudWatch Logs.

Message extraction after decompression of CloudWatch Logs

When you enable decompression, you have the option to also enable message extraction. When message extraction is on, Firehose filters out all metadata, such as owner, loggroup, logstream, and others, from the decompressed CloudWatch Logs records and delivers only the content of the message fields. If you are delivering data to a Splunk destination, you must turn on message extraction for Splunk to parse the data. Following are sample outputs after decompression with and without message extraction.

Fig 1: Sample output after decompression without message extraction:

{ "owner": "111111111111", "logGroup": "CloudTrail/logs", "logStream": "111111111111_CloudTrail/logs_us-east-1", "subscriptionFilters": [ "Destination" ], "messageType": "DATA_MESSAGE", "logEvents": [ { "id": "31953106606966983378809025079804211143289615424298221568", "timestamp": 1432826855000, "message": "{\"eventVersion\":\"1.03\",\"userIdentity\":{\"type\":\"Root1\"}" }, { "id": "31953106606966983378809025079804211143289615424298221569", "timestamp": 1432826855000, "message": "{\"eventVersion\":\"1.03\",\"userIdentity\":{\"type\":\"Root2\"}" }, { "id": "31953106606966983378809025079804211143289615424298221570", "timestamp": 1432826855000, "message": "{\"eventVersion\":\"1.03\",\"userIdentity\":{\"type\":\"Root3\"}" } ] }

Fig 2: Sample output after decompression with message extraction:

{"eventVersion":"1.03","userIdentity":{"type":"Root1"} {"eventVersion":"1.03","userIdentity":{"type":"Root2"} {"eventVersion":"1.03","userIdentity":{"type":"Root3"}

Enabling and disabling decompression

You can enable and disable decompression using the Amazon Web Services Management Console, Amazon Command Line Interface or Amazon SDKs.

Enabling decompression on a new delivery stream using the Amazon Web Services Management Console

To enable decompression on a new delivery stream using the Amazon Web Services Management Console
  1. Sign in to the Amazon Web Services Management Console and open the Kinesis console at https://console.amazonaws.cn/kinesis.

  2. Choose Data Firehose in the navigation pane.

  3. Choose Create delivery stream.

  4. Under Choose source and destination, configure the following:

    Delivery stream source

    The source of your Firehose stream. Choose one of the following sources:

    • Direct PUT – Choose this option to create a Firehose stream that producer applications write to directly. Several Amazon services, agents, and open source services are integrated with Direct PUT in Firehose.

    • Kinesis stream – Choose this option to configure a Firehose stream that uses a Kinesis data stream as its data source. You can then use Firehose to read data easily from an existing Kinesis data stream and load it into destinations. For more information, see Writing to Firehose Using Kinesis Data Streams.

    Destination

    The destination of your Firehose stream. Choose one of the following:

    • Amazon S3

    • Splunk

  5. Under Delivery stream name, enter a name for your stream.

  6. Under Transform records - optional:

    • In the Decompress source records from Amazon CloudWatch Logs section, choose Turn on decompression.

    • If you want to use message extraction after decompression, choose Turn on message extraction.

Enabling decompression on an existing delivery stream using the Amazon Web Services Management Console

If you have a Firehose stream with a Lambda function to perform decompression, you can replace it with the Firehose decompression feature. Before you proceed, review your Lambda function code to confirm that it only performs decompression or message extraction. The output of your Lambda function should look similar to the examples shown in Fig 1 or Fig 2 in the previous section. If the output looks similar, you can replace the Lambda function using the following steps.

  1. Replace your current Lambda function with this blueprint. The new blueprint Lambda function automatically detects whether the incoming data is compressed and performs decompression only when the input data is compressed.

  2. Turn on decompression using the built-in Firehose option for decompression.

  3. Enable CloudWatch metrics for your Firehose stream if they aren't already enabled. Monitor the CloudWatchProcessorLambda_IncomingCompressedData metric and wait until this metric changes to zero; a polling sketch follows this list. This confirms that all input data sent to your Lambda function is decompressed and that the Lambda function is no longer required.

  4. Remove the Lambda data transformation because you no longer need it to decompress your stream.
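
The following is a minimal Boto3 sketch of the check in step 3, polling the CloudWatchProcessorLambda_IncomingCompressedData metric. The metric namespace and dimension are assumptions; confirm where the blueprint publishes this metric in your account:

import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch")

def incoming_compressed_sum(stream_name):
    # Sum the metric over the last hour. The AWS/Firehose namespace and the
    # DeliveryStreamName dimension are assumptions, not documented values.
    now = datetime.now(timezone.utc)
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/Firehose",
        MetricName="CloudWatchProcessorLambda_IncomingCompressedData",
        Dimensions=[{"Name": "DeliveryStreamName", "Value": stream_name}],
        StartTime=now - timedelta(hours=1),
        EndTime=now,
        Period=300,
        Statistics=["Sum"],
    )
    return sum(dp["Sum"] for dp in resp["Datapoints"])

if incoming_compressed_sum("example-cwlogs-stream") == 0:
    print("No compressed input in the last hour; the Lambda function can be removed.")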

Disabling decompression using the Amazon Web Services Management Console

To disable decompression on a delivery stream using the Amazon Web Services Management Console

  1. Sign in to the Amazon Web Services Management Console and open the Kinesis console at https://console.amazonaws.cn/kinesis.

  2. Choose Data Firehose in the navigation pane.

  3. Choose the delivery stream that you want to edit.

  4. Choose Configuration.

  5. In the Transform records section, choose Edit.

  6. Under Decompress source records from Amazon CloudWatch Logs, uncheck Turn on decompression and then choose Save changes.
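
If you manage the stream programmatically, a Boto3 sketch of the same change might look like the following. The stream name is hypothetical, and this sketch assumes an S3 destination with no other processors configured:

import boto3

firehose = boto3.client("firehose")

desc = firehose.describe_delivery_stream(
    DeliveryStreamName="example-cwlogs-stream"  # hypothetical stream name
)["DeliveryStreamDescription"]

firehose.update_destination(
    DeliveryStreamName="example-cwlogs-stream",
    CurrentDeliveryStreamVersionId=desc["VersionId"],
    DestinationId=desc["Destinations"][0]["DestinationId"],
    ExtendedS3DestinationUpdate={
        # Turning processing off removes the decompression step. If your
        # stream uses other processors, keep them in the Processors list.
        "ProcessingConfiguration": {"Enabled": False, "Processors": []},
    },
)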

FAQ

What happens to the source data in case of an error during decompression?

If Amazon Data Firehose cannot decompress a record, the record is delivered as is (in compressed format) to the error S3 bucket that you specified when you created the delivery stream. Along with the record, the delivered object also includes the error code and error message, and these objects are delivered to an S3 bucket prefix called decompression-failed. Firehose continues to process other records after a failed decompression.
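
For example, a minimal Boto3 sketch that lists the objects under that prefix; the error bucket name is a hypothetical placeholder:

import boto3

s3 = boto3.client("s3")

# List records that failed decompression under the documented
# decompression-failed prefix.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="example-error-bucket", Prefix="decompression-failed"):
    for obj in page.get("Contents", []):
        print(obj["Key"], obj["Size"])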

What happens to the source data in case of an error in the processing pipeline after successful decompression?

If Amazon Data Firehose encounters an error in a processing step that runs after decompression, such as dynamic partitioning or data format conversion, the record is delivered in compressed format to the error S3 bucket that you specified when you created the delivery stream. Along with the record, the delivered object also includes the error code and error message.

How are you informed in case of an error or an exception?

If an error or exception occurs during decompression and you have configured CloudWatch Logs, Firehose logs error messages to CloudWatch Logs. Firehose also sends metrics to CloudWatch that you can monitor, and you can optionally create alarms based on the metrics that Firehose emits, as in the sketch that follows.
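
For example, here is a Boto3 sketch of an alarm on failed decompression. The metric name is an assumption based on the metric families named in the FAQ below; see CloudWatch Logs Decompression Metrics for the exact names:

import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm whenever any record fails decompression in a one-minute window.
cloudwatch.put_metric_alarm(
    AlarmName="firehose-decompression-failures",
    Namespace="AWS/Firehose",
    MetricName="OutputDecompressedRecords.Failed",  # assumed name; verify in the metrics reference
    Dimensions=[{"Name": "DeliveryStreamName", "Value": "example-cwlogs-stream"}],
    Statistic="Sum",
    Period=60,
    EvaluationPeriods=1,
    Threshold=0,
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
)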

What happens when put operations don't come from CloudWatch Logs?

When put operations do not come from CloudWatch Logs, Firehose returns the following error message:

Put to Firehose failed for AccountId: <accountID>, FirehoseName: <firehosename> because the request is not originating from allowed source types.

What metrics does Firehose emit for the decompression feature?

Firehose emits metrics for the decompression of every record. Select the period (1 minute), the Sum statistic, and a date range to get the number of DecompressedRecords or DecompressedBytes that failed or succeeded. For more information, see CloudWatch Logs Decompression Metrics.