

# Configurable parser-type processors
<a name="CloudWatch-Logs-Transformation-Configurable"></a>

This section contains information about the configurable data parser processors that you can use in a log event transformer. 

**Contents**
+ [parseJSON](#CloudWatch-Logs-Transformation-parseJSON)
+ [grok](#CloudWatch-Logs-Transformation-Grok)
  + [Grok examples](#Grok-Examples)
    + [Example 1: Use grok to extract a field from unstructured logs](#Grok-Example1)
    + [Example 2: Use grok in combination with parseJSON to extract fields from a JSON log event](#Grok-Example3)
    + [Example 3: Grok pattern with dotted annotation in FIELD_NAME](#Grok-Example4)
  + [Supported grok patterns](#Grok-Patterns)
    + [Common log format examples](#Common-Log-Examples)
      + [Apache log example](#Apache-Log-Example)
      + [NGINX log example](#NGINX-Log-Example)
      + [Syslog Protocol (RFC 5424) log example](#syslog5424-Log-Example)
+ [csv](#CloudWatch-Logs-Transformation-csv)
+ [parseKeyValue](#CloudWatch-Logs-Transformation-parseKeyValue)

## parseJSON
<a name="CloudWatch-Logs-Transformation-parseJSON"></a>

The **parseJSON** processor parses JSON log events and inserts extracted JSON key-value pairs under the destination. If you don't specify a destination, the processor places the key-value pair under the root node. When using `parseJSON` as the first processor, you must parse the entire log event using `@message` as the source field. After the initial JSON parsing, you can then manipulate specific fields in subsequent processors. 

The original `@message` content is not changed; the new keys are added to the message.


| Field | Description | Required? | Default | Limits | 
| --- | --- | --- | --- | --- | 
|  source | Path to the field in the log event that will be parsed. Use dot notation to access child fields. For example, store.book |  No | `@message`  | Maximum length: 128 Maximum nested key depth: 3 | 
|  destination | The destination field of the parsed JSON |  No | `Parent JSON node`  | Maximum length: 128 Maximum nested key depth: 3 | 

**Example**

Suppose an ingested log event looks like this:

```
{
    "outer_key": {
        "inner_key": "inner_value"
    }
}
```

Then if we have this **parseJSON** processor:

```
[
   {
        "parseJSON": {
            "destination": "new_key"
        }
   }
]
```

The transformed log event would be the following.

```
{
    "new_key": {
        "outer_key": {
            "inner_key": "inner_value"
        }
    }
}
```
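
The `source` option is useful when a field inside the event itself contains a serialized JSON string. The following is a minimal sketch under that assumption (the field names `metadata` and `metadata_parsed` are hypothetical): the first **parseJSON** parses the entire `@message`, and the second **parseJSON** then parses the embedded JSON string under its own destination.

```
[
    {
        "parseJSON": {}
    },
    {
        "parseJSON": {
            "source": "metadata",
            "destination": "metadata_parsed"
        }
    }
]
```

For an ingested event such as `{"metadata": "{\"region\": \"us-east-1\"}"}`, you would expect the transformed event to also contain a `metadata_parsed` object with a `region` key.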

## grok
<a name="CloudWatch-Logs-Transformation-Grok"></a>

Use the grok processor to parse and structure unstructured data using pattern matching. This processor can also extract fields from log messages.


| Field | Description | Required? | Default | Limits | Notes | 
| --- | --- | --- | --- | --- | --- | 
|  source | Path of the field to apply grok matching on |  No | `@message`  | Maximum length: 128 Maximum nested key depth: 3 |  | 
|  match | The grok pattern to match against the log event  |  Yes |  | Maximum length: 512 Maximum grok patterns: 20 Some grok pattern types have individual usage limits. Any combination of the following patterns can be used as many as five times: URI, URIPARAM, URIPATHPARAM, SPACE, DATA, GREEDYDATA, GREEDYDATA_MULTILINE. Grok patterns don't support type conversions. For common log format patterns (APACHE_ACCESS_LOG, NGINX_ACCESS_LOG, SYSLOG5424), only DATA, GREEDYDATA, or GREEDYDATA_MULTILINE patterns can be included after the common log pattern.  | [See all supported Grok patterns](#Grok-Patterns) | 

**Structure of a Grok Pattern**

This is the supported grok pattern structure:

```
%{PATTERN_NAME:FIELD_NAME}
```
+ **PATTERN_NAME**: Refers to a pre-defined regular expression for matching a specific type of data. Only predefined [grok patterns](#Grok-Patterns) are supported. Creating custom patterns is not allowed.
+ **FIELD_NAME**: Assigns a name to the extracted value. `FIELD_NAME` is optional, but if you don't specify this value then the extracted data is dropped from the transformed log event. If `FIELD_NAME` uses dotted notation (for example, "parent.child"), it is treated as a JSON path.
+ **Type conversion**: Explicit type conversions are not supported. Use the [typeConverter processor](CloudWatch-Logs-Transformation-Datatype.md#CloudWatch-Logs-Transformation-typeConverter) to convert the data type of any value extracted by grok.

To create more complex matching expressions, you can combine several grok patterns. As many as 20 grok patterns can be combined to match a log event. For example, this combination of patterns `%{NUMBER:timestamp} [%{NUMBER:db} %{IP:client_ip}:%{NUMBER:client_port}] %{GREEDYDATA:data}` can be used to extract fields from a Redis slow log entry like this:

`1629860738.123456 [0 127.0.0.1:6379] "SET" "key1" "value1"`
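
As an illustration, a transformer that applies this combined pattern could look like the following sketch. Applied to the sample entry above, it would be expected to extract `timestamp`, `db`, `client_ip`, `client_port`, and `data` fields.

```
[
    {
        "grok": {
            "match": "%{NUMBER:timestamp} [%{NUMBER:db} %{IP:client_ip}:%{NUMBER:client_port}] %{GREEDYDATA:data}"
        }
    }
]
```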

### Grok examples
<a name="Grok-Examples"></a>

#### Example 1: Use grok to extract a field from unstructured logs
<a name="Grok-Example1"></a>

Sample log:

```
293750 server-01.internal-network.local OK "[Thread-000] token generated"
```

Transformer used:

```
[
     {
         "grok": {
             "match": "%{NUMBER:version} %{HOSTNAME:hostname} %{NOTSPACE:status} %{QUOTEDSTRING:logMsg}"
         }
    }
]
```

Output:

```
{
  "version": "293750",
  "hostname": "server-01.internal-network.local",
  "status": "OK",
  "logMsg": "[Thread-000] token generated"
}
```

Sample log:

```
23/Nov/2024:10:25:15 -0900 172.16.0.1 200
```

Transformer used:

```
[
    {
        "grok": {
            "match": "%{HTTPDATE:timestamp} %{IPORHOST:clientip} %{NUMBER:response_status}"
        }
    }
]
```

Output:

```
{
  "timestamp": "23/Nov/2024:10:25:15 -0900",
  "clientip": "172.16.0.1",
  "response_status": "200"
}
```

#### Example 2: Use grok in combination with parseJSON to extract fields from a JSON log event
<a name="Grok-Example3"></a>

Sample log:

```
{
    "timestamp": "2024-11-23T16:03:12Z",
    "level": "ERROR",
    "logMsg": "GET /page.html HTTP/1.1"
}
```

Transformer used:

```
[
     {
        "parseJSON": {}
    },
    {
         "grok": {
            "source": "logMsg",
             "match": "%{WORD:http_method} %{NOTSPACE:request} HTTP/%{NUMBER:http_version}"
         }
    }
]
```

Output:

```
{
  "timestamp": "2024-11-23T16:03:12Z",
  "level": "ERROR",
  "logMsg": "GET /page.html HTTP/1.1",
  "http_method": "GET",
  "request": "/page.html",
  "http_version": "1.1"
}
```

#### Example 3: Grok pattern with dotted annotation in FIELD_NAME
<a name="Grok-Example4"></a>

Sample log:

```
192.168.1.1 GET /index.html?param=value 200 1234
```

Transformer used:

```
[
    {
        "grok": {
            "match": "%{IP:client.ip} %{WORD:method} %{URIPATHPARAM:request.uri} %{NUMBER:response.status} %{NUMBER:response.bytes}"
        }
    }
]
```

Output:

```
{
  "client": {
    "ip": "192.168.1.1"
  },
  "method": "GET",
  "request": {
    "uri": "/index.html?param=value"
  },
  "response": {
    "status": "200",
    "bytes": "1234"
  }
}
```

### Supported grok patterns
<a name="Grok-Patterns"></a>

The following tables list the patterns that are supported by the `grok` processor.

**General grok patterns**


| Grok Pattern | Description | Maximum pattern limit | Example | 
| --- | --- | --- | --- | 
| USERNAME or USER | Matches one or more characters that can include lowercase letters (a-z), uppercase letters (A-Z), digits (0-9), dots (.), underscores (_), or hyphens (-) | 20 |  Input: `user123.name-TEST` Pattern: `%{USERNAME:name}` Output: `{"name": "user123.name-TEST"}`  | 
| INT | Matches an optional plus or minus sign followed by one or more digits. | 20 |  Input: `-456` Pattern: `%{INT:num}` Output: `{"num": "-456"}`  | 
| BASE10NUM | Matches an integer or a floating-point number with optional sign and decimal point | 20 |  Input: `-0.67` Pattern: `%{BASE10NUM:num}` Output: `{"num": "-0.67"}`  | 
| BASE16NUM | Matches decimal and hexadecimal numbers with an optional sign (+ or -) and an optional 0x prefix | 20 |  Input: `+0xA1B2` Pattern: `%{BASE16NUM:num}` Output: `{"num": "+0xA1B2"}`  | 
| POSINT | Matches whole positive integers without leading zeros, consisting of one or more digits (1-9 followed by 0-9) | 20 |  Input: `123` Pattern: `%{POSINT:num}` Output: `{"num": "123"}`  | 
| NONNEGINT | Matches any whole numbers (consisting of one or more digits 0-9) including zero and numbers with leading zeros. | 20 |  Input: `007` Pattern: `%{NONNEGINT:num}` Output: `{"num": "007"}`  | 
| WORD | Matches whole words composed of one or more word characters (`\w`), including letters, digits, and underscores | 20 |  Input: `user_123` Pattern: `%{WORD:user}` Output: `{"user": "user_123"}`  | 
| NOTSPACE | Matches one or more non-whitespace characters. | 5 |  Input: `hello_world123` Pattern: `%{NOTSPACE:msg}` Output: `{"msg": "hello_world123"}`  | 
| SPACE | Matches zero or more whitespace characters. | 5 |  Input: `" "` Pattern: `%{SPACE:extra}` Output: `{"extra": " "}`  | 
| DATA | Matches any character (except newline) zero or more times, non-greedy. | 5 |  Input: `abc def ghi` Pattern: `%{DATA:x} %{DATA:y}` Output: `{"x": "abc", "y": "def ghi"}`  | 
| GREEDYDATA | Matches any character (except newline) zero or more times, greedy. | 5 |  Input: `abc def ghi` Pattern: `%{GREEDYDATA:x} %{GREEDYDATA:y}` Output: `{"x": "abc def", "y": "ghi"}`  | 
| GREEDYDATA_MULTILINE | Matches any character (including newline) zero or more times, greedy. | 1 |  Input: `abc` `def` `ghi` Pattern: `%{GREEDYDATA_MULTILINE:data}` Output: `{"data": "abc\ndef\nghi"}`  | 
| QUOTEDSTRING | Matches quoted strings (single or double quotes) with escaped characters. | 20 |  Input: `"Hello, world!"` Pattern: `%{QUOTEDSTRING:msg}` Output: `{"msg": "Hello, world!"}`  | 
| UUID | Matches a standard UUID format: 8 hexadecimal characters, followed by three groups of 4 hexadecimal characters, and ending with 12 hexadecimal characters, all separated by hyphens. | 20 |  Input: `550e8400-e29b-41d4-a716-446655440000` Pattern: `%{UUID:id}` Output: `{"id": "550e8400-e29b-41d4-a716-446655440000"}`  | 
| URN | Matches URN (Uniform Resource Name) syntax. | 20 |  Input: `urn:isbn:0451450523` Pattern: `%{URN:urn}` Output: `{"urn": "urn:isbn:0451450523"}`  | 

**Amazon grok patterns**


| Pattern | Description | Maximum pattern limit | Example | 
| --- | --- | --- | --- | 
|  ARN  |  Matches Amazon Resource Names (ARNs), capturing the partition (`aws`, `aws-cn`, or `aws-us-gov`), service, Region, account ID, and up to 5 hierarchical resource identifiers separated by slashes. It will not match ARNs that are missing information between colons.  | 5 |  Input: `arn:aws:iam:us-east-1:123456789012:user/johndoe` Pattern: `%{ARN:arn}` Output: `{"arn": "arn:aws:iam:us-east-1:123456789012:user/johndoe"}`  | 

**Networking grok patterns**


| Grok Pattern | Description | Maximum pattern limit | Example | 
| --- | --- | --- | --- | 
| CISCOMAC | Matches a MAC address in 4-4-4 hexadecimal format. | 20 |  Input: `0123.4567.89AB` Pattern: `%{CISCOMAC:MacAddress}` Output: `{"MacAddress": "0123.4567.89AB"}`  | 
| WINDOWSMAC | Matches a MAC address in hexadecimal format with hyphens | 20 |  Input: `01-23-45-67-89-AB` Pattern: `%{WINDOWSMAC:MacAddress}` Output: `{"MacAddress": "01-23-45-67-89-AB"}`  | 
| COMMONMAC | Matches a MAC address in hexadecimal format with colons. | 20 |  Input: `01:23:45:67:89:AB` Pattern: `%{COMMONMAC:MacAddress}` Output: `{"MacAddress": "01:23:45:67:89:AB"}`  | 
| MAC | Matches one of CISCOMAC, WINDOWSMAC or COMMONMAC grok patterns | 20 |  Input: `01:23:45:67:89:AB` Pattern: `%{MAC:m1}` Output: `{"m1":"01:23:45:67:89:AB"}`  | 
| IPV6 | Matches IPv6 addresses, including compressed forms and IPv4-mapped IPv6 addresses. | 5 |  Input: `2001:db8:3333:4444:5555:6666:7777:8888` Pattern: `%{IPV6:ip}` Output: `{"ip": "2001:db8:3333:4444:5555:6666:7777:8888"}`  | 
| IPV4 | Matches an IPv4 address. | 20 |  Input: `192.168.0.1` Pattern: `%{IPV4:ip}` Output: `{"ip": "192.168.0.1"}`  | 
| IP | Matches either IPv6 addresses as supported by %{IPV6} or IPv4 addresses as supported by %{IPV4} | 5 |  Input: `192.168.0.1` Pattern: `%{IP:ip}` Output: `{"ip": "192.168.0.1"}`  | 
| HOSTNAME or HOST | Matches domain names, including subdomains | 5 |  Input: `server-01.internal-network.local` Pattern: `%{HOST:host}` Output: `{"host": "server-01.internal-network.local"}`  | 
| IPORHOST | Matches either a hostname or an IP address | 5 |  Input: `2001:db8:3333:4444:5555:6666:7777:8888` Pattern: `%{IPORHOST:ip}` Output: `{"ip": "2001:db8:3333:4444:5555:6666:7777:8888"}`  | 
| HOSTPORT | Matches an IP address or hostname as supported by the %{IPORHOST} pattern, followed by a colon and a port number, capturing the port as "PORT" in the output. | 5 |  Input: `192.168.1.1:8080` Pattern: `%{HOSTPORT:ip}` Output: `{"ip":"192.168.1.1:8080","PORT":"8080"}`  | 
| URIHOST | Matches an IP address or hostname as supported by the %{IPORHOST} pattern, optionally followed by a colon and a port number, capturing the port as "port" if present. | 5 |  Input: `example.com:443 10.0.0.1` Pattern: `%{URIHOST:host} %{URIHOST:ip}` Output: `{"host":"example.com:443","port":"443","ip":"10.0.0.1"}`  | 

**Path grok patterns**


| Grok Pattern | Description | Maximum pattern limit | Example | 
| --- | --- | --- | --- | 
| UNIXPATH | Matches URL paths, potentially including query parameters. | 20 |  Input: `/search?q=regex` Pattern: `%{UNIXPATH:path}` Output: `{"path":"/search?q=regex"}`  | 
| WINPATH | Matches Windows file paths. | 5 |  Input: `C:\Users\John\Documents\file.txt` Pattern: `%{WINPATH:path}` Output: `{"path": "C:\\Users\\John\\Documents\\file.txt"}`  | 
| PATH | Matches either URL or Windows file paths | 5 |  Input: `/search?q=regex` Pattern: `%{PATH:path}` Output: `{"path":"/search?q=regex"}`  | 
| TTY | Matches Unix device paths for terminals and pseudo-terminals. | 20 |  Input: `/dev/tty1` Pattern: `%{TTY:path}` Output: `{"path":"/dev/tty1"}`  | 
| URIPROTO | Matches letters, optionally followed by a plus (+) character and additional letters or plus (+) characters | 20 |  Input: `web+transformer` Pattern: `%{URIPROTO:protocol}` Output: `{"protocol":"web+transformer"}`  | 
| URIPATH | Matches the path component of a URI | 20 |  Input: `/category/sub-category/product_name` Pattern: `%{URIPATH:path}` Output: `{"path":"/category/sub-category/product_name"}`  | 
| URIPARAM | Matches URL query parameters | 5 |  Input: `?param1=value1&param2=value2` Pattern: `%{URIPARAM:url}` Output: `{"url":"?param1=value1&param2=value2"}`  | 
| URIPATHPARAM | Matches a URI path optionally followed by query parameters | 5 |  Input: `/category/sub-category/product?id=12345&color=red` Pattern: `%{URIPATHPARAM:path}` Output: `{"path":"/category/sub-category/product?id=12345&color=red"}`  | 
| URI | Matches a complete URI | 5 |  Input: `https://user:password@example.com/path/to/resource?param1=value1&param2=value2` Pattern: `%{URI:uri}` Output: `{"uri":"https://user:password@example.com/path/to/resource?param1=value1&param2=value2"}`  | 

**Date and time grok patterns**


| Grok Pattern | Description | Maximum pattern limit | Example | 
| --- | --- | --- | --- | 
| MONTH | Matches full or abbreviated English month names as whole words | 20 |  Input: `Jan` Pattern: `%{MONTH:month}` Output: `{"month":"Jan"}` Input: `January` Pattern: `%{MONTH:month}` Output: `{"month":"January"}`  | 
| MONTHNUM | Matches month numbers from 1 to 12, with optional leading zero for single-digit months. | 20 |  Input: `5` Pattern: `%{MONTHNUM:month}` Output: `{"month":"5"}` Input: `05` Pattern: `%{MONTHNUM:month}` Output: `{"month":"05"}`  | 
| MONTHNUM2 | Matches two-digit month numbers from 01 to 12. | 20 |  Input: `05` Pattern: `%{MONTHNUM2:month}` Output: `{"month":"05"}`  | 
| MONTHDAY | Matches day of the month from 1 to 31, with optional leading zero. | 20 |  Input: `31` Pattern: `%{MONTHDAY:monthDay}` Output: `{"monthDay":"31"}`  | 
| YEAR | Matches year in two or four digits | 20 |  Input: `2024` Pattern: `%{YEAR:year}` Output: `{"year":"2024"}` Input: `24` Pattern: `%{YEAR:year}` Output: `{"year":"24"}`  | 
| DAY | Matches full or abbreviated day names. | 20 |  Input: `Tuesday` Pattern: `%{DAY:day}` Output: `{"day":"Tuesday"}`  | 
| HOUR | Matches hour in 24-hour format with an optional leading zero (0)0-23. | 20 |  Input: `22` Pattern: `%{HOUR:hour}` Output: `{"hour":"22"}`  | 
| MINUTE | Matches minutes (00-59). | 20 |  Input: `59` Pattern: `%{MINUTE:min}` Output: `{"min":"59"}`  | 
| SECOND | Matches a number representing seconds (0)0-60, optionally followed by a decimal point or colon and one or more digits for fractional seconds | 20 |  Input: `3` Pattern: `%{SECOND:second}` Output: `{"second":"3"}` Input: `30.5` Pattern: `%{SECOND:minSec}` Output: `{"minSec":"30.5"}` Input: `30:5` Pattern: `%{SECOND:minSec}` Output: `{"minSec":"30:5"}`  | 
| TIME | Matches a time format with hours, minutes, and seconds in the format (H)H:mm:(s)s. Seconds include leap second (0)0-60. | 20 |  Input: `09:45:32` Pattern: `%{TIME:time}` Output: `{"time":"09:45:32"}`  | 
| DATE_US | Matches a date in the format of (M)M/(d)d/(yy)yy or (M)M-(d)d-(yy)yy. | 20 |  Input: `11/23/2024` Pattern: `%{DATE_US:date}` Output: `{"date":"11/23/2024"}` Input: `1-01-24` Pattern: `%{DATE_US:date}` Output: `{"date":"1-01-24"}`  | 
| DATE_EU | Matches a date in the format of (d)d/(M)M/(yy)yy, (d)d-(M)M-(yy)yy, or (d)d.(M)M.(yy)yy. | 20 |  Input: `23/11/2024` Pattern: `%{DATE_EU:date}` Output: `{"date":"23/11/2024"}` Input: `1.01.24` Pattern: `%{DATE_EU:date}` Output: `{"date":"1.01.24"}`  | 
| ISO8601_TIMEZONE | Matches the UTC offset 'Z' or a time zone offset with an optional colon, in the format [+-](H)H(:)mm. | 20 |  Input: `+05:30` Pattern: `%{ISO8601_TIMEZONE:tz}` Output: `{"tz":"+05:30"}` Input: `-530` Pattern: `%{ISO8601_TIMEZONE:tz}` Output: `{"tz":"-530"}` Input: `Z` Pattern: `%{ISO8601_TIMEZONE:tz}` Output: `{"tz":"Z"}`  | 
| ISO8601_SECOND | Matches a number representing seconds (0)0-60, optionally followed by a decimal point or colon and one or more digits for fractional seconds | 20 |  Input: `60` Pattern: `%{ISO8601_SECOND:second}` Output: `{"second":"60"}`  | 
| TIMESTAMP_ISO8601 | Matches the ISO8601 datetime format (yy)yy-(M)M-(d)dT(H)H:mm:((s)s)(Z\|[+-](H)H:mm) with optional seconds and timezone. | 20 |  Input: `2023-05-15T14:30:00+05:30` Pattern: `%{TIMESTAMP_ISO8601:timestamp}` Output: `{"timestamp":"2023-05-15T14:30:00+05:30"}` Input: `23-5-1T1:25+5:30` Pattern: `%{TIMESTAMP_ISO8601:timestamp}` Output: `{"timestamp":"23-5-1T1:25+5:30"}` Input: `23-5-1T1:25Z` Pattern: `%{TIMESTAMP_ISO8601:timestamp}` Output: `{"timestamp":"23-5-1T1:25Z"}`  | 
| DATE | Matches either a date in the US format using %{DATE_US} or in the EU format using %{DATE_EU} | 20 |  Input: `11/29/2024` Pattern: `%{DATE:date}` Output: `{"date":"11/29/2024"}` Input: `29.11.2024` Pattern: `%{DATE:date}` Output: `{"date":"29.11.2024"}`  | 
| DATESTAMP | Matches %{DATE} followed by the %{TIME} pattern, separated by a space or hyphen. | 20 |  Input: `29-11-2024 14:30:00` Pattern: `%{DATESTAMP:dateTime}` Output: `{"dateTime":"29-11-2024 14:30:00"}`  | 
| TZ | Matches common time zone abbreviations (PST, PDT, MST, MDT, CST, CDT, EST, EDT, UTC). | 20 |  Input: `PDT` Pattern: `%{TZ:tz}` Output: `{"tz":"PDT"}`  | 
| DATESTAMP_RFC822 | Matches date and time in the format: Day MonthName (D)D (YY)YY (H)H:mm:(s)s Timezone | 20 |  Input: `Monday Jan 5 23 1:30:00 CDT` Pattern: `%{DATESTAMP_RFC822:dateTime}` Output: `{"dateTime":"Monday Jan 5 23 1:30:00 CDT"}` Input: `Mon January 15 2023 14:30:00 PST` Pattern: `%{DATESTAMP_RFC822:dateTime}` Output: `{"dateTime":"Mon January 15 2023 14:30:00 PST"}`  | 
| DATESTAMP_RFC2822 | Matches the RFC2822 date-time format: Day, (d)d MonthName (yy)yy (H)H:mm:(s)s Z\|[+-](H)H:mm | 20 |  Input: `Mon, 15 May 2023 14:30:00 +0530` Pattern: `%{DATESTAMP_RFC2822:dateTime}` Output: `{"dateTime":"Mon, 15 May 2023 14:30:00 +0530"}` Input: `Monday, 15 Jan 23 14:30:00 Z` Pattern: `%{DATESTAMP_RFC2822:dateTime}` Output: `{"dateTime":"Monday, 15 Jan 23 14:30:00 Z"}`  | 
| DATESTAMP_OTHER | Matches date and time in the format: Day MonthName (d)d (H)H:mm:(s)s Timezone (yy)yy | 20 |  Input: `Mon May 15 14:30:00 PST 2023` Pattern: `%{DATESTAMP_OTHER:dateTime}` Output: `{"dateTime":"Mon May 15 14:30:00 PST 2023"}`  | 
| DATESTAMP_EVENTLOG | Matches a compact datetime format without separators: (yy)yyMM(d)d(H)Hmm(s)s | 20 |  Input: `20230515143000` Pattern: `%{DATESTAMP_EVENTLOG:dateTime}` Output: `{"dateTime":"20230515143000"}`  | 

**Log grok patterns**


| Grok Pattern | Description | Maximum pattern limit | Example | 
| --- | --- | --- | --- | 
| LOGLEVEL | Matches standard log levels in different capitalizations and abbreviations, including the following: Alert/ALERT, Trace/TRACE, Debug/DEBUG, Notice/NOTICE, Info/INFO, Warn/Warning/WARN/WARNING, Err/Error/ERR/ERROR, Crit/Critical/CRIT/CRITICAL, Fatal/FATAL, Severe/SEVERE, Emerg/Emergency/EMERG/EMERGENCY | 20 |  Input: `INFO` Pattern: `%{LOGLEVEL:logLevel}` Output: `{"logLevel":"INFO"}`  | 
| HTTPDATE | Matches a date and time format often used in log files. Format: (d)d/MonthName/(yy)yy:(H)H:mm:(s)s Timezone MonthName: Matches full or abbreviated English month names (Example: "Jan" or "January") Timezone: Matches the %{INT} grok pattern | 20 |  Input: `23/Nov/2024:14:30:00 +0640` Pattern: `%{HTTPDATE:date}` Output: `{"date":"23/Nov/2024:14:30:00 +0640"}`  | 
| SYSLOGTIMESTAMP | Matches a date format with MonthName (d)d (H)H:mm:(s)s MonthName: Matches full or abbreviated English month names (Example: "Jan" or "January") | 20 |  Input: `Nov 29 14:30:00` Pattern: `%{SYSLOGTIMESTAMP:dateTime}` Output: `{"dateTime":"Nov 29 14:30:00"}`  | 
| PROG | Matches a program name consisting of string of letters, digits, dot, underscore, forward slash, percent sign, and hyphen characters. | 20 |  Input: `user.profile/settings-page` Pattern: `%{PROG:program}` Output: `{"program":"user.profile/settings-page"}`  | 
| SYSLOGPROG | Matches PROG grok pattern optionally followed by a process ID in square brackets. | 20 |  Input: `user.profile/settings-page[1234]` Pattern: `%{SYSLOGPROG:programWithId}` Output: `{"programWithId":"user.profile/settings-page[1234]","program":"user.profile/settings-page","pid":"1234"}`  | 
| SYSLOGHOST | Matches either a %{HOST} or %{IP} pattern | 5 |  Input: `2001:db8:3333:4444:5555:6666:7777:8888` Pattern: `%{SYSLOGHOST:ip}` Output: `{"ip": "2001:db8:3333:4444:5555:6666:7777:8888"}`  | 
| SYSLOGFACILITY | Matches syslog priority in decimal format. The value should be enclosed in angular brackets (<>). | 20 |  Input: `<13.6>` Pattern: `%{SYSLOGFACILITY:syslog}` Output: `{"syslog":"<13.6>","facility":"13","priority":"6"}`  | 

**Common log grok patterns**

You can use pre-defined custom grok patterns to match Apache, NGINX, and Syslog Protocol (RFC 5424) log formats. When you use these specific patterns, they must be the first patterns in your matching configuration, and no other patterns can precede them. Also, you can follow them only with exactly one **DATA**, **GREEDYDATA**, or **GREEDYDATA_MULTILINE** pattern. 


| Grok pattern | Description | Maximum pattern limit | 
| --- | --- | --- | 
|  APACHE_ACCESS_LOG | Matches Apache access logs | 1 | 
|  NGINX_ACCESS_LOG | Matches NGINX access logs | 1 | 
|  SYSLOG5424 | Matches Syslog Protocol (RFC 5424) logs | 1 | 

The following shows valid and invalid examples for using these common log format patterns.

```
"%{NGINX_ACCESS_LOG} %{DATA}" // Valid
"%{SYSLOG5424}%{DATA:logMsg}" // Valid
"%{APACHE_ACCESS_LOG} %{GREEDYDATA:logMsg}" // Valid
"%{APACHE_ACCESS_LOG} %{SYSLOG5424}" // Invalid (multiple common log patterns used)
"%{NGINX_ACCESS_LOG} %{NUMBER:num}" // Invalid (Only GREEDYDATA and DATA patterns are supported with common log patterns)
"%{GREEDYDATA:logMsg} %{SYSLOG5424}" // Invalid (GREEDYDATA and DATA patterns are supported only after common log patterns)
```

#### Common log format examples
<a name="Common-Log-Examples"></a>

##### Apache log example
<a name="Apache-Log-Example"></a>

Sample log:

```
127.0.0.1 - - [03/Aug/2023:12:34:56 +0000] "GET /page.html HTTP/1.1" 200 1234
```

Transformer:

```
[
     {
        "grok": {
            "match": "%{APACHE_ACCESS_LOG}"
        }
    }
]
```

Output:

```
{
    "request": "/page.html",
    "http_method": "GET",
    "status_code": 200,
    "http_version": "1.1",
    "response_size": 1234,
    "remote_host": "127.0.0.1",
    "timestamp": "2023-08-03T12:34:56Z"
}
```

##### NGINX log example
<a name="NGINX-Log-Example"></a>

Sample log:

```
192.168.1.100 - Foo [03/Aug/2023:12:34:56 +0000] "GET /account/login.html HTTP/1.1" 200 42 "https://www.amazon.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36"
```

Transformer:

```
[
     {
        "grok": {
            "match": "%{NGINX_ACCESS_LOG}"
        }
    }
]
```

Output:

```
{
    "request": "/account/login.html",
    "referrer": "https://www.amazon.com/",
    "agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36",
    "http_method": "GET",
    "status_code": 200,
    "auth_user": "Foo",
    "http_version": "1.1",
    "response_size": 42,
    "remote_host": "192.168.1.100",
    "timestamp": "2023-08-03T12:34:56Z"
}
```

##### Syslog Protocol (RFC 5424) log example
<a name="syslog5424-Log-Example"></a>

Sample log:

```
<165>1 2003-10-11T22:14:15.003Z mymachine.example.com evntslog - ID47 [exampleSDID@32473 iut="3" eventSource= "Application" eventID="1011"][examplePriority@32473 class="high"]
```

Transformer:

```
[
     {
        "grok": {
            "match": "%{SYSLOG5424}"
        }
    }
]
```

Output:

```
{
  "pri": 165,
  "version": 1,
  "timestamp": "2003-10-11T22:14:15.003Z",
  "hostname": "mymachine.example.com",
  "app": "evntslog",
  "msg_id": "ID47",
  "structured_data": "exampleSDID@32473 iut=\"3\" eventSource= \"Application\" eventID=\"1011\"",
  "message": "[examplePriority@32473 class=\"high\"]"
}
```

## csv
<a name="CloudWatch-Logs-Transformation-csv"></a>

The **csv** processor parses comma-separated values (CSV) from the log events into columns.


| Field | Description | Required? | Default | Limits | 
| --- | --- | --- | --- | --- | 
|  source | Path to the field in the log event that will be parsed |  No | `@message`  | Maximum length: 128 Maximum nested key depth: 3 | 
|  delimiter | The character used to separate each column in the original comma-separated value log event |  No | `,`  | Maximum length: 1 unless the value is `\t` or `\s`  | 
|  quoteCharacter | Character used as a text qualifier for a single column of data |  No | `"`  | Maximum length: 1  | 
|  columns | List of names to use for the columns in the transformed log event. |  No | `[column_1, column_2 ...]`  | Maximum CSV columns: 100 Maximum length: 128 Maximum nested key depth: 3  | 
|  destination | The parent field to put transformed key value pairs under |  No | `Root node`  | Maximum length: 128 Maximum nested key depth: 3  | 

Setting `delimiter` to `\t` separates the columns on a tab character, and setting it to `\s` separates the columns on a single space character.
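
For example, a tab-separated log line could be parsed with a configuration sketch like the following (the column names are hypothetical, and depending on how you author the transformer JSON, the backslash in the `\t` delimiter value may need to be escaped):

```
[
    {
        "csv": {
            "delimiter": "\t",
            "columns": ["name", "age", "location"]
        }
    }
]
```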

**Example**

Suppose part of an ingested log event looks like this:

```
'Akua Mansa':28:'New York: USA'
```

Suppose we use only the **csv** processor: 

```
[
     {
        "csv": {
            "delimiter": ":",
            "quoteCharacter": "'"
        }
    }
]
```

The transformed log event would be the following.

```
{
  "column_1": "Akua Mansa",
  "column_2": "28",
  "column_3": "New York: USA"
}
```

**Example 2**

Suppose an ingested log event looks like this:

```
{
    "timestamp": "2024-11-23T16:03:12Z",
    "type": "user_data",
    "logMsg": "'Akua Mansa':28:'New York: USA'"
}
```

Suppose we first parse the event as JSON, then parse a JSON field with the **csv** processor, specifying column names and a destination: 

```
[
    {
        "parseJSON": {}
    },
    {
        "csv": {
            "source": "logMsg",
            "delimiter": ":",
            "quoteCharacter": "'",
            "columns":["name","age","location"],
            "destination": "msg"
        }
    }
]
```

The transformed log event would be the following.

```
{
    "timestamp": "2024-11-23T16:03:12Z",
    "logMsg": "'Akua Mansa':28:'New York: USA'",
    "type": "user_data",
    "msg": {
        "name": "Akua Mansa",
        "age": "28",
        "location": "New York: USA"
    }
}
```

## parseKeyValue
<a name="CloudWatch-Logs-Transformation-parseKeyValue"></a>

Use the **parseKeyValue** processor to parse a specified field into key-value pairs. You can customize the processor to parse field information with the following options. 


| Field | Description | Required? | Default | Limits | 
| --- | --- | --- | --- | --- | 
|  source | Path to the field in the log event that will be parsed |  No | `@message`  | Maximum length: 128 Maximum nested key depth: 3 | 
|  destination | The destination field to put the extracted key-value pairs into |  No |   | Maximum length: 128  | 
|  fieldDelimiter | The field delimiter string that is used between key-value pairs in the original log events |  No | `&`  | Maximum length: 128  | 
|  keyValueDelimiter | The delimiter string to use between the key and value in each pair in the transformed log event |  No | `=`  | Maximum length: 128  | 
|  nonMatchValue | A value to insert into the value field in the result, when a key-value pair is not successfully split. |  No |   | Maximum length: 128  | 
|  keyPrefix | If you want to add a prefix to all transformed keys, specify it here. |  No |   | Maximum length: 128  | 
|  overwriteIfExists | Whether to overwrite the value if the destination key already exists |  No | `false`  |   | 

**Example**

Take the following example log event:

```
key1:value1!key2:value2!key3:value3!key4
```

Suppose we use the following processor configuration: 

```
[
    {
        "parseKeyValue": {
            "destination": "new_key",
            "fieldDelimiter": "!",
            "keyValueDelimiter": ":",
            "nonMatchValue": "defaultValue",
            "keyPrefix": "parsed_"
        }
    }
]
```

The transformed log event would be the following.

```
{
  "new_key": {
    "parsed_key1": "value1",
    "parsed_key2": "value2",
    "parsed_key3": "value3",
    "parsed_key4": "defaultValue"
  }
}
```