Firehose supports database as a source in all Amazon Web Services Regions
Set up the Firehose stream
To create a Firehose stream with databases as your source, you need to configure the following.
Configure source and destination
To source data from your database, choose the source for your stream. Firehose supports MySQL and PostgreSQL databases as database sources. Next, choose Apache Iceberg Tables as the destination and provide a Firehose stream name.
Configure database connectivity
For Firehose to connect to a database instance, it needs database endpoint, VPC service endpoint, a port, and a valid database user with the right credentials.
-
Database endpoint – Database endpoint of the primary server of your database cluster. For example, the endpoint would either be self-managed
xyz.amazonaws.com
or RDS databasesmydb.123456789012.us-east-1.rds.amazonaws.com
. -
VPC Service Endpoint name – Firehose supports private connectivity to databases. You must provide VPC endpoint service name. For example, the service endpoint can be
com.amazonaws.vpce.us-east-1.vpce-svc-XXXXXXXXXXXXXX
. -
Port – For port, you must configure 3306 for MySQL databases and 5432 for PostgreSQL databases.
-
SSL Mode – You can choose to either enable or disable the SSL mode. If enabled, Firehose uses
verify_identity
for MySQL andverify-full
SSL mode for PostgreSQL. The certificate must be signed by a trusted CA. For more information, see Using SSL/TLS to encrypt a connection to a DB instance or cluster. Note that for RDS PostgreSQL and Aurora PostgreSQL, theforce_ssl
parameter is set to 1, so you must either specify SSL Mode as enabled in Firehose configuration or you changeforce_ssl
parameter to 0 in database parameter group. -
Authenticate with Amazon Secrets Manager – Select a secret from Amazon Secrets Manager that contains the credentials for connecting to databases. If you do not have an existing secret, create one in Amazon Secrets Manager. For more information, refer to Authenticate with Amazon Secrets Manager in Amazon Data Firehose.
Configure data capture
If you want Firehose to capture changes from specific databases, tables, and columns, then you can configure them as a part of Firehose stream creation. You can specify required databases, tables, and columns by either providing regular expressions to include or exclude them or by explicitly providing comma separated specific database, table, and column names.
Note
Since in PostgreSQL, the schema within each database contains database objects such as tables and views, the fully qualified name or the regular expression must take schemas into consideration.
For MySQL, the fully qualified name is
<Sampledatabase>.<SampleTable>
and for
PostgreSQL the fully qualified name is
<SampleSchema>.<SampleTable>
.
Here are some examples of each type.
Databases
Example of sample regular expression (for including databases): .* Example of explicit naming of tables:
<SampleDatabase>
Tables
Example of sample regular expression for excluding tables:
<SampleDatabase>
.* Example of explicit naming of tables for MySQL :<SampleDatabase>
.<SampleTable1>
Example of explicit naming of tables for PostgreSQL :<SampleSchema>
.<SampleTable>
Columns
Example of sample regular expression (for excluding columns):
<SampleDatabase>
.*.* Example of explicit naming of columns for MySQL :<SampleDatabase>
.<SampleTable>
.<SampleColumn>
Example of explicit naming of columns for PostgreSQL :<SampleSchema>
.<SampleTable>
.<SampleColumn>
Configure surrogate keys
Firehose requires unique keys for configured tables to take the initial copy of data. If you have tables without a primary key in your databases, then you must provide surrogate key for such tables. If all your tables have primary keys, then you need not configure this section. If Firehose finds missing surrogate keys for tables without primary keys, then its snapshot (initial copy) process fails. In such scenarios, Firehose throws an error to CloudWatch Logs. For surrogate keys, you must explicitly configure keys with fully qualified name as shown in the following example.
For MySQL
SampleDatabase.SampleTable:SampleColumn
For PostgreSQL
SampleSchema.SampleTable:SampleColumn
Provide snapshot watermark table
Firehose uses watermark mechanism during incremental snapshot of tables. You must provide this snapshot watermark table that you created as part of the prerequisite. Input the snapshot watermark table in the format as show in the following example.
For MySQL: DatabaseName.TableName For PostgreSQL: SchemaName.TableName
Note
Don't delete the watermark table nor manually insert or delete records from the watermark table. Also, you mustn't revoke the permission for database user created for Firehose to insert or delete records from the watermark table.
Next Step: Configure destination settings