File naming conventions for exports to Amazon S3 for Amazon RDS
Exported data for specific tables is stored in the format base_prefix/files, where the base prefix is the following:
export_identifier/database_name/schema_name.table_name/
For example:
export-1234567890123-459/rdststdb/rdststdb.DataInsert_7ADB5D19965123A2/
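The base prefix layout above can be taken apart with a few lines of Python. This is a minimal sketch, not part of any AWS SDK: the names BasePrefix and split_base_prefix are hypothetical, the input is assumed to start at the export identifier (no extra S3 prefix in front of it), and the schema name is assumed to contain no dot.

```python
from typing import NamedTuple


class BasePrefix(NamedTuple):
    """Components of export_identifier/database_name/schema_name.table_name/."""
    export_identifier: str
    database_name: str
    schema_name: str
    table_name: str


def split_base_prefix(prefix: str) -> BasePrefix:
    """Split a base prefix into its four components (hypothetical helper)."""
    parts = prefix.strip("/").split("/")
    if len(parts) != 3:
        raise ValueError(f"unexpected base prefix: {prefix!r}")
    export_identifier, database_name, qualified_table = parts
    # Assumption: the first dot separates the schema name from the table name.
    schema_name, sep, table_name = qualified_table.partition(".")
    if not sep:
        raise ValueError(f"missing schema_name.table_name in: {prefix!r}")
    return BasePrefix(export_identifier, database_name, schema_name, table_name)
```

Calling split_base_prefix on the example prefix above yields the export identifier export-1234567890123-459, the database name rdststdb, the schema name rdststdb, and the table name DataInsert_7ADB5D19965123A2.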
There are two conventions for how files are named.
- Current convention:
batch_index/part-partition_index-random_uuid.format-based_extension
The batch index is a sequence number that represents a batch of data read from the table. If we can't partition your table into small chunks to be exported in parallel, there will be multiple batch indexes. The same is true if your table is partitioned into multiple tables: there will be one batch index for each of the table partitions of your main table.
If we can partition your table into small chunks to be read in parallel, there will be only the batch index 1 folder.
Inside the batch index folder, there are one or more Parquet files that contain your table's data. The prefix of the Parquet filename is part-partition_index. If your table is partitioned, there will be multiple files starting with the partition index 00000.
There can be gaps in the partition index sequence. This happens because each partition is obtained from a ranged query in your table. If there is no data in the range of that partition, then that sequence number is skipped.
For example, suppose that the id column is the table's primary key, and its minimum and maximum values are 100 and 1000. When we try to export this table with nine partitions, we read it with parallel queries such as the following:
SELECT * FROM table WHERE id >= 100 AND id < 200
SELECT * FROM table WHERE id >= 200 AND id < 300
This should generate nine files, from part-00000-random_uuid.gz.parquet to part-00008-random_uuid.gz.parquet. However, if there are no rows with IDs between 200 and 350, one of the completed partitions is empty, and no file is created for it. In the previous example, part-00001-random_uuid.gz.parquet isn't created.
- Older convention:
part-partition_index-random_uuid.format-based_extension
This is the same as the current convention, but without the batch_index prefix, for example:
part-00000-c5a881bb-58ff-4ee6-1111-b41ecff340a3-c000.gz.parquet
part-00001-d7a881cc-88cc-5ab7-2222-c41ecab340a4-c000.gz.parquet
part-00002-f5a991ab-59aa-7fa6-3333-d41eccd340a7-c000.gz.parquet
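The gaps in the partition index sequence described under the current convention can be illustrated with a small simulation. This is only a sketch under stated assumptions: the export's real partitioning logic is not documented here, so the equal-width ranges, the inclusive upper bound on the last range, and the function names partition_ranges and non_empty_partition_prefixes are all hypothetical.

```python
def partition_ranges(min_id, max_id, partitions):
    """Split [min_id, max_id] into equal-width half-open ranges [lo, hi).

    Assumption: the last range is stretched to max_id + 1 so that the
    maximum value itself is covered.
    """
    width = max(1, (max_id - min_id) // partitions)
    ranges = []
    for index in range(partitions):
        lo = min_id + index * width
        hi = max_id + 1 if index == partitions - 1 else lo + width
        ranges.append((lo, hi))
    return ranges


def non_empty_partition_prefixes(ids, partitions):
    """Return the 'part-NNNNN-' prefixes for ranges that contain at least one id."""
    prefixes = []
    ranges = partition_ranges(min(ids), max(ids), partitions)
    for index, (lo, hi) in enumerate(ranges):
        # A range with no rows produces no file, so its index is skipped.
        if any(lo <= row_id < hi for row_id in ids):
            prefixes.append(f"part-{index:05d}-")
    return prefixes


# IDs run from 100 to 1000, with no rows between 200 and 350.
sample_ids = [100, 150, 350, 400, 500, 600, 700, 800, 900, 1000]
prefixes = non_empty_partition_prefixes(sample_ids, partitions=9)
```

With these sample IDs, the range from 200 up to 300 holds no rows, so the resulting list has eight prefixes and part-00001- is not among them.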
The file naming convention is subject to change. Therefore, when reading target tables, we recommend that you read everything inside the base prefix for the table.
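Because the convention can change, a reader that wants the batch and partition indexes should treat them as optional metadata and still read every file under the base prefix. The following is a minimal sketch of that idea; parse_relative_key is a hypothetical helper, the key is assumed to be relative to the table's base prefix, and the batch index is assumed to be numeric.

```python
import re

# Matches both conventions: an optional "batch_index/" folder, then
# "part-partition_index-" followed by the rest of the file name.
_PART_FILE = re.compile(r"^(?:(?P<batch>\d+)/)?part-(?P<partition>\d+)-.+$")


def parse_relative_key(relative_key):
    """Return (batch_index, partition_index), or None if the name is unrecognized.

    batch_index is None for files that follow the older convention.
    A None result means "unknown naming", not "skip this file".
    """
    match = _PART_FILE.match(relative_key)
    if match is None:
        return None
    batch = match.group("batch")
    batch_index = int(batch) if batch is not None else None
    return batch_index, int(match.group("partition"))
```

The pattern deliberately does not constrain the random_uuid or the extension, since those parts are the most likely to vary.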