Loading a shapefile into Amazon Redshift
You can use the COPY command to ingest Esri shapefiles stored in Amazon S3 into Amazon Redshift tables.
A shapefile stores the geometric location and attribute information of geographic features in a vector format. The shapefile format can describe spatial objects such as points, lines, and polygons. For more information about shapefiles, see Shapefile.
The COPY command supports the data format parameter SHAPEFILE.
By default, the first column of the shapefile is either a GEOMETRY or IDENTITY column.
All subsequent columns follow the order specified in the shapefile.
However, the target table doesn't need to be in this exact layout because you can use COPY column mapping to define the order.
For information about the COPY command shapefile support, see SHAPEFILE.
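As a minimal sketch of the default layout and of column mapping, the following statements load the same shapefile into two tables. The table names, column names, S3 path, and IAM role ARN are placeholders chosen for illustration; they assume a shapefile whose attribute columns are osm_id, code, fclass, and name, in that order.

```sql
-- Hypothetical target table: the GEOMETRY column comes first,
-- followed by the attribute columns in shapefile order.
CREATE TABLE norway_natural (
    wkb_geometry GEOMETRY,
    osm_id       BIGINT,
    code         INT,
    fclass       VARCHAR(64),
    name         VARCHAR(256)
);

-- Default layout: no column list is needed.
COPY norway_natural
FROM 's3://amzn-s3-demo-bucket/shapefiles/norway/gis_osm_natural_free_1.shp'
FORMAT SHAPEFILE
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole';

-- Hypothetical table with the geometry column last.
CREATE TABLE norway_natural_order (
    osm_id       BIGINT,
    code         INT,
    fclass       VARCHAR(64),
    name         VARCHAR(256),
    wkb_geometry GEOMETRY
);

-- Column mapping: list the target columns in the order
-- in which the shapefile supplies them (geometry first).
COPY norway_natural_order (wkb_geometry, osm_id, code, fclass, name)
FROM 's3://amzn-s3-demo-bucket/shapefiles/norway/gis_osm_natural_free_1.shp'
FORMAT SHAPEFILE
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole';
```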
In some cases, the resulting geometry size might be greater than the maximum for storing a geometry in Amazon Redshift. If so, you can use the COPY option SIMPLIFY or SIMPLIFY AUTO to simplify the geometries during ingestion as follows:
Specify SIMPLIFY tolerance to simplify all geometries during ingestion using the Ramer-Douglas-Peucker algorithm and the given tolerance.
Specify SIMPLIFY AUTO without tolerance to simplify only geometries that are larger than the maximum size using the Ramer-Douglas-Peucker algorithm. This approach calculates the minimum tolerance that is large enough to store the object within the maximum size limit.
Specify SIMPLIFY AUTO max_tolerance to simplify only geometries that are larger than the maximum size using the Ramer-Douglas-Peucker algorithm and the automatically calculated tolerance. This approach makes sure that the tolerance doesn't exceed the maximum tolerance.
For information about the maximum size of a GEOMETRY data value, see Considerations when using spatial data with Amazon Redshift.
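The three SIMPLIFY variants might be written as in the following sketch. The table name, S3 path, IAM role ARN, and tolerance values are placeholders for illustration only; a suitable tolerance depends on the units of your data.

```sql
-- Simplify every geometry with a fixed tolerance (0.0001 is an arbitrary example value).
COPY norway_natural
FROM 's3://amzn-s3-demo-bucket/shapefiles/norway/gis_osm_natural_free_1.shp'
FORMAT SHAPEFILE
SIMPLIFY 0.0001
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole';

-- Simplify only oversized geometries, letting COPY calculate the tolerance.
COPY norway_natural
FROM 's3://amzn-s3-demo-bucket/shapefiles/norway/gis_osm_natural_free_1.shp'
FORMAT SHAPEFILE
SIMPLIFY AUTO
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole';

-- Simplify only oversized geometries, capping the calculated tolerance
-- (1.1E-05 is an arbitrary example value).
COPY norway_natural
FROM 's3://amzn-s3-demo-bucket/shapefiles/norway/gis_osm_natural_free_1.shp'
FORMAT SHAPEFILE
SIMPLIFY AUTO 1.1E-05
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole';
```

In practice you would run only one of these statements against a given table; they are shown together for comparison.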
In some cases, the tolerance is too low for the record to shrink below the maximum size of a GEOMETRY data value. In these cases, you can use the MAXERROR option of the COPY command to ignore all or up to a certain number of ingestion errors.
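For illustration, a capped tolerance can be combined with MAXERROR so that a few records that still don't fit don't cause the load to fail. The table name, S3 path, IAM role ARN, tolerance, and error count below are placeholder values.

```sql
-- Cap the calculated tolerance at 1.1E-05 and let up to 2 records
-- fail to load before the COPY itself fails.
COPY norway_water
FROM 's3://amzn-s3-demo-bucket/shapefiles/norway/gis_osm_water_a_free_1.shp'
FORMAT SHAPEFILE
SIMPLIFY AUTO 1.1E-05
MAXERROR 2
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole';
```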
The COPY command also supports loading GZIP-compressed shapefiles. To do this, specify the COPY GZIP parameter. With this option, each shapefile component must be compressed independently, and all components must share the same compression suffix.
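A compressed load might look like the following sketch. It assumes that each component was compressed separately and sits under the same S3 prefix with a shared .gz suffix (for example .shp.gz, .shx.gz, and .dbf.gz); the table name, path, and IAM role ARN are placeholders.

```sql
-- Load a shapefile whose components were each compressed with gzip.
COPY norway_natural
FROM 's3://amzn-s3-demo-bucket/shapefiles/norway/compressed/gis_osm_natural_free_1.shp.gz'
FORMAT SHAPEFILE
GZIP
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole';
```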
If a projection description file (.prj) exists with the shapefile, Amazon Redshift uses it to determine the spatial reference system ID (SRID). If the SRID is valid, the resulting geometry has this SRID assigned. If the SRID value associated with the input geometry doesn't exist, the resulting geometry has the SRID value zero. You can disable automatic detection of the spatial reference system ID at the session level by using SET read_srid_on_shapefile_ingestion to OFF.
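As a sketch, the session setting can be turned off before the load, and the SRIDs that were assigned can be inspected afterward with ST_SRID. The table name, geometry column name, S3 path, and IAM role ARN are placeholders.

```sql
-- Turn off SRID detection from the .prj file for this session only.
SET read_srid_on_shapefile_ingestion TO OFF;

COPY norway_natural
FROM 's3://amzn-s3-demo-bucket/shapefiles/norway/gis_osm_natural_free_1.shp'
FORMAT SHAPEFILE
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole';

-- Check which SRID values the loaded geometries carry.
SELECT ST_SRID(wkb_geometry) AS srid, COUNT(*) AS geometries
FROM norway_natural
GROUP BY 1;
```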
Query the SYS_SPATIAL_SIMPLIFY or SVL_SPATIAL_SIMPLIFY system views to view which records have been simplified, along with the calculated tolerance. When you specify SIMPLIFY tolerance, the view contains a record for each COPY operation. Otherwise, it contains a record for each simplified geometry. For more information, see SYS_SPATIAL_SIMPLIFY or SVL_SPATIAL_SIMPLIFY.
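One way to inspect the outcome of the most recent load is sketched below. It assumes that the COPY ran in the current session, so that pg_last_copy_id() returns its query ID, and it selects all columns rather than assuming a particular column list.

```sql
-- Show simplification details recorded for the last COPY in this session.
SELECT *
FROM svl_spatial_simplify
WHERE query = pg_last_copy_id();
```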
For examples of loading a shapefile, see Loading a shapefile into Amazon Redshift.