CREATE DATABASE - Amazon Redshift
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

CREATE DATABASE

Creates a new database.

To create a database, you must be a superuser or have the CREATEDB privilege. To create a database associated with a zero-ETL integration, you must be a superuser or have both CREATEDB and CREATEUSER privileges.

You can't run CREATE DATABASE within a transaction block (BEGIN ... END). For more information about transactions, see Serializable isolation.
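As a quick illustration (mydb is a placeholder name), wrapping the statement in BEGIN ... END fails, so issue it on its own:

```sql
-- Fails: CREATE DATABASE can't run inside a transaction block
BEGIN;
CREATE DATABASE mydb;
END;

-- Run it as a standalone statement instead
CREATE DATABASE mydb;
```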

Syntax

CREATE DATABASE database_name
[ { [ WITH ]
    [ OWNER [=] db_owner ]
    [ CONNECTION LIMIT { limit | UNLIMITED } ]
    [ COLLATE { CASE_SENSITIVE | CASE_INSENSITIVE } ]
    [ ISOLATION LEVEL { SERIALIZABLE | SNAPSHOT } ] }
| { [ WITH PERMISSIONS ] FROM DATASHARE datashare_name OF [ ACCOUNT account_id ] NAMESPACE namespace_guid }
| { FROM { { ARN '<arn>' } { WITH DATA CATALOG SCHEMA '<schema>' | WITH NO DATA CATALOG SCHEMA } }
       | { INTEGRATION '<integration_id>' [ DATABASE '<source_database>' ] [ SET { REFRESH_INTERVAL <interval> } ] } }
| { IAM_ROLE { default | 'SESSION' | 'arn:aws:iam::<account-id>:role/<role-name>' } } ]

Parameters

database_name

Name of the new database. For more information about valid names, see Names and identifiers.

WITH

Optional keyword.

OWNER

Specifies a database owner.

=

Optional character.

db_owner

Username for the database owner.

CONNECTION LIMIT { limit | UNLIMITED }

The maximum number of database connections users are permitted to have open concurrently. The limit isn't enforced for superusers. Use the UNLIMITED keyword to permit the maximum number of concurrent connections. A limit on the number of connections for each user might also apply. For more information, see CREATE USER. The default is UNLIMITED. To view current connections, query the STV_SESSIONS system view.

Note

If both user and database connection limits apply, an unused connection slot must be available that is within both limits when a user attempts to connect.
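As a sketch using placeholder names and values, the following caps a database at 25 concurrent connections and then inspects the open sessions:

```sql
-- Cap concurrent connections to this database at 25 (not enforced for superusers)
CREATE DATABASE analytics_db CONNECTION LIMIT 25;

-- View current connections
SELECT * FROM stv_sessions;
```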

COLLATE { CASE_SENSITIVE | CASE_INSENSITIVE }

A clause that specifies whether string search or comparison is CASE_SENSITIVE or CASE_INSENSITIVE. The default is CASE_SENSITIVE.

ISOLATION LEVEL { SERIALIZABLE | SNAPSHOT }

A clause that specifies the isolation level used when queries run against a database.

  • SERIALIZABLE isolation – Provides full serializability for concurrent transactions. For more information, see Serializable isolation.

  • SNAPSHOT isolation – Provides an isolation level with protection against update and delete conflicts. This is the default for a database created in a provisioned cluster or serverless namespace.

You can view which concurrency model your database is running as follows:

  • Query the STV_DB_ISOLATION_LEVEL catalog view. For more information, see STV_DB_ISOLATION_LEVEL.

    SELECT * FROM stv_db_isolation_level;
  • Query the PG_DATABASE_INFO view.

    SELECT datname, datconfig FROM pg_database_info;

    The isolation level per database appears next to the key concurrency_model. A value of 1 denotes SNAPSHOT. A value of 2 denotes SERIALIZABLE.

In Amazon Redshift databases, both SERIALIZABLE and SNAPSHOT isolation are types of serializable isolation levels. That is, dirty reads, non-repeatable reads, and phantom reads are prevented according to the SQL standard. Both isolation levels guarantee that a transaction operates on a snapshot of data as it exists when the transaction begins, and that no other transaction can change that snapshot. However, SNAPSHOT isolation doesn't provide full serializability, because it doesn't prevent write skew inserts and updates on different table rows.

The following scenario illustrates write skew updates using the SNAPSHOT isolation level. A table named Numbers contains a column named digits that contains 0 and 1 values. Neither user's UPDATE statement overlaps rows changed by the other user, yet after both commit, the 0 and 1 values are swapped. The SQL they run follows this timeline with these results:

Time | User 1 action                               | User 2 action
-----+---------------------------------------------+---------------------------------------------
  1  | BEGIN;                                      |
  2  |                                             | BEGIN;
  3  | SELECT * FROM Numbers;                      |
     | (returns 0, 1)                              |
  4  |                                             | SELECT * FROM Numbers;
     |                                             | (returns 0, 1)
  5  | UPDATE Numbers SET digits=0 WHERE digits=1; |
  6  | SELECT * FROM Numbers;                      |
     | (returns 0, 0)                              |
  7  | COMMIT;                                     |
  8  |                                             | UPDATE Numbers SET digits=1 WHERE digits=0;
  9  |                                             | SELECT * FROM Numbers;
     |                                             | (returns 1, 1)
 10  |                                             | COMMIT;
 11  | SELECT * FROM Numbers;                      |
     | (returns 1, 0)                              |
 12  |                                             | SELECT * FROM Numbers;
     |                                             | (returns 1, 0)

If the same scenario is run using serializable isolation, Amazon Redshift terminates user 2 due to a serializable violation and returns error 1023. For more information, see How to fix serializable isolation errors. In this case, only user 1 can commit successfully. Not all workloads require serializable isolation; for those that don't, snapshot isolation is sufficient as the target isolation level for your database.
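For workloads that do need full serializability, the isolation level can be set explicitly at creation time (sampledb2 is a placeholder name):

```sql
CREATE DATABASE sampledb2 ISOLATION LEVEL SERIALIZABLE;

-- Verify the concurrency model (1 = SNAPSHOT, 2 = SERIALIZABLE)
SELECT * FROM stv_db_isolation_level;
```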

FROM ARN '<ARN>'

The Amazon Glue database ARN to use to create the database.

{ WITH DATA CATALOG SCHEMA '<schema>' | WITH NO DATA CATALOG SCHEMA }
Note

This parameter is only applicable if your CREATE DATABASE command also uses the FROM ARN parameter.

Specifies whether to create the database using a schema to help access objects in the Amazon Glue Data Catalog.

FROM INTEGRATION '<integration_id>' [ DATABASE '<source_database>' ] [SET {REFRESH_INTERVAL <interval>} ]

Specifies that the database is created using a zero-ETL integration identifier. You can retrieve the integration_id from the SVV_INTEGRATION system view. For Aurora PostgreSQL zero-ETL integrations, you also need to specify the source_database name, which can also be retrieved from SVV_INTEGRATION. The SET REFRESH_INTERVAL clause sets the approximate time interval, in seconds, at which data is refreshed from the zero-ETL source to the target database. For zero-ETL integrations whose source type is Aurora MySQL, Aurora PostgreSQL, or RDS for MySQL, the value can range from 0 to 432,000 seconds (5 days), and the default interval is zero (0) seconds. For Amazon DynamoDB zero-ETL integrations, the value can range from 900 to 432,000 seconds (15 minutes to 5 days), and the default interval is 900 seconds (15 minutes).

For an example, see Create databases to receive results of zero-ETL integrations. For more information about creating databases with zero-ETL integrations, see Creating destination databases in Amazon Redshift in the Amazon Redshift Management Guide.

IAM_ROLE { default | 'SESSION' | 'arn:aws:iam::<Amazon Web Services account-id>:role/<role-name>' }
Note

This parameter is only applicable if your CREATE DATABASE command also uses the FROM ARN parameter.

If you specify an IAM role that is associated with the cluster when running the CREATE DATABASE command, Amazon Redshift will use the role’s credentials when you run queries on the database.

Specify the default keyword to use the IAM role that is set as default and associated with the cluster.

Use 'SESSION' if you connect to your Amazon Redshift cluster using a federated identity and access the tables from the external schema created using this command. For an example of using a federated identity, see Using a federated identity to manage Amazon Redshift access to local resources and Amazon Redshift Spectrum external tables, which explains how to configure federated identity.

Use the Amazon Resource Name (ARN) for an IAM role that your cluster uses for authentication and authorization. As a minimum, the IAM role must have permission to perform a LIST operation on the Amazon S3 bucket to be accessed and a GET operation on the Amazon S3 objects the bucket contains. To learn more about using IAM_ROLE when creating a database using Amazon Glue Data Catalog for datashares, see Working with Lake Formation-managed datashares as a consumer.

The following shows the syntax for the IAM_ROLE parameter string for a single ARN.

IAM_ROLE 'arn:aws:iam::<aws-account-id>:role/<role-name>'

You can chain roles so that your cluster can assume another IAM role, possibly belonging to another account. You can chain up to 10 roles. For more information, see Chaining IAM roles in Amazon Redshift Spectrum.

To this IAM role, attach an IAM permissions policy similar to the following.

{ "Version": "2012-10-17", "Statement": [ { "Sid": "AccessSecret", "Effect": "Allow", "Action": [ "secretsmanager:GetResourcePolicy", "secretsmanager:GetSecretValue", "secretsmanager:DescribeSecret", "secretsmanager:ListSecretVersionIds" ], "Resource": "arn:aws:secretsmanager:us-west-2:123456789012:secret:my-rds-secret-VNenFy" }, { "Sid": "VisualEditor1", "Effect": "Allow", "Action": [ "secretsmanager:GetRandomPassword", "secretsmanager:ListSecrets" ], "Resource": "*" } ] }

For the steps to create an IAM role to use with federated query, see Creating a secret and an IAM role to use federated queries.

Note

Don't include spaces in the list of chained roles.

The following shows the syntax for chaining three roles.

IAM_ROLE 'arn:aws:iam::<aws-account-id>:role/<role-1-name>,arn:aws:iam::<aws-account-id>:role/<role-2-name>,arn:aws:iam::<aws-account-id>:role/<role-3-name>'

Syntax for using CREATE DATABASE with a datashare

The following syntax describes the CREATE DATABASE command used to create databases from a datashare for sharing data within the same Amazon account.

CREATE DATABASE database_name [ WITH PERMISSIONS ] FROM DATASHARE datashare_name OF [ ACCOUNT account_id ] NAMESPACE namespace_guid

The following syntax describes the CREATE DATABASE command used to create databases from a datashare for sharing data across Amazon accounts.

CREATE DATABASE database_name [ WITH PERMISSIONS ] FROM DATASHARE datashare_name OF ACCOUNT account_id NAMESPACE namespace_guid

Parameters for using CREATE DATABASE with a datashare

FROM DATASHARE

A keyword that indicates where the datashare is located.

datashare_name

The name of the datashare that the consumer database is created on.

WITH PERMISSIONS

Specifies that the database created from the datashare requires object-level permissions to access individual database objects. Without this clause, users or roles granted the USAGE permission on the database will automatically have access to all database objects in the database.

NAMESPACE namespace_guid

A value that specifies the producer namespace that the datashare belongs to.

ACCOUNT account_id

A value that specifies the producer account that the datashare belongs to.

Usage notes for CREATE DATABASE for data sharing

As a database superuser, when you use CREATE DATABASE to create databases from datashares within the Amazon account, specify the NAMESPACE option. The ACCOUNT option is optional. When you use CREATE DATABASE to create databases from datashares across Amazon accounts, specify both the ACCOUNT and NAMESPACE from the producer.

You can create only one consumer database for one datashare on a consumer cluster. You can't create multiple consumer databases referring to the same datashare.
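As a sketch using placeholder identifiers, a cross-account consumer database that also enforces object-level permissions would look like the following:

```sql
-- WITH PERMISSIONS requires object-level grants on individual shared objects
CREATE DATABASE sales_db WITH PERMISSIONS FROM DATASHARE salesshare
OF ACCOUNT '123456789012' NAMESPACE '13b8833d-17c6-4f16-8fe4-1a018f5ed00d';
```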

CREATE DATABASE from Amazon Glue Data Catalog

To create a database using an Amazon Glue database ARN, specify the ARN in your CREATE DATABASE command.

CREATE DATABASE sampledb FROM ARN <glue-database-arn> WITH NO DATA CATALOG SCHEMA;

Optionally, you can also supply a value into the IAM_ROLE parameter. For more information about the parameter and accepted values, see Parameters.

The following are examples that demonstrate how to create a database from an ARN using an IAM role.

CREATE DATABASE sampledb FROM ARN <glue-database-arn> WITH NO DATA CATALOG SCHEMA IAM_ROLE <iam-role-arn>;
CREATE DATABASE sampledb FROM ARN <glue-database-arn> WITH NO DATA CATALOG SCHEMA IAM_ROLE default;

You can also create a database using a DATA CATALOG SCHEMA.

CREATE DATABASE sampledb FROM ARN <glue-database-arn> WITH DATA CATALOG SCHEMA <sample_schema> IAM_ROLE default;

Create databases to receive results of zero-ETL integrations

To create a database using a zero-ETL integration identity, specify the integration_id in your CREATE DATABASE command.

CREATE DATABASE destination_db_name FROM INTEGRATION 'integration_id';

For example, first retrieve the integration IDs from the SVV_INTEGRATION system view.

SELECT integration_id FROM SVV_INTEGRATION;

Then use one of the integration ids retrieved to create the database that receives zero-ETL integrations.

CREATE DATABASE sampledb FROM INTEGRATION 'a1b2c3d4-5678-90ab-cdef-EXAMPLE11111';

When the zero-ETL integration's source database name is required, specify it with the DATABASE clause. For example:

CREATE DATABASE sampledb FROM INTEGRATION 'a1b2c3d4-5678-90ab-cdef-EXAMPLE11111' DATABASE 'sourcedb';

You can also set a refresh interval for the database. For example, to set the refresh interval to 7,200 seconds for data from a zero-ETL integration source:

CREATE DATABASE myacct_mysql FROM INTEGRATION 'a1b2c3d4-5678-90ab-cdef-EXAMPLE11111' SET REFRESH_INTERVAL 7200;

Query the SVV_INTEGRATION catalog view for information about a zero-ETL integration, such as integration_id, target_database, source, and refresh_interval.

SELECT * FROM svv_integration;

CREATE DATABASE limits

Amazon Redshift enforces these limits for databases:

  • Maximum of 60 user-defined databases per cluster.

  • Maximum of 127 bytes for a database name.

  • A database name can't be a reserved word.

Database collation

Collation is a set of rules that defines how a database engine compares and sorts character-type data in SQL. Case-insensitive collation is the most commonly used collation. Amazon Redshift uses case-insensitive collation to facilitate migration from other data warehouse systems. With native support for case-insensitive collation, Amazon Redshift continues to use important tuning and optimization methods, such as distribution keys, sort keys, and range-restricted scans.

The COLLATE clause specifies the default collation for all CHAR and VARCHAR columns in the database. If CASE_INSENSITIVE is specified, all CHAR or VARCHAR columns use case-insensitive collation. For information about collation, see Collation sequences.

Data inserted or ingested into case-insensitive columns keeps its original case. But all comparison-based string operations, including sorting and grouping, are case-insensitive. Pattern-matching operations such as LIKE predicates, SIMILAR TO, and regular expression functions are also case-insensitive.

The following SQL operations support applicable collation semantics:

  • Comparison operators: =, <>, <, <=, >, >=.

  • LIKE operator

  • ORDER BY clauses

  • GROUP BY clauses

  • Aggregate functions that use string comparison, such as MIN, MAX, and LISTAGG

  • Window functions, such as PARTITION BY clauses and ORDER BY clauses

  • Scalar functions greatest() and least(), STRPOS(), REGEXP_COUNT(), REGEXP_REPLACE(), REGEXP_INSTR(), REGEXP_SUBSTR()

  • DISTINCT clause

  • UNION, INTERSECT and EXCEPT

  • IN LIST

For external queries, including Amazon Redshift Spectrum and Aurora PostgreSQL federated queries, the collation of a VARCHAR or CHAR column is the same as the current database-level collation.

The following example queries an Amazon Redshift Spectrum table:

SELECT ci_varchar FROM spectrum.test_collation
WHERE ci_varchar = 'AMAZON';

ci_varchar
----------
amazon
Amazon
AMAZON
AmaZon
(4 rows)

For information on how to create tables using database collation, see CREATE TABLE.

For information on the COLLATE function, see COLLATE function.

Database collation limitations

The following are limitations when working with database collation in Amazon Redshift:

  • All system tables and views, including PG catalog tables and Amazon Redshift system tables, are case-sensitive.

  • When the consumer database and producer database have different database-level collations, Amazon Redshift doesn't support cross-database and cross-cluster queries.

  • Amazon Redshift doesn't support case-insensitive collation in leader node-only query.

    The following example shows an unsupported case-insensitive query and the error that Amazon Redshift sends:

    SELECT collate(usename, 'case_insensitive') FROM pg_user;

    ERROR:  Case insensitive collation is not supported in leader node only query.
  • Amazon Redshift doesn't support interaction between case-sensitive and case-insensitive columns, such as comparison, function, join, or set operations.

    The following examples show errors when case-sensitive and case-insensitive columns interact:

    CREATE TABLE test (ci_col varchar(10) COLLATE case_insensitive, cs_col varchar(10) COLLATE case_sensitive, cint int, cbigint bigint);
    SELECT ci_col = cs_col FROM test;
    ERROR:  Query with different collations is not supported yet.

    SELECT concat(ci_col, cs_col) FROM test;
    ERROR:  Query with different collations is not supported yet.

    SELECT ci_col FROM test UNION SELECT cs_col FROM test;
    ERROR:  Query with different collations is not supported yet.

    SELECT * FROM test a, test b WHERE a.ci_col = b.cs_col;
    ERROR:  Query with different collations is not supported yet.

    SELECT COALESCE(ci_col, cs_col) FROM test;
    ERROR:  Query with different collations is not supported yet.

    SELECT CASE WHEN cint > 0 THEN ci_col ELSE cs_col END FROM test;
    ERROR:  Query with different collations is not supported yet.
  • Amazon Redshift doesn't support collation for SUPER data type. Creating SUPER columns in case-insensitive databases and interactions between SUPER and case-insensitive columns aren't supported.

    The following example creates a table with SUPER as the data type in a case-insensitive database:

    CREATE TABLE super_table (a super);
    ERROR:  SUPER column is not supported in case insensitive database.

    The following example compares a case-insensitive string column with SUPER data:

    CREATE TABLE test_super_collation (s super, c varchar(10) COLLATE case_insensitive, i int);
    SELECT s = c FROM test_super_collation;
    ERROR:  Coercing from case insensitive string to SUPER is not supported.

To make these queries work, use the COLLATE function to convert collation of one column to match the other. For more information, see COLLATE function.
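For example, building on the test table above, one way to make a mixed-collation comparison run is to convert one side with the COLLATE function (a sketch; the column names come from the earlier example):

```sql
-- Compare case-sensitively by converting the case-insensitive column
SELECT collate(ci_col, 'case_sensitive') = cs_col FROM test;

-- Or compare case-insensitively by converting the case-sensitive column
SELECT ci_col = collate(cs_col, 'case_insensitive') FROM test;
```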

Examples

Creating a database

The following example creates a database named TICKIT and gives ownership to the user DWUSER.

create database tickit with owner dwuser;

To view details about databases, query the PG_DATABASE_INFO catalog table.

select datname, datdba, datconnlimit
from pg_database_info
where datdba > 1;

 datname | datdba | datconnlimit
---------+--------+-------------
 admin   | 100    | UNLIMITED
 reports | 100    | 100
 tickit  | 100    | 100

The following example creates a database named sampledb with SNAPSHOT isolation level.

CREATE DATABASE sampledb ISOLATION LEVEL SNAPSHOT;

The following example creates the database sales_db from the datashare salesshare.

CREATE DATABASE sales_db FROM DATASHARE salesshare OF NAMESPACE '13b8833d-17c6-4f16-8fe4-1a018f5ed00d';

Database collation examples

Creating a case-insensitive database

The following example creates the sampledb database, creates the T1 table, and inserts data into the T1 table.

create database sampledb collate case_insensitive;

Connect to the new database that you just created using your SQL client. When using Amazon Redshift query editor v2, choose sampledb in the Editor. When using RSQL, use a command like the following.

\connect sampledb;
CREATE TABLE T1 ( col1 Varchar(20) distkey sortkey );
INSERT INTO T1 VALUES ('bob'), ('john'), ('Mary'), ('JOHN'), ('Bob');

The following query then matches both john and JOHN:

SELECT * FROM T1 WHERE col1 = 'John';

 col1
------
 john
 JOHN
(2 rows)
Ordering in a case-insensitive order

The following example shows case-insensitive ordering with table T1. The ordering of Bob and bob, or John and john, is nondeterministic because they are equal in a case-insensitive column.

SELECT * FROM T1 ORDER BY 1;

 col1
------
 bob
 Bob
 JOHN
 john
 Mary
(5 rows)

Similarly, the following example shows case-insensitive ordering with the GROUP BY clause. Bob and bob are equal and belong to the same group. It is nondeterministic which one shows up in the result.

SELECT col1, count(*) FROM T1 GROUP BY 1;

 col1 | count
------+------
 Mary | 1
 bob  | 2
 JOHN | 2
(3 rows)
Querying with a window function on case-insensitive columns

The following example queries a window function on a case-insensitive column.

SELECT col1, rank() over (ORDER BY col1) FROM T1;

 col1 | rank
------+------
 bob  | 1
 Bob  | 1
 john | 3
 JOHN | 3
 Mary | 5
(5 rows)
Querying with the DISTINCT keyword

The following example queries the T1 table with the DISTINCT keyword.

SELECT DISTINCT col1 FROM T1;

 col1
------
 bob
 Mary
 john
(3 rows)
Querying with the UNION clause

The following example shows the results from the UNION of the tables T1 and T2.

CREATE TABLE T2 AS SELECT * FROM T1;
SELECT col1 FROM T1 UNION SELECT col1 FROM T2;

 col1
------
 john
 bob
 Mary
(3 rows)