What is a datashare? - Amazon Redshift

What is a datashare?

A datashare is the unit of sharing data in Amazon Redshift. Use datashares to share data within the same Amazon Web Services account or across different Amazon Web Services accounts. You can also share data for read purposes across different Amazon Redshift clusters.

Each datashare is associated with a specific database in your Amazon Redshift cluster.

A producer cluster administrator can create datashares and add datashare objects to share data with other clusters, referred to as outbound shares. A consumer cluster administrator can receive datashares from other clusters, referred to as inbound shares. For details on producers and consumers, see Datashare producers and consumers.

Datashare objects are objects from specific databases on a cluster that producer cluster administrators can add to datashares to be shared with data consumers. Datashare objects are read-only for data consumers. Examples of datashare objects are tables, views, and user-defined functions. You can add datashare objects to datashares while creating datashares or editing a datashare at any time.
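
As a minimal sketch of the producer side described above, the following Python helper renders the SQL statements a producer cluster administrator might run to create a datashare and add objects to it. The helper name `producer_statements` and the share, schema, and table names are hypothetical. The statements follow the CREATE DATASHARE and ALTER DATASHARE commands; check them against the SQL reference before use.

```python
import re

# Accept only plain identifiers so that nothing unexpected is spliced into SQL.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _check(name: str) -> str:
    """Return the name unchanged, or raise if it is not a plain identifier."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"not a plain identifier: {name!r}")
    return name


def producer_statements(share: str, schema: str, tables: list[str]) -> list[str]:
    """Render, in order, the SQL for creating a datashare and adding objects."""
    share, schema = _check(share), _check(schema)
    statements = [
        f"CREATE DATASHARE {share};",
        # The schema is added first so that its tables can be added afterwards.
        f"ALTER DATASHARE {share} ADD SCHEMA {schema};",
    ]
    for table in tables:
        statements.append(
            f"ALTER DATASHARE {share} ADD TABLE {schema}.{_check(table)};"
        )
    return statements


# Hypothetical names, for illustration only.
for statement in producer_statements("salesshare", "public", ["sales", "venue"]):
    print(statement)
```

Because objects can be added at any time, the same ALTER DATASHARE statements can be rendered later for tables that did not exist when the datashare was created.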

Data sharing continues to work when clusters are resized or when the producer cluster is paused.

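Amazon Redshift supports the following types of datashares: standard datashares, Amazon Web Services Data Exchange datashares, and Amazon Lake Formation-managed datashares.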

Standard datashares

With standard datashares, you can share data across provisioned clusters, serverless workgroups, Availability Zones, Amazon Web Services accounts, and Amazon Web Services Regions. You can share between cluster types as well as between provisioned clusters and Amazon Redshift Serverless.

To share data, note the following provisioned cluster, serverless namespace, and Amazon Web Services account identifiers:

  • Provisioned cluster namespaces identify Amazon Redshift provisioned clusters. A namespace globally unique identifier (GUID) is automatically created during provisioned cluster creation and attached to the cluster. A namespace Amazon Resource Name (ARN) is in the arn:{partition}:redshift:{region}:{account-id}:namespace:{namespace-guid} format. You can see the namespace of a provisioned cluster on the cluster details page on the Amazon Redshift console.

    In the data sharing workflow, the namespace GUID value and the cluster namespace ARN are used to share data with clusters in the Amazon Web Services account. You can also find the namespace for the current cluster by using the current_namespace function.

  • Serverless namespaces identify Amazon Redshift Serverless instances. A namespace globally unique identifier (GUID) is automatically created when the Amazon Redshift Serverless instance is created and is attached to the instance. A serverless namespace ARN is in the arn:{partition}:redshift-serverless:{region}:{account-id}:namespace/{namespace-guid} format.

  • Amazon Web Services accounts can be consumers for datashares and are each represented by a 12-digit Amazon Web Services account ID.
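
The identifier formats above differ in small ways that are easy to mistype: the provisioned ARN uses the redshift service and a colon before the GUID, while the serverless ARN uses redshift-serverless and a slash. The following Python sketch builds both forms and checks the 12-digit account ID. The function name `namespace_arn` and all sample values are hypothetical, and the 8-4-4-4-12 hexadecimal GUID layout is an assumption rather than something this page states.

```python
import re

# Assumption: namespace GUIDs use the usual 8-4-4-4-12 hexadecimal layout.
_GUID = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")
_ACCOUNT_ID = re.compile(r"[0-9]{12}")


def namespace_arn(partition: str, region: str, account_id: str,
                  namespace_guid: str, serverless: bool = False) -> str:
    """Build a namespace ARN in one of the two formats quoted above."""
    if not _ACCOUNT_ID.fullmatch(account_id):
        raise ValueError("account ID must be exactly 12 digits")
    guid = namespace_guid.lower()
    if not _GUID.fullmatch(guid):
        raise ValueError("namespace GUID is not in 8-4-4-4-12 hex form")
    if serverless:
        # Serverless: redshift-serverless service, '/' before the GUID.
        return (f"arn:{partition}:redshift-serverless:{region}:"
                f"{account_id}:namespace/{guid}")
    # Provisioned: redshift service, ':' before the GUID.
    return f"arn:{partition}:redshift:{region}:{account_id}:namespace:{guid}"


# Placeholder values, for illustration only.
sample_guid = "01234567-89ab-cdef-0123-456789abcdef"
print(namespace_arn("aws-cn", "cn-north-1", "123456789012", sample_guid))
print(namespace_arn("aws-cn", "cn-north-1", "123456789012", sample_guid,
                    serverless=True))
```

On a running cluster, the GUID itself can be read with `SELECT current_namespace;`, as noted above.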

For standard datashares, consider the following:

  • When a producer cluster is deleted, Amazon Redshift deletes the datashares created by the producer cluster. When a producer cluster is backed up and restored, the created datashares still persist on the restored cluster. However, datashare permissions granted to other clusters are no longer valid on the restored cluster. Re-grant usage permissions of datashares to desired consumer clusters. The consumer database on the consumer cluster points to the datashare from the original cluster where the snapshot is taken. To query the shared data from the restored cluster, the consumer cluster administrator creates a different database. Or the administrator can drop and recreate an existing consumer database to use the datashare from the newly restored cluster.

  • When a consumer cluster is deleted and restored from a snapshot, the previous access shared to this cluster would no longer be valid and visible. If access to datashares is still required on the restored consumer cluster, the producer cluster administrator must grant usage of datashares to the restored consumer cluster again. The consumer cluster administrator must drop any stale consumer databases created from the inactive datashares. Then the administrator must recreate the consumer database from the datashare, after the producer re-granted the permissions. As the cluster namespace GUID is different on a restored cluster from the original cluster, re-grant datashare permissions when the consumer or producer cluster is restored from backup.
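
The restore cases above come down to the same pair of actions: the producer grants usage again, and the consumer drops the stale database and recreates it. The following Python sketch renders those statements. The function name `regrant_after_restore` and every name and GUID in the example are hypothetical, and the inputs are assumed to be trusted values. Pass whichever namespace GUIDs are current after the restore, because a restored cluster has a different GUID from the original.

```python
def regrant_after_restore(share: str, consumer_db: str,
                          producer_guid: str, consumer_guid: str) -> dict[str, list[str]]:
    """Render the SQL each side might run after a cluster is restored.

    producer_guid and consumer_guid are the namespace GUIDs that are
    current after the restore, not the ones recorded before it.
    """
    return {
        # Run on the producer: the earlier grant is no longer valid.
        "producer": [
            f"GRANT USAGE ON DATASHARE {share} TO NAMESPACE '{consumer_guid}';",
        ],
        # Run on the consumer, after the producer has granted usage again.
        "consumer": [
            f"DROP DATABASE {consumer_db};",
            f"CREATE DATABASE {consumer_db} FROM DATASHARE {share} "
            f"OF NAMESPACE '{producer_guid}';",
        ],
    }


# Placeholder names and GUIDs, for illustration only.
plan = regrant_after_restore(
    share="salesshare",
    consumer_db="sales_db",
    producer_guid="11111111-2222-3333-4444-555555555555",
    consumer_guid="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
)
for side, statements in plan.items():
    for statement in statements:
        print(f"[{side}] {statement}")
```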

Amazon Web Services Data Exchange datashares

An Amazon Web Services Data Exchange datashare is a unit of licensing for sharing your data through Amazon Web Services Data Exchange. Amazon manages all billing and payments associated with subscriptions to Amazon Web Services Data Exchange and use of Amazon Redshift data sharing. Approved data providers can add Amazon Web Services Data Exchange datashares to Amazon Web Services Data Exchange products. When customers subscribe to a product with Amazon Web Services Data Exchange datashares, they get access to the datashares in the product.

Amazon Web Services Data Exchange for Amazon Redshift makes it convenient to license access to your Amazon Redshift data through Amazon Web Services Data Exchange. When a customer subscribes to a product with Amazon Web Services Data Exchange datashares, Amazon Web Services Data Exchange automatically adds the customer as a data consumer on all Amazon Web Services Data Exchange datashares included with the product. Invoices are automatically generated, and payments are centrally collected and automatically disbursed through Amazon Marketplace Entitlement Service.

Providers can license data in Amazon Redshift at a granular level, such as schemas, tables, views, and user-defined functions. You can use the same Amazon Web Services Data Exchange datashare across multiple Amazon Web Services Data Exchange products. Any objects added to the Amazon Web Services Data Exchange datashare are available to consumers. Producers can view all Amazon Web Services Data Exchange datashares managed by Amazon Web Services Data Exchange on their behalf using Amazon Redshift API operations, SQL commands, and the Amazon Redshift console. Customers who subscribe to a product that contains Amazon Web Services Data Exchange datashares have read-only access to the objects in the datashares.

Customers who want to consume third-party producer data can browse the Amazon Web Services Data Exchange catalog to discover and subscribe to datasets in Amazon Redshift. After their Amazon Web Services Data Exchange subscription is active, they can create a database from the datashare in their cluster and query the data in Amazon Redshift.

How Amazon Web Services Data Exchange datashares work

Managing Amazon Web Services Data Exchange datashares as a producer administrator

If you are a data producer (also known as a provider on Amazon Web Services Data Exchange), you can create Amazon Web Services Data Exchange datashares that connect to your Amazon Redshift databases. To add Amazon Web Services Data Exchange datashares to products on Amazon Web Services Data Exchange, you must be a registered Amazon Web Services Data Exchange provider.

For more information on how to get started with Amazon Web Services Data Exchange datashares, see Sharing licensed Amazon Redshift data on Amazon Web Services Data Exchange.

Using Amazon Web Services Data Exchange datashares as a consumer with an active Amazon Web Services Data Exchange subscription

If you are a consumer with an active Amazon Web Services Data Exchange subscription (also known as a subscriber on Amazon Web Services Data Exchange), you can browse the Amazon Web Services Data Exchange catalog on the Amazon Web Services Data Exchange console to discover products containing Amazon Web Services Data Exchange datashares.

After you subscribe to a product that contains Amazon Web Services Data Exchange datashares, create a database from the datashare within your cluster. You can then query the data in Amazon Redshift directly without extracting, transforming, and loading the data.

For more information on how to get started with Amazon Web Services Data Exchange datashares, see Sharing licensed Amazon Redshift data on Amazon Web Services Data Exchange.

For Amazon Web Services Data Exchange datashares, consider the following:

  • When a producer cluster is deleted, Amazon Redshift deletes the datashares created by the producer cluster. When a producer cluster is backed up and restored, the created datashares still persist on the restored cluster. For data subscribers to be able to continue accessing the data, create the Amazon Web Services Data Exchange datashares again and publish them to the product's data sets. The consumer database on the consumer cluster points to the datashare from the original cluster where the snapshot is taken. To query the shared data from the restored cluster, the consumer cluster administrator creates a different database, or drops and recreates an existing consumer database to use the newly created Amazon Web Services Data Exchange datashare from the newly restored cluster.

  • When a consumer cluster is deleted and restored from a snapshot, the previous access shared to this cluster remains valid and visible. The consumer cluster administrator must drop any stale consumer databases created from the inactive datashares and recreate the consumer database from the datashare after the producer re-grants the permissions. As the cluster namespace GUID is different on a restored cluster from the original cluster, re-grant datashare permissions when the producer cluster is restored from backup.

  • We recommend that you don't delete your cluster if you have any Amazon Web Services Data Exchange datashares. Performing this type of alteration can breach data product terms in Amazon Web Services Data Exchange.

Considerations when using Amazon Web Services Data Exchange for Amazon Redshift

When using Amazon Web Services Data Exchange for Amazon Redshift, consider the following:

  • Both producers and consumers must use the RA3 instance types to use Amazon Redshift datashares. Producers must use the RA3 instance types with the latest Amazon Redshift cluster version.

  • Both the producer and consumer clusters must be encrypted.

  • You must be registered as an Amazon Web Services Data Exchange provider to list products on Amazon Web Services Data Exchange, including products that contain Amazon Web Services Data Exchange datashares. For more information, see Getting started as a provider.

  • You don't need to be a registered Amazon Web Services Data Exchange provider to find, subscribe to, and query Amazon Redshift data through Amazon Web Services Data Exchange.

  • To control access to your data, create Amazon Web Services Data Exchange datashares with the publicly accessible setting turned on. To alter an Amazon Web Services Data Exchange datashare to turn off the publicly accessible setting, set the session variable to allow ALTER DATASHARE SET PUBLICACCESSIBLE FALSE. For more information, see ALTER DATASHARE usage notes.

  • Producers can't manually add or remove consumers from Amazon Web Services Data Exchange datashares because access to the datashares is granted based on having an active subscription to an Amazon Web Services Data Exchange product that contains the Amazon Web Services Data Exchange datashare.

  • Producers can't view the SQL queries that consumers run. They can only view metadata, such as the number of queries or the objects consumers query, through Amazon Redshift tables that only the producer can access. For more information, see Monitoring and auditing data sharing in Amazon Redshift.

  • We recommend that you make your datashares publicly accessible. If you don't, subscribers on Amazon Web Services Data Exchange with publicly accessible consumer clusters won't be able to use your datashare.

  • We recommend that you don't delete an Amazon Web Services Data Exchange datashare shared to other Amazon Web Services accounts using the DROP DATASHARE statement. If you do, the Amazon Web Services accounts that have access to the datashare will lose access. This action is irreversible. Performing this type of alteration can breach data product terms in Amazon Web Services Data Exchange. If you want to delete an Amazon Web Services Data Exchange datashare, see DROP DATASHARE usage notes.

  • For cross-Region data sharing, you can create Amazon Web Services Data Exchange datashares to share licensed data.

  • When consuming data from a different Region, the consumer pays the Cross-Region data transfer fee from the producer Region to the consumer Region.
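
As a hedged illustration of the publicly accessible setting discussed above, the following Python sketch renders a CREATE DATASHARE statement for a datashare managed by Amazon Web Services Data Exchange. The helper name `adx_datashare_statement` and the datashare name are hypothetical. The clause order is an assumption based on the CREATE DATASHARE command, so verify it against the SQL reference before running it.

```python
def adx_datashare_statement(share: str, publicly_accessible: bool = True) -> str:
    """Render a CREATE DATASHARE statement for a Data Exchange datashare.

    Assumption: the PUBLICACCESSIBLE and MANAGEDBY ADX clauses can be
    combined in one statement in the order shown here.
    """
    flag = "TRUE" if publicly_accessible else "FALSE"
    return f"CREATE DATASHARE {share} SET PUBLICACCESSIBLE {flag}, MANAGEDBY ADX;"


# Hypothetical datashare name, for illustration only.
print(adx_datashare_statement("demoshare"))
```

The default of `publicly_accessible=True` mirrors the recommendation above, since subscribers with publicly accessible consumer clusters can't otherwise use the datashare.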

Amazon Lake Formation-managed datashares

Using Amazon Lake Formation, you can centrally define and enforce database, table, column, and row-level access permissions of Amazon Redshift datashares and restrict user access to objects within a datashare. By sharing data through Lake Formation, you can define permissions in Lake Formation and apply those permissions to any datashare and its objects. For example, if you have a table containing employee information, you can use Lake Formation's column-level filters to prevent employees who don't work in the HR department from seeing personally identifiable information (PII), such as a social security number. For more information about data filters, see Data filtering and cell-level security in Lake Formation in the Amazon Lake Formation Developer Guide.

You can also use tags in Lake Formation to configure permissions on Lake Formation resources. For more information, see Lake Formation Tag-based access control.

Amazon Redshift currently supports data sharing via Lake Formation when sharing within the same account or across accounts. Cross-Region sharing is currently not supported.

The following is a high-level overview of how to use Lake Formation to control datashare permissions:

  1. In Amazon Redshift, the producer cluster or workgroup administrator creates a datashare on the producer cluster or workgroup and grants usage to a Lake Formation account.

  2. The producer cluster or workgroup administrator authorizes the Lake Formation account to access the datashare.

  3. The Lake Formation administrator discovers and registers the datashares. They must also discover the Amazon Glue ARNs they have access to and associate the datashares with an Amazon Glue Data Catalog ARN. If you're using the Amazon CLI, you can discover and accept datashares with the Redshift CLI operations describe-data-shares and associate-data-share-consumer. To register a datashare, use the Lake Formation CLI operation register-resource.

  4. The Lake Formation administrator creates a federated database in the Amazon Glue Data Catalog, and configures Lake Formation permissions to control user access to objects within the datashare. For more information about federated databases in Amazon Glue, see Managing permissions for data in an Amazon Redshift datashare.

  5. The Lake Formation administrator discovers the Amazon Glue databases they have access to and associates the datashare with an Amazon Glue Data Catalog ARN.

  6. The Redshift administrator discovers the Amazon Glue database ARNs they have access to, creates an external database in the Amazon Redshift consumer cluster using an Amazon Glue database ARN, and grants usage to database users authenticated with IAM credentials to start querying the Amazon Redshift database.

  7. Database users can use the views SVV_EXTERNAL_TABLES and SVV_EXTERNAL_COLUMNS to find all of the tables or columns within the Amazon Glue database that they have access to, and then they can query the Amazon Glue database’s tables.

  8. When the producer cluster or workgroup administrator decides to no longer share the data with the consumer cluster, the producer cluster administrator can revoke usage, deauthorize, or delete the datashare from Redshift. The associated permissions and objects in Lake Formation are not automatically deleted.
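
The SQL-facing parts of the steps above (granting usage in step 1, discovering tables in step 7, and revoking usage in step 8) can be sketched as follows. The function name `lake_formation_sql`, the datashare name, and the account ID are hypothetical. The statements assume the VIA DATA CATALOG form of GRANT and REVOKE for datashares; the authorization, registration, and federated database steps happen outside SQL and are not shown.

```python
def lake_formation_sql(share: str, lf_account_id: str) -> dict[str, str]:
    """Render SQL for the producer and database-user steps listed above."""
    if not (lf_account_id.isascii() and lf_account_id.isdigit()
            and len(lf_account_id) == 12):
        raise ValueError("account ID must be exactly 12 digits")
    return {
        # Step 1: the producer grants usage to the Lake Formation account.
        "grant": (f"GRANT USAGE ON DATASHARE {share} "
                  f"TO ACCOUNT '{lf_account_id}' VIA DATA CATALOG;"),
        # Step 7: a database user lists the external tables they can access.
        "discover": "SELECT schemaname, tablename FROM svv_external_tables;",
        # Step 8: the producer stops sharing with the Lake Formation account.
        "revoke": (f"REVOKE USAGE ON DATASHARE {share} "
                   f"FROM ACCOUNT '{lf_account_id}' VIA DATA CATALOG;"),
    }


# Placeholder datashare name and account ID, for illustration only.
for step, statement in lake_formation_sql("salesshare", "123456789012").items():
    print(f"{step}: {statement}")
```

Revoking usage in Redshift does not clean up Lake Formation, so the associated permissions and objects there still need to be removed separately.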

For more information about sharing a datashare with Amazon Lake Formation as a producer cluster or workgroup administrator, see Working with Lake Formation-managed datashares as a producer. To consume the shared data from the producer cluster or workgroup, see Working with Lake Formation-managed datashares as a consumer.

Considerations and limitations when using Amazon Lake Formation with Amazon Redshift

The following are considerations and limitations for sharing Amazon Redshift data via Lake Formation. For information on data sharing considerations and limitations, see Considerations when using data sharing in Amazon Redshift. For information about Lake Formation limitations, see Notes on working with Amazon Redshift datashares in Lake Formation.

  • Sharing a datashare to Lake Formation across Regions is currently unsupported.

  • If column-level filters are defined for a user on a shared relation, performing a SELECT * operation returns only the columns the user has access to.

  • Cell-level filters from Lake Formation are unsupported.

  • If you created and shared a view and its tables to Lake Formation, you can configure filters to manage access to the tables. Amazon Redshift enforces Lake Formation-defined policies when consumer cluster users access shared objects. When a user accesses a view shared with Lake Formation, Redshift enforces only the Lake Formation policies defined on the view and not those defined on the tables contained within the view. However, when users directly access the table, Redshift enforces the defined Lake Formation policies on the table.

  • You can't create materialized views on the consumer based on a shared table if the table has Lake Formation filters configured.

  • The Lake Formation administrator must have data lake administrator permissions and the required permissions to accept a datashare.

  • To share datashares via Lake Formation, the producer and the consumer must each be an RA3 cluster with the latest Amazon Redshift cluster version or a serverless workgroup.

  • Both the producer and consumer clusters must be encrypted.

  • Redshift row-level and column-level access control policies implemented in the producer cluster or workgroup are ignored when the datashare is shared to Lake Formation. The Lake Formation administrator must configure these policies in Lake Formation. The producer cluster or workgroup administrator can turn off row-level security (RLS) for a table by using the ALTER TABLE command.

  • Sharing datashares via Lake Formation is only available to users who have access to both Redshift and Lake Formation.
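
The column-level filter behavior noted above, where SELECT * returns only the columns a user has access to, can be pictured with a toy model. The following Python sketch is not how Amazon Redshift or Lake Formation implements filtering. The function `select_star`, the sample row, and the column names are all hypothetical and only illustrate the visible effect.

```python
def select_star(rows: list[dict], allowed_columns: set[str]) -> list[dict]:
    """Return every row, restricted to the columns the user may see."""
    return [
        {column: value for column, value in row.items() if column in allowed_columns}
        for row in rows
    ]


# Made-up employee data; the ssn column stands in for PII.
employees = [{"id": 1, "name": "Ana", "ssn": "000-00-0000"}]

# A user outside HR is allowed only the non-PII columns.
print(select_star(employees, allowed_columns={"id", "name"}))

# An HR user is allowed every column.
print(select_star(employees, allowed_columns={"id", "name", "ssn"}))
```

The query itself is the same for both users. Only the set of permitted columns differs, which is why a column filter changes the shape of the result rather than causing an error.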

Datashare producers and consumers

Data producers (also known as data sharing producers or datashare producers) are clusters that you want to share data from. Producer cluster administrators and database owners can create datashares using the CREATE DATASHARE command. You can add objects such as schemas, tables, views, and SQL user-defined functions (UDFs) from a database that you want the producer cluster to share with consumer clusters for read purposes.

Data producers (also known as providers on Amazon Web Services Data Exchange) for Amazon Web Services Data Exchange datashares can license data through Amazon Web Services Data Exchange. Approved providers can add Amazon Web Services Data Exchange datashares to Amazon Web Services Data Exchange products.

When a customer subscribes to a product with Amazon Web Services Data Exchange datashares, Amazon Web Services Data Exchange automatically adds the customer as a data consumer on all Amazon Web Services Data Exchange datashares included with the product. Amazon Web Services Data Exchange also removes all customers from Amazon Web Services Data Exchange datashares when their subscription ends. Amazon Web Services Data Exchange also automatically manages billing, invoicing, payment collection, and payment distribution for paid products with Amazon Web Services Data Exchange datashares. For more information, see Amazon Web Services Data Exchange datashares. To register as an Amazon Web Services Data Exchange data provider, see Getting started as a provider.

Data consumers (also known as data sharing consumers or datashare consumers) are clusters that receive datashares from producer clusters.

Amazon Redshift clusters that share data can be in the same or different Amazon Web Services accounts or different Amazon Web Services Regions, so you can share data across organizations and collaborate with other parties. Consumer cluster administrators receive the datashares that they are granted usage for and review the contents of each datashare. To consume shared data, the consumer cluster administrator creates an Amazon Redshift database from the datashare. The administrator then assigns permissions for the database to users and roles in the consumer cluster. After permissions are granted, users and roles can list the shared objects as part of the standard metadata queries, along with the local data on the consumer cluster. They can start querying immediately.
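
The consumer workflow above (review the datashares, create a database from one, then assign permissions) can be sketched as a short sequence of statements. The following Python helper renders them. The function name `consumer_statements` and all names, GUIDs, and account IDs are hypothetical, and the inputs are assumed to be trusted values. The optional account clause is included only when the producer is in a different Amazon Web Services account.

```python
from typing import Optional


def consumer_statements(database: str, share: str, producer_guid: str,
                        user: str, producer_account: Optional[str] = None) -> list[str]:
    """Render, in order, the SQL a consumer administrator might run."""
    source = f"NAMESPACE '{producer_guid}'"
    if producer_account is not None:
        # Cross-account sharing also names the producer's account ID.
        source = f"ACCOUNT '{producer_account}' {source}"
    return [
        # Review the datashares this cluster has been granted usage on.
        "SHOW DATASHARES;",
        # Create a local database that points at the datashare.
        f"CREATE DATABASE {database} FROM DATASHARE {share} OF {source};",
        # Let a database user query the shared objects.
        f"GRANT USAGE ON DATABASE {database} TO {user};",
    ]


# Placeholder values, for illustration only.
for statement in consumer_statements(
    database="sales_db",
    share="salesshare",
    producer_guid="11111111-2222-3333-4444-555555555555",
    user="bob",
    producer_account="123456789012",
):
    print(statement)
```

After the grant, the user can reference shared objects with three-part names, for example `SELECT * FROM sales_db.public.sales;`, where the schema and table are again hypothetical.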

If you are a consumer with an active Amazon Web Services Data Exchange subscription (also known as a subscriber on Amazon Web Services Data Exchange), you can find, subscribe to, and query granular, up-to-date data in Amazon Redshift without the need to extract, transform, and load the data. For more information, see Amazon Web Services Data Exchange datashares.