Prerequisites Preparing Aurora PostgreSQL to be a Knowledge Base Creating a Knowledge Base in the Bedrock console

Using Aurora PostgreSQL as a Knowledge Base for Amazon Bedrock

You can use an Aurora PostgreSQL DB cluster as a Knowledge Base for Amazon Bedrock. For more information, see Create a vector store in Amazon Aurora. A Knowledge Base automatically takes unstructured text data stored in an Amazon S3 bucket, converts it to text chunks and vectors, and stores it in a PostgreSQL database. With the generative AI applications, you can use Agents for Amazon Bedrock to query the data stored in the Knowledge Base and use the results of those queries to augment answers provided by foundational models. This workflow is called Retrieval Augmented Generation (RAG). For more information on RAG, see Retrieval Augmented Generation (RAG).

For detailed information about using Aurora PostgreSQL to build generative AI applications using RAG, see this blog post.

Topics

Prerequisites
Preparing Aurora PostgreSQL to be used as a Knowledge Base for Amazon Bedrock
Creating a Knowledge Base in the Bedrock console
Quick create an Aurora PostgreSQL Knowledge Base for Amazon Bedrock

Prerequisites

Familiarize yourself with the following prerequisites to use Aurora PostgreSQL cluster as a Knowledge Base for Amazon Bedrock. At a high level, you need to configure the following services for use with Bedrock:

Amazon Aurora PostgreSQL DB cluster created in any of the following versions:
- 16.1 and all higher versions
- 15.4 and higher versions
- 14.9 and higher versions
- 13.12 and higher versions
- 12.16 and higher versions
Note
You must enable the pgvector extension in your target database and use version 0.5.0 or higher. For more information, see pgvector v0.5.0 with HNSW indexing.
RDS Data API
A user managed in Amazon Secrets Manager. For more information, see Password management with Amazon Aurora and Amazon Secrets Manager.

Preparing Aurora PostgreSQL to be used as a Knowledge Base for Amazon Bedrock

Follow the steps explained in the below sections to prepare Aurora PostgreSQL to be used as a Knowledge Base for Amazon Bedrock.

Creating and configuring Aurora PostgreSQL

To configure Amazon Bedrock with an Aurora PostgreSQL DB cluster, you must first create an Aurora PostgreSQL DB cluster and take note of the important fields for configuring it with Amazon Bedrock. For more information about creating Aurora PostgreSQL DB cluster, see Creating and connecting to an Aurora PostgreSQL DB cluster.

Enable Data API while creating Aurora PostgreSQL DB cluster. For more information on the versions supported, see Using the Amazon RDS Data API.
Make sure to note down the Amazon Resource Names (ARN) of your Aurora PostgreSQL DB cluster. You'll need it to configure the DB cluster for use with Amazon Bedrock. For more information, see Amazon Resource Names (ARNs).

Connecting to a database and installing pgvector

You can connect to Aurora PostgreSQL using any of the connection utilities. For more detailed information on these utilities, see Connecting to an Amazon Aurora PostgreSQL DB cluster. Alternatively, you can use the RDS console query editor to run the queries. You need an Aurora DB cluster with the RDS Data API enabled to use the query editor.

Log in to the database with your master user and set up pgvector. Use the following command if the extension is not installed:
```
CREATE EXTENSION IF NOT EXISTS vector;
```
Use pgvector 0.5.0 and higher version that supports HNSW indexing. For more information, see pgvector v0.5.0 with HNSW indexing.
Use the following command to check the version of the pg_vector installed:
```
SELECT extversion FROM pg_extension WHERE extname='vector';
```

Setting up database objects and privileges

Create a specific schema that Bedrock can use to query the data. Use the following command to create a schema:
```
CREATE SCHEMA bedrock_integration;
```
Create a new role that Bedrock can use to query the database. Use the following command to create a new role:
```
CREATE ROLE bedrock_user WITH PASSWORD 'password' LOGIN;
```
Note
Make a note of this password as you will need it later to create a Secrets Manager password.

If you are using psql client, then use the following commands to create a new role:
```
CREATE ROLE bedrock_user LOGIN;
\PASSWORD password;
```
Grant the bedrock_user permissions to manage the bedrock_integration schema. This will provide the ability to create tables or indexes within the schema.
```
GRANT ALL ON SCHEMA bedrock_integration to bedrock_user;
```
Login as the bedrock_user and create a table in the bedrock_integration schema.
```
CREATE TABLE bedrock_integration.bedrock_kb (id uuid PRIMARY KEY, embedding vector(n), chunks text, metadata json, custom_metadata jsonb);
```
This command will create the bedrock_kb table in the bedrock_integration schema with Titan embeddings.

Replace n in the vector(n) data type with the appropriate dimension for the embedding model you are using. Use the recommendations below to help select your dimensions:
- For the Titan v2 model, use vector(1024), or vector(512), or vector (256). To learn more, see Amazon Titan Embeddings Text.
- For the Titan v1.2 model, use vector(1536). To learn more, see Amazon Titan Multimodal Embeddings G1.
- For the Cohere Embed model, use vector(1024). To learn more, see Cohere Embed models.
- For the Cohere Embed Multilingual v3, use vector(1024).
The first four columns are mandatory. For metadata handling, Bedrock writes data from your metadata files to the custom_metadata column. We recommend creating this column if you plan to use metadata and filtering. If you don't create a custom_metadata column, add individual columns for each metadata attribute in your table before you begin ingestion. For more information, see Configure and customize queries and response generation.
Follow these steps to create the required indexes that Bedrock uses to query your data:
- Create an index with the cosine operator which the bedrock can use to query the data.
```
CREATE INDEX ON bedrock_integration.bedrock_kb USING hnsw (embedding vector_cosine_ops);
```
- We recommend you to set the value of ef_construction to 256 for pgvector 0.6.0 and higher version that use parallel index building.
```
CREATE INDEX ON bedrock_integration.bedrock_kb USING hnsw (embedding vector_cosine_ops) WITH (ef_construction=256);
```
- Create an index which Bedrock can use to query the text data.
```
CREATE INDEX ON bedrock_integration.bedrock_kb USING gin (to_tsvector('simple', chunks));
```
- If you created a column for custom metadata, create an index which Bedrock can use to query the metadata.
```
CREATE INDEX ON bedrock_integration.bedrock_kb USING gin (custom_metadata);
```

Create a secret in Secrets Manager

Secrets Manager lets you store your Aurora credentials so that they can be securely transmitted to applications. If you didn't choose the Amazon secrets manager option when creating Aurora PostgreSQL DB cluster, you can create a secret now. For more information about creating Amazon Secrets Manager database secret, see Amazon Secrets Manager database secret.

Creating a Knowledge Base in the Bedrock console

While preparing Aurora PostgreSQL to be used as the vector store for a Knowledge Base, you must gather the following details that you need to provide to Amazon Bedrock console.

Amazon Aurora DB cluster ARN – The ARN of your DB cluster.
Secret ARN – The ARN of the Amazon Secrets Manager key for your DB cluster.
Database name – The name of your database. For example, you can use the default database postgres.
Table name – We recommend you to provide a schema qualified name while creating the table using the command similar to the following:
```
CREATE TABLE bedrock_integration.bedrock_kb;
```
This command will create the bedrock_kb table in the bedrock_integration schema.

When creating the table, make sure to configure it with the specified columns and data types. You can use your preferred column names instead of those listed in the table. Remember to take a note of the names you chose for reference during the Knowledge Base set up.

Column name	Data type	Description
id	UUID primary key	Contains unique identifiers for each record.
chunks	Text	Contains the chunks of raw text from your data sources.
embedding	Vector	Contains the vector embeddings of the data sources.
metadata	JSON	Contains metadata required to carry out source attribution and to enable data ingestion and querying.
custom_metadata	JSONB	(Optional) Defines the target column where Amazon Bedrock writes metadata details from your data sources.

With these details, you can now create a Knowledge Base in the Bedrock console. For more detailed information on setting up a vector index and creating a Knowledge Base information, see Create a vector store in Amazon Aurora and Create a vector store in Amazon Aurora.

After adding Aurora as your Knowledge Base, you can now ingest your data sources for searching and querying. For more information, see Ingest your data sources into the Knowledge Base.

Javascript is disabled or is unavailable in your browser.

To use the Amazon Web Services Documentation, Javascript must be enabled. Please refer to your browser's Help pages for instructions.

Monitoring local write forwarding in Aurora PostgreSQL

Quick create an Aurora PostgreSQL Knowledge Base for Amazon Bedrock