Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Use generative AI with foundation models

Amazon SageMaker Canvas provides generative AI foundation models that you can use to start conversations. These content generation models are trained on large amounts of text data to learn the statistical patterns and relationships between words, and they can produce coherent text that is statistically similar to the text on which they were trained. You can use this capability to increase your productivity by doing the following:

  • Generate content, such as document outlines, reports, and blogs

  • Summarize large corpora of text, such as earnings call transcripts, annual reports, or chapters of user manuals

  • Extract insights and key takeaways from large passages of text, such as meeting notes or narratives

  • Improve text and catch grammatical errors or typos

The foundation models are a combination of Amazon SageMaker JumpStart and Amazon Bedrock large language models (LLMs). Canvas offers the following models:


Amazon Titan

Amazon Bedrock model

Amazon Titan is a powerful, general-purpose language model that you can use for tasks such as summarization, text generation (such as creating a blog post), classification, open-ended Q&A, and information extraction. It is pretrained on large datasets, making it suitable for complex tasks and reasoning. To continue supporting best practices in the responsible use of AI, Amazon Titan foundation models are built to detect and remove harmful content in the data, reject inappropriate content in the user input, and filter model outputs that contain inappropriate content (such as hate speech, profanity, and violence).

Anthropic Claude Instant

Amazon Bedrock model

Anthropic's Claude Instant is a faster and more cost-effective yet still very capable model. This model can handle a range of tasks including casual dialogue, text analysis, summarization, and document question answering. Just like Claude-2, Claude Instant can support up to 100,000 tokens in each prompt, equivalent to about 200 pages of information.

Anthropic Claude-2

Amazon Bedrock model

Claude-2 is Anthropic's most powerful model, which excels at a wide range of tasks from sophisticated dialogue and creative content generation to detailed instruction following. Claude-2 can take up to 100,000 tokens in each prompt, equivalent to about 200 pages of information. It can generate longer responses compared to its prior version. It supports use cases such as question answering, information extraction, removing PII, content generation, multiple-choice classification, roleplay, comparing text, summarization, and document Q&A with citation.

Falcon-7B-Instruct

SageMaker JumpStart model

Falcon-7B-Instruct has 7 billion parameters and was fine-tuned on a mixture of chat and instruct datasets. It is suitable as a virtual assistant and performs best when following instructions or engaging in conversation. Since the model was trained on large amounts of English-language web data, it carries the stereotypes and biases commonly found online and is not suitable for languages other than English. Compared to Falcon-40B-Instruct, Falcon-7B-Instruct is a slightly smaller and more compact model.

Falcon-40B-Instruct

SageMaker JumpStart model

Falcon-40B-Instruct has 40 billion parameters and was fine-tuned on a mixture of chat and instruct datasets. It is suitable as a virtual assistant and performs best when following instructions or engaging in conversation. Since the model was trained on large amounts of English-language web data, it carries the stereotypes and biases commonly found online and is not suitable for languages other than English. Compared to Falcon-7B-Instruct, Falcon-40B-Instruct is a slightly larger and more powerful model.

Jurassic-2 Mid

Amazon Bedrock model

Jurassic-2 Mid is a high-performance text generation model trained on a massive corpus of text (current through mid-2022). It is highly versatile, general-purpose, and capable of composing human-like text and solving complex tasks such as question answering, text classification, and many others. This model offers zero-shot instruction capabilities, allowing it to be directed with only natural language and without the use of examples. It performs up to 30% faster than its predecessor, the Jurassic-1 model.

Jurassic-2 Mid is AI21’s mid-sized model, carefully designed to strike the right balance between exceptional quality and affordability.

Jurassic-2 Ultra

Amazon Bedrock model

Jurassic-2 Ultra is a high-performance text generation model trained on a massive corpus of text (current through mid-2022). It is highly versatile, general-purpose, and capable of composing human-like text and solving complex tasks such as question answering, text classification, and many others. This model offers zero-shot instruction capabilities, allowing it to be directed with only natural language and without the use of examples. It performs up to 30% faster than its predecessor, the Jurassic-1 model.

Compared to Jurassic-2 Mid, Jurassic-2 Ultra is a slightly larger and more powerful model.

Llama-2-7b-Chat

SageMaker JumpStart model

Llama-2-7b-Chat is a foundation model by Meta that is suitable for engaging in meaningful and coherent conversations, generating new content, and extracting answers from existing notes. Since the model was trained on large amounts of English-language internet data, it carries the biases and limitations commonly found online and is best-suited for tasks in English.

Llama-2-13B-Chat

Amazon Bedrock model

Llama-2-13B-Chat by Meta was fine-tuned on conversational data after initial training on internet data. It is optimized for natural dialog and engaging chat abilities, making it well-suited as a conversational agent. Compared to the smaller Llama-2-7b-Chat, Llama-2-13B-Chat has nearly twice as many parameters, allowing it to remember more context and produce more nuanced conversational responses. Like Llama-2-7b-Chat, Llama-2-13B-Chat was trained on English-language data and is best-suited for tasks in English.

Llama-2-70B-Chat

Amazon Bedrock model

Like Llama-2-7b-Chat and Llama-2-13B-Chat, the Llama-2-70B-Chat model by Meta is optimized for engaging in natural and meaningful dialog. With 70 billion parameters, this large conversational model can remember more extensive context and produce highly coherent responses when compared to the more compact model versions. However, this comes at the cost of slower responses and higher resource requirements. Llama-2-70B-Chat was trained on large amounts of English-language internet data and is best-suited for tasks in English.

Mistral-7B

SageMaker JumpStart model

Mistral-7B by Mistral.AI is an excellent general-purpose language model suitable for a wide range of natural language processing (NLP) tasks, such as text generation, summarization, and question answering. It uses grouped-query attention (GQA), which allows for faster inference speeds and makes its performance comparable to models with two or three times as many parameters. It was trained on a mixture of English-language text data, including books, websites, and scientific papers, so it is best-suited for tasks in English.

Mistral-7B-Chat

SageMaker JumpStart model

Mistral-7B-Chat is a conversational model by Mistral.AI based on Mistral-7B. While Mistral-7B is best for general NLP tasks, Mistral-7B-Chat has been further fine-tuned on conversational data to optimize its abilities for natural, engaging chat. As a result, Mistral-7B-Chat generates more human-like responses and remembers the context of previous responses. Like Mistral-7B, this model is best-suited for English language tasks.

MPT-7B-Instruct

SageMaker JumpStart model

MPT-7B-Instruct is a model for long-form instruction-following tasks that can assist you with writing tasks such as text summarization and question answering to save you time and effort. It was fine-tuned on a large instruction dataset and can handle larger inputs, such as complex documents. Use this model when you want to process large bodies of text or have the model generate long responses.

The foundation models from Amazon Bedrock are currently only available in the US East (N. Virginia) and US West (Oregon) Regions. Additionally, when using foundation models from Amazon Bedrock, you are charged based on the volume of input tokens and output tokens, as specified by each model provider. For more information, see the Amazon Bedrock pricing page. The SageMaker JumpStart foundation models are deployed on SageMaker Hosting instances, and you are charged for the duration of usage based on the instance type used. For more information about the cost of different instance types, see the Amazon SageMaker Hosting: Real-Time Inference section on the SageMaker pricing page.
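The difference between the two billing models can be illustrated with a short calculation. This is a sketch only: every rate below is a placeholder, not a real price, so consult the Amazon Bedrock and SageMaker pricing pages for current rates.

```python
# Hypothetical comparison of the two billing models described above.
# All per-token and per-hour rates are placeholders, not real prices.

def bedrock_cost(input_tokens, output_tokens,
                 price_per_1k_input, price_per_1k_output):
    """Token-based charge for an Amazon Bedrock model invocation."""
    return (input_tokens / 1000) * price_per_1k_input \
         + (output_tokens / 1000) * price_per_1k_output

def jumpstart_cost(hours_active, instance_price_per_hour):
    """Duration-based charge for a SageMaker JumpStart hosted endpoint."""
    return hours_active * instance_price_per_hour

# A chat turn with 2,000 input tokens and 500 output tokens at placeholder rates:
chat_cost = bedrock_cost(2000, 500, 0.003, 0.004)
# A JumpStart endpoint left running for 2 hours at a placeholder hourly rate:
endpoint_cost = jumpstart_cost(2, 1.50)
```

The key operational difference: Bedrock charges stop when you stop sending prompts, while a JumpStart endpoint accrues charges for as long as it is running, which is why Canvas shuts idle models down.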

Document querying is an additional feature that you can use to query and get insights from documents stored in indexes using Amazon Kendra. With this functionality, you can generate content from the context of those documents and receive responses that are specific to your business use case, as opposed to responses that are generic to the large amounts of data on which the foundation models were trained. For more information about indexes in Amazon Kendra, see the Amazon Kendra Developer Guide.

If you would like to get responses from any of the foundation models that are customized to your data and use case, you can fine-tune foundation models. To learn more, see Fine-tune foundation models.

To get started, see the following sections.

Prerequisites

The following sections outline the prerequisites for interacting with foundation models and using the document query feature in Canvas. The rest of the content on this page assumes that you’ve met the prerequisites for foundation models. The document query feature requires additional permissions.

Prerequisites for foundation models

The permissions you need for interacting with models are included in the Canvas Ready-to-use models permissions. To use the generative AI-powered models in Canvas, you must turn on the Canvas Ready-to-use models configuration permissions when setting up your Amazon SageMaker domain. For more information, see Prerequisites for setting up Amazon SageMaker Canvas. The Canvas Ready-to-use models configuration attaches the AmazonSageMakerCanvasAIServicesAccess policy to your Canvas user's Amazon Identity and Access Management (IAM) execution role. If you encounter any issues with granting permissions, see the topic Troubleshooting issues with granting permissions through the SageMaker console.
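If you manage permissions outside the console, the same managed policy can be attached to the Canvas execution role through the IAM API. A minimal sketch, with assumptions: the role name is a placeholder, the policy ARN uses the aws-cn partition that matches this guide's console URLs (use the aws partition elsewhere), and the boto3 call is commented out because it requires account credentials.

```python
# Sketch: attaching the Canvas Ready-to-use models policy to an execution
# role with the IAM API instead of the console. Role name is a placeholder;
# the ARN partition (aws-cn) is an assumption based on this guide's Region.

POLICY_ARN = "arn:aws-cn:iam::aws:policy/AmazonSageMakerCanvasAIServicesAccess"

def attach_canvas_policy(role_name, policy_arn=POLICY_ARN):
    """Build the attach_role_policy parameters and return them for inspection."""
    params = {"RoleName": role_name, "PolicyArn": policy_arn}
    # import boto3
    # boto3.client("iam").attach_role_policy(**params)  # requires credentials
    return params

params = attach_canvas_policy("MyCanvasExecutionRole")  # placeholder role name
```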

If you’ve already set up your domain, you can edit your domain settings and turn on the permissions. For instructions on how to edit your domain settings, see View and edit domains. When editing the settings for your domain, go to the Canvas settings and turn on the Enable Canvas Ready-to-use models option.

Certain SageMaker JumpStart foundation models also require that you request a SageMaker instance quota increase. Canvas hosts the models that you’re currently interacting with on these instances, but the default quota for your account may be insufficient. If you run into an error while running any of the following models, request a quota increase for the associated instance types:

  • Falcon-40B-Instruct – ml.g5.12xlarge, ml.g5.24xlarge

  • Falcon-7B-Instruct – ml.g5.2xlarge, ml.g5.4xlarge, ml.g5.8xlarge

  • MPT-7B-Instruct – ml.g5.2xlarge, ml.g5.4xlarge, ml.g5.8xlarge

For the preceding instance types, request an increase from 0 to 1 for the endpoint usage quota. For more information about how to increase an instance quota for your account, see Requesting a quota increase in the Service Quotas User Guide.
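The same quota increase can be requested programmatically through the Service Quotas API. This is a sketch under assumptions: the quota code shown is a placeholder (look up the real code for each instance type, for example with the list_service_quotas operation), and the boto3 call is commented out because it requires account credentials.

```python
# Sketch: requesting an endpoint-usage quota increase with the Service
# Quotas API. The quota code below is a placeholder, not a real code.

def build_quota_request(quota_code, desired_value=1):
    """Parameters for the servicequotas RequestServiceQuotaIncrease operation."""
    return {
        "ServiceCode": "sagemaker",
        "QuotaCode": quota_code,  # e.g., the code for ml.g5.12xlarge endpoint usage
        "DesiredValue": float(desired_value),
    }

request = build_quota_request("L-XXXXXXXX")  # placeholder quota code
# import boto3
# boto3.client("service-quotas").request_service_quota_increase(**request)
```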

Prerequisites for document querying

Note

Document querying is supported in the following Amazon Web Services Regions: US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), and Asia Pacific (Mumbai).

The document querying feature requires that you already have an Amazon Kendra index that stores your documents and document metadata. For more information about Amazon Kendra, see the Amazon Kendra Developer Guide. To learn more about the quotas for querying indexes, see Quotas in the Amazon Kendra Developer Guide.

You must also make sure that your Canvas user profile has the necessary permissions for document querying. The AmazonSageMakerCanvasFullAccess policy must be attached to the Amazon IAM execution role for the SageMaker domain that hosts your Canvas application (this policy is attached by default to all new and existing Canvas user profiles). You must also specifically grant document querying permissions and specify access to one or more Amazon Kendra indexes.

If your Canvas administrator is setting up a new domain or user profile, have them set up the domain by following the instructions in Prerequisites for setting up Amazon SageMaker Canvas. While setting up the domain, they can turn on the document querying permissions through the Canvas Ready-to-use models configuration.

The Canvas administrator can manage document querying permissions at the user profile level as well. For example, if the administrator wants to grant document querying permissions to some user profiles but remove permissions for others, they can edit the permissions for a specific user.

The following procedure shows how to turn on document querying permissions for a specific user profile:

  1. Open the SageMaker console at https://console.amazonaws.cn/sagemaker/.

  2. On the left navigation pane, choose Admin configurations.

  3. Under Admin configurations, choose Domains.

  4. From the list of domains, select the user profile’s domain.

  5. On the domain details page, choose the User profile whose permissions you want to edit.

  6. On the User Details page, choose Edit.

  7. In the left navigation pane, choose Canvas settings.

  8. In the Canvas Ready-to-use models configuration section, turn on the Enable document query using Amazon Kendra toggle.

  9. In the dropdown, select one or more Amazon Kendra indexes to which you want to grant access.

  10. Choose Submit to save the changes to your domain settings.

You should now be able to use Canvas foundation models to query documents in the specified Amazon Kendra indexes.

Start a new conversation to generate, extract, or summarize content

To get started with generative AI foundation models in Canvas, you can initiate a new chat session with one of the models. For SageMaker JumpStart models, you are charged while the model is active, so you must start up models when you want to use them and shut them down when you are done interacting. If you do not shut down a SageMaker JumpStart model, Canvas shuts it down after 2 hours of inactivity. For Amazon Bedrock models (such as Amazon Titan), you are charged by prompt; the models are already active and don’t need to be started up or shut down. You are charged directly for use of these models by Amazon Bedrock.
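To illustrate the per-prompt billing model, the following sketch shows what a direct Amazon Bedrock invocation looks like outside of Canvas. The request body follows the Amazon Titan text format, but the model ID and generation settings are illustrative, and the API call is commented out because it requires credentials and Region access.

```python
import json

# Sketch: building an Amazon Titan text-generation request. Other Bedrock
# model providers expect different body shapes; settings here are illustrative.

def build_titan_request(prompt, max_tokens=512, temperature=0.5):
    """JSON body for an Amazon Titan text-generation invocation."""
    return json.dumps({
        "inputText": prompt,
        "textGenerationConfig": {
            "maxTokenCount": max_tokens,
            "temperature": temperature,
        },
    })

body = build_titan_request("Summarize the key points of this earnings call transcript.")
# import boto3
# runtime = boto3.client("bedrock-runtime")
# response = runtime.invoke_model(modelId="amazon.titan-text-express-v1", body=body)
```

Canvas handles this request/response cycle for you; each submitted prompt and its generated response are what Amazon Bedrock meters and bills.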

To open a chat with a model, do the following:

  1. Open the SageMaker Canvas application.

  2. In the left navigation pane, choose Ready-to-use models.

  3. Choose Generate, extract and summarize content.

  4. On the welcome page, you’ll receive a recommendation to start up the default model. You can start the recommended model, or you can choose Select another model from the dropdown to choose a different one.

  5. If you selected a SageMaker JumpStart foundation model, you must start it up before it is available for use. Choose Start up the model to deploy it to a SageMaker instance. It might take several minutes for this to complete. When the model is ready, you can enter prompts and ask the model questions.

    If you selected a foundation model from Amazon Bedrock, you can start using it instantly by entering a prompt and asking questions.

Depending on the model, you can perform various tasks. For example, you can enter a passage of text and ask the model to summarize it. Or, you can ask the model to come up with a short summary of the market trends in your domain.

The model’s responses in a chat are based on the context of your previous prompts. If you want to ask a new question in the chat that is unrelated to the previous conversation topic, we recommend that you start a new chat with the model.

Extract information from documents with document querying

Note

This section assumes that you’ve completed the preceding Prerequisites for document querying section.

Document querying is a feature that you can use while interacting with foundation models in Canvas. With document querying, you can access a corpus of documents stored in an Amazon Kendra index, which holds the contents of your documents and is structured in a way to make documents searchable. You can ask specific questions that are targeted to the data in your Amazon Kendra index, and the foundation model returns answers to your questions. For example, you can query an internal knowledge base of IT information and ask questions such as “How do I connect to my company’s network?” For more information about setting up an index, see the Amazon Kendra Developer Guide.

When using the document query feature, the foundation models restrict their responses to the content of the documents in your index with a technique called Retrieval Augmented Generation (RAG). This technique bundles the most relevant information from the index along with the user's prompt and sends it to the foundation model to get a response. Responses are limited to what can be found in your index, preventing the model from giving you incorrect responses based on external data. For more information about this process, see the blog post Quickly build high-accuracy Generative AI applications on enterprise data.
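The RAG flow described above can be sketched in a few lines. This is an illustration only: the retrieval step is stubbed with a naive keyword-overlap ranking over a local list, standing in for an Amazon Kendra query, and the assembled prompt would then be sent to the selected foundation model.

```python
# Minimal sketch of the RAG flow: retrieve the most relevant passages,
# bundle them with the user's question, and send the result to the model.
# Retrieval is stubbed locally; in practice it would be a Kendra query.

def retrieve_passages(question, index, top_k=2):
    """Stand-in for index retrieval: rank passages by keyword overlap."""
    terms = set(question.lower().split())
    scored = [(len(terms & set(p.lower().split())), p) for p in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for score, p in scored[:top_k] if score > 0]

def build_rag_prompt(question, passages):
    """Bundle retrieved context with the question, as RAG does."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )

index = [
    "To connect to the company network, open the VPN client and sign in.",
    "Expense reports are due on the last business day of each month.",
]
question = "How do I connect to my company's network?"
passages = retrieve_passages(question, index)
prompt = build_rag_prompt(question, passages)
# The prompt would then be sent to the selected foundation model.
```

Because only retrieved passages reach the model, an answer can cite your IT knowledge base rather than the model's general training data, which is the behavior the document query feature provides.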

To get started, in a chat with a foundation model in Canvas, turn on the Document query toggle at the top of the page. From the dropdown, select the Amazon Kendra index that you want to query. Then, you can begin asking questions related to the documents in your index.

Important

Document querying supports the Compare model outputs feature. Any existing chat history is overwritten when you start a new chat to compare model outputs.

Model management

Note

The following section describes starting up and shutting down models, which only applies to the SageMaker JumpStart foundation models, such as Falcon-40B-Instruct. You can access Amazon Bedrock models, such as Amazon Titan, instantly at any time.

You can start up as many SageMaker JumpStart models as you like. Each active SageMaker JumpStart model incurs charges on your account, so we recommend that you don’t start up more models than you are currently using.

To start up another model, you can do the following:

  1. On the Generate, extract and summarize content page, choose New chat.

  2. Choose the model from the dropdown menu. If you want to choose a model not displayed in the dropdown, choose Start up another model, and then select the model that you want to start up.

  3. Choose Start up model.

The model should begin starting up, and within a few minutes you can chat with the model.

We highly recommend that you shut down models that you aren’t using. The models automatically shut down after 2 hours of inactivity. However, to manually shut down a model, you can do the following:

  1. On the Generate, extract and summarize content page, open the chat for the model that you want to shut down.

  2. On the chat page, choose the More options icon.

  3. Choose Shut down model.

  4. In the Shut down model confirmation box, choose Shut down.

The model begins shutting down. If your chat compares two or more models, you can shut down an individual model from the chat page by choosing the model’s More options icon and then choosing Shut down model.

Compare model outputs

You might want to compare the output of different models side by side to see which output you prefer. This can help you decide which model is best suited to your use case. You can compare up to three models in a chat.

Note

Each individual model incurs charges on your account.

You must start a new chat to add models for comparison. To compare the output of models side by side in a chat, do the following:

  1. In a chat, choose New chat.

  2. Choose Compare, and use the dropdown menu to select the model that you want to add. To add a third model, choose Compare again to add another model.

    Note

    If you want to use a SageMaker JumpStart model that isn’t currently active, you are prompted to start up the model.

When the models are active, you see them side by side in the chat. You can submit your prompt, and each model responds in the same chat, as shown in the following screenshot.


[Screenshot: the Canvas interface with the output of two models shown side by side.]

When you’re done interacting, make sure to shut down any SageMaker JumpStart models individually to avoid incurring further charges.