
Supported large language models for fine-tuning

Using the Autopilot API, you can fine-tune the following large language models (LLMs). These models are powered by Amazon SageMaker JumpStart.
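
For reference, the following is a minimal sketch of a fine-tuning request made with boto3. The job name, S3 locations, role ARN, and dataset content type are placeholders that you replace with your own values; BaseModelName takes one of the values listed in the table below.

import boto3

sagemaker_client = boto3.client("sagemaker", region_name="us-east-1")

sagemaker_client.create_auto_ml_job_v2(
    AutoMLJobName="dolly-3b-fine-tuning-job",  # placeholder job name
    AutoMLJobInputDataConfig=[
        {
            "ChannelType": "training",
            "ContentType": "text/csv;header=present",  # assumes a CSV training dataset
            "CompressionType": "None",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": "s3://amzn-s3-demo-bucket/fine-tuning-data/",  # placeholder
                }
            },
        }
    ],
    OutputDataConfig={"S3OutputPath": "s3://amzn-s3-demo-bucket/output/"},  # placeholder
    AutoMLProblemTypeConfig={
        "TextGenerationJobConfig": {
            # Use one of the BaseModelName values from the table below.
            "BaseModelName": "Dolly3B"
        }
    },
    RoleArn="arn:aws:iam::111122223333:role/SageMakerExecutionRole",  # placeholder
)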

Note

For fine-tuning models that require the acceptance of an end-user license agreement (EULA), you must explicitly declare EULA acceptance when creating your AutoML job. Note that fine-tuning a pretrained model changes the weights of the original model, so you do not need to accept the EULA again when you later deploy the fine-tuned model.

For information on how to accept the EULA when creating a fine-tuning job using the AutoML API, see How to set the EULA acceptance when fine-tuning a model using the AutoML API.
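
As an illustration only, the EULA flag is set in the TextGenerationJobConfig portion of the request. The base model shown here is an example of a model that requires EULA acceptance, and all other job parameters are the same as in the earlier sketch.

# Illustrative fragment of AutoMLProblemTypeConfig for a EULA-gated base model.
# Setting AcceptEula to True is your explicit acceptance of the model's EULA.
problem_type_config = {
    "TextGenerationJobConfig": {
        "BaseModelName": "Llama2-7BChat",           # example of a EULA-gated model
        "ModelAccessConfig": {"AcceptEula": True},  # required for EULA-gated models
    }
}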

You can find the full details of each model by searching for your JumpStart Model ID in the following model table, and then following the link in the Source column. These details might include the languages supported by the model, biases it may exhibit, the datasets employed for fine-tuning, and more.

The following entries list each model's JumpStart Model ID, the BaseModelName to specify in the API request, and a description of the model.
huggingface-textgeneration-dolly-v2-3b-bf16 Dolly3B

Dolly 3B is a 2.8 billion parameter instruction-following large language model based on pythia-2.8b. It is trained on the instruction/response fine-tuning dataset databricks-dolly-15k and can perform tasks including brainstorming, classification, question answering, text generation, information extraction, and summarization.

huggingface-textgeneration-dolly-v2-7b-bf16 Dolly7B

Dolly 7B is a 6.9 billion parameter instruction-following large language model based on pythia-6.9b. It is trained on the instruction/response fine-tuning dataset databricks-dolly-15k and can perform tasks including brainstorming, classification, question answering, text generation, information extraction, and summarization.

huggingface-textgeneration-dolly-v2-12b-bf16 Dolly12B

Dolly 12B is a 12 billion parameter instruction-following large language model based on pythia-12b. It is trained on the instruction/response fine-tuning dataset databricks-dolly-15k and can perform tasks including brainstorming, classification, question answering, text generation, information extraction, and summarization.

huggingface-llm-falcon-7b-bf16 Falcon7B

Falcon 7B is a 7 billion parameter causal large language model trained on 1,500 billion tokens of RefinedWeb enhanced with curated corpora. Falcon 7B is trained on English and French data only, and does not generalize appropriately to other languages. Because the model was trained on large amounts of web data, it carries the stereotypes and biases commonly found online.

huggingface-llm-falcon-7b-instruct-bf16 Falcon7BInstruct

Falcon 7B Instruct is a 7 billion parameter causal large language model built on Falcon 7B and fine-tuned on a 250 million token mixture of chat/instruct datasets. Falcon 7B Instruct is mostly trained on English data, and does not generalize appropriately to other languages. Furthermore, as it is trained on large-scale corpora representative of the web, it carries the stereotypes and biases commonly encountered online.

huggingface-llm-falcon-40b-bf16 Falcon40B

Falcon 40B is a 40 billion parameter causal large language model trained on 1,000 billion tokens of RefinedWeb enhanced with curated corpora. It is trained mostly on English, German, Spanish, and French, with limited capabilities in Italian, Portuguese, Polish, Dutch, Romanian, Czech, and Swedish. It does not generalize appropriately to other languages. Furthermore, as it is trained on large-scale corpora representative of the web, it carries the stereotypes and biases commonly encountered online.

huggingface-llm-falcon-40b-instruct-bf16 Falcon40BInstruct

Falcon 40B Instruct is a 40 billion parameter causal large language model built on Falcon 40B and fine-tuned on a mixture of Baize data. It is mostly trained on English and French data, and does not generalize appropriately to other languages. Furthermore, as it is trained on large-scale corpora representative of the web, it carries the stereotypes and biases commonly encountered online.

huggingface-text2text-flan-t5-large FlanT5L

The Flan-T5 model family is a set of large language models that are fine-tuned on multiple tasks and can be further trained. These models are well-suited for tasks such as language translation, text generation, sentence completion, word sense disambiguation, summarization, or question answering. Flan T5 L is a 780 million parameter large language model trained on numerous languages. You can find the list of languages supported by Flan T5 L in the model details retrieved by searching for its model ID in the JumpStart model table.

huggingface-text2text-flan-t5-xl FlanT5XL

The Flan-T5 model family is a set of large language models that are fine-tuned on multiple tasks and can be further trained. These models are well-suited for tasks such as language translation, text generation, sentence completion, word sense disambiguation, summarization, or question answering. Flan T5 XL is a 3 billion parameter large language model trained on numerous languages. You can find the list of languages supported by Flan T5 XL in the model details retrieved by searching for its model ID in the JumpStart model table.

huggingface-text2text-flan-t5-xxl FlanT5XXL

The Flan-T5 model family is a set of large language models that are fine-tuned on multiple tasks and can be further trained. These models are well-suited for tasks such as language translation, text generation, sentence completion, word sense disambiguation, summarization, or question answering. Flan T5 XXL is an 11 billion parameter model. You can find the list of languages supported by Flan T5 XXL in the model details retrieved by searching for its model ID in the JumpStart model table.

meta-textgeneration-llama-2-7b Llama2-7B

Llama 2 is a collection of pretrained and fine-tuned generative text models, ranging in scale from 7 billion to 70 billion parameters. Llama2-7B is the 7 billion parameter model that is intended for English use and can be adapted for a variety of natural language generation tasks.

meta-textgeneration-llama-2-7b-f Llama2-7BChat

Llama 2 is a collection of pretrained and fine-tuned generative text models, ranging in scale from 7 billion to 70 billion parameters. Llama2-7BChat is the 7 billion parameter chat model that is optimized for dialogue use cases.

meta-textgeneration-llama-2-13b Llama2-13B

Llama 2 is a collection of pretrained and fine-tuned generative text models, ranging in scale from 7 billion to 70 billion parameters. Llama2-13B is the 13 billion parameter model that is intended for English use and can be adapted for a variety of natural language generation tasks.

meta-textgeneration-llama-2-13b-f Llama2-13BChat

Llama 2 is a collection of pretrained and fine-tuned generative text models, ranging in scale from 7 billion to 70 billion parameters. Llama2-13BChat is the 13 billion parameter chat model that is optimized for dialogue use cases.

huggingface-llm-mistral-7b Mistral7B

Mistral 7B is a 7 billion parameter code and general-purpose English text generation model. It can be used in a variety of use cases, including text summarization, classification, text completion, or code completion.

huggingface-llm-mistral-7b-instruct Mistral7BInstruct

Mistral 7B Instruct is the fine-tuned version of Mistral 7B for conversational use cases. It was specialized using a variety of publicly available conversation datasets in English.

huggingface-textgeneration1-mpt-7b-bf16 MPT7B

MPT 7B is a decoder-style transformer large language model with 6.7 billion parameters, pretrained from scratch on 1 trillion tokens of English text and code. It is designed to handle long context lengths.

huggingface-textgeneration1-mpt-7b-instruct-bf16 MPT7BInstruct

MPT 7B Instruct is a model for short-form instruction following tasks. It is built by fine-tuning MPT 7B on a dataset derived from databricks-dolly-15k and the Anthropic Helpful and Harmless (HH-RLHF) datasets.