
Domain adaptation fine-tuning

Domain adaptation fine-tuning allows you to leverage pre-trained foundation models and adapt them to specific tasks using limited domain-specific data. If prompt engineering does not provide enough customization, you can use domain adaptation fine-tuning to get your model working with domain-specific language, such as industry jargon, technical terms, or other specialized data. This fine-tuning process modifies the weights of the model.

Domain adaptation fine-tuning is available with the following foundation models:

Note

Some JumpStart foundation models, such as Llama 2 7B, require acceptance of an end-user license agreement before fine-tuning and performing inference. For more information, see End-user license agreements.

  • Bloom 3B

  • Bloom 7B1

  • BloomZ 3B FP16

  • BloomZ 7B1 FP16

  • GPT-2 XL

  • GPT-J 6B

  • GPT-Neo 1.3B

  • GPT-Neo 125M

  • GPT-Neo 2.7B

  • Llama 2 13B

  • Llama 2 13B Chat

  • Llama 2 13B Neuron

  • Llama 2 70B

  • Llama 2 70B Chat

  • Llama 2 7B

  • Llama 2 7B Chat

  • Llama 2 7B Neuron

Prepare and upload training data for domain adaptation fine-tuning

Training data for domain adaptation fine-tuning can be provided in CSV, JSON, or TXT file format. All training data must be in a single file within a single folder.

For CSV or JSON training data files, the training data is taken from the column labeled Text. If no column is labeled Text, the training data is taken from the first column.

The following is an example body of a TXT file to be used for fine-tuning:

This report includes estimates, projections, statements relating to our business plans, objectives, and expected operating results that are “forward-looking statements” within the meaning of the Private Securities Litigation Reform Act of 1995, Section 27A of the Securities Act of 1933, and Section 21E of ....
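If you prefer CSV or JSON input, the same kind of content goes in a column labeled Text. The following is a minimal sketch of one way to write such a CSV file; the file name and example sentences are illustrative, not part of the SageMaker API:

import csv

# Hypothetical example: write a small domain-specific corpus to train.csv
# with a "Text" column, the column read first for fine-tuning data.
rows = [
    {"Text": "This report includes estimates, projections, and forward-looking statements."},
    {"Text": "Our results of operations may be materially affected by the risk factors described below."},
]

with open("train.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["Text"])
    writer.writeheader()
    writer.writerows(rows)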

Split data for training and testing

You can optionally provide another folder containing validation data. This folder should also include one CSV, JSON, or TXT file. If you do not provide a validation dataset, a fixed percentage of the training data is set aside for validation. You can adjust this percentage when you choose the hyperparameters for fine-tuning your model.
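If you want to control the split yourself, the following is a minimal sketch that holds out part of a TXT corpus as a separate validation file; the file names and the 80/20 ratio are illustrative assumptions:

import random

# Hold out roughly 20% of the paragraphs in corpus.txt (a hypothetical
# source file) as validation.txt, keeping the rest in train.txt.
random.seed(0)

with open("corpus.txt") as f:
    paragraphs = [p for p in f.read().split("\n\n") if p.strip()]

random.shuffle(paragraphs)
split_index = int(len(paragraphs) * 0.8)

with open("train.txt", "w") as f:
    f.write("\n\n".join(paragraphs[:split_index]))

with open("validation.txt", "w") as f:
    f.write("\n\n".join(paragraphs[split_index:]))

Upload the validation file to a separate Amazon S3 folder from the training file.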

Upload fine-tuning data to Amazon S3

Upload your prepared data to Amazon Simple Storage Service (Amazon S3) to use when fine-tuning a JumpStart foundation model. You can use the following commands to upload your data:

from sagemaker.s3 import S3Uploader
import sagemaker

# Default S3 bucket for the current SageMaker session.
output_bucket = sagemaker.Session().default_bucket()

# Local training file and the S3 folder it is uploaded to.
local_data_file = "train.txt"
train_data_location = f"s3://{output_bucket}/training_folder"

S3Uploader.upload(local_data_file, train_data_location)
S3Uploader.upload("template.json", train_data_location)

print(f"Training data: {train_data_location}")

Create a training job for domain adaptation fine-tuning

After your data is uploaded to Amazon S3, you can fine-tune and deploy your JumpStart foundation model. To fine-tune your model in Studio, see Fine-tune foundation models in Studio. To fine-tune your model using the SageMaker Python SDK, see Fine-tune publicly available foundation models with the JumpStartEstimator class.
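With the SageMaker Python SDK, the end-to-end flow typically looks like the following minimal sketch. The model ID, EULA setting, and hyperparameter values are illustrative assumptions; check the JumpStart model catalog for the exact model ID and the hyperparameters supported by the model you choose.

from sagemaker.jumpstart.estimator import JumpStartEstimator
import sagemaker

# S3 folder from the upload step above (same default bucket and folder name).
output_bucket = sagemaker.Session().default_bucket()
train_data_location = f"s3://{output_bucket}/training_folder"

# Illustrative model ID; Llama 2 models also require accepting the EULA.
estimator = JumpStartEstimator(
    model_id="meta-textgeneration-llama-2-7b",
    environment={"accept_eula": "true"},
)

# Example hyperparameters: instruction_tuned="False" selects domain adaptation
# rather than instruction-based fine-tuning; the epoch count is illustrative.
estimator.set_hyperparameters(instruction_tuned="False", epoch="3")

# Start the fine-tuning job on the uploaded training data.
estimator.fit({"training": train_data_location})

# Deploy the fine-tuned model to a real-time endpoint for inference.
predictor = estimator.deploy()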

Example notebooks

For more information on domain adaptation fine-tuning, see the following example notebooks: