Comprehensive Guide for Instruction Fine-tuning of LLaMa 3.2 using MonsterAPI

In this blog, we'll teach you how to fine-tune a Llama 3.2 model to generate code using the Alpaca Python coding dataset. We'll use LoRA, which preserves the pre-trained model's knowledge while letting it learn new tasks seamlessly.


Llama 3.2 is a series of open-source models that excel at both language and vision tasks. These multimodal models exhibit impressive capabilities, and their performance can be further enhanced by fine-tuning them for downstream tasks.

In this post we will instruction fine-tune the model on the Alpaca Python coding dataset using LoRA (Low-Rank Adaptation). LoRA freezes the pre-trained weights and trains only small low-rank update matrices on top of them, preserving the model's existing knowledge while it learns the new task.
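For intuition, here is a minimal sketch of roughly what the fine-tuner sets up internally, expressed with the peft library. The peft usage is our illustration, not part of the MonsterAPI pipeline; the r and lora_alpha values mirror the payload we send later.

from peft import LoraConfig

# Roughly equivalent to the lora_* fields in the MonsterAPI payload below.
# The learned update is W' = W + (lora_alpha / r) * B @ A, where A and B
# are the small trainable matrices; the original weights W stay frozen.
lora_config = LoraConfig(
    r=8,               # rank of the low-rank update matrices
    lora_alpha=16,     # scaling factor applied to the update
    lora_dropout=0.0,  # dropout on the LoRA activations
    bias="none",       # do not train bias terms
    task_type="CAUSAL_LM",
)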

Step-by-Step Guide to Fine-tuning LLaMa 3.2

The first step is to install the necessary dependencies:

%%capture
%pip install -U transformers 
%pip install -U datasets 
%pip install -U wandb

Once these are installed, we can import them and log into our Hugging Face account using an access token:

from transformers import AutoTokenizer
from datasets import load_dataset
from huggingface_hub import notebook_login
notebook_login()
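If you are running this as a plain script rather than in a notebook, you can log in programmatically instead, using the same huggingface_hub library:

from huggingface_hub import login

# Pass your Hugging Face access token directly instead of using the
# interactive notebook widget. Never hard-code real tokens in shared code.
login(token="<Your HF Token>")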

The next step is to load the model's tokenizer and use it to prepare our dataset for training. The pre-trained model's native chat template should be used for any instruction fine-tuning job; deviating from it will yield very poor results.

dataset = load_dataset('RaagulQB/alpaca_coding_dataset_full')
dataset = dataset['train']

My dataset already has a column called text, which contains the prompts in another model's chat template, so let us remove it first.

dataset = dataset.remove_columns('text')
dataset

Now let us shuffle the dataset:

dataset = dataset.shuffle(seed=65)

The next step is to apply the chat template and push the dataset to the Hub so that we can call the fine-tuner to fine-tune the model.

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")

Make sure you load the instruct model's tokenizer; the base LLM's tokenizer will not have a chat template, since the base model simply completes text.
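A quick way to verify you have the right tokenizer is to check that it actually ships a chat template (a minimal sanity check):

# The instruct tokenizer carries a chat template; the base model's
# tokenizer would have chat_template set to None.
assert tokenizer.chat_template is not None, "No chat template - did you load the base model?"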

def format_chat_template(row):
    # Wrap the instruction (plus any input) and the expected output
    # in the tokenizer's native chat format.
    row_json = [
        {"role": "user", "content": row["instruction"] + row["input"]},
        {"role": "assistant", "content": row["output"]},
    ]
    row["text"] = tokenizer.apply_chat_template(row_json, tokenize=False)
    return row

The above function applies the chat template to the instruction and response, and saves the result under a column named text, which we can pass straight to the fine-tuner. Just customize the column names inside format_chat_template for your use case.
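Before mapping over the whole dataset, it is worth eyeballing the output on a single row (the exact special tokens you see depend on the Llama 3.2 template):

# Apply the formatter to one example and inspect the resulting prompt.
sample = format_chat_template(dataset[0])
print(sample["text"])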

Now let us map this over our dataset and push it to the Hugging Face Hub:

dataset = dataset.map(format_chat_template, num_proc=4)
dataset.push_to_hub("RaagulQB/alpaca_coding_dataset_llama_3.2")
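You can optionally confirm the upload worked by loading the dataset back from the Hub:

# Re-load the pushed dataset; the new 'text' column should be present.
check = load_dataset("RaagulQB/alpaca_coding_dataset_llama_3.2", split="train")
print(check.column_names)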

Once this step is done, you can simply call MonsterAPI's fine-tuning service to fine-tune the model. Our fine-tuner ensures the hardware and software requirements are met internally; you can simply make a request and relax while it takes care of fine-tuning and uploading the tuned model to the Hub. Here is an example call you can make:

import requests

url = "https://api.monsterapi.ai/v1/finetune/llm"

payload = {
    "deployment_name": "Null",
    "pretrainedmodel_config": {
        "model_path": "meta-llama/Llama-3.2-1B-Instruct",
        "use_lora": True,
        "lora_r": 8,
        "lora_alpha": 16,
        "lora_dropout": 0,
        "lora_bias": "none",
        "use_quantization": False,
        "use_unsloth": False,
        "use_gradient_checkpointing": False,
        "parallelization": "nmp"
    },
    "data_config": {
        "data_path": "RaagulQB/alpaca_coding_dataset_llama_3.2",
        "data_subset": "default",
        "data_source_type": "hub_link",
        "prompt_template": "{text}",
        "cutoff_len": 15000,
        "prevalidated": False
    },
    "training_config": {
        "early_stopping_patience": 5,
        "num_train_epochs": 1,
        "gradient_accumulation_steps": 1,
        "warmup_steps": 50,
        "learning_rate": 0.001,
        "lr_scheduler_type": "reduce_lr_on_plateau",
        "group_by_length": False,
        "preference_optimization": "DONT",
        "optimizer": "adamw_hf"
    },
    "logging_config": { "use_wandb": False },
    "hf_config": {
        "hf_token": "<Your HF Token>",
        "hf_model_path": "RaagulQB/llama3.2-1B-instuct-code"
    },
    "accessorytasks_config": {
        "run_eval_report": False,
        "run_quantize_merge": False
    }
}
headers = {
    "accept": "application/json",
    "content-type": "application/json",
    "authorization": "Bearer <your monster api token>"
}

response = requests.post(url, json=payload, headers=headers)

print(response.text)
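The response is JSON; the exact fields depend on the API version, so treat the key names below as illustrative and consult the MonsterAPI documentation for the authoritative schema:

# Parse the JSON response and pull out a job identifier if present.
# The key names here are assumptions for illustration only.
result = response.json()
job_id = result.get("deployment_id") or result.get("id")
print("Submitted fine-tuning job:", job_id)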

Feel free to customize the settings to your liking and start fine-tuning. If making a raw API request feels cumbersome, just log into your account and trigger the job using our simple UI.
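Once the job finishes and the tuned model is on the Hub, you can try it out with plain transformers. This is a minimal sketch assuming the fine-tuner uploaded a merged, stand-alone model; if it uploads only LoRA adapters, load the base model first and attach the adapters with peft's PeftModel.from_pretrained instead.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumes a merged model was pushed to this repo; swap in your own
# hf_model_path from the payload above.
model_id = "RaagulQB/llama3.2-1B-instuct-code"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "Write a Python function that reverses a string."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=200)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))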

Conclusion

In this blog we have seen how you can fine-tune Llama 3.2 1B Instruct on a coding dataset using MonsterAPI. To sum it up: we can load any dataset and prepare it using the instruct model's chat template to instruction-tune it. In case you wish to use a custom chat template, make sure you start from the base model instead of the instruct model to achieve good results. We encourage you to try out various combinations of fine-tuning using MonsterAPI's Fine-Tuning engine!