
A tool for adapting large Transformer-based models with quantization, built on top of the LoRA (loralib) and LoRA-Torch (loratorch) libraries.

Project description

Adapter-LoRa for Quantization


Comparative Features of "loralib" and "loratorch" Implementations

Distinguishing the "loralib" and "loratorch" Implementation Approaches

The "loralib" and "loratorch" implementations take distinct approaches, which is easiest to see in the example of nn.Linear. The underlying mathematical formulations are as follows:

  1. For loralib, $h = x W_0^\top + \frac{\alpha}{r} x (BA)^\top,$

  2. For loratorch, $h = x \left(W_0 + \frac{\alpha}{r} BA\right)^\top,$

where $x\in\mathbb{R}^{k\times n}$ is the input matrix, $W_0\in\mathbb{R}^{m\times n}$ is the pre-trained weight matrix, $r$ is the predefined LoRA rank, $B\in\mathbb{R}^{m\times r}$ and $A\in\mathbb{R}^{r\times n}$ are the LoRA matrices, and $\alpha$ is a hyper-parameter.

loralib computes $xW_0^\top$ and $x(BA)^\top$ separately and then sums the two results, while loratorch first merges the pre-trained weight $W_0$ with its LoRA weight $BA$ and then computes the result with a single call to nn.Linear.forward(). For linear layers there is no difference between loralib and loratorch. For some non-linear or more complex layers, however, it is not clear whether $L(x, W_0) + L(x, BA) = L(x, W_0 + BA)$ holds, so it is difficult to extend LoRA to such layers with loralib. By contrast, the merge-the-weights-first idea behind loratorch is more general and extensible: you simply call merge_lora_param() in loratorch to merge the weights and then call forward() on the original layer to compute the result. With the help of loratorch, you can easily apply LoRA to any type of layer in torch.nn.
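
For nn.Linear the two formulations are numerically equivalent, which is easy to check with plain tensors. Below is a minimal sketch using only torch; the tensor names W0, A, B and the values of alpha and r are illustrative and not part of either library's API:

    import torch

    k, n, m, r, alpha = 3, 16, 8, 4, 1.0
    x = torch.randn(k, n)    # input of shape (k, n)
    W0 = torch.randn(m, n)   # pre-trained weight
    B = torch.randn(m, r)    # LoRA matrix B
    A = torch.randn(r, n)    # LoRA matrix A

    # loralib-style: compute the two terms separately, then add them
    h_loralib = x @ W0.T + (alpha / r) * (x @ (B @ A).T)

    # loratorch-style: merge the weights first, then do a single matmul
    h_loratorch = x @ (W0 + (alpha / r) * (B @ A)).T

    print(torch.allclose(h_loralib, h_loratorch, atol=1e-5))  # True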

Supported Layers

|                       | loralib        | loratorch      | Example            |
| --------------------- | -------------- | -------------- | ------------------ |
| nn.Linear             | ✓              | ✓              | linear.ipynb       |
| nn.Embedding          | ✓              | ✓              | embedding.ipynb    |
| nn.Conv1d             | ✓              | ✓              |                    |
| nn.Conv2d             | ✓              | ✓              |                    |
| nn.Conv3d             | ✓              | ✓              |                    |
| nn.MultiheadAttention | ✗              | ✓              |                    |
| MergedLinear          | ✓ (Error)      | ✓              | mergedlinear.ipynb |
| $\cdots$              | hard to extend | easy to extend |                    |

We compare the results of loralib and loratorch in the example notebooks to demonstrate the correctness of the loratorch implementation.
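
As a concrete illustration of a supported layer, the sketch below swaps a standard nn.Linear for loralib's drop-in LoRA linear and freezes everything except the LoRA parameters; loratorch is designed to mirror this usage. The dimensions and rank here are arbitrary examples:

    import torch.nn as nn
    import loralib as lora

    # Drop-in replacement for nn.Linear(512, 512) with LoRA rank r=4
    layer = lora.Linear(512, 512, r=4)
    model = nn.Sequential(layer)

    # Freeze all parameters except the LoRA matrices (their names contain "lora_")
    lora.mark_only_lora_as_trainable(model)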

Quick Start

The usage of AdapterLoRa

  1. Install AdapterLoRa.

    pip install git+https://github.com/Baijiong-Lin/LoRA-Torch
    pip install AdapterLoRa

  2. Use the AdapterLoRa tool.

    import torch
    import torch.nn as nn
    from core.Quantized import AdapterLoRa

    model = nn.TransformerEncoderLayer(d_model=512, nhead=8)

    adapter_model = AdapterLoRa(model, method="LoRa", Rank=4)

    """
    Register the layers you would like to adapt with AdapterLoRa
    (here the self-attention block and the two feed-forward linear layers)
    by using the add_layer function.
    """
    adapter_model.add_layer("self_attn")
    adapter_model.add_layer("linear1")
    adapter_model.add_layer("linear2")

    # Reconstruct the quantized model
    adapter_model.reconstruct_model()

    # Apply the LoRA method
    model = adapter_model.implement_lora(verbose=True)
    # Total trainable parameters before LoRA: 3176960
    # Total trainable parameters after LoRA: 24576

    # This sets requires_grad to False for all parameters
    # whose names do not contain the string "lora_"

    # Training loop (assuming `dataloader` yields training batches)
    model.train()
    for batch in dataloader:
        ...  # forward pass, loss computation, backward pass, optimizer step
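
The freezing behaviour described in the comment above can be reproduced by hand. The snippet below is a conceptual sketch of that name-based filter, not AdapterLoRa's internal code:

    # Freeze every parameter whose name does not contain "lora_"
    for name, param in model.named_parameters():
        param.requires_grad = "lora_" in name

    # Sanity check: count the trainable parameters that remain
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"Total trainable parameters after LoRA: {trainable}")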

Saving Model Weights

  • Save the LoRA model (only the LoRA matrices will be saved).

    import loralib as lora

    # ===== Before =====
    # torch.save(model.state_dict(), checkpoint_path)
    # ===== After =====
    torch.save(lora.lora_state_dict(model), checkpoint_path)
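
lora.lora_state_dict(model) keeps only the LoRA-related entries of the full state dict. Conceptually it behaves like the name filter below; this is a simplified sketch rather than loralib's actual implementation, which also handles the bias options:

    # Keep only the LoRA entries of the full state dict
    checkpoint_path = "ckpt_lora.pt"  # illustrative file name
    lora_only = {k: v for k, v in model.state_dict().items() if "lora_" in k}
    torch.save(lora_only, checkpoint_path)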

Loading the Pre-Trained Model

  • Load the LoRA model (the pre-trained model needs to be loaded first).

    import loralib as lora

    # Load the pre-trained checkpoint first
    model.load_state_dict(torch.load('ckpt_pretrained.pt'), strict=False)
    # Then load the LoRA checkpoint
    model.load_state_dict(torch.load('ckpt_lora.pt'), strict=False)
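
Because both calls use strict=False, key mismatches pass silently. load_state_dict returns the missing and unexpected keys, so a quick check (optional, not required by loralib) helps catch a malformed checkpoint:

    # load_state_dict returns a named tuple of (missing_keys, unexpected_keys)
    result = model.load_state_dict(torch.load('ckpt_lora.pt'), strict=False)
    print("Missing keys:", result.missing_keys)        # e.g. frozen weights absent from the LoRA checkpoint
    print("Unexpected keys:", result.unexpected_keys)  # should normally be empty
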
  • Quantized Model

  • Time to Train

  • Cost to Train

What's in it for you?

For each of the pillars above, we are sharing our codebase and insights to:

  • Help you leverage Transformer-based models for your machine learning needs and challenges

  • Boost reproducibility efforts, which are becoming increasingly difficult with Transformers

We provide ready-to-use tools for quantizing the model:

  • Fine-tuning Transformer-based models on your proprietary dataset via PEFT methodologies such as LoRA and QLoRA

  • Performing hyperparameter optimization to get the maximum performance out of these models

What's the best way to use this repository?

Go to the directory of the Transformer-based model you are interested in and open its README.md. We have included details about the LLMs, followed by performance results on open-source datasets!

Roadmap

Our plan is to perform these experiments on all of the Transformer-based models below. To that end, this is a tentative roadmap of the LLMs that we aim to cover:

  • TransformerEncoder
  • TransformerDecoder
  • Vision Transformer
  • minGPT
  • OpenAI GPT-2
  • Inflection Pi (in progress)

Correspondence

Contributor

AdapterLoRa is developed and maintained by Youness ELbrag (Email | LinkedIn).
