
A Coding Introduction to Weight Quantization: A Key Aspect of Enhancing Efficiency in Deep Learning and LLMs

In today's deep learning landscape, optimizing models for deployment in resource-constrained environments is more important than ever. Weight quantization addresses this need by reducing the precision of model parameters, typically from 32-bit floating-point values to lower bit-width representations, yielding smaller models that run faster on devices with limited resources. This tutorial introduces the concept of weight quantization using PyTorch's dynamic quantization technique on a pre-trained ResNet18 model. It explores how to inspect weight distributions, apply dynamic quantization to key layers (such as fully connected layers), compare model sizes, and visualize the resulting changes. By the end, you will have both the theoretical background and the practical skills required to deploy deep learning models efficiently.
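To ground the idea before touching PyTorch's API, here is a minimal sketch of affine quantization, assuming a simple asymmetric scale/zero-point scheme; the function names are illustrative and this is not the exact routine PyTorch uses internally.

import numpy as np


def quantize_affine(x, num_bits=8):
    # Derive scale and zero-point from the tensor's observed range.
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point


def dequantize_affine(q, scale, zero_point):
    # Map integer codes back to approximate floating-point values.
    return scale * (q.astype(np.float32) - zero_point)


weights = np.random.randn(5).astype(np.float32)
q, s, z = quantize_affine(weights)
print("original:   ", weights)
print("dequantized:", dequantize_affine(q, s, z))

The round trip shows the core trade-off: each FP32 value is snapped to one of 256 levels, introducing a small, bounded error in exchange for a 4x smaller representation.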

import torch
import torch.nn as nn
import torch.quantization
import torchvision.models as models
import matplotlib.pyplot as plt
import numpy as np
import os


print("Torch version:", torch.__version__)

We import the required libraries, including PyTorch, Torchvision, and Matplotlib, and print the PyTorch version to confirm that all the modules needed for model manipulation and visualization are available.

model_fp32 = models.resnet18(pretrained=True)
model_fp32.eval()  


print("Pretrained ResNet18 (FP32) model loaded.")

The pre-trained ResNet18 model is loaded in FP32 (floating-point) precision and switched to evaluation mode, preparing it for inspection and quantization.
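Note that the pretrained=True flag is deprecated in recent torchvision releases; assuming torchvision 0.13 or later, the equivalent call uses an explicit weights enum:

# Equivalent on torchvision 0.13+, where `pretrained` is deprecated
# in favor of explicit weight enums.
model_fp32 = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model_fp32.eval()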

fc_weights_fp32 = model_fp32.fc.weight.data.cpu().numpy().flatten()


plt.figure(figsize=(8, 4))
plt.hist(fc_weights_fp32, bins=50, color="skyblue", edgecolor="black")
plt.title("FP32 - FC Layer Weight Distribution")
plt.xlabel("Weight values")
plt.ylabel("Frequency")
plt.grid(True)
plt.show()

In this block, we extract and flatten the weights of the FP32 model's fully connected layer, then plot a histogram to visualize their distribution before any quantization is applied.

Output of the code block above
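As a quick numeric complement to the histogram, a few summary statistics can be printed; this is an optional addition, not part of the original tutorial:

# Optional: summary statistics of the FP32 FC-layer weights.
print(f"min: {fc_weights_fp32.min():.4f}, max: {fc_weights_fp32.max():.4f}")
print(f"mean: {fc_weights_fp32.mean():.4f}, std: {fc_weights_fp32.std():.4f}")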
quantized_model = torch.quantization.quantize_dynamic(model_fp32, {nn.Linear}, dtype=torch.qint8)
quantized_model.eval()  


print("Dynamic quantization applied to the model.")

We apply dynamic quantization to the model, specifically targeting its linear layers, converting their weights to a low-precision INT8 format. This is a key technique for reducing model size and inference time.
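To confirm which modules were actually converted, you can inspect their types; this quick check is an illustrative addition:

# Optional check: the FC layer should now be a dynamically quantized Linear,
# while convolutional layers remain ordinary FP32 modules.
print(type(quantized_model.fc))     # expected: a quantized dynamic Linear
print(type(quantized_model.conv1))  # expected: torch.nn.Conv2d (unchanged)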

def get_model_size(model, filename="temp.p"):
    torch.save(model.state_dict(), filename)
    size = os.path.getsize(filename) / 1e6
    os.remove(filename)
    return size


fp32_size = get_model_size(model_fp32, "fp32_model.p")
quant_size = get_model_size(quantized_model, "quant_model.p")


print(f"FP32 Model Size: {fp32_size:.2f} MB")
print(f"Quantized Model Size: {quant_size:.2f} MB")

A helper function is defined that saves a model's state dict to disk, reports the file size in megabytes, and deletes the temporary file. It is then used to measure and compare the sizes of the original FP32 model and the quantized model, demonstrating the compression effect of quantization.
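A one-line follow-up (an illustrative addition) makes the savings explicit as a ratio; note that dynamic quantization here only compresses the linear layer, so for ResNet18, which is dominated by convolutions, the overall ratio is modest:

# Optional: express the size savings as a compression ratio.
print(f"Compression ratio: {fp32_size / quant_size:.2f}x")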

dummy_input = torch.randn(1, 3, 224, 224)


with torch.no_grad():
    output_fp32 = model_fp32(dummy_input)
    output_quant = quantized_model(dummy_input)


print("Output from FP32 model (first 5 elements):", output_fp32[0][:5])
print("Output from Quantized model (first 5 elements):", output_quant[0][:5])

A dummy input tensor is created to simulate a single image, and both the FP32 and quantized models are run on it so that their outputs can be compared, verifying that quantization does not significantly change the predictions.
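To quantify the agreement rather than eyeball it, the two outputs can be compared directly; a small illustrative addition:

# Optional: quantify the difference between the two model outputs.
max_diff = (output_fp32 - output_quant).abs().max().item()
print(f"Max absolute difference: {max_diff:.6f}")
print("Top-1 class matches:", output_fp32.argmax(1).item() == output_quant.argmax(1).item())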

if hasattr(quantized_model.fc, 'weight'):
    fc_weights_quant = quantized_model.fc.weight().dequantize().cpu().numpy().flatten()
else:
    fc_weights_quant = quantized_model.fc._packed_params._packed_weight.dequantize().cpu().numpy().flatten()


plt.figure(figsize=(14, 5))


plt.subplot(1, 2, 1)
plt.hist(fc_weights_fp32, bins=50, color="skyblue", edgecolor="black")
plt.title("FP32 - FC Layer Weight Distribution")
plt.xlabel("Weight values")
plt.ylabel("Frequency")
plt.grid(True)


plt.subplot(1, 2, 2)
plt.hist(fc_weights_quant, bins=50, color="salmon", edgecolor="black")
plt.title("Quantized - FC Layer Weight Distribution")
plt.xlabel("Weight values")
plt.ylabel("Frequency")
plt.grid(True)


plt.tight_layout()
plt.show()

In this block, the quantized weights of the fully connected layer are dequantized and extracted, then plotted side by side against the original FP32 weights to illustrate how quantization changes the weight distribution.

Output of the code block above: side-by-side histograms of the FP32 and quantized FC-layer weight distributions
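The same comparison can be made numerically; this short addition (not in the original tutorial) computes the per-weight quantization error:

# Optional: numeric view of the quantization error on the FC weights.
err = fc_weights_fp32 - fc_weights_quant
print(f"Mean absolute error: {np.abs(err).mean():.6f}")
print(f"Max absolute error:  {np.abs(err).max():.6f}")
print(f"Unique quantized values: {len(np.unique(fc_weights_quant))}")  # at most 256 for INT8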

In conclusion, this tutorial provided a step-by-step guide to understanding and implementing weight quantization, highlighting its effect on model size and performance. By quantizing a pre-trained ResNet18 model, we observed the shifts in weight distributions, the tangible reduction in model size, and the potential improvements in inference speed. This exploration sets the stage for further experiments, such as quantization-aware training (QAT), which can further improve accuracy in quantized models.
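As a pointer toward that next step, here is a minimal QAT sketch on a toy model, using PyTorch's eager-mode QAT API; the TinyNet model, the random data, and the short loop are all illustrative placeholders, not part of the original tutorial:

import torch
import torch.nn as nn
import torch.quantization


class TinyNet(nn.Module):
    # Toy model: QuantStub/DeQuantStub mark where tensors enter and
    # leave the quantized region of the graph.
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.fc = nn.Linear(10, 2)
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))


model = TinyNet()
model.train()
model.qconfig = torch.quantization.get_default_qat_qconfig("fbgemm")
torch.quantization.prepare_qat(model, inplace=True)

# Placeholder training loop on random data; fake-quant ops are active here,
# so the model learns weights that survive quantization.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
for _ in range(3):
    x, y = torch.randn(8, 10), torch.randint(0, 2, (8,))
    loss = nn.functional.cross_entropy(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

model.eval()
quantized = torch.quantization.convert(model)
print(quantized)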




Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among readers.
