
A Coding Guide to Demonstrate Targeted Data Poisoning Attacks in Deep Learning by Label Flipping on CIFAR-10 with PyTorch

In this tutorial, we demonstrate a realistic data poisoning attack by manipulating labels in the CIFAR-10 training set and observing the impact on model behavior. We build clean and poisoned training pipelines side by side, using a ResNet-style convolutional network to keep the learning dynamics stable and comparable. By flipping the labels of a portion of samples from a target class to a malicious label during training, we show how subtle corruption in the data pipeline can propagate into systematic misclassification at inference time.

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader, Dataset
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix, classification_report


CONFIG = {
   "batch_size": 128,
   "epochs": 10,
   "lr": 0.001,
   "target_class": 1,
   "malicious_label": 9,
   "poison_ratio": 0.4,
   "device": torch.device("cuda" if torch.cuda.is_available() else "cpu"),
}


torch.manual_seed(42)
np.random.seed(42)

We set up the basic environment required for the experiment and define all global configuration parameters in one place. We ensure reproducibility by fixing the random seeds for PyTorch and NumPy. We also explicitly specify the computing device so that the tutorial runs efficiently on both CPU and GPU.
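
If stricter reproducibility is desired, the optional sketch below (an addition, not part of the original pipeline) also seeds the CUDA generators and pins cuDNN to deterministic kernels, trading a little speed for more repeatable runs.

if torch.cuda.is_available():
   torch.cuda.manual_seed_all(42)             # seed every GPU, not just the default one
   torch.backends.cudnn.deterministic = True  # force deterministic cuDNN kernels
   torch.backends.cudnn.benchmark = False     # disable autotuning, which can vary across runs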

class PoisonedCIFAR10(Dataset):
   def __init__(self, original_dataset, target_class, malicious_label, ratio, is_train=True):
       self.dataset = original_dataset
       self.targets = np.array(original_dataset.targets)
       self.is_train = is_train
       if is_train and ratio > 0:
           indices = np.where(self.targets == target_class)[0]
           n_poison = int(len(indices) * ratio)
           poison_indices = np.random.choice(indices, n_poison, replace=False)
           self.targets[poison_indices] = malicious_label


   def __getitem__(self, index):
       img, _ = self.dataset[index]
       return img, self.targets[index]


   def __len__(self):
       return len(self.dataset)

We implement a custom dataset wrapper that applies controlled label poisoning during training. We selectively flip the labels of a configurable portion of target-class samples to the malicious class while keeping the test data unchanged. We retain the original image data, so only the integrity of the labels is compromised.
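
As a quick sanity check (an addition to the tutorial, not part of the original code), the sketch below counts how many target-class labels the wrapper actually flipped. It assumes the clean_ds and poison_ds objects created later in this tutorial, so run it after they are instantiated.

def count_flipped(ds, target_class, malicious_label):
   # Compare the wrapper's (possibly poisoned) labels against the untouched originals.
   original = np.array(ds.dataset.targets)
   flipped = (original == target_class) & (ds.targets == malicious_label)
   return int(flipped.sum())


print("Flipped labels (clean):   ", count_flipped(clean_ds, CONFIG["target_class"], CONFIG["malicious_label"]))
print("Flipped labels (poisoned):", count_flipped(poison_ds, CONFIG["target_class"], CONFIG["malicious_label"]))

With the default configuration we expect 0 flips for the clean dataset and 2,000 for the poisoned one (40% of the 5,000 class-1 training images).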

def get_model():
   model = torchvision.models.resnet18(num_classes=10)
   model.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
   model.maxpool = nn.Identity()
   return model.to(CONFIG["device"])


def train_and_evaluate(train_loader, description):
   print(f"Training: {description}")
   model = get_model()
   optimizer = optim.Adam(model.parameters(), lr=CONFIG["lr"])
   criterion = nn.CrossEntropyLoss()
   for _ in range(CONFIG["epochs"]):
       model.train()
       for images, labels in train_loader:
           images = images.to(CONFIG["device"])
           labels = labels.to(CONFIG["device"])
           optimizer.zero_grad()
           outputs = model(images)
           loss = criterion(outputs, labels)
           loss.backward()
           optimizer.step()
   return model

We define a ResNet-18 model adapted for CIFAR-10's 32x32 images, replacing the stem convolution and removing the initial max-pooling layer, and implement the full training loop. We train the network with cross-entropy loss and the Adam optimizer to ensure stable convergence. We keep the training logic identical on clean and poisoned data to isolate the effect of data poisoning.
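
Because the analysis later contrasts class-specific damage with overall accuracy, a small helper like the one below (an addition, not part of the original code) is useful for reporting plain top-1 test accuracy alongside the confusion matrices.

def evaluate_accuracy(model, loader):
   # Overall top-1 accuracy on the given loader.
   model.eval()
   correct, total = 0, 0
   with torch.no_grad():
       for images, labels in loader:
           images = images.to(CONFIG["device"])
           labels = labels.to(CONFIG["device"])
           outputs = model(images)
           correct += (outputs.argmax(dim=1) == labels).sum().item()
           total += labels.size(0)
   return correct / total

Once both models are trained below, calling evaluate_accuracy(clean_model, test_loader) and evaluate_accuracy(poisoned_model, test_loader) shows how much, if at all, the overall numbers move.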

def get_predictions(model, loader):
   model.eval()
   preds, labels_all = [], []
   with torch.no_grad():
       for images, labels in loader:
           images = images.to(CONFIG["device"])
           outputs = model(images)
           _, predicted = torch.max(outputs, 1)
           preds.extend(predicted.cpu().numpy())
           labels_all.extend(labels.numpy())
   return np.array(preds), np.array(labels_all)


def plot_results(clean_preds, clean_labels, poisoned_preds, poisoned_labels, classes):
   fig, ax = plt.subplots(1, 2, figsize=(16, 6))
   for i, (preds, labels, title) in enumerate([
       (clean_preds, clean_labels, "Clean Model Confusion Matrix"),
       (poisoned_preds, poisoned_labels, "Poisoned Model Confusion Matrix")
   ]):
       cm = confusion_matrix(labels, preds)
       sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", ax=ax[i],
                   xticklabels=classes, yticklabels=classes)
       ax[i].set_title(title)
   plt.tight_layout()
   plt.show()

We perform inference on the test set and collect predictions for quantitative analysis. We compute confusion matrices to visualize the per-class behavior of both the clean and poisoned models. We use these visual diagnostics to highlight the targeted misclassification patterns introduced by the attack.
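
To put a single number on the attack itself, an optional diagnostic (assumed here, not defined in the original code) is the targeted misclassification rate: the fraction of target-class test images that a model assigns to the malicious label.

def attack_success_rate(preds, labels, target_class, malicious_label):
   # Fraction of true target-class samples predicted as the malicious label.
   mask = labels == target_class
   if mask.sum() == 0:
       return 0.0
   return float(np.mean(preds[mask] == malicious_label))

After the predictions are collected below, comparing attack_success_rate(c_preds, c_true, CONFIG["target_class"], CONFIG["malicious_label"]) with the same call on p_preds and p_true quantifies how much of the target class is redirected into the malicious label.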

train_transform = transforms.Compose([
   transforms.RandomHorizontalFlip(),
   transforms.ToTensor(),
   transforms.Normalize((0.4914, 0.4822, 0.4465),
                        (0.2023, 0.1994, 0.2010))
])

# Augmentation is applied only to the training split; the test split is evaluated without random flips.
test_transform = transforms.Compose([
   transforms.ToTensor(),
   transforms.Normalize((0.4914, 0.4822, 0.4465),
                        (0.2023, 0.1994, 0.2010))
])


base_train = torchvision.datasets.CIFAR10(root="./data", train=True, download=True, transform=train_transform)
base_test = torchvision.datasets.CIFAR10(root="./data", train=False, download=True, transform=test_transform)


clean_ds = PoisonedCIFAR10(base_train, CONFIG["target_class"], CONFIG["malicious_label"], ratio=0)
poison_ds = PoisonedCIFAR10(base_train, CONFIG["target_class"], CONFIG["malicious_label"], ratio=CONFIG["poison_ratio"])


clean_loader = DataLoader(clean_ds, batch_size=CONFIG["batch_size"], shuffle=True)
poison_loader = DataLoader(poison_ds, batch_size=CONFIG["batch_size"], shuffle=True)
test_loader = DataLoader(base_test, batch_size=CONFIG["batch_size"], shuffle=False)


clean_model = train_and_evaluate(clean_loader, "Clean Training")
poisoned_model = train_and_evaluate(poison_loader, "Poisoned Training")


c_preds, c_true = get_predictions(clean_model, test_loader)
p_preds, p_true = get_predictions(poisoned_model, test_loader)


classes = base_train.classes
plot_results(c_preds, c_true, p_preds, p_true, classes)


target_name = [classes[CONFIG["target_class"]]]
print(classification_report(c_true, c_preds, labels=[CONFIG["target_class"]], target_names=target_name))
print(classification_report(p_true, p_preds, labels=[CONFIG["target_class"]], target_names=target_name))

We prepare the CIFAR-10 dataset, create clean and poisoned data loaders, and run both end-to-end training pipelines. We evaluate the trained models on a common test set to ensure a fair comparison. We conclude the analysis by reporting class-specific precision and recall to reveal the effect of poisoning on the target class.

In conclusion, we observed how label-level data poisoning degrades class-specific performance without necessarily destroying overall accuracy. We analyzed this behavior using confusion matrices and per-class classification reports, which reveal the targeted failure modes introduced by the attack. This experiment reinforces the importance of data provenance, validation, and monitoring in real-world machine learning systems, especially in safety-critical domains.
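
As a concrete illustration of the validation the conclusion argues for, the sketch below (an illustrative addition, not part of the tutorial's pipeline) compares the label distribution of an incoming training set against trusted per-class counts and flags classes that drift beyond a tolerance. Because CIFAR-10 is balanced at 5,000 training images per class, the 40% flip from class 1 to class 9 is flagged immediately.

def flag_label_drift(labels, expected_per_class, tolerance=0.1):
   # Flag any class whose observed count deviates from the trusted count by more than the tolerance.
   labels = np.asarray(labels)
   flagged = []
   for cls, expected in expected_per_class.items():
       observed = int(np.sum(labels == cls))
       if abs(observed - expected) > tolerance * expected:
           flagged.append((cls, expected, observed))
   return flagged


expected = {c: 5000 for c in range(10)}               # trusted, balanced CIFAR-10 reference counts
print(flag_label_drift(poison_ds.targets, expected))  # classes 1 and 9 are reported as anomalous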





