# AI/ML Supply Chain
Machine learning adds new dimensions to the supply chain: models, datasets, training pipelines, and inference frameworks. Each is a dependency that can fail, be compromised, or disappear.
> **Downloading Weights from Strangers**
>
> You're downloading model weights from strangers on the internet and running them on your machine. Think about that for a moment. That Hugging Face model could have been trained on anything by anyone. Pickle files can execute arbitrary code. Datasets might contain poisoned examples designed to create backdoors. The ML supply chain is the Wild West: all the software supply chain problems, plus entirely new categories of risk.
## The ML Dependency Stack
Traditional software has code dependencies. ML has more:
| Layer | Examples | Risks |
|---|---|---|
| Code | PyTorch, TensorFlow, scikit-learn | Same as any software |
| Models | GPT-2, ResNet, BERT | Provenance, poisoning, bias |
| Datasets | ImageNet, Common Crawl | Bias, licensing, privacy |
| Compute | CUDA, cuDNN, hardware | Availability, reproducibility |
| Weights | Pre-trained parameters | Integrity, versioning |
Each layer can introduce vulnerabilities, biases, and failure modes.
## Model Supply Chain

### Where Models Come From
**Hugging Face Hub:** Largest collection of open models. Anyone can upload, so treat listings as unvetted and pin exact revisions (see the sketch below).

**Official releases:** Model authors publish directly (OpenAI, Meta, Google).

**Third-party training:** Someone else trained it on unknown data.
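For models pulled from the Hub, pinning a full commit SHA rather than a floating branch name means a later upload can't silently change what you download. A minimal sketch using `huggingface_hub` (the repo id and commit SHA are placeholders):

```python
from huggingface_hub import hf_hub_download

# Pin to a full commit SHA, not "main": later pushes to the repo
# then cannot change the artifact you receive.
weights_path = hf_hub_download(
    repo_id="some-org/some-model",   # placeholder
    filename="pytorch_model.bin",
    revision="1234abcd...",          # commit SHA from the model page
)
```

Combine this with checksum verification (below) for defense in depth.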
### Model Risks
**Provenance unknown:** Who trained this? On what data? With what objectives?

**Poisoning attacks:** Models can be trained to behave maliciously on specific inputs while appearing normal on others.1

**Backdoors:** Hidden behaviors triggered by specific patterns.

**Weight tampering:** Modified weights after training.

**Bias:** Models inherit biases from training data.
### Model Verification
**Checksums:** Verify downloaded weights match expected hashes.
```python
import hashlib

def verify_model(filepath, expected_hash):
    """Return True if the file's SHA-256 matches the published hash."""
    sha256 = hashlib.sha256()
    with open(filepath, 'rb') as f:
        # Read in chunks so large weight files never need to fit in memory
        for chunk in iter(lambda: f.read(4096), b''):
            sha256.update(chunk)
    return sha256.hexdigest() == expected_hash
```
**Provenance documentation:** Require model cards with training details.
```yaml
# Model Card (modelcard.yaml)
model_name: my-model
training_data:
  - dataset: imagenet-1k
    version: "2012"
    license: custom-academic
training_compute: 8x A100, 72 hours
carbon_footprint: estimated 450 kg CO2
```
**Behavioral testing:** Test models against known-good outputs.
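One lightweight form of this is a golden-output regression test: run a fixed probe set through the model and compare against outputs recorded from a version you trust. A sketch, assuming a PyTorch model; the probe cases and tolerance are illustrative:

```python
import torch

def check_golden_outputs(model, golden_cases, atol=1e-4):
    """Verify the model still reproduces outputs from a trusted run.

    golden_cases: list of (input_tensor, expected_output) pairs recorded
    from a version of the model you have already vetted.
    """
    model.eval()
    with torch.no_grad():
        for inputs, expected in golden_cases:
            actual = model(inputs)
            if not torch.allclose(actual, expected, atol=atol):
                raise AssertionError(
                    f"Output drifted beyond atol={atol}; "
                    "these may not be the weights you vetted"
                )
```

This won't catch a well-crafted backdoor (the trigger inputs aren't in your probe set), but it catches accidental or clumsy tampering cheaply.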
## Dataset Supply Chain

### Dataset Risks
**Privacy leakage:** Training data may contain personal information that models memorize.2

**License violations:** Dataset may include copyrighted material.

**Label poisoning:** Incorrect labels degrade model quality.

**Data drift:** Real-world distribution differs from training data.
### Dataset Best Practices
**Version your datasets:**
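Tools like DVC or lakeFS do this at scale; the core idea is just immutable, content-addressed snapshots. A minimal hand-rolled sketch (the directory layout is illustrative):

```python
import hashlib
import shutil
from pathlib import Path

def snapshot_dataset(src, versions_dir="data/versions"):
    """Copy a dataset file to an immutable, content-addressed location."""
    # For very large files, hash in chunks rather than one read
    digest = hashlib.sha256(Path(src).read_bytes()).hexdigest()
    dest = Path(versions_dir) / digest[:16] / Path(src).name
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)
    return digest, dest  # record both alongside your experiment metadata
```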
**Document provenance:**
```yaml
# data/MANIFEST.yaml
dataset:
  name: customer-transactions
  version: "2.0.0"
  created: "2024-01-15"
  sources:
    - system: transactions-db
      query: "SELECT * FROM transactions WHERE year >= 2023"
      date_extracted: "2024-01-15"
  preprocessing:
    - "removed PII columns: [email, phone, address]"
    - aggregated to daily level
  validation:
    row_count: 1250000
    date_range: "2023-01-01 to 2023-12-31"
```
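A manifest only helps if something reads it. A sketch that fails fast when loaded data no longer matches the manifest's expectations (keys follow the example above; requires PyYAML):

```python
import yaml  # PyYAML

def validate_against_manifest(df, manifest_path="data/MANIFEST.yaml"):
    """Raise if a dataframe no longer matches its documented shape."""
    with open(manifest_path) as f:
        expected = yaml.safe_load(f)["dataset"]["validation"]
    if len(df) != expected["row_count"]:
        raise ValueError(
            f"Row count {len(df)} != documented {expected['row_count']}"
        )
```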
**Hash datasets:**
```python
import hashlib
import pandas as pd

def hash_dataframe(df):
    """Stable content hash of a dataframe, for pinning dataset versions."""
    return hashlib.sha256(
        pd.util.hash_pandas_object(df).values.tobytes()
    ).hexdigest()
```
## Framework Dependencies

### The Heavy Dependency Problem
ML frameworks have enormous dependency trees:
```bash
# Installing PyTorch can pull in hundreds of packages
pip install torch
# Plus CUDA, cuDNN, NCCL for GPU support
```
### Framework Versioning
```text
# requirements.txt - pin precisely
torch==2.1.0
torchvision==0.16.0
torchaudio==2.1.0
# These versions must be compatible
```
PyTorch/CUDA compatibility matrix:
| PyTorch | CUDA | cuDNN |
|---|---|---|
| 2.1.x | 11.8, 12.1 | 8.x |
| 2.0.x | 11.7, 11.8 | 8.x |
| 1.13.x | 11.6, 11.7 | 8.x |
Version mismatches cause silent failures or cryptic errors.
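Catching a mismatch at process start is far cheaper than debugging one mid-training. A sketch that asserts the runtime matches what you pinned (the expected versions are illustrative):

```python
import torch

EXPECTED_TORCH = "2.1.0"
EXPECTED_CUDA = "12.1"

def check_environment():
    """Fail loudly at startup instead of cryptically mid-training."""
    if not torch.__version__.startswith(EXPECTED_TORCH):
        raise RuntimeError(f"torch {torch.__version__}, expected {EXPECTED_TORCH}")
    # torch.version.cuda is None on CPU-only builds
    if torch.version.cuda != EXPECTED_CUDA:
        raise RuntimeError(f"CUDA {torch.version.cuda}, expected {EXPECTED_CUDA}")
```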
### Native Dependencies
ML often requires native libraries:
```dockerfile
# Dockerfile for ML workloads
FROM nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04

# System dependencies
RUN apt-get update && apt-get install -y \
    python3.11 \
    python3-pip \
    libopenblas-dev \
    libomp-dev \
    && rm -rf /var/lib/apt/lists/*

# Python dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt
```
## Reproducibility Challenges

### Sources of Non-Reproducibility
**Hardware differences:** GPU vs CPU, different GPU architectures.
```python
import torch

# Results differ between devices; force deterministic kernels
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
torch.use_deterministic_algorithms(True)
```
**Non-deterministic operations:** Some operations are non-deterministic by design for performance.
```python
import os
import random

import numpy as np
import torch

# Set seeds everywhere
def set_seed(seed):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

# Required for complete reproducibility with deterministic algorithms
os.environ['CUBLAS_WORKSPACE_CONFIG'] = ':4096:8'
```
**Floating point non-associativity:** (a + b) + c ≠ a + (b + c) in floating point.
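This is easy to demonstrate, and it is why parallel reductions, which sum in nondeterministic order, diverge across runs and devices:

```python
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c == a + (b + c))  # False
print((a + b) + c, a + (b + c))    # 0.6000000000000001 0.6
```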
### Experiment Tracking
**Track everything:**
```python
import mlflow
import torch

with mlflow.start_run():
    # Log parameters
    mlflow.log_params({
        'learning_rate': 0.001,
        'batch_size': 32,
        'epochs': 100,
        'seed': 42,
    })

    # Log environment
    mlflow.log_artifact('requirements.txt')
    mlflow.log_param('torch_version', torch.__version__)
    mlflow.log_param('cuda_version', torch.version.cuda)

    # Train...

    # Log metrics
    mlflow.log_metrics({
        'accuracy': 0.95,
        'loss': 0.12,
    })

    # Log model
    mlflow.pytorch.log_model(model, 'model')
```
## Security Considerations

### Model Serialization
**Pickle is dangerous:**
```python
# Models are often saved with pickle
torch.save(model, 'model.pt')  # Uses pickle internally

# Pickle can execute arbitrary code on load
# Only load models from trusted sources
```
**Safer alternatives:**
```python
# Save only weights (safer)
torch.save(model.state_dict(), 'weights.pt')

# Load into a known architecture; weights_only=True (PyTorch 1.13+)
# refuses to unpickle arbitrary objects
model = MyModel()
model.load_state_dict(torch.load('weights.pt', weights_only=True))
```
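Another option is the `safetensors` format, which stores raw tensor data and cannot execute code on load. A sketch, assuming `pip install safetensors` and the same known `MyModel` architecture:

```python
from safetensors.torch import save_file, load_file

# Save: plain tensor bytes plus a JSON header, no pickle involved
save_file(model.state_dict(), 'weights.safetensors')

# Load: only tensors are parsed, so a malicious file cannot run code
model = MyModel()
model.load_state_dict(load_file('weights.safetensors'))
```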
### Model Scanning
Some tools scan models for malicious payloads:
```bash
# Fickling - analyze pickle files for code execution
pip install fickling
fickling model.pt

# ModelScan - broader model security scanning
pip install modelscan
modelscan -p model.pt
```
### Inference Security
**Input validation:**
```python
def predict(input_data):
    # Validate input shape
    if input_data.shape != expected_shape:
        raise ValueError(f"Expected shape {expected_shape}, got {input_data.shape}")

    # Validate input range
    if input_data.min() < -1 or input_data.max() > 1:
        raise ValueError("Input must be normalized to [-1, 1]")

    return model(input_data)
```
**Resource limits:**
```python
# Prevent DoS via large inputs
MAX_INPUT_SIZE = 1024 * 1024  # 1 MB

def predict_safe(input_data):
    if len(input_data) > MAX_INPUT_SIZE:
        raise ValueError("Input too large")
    # Process...
```
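A size check doesn't bound compute time: a small input can still be pathologically slow. One approach, sketched here, is a wall-clock budget around inference; note the worker thread isn't actually killed on timeout, so pair this with process-level limits in production:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FuturesTimeout

_executor = ThreadPoolExecutor(max_workers=4)  # also caps concurrent requests

def predict_with_budget(input_data, timeout_s=5.0):
    future = _executor.submit(predict_safe, input_data)
    try:
        return future.result(timeout=timeout_s)
    except FuturesTimeout:
        raise RuntimeError(f"Inference exceeded {timeout_s}s budget")
```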
## Model Cards and Documentation

### What to Document
```markdown
# Model Card: sentiment-classifier-v1

## Model Description
- **Model type:** DistilBERT fine-tuned for sentiment classification
- **Training data:** IMDb reviews dataset (50k examples)
- **Intended use:** English movie review sentiment analysis

## Training Details
- **Framework:** PyTorch 2.1.0
- **Training compute:** 1x A100 GPU, 4 hours
- **Hyperparameters:** learning_rate=2e-5, batch_size=16, epochs=3

## Evaluation
| Metric | Value |
|--------|-------|
| Accuracy | 92.5% |
| F1 Score | 0.92 |

## Limitations
- English only
- Trained on movie reviews; may not generalize to other domains
- May exhibit biases present in IMDb reviews

## Ethical Considerations
- Should not be used for decisions affecting individuals
- Review sentiment is subjective; model reflects training data biases
```
### Provenance Chain
Document the full lineage:
```yaml
# provenance.yaml
model:
  name: sentiment-classifier-v1
  version: "1.0.0"
  created: "2024-01-15"

base_model:
  name: distilbert-base-uncased
  source: huggingface
  version: "1.0"
  sha256: abc123...

training_data:
  name: imdb-reviews
  version: "1.0"
  source: huggingface/datasets
  sha256: def456...

dependencies:
  torch: "2.1.0"
  transformers: "4.35.0"
  datasets: "2.15.0"

training:
  script: train.py
  script_sha256: ghi789...
  random_seed: 42
  gpu: "NVIDIA A100"
  duration_hours: 4
```
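A provenance file rots unless something re-checks it. A sketch that re-verifies a recorded hash before a model is promoted (file layout follows the example above; requires PyYAML):

```python
import hashlib
import yaml  # PyYAML

def sha256_file(path):
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(1 << 20), b''):
            h.update(chunk)
    return h.hexdigest()

def verify_provenance(path="provenance.yaml"):
    """Recompute the training script's hash and compare to the record."""
    with open(path) as f:
        prov = yaml.safe_load(f)
    if sha256_file(prov["training"]["script"]) != prov["training"]["script_sha256"]:
        raise RuntimeError("train.py no longer matches its recorded hash")
```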
> **Silent Failures**
>
> ML failures are often silent. A poisoned model still produces outputs, just wrong ones for certain inputs. A biased dataset produces a biased model that seems to work fine until it doesn't. You won't get an error message; you'll get wrong answers that look plausible. Treat ML artifacts with the same skepticism you'd apply to running code from the internet, because that's exactly what they are.
## Quick Reference

### ML Dependency Checklist
- [ ] Framework versions pinned (PyTorch, TensorFlow, etc.)
- [ ] CUDA/cuDNN versions documented
- [ ] Model weights verified with checksums
- [ ] Dataset provenance documented
- [ ] Random seeds set and documented
- [ ] Training script versioned
- [ ] Model card created
- [ ] Experiment tracked (MLflow, W&B, etc.)
### Security Checklist
- [ ] Models from trusted sources only
- [ ] Avoid loading pickled objects from untrusted sources
- [ ] Input validation for inference
- [ ] Resource limits for inference
- [ ] Models scanned for malicious payloads
### Reproducibility Checklist
- [ ] All random seeds set
- [ ] Deterministic mode enabled
- [ ] Hardware requirements documented
- [ ] Exact package versions in requirements.txt
- [ ] Dataset version pinned
- [ ] Training logs preserved