
The Build Environment

"Works on my machine" isn't good enough. This chapter covers containerization, reproducibility, and the fundamentals of build environments—not the specific tools, but the principles that make them work.

Five Hundred Times

"Works on my machine." I've heard this sentence five hundred times. The answer is always the same: the environment is different. And the solution is always the same: stop treating your laptop like it's the canonical build environment. It's not. It never was.


"Works on My Machine" Is Not a Build System

You've heard this conversation:

Developer A: "The tests pass."
Developer B: "They fail for me."
Developer A: "Works on my machine."

The code is identical. The tests are identical. Why different results?

The environment is different:

  • Operating system (macOS vs Linux vs Windows)
  • Runtime version (Python 3.9 vs 3.11)
  • System libraries (different libssl versions)
  • Environment variables (different PATH, different configs)
  • Installed packages (different global packages)
  • File system (case-sensitive vs case-insensitive)
  • Line endings (LF vs CRLF)
  • Locale settings (UTF-8 vs ASCII defaults)

Code doesn't run in isolation. It runs in an environment. If the environments differ, behavior can differ.
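
One quick way to surface these differences is to capture an environment fingerprint on each machine and diff the results. A minimal sketch; add whatever else your project depends on:

# Run on both machines, then diff the outputs
uname -sm                 # OS and CPU architecture
python --version          # runtime version
locale | grep LANG        # locale defaults
git config core.autocrlf  # line-ending handling
pip freeze | sort         # installed packages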

The Reproducibility Problem

For a build to be reproducible, you need to control:

| Layer | Examples | Typical Variability |
|---|---|---|
| Hardware | CPU architecture, memory | Usually consistent (x86_64) |
| OS | Linux, macOS, Windows | Often varies |
| OS version | Ubuntu 20.04 vs 22.04 | Varies, matters more than you'd think |
| System packages | openssl, libffi, compilers | Varies by OS and version |
| Runtime | Python, Node, Ruby version | Varies unless controlled |
| Dependencies | Your lock file | Controlled if you have lock files |
| Code | Your source | Controlled by git |
| Configuration | Environment variables | Often varies |

Most developers control only two of these layers, code and dependencies, and leave the rest to chance.

Tool Version Management

Before containers, manage runtime versions explicitly.

The Problem

# Developer A
$ python --version
Python 3.9.7

# Developer B
$ python --version
Python 3.11.4

# Same code, different behavior

The Solutions

pyenv (Python):

# Install specific version
pyenv install 3.11.4

# Set for this project
pyenv local 3.11.4

# .python-version file created
cat .python-version
3.11.4

nvm (Node):

# Install specific version
nvm install 18.17.0

# Use for this project
nvm use 18.17.0

# .nvmrc file
echo "18.17.0" > .nvmrc

mise (multi-language):

# Install tools
mise install python@3.11.4
mise install node@18.17.0

# .mise.toml for project
[tools]
python = "3.11.4"
node = "18.17.0"

asdf (multi-language):

# .tool-versions
python 3.11.4
nodejs 18.17.0

Commit Your Version Files

git add .python-version  # or .nvmrc, .mise.toml, .tool-versions
git commit -m "Pin Python version to 3.11.4"

Now everyone on the team (and CI) uses the same version.
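
CI can read the same pin, so the version is defined in exactly one place. A sketch using GitHub Actions, assuming a requirements.txt and a pytest suite:

# .github/workflows/test.yml (fragment)
steps:
  - uses: actions/checkout@v4
  - uses: actions/setup-python@v5
    with:
      python-version-file: '.python-version'  # reuse the committed pin
  - run: |
      pip install -r requirements.txt
      pytest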

Containerization Fundamentals

Containers solve the "works on my machine" problem by packaging the entire environment.

What Containers Actually Are

Containers aren't virtual machines. They're isolated processes with their own view of the system:

  • Namespaces — Isolated view of processes, network, filesystem
  • cgroups — Resource limits (CPU, memory)
  • Union filesystems — Layered, efficient image storage

The key insight: it's still Linux running Linux processes. There's no hypervisor, no emulated hardware. Just isolation. (On macOS and Windows, Docker quietly runs a small Linux VM to supply that kernel.)
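
You can see namespaces in action without any container runtime at all. On Linux, unshare starts a process in a new PID namespace; from inside, the host's processes are invisible:

# Linux only: new PID namespace, with /proc remounted to match
sudo unshare --pid --fork --mount-proc bash
ps aux  # inside: only bash and ps, and bash is PID 1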

What Containers Give You

Reproducible environments: The Dockerfile defines exactly what's in the environment. Same Dockerfile → same environment.

Isolation: Dependencies inside the container don't conflict with your system. Multiple projects can have different requirements.

Portability: If it runs in the container locally, it runs in the container on the server. (Mostly. ARM vs x86 is still a thing.)

"Works on my machine" that actually transfers: The machine is the container, and everyone has the same one.

What Containers Don't Give You

Security (by default): Containers share a kernel with the host. Container escapes are possible. Don't run untrusted code in containers expecting safety.

Simplicity: You've added a layer, not removed complexity. Now you have container problems in addition to application problems.

Magic: If you don't understand the Dockerfile, you don't understand your build. The complexity is still there—it's just in a different file.

Understanding: AI-generated Dockerfiles are particularly dangerous here. The AI can produce a working Dockerfile without you understanding what it does. You've containerized your confusion. The build works, but you can't debug it when it fails, can't optimize it, can't secure it. See Vibe Coding for more on AI as a complexity-hiding tool.

Dockerfile Fundamentals

A Dockerfile defines how to build a container image.

Basic Structure

# Start from a base image
FROM python:3.11-slim

# Set working directory
WORKDIR /app

# Copy dependency files first (layer caching)
COPY requirements.txt .

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Define how to run
CMD ["python", "main.py"]

Layer Caching

Docker caches layers. Order matters:

# GOOD: Dependencies change less often than code
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .

# BAD: Any code change invalidates dependency cache
COPY . .
RUN pip install -r requirements.txt

Put things that change rarely first, things that change often last.
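
You can check that the ordering pays off: change only application code and rebuild. With the good ordering above, the dependency-install step should be reported as cached:

touch main.py            # modify application code only
docker build -t myapp .  # the pip install layer should show CACHED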

Base Image Selection

| Use Case | Base Image | Size |
|---|---|---|
| Development | python:3.11 | ~900MB |
| Production | python:3.11-slim | ~150MB |
| Minimal | python:3.11-alpine | ~50MB |
| Ultra-minimal | distroless | ~20MB |

Smaller images have less attack surface and faster transfers. But alpine can have compatibility issues (musl vs glibc).

Security Basics

# Pin the base image with digest (not just tag)
FROM python:3.11-slim@sha256:abc123...

# Don't run as root
RUN useradd --create-home appuser
USER appuser

# Don't install unnecessary packages
RUN pip install --no-cache-dir -r requirements.txt

# Don't leave secrets in layers
# (Use build secrets or runtime injection instead)
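
For that last point, BuildKit secret mounts are one option: the secret exists only for the duration of a single RUN and is never written to an image layer. A sketch, assuming a private package index; the pip_token id and the URL are hypothetical:

# Needs BuildKit; put `# syntax=docker/dockerfile:1` at the top of the Dockerfile
RUN --mount=type=secret,id=pip_token \
    pip install --no-cache-dir -r requirements.txt \
        --index-url "https://user:$(cat /run/secrets/pip_token)@pypi.example.com/simple"

At build time, pass the secret in with docker build --secret id=pip_token,src=.pip_token .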

Multi-Stage Builds

Separate build-time dependencies from runtime:

# Build stage
FROM python:3.11 AS builder

WORKDIR /app
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt

# Runtime stage
FROM python:3.11-slim

# Copy only the installed packages; build tools stay behind
COPY --from=builder /root/.local /root/.local
COPY . /app

# Make user-installed console scripts visible
ENV PATH=/root/.local/bin:$PATH

WORKDIR /app
CMD ["python", "main.py"]

Build tools, compilers, and dev dependencies stay in the build stage. Only runtime requirements go in the final image.
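
To see what the split saves, build each stage separately and compare sizes; --target stops the build at a named stage:

docker build --target builder -t myapp:builder .  # build stage only
docker build -t myapp:runtime .                   # full multi-stage build
docker image ls myapp                             # compare the two sizes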

Docker Compose for Development

For multi-service development, docker-compose defines the stack:

# docker-compose.yml
version: '3.8'

services:
  app:
    build: .
    volumes:
      - .:/app  # Mount code for live reload
    ports:
      - "8000:8000"
    environment:
      - DATABASE_URL=postgres://db:5432/app
    depends_on:
      - db

  db:
    image: postgres:15
    volumes:
      - pgdata:/var/lib/postgresql/data
    environment:
      - POSTGRES_DB=app
      - POSTGRES_PASSWORD=devpassword

volumes:
pgdata:

docker-compose up    # Start everything
docker-compose down  # Stop everything
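
A few more day-to-day commands; service and volume names match the compose file above:

docker-compose up -d                         # start in the background
docker-compose logs -f app                   # follow the app's logs
docker-compose exec db psql -U postgres app  # open a SQL shell in the database
docker-compose down -v                       # stop and delete the pgdata volume too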

When You Don't Need Containers

Containers add complexity. Not every project needs them.

Skip Containers When:

  • Single-language, simple dependencies — Virtual environments are enough
  • Solo development — The reproducibility benefit is smaller
  • Quick scripts — Overhead isn't worth it
  • You don't understand them yet — Learn the fundamentals first

Use Containers When:

  • Multiple people/environments — "Works on my machine" keeps happening
  • CI/CD pipelines — Consistent build environments
  • Complex system dependencies — Native libraries, specific OS requirements
  • Production deployment — Immutable, versioned artifacts
  • Polyglot stacks — Multiple languages with conflicting requirements

The Hierarchy of Deployment Needs

Not everything needs Kubernetes:

| Level | Solution | When to Use |
|---|---|---|
| 0 | Script on a server | Single-use, one server |
| 1 | Virtual environment + systemd | Python app, dedicated server |
| 2 | Docker + docker-compose | Multi-service, single host |
| 3 | Managed containers (ECS, Cloud Run) | Need scaling, don't want K8s |
| 4 | Kubernetes | True orchestration needs |
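
For Level 1, a minimal systemd unit is often all the deployment machinery you need. A sketch, assuming the app lives at /opt/myapp with its own virtual environment; all paths and names are hypothetical:

# /etc/systemd/system/myapp.service
[Unit]
Description=My app
After=network.target

[Service]
User=appuser
WorkingDirectory=/opt/myapp
ExecStart=/opt/myapp/.venv/bin/python main.py
Restart=on-failure

[Install]
WantedBy=multi-user.target

Enable it with systemctl enable --now myapp.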

Most research software never needs to leave Level 2.

The right question isn't "should I use Kubernetes?" It's "what's the simplest thing that solves my actual problem?"

Not Magic

I've watched containerization go from "weird Linux thing" to "default assumption." That's mostly good—the reproducibility benefits are real.

But I've also watched teams adopt containers because it's what you're supposed to do, without understanding what problems containers solve. They end up with all the complexity and none of the benefits.

Containers aren't magic. They're a way to package an environment. If you understand your environment—what versions of what software you need, how they're configured, where the state lives—containers are a convenient way to express that understanding. If you don't understand your environment, containers just hide your confusion in a Dockerfile. The confusion is still there. Now it's just harder to debug.

Start by understanding what your code needs to run. Document that. Version that. Then decide if containers are the right tool to enforce it.


Quick Reference

Dockerfile Checklist

  • Pinned base image (with digest for production)
  • Non-root user
  • Minimal base image (slim, alpine, distroless)
  • Multi-stage build if applicable
  • Layer ordering optimized for caching
  • No secrets in image
  • .dockerignore configured
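
For that last item, a starting point for a Python project; extend it for your stack:

# .dockerignore -- keep the build context small and secrets out of COPY . .
.git
.venv
__pycache__/
*.pyc
.env
node_modules/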

Environment Reproducibility Layers

| What | How to Control |
|---|---|
| OS/Architecture | Base image |
| System packages | Dockerfile RUN apt-get... |
| Runtime version | Base image tag |
| Dependencies | Lock file |
| Code | Git |
| Configuration | Environment variables, config files |

Development vs Production

| Aspect | Development | Production |
|---|---|---|
| Base image | Full | Slim/distroless |
| Volume mounts | Yes (live reload) | No |
| Debugging tools | Yes | No |
| Security hardening | Less critical | Essential |
| Image size | Less important | Minimize |