15  Virtual Environments

Prerequisites (read first if unfamiliar): Chapter 14, Chapter 11.

See also: Chapter 16, Chapter 17.

Purpose

Galaxy-brain meme: install packages with sudo pip → use a venv → use conda → install Python from scratch.

The single most common source of the “it worked yesterday” bug in Python data science is a confused environment. You install a package, try to import it, and Python says “no such module.” You upgrade pandas for one project and suddenly a different project starts throwing errors. You fix the error in your terminal and the notebook still fails. All of these are symptoms of the same underlying problem: your computer has several Pythons, each with its own set of packages, and it is not obvious which one is running at any given time.

A virtual environment is the fix. A venv is a self-contained Python installation scoped to one project, with its own interpreter and its own packages. When you “activate” it, your shell and your tools point at that project’s Python instead of the system-wide one. Different projects live in different venvs and do not interfere with each other.

This chapter explains what a venv is (and is not), how to create and activate one with python -m venv, how this compares to conda, how to diagnose the “which Python is running?” question on any machine, and how to make sure Jupyter and VS Code are using the environment you think they are. Chapter 14 covers conda in depth; this chapter focuses on venv and on the diagnostic skills that apply to both.

Learning objectives

By the end of this chapter, you should be able to:

  1. Explain what a virtual environment is, what problem it solves, and why every real project should have one.
  2. Create a venv with python -m venv .venv, activate it on macOS/Linux and Windows, and deactivate it.
  3. Install packages into the active venv with pip install and freeze them into requirements.txt.
  4. Diagnose “which Python is running?” using which python, python -c "import sys; print(sys.executable)", and pip show <pkg>.
  5. Compare venv and conda and pick the right one for a project.
  6. Make Jupyter and VS Code use your project’s venv by registering a kernel and setting the interpreter.
  7. Recognize and fix the three most common venv failure modes: not activated, wrong interpreter, stale activation in a new shell.

Running theme: activate early, verify always

A venv that is not activated is not helping you. Make activation the first thing you do when you open a terminal for a project, and verify with a one-liner that the right Python is in use before you install anything.

15.1 What a virtual environment actually is

A venv is, mechanically, a directory on disk (typically named .venv or venv inside your project folder) that contains:

  • A copy or symlink of a Python interpreter (for example .venv/bin/python on macOS/Linux, .venv\Scripts\python.exe on Windows).
  • A lib/python3.11/site-packages/ directory (Lib\site-packages on Windows) where packages installed into this environment live.
  • A small activate script that, when sourced, modifies the shell’s PATH so that typing python runs the venv’s interpreter instead of the system one.

That is the whole trick. There is no sandboxing and no virtualization. The venv’s Python is just a different binary on disk; “activating” just puts it at the front of PATH. When you pip install pandas inside an activated venv, pip looks up which python is first on the PATH, finds the venv’s interpreter, and installs pandas into its site-packages. Your system-wide Python is untouched.
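
You can see the same fact from inside Python: in an active venv, sys.prefix points at the environment directory while sys.base_prefix still points at the installation the venv was created from. A quick cross-platform check:

import sys

# In an active venv, sys.prefix points into .venv/ and differs from sys.base_prefix.
print("prefix:     ", sys.prefix)
print("base prefix:", sys.base_prefix)
print("in a venv?  ", sys.prefix != sys.base_prefix)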

The benefits are enormous:

  • Isolation. One project’s pandas==1.5 does not collide with another project’s pandas==2.2.
  • Reproducibility. You can record exactly which packages your project needs in a requirements.txt file and a collaborator can recreate the same environment in 30 seconds.
  • Cleanup. If an environment gets corrupted, you delete the directory and make a new one. No system-wide uninstalls.

15.2 Creating and activating a venv

The venv module ships with the Python standard library. There is no separate install step; if you have Python 3, you have venv (with one common exception: on Debian and Ubuntu you may first need to install the python3-venv system package).

Create a venv in the current project directory:

python -m venv .venv

The most common failure is python is not on your PATH, which shows up as command not found or as your terminal launching the wrong interpreter. On macOS and Linux, try python3 -m venv .venv instead — some systems ship python3 without a plain python. On Windows, if python opens the Microsoft Store, go to Settings → Apps → Advanced app settings → App execution aliases and turn off the python.exe and python3.exe aliases, then reinstall Python from python.org.

If the command runs but creates an empty or broken folder, check that the directory is writable (ls -la on macOS/Linux, right-click → Properties on Windows). Venv creation can also fail quietly inside cloud-synced folders, where the sync client may lock or rewrite files while Python is still creating them — move the project off OneDrive or iCloud Drive and try again.

Still stuck? See Chapter 2 for how to gather the evidence a helper will need.

This creates a .venv/ folder. Add .venv/ to your .gitignore — you never commit a venv to git. See Chapter 31.

Activate it (the command differs by OS and shell):

# macOS / Linux, bash or zsh
source .venv/bin/activate

# Windows, PowerShell
.venv\Scripts\Activate.ps1

# Windows, cmd
.venv\Scripts\activate.bat

After activation your shell prompt usually changes to show the env name, e.g. (.venv) you@host:~/project$. If you don’t see it, that is a hint something is off.

Figure 15.1: ALT: Terminal prompt before and after activating a virtual environment. The “after” prompt shows a (.venv) prefix at the start of the line, confirming the environment is active.

Install packages into the active venv:

python -m pip install pandas numpy matplotlib
# https://pip.pypa.io/en/stable/cli/pip_install/

Using python -m pip (rather than just pip) is a small but valuable habit — it guarantees you are installing into the same Python that python runs. If pip and python ever get out of sync (a classic source of confusion), python -m pip sidesteps the problem entirely.
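
If you suspect pip and python have drifted apart, you can compare them directly using the standard library; a quick sketch:

import shutil
import sys

# The interpreter currently running this code.
print("python:", sys.executable)
# Whatever `pip` the shell would run (None if pip is not on PATH).
print("pip:   ", shutil.which("pip"))

If the two paths live in different directories, a bare pip install would target a different Python than the one you are running.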

Record your dependencies so others can reproduce the environment:

python -m pip freeze > requirements.txt

Commit requirements.txt to git. A collaborator then recreates the env with:

python -m venv .venv
source .venv/bin/activate
python -m pip install -r requirements.txt

Deactivate when you are done:

deactivate

Deactivation just removes the venv from your PATH; it does not touch the files.

15.3 Which Python am I running?

This is the single most valuable diagnostic in this chapter. Paste it into your shell any time you are confused:

which python          # macOS/Linux
where python          # Windows

Or, cross-platform, ask Python itself:

python -c "import sys; print(sys.executable)"

The output should be the path to your venv’s interpreter (for example /home/you/project/.venv/bin/python). If instead you see something like /usr/bin/python3 or /opt/homebrew/bin/python3, your venv is not active and the next pip install will pollute your system Python.

You can also ask which Python a specific package is installed into:

python -m pip show pandas

Look at the Location: line. If it says .../.venv/lib/python3.11/site-packages, pandas lives in your venv. If it says .../.local/lib/python3.11/site-packages or /usr/lib/python3/dist-packages, the install leaked out of the venv.
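
Another way to answer the same question from inside Python is to import the package (assuming it is installed) and look at where it was loaded from:

import pandas

# The printed path should sit under your project's .venv/.../site-packages.
print(pandas.__file__)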

15.4 venv vs. conda

Both venv and conda create isolated environments. They differ in a few important ways:

Feature                       venv + pip                               conda
Ships with Python             yes                                      no (install Miniconda or Anaconda)
Manages non-Python packages   no                                       yes (e.g., CUDA, R, compilers)
Package source                PyPI                                     conda-forge, bioconda, defaults, PyPI
Environment directory         inside project (.venv/)                  centralized (~/miniconda3/envs/name/)
Good for                      web/data science with pure-Python deps   data science with heavy native deps, cross-language stacks

As a rough rule: start with venv for most projects. It is simpler, ships with Python, keeps the environment next to the code, and works identically everywhere. Use conda when you need compiled packages that are hard to install via pip (CUDA-enabled PyTorch, GDAL, certain bioinformatics tools) or when a course or collaborator has standardized on it. Mixing conda and pip in the same environment works most of the time but is a source of occasional pain — see Chapter 14 for the rules.

You can have both on the same machine — they do not conflict. Just do not try to “activate” a venv while a conda env is also active. Activate one or the other.
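
If you are ever unsure which kind of environment is active right now, both tools leave fingerprints in environment variables that Python can read directly. A small sketch (VIRTUAL_ENV is the name set by venv's activate scripts, CONDA_PREFIX the one set by conda activate):

import os
import sys

# Set by `source .venv/bin/activate`; None if no venv is active.
print("VIRTUAL_ENV: ", os.environ.get("VIRTUAL_ENV"))
# Set by `conda activate`; None if no conda env is active.
print("CONDA_PREFIX:", os.environ.get("CONDA_PREFIX"))
print("running:     ", sys.executable)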

15.5 Jupyter and venvs

Jupyter does not automatically see your venv. A Jupyter notebook runs Python through a kernel, and by default it may be pointed at your system Python. This causes the classic bug: you pip install pandas in your venv, open a notebook, and import pandas fails.

The fix is to register your venv as a Jupyter kernel. From inside the activated venv:

python -m pip install ipykernel
python -m ipykernel install --user --name project-venv --display-name "Python (project-venv)"

Now when you open a notebook in Jupyter Lab or VS Code, you can pick Python (project-venv) from the kernel dropdown. Inside the notebook, verify you are on the right kernel by running:

import sys
print(sys.executable)

If that path points at your .venv/bin/python, you are good. If it points at /usr/bin/python3, switch the kernel. See Chapter 16 for more on kernel management.
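
If you would rather have a notebook fail loudly than silently run on the wrong interpreter, one option is a defensive first cell like the sketch below (it assumes you kept the .venv directory name used in this chapter):

import sys

# Stop immediately if this notebook is not running on the project's venv.
# Adjust ".venv" if your environment directory has a different name.
assert ".venv" in sys.executable, f"Wrong kernel: {sys.executable}"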

15.6 VS Code and venvs

VS Code auto-detects venvs in your project directory. When you open a workspace with a .venv/ folder inside, VS Code prompts: “We noticed a new virtual environment. Would you like to select it?” Click yes and VS Code will use that interpreter for IntelliSense, linting, and the integrated terminal.

If it does not auto-detect, you can pick manually:

  1. Press Ctrl+Shift+P (or Cmd+Shift+P on macOS) to open the Command Palette.
  2. Type “Python: Select Interpreter” and press Enter.
  3. Pick the entry whose path contains your project’s .venv.

The selected interpreter shows in the bottom-right of the VS Code status bar. Click it any time to switch.

15.7 When venvs are not enough: containers

A venv solves one specific problem: isolating Python packages across projects. It does not reliably pin the Python interpreter version itself across machines, it does not isolate non-Python system libraries, and it does not isolate services your code depends on, such as Postgres or Redis. For most coursework, that is fine — a requirements.txt plus the right Python version on every collaborator’s laptop is enough. For projects that touch native libraries, span operating systems, or need to deploy somewhere other than your own machine, you eventually need something stronger: a container.

A container is a packaged operating-system environment that runs the same way everywhere. Conceptually, a venv pins your Python packages; a container pins your entire userspace — the OS distribution, the system libraries, the Python interpreter, the installed packages, and the project code. A container that runs on your MacBook also runs on your collaborator’s Windows laptop, on a Linux server, and on a cloud platform, byte-for-byte the same. The most common implementation you will encounter as a student is Docker.

A minimal Dockerfile

A Dockerfile is a recipe — a text file at the project root that describes how to build the image. For a simple Python project it is short:

# Dockerfile
FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
CMD ["python", "src/pipeline.py"]

Each instruction does real work: FROM picks a base image (Python 3.11 on a minimal Debian); WORKDIR sets the working directory inside the container; the first COPY and the RUN install dependencies; the second COPY brings in your code; CMD says what to run when the container starts.

Build the image and run it:

docker build -t my-project .
docker run --rm my-project

The --rm flag deletes the container after it exits, which is what you want for one-shot runs. To get a shell inside the container instead — the rough equivalent of “activating” the environment — add -it and override the command:

docker run --rm -it my-project bash

Two patterns matter for development. To edit code on your host and run it inside the container, mount your project directory as a volume so changes show up live: docker run --rm -it -v "$PWD":/app my-project bash. To expose a port (for a Jupyter server, a Flask app, a database), add -p host:container: docker run --rm -p 8888:8888 my-project jupyter lab --ip=0.0.0.0.

Docker Compose for multi-service projects

Most non-trivial projects need more than just Python. They need a database, maybe a message queue, sometimes a separate worker process. Docker Compose lets you describe all of those services in one docker-compose.yml and start them together with docker compose up:

services:
  app:
    build: .
    depends_on: [db]
    environment:
      DATABASE_URL: postgres://user:pass@db:5432/app
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass
      POSTGRES_DB: app

Two services start, the app container comes up after the database container (depends_on controls start order, not readiness; real projects often add a healthcheck or a retry loop), and the whole pipeline reproduces on any machine that has Docker installed. This is genuinely magical the first time it works on a teammate’s laptop with no setup other than docker compose up.
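
Inside the app container, the DATABASE_URL value from the compose file arrives as an ordinary environment variable. A minimal sketch of reading it from Python (the SQLite fallback here is an assumption for running the same code outside Docker):

import os

# Docker Compose injects DATABASE_URL into the app service's environment;
# fall back to a local SQLite file when running outside the container.
db_url = os.environ.get("DATABASE_URL", "sqlite:///local.db")
print("connecting to:", db_url)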

When to actually reach for a container

Containers are powerful but they are also a layer of complexity, and reaching for them too early creates more friction than it removes. A useful checklist of “yes, you actually need a container”:

  • You need to reproduce the exact same environment on a different operating system — venv pins Python packages but not the OS or system libraries beneath them, and some packages behave differently on macOS vs Linux.
  • Your code depends on system-level binaries that are tedious to install — PostGIS, FFmpeg, headless Chromium for scraping, specific CUDA versions for GPU work.
  • You need to run alongside a database or queue that should be reproducible on every machine.
  • You are deploying to a cloud target — every major cloud platform runs containers natively, so packaging your project as a container is the lingua franca of “ship this somewhere.”

A useful checklist of “no, a venv is fine”:

  • Pure-Python coursework with a small requirements.txt.
  • Notebooks for analysis that read local files and produce figures.
  • Anything you only ever run on your own laptop for one semester.

If you are unsure, default to venv. You can always graduate a project to a container later by adding a Dockerfile; you cannot easily undo the complexity once it is in.

15.8 Stakes and politics

Virtual environments solve a real and unavoidable problem: different projects need different versions of the same package, and “the system Python” cannot serve them all at once. The political dimension is what the alternatives — and the absence of alternatives — look like for different students.

Two things to notice. First, what venvs assume about your machine. Creating, activating, and populating a venv from a requirements.txt requires an unmetered internet connection (the first install can be hundreds of megabytes), an editor that can be pointed at a custom interpreter, and a working python command on a system where you have write access to your home folder. Students using shared lab workstations, Chromebooks locked down by a school district, or capped mobile data plans hit different walls at different points in that workflow. Second, what containers concentrate. Docker, the dominant container runtime, is now owned by Docker, Inc. and gated by Docker Desktop’s commercial terms for larger organizations; the underlying open standards (OCI) exist, but the convenient tooling is increasingly behind a corporate license. Adopting containers as the default isolation layer trades “skill that runs on your laptop” for “infrastructure dependency on a third party.”

See Chapter 8 for the broader framework. The concrete prompt to carry forward: when a tutorial says “just create a venv” or “just run it in a container,” ask which prerequisites — bandwidth, disk, admin rights, paid licenses — that just is hiding.

15.9 Worked examples

Starting a new project from scratch

You are starting a term project that will use pandas and matplotlib.

mkdir term-project && cd term-project
python -m venv .venv
source .venv/bin/activate      # or .venv\Scripts\Activate.ps1 on Windows
python -m pip install --upgrade pip
python -m pip install pandas matplotlib jupyter ipykernel
python -m pip freeze > requirements.txt
echo ".venv/" >> .gitignore
git init && git add requirements.txt .gitignore
git commit -m "Initial environment"

Register the kernel so Jupyter can find it:

python -m ipykernel install --user --name term-project --display-name "Python (term-project)"
jupyter lab

In Jupyter, the new kernel appears in the launcher. Open a notebook with it and you are good to go.

Diagnosing “it worked yesterday”

You open your terminal the next morning, run python analysis.py, and get:

ModuleNotFoundError: No module named 'pandas'

Step 1: check whether the venv is active. Look at your shell prompt. Does it show (.venv)? No? Activate it:

source .venv/bin/activate

Rerun your script. Usually this fixes it — you forgot to activate.

Step 2: if still failing, check which Python.

which python
python -c "import sys; print(sys.executable)"

If the path does not contain .venv, activation silently failed (rare but happens with nested shells, tmux, or VS Code’s integrated terminal on certain configurations). Deactivate, re-activate, and recheck.

Step 3: confirm the package is installed in this venv.

python -m pip show pandas

If this says WARNING: Package(s) not found: pandas, reinstall:

python -m pip install -r requirements.txt

A collaborator clones your project

git clone https://github.com/you/term-project.git
cd term-project
python -m venv .venv
source .venv/bin/activate
python -m pip install -r requirements.txt

That is the whole setup. Five commands and any machine is running your project with the exact package versions you committed.

15.10 Templates

.gitignore entries for a venv project:

.venv/
__pycache__/
*.pyc
.ipynb_checkpoints/
.pytest_cache/

requirements.txt (generated by pip freeze):

matplotlib==3.8.3
numpy==1.26.4
pandas==2.2.1

A version-pinned requirements.txt is reproducible but brittle. For libraries (code you publish for others to install), prefer loose constraints (pandas>=2.0). For application projects (your course work), prefer pins.
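
For comparison, a loosely constrained requirements.txt for a library might look like this (the bounds are illustrative, not recommendations):

matplotlib>=3.8
numpy>=1.26
pandas>=2.0,<3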

15.11 Exercises

  1. Create a fresh venv in an empty directory, activate it, and install pandas. Run python -m pip show pandas and confirm the Location: line is inside your venv.
  2. Write down the output of which python (or where python) before and after activating a venv. Notice the path change.
  3. Deliberately reproduce the failure mode: open a new terminal (do not activate the venv) and run python -c "import pandas". Read the traceback (see Chapter 7). Then activate the venv and run it again.
  4. Pick an existing project of yours that does not use a venv. Create one, pip-install everything it needs, and freeze a requirements.txt. Commit to git.
  5. Register a Jupyter kernel for a venv, open a notebook, and verify with import sys; print(sys.executable) that the notebook is running your venv’s Python and not the system one.
  6. Clone a classmate’s project (or your own on another machine) and get it running from scratch using only git clone, python -m venv, activate, and pip install -r requirements.txt. Time yourself — it should take under two minutes.
  7. Delete your .venv/ directory and recreate it from requirements.txt. Confirm the project still runs. This is the key reproducibility test.

15.12 One-page checklist

  • Every project gets its own venv: python -m venv .venv at the project root.
  • Add .venv/ to .gitignore. Never commit a venv.
  • Activate before installing anything: source .venv/bin/activate (or the Windows equivalent).
  • Verify with which python or python -c "import sys; print(sys.executable)" before any pip install.
  • Use python -m pip install instead of bare pip install.
  • Freeze dependencies to requirements.txt and commit that file.
  • Register the venv as a Jupyter kernel if you will use notebooks.
  • Point your editor (VS Code, PyCharm, etc.) at the venv interpreter.
  • If something is weird, the first diagnostic is always “which Python is running?”