3  Technical Documentation

Prerequisites: none. This chapter stands on its own.

See also: Chapter 2, Chapter 31, Chapter 32, Chapter 25, Chapter 29.

Purpose

[Figure: "Put It Somewhere Else" Patrick meme. Caption: "Why don't we take everything we know and write it down somewhere?"]

Computing courses often teach you what to type, but they do not always teach you how to learn what to type next. In real projects, the fastest path forward is rarely a memorized recipe. It is the ability to locate authoritative documentation, interpret it, test it against your situation, and—when needed—write documentation that helps other people reproduce, maintain, and extend your work.

This chapter treats documentation as a practical tool and a collaborative artifact. You will learn how to (1) find documentation efficiently, (2) read it with the right level of skepticism and precision, and (3) write documentation that is usable by people who do not share your context. Along the way, we will separate different kinds of documentation (reference, tutorials, how-to guides, explanations), show how they support different questions, and give you templates you can reuse throughout the course.

Learning objectives

By the end of this chapter, you should be able to:

  1. Identify common documentation genres and choose the right one for your task.

  2. Locate primary sources of truth (official docs, READMEs, man pages, API references) and distinguish them from secondary commentary.

  3. Read documentation actively: extract assumptions, constraints, inputs/outputs, and failure modes.

  4. Reconcile documentation with your local context (OS, version, environment, permissions) and confirm behavior with small experiments.

  5. Write documentation that enables others to set up, run, and verify your work.

  6. Maintain documentation as an ongoing practice: update it when code changes and record decisions that affect reproducibility.

  7. Use AI tools to assist documentation work while maintaining verification, citations, and security hygiene.

Running theme: documentation is an interface

Documentation is an interface between your intent and someone else’s understanding. The “someone else” might be a teammate, a future version of you, or even a tool like a build system or continuous integration job. Good documentation lowers the cost of correct action and raises the cost of confusion.

3.1 A beginner mental model of documentation

A common misconception is that documentation is a single thing: “the docs.” In practice, documentation is a stack of artifacts produced by different actors and optimized for different questions. When you learn to navigate that stack, you stop treating confusion as personal failure and start treating it as a routing problem: “Which source should answer this question?”

Two complementary roles: learning and coordination

Documentation does two jobs at the same time. The first is learning support: teaching you how a tool works and how to use it. The second is coordination support: aligning multiple people on how a project should be run, changed, and evaluated. The same document rarely does both jobs well, which is why so many frustrations come from asking the wrong artifact to do the wrong job. An API reference is a poor tutorial because it assumes you already know what you are looking for. A tutorial is a poor specification because it focuses on one path through the tool, not on every behavior. A README is not a substitute for a change log, because it describes the project as it is now, not how it changed. Recognizing which role you need is half the battle.

A simple hierarchy of authority

When sources conflict, it helps to rank them by authority and proximity:

  1. Primary sources: official documentation, man pages, upstream source code, published specifications.

  2. Project-local sources: your repository README, CONTRIBUTING guide, code comments, and issues.

  3. Secondary sources: blog posts, videos, forum answers, AI-generated suggestions.

Secondary sources can be valuable, especially for novices, but they are more likely to be outdated or context-specific. Your goal is to use them as navigation aids to primary sources, not as final authority.

3.2 Documentation genres and the questions they answer

A useful classification (popularized by the Diátaxis framework and multiple documentation communities) separates four genres. You do not need to memorize the labels; you need to recognize the different user needs they serve.

Reference documentation

Reference docs answer the question “what are the exact inputs, outputs, options, and behaviors?” They include API references for functions, classes, and parameters; command-line help and man pages; and configuration file schemas. Reference is dense, structured, and typically not narrative — you do not read it cover to cover; you dip into it for one precise answer and then leave. When you already know what function you want and just need to confirm an argument, reference is the right genre.

# Examples of reference docs you can read locally
git help merge          # man-page-style git reference (https://git-scm.com/docs)
ls --help               # GNU ls option reference
python -c "help(dict)"  # Python built-in reference (https://docs.python.org/3/)

Tutorials

Tutorials answer the question “how do I learn this from scratch in a guided way?” They are linear and scaffolded, and they assume you have the time to follow a sequence step by step. A good tutorial gets you to a working example quickly, even if you do not fully understand every piece on the first pass. The pandas “Getting Started” guide and the official Python tutorial are both examples of this genre.

How-to guides

How-to guides answer the question “how do I accomplish a specific task?” They are goal-oriented and practical, and the best ones are short, focused, and opinionated about the steps. They assume you already know roughly what you want — “I need to read a CSV with a custom delimiter” — and they hand you the recipe without padding it with conceptual background.

Explanations and concepts

Explanations answer the questions “why does this work the way it does?” and “what is the right mental model for this thing?” Concept docs clarify terminology, architecture, trade-offs, and design rationale. They are often where you learn what not to do — why iterrows() is slow in pandas, why mutable default arguments are a trap in Python, why HTTP POST is not idempotent. Reading good explanation docs is one of the highest-leverage activities in technical learning, even though it produces nothing immediately runnable.

Why this classification matters

When a student says “I read the docs and still don’t get it,” the hidden question is almost always which kind of docs they read versus which kind they actually needed. A quick mental diagnostic helps: if you cannot even find the right function name for what you want to do, you need a tutorial or a how-to to point you at it. If you know the function name and just need to look up the right parameter, you need reference. And if you keep making the same conceptual mistake — the same join going wrong, the same kind of KeyError — you need an explanation page that fixes the mental model, not another how-to that fixes one symptom.

3.3 Finding documentation efficiently

Finding documentation is a skill because the web contains too much material, and search engines optimize for popularity rather than correctness. You should develop a repeatable workflow that reliably gets you to primary sources.

Start with “what is the object?”

The first move is always to figure out what kind of thing you are dealing with, because the answer determines where the “official” docs live. Are you trying to understand a language feature (Python list slicing, generator expressions), a library (pandas, numpy, scikit-learn), a tool (Git, conda, Jupyter), a platform (GitHub Actions, Windows Task Scheduler), or a file format (CSV, JSON, Parquet)? Each of those categories has a different home: language features live in the language reference, libraries live in their own documentation sites, tools live in their man pages and project sites, platforms live in their vendor docs, and file formats live in their public specifications. Once you know the category, you almost always know the URL pattern.

Search queries that route you to primary sources

A common novice mistake is to type “how do I do X” into Google and end up in a sea of secondary sources — blog posts, Stack Overflow answers, and YouTube videos of varying age and accuracy. A small change in search query routes you to primary sources directly. Instead of “how do I write a requirements.txt”, search for “pip documentation requirements.txt”. Instead of “how do I use SSH ProxyJump”, search for “ssh man page ProxyJump”. Instead of “pandas merge tutorial”, search for “pandas API merge”. The pattern is always the same: name the tool, name the topic, and add a word like documentation, API, or man page to bias the results toward authoritative sources.

# Instead of                          search for
# "how to use pandas merge"    →     "pandas API merge"
# "git rebase tutorial"         →     "git documentation rebase"
# "fix requests timeout"        →     "requests timeout parameter site:requests.readthedocs.io"

When you do land on a blog or Stack Overflow post, treat it as a navigation aid: use it to extract the function names, flags, and concepts you should look up next, and then jump immediately to the official docs to verify them against your installed version.

Know the “home bases”

A small set of sites recur in data science work:

  • Python Packaging User Guide (Python Packaging Authority): https://packaging.python.org/

  • pip User Guide: https://pip.pypa.io/en/stable/user_guide/

  • Conda documentation, “Managing Environments”: https://docs.conda.io/docs/user-guide/tasks/manage-environments.html

  • Project Jupyter documentation: https://docs.jupyter.org/

  • JupyterLab documentation: https://jupyterlab.readthedocs.io/

  • Pro Git, 2nd ed. (Chacon and Straub, 2014): https://doi.org/10.1007/978-1-4842-0076-6

You do not need to memorize URLs. You do need to recognize which domains and organizations produce the authoritative references for a given tool.

Use built-in documentation before the web

Before you open a browser, check what is already on your machine. Almost every tool you use ships with its own reference, and that reference has the enormous advantage of being aligned with your installed version — which is exactly what you need when version differences matter. From the command line, the patterns are tool --help, man tool, or help tool. From Python, you can call help(obj) on any function or class, and inside Jupyter or IPython you can append ? to a name to read its docstring or ?? to see its source code. R users have ?function and help(function). Git’s subcommands all have their own help pages, accessible as git help merge, git help commit, and so on.

git help rebase                  # full man page for git rebase
ls --help | head -20             # short reference for ls flags
python -c "help(dict.get)"       # built-in help for dict.get

The local help is fast, offline, and version-correct. Reach for it before you reach for a search engine.

3.4 Reading documentation actively

Reading technical documentation is not like reading a novel. You do not “consume” it from start to finish. You interrogate it.

The “extract five facts” method

When you are reading the docs for a function, command, or tool behavior, you are looking for five specific facts: the inputs (required arguments and the types they accept), the outputs (return types, files written, side effects on global state), the defaults (what happens if you do not pass anything special), the constraints (version requirements, OS restrictions, permissions needed), and the failure modes (which errors the function can raise and what they usually mean). If you can find all five for a given function, you know almost everything you need to use it confidently. If you cannot find them — for example, if a tutorial only shows you one usage and never enumerates the parameters — you are reading the wrong genre and should switch to the reference page.
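Two of the five facts — inputs and defaults — can also be recovered locally with introspection, which is always correct for your installed version. A small sketch, using the standard library's json.dumps purely as a convenient example:

```python
import inspect
import json

# Reference pages enumerate inputs and defaults; introspection recovers
# the same two facts from the code you actually have installed.
sig = inspect.signature(json.dumps)
for name, param in sig.parameters.items():
    if param.default is inspect.Parameter.empty:
        print(f"{name}  (required or variadic)")
    else:
        print(f"{name} = {param.default!r}")

# The docstring covers outputs and behavior in prose:
print(json.dumps.__doc__.splitlines()[0])
```

Constraints and failure modes usually live only in the written docs, which is why introspection complements the reference page rather than replacing it.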

Identify assumptions and hidden context

Almost every set of docs is written for a specific imagined reader, and that reader is usually in a slightly different situation than you are. Docs routinely assume that you are in the right working directory, that your environment is already activated, that you have network access, that you have read/write permissions to wherever the example writes, and that your input data match a schema the docs never quite specify. As you read, train yourself to ask “what must be true for these steps to work?” — and then compare each of those assumptions to your actual situation. The first place they diverge is usually the place where the docs and your reality will collide.
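Several of those hidden assumptions can be checked in seconds from a terminal. A minimal sketch (the exact commands you need depend on the docs you are following):

```shell
# Make the docs' hidden assumptions explicit before following the steps
pwd                                          # the working directory the docs assume?
python3 -c "import sys; print(sys.prefix)"   # which environment is actually active?
ls -ld .                                     # do I have permissions where this writes?
```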

Examples are executable contracts

Examples in official docs are not decoration. They are the closest thing documentation has to a contract: the maintainer is asserting that this code, running in this version of the library, produces this result. The most useful thing you can do with an example is take it seriously: copy it into a scratch file, run it as-is on your installed version, and confirm it works before you change anything. Then modify it one variable at a time toward your real use case. If the example fails on your machine, that is itself a piece of evidence — either the docs are out of date, you are on a different version than the docs assume, or one of the hidden prerequisites is missing in your environment. Either way, you have learned something concrete instead of staring at the docs in confusion.
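The scratch-file habit looks like this in practice. The first two assertions reproduce outputs shown in the Python str.split reference; the third is the kind of one-variable change you make toward your own case:

```python
# Treat the docs example as a contract: paste it verbatim, assert the
# documented output, then change one thing at a time.
assert "1,2,,3,".split(",") == ["1", "2", "", "3", ""]   # verbatim from the docs
assert "1<>2<>3".split("<>") == ["1", "2", "3"]          # verbatim from the docs

# One deliberate change toward our own data: cap the number of splits.
assert "a,b,c,d".split(",", 2) == ["a", "b", "c,d"]
print("docs example verified on this interpreter")
```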

Beware the “happy path”

Tutorials almost always show the happy path: clean input data, correct permissions, a fresh environment, no network blips, and no edge cases. Real work involves the messy versions of all of those — missing files, corrupted data, half-installed packages, version conflicts between tools that used to coexist, and intermittent network failures that look like bugs but are not. When you are reading docs, deliberately look for the sections that handle the unhappy path: “Troubleshooting,” “Common issues,” “FAQ,” “Compatibility,” and “Upgrading.” Most projects have these, and they are often the highest-density information in the entire site, because they describe exactly the situations that send people looking for help.
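Writing your own code with the unhappy path in mind is the flip side of this habit. A sketch (the path and the README section name are hypothetical):

```python
from pathlib import Path

def load_input(path):
    """Fail with an actionable message instead of a bare traceback.

    A sketch; the path and README section referenced are hypothetical.
    """
    p = Path(path)
    if not p.exists():
        raise FileNotFoundError(
            f"Missing input {p}. See the README 'Data' section for how to obtain it."
        )
    return p.read_text(encoding="utf-8")

# Exercise the unhappy path on purpose, the way a troubleshooting
# section would describe it:
try:
    load_input("data/raw/does_not_exist.csv")
except FileNotFoundError as err:
    print(err)
```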

3.5 Reconciling documentation with versions and environments

A high percentage of “the docs are wrong” complaints are actually version mismatches. The documentation might be correct for some version of the tool, just not yours.

Always capture version context

When you are stuck on something that the docs say should work, the cheapest first move is to record exactly which versions of everything you have installed. The five fields that solve most version mysteries are the tool or library version, your operating system, your Python version (if Python is involved), the environment manager you are using (conda, pip, venv), and the path of the executable you are actually running. Capture all five before you start changing anything, because every fix you try is going to be interpreted in light of those values.

python --version
python -c "import sys; print(sys.executable)"
python -c "import pandas; print(pandas.__version__)"
uname -a   # macOS / Linux  (Windows: 'systeminfo' or 'ver')

This habit aligns with the broader reproducibility guidance from the literature (Wilson et al. 2017; The Turing Way Community 2025), and as a practical bonus it also makes your questions answerable when you need to ask for help (see Chapter 2).
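The same capture can be scripted so it is identical every time. A sketch; extend the library list to match whatever your own project imports:

```python
import platform
import sys

# Capture the version-context fields in one place.
info = {
    "python": sys.version.split()[0],
    "executable": sys.executable,
    "os": platform.platform(),
}

# Add one entry per library your project depends on:
try:
    import pandas
    info["pandas"] = pandas.__version__
except ImportError:
    info["pandas"] = "not installed"

for field, value in info.items():
    print(f"{field}: {value}")
```

Paste the output verbatim into bug reports and questions; it answers the first round of “what version are you on?” before anyone has to ask.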

Doc version selectors and release notes

Many doc sites include version selectors. If your library has major versions, explicitly pick the correct one. When behavior changes, consult release notes or change logs. (You do not need to read every change; you need to confirm whether your symptom matches a known change.)

The “pin or upgrade” decision

When documentation and behavior mismatch, you often face a decision:

  • Pin: keep your environment as-is and use docs for your version.

  • Upgrade: move to the current version and follow the latest docs.

For class projects, upgrading is often fine if it does not break the assignment. For collaborative projects, pinning is often safer to preserve team consistency. Either way, write the choice down in your project documentation.
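In Python projects, pinning usually means recording exact versions in a requirements file. A sketch with illustrative version numbers (run pip freeze in your own environment to capture the real ones):

```
# requirements.txt -- exact pins preserve team consistency
# (version numbers below are illustrative; use `pip freeze` to record yours)
pandas==2.2.3
numpy==1.26.4
```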

3.6 Project-local documentation: READMEs and runbooks

A project README is the most important document in your repository. It is the front door. It should answer: “What is this?” and “How do I run it?”

Minimum viable README

A minimal README for a class project has six sections, and they should appear in roughly this order. First, a one-paragraph purpose describing what the project does and why anyone should care. Second, a setup section that walks through environment creation and dependency installation. Third, a how to run section with the exact commands a reader should type. Fourth, an expected outputs section that says what files or results will appear when the run succeeds — this is what lets a reader confirm the run actually worked. Fifth, a project structure section explaining what each top-level folder contains. Sixth, a data notes section saying where the data came from and any restrictions on its use.

# Q3 Sales Analysis

A short pipeline that loads quarterly sales data, cleans it, and
produces a report.

## Setup
    python -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt

## Run
    python src/run_pipeline.py

## Expected outputs
- outputs/figures/quarterly.png
- outputs/tables/top_customers.csv

## Structure
- data/raw/        # original CSVs (read-only)
- src/             # pipeline code
- outputs/         # generated artifacts (gitignored)

## Data
Source: /shared/sales-data/2026-q3.csv (internal use only).

This is not busywork. It is an investment that pays back the first time you return to the project after a week away — or the first time a teammate clones the repo and tries to run it on their own machine.

Runbooks and operational docs

When a project has recurring operational tasks (refresh data, rebuild figures, re-run pipeline), create a short runbook:

  • What the task is

  • When to run it

  • The command to run it

  • Where logs and outputs go

  • What to do when it fails

Runbooks are especially valuable when tasks become automated (see the automation chapter in your handbook).
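The five items above fit on half a page. A sketch, with hypothetical paths and commands:

```
# Runbook: refresh quarterly data

What:     re-download the sales CSV and rebuild all figures
When:     first Monday of each month, or after an upstream correction
Command:  python src/refresh.py --quarter latest
Outputs:  logs/refresh.log, outputs/figures/
On fail:  read logs/refresh.log; if the source URL moved, update
          SOURCE_URL at the top of src/refresh.py
```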

3.7 Writing documentation that people actually use

The hard part of writing documentation is not typing words. It is choosing what to include and what to omit.

Write for a specific reader

Before you write, decide:

  • Who is the reader (classmate, TA, future you)?

  • What do they already know?

  • What do they need to accomplish?

Documentation that tries to serve everyone often serves no one. Prefer clear, scoped docs over universal docs.

Structure beats cleverness

Use predictable headings and consistent patterns. For procedural docs, a reliable structure is:

  1. Purpose

  2. Prerequisites

  3. Steps (numbered)

  4. Verification (what “success” looks like)

  5. Troubleshooting (common failures)

If your reader can skim the headings and understand the shape, you have already reduced friction.

Make commands copyable

When documentation includes commands:

  • use code blocks,

  • include the working directory context,

  • avoid placeholders unless you explain them,

  • show expected output when useful.

A common novice mistake is to document “run the script” without specifying which environment, from which folder, and with which parameters.
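Compare two versions of the same instruction (project path and parameters here are hypothetical):

```
# Vague: "run the script"

# Copyable: directory, environment, and parameters are all explicit
cd ~/projects/q3-sales
source .venv/bin/activate
python src/run_pipeline.py --quarter 2026-Q3
# Success looks like: outputs/figures/quarterly.png appears
```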

Document intent, not just mechanics

Good docs explain why key steps exist:

  • Why you pin versions

  • Why you never edit raw data in place

  • Why you run formatting checks before committing

A short rationale prevents future collaborators from “optimizing away” the steps that preserve correctness.

Use examples as anchors

For conceptual documentation, include one example that is fully worked:

  • a real command invocation,

  • a real function call with realistic inputs,

  • a sample configuration file.

Examples turn abstract descriptions into something testable.

3.8 Documentation maintenance as a habit

Documentation is not a one-time deliverable. It is a living layer that must move with your code.

Docs are part of “done”

Adopt a simple policy: if a change affects how someone uses the project, update the docs in the same pull request. This prevents drift and reduces the cognitive load of “remembering to update later.”

When docs drift, treat it as a bug

If instructions stop working, file an issue. Drift is not shameful; it is normal. The problem is leaving drift invisible.

Decision logs and “why we chose this”

Some choices are not obvious from code:

  • why you selected a dataset,

  • why you used a specific model or parameter,

  • why you excluded certain records,

  • why you used one workflow rather than another.

A lightweight decision log prevents repeated debates and helps future readers interpret results. This practice aligns with reproducible project guidance (The Turing Way Community 2025; Wilson et al. 2017).

The Turing Way Community. 2025. The Turing Way: A Handbook for Reproducible, Ethical and Collaborative Research. Zenodo. https://zenodo.org/records/15213042.

Wilson, Greg, Jennifer Bryan, Karen Cranston, Justin Kitzes, Lex Nederbragt, and Tracy K. Teal. 2017. “Good Enough Practices in Scientific Computing.” PLOS Computational Biology 13 (6): e1005510. https://doi.org/10.1371/journal.pcbi.1005510.

3.9 AI tools in documentation workflows

AI tools can help with documentation, but they also introduce risks: hallucinated facts, mismatched versions, and accidental leakage of sensitive data. The goal is to use AI as an assistant for drafting and structuring, not as an authority.

High-value uses

AI tools can be effective for:

  • drafting a README skeleton from a project structure,

  • rewriting rough notes into a coherent how-to guide,

  • generating consistent templates (issue forms, PR checklists),

  • proposing likely troubleshooting steps (to be verified),

  • improving clarity and concision for a novice audience.

Guardrails

Use the following guardrails:

  1. Never paste secrets: tokens, keys, private data.

  2. Treat outputs as drafts: verify against official docs and by running commands.

  3. Prefer local evidence: your logs, your versions, your environment.

  4. Cite primary sources: do not let AI replace authoritative references.

A practical rule: if the AI suggests a command that could be destructive (rm, sudo, permissions changes), stop and verify in official documentation or with an instructor.

3.10 Stakes and politics

Documentation decides who can use a tool. Every “just run pip install” or “open a terminal” smuggles in assumptions: that you have administrator access on your machine, that your network does not block PyPI, that English is not a barrier, that your screen reader can navigate the docs site, and that you have unbroken time to follow a multi-step setup. When those assumptions are wrong, the tool is effectively unavailable — not because the technology cannot serve the reader, but because the documentation drew a boundary they could not cross. The professional shorthand “RTFM” treats this as the reader’s problem; the design habit you should be building treats it as the writer’s problem.

Two decisions to notice. First, whose problems the docs anticipate: the example commands, the operating system shown, the network connection assumed, the device on which screenshots were taken. These choices encode an imagined user — usually English-speaking, on a recent laptop, with broadband — and a reader who does not match the imagined user pays the difference in time and frustration. Second, whose labor maintains the docs: most large open-source projects rely on a small group of unpaid contributors to keep documentation current, with predictable consequences for translations, accessibility features, and the edge cases that fall outside the contributors’ own situations. The docs are as inclusive as the people who write them have time and motivation to make them.

See Chapter 8 for the broader framework. The concrete prompt to carry forward when you write a README or a how-to: name your imagined reader explicitly, then add one sentence for the reader you assumed away.

3.11 Worked examples

This section demonstrates how documentation practices appear in typical novice workflows.

Turning a notebook into a runnable project

A student’s initial workflow often begins in a Jupyter notebook. That is fine, but as the project grows, collaborators need a stable way to reproduce results.

A documentation upgrade path:

  1. Create a README with a one-sentence purpose and “How to run.”

  2. Add an environment section (conda or pip).

  3. Add an “Outputs” section describing what files appear after running.

  4. Add a “Project structure” section describing folders.

This minimal documentation shift turns a private artifact into a collaborative artifact.

Documenting a data intake pipeline

Even a simple “load a CSV and clean it” pipeline has assumptions:

  • file encoding,

  • delimiter,

  • missing value markers,

  • expected columns,

  • row-level integrity.

A good intake document includes:

  1. dataset source and date,

  2. schema and column meanings,

  3. known quirks (e.g., numbers stored as text),

  4. checks you run before analysis.

This bridges the gap between raw files and analysis claims.
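The checks in item 4 can be executable rather than prose. A sketch, using only the standard library; the schema below is hypothetical:

```python
import csv
import io

# Hypothetical schema for the intake document's "expected columns" section.
EXPECTED_COLUMNS = {"date", "customer_id", "amount"}

def check_intake(file_obj):
    """Verify schema and basic integrity before any analysis runs."""
    reader = csv.DictReader(file_obj)
    missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing expected columns: {sorted(missing)}")
    rows = list(reader)
    if not rows:
        raise ValueError("file parsed but has no data rows")
    return len(rows)

# Demo on a tiny in-memory file:
sample = io.StringIO("date,customer_id,amount\n2026-07-01,42,19.99\n")
print(check_intake(sample))   # -> 1
```

When a check like this lives next to the intake document, the document cannot silently drift from what the pipeline actually enforces.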

Writing troubleshooting notes

Troubleshooting notes become valuable when they are specific. Compare:

Weak.

“If conda doesn’t work, reinstall it.”

Better.

“If conda activate fails with ‘command not found,’ confirm your shell initialization includes conda. On macOS, check your .zshrc and restart the terminal. If you installed Anaconda/Miniconda after opening the terminal, the current session may not have the PATH update.”

The “better” version specifies symptoms, a hypothesis, and a verification step.
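A troubleshooting note is strongest when its verification step is a command the reader can paste. A macOS/Linux sketch for the conda example above:

```shell
# Verify the hypothesis before applying any fix
command -v conda || echo "conda is not on PATH in this shell"
grep -n "conda initialize" ~/.zshrc 2>/dev/null \
  || echo "no conda init block found in ~/.zshrc"
```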

3.12 Templates

Template A: How-to guide

# Title: How to <do a specific task>

## Purpose
One sentence describing what this accomplishes.

## Prerequisites
- Required software / permissions
- Required files or environment

## Steps
1.
2.
3.

## Verify
What success looks like (expected output / files / screenshots).

## Troubleshooting
- Symptom -> likely cause -> fix

Template B: README skeleton (student project)

# Project Title

## Purpose
What this project does (2-4 sentences).

## Quickstart
Commands to set up and run.

## Environment
- How to create/activate env
- Key dependencies

## Data
- Source
- How to obtain
- Restrictions (if any)

## How to run
Exact command(s) and expected outputs.

## Project structure
- data/
- src/
- notebooks/
- outputs/

## Reproducibility notes
Versions, seeds, and known pitfalls.

Template C: Decision log entry

Decision:
Date:
Context:
Options considered:
Chosen option:
Rationale:
Consequences / follow-ups:

3.13 Exercises

  1. Pick a tool you used this week (Git, conda, pandas, Jupyter). Identify one example each of reference docs, a tutorial, and a how-to guide. Write one sentence describing what question each one answers best.

  2. Write a one-page README for a class assignment repository. Swap with a classmate and attempt to run each other’s project using only the README. Record what was missing and revise.

  3. Take a confusing error you encountered and create a “troubleshooting note” that includes: symptom, likely cause, verification step, fix.

  4. Rewrite a set of rough notes into a how-to guide using Template A. Include a verification step that a reader can perform.

  5. Identify one assumption in your current project that is not written down (paths, versions, parameters, data quirks). Document it in the README and add a short rationale.

3.14 One-page checklist

  • I can identify whether I need reference docs, a tutorial, a how-to, or a conceptual explanation.

  • I can route from secondary sources to primary sources.

  • When reading docs, I extract inputs, outputs, defaults, constraints, and failure modes.

  • I record version/environment details when behavior is surprising.

  • My README contains purpose, setup, run instructions, and verification.

  • I update docs when code changes affect usage.

  • I keep a short decision log for consequential choices.

  • If I use AI tools, I verify outputs against official docs and local experiments, and I never paste secrets.

📚 Further reading