35 Using AI Tools
Prerequisites (read first if unfamiliar): Chapter 2.
See also: Chapter 6, Chapter 36, Chapter 3, Chapter 25, Chapter 26, Chapter 29.
Purpose

AI assistants can reduce friction in programming and data science. They can draft documentation, summarize error messages, propose alternative implementations, and generate boilerplate. They can also produce plausible but incorrect guidance. This chapter treats AI assistance as one input to a disciplined workflow. The goal is not to get answers faster. The goal is to produce correct, reproducible, and safe work with less unnecessary effort.
The governing rule is simple:
AI can propose. You must verify.
Verification means using primary documentation, running small experiments, and adding tests. Those practices remain your responsibility, even if an assistant produced the first draft.
Learning objectives
By the end of this chapter, you should be able to:
Identify high-value AI uses and common failure modes.
Apply a risk-based policy to decide how much verification is required.
Write prompts that produce usable how-to guidance, not vague advice.
Validate AI outputs using primary sources, minimal experiments, and automated tests.
Avoid privacy and security mistakes when sharing context.
Use AI in coursework and collaboration without undermining learning or integrity.
Running theme: AI can propose; you must verify
A large language model produces plausible text that fits patterns in its training data and your prompt. That makes it useful for drafting and transformation, but it does not guarantee correctness. Verification — primary documentation, small experiments, tests — remains your responsibility no matter how confident the assistant sounds.
35.1 What an AI assistant is doing
Most AI assistants in technical settings are built on large language models (LLMs) — see Chapter 36 for a deeper look at how they work. An LLM generates text that fits patterns in its training data and in your prompt. That makes it useful for drafting and transformation tasks, such as turning rough notes into a README or converting a stack trace into a troubleshooting plan. It does not guarantee factual correctness.
A practical interpretation is to treat AI output as a draft produced by a fast collaborator who can be wrong. Your job is to turn drafts into reliable artifacts.
What AI is usually good at
LLM assistance is strongest when the task is primarily about language, structure, or recall of common patterns. It is reliably useful for drafting an outline, a README, a docstring, or an issue template — anything where the shape is conventional and you mostly need a starting point you can edit. It is good at rewriting an explanation in a different register, such as taking a paragraph from a research paper and making it readable for a novice. It is good at producing boilerplate that you would otherwise have to re-derive every time, like the standard argparse skeleton for a command-line script or the standard logging configuration for a Python module. It is good at suggesting search terms, the names of likely documentation sections, and common failure modes for a problem you describe. And it is good at generating long checklists that you can then shorten and validate against your own situation.
The common thread is that all of these are tasks where being “approximately right” gets you 80% of the way to a finished artifact, and your editing then takes you the rest of the way.
What AI is not reliable at
The other side of that pattern: LLMs are unreliable in exactly the ways you would expect a fluent writer with no working memory of your machine to be unreliable. The most common failure is hallucinated details, where the model produces a flag, API call, or citation that sounds plausible but does not exist. The second is version mismatch: the advice is correct for some version of pandas, scikit-learn, or git — just not the version you have installed. The third, and the one most likely to bite you, is hidden prerequisites: the assistant assumes you have already activated the right environment, changed into the right directory, or installed a system dependency, and never says so out loud. A fourth class is silent logic bugs, where the suggested code runs without error on the example in the prompt but is wrong on edge cases the prompt did not mention. A fifth is unsafe defaults, where the model proposes a solution that technically works but weakens security or increases destructive blast radius.
You should assume any of these errors are possible at any time. That assumption shapes good habits: ask for primary sources, run small tests on real inputs, and prefer minimal changes you can reason about over large changes you cannot.
35.2 A risk-based verification policy
Not all tasks have the same consequences if something goes wrong. Use risk level to decide how careful you need to be.
Low risk: drafting and formatting
Low-risk work is anything where an error is cheap to spot and cheap to correct: rewriting a paragraph for clarity, drafting a README skeleton, producing a checklist or a template. The cost of being wrong is reading the output, noticing the mistake, and editing it. For these tasks, your verification step is to read what the model produced for accuracy and completeness, and check that it matches the assignment or project you are actually trying to deliver.
Medium risk: technical guidance you can test quickly
Medium-risk work involves real technical claims, but the claims can be validated with small experiments. Examples include interpreting a stack trace and proposing checks to run, suggesting a sequence of commands to inspect your environment, drafting a unit-test scaffold, or proposing a refactor that your test suite can validate. The cost of being wrong is wasted time on a check that does not apply, which is annoying but not damaging. Your verification is to actually run the proposed checks, look up the relevant function in the official docs to confirm the assistant has not invented anything, and lock in the result with a test or an assertion so the same fix does not silently regress later.
High risk: destructive commands, security changes, and sensitive data
High-risk work is anything whose mistakes cannot be easily undone or whose mistakes can hurt other people. The clearest examples are commands that delete, overwrite, or recursively move files; any use of privilege escalation (sudo) or broad permission changes such as chmod -R 777; any change to network or authentication configuration like SSH keys, VPNs, or firewall rules; and any handling of confidential, protected, or legally regulated data.
For high-risk work, the verification bar is much higher. You should consult primary documentation and not just the assistant’s summary of it. You should prefer supervised help — instructor, TA, or an experienced colleague — over executing the suggestion alone. When possible, test the change in a sandbox (a throwaway VM, a scratch directory, a forked branch) before running it for real. And the firmest rule is the simplest one: never execute a command you do not understand, no matter how confident the assistant sounds.
# Pause and verify before any of these
sudo rm -rf /var/log/old # high-risk: privilege + recursive delete
chmod -R 777 ~/project # high-risk: broad permission change
git push --force origin main # high-risk: rewriting shared history
A non-negotiable warning for destructive commands
Treat these as stop signs until you have confirmed paths, backups, and intent:
# Examples of destructive patterns (do not run blindly)
# rm -rf <path> # macOS/Linux: recursive delete
# del /s <path> # Windows: recursive delete
# mv <source> <dest> # can overwrite depending on flags/context
# chmod -R 777 <path> # broad permissions (usually wrong)
# sudo <anything> # elevated privileges
If an assistant suggests one of these as a fix, pause. Ask: What exactly will this change? What evidence says this is the right change? How do I undo it?
35.3 The assistive loop: a workflow that forces evidence
A reliable workflow is an assistive loop: you use AI to propose an approach, then you validate it with evidence.
Step 1: write a short specification
Start by writing down what you are trying to accomplish in a way the model can act on. A good specification includes a one-sentence goal, the context the assistant cannot see (OS, Python version, environment name, relevant package versions), the inputs you are working with (a code snippet, a command, a file path, a sample of the data), the behavior you observed (the exact error or output text), the behavior you expected, and the things you have already tried and what happened when you tried them. This is the same shape as a good technical question for a human helper, and that is not a coincidence: the better you can describe the situation to a person, the better you can describe it to a model.
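A specification that hits all of those points can still fit in a few lines. The shape below is a suggestion, with placeholders for your own details, in the same style as the prompt patterns later in this chapter:
# Goal: <one sentence describing what should work>
# Context: <OS>, <Python version>, <environment name>, <relevant package versions>
# Input: <the code, command, or file involved>
# Observed: <the exact error or output text>
# Expected: <what you thought would happen>
# Already tried: <each attempt and what happened>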
Step 2: request alternatives and checkpoints
Avoid prompts that ask for a single answer. Single-answer prompts encourage the model to commit to one path and you to follow it without thinking. Instead, ask for two to four plausible root causes, a short decision tree of checks you can run to discriminate between them, and a step-by-step plan that has verification checkpoints along the way. This forces the model to externalize the reasoning you would otherwise do in your head, and it gives you escape hatches if the first hypothesis turns out to be wrong.
Step 3: validate claims in primary documentation
Whenever the output depends on external truth — what a function returns, what a flag does, what version a feature was introduced in — verify it in official documentation for the version you have installed. If the assistant cannot point to a primary source, or if you cannot find one yourself, treat the claim as tentative until you can.
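For Python libraries, the quickest primary source is often the installed package itself. A minimal sketch, assuming pandas is the library in question:
import pandas as pd

print(pd.__version__)   # the version any advice must match
help(pd.read_csv)       # the docstring shipped with the version you actually have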
Step 4: run a minimal experiment
Once you have a hypothesis you trust enough to test, convert it into the smallest experiment that can succeed or fail. Change one factor at a time so that the result is interpretable, and write down what you ran and what happened. The goal is not to prove the assistant right but to find out whether the explanation matches reality.
# A minimal experiment: change one factor, record the result
python -c "import sys; print(sys.executable)" # which Python is active?
python -c "import pandas; print(pandas.__version__)" # which version?Step 5: lock in the outcome
After the fix works, add something that will catch the same problem next time. The lightest-weight options are a unit test that exercises the previously broken behavior, an assertion that encodes the invariant you just discovered (shape, type, range), or a one-line entry in your project’s checklist or README. Each of these turns one round of debugging into a permanent improvement. Without it, the same bug will come back the next time someone touches that area of the code.
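For example, if the fix concerned how blank cells load from a CSV, the lock-in can be a single test. This is a sketch with a synthetic in-memory file rather than your real data:
import io
import pandas as pd

def test_blank_age_becomes_nan():
    # Lock in the behavior the fix relied on: a blank cell loads as NaN,
    # so the column stays numeric instead of turning into strings.
    csv = io.StringIO("name,age\nalex,34\nsam,\n")
    df = pd.read_csv(csv)
    assert df["age"].isna().sum() == 1
    assert df["age"].dtype.kind == "f"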
35.4 Prompt patterns that produce usable how-to guidance
Prompts work best when they specify output format and constraints. The goal is to be unambiguous.
Pattern A: decision tree diagnosis
# "I am seeing: <symptom or exact error>. Provide a decision tree with 8 checks.
# Each check must be a concrete command or observation, and you must say what
# each outcome implies. Keep it specific to <OS> and <tool/version>."
Pattern B: minimal reproducible example reduction
# "Here is my current minimal failing snippet. Reduce it further if possible.
# Replace real data with synthetic data. Output:
# (1) reduced code, (2) what you removed and why, (3) how to verify failure."
Pattern C: test-first scaffold
# "Write 4 pytest tests for the intended behavior below (include 2 edge cases).
# Then propose an implementation that satisfies the tests. State assumptions."
Pattern D: documentation rewrite without inventing steps
# "Rewrite these notes into a how-to guide with sections: Purpose, Prerequisites,
# Steps, Verify, Troubleshooting. Do not invent commands I did not provide.
# If prerequisites are missing, list questions under 'Prerequisites'."
Pattern E: safe command review
Use when you want to understand a command before running it.
# "Explain what this command does, what it changes on disk, and how to undo it:
# <command>
# Then give a safer alternative or a dry run option if available."
Pattern F: request a patch instead of a rewrite
For code changes, request a minimal diff rather than a full replacement.
# "Here is the current function. Provide a minimal patch (diff-style) to fix
# <specific bug>. Do not refactor unrelated code. Explain how to test the fix."
35.5 Using AI to improve technical questions
Good questions produce good answers. AI can help you edit and structure questions before you ask a human.
Rewrite into a standard template
A useful first move is to ask the assistant to convert your messy first description into a standard structure: goal, expected behavior, actual behavior, reproduction steps, context (versions and OS), and what you tried. This is the same shape as a good question for a human helper, and getting the model to fill in the structure forces you to notice what is missing — usually it is the context or the reproduction steps. Audit the result before you send it anywhere; in particular, do not let the assistant improve your question by inventing details that were not in your original description. If you cannot remember exactly what error you saw, the right thing for the assistant to do is leave a placeholder, not make one up.
Generate a context checklist
If you do not know what context matters, ask the assistant for a checklist. The same six or seven fields come up over and over for Python data work: the operating system and version, the Python version, the path of the interpreter that is currently active, the environment manager you are using (conda, venv) and the name of the active environment, the version of the relevant packages (pip show pandas, conda list scikit-learn), and your working directory plus the paths of any files the program is trying to read or write. Fill in everything you can; for fields you genuinely do not know, write “unknown” rather than guessing — the unknowns are often where the bug is hiding.
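Most of those fields can be gathered with a few lines of Python run inside the environment you are asking about. A sketch, assuming pandas is the package relevant to your problem:
import platform
import sys

import pandas as pd

print(platform.platform())   # operating system and version
print(sys.version)           # Python version
print(sys.executable)        # which interpreter is actually running
print(pd.__version__)        # version of the package in question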
35.6 Using AI in debugging: hypotheses, checks, and minimal diffs
Use AI to propose hypotheses and checks. Keep control of the debugging loop.
Import errors: identify the interpreter and environment
Most import problems are environment problems. Confirm which Python is running and where packages are installed.
# Interpreter and environment checks
python --version
python -c "import sys; print(sys.executable)"
# macOS/Linux:
which python
# Windows:
where python
# Package presence:
pip show <package>
conda list <package>
Interpretation:
If sys.executable is not in the environment you expect, you are using the wrong interpreter.
If pip show reports a package under a different Python than sys.executable, you installed into the wrong environment.
Ask the assistant to propose a diagnosis tree, but do not accept environment-changing commands without understanding them.
Data ingestion bugs: inspect the intermediate
For many data issues, the right move is to inspect intermediate representations.
Example: a CSV loads into one column instead of many. Plausible causes include the wrong delimiter, quoting, or encoding artifacts. A useful debugging sequence is:
print the first few lines of the file,
test delimiters explicitly,
check column names and types after load.
# Quick inspection approach (Python)
import pandas as pd
path = "data/input.csv"
with open(path, "r", encoding="utf-8", errors="replace") as f:
    for _ in range(5):
        print(f.readline().rstrip("\n"))
df = pd.read_csv(path)  # default delimiter is comma
print(df.shape)
print(df.columns.tolist())
df_tab = pd.read_csv(path, sep="\t")  # explicitly test a tab delimiter
print(df_tab.shape)
Use AI to suggest likely causes and checks, but interpret the output yourself. A model cannot see your file contents unless you provide them, and you should avoid pasting sensitive data.
Path and working directory confusion
Many errors are simple path mistakes. Validate where you are before changing code.
# Working directory checks
# macOS/Linux:
pwd
ls -la
# Windows:
cd
dir
If AI suggests changing or deleting files to solve a path problem, stop. Verify the path and fix the invocation first.
Prefer minimal diffs over large rewrites
A common anti-pattern is accepting a large rewrite because it makes an error disappear. Large changes reduce your ability to identify cause and increase the chance of introducing new bugs. If an assistant proposes a rewrite:
ask for the smallest change that preserves intent,
apply one change at a time,
verify with a test or a checkpoint.
35.7 Using AI for documentation and project hygiene
Documentation is a strong target for AI assistance because outputs are reviewable and verifiable.
Draft a README that can be executed
A README is good only if a reader can follow it. Use AI to draft structure, then validate it by running it from scratch.
Minimum sections:
Purpose
Setup (environment creation)
How to run
Verify (what success looks like)
Troubleshooting (common failures)
Provide the actual commands you use. Ask the assistant to format them. Then test them in a fresh environment. This is often where missing steps are discovered.
Troubleshooting sections: symptom, cause, check, fix
Ask the assistant to propose troubleshooting entries in a strict format:
Symptom
Likely cause
Check (a command or observable)
Fix (a minimal action)
Then validate each fix on your system. Remove entries you cannot verify.
Runbooks for repetitive tasks
For recurring tasks (refreshing a dataset, rebuilding outputs), create a runbook. Include:
prerequisites and inputs,
step-by-step commands,
verification checks,
rollback or recovery steps.
AI can draft a runbook; you must execute it to confirm it works.
35.8 Using AI for code: constraints, review, and tests
AI-generated code is useful when you control scope and require verification.
Request small, testable units
Instead of asking for an entire pipeline, request one function at a time, with a clear contract attached: input types and any constraints they must satisfy, the output type and the invariants it should preserve, at least one realistic example, and a small set of tests including the edge cases you can think of. A function described this way is small enough to read top to bottom, small enough to test in isolation, and small enough that if the assistant gets it wrong, you can throw it away and try again without having lost much. A pipeline drafted in one prompt is none of those things — it is tempting because it feels efficient, but the inevitable bug is much harder to find than the equivalent bug in any one of its functions would have been.
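As a concrete illustration, a contract for a single function might look like this before you ask for an implementation. The function name and behavior are hypothetical examples, not from a real project:
def normalize_name(raw: str) -> str:
    """Return the name trimmed of surrounding whitespace and title-cased.

    Contract:
    - Input: any str, possibly empty or all whitespace.
    - Output: a str; never raises for str input.
    - Invariant: the result has no leading or trailing whitespace.
    - Example: "  ada LOVELACE " -> "Ada Lovelace"
    """
    return raw.strip().title()

def test_normalize_name_strips_and_titlecases():
    assert normalize_name("  ada LOVELACE ") == "Ada Lovelace"

def test_normalize_name_all_whitespace():
    assert normalize_name("   ") == ""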
Verification ladder
Use this ladder for any AI-generated code you plan to keep:
Read it line by line. You should be able to explain it.
Run a smoke test on tiny inputs.
Add assertions for invariants (shape, type, ranges).
Add unit tests (including edge cases).
Compare outputs to a baseline if replacing existing code.
If you cannot explain the code, do not ship it. “It works” is not a sufficient reason to keep code you do not understand, especially in an educational context.
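Step 3 of the ladder can be a few lines placed right after the generated code runs. A sketch; the DataFrame and column names stand in for your real result:
import pandas as pd

# Stand-in for the DataFrame produced by the generated code.
df = pd.DataFrame({"id": [1, 2, 3], "age": [34, 29, 51]})

assert len(df) > 0, "result is empty"
assert df["age"].between(0, 120).all(), "age outside a plausible range"
assert df["id"].is_unique, "duplicate ids introduced"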
Ask for tests before implementations
A reliable pattern is to ask for tests first. Tests clarify the contract and catch silent errors.
For example, if you are cleaning a column:
define how empty strings should be treated,
define how whitespace is handled,
define whether case is preserved,
define how non-string values should behave.
Then request tests that lock that behavior in.
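For instance, tests for a hypothetical clean_city function might look like this. The name and the choices encoded below are illustrative; your contract may differ:
def clean_city(value: object) -> str:
    """Written after the tests below; included here only so the example runs."""
    if not isinstance(value, str):
        return ""
    return value.strip().lower()

# The tests are the contract: write (or request) these before the implementation.
def test_whitespace_is_stripped():
    assert clean_city("  Boston ") == "boston"

def test_empty_string_stays_empty():
    assert clean_city("") == ""

def test_non_string_becomes_empty():
    assert clean_city(None) == ""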
Warnings for file operations in code
Code that writes files can destroy work. When an assistant proposes code that deletes or overwrites:
request a dry run mode,
request explicit confirmation before destructive operations,
request that outputs go to a dedicated outputs/ directory,
confirm the code uses explicit paths, not implicit working-directory assumptions.
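A minimal sketch of what those requests look like in code; the directory name and the DRY_RUN flag are illustrative choices, not a standard:
from pathlib import Path

OUTPUT_DIR = Path("outputs")   # explicit, project-local destination
DRY_RUN = True                 # flip to False only after reviewing what would happen

def write_report(name: str, text: str) -> None:
    target = OUTPUT_DIR / name
    if DRY_RUN:
        print(f"[dry run] would write {len(text)} characters to {target}")
        return
    OUTPUT_DIR.mkdir(exist_ok=True)
    target.write_text(text, encoding="utf-8")

write_report("summary.txt", "example output")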
35.9 Security and privacy hygiene
Security mistakes often begin as convenience: pasting too much context.
Never paste secrets
The simplest rule is the strictest one: do not paste passwords, API keys or tokens, private SSH keys, institutional credentials, or confidential datasets into any prompt. Once a secret has been pasted into a service you do not control, you have to assume it is no longer secret, even if the interface promises otherwise. If you need help with code that uses a secret, describe the structure (which environment variable it comes from, what the call signature looks like) and the error message — never the value. If you need an example, use a synthetic value:
# Safe: describe shape with a fake value
api_key = "sk-EXAMPLEEXAMPLEEXAMPLE" # placeholder, not a real key
response = client.get(url, headers={"Authorization": f"Bearer {api_key}"})
Prefer synthetic or summarized data
When you need help with a data issue, share the shape of the data instead of the data itself. The most useful things to provide are the schema (column names and types), three to five synthetic rows that match the structure of the real data, and a few aggregate statistics like counts, missingness rates, or value ranges. Avoid copying raw records whenever they include personal or sensitive information; if course policy, IRB rules, or any law applies, assume raw disclosure is not allowed. The synthetic version is almost always sufficient for the assistant to give you the same advice it would have given for the real data.
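For example, instead of pasting real rows, you might share something like this. The column names and values are made up to mirror the real schema:
import pandas as pd

# Synthetic rows that match the real schema but contain no real records.
fake = pd.DataFrame(
    {
        "patient_id": ["P001", "P002", "P003"],
        "age": [34, 61, 47],
        "visit_date": pd.to_datetime(["2024-01-05", "2024-02-11", "2024-03-02"]),
    }
)
print(fake.dtypes)          # the schema: column names and types
print(fake.isna().mean())   # missingness rates (all zero in this toy example)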
Treat privilege escalation as high risk
Commands that request elevated privileges deserve a stop-and-verify reflex. If the assistant suggests sudo, a recursive chmod, or any change to system-wide configuration, do not run it before you confirm three things in primary documentation: what exactly the command will change on disk, how broadly it will apply, and how to undo it. Whenever a least-privilege alternative exists — installing into your user environment, writing to a folder you own, using a project-local config file — prefer it. And when you are unsure whether you actually need elevated privileges, ask a human who has used the system before.
35.10 Academic integrity and collaboration
Courses and teams differ in their policies. Follow the policy that governs your work.
Disclose AI assistance when required
When disclosure is expected, keep it concrete:
- “Used AI assistance to draft README structure and suggest unit test scaffolding; verified commands by running locally; edited output for accuracy.”
Avoid vague statements that do not clarify what was assisted and what was verified.
Use AI to support learning
If you receive an answer, convert it into learning:
Ask for a smaller example.
Ask for assumptions and edge cases.
Predict what will happen if an input changes.
Test those predictions.
A useful personal rule is: if you cannot explain it, you do not own it.
Collaboration: keep humans in the loop
In team settings, AI can accelerate drafting, but humans should review:
Pull requests should explain intent and include tests.
Code review comments should reference evidence (tests, logs, docs).
Generated code should be treated like any other contribution: reviewed, tested, and justified.
35.11 When AI advice conflicts with your observations
Conflicts are common. Resolve them with evidence.
Resolution protocol
Trust local evidence: what happened when you ran the code.
Confirm versions and environment.
Consult primary documentation for that version.
If still unclear, ask a human with a structured question.
If the assistant cannot provide a verifiable primary source, treat its claim as tentative.
35.12 Stakes and politics
The advice in this chapter — use AI tools as drafting aids, verify their outputs, do not paste secrets — is good practical guidance. It is also a workflow built on top of an industry whose costs and externalities are largely paid by people other than the user.
Three things to notice. First, training data and consent. Modern LLMs are trained on text and code scraped from the open web, much of it without explicit permission from the authors. Books, blog posts, GitHub repositories under restrictive licenses, news articles, Stack Overflow answers — all of it has been ingested, with the legal questions still being litigated. When you use an AI tool, the value it delivers to you is partly the labor of millions of writers and developers who were not asked. Second, labeling labor. The reinforcement-learning-from-human-feedback that makes modern chat assistants usable is performed by underpaid contract workers, often in Kenya, the Philippines, Venezuela, and India, reviewing prompts and outputs that are sometimes traumatic. Mary Gray and Siddharth Suri’s Ghost Work (cited in Chapter 8) documents the broader pattern; the AI version is its current peak.
Third, the verification responsibility is yours, not theirs. The framing this chapter teaches — “AI can propose, you must verify” — is correct, and it is also a quiet transfer of liability. When an AI assistant suggests an unsafe command, a hallucinated citation, or a security-weakening pattern, the consequences fall on the person who ran the command, not on the company that built the model. That is a defensible workflow for individual users. It is also a remarkable arrangement at the level of an industry: an unprecedented tool is shipped to billions of people with the legal and practical responsibility for its failures pushed onto each of those people individually.
See Chapter 8 for the broader framework, and Chapter 36 for what the model is actually doing under the hood. The concrete prompt to carry forward: when you accept an AI suggestion, ask whose labor produced it and who pays when it is wrong.
35.13 Worked examples
These worked examples illustrate how AI fits into a disciplined workflow.
A notebook kernel mismatch
Suppose you can import pandas from the terminal, but the same import fails in your Jupyter notebook with ModuleNotFoundError: No module named 'pandas'. The assistant’s job is to propose hypotheses; yours is to gather evidence.
You ask the assistant for plausible causes and it suggests the most common one first: the notebook kernel is pointing at a different Python interpreter than the terminal. You verify this directly inside the notebook:
import sys
print(sys.executable)
# /Users/alex/anaconda3/bin/python   # the system Python — wrong!
Compared to what which python reports in the terminal (your project’s .venv/bin/python), this confirms the hypothesis. The fix is to switch the notebook’s kernel to your project’s environment, or install pandas into the kernel that is actually running. The lasting fix is to add one line to the project README — “Kernel must be proj-venv” — so the next person to clone the repo doesn’t repeat the same loop. The assistant proposed a check; your local evidence determined the fix.
A CSV that loads into one column
A familiar version of “the data is wrong” is pd.read_csv("data.csv") returning a DataFrame with exactly one column whose name is the entire header line. You ask the assistant for plausible causes and it lists three: the file is tab-separated, the delimiter has been quoted, or the file has an unusual encoding. Rather than blindly trying fixes, you inspect the file:
with open("data.csv", "r", encoding="utf-8") as f:
for _ in range(3):
print(repr(f.readline()))
# 'name\tage\tcity\n'   <- tabs, not commas
Now the cause is unambiguous. You update the load to pd.read_csv("data.csv", sep="\t") and add an assertion that the expected columns are present so the same problem cannot silently return:
df = pd.read_csv("data.csv", sep="\t")
assert {"name", "age", "city"}.issubset(df.columns), df.columns.tolist()The fix is reliable because it is backed by direct inspection and a runtime check, not by hope.
A merge conflict in a notebook
The symptom is a merge conflict in analysis.ipynb after you git pull. You ask the assistant for the options: you can resolve the conflict by hand in a text editor (painful, because notebooks are JSON with embedded base64 outputs), you can use a notebook-aware diff tool like nbdime, or you can throw away one side and rerun the cells to regenerate the outputs. The right answer depends on your team’s policy and whether your notebook outputs are themselves part of the deliverable. After resolving, the long-term fix is upstream: configure pre-commit (see Chapter 33) to strip notebook outputs on commit, so this kind of conflict cannot happen again. See Chapter 31 for the broader version-control workflow.
A high-risk fix you should not run
Sometimes the assistant suggests a “fix” you should refuse. A common case: you get a permission error writing to a directory, and the assistant suggests sudo chmod -R 777 ~/project or sudo your script. Both are stop signs.
The correct response is to step back and ask why the program cannot write where you told it to. Almost always the answer is that the path is wrong — you are trying to write into /usr/local/share/... instead of into a folder you own — and the right fix is to change the path, not the permissions. Prefer writing into a user-owned folder under your home directory (a project-local outputs/ works fine). If you genuinely think you need system permissions, that is the moment to ask a TA or instructor before running anything. High-risk suggestions are exactly the situations where the assistant has the least context and the consequences of being wrong are largest.
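The safer pattern is a few lines of path handling, not a permissions change. A sketch; the directory name is up to you:
from pathlib import Path

# Fix the path, not the permissions: write into a folder you own.
out_dir = Path("outputs")   # project-local; no sudo, no chmod
out_dir.mkdir(parents=True, exist_ok=True)
(out_dir / "results.csv").write_text("col_a,col_b\n1,2\n", encoding="utf-8")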
35.14 Exercises
Take a recent error. Ask an assistant for three hypotheses and two checks per hypothesis. Run the checks and record which hypotheses you eliminated.
Draft a structured help request (Goal/Expected/Actual/Repro/Context/What I tried). Ask the assistant to improve clarity without changing facts. Audit the result and then post it.
Ask for a README skeleton for your project. Verify each command in a fresh environment. Add a Verify section and one troubleshooting entry.
Ask for a small function plus unit tests. Run the tests. If any fail, revise until they pass.
Create a personal “do not paste” list (credentials, tokens, private data). Keep it near your workstation.
35.15 One-page checklist
I treat AI output as a proposal and require evidence.
I scale verification to risk.
I request alternatives and checkpoints, not a single answer.
I confirm commands and API behavior in primary documentation.
I run minimal experiments and change one thing at a time.
I add tests or assertions after fixes.
I do not paste secrets or sensitive data.
I disclose AI assistance when required and keep it specific.
Further reading
- Anthropic, Prompt engineering overview — Anthropic’s guide to structured, verifiable prompts.
- OpenAI, Prompt engineering — OpenAI’s parallel guide, with patterns for chat and API workflows.
- Google, Generative AI prompt guide — a beginner-friendly walk-through from Gemini’s documentation.
- Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell, On the Dangers of Stochastic Parrots (FAccT, 2021) — the canonical critical paper on large language models; required reading for the “Stakes and politics” framing above.
- Arvind Narayanan and Sayash Kapoor, AI Snake Oil — a calm, evidence-driven book on what AI can and cannot do; an excellent counterweight to both hype and panic.
- Mary L. Gray and Siddharth Suri, Ghost Work — the foundational book on the hidden human labor behind “automated” systems; the AI labeling industry is its current peak.
- Distributed AI Research Institute (DAIR) — Timnit Gebru’s research institute; a current and ongoing source on AI labor, bias, and accountability.
- Karen Hao, Empire of AI — long-form journalism on the labor and resource flows behind modern AI; pairs with the labeling-labor framing above.