5 Reading Official Documentation
Prerequisites (read first if unfamiliar): Chapter 2.
See also: Chapter 3, Chapter 6, Chapter 35.
Purpose

A huge fraction of the frustration novices experience with programming comes from the same moment: you are stuck, you search the error online, you land on a tutorial blog post that almost-but-not-quite matches your situation, you copy a line, and it breaks something else. After two hours of this you feel like the tools are hostile. Meanwhile, the answer was three clicks away on the official documentation — but you either did not know the docs existed, did not know how to navigate them, or tried to read them like a novel and bounced off.
This chapter teaches you to read official documentation — pandas docs, Python docs, library README files, docstrings — in a way that actually answers your question. It is about the skill, not about any one library. Documentation is a reference medium: you dip into it for the answer to a specific question and then you leave. Once you learn to do that, the official docs become the fastest source of help you have, faster than Stack Overflow and much more reliable than a tutorial blog.
This chapter complements Chapter 3, which is about writing documentation. This one is about reading it.
Learning objectives
By the end of this chapter, you should be able to:
- Explain the four genres of documentation (tutorials, how-tos, references, explanations) and pick the right one for your question.
- Navigate the pandas and Python standard library docs to find a function’s signature, parameters, return value, and examples.
- Read a function signature, including type hints, default values, and *args/**kwargs.
- Read a docstring in Python with help(), ?, and ??.
- Extract a minimal working example from a reference page.
- Recognize when a blog post or tutorial is misleading and fall back to the official docs.
- Read release notes / changelogs to diagnose “it worked last year” problems.
Running theme: start from the docs, not from the search results
For any question about a library function, the official docs should be your first stop, not your last. Blog posts are for discovery; docs are for correctness.
5.1 The four genres of documentation
Not all documentation is for the same kind of question. The Diátaxis framework (worth a bookmark) divides docs into four genres:
| Genre | Purpose | When you want it |
|---|---|---|
| Tutorials | learning-oriented | “I am new; show me the basics hands-on.” |
| How-to guides | task-oriented | “I know roughly what I want; tell me the steps.” |
| Reference | information-oriented | “I know what function I want; tell me the exact arguments.” |
| Explanations | understanding-oriented | “Why does this work the way it does?” |
Most novices reach for tutorials first and stop there. The real superpower is knowing when to switch. If you want pd.merge to tell you which side each row came from, reading a tutorial on merges will waste your time — jump to the reference page for pandas.DataFrame.merge and find the indicator parameter. If you want to know whether pandas can ever drop columns silently, an explanation-oriented essay will teach you more than ten tutorials.
The official docs of major libraries usually include all four genres. pandas has a “Getting Started” (tutorials), “User Guide” (explanations), “Cookbook” (how-tos), and “API Reference.” Python itself has a tutorial, a library reference, a language reference, and a “HOWTOs” section. Learn where each lives for the libraries you use most.
5.2 Reading a function signature
Every library reference page starts with a signature. Read it slowly — it tells you most of what you need to know without ever scrolling further.
From the pandas docs, here is the signature of pandas.read_csv (abridged):
pandas.read_csv(
    filepath_or_buffer,
    *,
    sep=',',
    header='infer',
    names=None,
    index_col=None,
    usecols=None,
    dtype=None,
    parse_dates=None,
    na_values=None,
    encoding=None,
    ...
)
The first thing to notice is the lone * partway through the argument list. In modern Python, a bare * in a parameter list is a marker meaning “everything after this point is keyword-only.” So filepath_or_buffer can be passed positionally, but sep, header, names, and the rest must be passed by name. If you try to write pd.read_csv("data.csv", ";") thinking the second argument is the separator, Python will raise a TypeError because the function refuses to accept ";" as a positional argument. The fix is to write it as a keyword:
df = pd.read_csv("data.csv", sep=";")
The next thing to notice is the default values attached to each parameter. sep=',' is more than a placeholder — it tells you that if you don’t pass sep, pandas will use a comma. Defaults are part of the function’s contract; whenever you skip a parameter, you are implicitly accepting its default.
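Both points — the keyword-only marker and the defaults — can be reproduced with a toy function. (Illustrative only: load is a made-up name with the same shape as read_csv’s signature, not a pandas API.)

```python
# A made-up function shaped like read_csv's signature: one positional
# parameter, then a bare *, then keyword-only parameters with defaults.
def load(path, *, sep=","):
    return f"reading {path} with sep={sep!r}"

print(load("data.csv"))            # omitted sep: the default ',' applies
print(load("data.csv", sep=";"))   # keyword form is accepted

try:
    load("data.csv", ";")          # positional form past the * is rejected
except TypeError as err:
    print("TypeError:", err)
```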
A subtler trap is what we might call sentinel defaults: values that look harmless but have special meaning. header='infer' is not the same as header=None and not the same as header=0. The string 'infer' tells pandas to guess based on what the file looks like; None tells pandas the file has no header at all and to make up integer column names; 0 tells pandas to use the first row as column names unconditionally. Three nearby behaviors, three different outcomes. The only way to keep them straight is to read the parameter description in the reference, not to guess from the name.
Closely related: None in a Python signature almost never means “nothing.” It usually means “use the library’s default behavior,” which might be “no filter,” “infer from data,” or “do not pass through to the underlying call.” For usecols=None it means “load every column.” For dtype=None it means “infer types from the data.” Whenever you see None as a default, treat it as a hint to read the description before you assume.
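The convention is easy to mirror in your own code. Here select_columns is a hypothetical helper (not a pandas function) in which None means “no filter,” echoing usecols=None:

```python
# Hypothetical helper mirroring the usecols=None convention:
# None does not mean "no columns", it means "keep every column".
def select_columns(row, usecols=None):
    if usecols is None:
        return dict(row)                 # no filter: keep everything
    return {k: row[k] for k in usecols}  # explicit list: keep only those

row = {"id": 1, "name": "Ada", "score": 9.5}
print(select_columns(row))                    # all three columns
print(select_columns(row, usecols=["name"]))  # just {'name': 'Ada'}
```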
Each parameter has its own description below the signature. Read the ones you actually pass; skim the rest until you need them.
5.3 Reading a return value
Every reference page tells you what the function returns — its type and shape. For a pandas DataFrame method, the return is usually another DataFrame (or Series, or scalar). The important question is always: is this operation in-place, or does it return a new object?
df.sort_values("date") # returns a new DataFrame; df is unchanged
df.sort_values("date", inplace=True) # modifies df directly, returns None
Many pandas operations historically had an inplace parameter; the modern convention is to avoid it and use the return value. If your code sorts a DataFrame and then operates on the original unsorted one, you probably forgot to assign: df = df.sort_values("date").
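The same split exists in the standard library, where you can see it without pandas: list.sort() works in place and returns None, while sorted() returns a new list.

```python
nums = [3, 1, 2]

result = sorted(nums)   # returns a new list; nums is unchanged
print(result)           # [1, 2, 3]
print(nums)             # [3, 1, 2]

returned = nums.sort()  # sorts in place; the return value is None
print(returned)         # None
print(nums)             # [1, 2, 3]
```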
5.4 Reading docstrings from Python itself
You do not have to open a browser to read documentation. Every function, method, and class has a docstring you can read inline.
In the REPL or a script:
help(pd.read_csv)
In Jupyter or IPython, append a ?:
pd.read_csv?
To see the source of the function, append ??:
pd.read_csv??
Use dir(obj) to list all the attributes and methods on an object — a good way to discover what you can do with a DataFrame if you don’t remember:
import pandas as pd
df = pd.DataFrame({"a": [1, 2, 3]})
[m for m in dir(df) if not m.startswith("_")][:20]
# ['a', 'abs', 'add', 'add_prefix', 'add_suffix', 'agg', 'aggregate', 'align',
# 'all', 'any', 'apply', 'applymap', 'asfreq', 'asof', 'assign', 'astype', ...]
For built-in functions and methods, help is fast and always available, even on an airplane.
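If you want the signature as data rather than prose, the standard library’s inspect module can introspect most callables (a sketch; some C-implemented functions expose no signature and raise ValueError):

```python
import inspect

# The signature object for a builtin; in the printed form, / marks the end
# of positional-only parameters and * marks the start of keyword-only ones.
sig = inspect.signature(sorted)
print(sig)  # e.g. (iterable, /, *, key=None, reverse=False)

# Each parameter records its kind and default value.
for name, param in sig.parameters.items():
    print(name, param.kind, param.default)
```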
5.5 Reading a docstring: the canonical shape
Most Python docstrings follow a structured format with the same sections:
def resample(self, rule, ...):
    """Resample time-series data.

    Convenience method for frequency conversion and resampling of time
    series. The object must have a datetime-like index (DatetimeIndex,
    PeriodIndex, or TimedeltaIndex).

    Parameters
    ----------
    rule : DateOffset, Timedelta or str
        The offset string or object representing target conversion.

    Returns
    -------
    pandas.api.typing.Resampler
        An object that can be used to perform resampling operations.

    See Also
    --------
    groupby : Group by mapping, function, label, or list of labels.

    Examples
    --------
    Start by creating a series with 9 one minute timestamps.

    >>> index = pd.date_range('1/1/2000', periods=9, freq='min')
    >>> series = pd.Series(range(9), index=index)
    >>> series.resample('3min').sum()
    ...
    """
The sections you will read most often are the summary line, the Parameters block, the Returns block, and the Examples. The summary line is one sentence at the very top, and reading it is almost free — half the time it directly answers your question and you can stop. If it does not, skim the Parameters block for the specific argument you care about; each parameter is named, typed, and described, so you can search inside the docstring with your editor’s find tool to jump straight to it. The Returns block tells you what shape of object comes back, which is what you need to know to chain the call into the rest of your code. And the Examples section is, in practice, the fastest way to understand any non-trivial function: it shows the function being called on real data with real output, which is something no English description can match.
If you only have time to read two of those four sections, read the summary line and the examples.
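The shape is not specific to pandas: any function you write can follow it, and help() will render the same sections. A minimal sketch, with mean as a made-up example function:

```python
# A minimal numpydoc-style docstring on an ordinary function.
# help(mean) and mean.__doc__ both expose these sections.
def mean(values):
    """Return the arithmetic mean of a sequence.

    Parameters
    ----------
    values : sequence of float
        The numbers to average. Must be non-empty.

    Returns
    -------
    float
        Sum of the values divided by their count.

    Examples
    --------
    >>> mean([1.0, 2.0, 3.0])
    2.0
    """
    return sum(values) / len(values)

print(mean([1.0, 2.0, 3.0]))         # 2.0
print("Parameters" in mean.__doc__)  # True
```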
5.6 Extracting a minimal example from the docs
The examples in a reference page are designed to be self-contained. They usually import everything they need, create any sample data inline, and run top to bottom — meaning you can paste them anywhere and they should work. The workflow for a new function is to find the example that looks closest to what you want, copy it into a notebook cell or a throwaway script, run it as-is, and confirm that it works before you change anything. Then modify one variable at a time toward your real use case, running again after each change so that if something breaks you know exactly which change broke it.
# Step 1: paste the official example unchanged
import pandas as pd
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df.melt(id_vars=["a"], value_vars=["b"])
# Step 2: substitute your real DataFrame
df = pd.read_csv("survey.csv")
df.melt(id_vars=["a"], value_vars=["b"]) # will fail — wrong column names
# Step 3: substitute the right column names
df.melt(id_vars=["respondent_id"], value_vars=["q1", "q2", "q3"])
This iterative workflow beats “try to guess the right incantation” every time, and it gives you something better than a working call: a clear explanation, in your own head, of why it works.
5.7 Release notes and changelogs
Sometimes the docs on the site match a newer version than the one installed in your environment. Most libraries publish a release note page (sometimes called “What’s New” or “Changelog”) that lists the behavior changes in each release. There are three symptoms that almost always mean “go check the changelog”: code that worked last year stops working now, a tutorial that uses an argument your version does not seem to have, and a deprecation warning whose meaning is opaque. All three are version-mismatch problems, and the changelog is the cheapest way to confirm and resolve them. For pandas the page is at pandas.pydata.org/docs/whatsnew; for Python itself it is at docs.python.org/3/whatsnew.
Check your installed version first:
import pandas as pd
print(pd.__version__)
Then go to the “What’s new in X.Y” page for that version and scan for the function you are using.
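If you check versions often, the comparison is easy to script. A naive sketch (real code should prefer packaging.version.Version, which understands suffixes like rc1 and dev0):

```python
def at_least(installed, required):
    # Compare dotted version strings as tuples of integers.
    # Naive: breaks on suffixes like "2.0rc1"; see packaging.version.
    def parse(v):
        return tuple(int(part) for part in v.split("."))
    return parse(installed) >= parse(required)

print(at_least("2.1.4", "2.0"))  # True: the 2.0 docs apply to you
print(at_least("1.5.3", "2.0"))  # False: you are on older behavior
```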
5.8 When a blog post is wrong
Blog posts are wonderful for discovery — “I didn’t know I could do that” — and dangerous for reference. Two warning signs should make you suspicious of any blog code you are about to copy. The first is no date, or a date older than about two years. APIs change; an answer that was right in 2018 may be silently wrong in 2026. The second, and more diagnostic, is that the code in the post does not match the function signature in the official docs. That mismatch is almost always a sign that the post was written against a different version, and continuing to follow it will land you in a confusing place.
A useful workflow is to treat blog posts as discovery aids and the official docs as the authoritative reference. Discover an approach from a blog or Stack Overflow, then open the docs for every function in the snippet and verify that the signature matches your installed version:
# saw on a blog: pd.read_csv("file.csv", skipfooter=1, engine="python")
# verify it locally before trusting it:
import pandas as pd
help(pd.read_csv) # search for "skipfooter" in the output
# or in Jupyter:
pd.read_csv?
If skipfooter shows up in your version’s docstring with the same meaning the blog assumes, you are fine. If your version’s docstring is silent on it, or the description is different, trust the docs and not the blog.
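The check can also be done programmatically with inspect. has_param below is a hypothetical helper, not a pandas or standard-library function:

```python
import inspect

def has_param(func, name):
    """Return True/False if introspection works, None if it does not."""
    try:
        return name in inspect.signature(func).parameters
    except (TypeError, ValueError):
        return None  # some C-implemented callables expose no signature

print(has_param(sorted, "reverse"))     # True
print(has_param(sorted, "skipfooter"))  # False
```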
5.9 Stakes and politics
The instruction “read the official documentation” assumes there is official documentation worth reading. That assumption holds for a small number of well-funded projects — pandas, NumPy, the Python standard library, the major cloud SDKs — and breaks for almost everything else. Open-source maintenance is overwhelmingly unpaid, English-language, and concentrated in wealthy countries; the gap between tier-one docs and the rest is roughly the gap between projects that have a paid technical writer and projects that do not. When a tutorial is missing, when a reference page lists half the parameters, when no language other than English is supported, the cost of figuring it out anyway falls on the reader, and that cost is not evenly distributed.
Two decisions to notice. First, who funds the docs you rely on: pandas documentation is partly funded by NumFOCUS donations and corporate sponsorship, and many of its contributors are paid by large companies to work on it; smaller libraries usually depend on a single maintainer with a day job. Second, whose conventions count as “professional”: the Diátaxis framework, the Sphinx/reStructuredText pipeline, and the dominant docstring styles are themselves artifacts with histories — produced by particular communities at particular times — and reading docs fluently means learning those communities’ assumptions about what a good explanation looks like.
See Chapter 8 for the broader framework. The concrete prompt to carry forward: when the docs you need are bad, ask why they are bad before you blame yourself for not understanding them.
5.10 Worked examples
Looking up a parameter: what does how='outer' do in pd.merge?
Wrong approach: Google “pandas merge outer example,” read four blog posts, get confused by inconsistent examples.
Right approach: Open help(pd.merge) or the pandas docs for pandas.DataFrame.merge. Find the how parameter. Read:
how : {‘left’, ‘right’, ‘outer’, ‘inner’, ‘cross’}, default ‘inner’
Type of merge to be performed.
- left: use only keys from left frame, similar to a SQL left outer join;
- right: use only keys from right frame, similar to a SQL right outer join;
- outer: use union of keys from both frames, similar to a SQL full outer join;
- inner: use intersection of keys from both frames, similar to a SQL inner join.
Thirty seconds, definitive answer.
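You can confirm the description in under a minute with toy frames (assuming pandas is installed in your environment):

```python
import pandas as pd

# Two tiny frames sharing only the key 'b'.
left = pd.DataFrame({"key": ["a", "b"], "x": [1, 2]})
right = pd.DataFrame({"key": ["b", "c"], "y": [3, 4]})

inner = left.merge(right, on="key", how="inner")  # intersection of keys
outer = left.merge(right, on="key", how="outer")  # union of keys

print(sorted(inner["key"]))  # ['b']
print(sorted(outer["key"]))  # ['a', 'b', 'c']
```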
Routing to the right genre: why is my .apply so slow?
Wrong approach: find a tutorial on speeding up pandas, try several unrelated tricks.
Right approach: go to the pandas User Guide → “Enhancing performance.” That’s an explanation page, and it directly answers “why is .apply slow and what should I do.” It will recommend vectorized operations, .map, numba, or pyarrow depending on your case.
Reading a docstring inline: what arguments does requests.get actually take?
From within a Jupyter cell:
import requests # https://requests.readthedocs.io/en/latest/
requests.get?
You get a docstring listing every keyword argument (params, headers, timeout, auth, cookies, allow_redirects, verify, stream, cert) with descriptions — no browser required.
Asking a precise question: does Python’s in operator work on a dict?
If you find yourself wondering “does Python’s in operator work on a dict?”, the fastest answer is the Python Language Reference (distinct from the tutorial). The reference for in lives under “Expressions → Comparisons → Membership test operations” and tells you that in on a dict checks the keys, not the values.
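Three lines in a REPL confirm what the reference says:

```python
d = {"a": 1, "b": 2}

print("a" in d)          # True: membership tests the keys
print(1 in d)            # False: values are not checked
print(1 in d.values())   # True: ask the values view explicitly
```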
The language reference is dense — it is for people who want precise answers to precise questions. You will not read it cover to cover, but you will increasingly rely on it as you get comfortable.
5.11 Templates
A “read the docs before asking for help” pre-flight checklist:
- Can I find the function in the official reference?
- Did I read the summary line?
- Did I read the description of the parameter I am using?
- Did I read the Returns section?
- Did I copy the official Example and run it?
- Did I check which version I have installed vs. the version the docs describe?
If yes to all six, then it is time to ask a question (see Chapter 2).
5.12 Exercises
- Using help() in a Python REPL, read the docstring for sorted. What does the key parameter do? What does reverse=True do?
- In a Jupyter notebook, type pd.read_csv? and scroll through every parameter. Pick three you have never used and read their descriptions.
- Open the pandas “API reference” and navigate to pandas.DataFrame.merge. Find the indicator parameter. Write one sentence explaining what it does.
- Find a stale blog post (older than three years) that uses a pandas function you know. Read the code and find one thing it does that is now discouraged or deprecated. Cross-check against the current docs.
- Check your installed pandas version with pd.__version__. Open the “What’s new in X.Y” page for that version. List two behaviors that changed in that release.
- For a library you use often but have not read the docs of (e.g., matplotlib, scikit-learn), find the entry page for the four Diátaxis genres: tutorial, how-to, reference, explanation. Bookmark each.
- The next time you hit a confusing error message, before searching online, open the docs for the function that raised the error. Read the Parameters and Examples sections. Time how long it took to find the answer.
5.13 One-page checklist
- Docs first, blog posts second. Official documentation is almost always faster and more reliable.
- Know the four genres: tutorials (learn), how-tos (do), references (look up), explanations (understand).
- Use help(), ?, and ?? in Python to read docstrings inline.
- Read function signatures carefully: defaults, keyword-only arguments, and return types matter.
- Copy the reference-page example first, then modify toward your use case.
- Always check library version against the docs version: pd.__version__.
- Check the “What’s new” / changelog for release-to-release behavior changes.
- Docs are reference, not novels — dip in and out, don’t try to read top to bottom.
- Bookmark the four main entry points of your favorite libraries for future speed.
Further reading
- Python documentation — the canonical reference for the standard library and language; learn its tutorial, library reference, language reference, and HOWTOs as four different entry points.
- pandas API reference — every DataFrame and Series method with signatures and examples; the model for what excellent reference docs look like.
- Daniele Procida, Diátaxis framework — the model that distinguishes tutorials, how-tos, reference, and explanation; the single most useful mental tool for navigating any documentation site.
- numpydoc style guide — the docstring conventions used across the scientific Python stack; learning them speeds up every ? lookup you will ever do.
- Read the Docs — the hosting platform for most open-source Python documentation; understanding how it builds and versions docs explains a lot of the navigation patterns you will encounter.
- Stripe, API reference — a widely cited example of commercial-grade reference documentation with side-by-side request/response panels; useful as a benchmark for what funded docs can look like.