This post is the start of a series on the Python packaging ecosystem. It is targeted at the version of me that existed before I learned all this stuff. It is targeted at anyone who knows Python, but has little or no knowledge of how to get their code into a form that can be installed on someone's computer by running pip install {mypackage}. It is targeted at anyone who wants to know what that command actually does, where the package goes, and how Python finds it.

When I started trying to learn how to write a setup.py script, I found the answers to the questions that came up frustratingly scattered across many different websites, rife with anachronisms and contradictions. Much of it seemed to be targeted at someone who already knew how some version of the packaging system worked. The subject also turns out to be simply much bigger than I had imagined.

At time of writing, the Python packaging ecosystem is in a state of rapid flux. It seems that it is tending towards increasing modularity and standardisation. Still, we aren’t there yet, and new package creators will have to wade through a swamp of legacy APIs and outdated documentation to inform their decisions about which tools to use and how to configure them.

Summary

As a package author, somewhere between you finishing up the actual code for your project and that code being integrated into a user's system and executed, it has to pass through the following pipeline:

graph TD
    origin([tracker/repo]) --discovery--> source(source/sdist)
    source --build--> wheel(wheel)
    wheel --install--> package("site package")
    package --entry--> exec("interpreter")
    source -.->|dependency| origin

(This diagram is somewhat simplified compared to what actually happens. For example, the “discovery” phase can often obtain wheels directly from a tracker.)

It is the package author's task to include in the source code the scripts or configuration files that tell the various tools implementing each stage of this process what to do. In order to write these files, then, the author must know something about which software is going to see them and how it will interpret them.

At the moment, most Python users looking for a package will invoke pip to handle every stage of this pipeline (although certain tasks are outsourced). The process that pip runs is rather opaque and is one thing I hope to elucidate in this series.

Through this series, I intend to explore the following questions:

  • Building. What is the definition of a build? What tools are available, and what do they do? What about extension modules?
  • Installation. What is the definition of an install? What tools are available, and what do they do? Where does all the stuff go? What about in a virtual environment?
  • Entry. With what commands can users access the contents of your installed package? How does Python find them?
  • Configuration. Which aspects of this process can I configure as a package author, and how?

There are some big questions buried in here, and the answers do not seem to be completely settled. Other big and hairy topics that I might get to (but I don’t want to promise anything) are discovery, dependency resolution, isolated/virtual environments, and authoring compiled Python extensions.

As we go along, I will also draw attention to places where the official documentation or received wisdom does not seem to match the actual status quo, and dip toes into the farrago of packages, both core and 3rd party, in-development and legacy (pip, setuptools, distutils, poetry, wheel, site, build, flit, distlib, importlib, sysconfig, packaging, venv, virtualenv, pyenv, pipenv, pew…) which one encounters as one gets deeper into the subject.

The objective of these posts is not to provide a “tutorial” — there are already plenty of those — but to find definitions and complete specifications of behaviour, where available.

In this post, I’ll give a high-level overview of the pipeline, introducing key terminology, and describe two ways to create a minimal installable package.

The pipeline

Let’s introduce some of the main players in this game:

Distribution

The input to this process doesn't seem to have a completely fixed name; I will call it a distribution. It consists of the source code for all the files you want to end up on the user's computer (which, if your distribution is pure Python, are the same as the actual files that will end up there), along with metadata and information on how to build them.

sdist

There is also a notion called an sdist, which is the files of a distribution stored in a gzip-compressed tarball in a vaguely specified directory structure. Archives in this format are one of the types of thing one retrieves from a package index like PyPI. For a specification, see here, but note that the pyproject.toml file they say is mandatory is actually a relatively recent addition to the Python packaging ecosystem (PEP 517), and many (most?) packages do not actually have it yet.

Wheel

The endpoint of the build process is a zip archive in a standardised directory structure called a wheel. Wheels are the other type of thing one retrieves from package indices.
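To make the "standardised directory structure" concrete, here is a sketch that builds a minimal wheel-like zip archive in memory and lists its contents. The package name and version are hypothetical, and a real wheel additionally contains a RECORD file listing every file with its hash; this is only an illustration of the layout.

```python
import io
import zipfile

# A wheel is just a zip archive with a conventional layout:
# the package's files at the top level, plus a *.dist-info directory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    # The importable package itself.
    zf.writestr("mypackage/__init__.py", "")
    # Core metadata for the distribution.
    zf.writestr(
        "mypackage-0.1.0.dist-info/METADATA",
        "Metadata-Version: 2.1\nName: mypackage\nVersion: 0.1.0\n",
    )
    # Wheel-format metadata.
    zf.writestr(
        "mypackage-0.1.0.dist-info/WHEEL",
        "Wheel-Version: 1.0\nRoot-Is-Purelib: true\n",
    )

with zipfile.ZipFile(buf) as zf:
    for name in zf.namelist():
        print(name)
```

An installer essentially unpacks such an archive into the appropriate install locations, which is what the next section is about.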

Schemes and sites

The top-level directories of a wheel come equipped with mappings to a standard list of install location names. Your Python installation (or virtual environment) comes with various install schemes, which map these names to install locations associated with a particular site; namely, user or system-wide. A package installer uses this mapping to copy the wheel contents to the appropriate install locations (and does a couple of other things, like wrap scripts and write metadata somewhere suitable).

You can discover the available install schemes on your Python install using its version of the standard library sysconfig module.
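For example, the following snippet lists the named schemes your installation knows about, and the location-name-to-directory mapping for the scheme your interpreter uses by default:

```python
import sysconfig

# The named install schemes this Python installation supports
# (e.g. "posix_prefix", "posix_user", "nt", ...).
print(sysconfig.get_scheme_names())

# The mapping from location names ("purelib", "platlib", "scripts", ...)
# to actual directories, for the default scheme.
for name, path in sysconfig.get_paths().items():
    print(f"{name}: {path}")
```

The "purelib" entry in that mapping is where a pure-Python package's modules will end up.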

Installation

Outputs that are discoverable by Python are packages, modules, extension packages and modules, entry points, data files (deprecated and with no stable plan for future implementation, but currently still usable), and metadata. The minimal install consists of trivial metadata (with UNKNOWN in all mandatory fields) and nothing else.
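The installed metadata, in particular, can be read back through the standard library's importlib.metadata module (available since Python 3.8). As a quick illustration, this lists the Name and Version fields of every distribution visible to the current interpreter:

```python
import importlib.metadata

# Each entry corresponds to one installed *.dist-info directory;
# its metadata is exposed as an email.message.Message-like mapping.
for dist in importlib.metadata.distributions():
    print(dist.metadata["Name"], dist.metadata["Version"])
```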

A typical implementation

The standard implementation of this pipeline consists of two main packages:

  • setuptools (together with the wheel package) for building.

  • pip for everything else.

Neither of these is part of the Python standard library, yet nearly everyone uses them. The end user only interacts with pip, by invoking pip install.

The build phase is outsourced to a script named setup.py in the root of the distribution source tree. This script invokes commands from setuptools.

Minimal working example

To make a minimal installable package, create a script named setup.py with the following contents:

from setuptools import setup
setup()

Running pip install . from the directory containing this script will install a distribution named UNKNOWN consisting of no packages, no modules, and a few dist-info metadata files (with all package-dependent entries UNKNOWN or empty). More precisely, it will create a folder

UNKNOWN-0.0.0.dist-info/

containing a few files that are included with every distribution, and put it into the purelib directory of the default site of your Python install (usually /something/site-packages). You can see the install location and a list of the files by running

pip show -f UNKNOWN

from your shell. Check out their contents and see how they make you feel.

A “modern” MWE

Here is a more “modern” version of the preceding MWE: instead of setup.py, simply create an empty file called pyproject.toml in the root. Running pip install . will have the same effect as the above example.

The secret here is that pip sees the pyproject.toml file and, finding it empty, falls back to defaults roughly equivalent to:

[build-system]
requires = ["setuptools>=40.8.0", "wheel"]
build-backend = "setuptools.build_meta:__legacy__"

(The :__legacy__ suffix is pip's fallback when no build-backend is declared; it reproduces the old setup.py-based behaviour.)

If you go down this route, it is of course better to specify these values explicitly. Even if you do use a pyproject.toml, with setuptools you currently still need a setup.py or setup.cfg file if you want to do anything non-trivial.
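For instance, the declarative setup.cfg route might look like the following fragment. The project name and version here are purely illustrative:

```ini
[metadata]
name = mypackage
version = 0.1.0

[options]
# Automatically discover packages under the project root.
packages = find:
```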

You are probably going to want users of your package to install something more interesting than this; I’ll get to that, along with some more precise definitions, in later posts.

Further reading

My gripes at the top of this post notwithstanding, there is a lot of useful information in the Python Packaging User Guide. But beware: some of what is written there is variously misleading, inaccurate, anachronistic, or incomplete. Many of the discrepancies stem from the fact that implementations have not yet caught up with standards. The people at PyPA are aware of these problems, and they expose a list of issues that need funding.