Effective Python

This sample is from a previous version of the book.

Building larger and more complex programs often leads you to rely on various packages from the Python community. You’ll find yourself running pip to install packages like pytz, numpy, and many others.

The problem is that by default pip installs new packages in a global location. That causes all Python programs on your system to be affected by these installed modules. In theory, this shouldn’t be an issue. If you install a package and never import it, how could it affect your programs?

The trouble comes from transitive dependencies: the packages that the packages you install depend on. For example, you can see what the Sphinx package depends on after installing it by asking pip.

$ pip3 show Sphinx
---
Name: Sphinx
Version: 1.2.2
Location: /usr/local/lib/python3.4/site-packages
Requires: docutils, Jinja2, Pygments

If you install another package like flask, you can see that it too depends on the Jinja2 package.

$ pip3 show flask
---
Name: Flask
Version: 0.10.1
Location: /usr/local/lib/python3.4/site-packages
Requires: Werkzeug, Jinja2, itsdangerous
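The Requires field that pip prints can also be read from Python. Here is a minimal sketch, assuming Python 3.8 or newer (where importlib.metadata is in the standard library, which is newer than the 3.4 installation shown here). The helper name direct_requirements is my own, not part of pip.

```python
from importlib import metadata  # standard library since Python 3.8


def direct_requirements(package_name):
    """Return the requirement strings a package declares, or [] if unavailable."""
    try:
        declared = metadata.requires(package_name)
    except metadata.PackageNotFoundError:
        # The package is not installed in the current environment.
        return []
    # requires() returns None when the package declares no dependencies.
    return list(declared or [])


# Prints an empty list if Sphinx is not installed where this runs.
print(direct_requirements("Sphinx"))
```

The result covers only direct dependencies. Finding the transitive ones means calling the helper again for each name it returns.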

The conflict arises as Sphinx and flask diverge over time. Perhaps right now they both require the same version of Jinja2 and everything is fine. But six months or a year from now, Jinja2 may release a new version that introduces breaking changes for users of the library. If you update your global version of Jinja2 with pip install --upgrade, you may find that Sphinx breaks while flask keeps working.

The cause of this breakage is that Python can only have a single global version of a module installed at a time. If one of your installed packages must use the new version and another package must use the old version, your system isn’t going to work properly.
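To make the conflict concrete, here is a small sketch with made-up version ranges. The numbers are hypothetical and are not the real requirements of Sphinx or Flask. They only illustrate that when two packages need non-overlapping versions, any single global install breaks one of them.

```python
# Hypothetical version ranges, written as (minimum inclusive, maximum exclusive).
# The numbers are invented for illustration; they are not real requirements.
REQUIRED_JINJA2 = {
    "Sphinx": ((2, 3), (2, 8)),  # works with 2.3 up to, but not including, 2.8
    "Flask": ((2, 8), (3, 0)),   # needs 2.8 or newer, below 3.0
}


def satisfies(version, allowed):
    """Return True if the version tuple falls inside the (low, high) range."""
    low, high = allowed
    return low <= version < high


def broken_by(version, requirements):
    """List the packages whose requirement the single installed version violates."""
    return sorted(
        name for name, allowed in requirements.items()
        if not satisfies(version, allowed)
    )


# Only one Jinja2 can be installed globally, so every choice breaks something.
for candidate in [(2, 7), (2, 8)]:
    print(candidate, "breaks", broken_by(candidate, REQUIRED_JINJA2))
```

With these invented ranges, staying on 2.7 breaks Flask and upgrading to 2.8 breaks Sphinx, so no single global version satisfies both.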

Such breakage can even happen when package maintainers try their best to preserve API compatibility between releases. New versions of a library can subtly change behaviors that API-consuming code relies on. Users on a system may upgrade one package to a new version but not others, breaking dependencies. There’s a constant risk of the ground moving beneath your feet.

These difficulties are magnified when you collaborate with other developers who do their work on separate computers. It’s reasonable to assume that the versions of Python and global packages installed on their machines will differ slightly from your own. This can cause frustrating situations where a codebase works perfectly on one programmer’s machine and is completely broken on another’s.

The solution to all of these problems is a tool called pyvenv, which provides virtual environments. Since Python 3.4, the pyvenv command-line tool is available by default along with the Python installation (it’s also accessible with python -m venv). Later releases deprecated the standalone pyvenv script (Python 3.6) and then removed it (Python 3.8), so on current versions python -m venv is the form to use. Versions of Python before 3.4 require installing a separate package (with pip install virtualenv) and using a command-line tool called virtualenv.

pyvenv allows you to create isolated versions of the Python environment. Using pyvenv, you can have many different versions of the same package installed on the same system at the same time without conflicts. This lets you work on many different projects and use many different tools on the same computer.

pyvenv does this by installing explicit versions of packages and their dependencies into completely separate directory structures. This makes it possible to reproduce a Python environment that you know will work with your code. It’s a reliable way to avoid surprising breakages.

The pyvenv Command

Here’s a quick tutorial on how to use pyvenv effectively. Before using the tool, it’s important to note what the python3 command refers to on your system. On my computer, python3 is located in the /usr/local/bin directory and runs version 3.4.2.

$ which python3
/usr/local/bin/python3
$ python3 --version
Python 3.4.2

To show the starting state of my system, I can check that importing the pytz module from the command line doesn’t cause an error. This works because I already have the pytz package installed as a global module.

$ python3 -c 'import pytz'
$

Now I use pyvenv to create a new virtual environment called myproject. Each virtual environment must live in its own unique directory. The result of the command is a tree of directories and files.

$ pyvenv /tmp/myproject
$ cd /tmp/myproject
$ ls
bin     include     lib     pyvenv.cfg
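The same directory tree can be produced from Python with the standard library’s venv module, which is what python -m venv runs. This is a sketch, not a replacement for the command line. It creates the environment in a throwaway temporary directory instead of /tmp/myproject, skips pip to stay fast, and reads back the pyvenv.cfg file. The helper names read_pyvenv_cfg and create_environment are my own.

```python
import os
import tempfile
import venv


def read_pyvenv_cfg(env_dir):
    """Parse the 'key = value' lines of pyvenv.cfg into a dictionary."""
    settings = {}
    with open(os.path.join(env_dir, "pyvenv.cfg")) as cfg:
        for line in cfg:
            key, sep, value = line.partition("=")
            if sep:
                settings[key.strip()] = value.strip()
    return settings


def create_environment(env_dir):
    """Create a virtual environment without pip, then return its settings."""
    # with_pip=False keeps the sketch fast; symlinks mirror the POSIX default
    # of the command-line tool.
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
    return read_pyvenv_cfg(env_dir)


# A throwaway directory stands in for /tmp/myproject.
with tempfile.TemporaryDirectory() as scratch:
    settings = create_environment(os.path.join(scratch, "myproject"))
    print("home:", settings["home"])
    print("system packages visible:", settings["include-system-site-packages"])
```

The home entry records the directory of the Python installation the environment was created from. The include-system-site-packages entry is false by default, which is why globally installed packages are not visible inside a new environment.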

To start using the virtual environment, I use the source command from my shell on the bin/activate script. activate modifies my shell’s environment variables, most importantly PATH, to match the virtual environment. It also updates my command-line prompt to include the virtual environment name ('myproject') to make it extremely clear what I’m working on.

$ source bin/activate
(myproject)$
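Roughly speaking, activation is a handful of environment variable changes. The sketch below models them on a plain dictionary rather than on the real shell. The function name activated_environment is hypothetical, and the real activate script does more (it also changes the prompt and remembers the old values so deactivate can restore them).

```python
import os


def activated_environment(env_dir, environ):
    """Return a copy of environ changed roughly the way bin/activate changes it."""
    updated = dict(environ)
    bin_dir = os.path.join(env_dir, "bin")  # POSIX layout, as in the listing above
    updated["VIRTUAL_ENV"] = env_dir
    # Putting the environment's bin directory first makes its python3 win lookups.
    updated["PATH"] = bin_dir + os.pathsep + environ.get("PATH", "")
    # A stray PYTHONHOME would point the interpreter back at another installation.
    updated.pop("PYTHONHOME", None)
    return updated


before = {"PATH": "/usr/local/bin" + os.pathsep + "/usr/bin"}
after = activated_environment("/tmp/myproject", before)
print(after["PATH"])
```

Because the environment’s bin directory comes first on PATH, the shell finds the environment’s python3 and pip3 before the system ones.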

After activation, you can see that the path to the python3 command-line tool has moved to within the virtual environment directory.

(myproject)$ which python3
/tmp/myproject/bin/python3
(myproject)$ ls -l /tmp/myproject/bin/python3
... -> /tmp/myproject/bin/python3.4
(myproject)$ ls -l /tmp/myproject/bin/python3.4
... -> /usr/local/bin/python3.4

This ensures that changes to the outside system will not affect the virtual environment. Even if the outer system upgrades its default python3 to version 3.5, my virtual environment will still explicitly point at version 3.4.
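A running program can also check which environment it belongs to. This short sketch relies on sys.prefix and sys.base_prefix, which differ inside environments created by the venv module. The function name in_virtual_environment is my own, and very old virtualenv releases signal this differently.

```python
import sys


def in_virtual_environment():
    """Return True when this interpreter is running from a virtual environment."""
    # Inside an environment created by the venv module, sys.prefix points at the
    # environment directory while sys.base_prefix points at the base installation.
    return sys.prefix != sys.base_prefix


print("interpreter:", sys.executable)
print("environment:", sys.prefix)
print("base install:", sys.base_prefix)
print("virtual environment active:", in_virtual_environment())
```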

The virtual environment I created with pyvenv starts with no packages installed except for pip and setuptools. Trying to use the pytz package that was installed as a global module in the outside system will fail because it’s unknown to the virtual environment.

(myproject)$ python3 -c 'import pytz'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named 'pytz'
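If you’d rather ask whether a module is available without triggering a traceback, importlib.util.find_spec can do the lookup. This is a sketch for top-level module names only, and the helper name is_importable is an assumption of mine. On Python 3.6 and later, the failed import above raises ModuleNotFoundError, which is a subclass of ImportError.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


# json ships with Python; pytz is found only if it is installed here.
for name in ["json", "pytz"]:
    print(name, "available:", is_importable(name))
```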

I can use pip to install the pytz module into my virtual environment.

(myproject)$ pip3 install pytz

Once it’s installed, I can verify it’s working with the same test import command.

(myproject)$ python3 -c 'import pytz'
(myproject)$

When you’re done with a virtual environment and want to go back to your default system, you use the deactivate command. This restores your environment to the system defaults, including the location of the python3 command-line tool.

(myproject)$ deactivate
$ which python3
/usr/local/bin/python3

If you ever want to work in the myproject environment again, you can just run source bin/activate in the directory like before.

Reproducing Dependencies

Once you have a virtual environment, you can continue installing packages with pip as you need them. Eventually, you may want to copy your environment somewhere else. For example, say you want to reproduce your development environment on a production server. Or maybe you want to clone someone else’s environment on your own machine so you can run their code.

pyvenv makes these situations easy. You can use the pip freeze command to save all of your explicit package dependencies into a file. By convention this file is named requirements.txt.

(myproject)$ pip3 freeze > requirements.txt
(myproject)$ cat requirements.txt
numpy==1.8.2
pytz==2014.4
requests==2.3.0
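The frozen file is simple enough to read with a few lines of Python. The sketch below handles only the exact name==version form shown above. Real requirements files allow more syntax (version ranges, URLs, options) that this hypothetical parse_pins helper deliberately rejects.

```python
def parse_pins(requirements_text):
    """Map package names to exact versions from 'name==version' lines."""
    pins = {}
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        name, sep, version = line.partition("==")
        if not sep:
            raise ValueError("not an exact pin: " + line)
        pins[name.strip()] = version.strip()
    return pins


FROZEN = """\
numpy==1.8.2
pytz==2014.4
requests==2.3.0
"""

print(parse_pins(FROZEN))
# → {'numpy': '1.8.2', 'pytz': '2014.4', 'requests': '2.3.0'}
```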

Now imagine you’d like to have another virtual environment that matches the myproject environment. You can create a new directory like before using pyvenv and activate it.

$ pyvenv /tmp/otherproject
$ cd /tmp/otherproject
$ source bin/activate
(otherproject)$

The new environment will have no extra packages installed.

(otherproject)$ pip3 list
pip (1.5.6)
setuptools (2.1)

You can install all of the packages from the first environment by running pip install on the requirements.txt that you generated with the pip freeze command.

(otherproject)$ pip3 install -r /tmp/myproject/requirements.txt

This command will crank along for a little while as it retrieves and installs all of the packages required to reproduce the first environment. Once it’s done, listing the set of installed packages in the second virtual environment will produce the same list of dependencies found in the first virtual environment.

(otherproject)$ pip3 list
numpy (1.8.2)
pip (1.5.6)
pytz (2014.4)
requests (2.3.0)
setuptools (2.1)
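Comparing the two lists by eye works for three packages, but a script scales better. Here is a sketch, assuming Python 3.8 or newer for importlib.metadata, that reports every pinned package whose installed version differs from the pin. The function name mismatches is hypothetical, and the comparison is a plain string match rather than a full version comparison.

```python
from importlib import metadata  # standard library since Python 3.8


def mismatches(pins):
    """Return {name: (wanted, installed)} for every pin the environment misses."""
    problems = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # not installed at all
        if installed != wanted:
            problems[name] = (wanted, installed)
    return problems


# Pins copied from the requirements.txt shown earlier.
print(mismatches({"numpy": "1.8.2", "pytz": "2014.4", "requests": "2.3.0"}))
```

An empty dictionary means the current environment matches the pins exactly.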

Using a requirements.txt file is ideal for collaborating with others through a revision control system. You can commit changes to your code at the same time you update your list of package dependencies, ensuring they move in lockstep.

The gotcha with virtual environments is that moving them breaks everything because all of the paths, like python3, are hard-coded to the environment’s install directory. But that doesn’t matter. The whole purpose of virtual environments is to make it easy to reproduce the same setup. Instead of moving a virtual environment directory, just freeze the old one, create a new one somewhere else, and reinstall everything from the requirements.txt file.
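The freeze, create, and reinstall steps can be written down as a short plan. This sketch only builds the commands as argument lists and prints them. It doesn’t run anything, it assumes the POSIX bin/ layout, and the function name rebuild_plan is my own. The output of the first command is what you would save as the requirements file.

```python
import os
import sys


def rebuild_plan(old_env, new_env, requirements="requirements.txt"):
    """Return the commands, as argument lists, that recreate old_env at new_env."""
    old_python = os.path.join(old_env, "bin", "python3")  # POSIX layout assumed
    new_python = os.path.join(new_env, "bin", "python3")
    return [
        # 1. Record the exact versions installed in the old environment
        #    (its output is what gets saved into the requirements file).
        [old_python, "-m", "pip", "freeze"],
        # 2. Create a fresh environment at the new location.
        [sys.executable, "-m", "venv", new_env],
        # 3. Reinstall the recorded versions into the new environment.
        [new_python, "-m", "pip", "install", "-r", requirements],
    ]


for command in rebuild_plan("/tmp/myproject", "/tmp/otherproject"):
    print(" ".join(command))
```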

Things to Remember

Virtual environments allow you to use pip to install many different versions of the same package on the same machine without conflicts.
Virtual environments are created with pyvenv, enabled with source bin/activate, and disabled with deactivate.
You can dump all of the requirements of an environment with pip freeze. You can reproduce the environment by supplying the requirements.txt file to pip install -r.
In versions of Python before 3.4, virtual environment support must be installed separately (with pip install virtualenv), and the command-line tool is called virtualenv instead of pyvenv.

Item 53: Use Virtual Environments for Isolated and Reproducible Dependencies
