Many new Python programmers rely on their system install of Python to run their scripts. There are several good reasons to stop using the system Python. First, it’s probably an old version of Python. Second, if you install 3rd party packages with pip, every package is installed into the same globally accessible directory. While this may sound convenient, it causes problems if you (1) install different packages with the same name, (2) need to use different versions of the same package, or (3) upgrade your operating system (OS X will delete all the packages you have installed).
For many years, best practice for Python developers was to use virtualenv to create a sandboxed environment for each project. If you use virtualenv, each project you work on can have its own version of Python with its own 3rd party packages (hopefully specified in a `requirements.txt` file). In my experience, getting started with virtualenv is cumbersome and confusing; to this day, I have to look up the command to create a Python 3 virtualenv.1
In 2015, I have almost exclusively used Python installations provided through Continuum Analytics’s Conda/Anaconda platform. I have also switched from using virtualenvs to using conda environments, and I am loving it.
Before explaining my workflow, here’s a quick glossary of the similarly-named products that Continuum offers.
- conda: “Conda is an open source package management system and environment management system for installing multiple versions of software packages and their dependencies and switching easily between them. It works on Linux, OS X and Windows, and was created for Python programs but can package and distribute any software.”2 A conda install provides a whole suite of command line tools for installing and managing packages and environments. Because conda works for any software, it can even install different versions of Python (unlike pip).
- Anaconda: “Anaconda is a completely free Python distribution (including for commercial use and redistribution). It includes more than 300 of the most popular Python packages for science, math, engineering, and data analysis.” It is available across platforms and installable through a binary.
- Anaconda Cloud: Also known as Anaconda.org and formerly known as Binstar, “Anaconda Cloud is a package management service where you can host software packages of all kinds.” Anaconda Cloud is a package repository analogous to PyPI. Packages are installed via the `conda` command line tool instead of pip. By default, the `conda install` command installs packages from a curated collection of packages (a superset of those in Anaconda). Continuum allows users to host their own packages on Anaconda Cloud; these packages can also be installed through `conda install` using the `-c` (channel) flag with the username.
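For example (the package and username below are placeholders, not real packages on Anaconda Cloud):

```
$ conda install numpy                   # from the default, curated channel
$ conda install -c some-user some-pkg   # from some-user's Anaconda Cloud channel
```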
Conda, Anaconda, and Anaconda Cloud are distinct but interrelated tools; keeping them straight can be hard, but is helpful.
Conda (the package manager) can be installed in two ways: through the Miniconda installer or the Anaconda installer. Both install the package manager, but the latter also installs the 300+ packages for scientific Python. (Installing Anaconda is equivalent to installing Miniconda and then running `conda install anaconda`.)
Conda Environment Files
It has become standard for pip users to create a `requirements.txt` file for specifying dependencies for a particular project. Often, a developer working on a project will (1) create and activate a virtual environment and (2) run `pip install -r requirements.txt` to build an isolated development environment with the needed packages.
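As a sketch, that pip-based workflow looks something like this (using the standard library’s `venv` module; the `pip install` step assumes a `requirements.txt` is present, so it is left commented here):

```shell
# create an isolated environment in .venv (stdlib venv, Python 3.3+)
python3 -m venv .venv

# activate it for the current shell session
. .venv/bin/activate

# with a requirements.txt in place, install the pinned dependencies:
# pip install -r requirements.txt
```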
Conda provides an analogous (but more powerful) file: `environment.yml`.3 An `environment.yml` file might look like this:

```yaml
name: numpy-env
dependencies:
- python=3
- numpy
```
If you are in a directory containing this file, you can run `$ conda env create` to create a conda environment named `numpy-env` that runs Python 3 and has numpy installed.4 Run `$ source activate numpy-env` to activate this environment. Once activated, running `$ python` will run Python 3 from your environment instead of the globally installed Python for your system. Moreover, you will be able to `import numpy` but not any of the 3rd party packages installed globally.
`environment.yml` can also install packages via pip with this syntax:

```yaml
name: pip-env
dependencies:
- python
- pip
- pip:
  - pypi-package-name
```
I see `environment.yml` files as a positive development over `requirements.txt` files for several reasons. Foremost, they allow you to specify the version of Python you want to use. At PyData NYC 2015, many presenters provided their code in Github repositories without specifying anywhere whether they were using Python 2 or 3. Because I included a YAML file, attendees could see exactly what version I was using and quickly install it with `conda env create`. I also like being able to specify the name of the environment in the file; this is particularly helpful when working with others. Finally, because conda can install from PyPI via pip, `environment.yml` files provide no less functionality than a `requirements.txt` file provides.
My Python Environment Workflow
Lately, whenever I am working on a new project (however big or small), I follow these steps:
- Create a project folder in the `~/repos/` directory on my computer.
- Create an `environment.yml` file in the directory. Typically the environment name will be the same as the folder name. At minimum, it will specify the version of Python I want to use; it will often include `anaconda` as a dependency.5
- Create the conda environment with `$ conda env create`.
- Activate the conda environment with `$ source activate ENV_NAME`.
- Create a `.env` file containing the line `source activate ENV_NAME`. Because I have autoenv installed, this file will be run every time I navigate to the project folder in the Terminal. Therefore, my conda environment will be activated as soon as I navigate to the folder.
- Run `$ git init` to make the folder a Git repository. I then run `$ git add environment.yml && git commit -m 'initial commit'` to add the YAML file to the repository.
- If I want to push the repository to Github, I use `$ git create` via Github’s hub commands. I then push the master branch with `$ git push -u origin master`.
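Put together, a new-project session might look something like this (`ENV_NAME` is a placeholder, and `git create` comes from the hub tool):

```
$ mkdir ~/repos/ENV_NAME && cd ~/repos/ENV_NAME
$ $EDITOR environment.yml                  # name: ENV_NAME, python version, dependencies
$ conda env create                         # build the environment from the file
$ source activate ENV_NAME
$ echo "source activate ENV_NAME" > .env   # picked up by autoenv
$ git init
$ git add environment.yml && git commit -m 'initial commit'
$ git create                               # optional: create the Github repo via hub
$ git push -u origin master
```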
As I add dependencies to my project, I try to be sure I add them to my `environment.yml` file.
A major benefit of all this is how easily reproducible a development environment becomes. If a colleague or conference attendee wants to run my code, they can set up the dependencies (including the Python version) by (1) cloning the repository, (2) running `$ conda env create`, and (3) running `$ source activate ENV_NAME`. It’s easy enough for me to drop those instructions and further instructions for running the code in a README file. If I’m feeling especially helpful, I’ll create a Makefile or Fabfile to encapsulate commands for core functionality of the code.
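For instance, a minimal Makefile along those lines might look like the sketch below (the targets and the `main.py` / `tests/` names are hypothetical, not from any particular project):

```make
# Hypothetical Makefile wrapping a project's core commands.

env:   ## create the conda environment from environment.yml
	conda env create

test:  ## run the test suite
	py.test tests/

run:   ## run the project's entry point
	python main.py

.PHONY: env test run
```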
An even larger benefit is that I can return to a project after days, months, or years and quickly start developing without first having to hunt down its dependencies.
I’ve come to love `environment.yml` files, and I think you might too.
- virtualenv also provides no help in actually managing Python versions. You have to install each version yourself and then tell virtualenv to use it. [return]
- From the conda docs. [return]
- Though there is currently a pull request for adding `requirements.txt` support to conda: https://github.com/conda/conda-env/pull/172. [return]
- Numpy will be installed from a binary from Anaconda Cloud, not built from source. [return]
- I created a bash command `conda-env-file` to automatically create an `environment.yml` file named after the current directory. [return]
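A minimal bash sketch of such a helper might look like this (the function body is my assumption, approximating the author’s `conda-env-file` command; the name is underscored for shell portability):

```shell
# Writes a minimal environment.yml named after the current directory.
conda_env_file() {
    local env_name
    env_name=$(basename "$PWD")
    printf 'name: %s\ndependencies:\n- python\n' "$env_name" > environment.yml
}
```

Running `conda_env_file` inside `~/repos/myproject` would produce an `environment.yml` whose `name:` field is `myproject`.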