B9. Packages and modules


The Python Standard Library has lots of built-in modules that contain useful functions and data types for doing specific tasks. You can also use modules from outside the standard library. And you will undoubtedly write your own modules!

A module is contained in a file that ends with .py. This file can have classes, functions, and other objects.
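To make this concrete, here is a minimal sketch of writing a module and importing it. The file name `my_module.py` and the function `square()` are hypothetical names chosen for illustration, and the file is written to a temporary directory only so that the example is self-contained. Normally you would simply save the `.py` file next to your notebook or script.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a tiny module to a temporary directory (hypothetical name: my_module.py)
module_dir = Path(tempfile.mkdtemp())
(module_dir / "my_module.py").write_text(
    "def square(x):\n"
    "    '''Return the square of x.'''\n"
    "    return x * x\n"
)

# Tell Python where to look for the file, then import it like any other module
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()

import my_module

print(my_module.square(4))
```

The file name without the `.py` extension becomes the module name, and the function defined in the file is reached with dot syntax.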

A package contains several related modules that are all grouped together under one name. We will extensively use the NumPy, SciPy, Bokeh, and biocircuits packages. As such, the first package we will consider is NumPy. We will revisit NumPy in its own section.

Example: You want to compute the mean and median of a list of numbers

Say you have a list of numbers and you want to compute the mean. This happens all the time; you repeat a measurement multiple times and you want to compute the mean. We could write a function to do this.

[1]:
def mean(values):
    """Compute the mean of a sequence of numbers."""
    return sum(values) / len(values)

And it works as expected.

[2]:
print(mean([1, 2, 3, 4, 5]))
print(mean((4.5, 1.2, -1.6, 9.0)))
3.0
3.275

In addition to the mean, we might also want to compute the median, the standard deviation, etc. These seem like really common tasks. Remember my advice: if you want to do something that seems really common, a good programmer (or a team of them) probably already wrote something to do that. Means, medians, standard deviations, and lots and lots and lots of other numerical things are included in the NumPy package. To get access to it, we have to import it.

[3]:
import numpy

That’s it! We now have the numpy module available for use. Remember, in Python everything is an object, so if we want to access the methods and attributes available in the numpy module, we use dot syntax. In a Jupyter notebook, you can type

numpy.

(note the dot) and hit tab to see what is available. For NumPy, there is a huge number of options!
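If tab completion is not available (say, in a plain script), a rough equivalent is Python's built-in `dir()` function, which lists the names attached to an object. The sketch below assumes NumPy is installed. The variable name `public_names` is our own choice.

```python
import numpy

# Names that do not start with an underscore are the public ones
public_names = [name for name in dir(numpy) if not name.startswith("_")]

# There are too many to print in full, so just count them
print(len(public_names))

# Check that the functions we are about to use are among them
print("mean" in public_names, "median" in public_names)
```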

So, let’s try to use NumPy’s numpy.mean() function to compute a mean.

[4]:
print(numpy.mean([1, 2, 3, 4, 5]))
print(numpy.mean((4.5, 1.2, -1.6, 9.0)))
3.0
3.275

Great! We get the same values! Now, we can use the numpy.median() function to compute the median.

[5]:
print(numpy.median([1, 2, 3, 4, 5]))
print(numpy.median((4.5, 1.2, -1.6, 9.0)))
3.0
2.85

This is nice. It gives the median, including when we have an even number of elements in the sequence of numbers, in which case it automatically interpolates by averaging the two middle values. It is really important to know that it does this interpolation, since if you are not expecting it, it can give unexpected results. So, here is an important piece of advice:

Always check the doc strings of functions.

We can access the doc string of the numpy.median() function in JupyterLab by typing

numpy.median?

and looking at the output. An important part of that output:

Notes
-----
Given a vector ``V`` of length ``N``, the median of ``V`` is the
middle value of a sorted copy of ``V``, ``V_sorted`` - i.e.,
``V_sorted[(N-1)/2]``, when ``N`` is odd, and the average of the
two middle values of ``V_sorted`` when ``N`` is even.

This is where the documentation tells you that the median will be reported as the average of two middle values when the number of elements is even. Note that you could also read the online NumPy documentation for numpy.median(), which is a bit easier to read.
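To convince ourselves that we understand the note in the docstring, we can write the rule out by hand and compare it with NumPy's result. This is only a sketch of the rule as described in the note, not NumPy's actual implementation, and `median_by_hand()` is a hypothetical name of our own.

```python
import numpy


def median_by_hand(values):
    """Median following the rule in the numpy.median() docstring notes."""
    v_sorted = sorted(values)
    n = len(v_sorted)

    # Odd number of elements: take the middle value
    if n % 2 == 1:
        return v_sorted[(n - 1) // 2]

    # Even number of elements: average the two middle values
    return (v_sorted[n // 2 - 1] + v_sorted[n // 2]) / 2


values = (4.5, 1.2, -1.6, 9.0)

print(sorted(values))
print(median_by_hand(values), numpy.median(values))
```

For these four numbers, the two middle values of the sorted sequence are 1.2 and 4.5, so both approaches report their average.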

The as keyword

We use NumPy all the time. Typing numpy over and over again can get annoying. So, it is common practice to use the as keyword to import a module with an alias. NumPy’s alias is traditionally np, and this is the only alias you should ever use for NumPy.

[6]:
import numpy as np

np.median((4.5, 1.2, -1.6, 9.0))
[6]:
2.85

We do things this way, though some purists differ. We will use traditional aliases for major packages like NumPy (np) and Pandas (pd), a package we will encounter later.
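It is worth knowing that an alias is just a second name for the same module object, so nothing is copied or loaded twice. A quick check, assuming NumPy is installed:

```python
import numpy
import numpy as np

# Both names refer to the very same module object
print(np is numpy)

# So functions reached through either name are identical too
print(np.median is numpy.median)
```

Both lines print `True`.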

Third party packages

Standard Python installations come with the standard library. NumPy and other useful packages are not in the standard library. Outside of the standard library, there are several packages available. Several. Ha! There are currently (March 26, 2023) about 442,000 packages available through the Python Package Index, PyPI. Usually, you can Google what you are trying to do, and there is often a third-party package to help you do it. The most useful (for scientific computing) and thoroughly tested packages and modules are available using conda. Others can be installed using pip.
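Before reaching for conda or pip, it can be handy to check whether a package is already installed in your environment. One way to do this, sketched below, uses the standard library's `importlib.metadata` module (available in Python 3.8 and later). The helper `installed_version()` and the package name `not-a-real-package-name-xyz` are hypothetical, made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# The second name is deliberately made up, so we expect None for it
for name in ("numpy", "not-a-real-package-name-xyz"):
    print(name, installed_version(name))
```

A result of `None` means the package is not installed in the current environment and would need to be installed with conda or pip.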

Computing environment

[7]:
%load_ext watermark
%watermark -v -p numpy,jupyterlab
Python implementation: CPython
Python version       : 3.10.9
IPython version      : 8.10.0

numpy     : 1.23.5
jupyterlab: 3.5.3