# Python Tutorial 4: Common Tasks

## Plotting

Plotting with matplotlib is generally very straightforward, and you can
get most of the documentation you'll ever need from the front page of
the matplotlib website at http://matplotlib.sourceforge.net/. If you can't find the answer on the main page, **be sure to check out the excellent matplotlib examples** at http://matplotlib.sourceforge.net/examples/index.html.

Perhaps the best way to pick up matplotlib on the fly is to peruse the gallery ( http://matplotlib.sourceforge.net/gallery.html ) and see what's the closest match to what you are looking to do.

The best way to use matplotlib interactively is by running `ipython -pylab`, which loads a special environment into ipython designed for use with matplotlib.

#### `pylab` and `pyplot`

In the most recent version of matplotlib (0.98, as of this writing),
the name of the module containing the main plotting routines has
changed from `pylab` to `matplotlib.pyplot`. `pylab` still exists as an alias, and it's still `ipython -pylab` (though that might change someday too), but now the "proper" way to access the main plotting interface is `from matplotlib import pyplot` rather than just `import pylab`.

And `matplotlib.pyplot` is where you'll find the routines in the online documentation.

#### Non-interactive plotting

If you're making a complex plot or one you may need to revise a lot, it
can be very useful to put your plotting commands in a script or
function in a module (well, both, by using script-calls-a-function). To
get access to the commands you'd use from within `ipython -pylab`, just `from matplotlib import pyplot`, and put a `pyplot.` in front of all the usual commands.

When plotting non-interactively, you'll need to call `pyplot.show()`
to make the plotting windows appear on the screen. Since the actual
drawing is delayed until `show` is called, non-interactive plotting can
be much faster than interactive plotting when there are a lot of plot
elements.

Instead of making the plots appear in windows and saving them
through the GUI, you can also save figures directly from Python using `pyplot.savefig()`.
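As a minimal sketch of a non-interactive plotting script (using the Agg backend so no display is needed; the data and filename here are just for illustration):

```python
import matplotlib
matplotlib.use("Agg")            # a non-GUI backend: render without a display
from matplotlib import pyplot
import numpy

x = numpy.linspace(0, 2 * numpy.pi, 100)
pyplot.plot(x, numpy.sin(x), label="sin(x)")
pyplot.legend()
pyplot.savefig("sine.png")       # write the figure straight to disk
```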

#### Plotting 2D Intensity Data

There are three main ways of plotting image-like data (there are even more if the image is a 2D histogram):

* **imshow**. This plots regularly gridded image data directly, and doesn't accept arrays specifying the x and y axes (instead it can take a bounding box in the `extent` keyword argument). Also be aware that the default origin is the *upper* left corner, so you'll usually want to pass `origin="lower"`. It also does automatic interpolation by default, so if you want to see the actual pixelization, you'll need to set `interpolation="nearest"`. This is by far the fastest image plotting routine; if you're actually trying to plot a large image, it's the only way to go.

* **pcolor**. This may look very similar to the results of `imshow` for regularly gridded data, but it takes arrays that indicate the x and y positions of each data point, which means the data doesn't have to be regularly gridded (in fact, it doesn't even have to be rectangular, but I have no idea how to pass the data in in that case). `pcolor` actually draws tons of tiny rectangles, so it scales and stretches nicely (but `imshow` does that too when you turn off interpolation). Note that the x and y arrays you pass `pcolor` indicate the edges of the data grid - that means they should each be one element longer than the shape of the data array in that dimension.

* **contour** / **contourf**. These draw contours (unfilled and filled, respectively). They also take x and y arrays to specify the grid points, but unlike `pcolor` these refer to the centers of the grid cells, so they should have the same length as the shape of the data array in that dimension.
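Since the edges-versus-centers distinction is easy to get wrong, here's a small sketch (pure numpy, no plotting; the grid values are arbitrary) of the array shapes each routine expects for a data array of shape (ny, nx):

```python
import numpy

data = numpy.random.randn(4, 6)   # shape (ny, nx) = (4, 6)
ny, nx = data.shape

# pcolor wants the *edges* of the grid cells: one extra element per axis.
x_edges = numpy.linspace(0.0, 1.0, nx + 1)
y_edges = numpy.linspace(0.0, 1.0, ny + 1)

# contour/contourf want the *centers*: same length as the data axis.
x_centers = 0.5 * (x_edges[:-1] + x_edges[1:])
y_centers = 0.5 * (y_edges[:-1] + y_edges[1:])

assert x_edges.shape == (nx + 1,) and y_edges.shape == (ny + 1,)
assert x_centers.shape == (nx,) and y_centers.shape == (ny,)
```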

You may also be interested in ensuring that your axis ratio is 1:1. You can do that with `pyplot.axis("scaled")`.

#### Computing and plotting histograms (1d and 2d)

* For simple 1D histograms, `pyplot.hist` does a pretty good
job. If you want more control, or you don't want a plot made up of
boxes, you can create the histogram array itself with `numpy.histogram` and plot it with the regular `pyplot.plot` function.
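For example, a sketch of building the histogram yourself with `numpy.histogram` (this assumes the signature where the returned edges array has one more element than the counts):

```python
import numpy

data = numpy.random.randn(1000)
counts, edges = numpy.histogram(data, bins=20)

# Bin centers, suitable for a regular line plot instead of boxes.
centers = 0.5 * (edges[:-1] + edges[1:])

assert counts.sum() == data.size   # every sample falls in some bin
assert centers.shape == counts.shape
```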

* For 2D histograms, one option is to use `numpy.histogram2d` to create the histogram array, and then use `pcolor`, `imshow`, or `contour[f]` to plot it. Note that the x and y axes arrays it returns are suitable for use with `pcolor`, and hence to use them with `contour[f]` you have to use the following trick:

```python
import numpy
from matplotlib import pyplot

h, hx, hy = numpy.histogram2d(xdata, ydata)
cx = 0.5 * (hx[:-1] + hx[1:])   # convert bin edges to bin centers
cy = 0.5 * (hy[:-1] + hy[1:])
pyplot.contourf(cx, cy, h.transpose())
```

Note that `histogram2d` unfortunately returns an array with dimensions `(x,y)`, instead of the expected `(y,x)`, so you have to transpose it.

* In matplotlib 0.98, there's another option, `pyplot.hexbin`, which computes a 2d histogram and plots it on a hexagonal grid all in one step. It's pretty cool looking, but a little slow.

* A simple scatter plot can also be an excellent stand-in for a true 2d histogram. While you can use `pyplot.scatter` for this, you'll find it's much slower than using `pyplot.plot`
with a pixel marker (","). That's because `pyplot.scatter`
has a lot more functionality you generally don't need - it can make all
the scatter points have different colors, sizes, or symbols based on data
from additional dimensions. **Really neat trick:** you
might notice when making scatter plots that the dense parts can get
horribly saturated. You can make this a lot better by specifying a low
alpha value (say, `alpha=0.1`) as a keyword to `plot`.
This makes all the points transparent, so they only saturate when many
of them are on top of each other (for `alpha=0.1`, that number is 10).

#### Colors and Color Maps

All of the 2d plotting functions discussed above use colormaps. You can set the default colormap with functions like `pyplot.hot()` or `pyplot.jet()`, but you can also use a much wider selection by passing one of the objects in `pyplot.cm` as the `cmap` argument of the plot functions. One very useful feature is that for every color map of the form `pyplot.cm.<name>`, there is a reversed colormap `pyplot.cm.<name>_r`.

For other plotting commands, like `plot`, it's most common
to pass in single characters to specify the color:
"b" = blue, "r" = red, "g" = green, "k" = black, "w" = white, "y" = yellow, "m" = magenta, "c" = cyan.
However, you can also pass in tuples of (R, G, B), where R, G, and B are float
values between 0 and 1. And you can get tuples like that by calling a
colormap object. The example below gets 9 equally spaced color values
from the "Blues" colormap:

```python
colors = [pyplot.cm.Blues(z) for z in numpy.linspace(0, 1, 9)]
x = numpy.random.randn(50, 9)
y = numpy.random.randn(50, 9)
for i in range(9):
    pyplot.plot(x[:, i], y[:, i], ",", color=colors[i], label="Blue #%i" % (i + 1))
pyplot.legend()
```

#### Plots with error bars

Use `pyplot.errorbar`. 'Nuff said.

#### Log scales

Check out `pyplot.semilogx`, `pyplot.semilogy`, and `pyplot.loglog`.
You can call these with arguments to make the plot, or call them with
no arguments to force other plotting routines to use the log axes for
that plot (this is useful for putting error bars on a log plot; call `errorbar` and one of the log routines with no arguments - in either order).

#### Labels and LaTeX

If you're using a *backend* that supports it, you can include
LaTeX markup in your plot labels and legends. Just put the LaTeX math
between `$` signs, like you normally would, and put an `r` (for "raw") before the string quotes to tell Python not to try to interpret any backslashes in the string itself. Like this:

```python
pyplot.xlabel(r"$\alpha$")
```

There are a lot of different matplotlib backends, most with some combination of GUI toolkit (GTK, wxWindows, Qt, Tk) and drawing system (Agg, Cairo). All the Agg backends support LaTeX (like GtkAgg or TkAgg), and you can set which one you use in your `.matplotlib/matplotlibrc` file if LaTeX doesn't just work for you.

#### Legends

You can add legends with explicit labels for all the plots you made by passing a bunch of labels to `pyplot.legend`. But it's better to just add a `label="something"` keyword argument to all the plots you want to have on the legend, and then call `pyplot.legend` with no arguments.

#### Single plots with multiple axes

http://matplotlib.sourceforge.net/examples/api/two_scales.html

#### Multiple plots with shared axes

http://matplotlib.sourceforge.net/examples/pylab_examples/shared_axis_demo.html

## Additional Numpy Utilities

#### Random numbers

Check out `numpy.random` (obviously).

#### Linear algebra

There's lots of good stuff in `numpy.linalg`, like linear least squares solvers (`lstsq`), singular value decomposition (`svd`), inverses (`inv`), and eigenvalue decomposition (`eig`). All of these use the standard, high-quality LAPACK Fortran routines.
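For instance, a quick sketch of a few of these routines on a small symmetric matrix (the numbers are arbitrary):

```python
import numpy

A = numpy.array([[3.0, 1.0],
                 [1.0, 2.0]])
b = numpy.array([9.0, 8.0])

x = numpy.linalg.solve(A, b)     # solve the square system A x = b
Ainv = numpy.linalg.inv(A)       # explicit inverse (solve is usually better)
w, v = numpy.linalg.eig(A)       # eigenvalues w, eigenvectors in columns of v

assert numpy.allclose(numpy.dot(A, x), b)
assert numpy.allclose(numpy.dot(A, Ainv), numpy.eye(2))
```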

## SciPy

#### Where to get SciPy documentation

* Inline Python help. Be aware that you have to import individual scipy modules separately.

* The new, nice, but *maybe* not-quite-complete online documentation: http://docs.scipy.org/doc/scipy/reference/ (I remember a few months ago it said "work in progress", but now it doesn't seem to say that anymore).

* The old, ugly, but fairly complete automatically-generated online documentation: http://www.scipy.org/doc/api_docs/

#### More linear algebra

Even more LAPACK wrappers can be found in `scipy.linalg` (including all the stuff in `numpy.linalg`).

#### Special Functions and Polynomials

**scipy.special**. Most of these are straightforward `ufunc`s. Unfortunately, it's a pain to get the documentation. If you do

```python
>>> help(scipy.special.jn)
```

you'll get a worthless page about `ufunc`s in general. You'll need to do

```python
>>> help(scipy.special)
```

and scroll down to learn that `scipy.special.jn(n,v)` is a Bessel function with integer order `n` at the real value `v`.

The special polynomials work a little differently. If you do

```python
>>> H = scipy.special.hermite(5)
```

you'll get a polynomial object (a subclass of `numpy.poly1d`). You can evaluate a polynomial by calling it:

```python
>>> H(3.2)   # 5th order Hermite polynomial evaluated at 3.2
5878.5382400000062
```

and you can do a ton of other useful things with it, like integration (`H.integ`), differentiation (`H.deriv`),
and all the usual mathematical operators. You can also compose them by
calling one polynomial with another polynomial as an argument.

The only "problem" with polynomials is that their coefficient array is
backwards. An nth-degree polynomial has a coefficient array of shape
(n+1,), which makes sense... but in the above example, `H.coeffs[0]` is the coefficient of the 5th-order term, and `H.coeffs[5]` is the 0th-order term. (Indexing the polynomial object itself goes the other way: `H[k]` is the coefficient of the `x**k` term.)
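A quick way to check the ordering is with a plain `numpy.poly1d` (used here instead of `scipy.special.hermite` so the sketch is self-contained):

```python
import numpy

p = numpy.poly1d([2.0, 0.0, 1.0])    # represents 2*x**2 + 0*x + 1

# The coefficient array is highest-order first...
assert list(p.coeffs) == [2.0, 0.0, 1.0]

# ...but indexing the polynomial object gives the coefficient of x**k:
assert p[0] == 1.0 and p[2] == 2.0

assert p(3.0) == 19.0                # 2*9 + 1
q = p.deriv()                        # derivative: 4*x
assert q(3.0) == 12.0
```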

#### Optimization

**scipy.optimize**. You can find a Levenberg-Marquardt non-linear least squares fitter at `scipy.optimize.leastsq`. There's also a simulated annealing algorithm, `scipy.optimize.anneal`, if you're worried about local minima. And a whole lot more. My recommendation is to figure out what you want by reading the optimization chapter in Numerical Recipes, and then go and find it in `scipy`. They're all there.

#### Least Squares Fitting

For most simple fitting problems, `scipy.optimize.leastsq`
is overkill. It's designed for non-linear least squares optimization,
and in a lot of cases (including fitting straight lines or polynomials)
the problem is actually a linear least squares problem. In many more
cases, the problem can be transformed to a linear one (such as fitting
straight lines or polynomials in log space). The fit and covariance
matrix of a linear least squares problem are *much* more robust than those of a non-linear problem, and while `scipy.optimize.leastsq` *should*
give you good answers if you give it a linear problem, it won't if you
don't do the math to transform your problem to linear. Check out the
least squares via SVD discussion in Numerical Recipes for more
information on why.

So whenever you can, make the problem linear, and then use `numpy.linalg.lstsq` (or equivalently, `scipy.linalg.lstsq`). A *linear* least squares problem is often written as a matrix equation `Ax=b`, where `A` is the linear model, `b` is the data, and `x`
are the parameters. You might notice that this doesn't leave any place
for the errors on the data to go (it assumes they're all unity),
but you can fix that by transforming `A = A/sigma[:,numpy.newaxis]` and `b = b/sigma` and solving the resulting problem.
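As a concrete sketch, here's a straight-line fit written in the `Ax=b` form (the variable names are mine; `rcond=None` is only needed to silence a warning in newer numpy):

```python
import numpy

xdata = numpy.array([0.0, 1.0, 2.0, 3.0])
ydata = 2.0 * xdata + 1.0                  # an exact line, so the fit is exact too
sigma = numpy.ones_like(ydata)             # unit errors on every point

# Columns of A are the basis functions evaluated at xdata: [x, 1].
A = numpy.vstack([xdata, numpy.ones_like(xdata)]).transpose()
A = A / sigma[:, numpy.newaxis]            # fold the errors into the model...
b = ydata / sigma                          # ...and into the data

params, residues, rank, sv = numpy.linalg.lstsq(A, b, rcond=None)
m, c = params                              # slope and intercept
```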

If all you want to do is fit a polynomial (including a straight line) with no errors, use `numpy.polyfit`. It's very straightforward.

If you want to fit a polynomial, but you do have error bars, you'll want to use `numpy.linalg.lstsq`, like this:

```python
import numpy

def polyfit_with_err(x, y, sigma, deg):
    """Fit a polynomial p(x) = p[0]*x**deg + ... + p[deg] of degree
    `deg` to points `(x, y)` with errors in y given by `sigma`."""
    n = numpy.arange(deg, -1, -1)
    A = numpy.power.outer(x, n)       # columns of A are x**deg, ..., x**0
    A /= sigma[:, numpy.newaxis]      # we made A ourselves, so we can overwrite it
    y = y / sigma                     # ...but we don't want to overwrite y
    v, residues, rank, s = numpy.linalg.lstsq(A, y)
    return numpy.poly1d(v)            # v is already highest-order-first, as poly1d expects
```

#### The SciPy Disclaimer

Jim doesn't use SciPy much. That's not because he doesn't like it - he just hasn't needed it much - and in fact he hears from respectable sources that it's quite useful. But he doesn't know that much about it, and that's why he isn't writing about it.

So be aware: there's a lot more stuff available in SciPy, but you'll have to find out about it for yourself.

## Pickle, Shelve, and Persistence

Almost any Python object can be *pickled* - that is, turned
into a string or saved to disk, and then loaded back up later, on some
other computer, as if nothing had ever happened. This is handled by the `pickle` module (in practice, the `cPickle` module, which is a faster version of `pickle`).

Here's how it works:

```python
import cPickle
a = [5, 3, 2, 6, 1, 2, "hurray!", 3.14159,
     {"is this a dictionary inside a list?": "yes"}]
s = cPickle.dumps(a)        # s is now a string representation of a
new_a = cPickle.loads(s)    # new_a is an exact copy of a
```

or, to save to disk:

```python
>>> import cPickle
>>> a = [5, 3, 2, 6, 1, 2, "hurray!", 3.14159,
...      {"is this a dictionary inside a list?": "yes"}]
>>> f = open("a.p", "w")
>>> cPickle.dump(a, f)    # save a in pickle format to file "a.p"
>>> f.close()
```

Then, after closing Python and starting it again:

```python
>>> import cPickle
>>> f = open("a.p", "r")
>>> a = cPickle.load(f)   # a is just like new
```

The default pickle format is plain text (though not very readable), and
hence totally architecture-independent. If you're pickling big things,
like large numpy arrays,
you'll want to call `dump` with `protocol=2`, which is *way*
faster and also more compact (the result is binary rather than text,
though it's still designed to be portable across platforms).

If you want to save a bunch of little things to one file, **shelve** provides a dictionary-like object that stores multiple objects, by key, to a single file using pickle. It looks like this:

```python
>>> import numpy
>>> import shelve
>>> s = shelve.open("data.s", protocol=2)
>>> s['name'] = "a nice little string"
>>> s['some random data'] = numpy.random.randn(500)
>>> s.close()
```

Then, after closing Python and starting it again:

```python
>>> import shelve
>>> s = shelve.open("data.s", protocol=2)
>>> s['name']
'a nice little string'
```

The advantage of this over pickling a dictionary is that with a shelve only the objects you ask for are loaded from disk, and since the files are stored in a database file, it will be pretty efficient at doing so. Still, for big objects, it's probably a better idea to pickle them separately.

A few caveats for `shelve`:

* Always be sure to close your shelve objects. Otherwise changes might not get written to disk.

* Unlike real dictionaries, keys for shelve objects must be strings.

* Don't edit mutable shelve objects in place. In other words:

```python
>>> import shelve
>>> s = shelve.open("data.s", protocol=2)
>>> s['d'] = {}
>>> s['d']['k'] = 5.6   # This line doesn't do anything! Do this instead:
>>> d = s['d']
>>> d['k'] = 5.6
>>> s['d'] = d
>>> s.close()
```

## Cluster Computing

Jim's favorite MPI interface right now is the absurdly simple `pypar`, found at http://datamining.anu.edu.au/~ole/pypar/.

## Logging

Check out the `logging` module in the standard library. It's really cool.

## Command Line Parsing

Check out the `optparse` module in the standard library.
Just suck it up and use it once. You'll be glad you did, because then
you can just copy and paste it the next time you want to use it.
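A sketch of typical `optparse` usage (the option names here are invented for illustration; normally you'd call `parse_args()` with no arguments so it reads `sys.argv`):

```python
from optparse import OptionParser

parser = OptionParser(usage="usage: %prog [options] infile")
parser.add_option("-n", "--number", type="int", dest="number", default=1,
                  help="how many times to run [default: %default]")
parser.add_option("-v", "--verbose", action="store_true", dest="verbose",
                  default=False, help="print extra output")

# Passing an explicit list here instead of reading sys.argv:
options, args = parser.parse_args(["-n", "3", "input.txt"])
```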

## Home-Brew Solutions

From Matt Auger: http://gravlens.physics.ucdavis.edu/~auger/software.php

Jim's catalog package (new version with a slightly different interface coming soon, so don't get too attached to it): http://dls.physics.ucdavis.edu/~jbosch/astro_utils/catalogs.tar.gz