We left off by introducing a couple tools from Scipy and Matplotlib for basic data processing and visualization.
In this case, we had a very small and simple data set - roughly 100 differents numbers - so performance wasn't an issue. However, as the data sets grow, the overhead of using Python compared to a compiled language like C++ starts to show.
For example, if we wanted to do some image processing for a desktop wallpaper image (say 1280 x 1024 pixels large), just to access every pixel is quite a few steps. If we wanted to do something like apply a Gaussian blur on a small neighborhood of each pixel using a convolution, we've just increased the number of steps by an order of magnitude!
Another example is, if we try time-stepping a hyperbolic PDE like an advection equation using a finite difference method using a fine grid, we need to do a potentially large number of operations per step, so we'd like to make this as fast as possible to get any sort of long term evolution.
To overcome this, we're going to introduce Numpy arrays as a way to get close to compiled level performance while still bring able to use a high-level description.
We'll start out getting comfortable with the basic construction, manipulation and operations of arrays.
Let's start with a simple example and define a couple Numpy arrays. This is going to seem similar to lists at first.
import numpy as np
A = np.array([1., 2., 3., 4.])
B = np.array([2., 5., 1., 3.])
print A
print B
If I want to get fancier, I can even define multidimensional arrays! For example, let's define a 2d array:
A = np.array([
[1., 2., 3.],
[4., 5., 6.],
])
print A
The way Numpy refers to these dimensions is via the shape attribute. If we want to check the dimensions of an array, we just use:
A.shape
As you can imagine, it would be tedious to design arrays by hand like this. Luckily, there are a number of convinience methods for constructing commonly used arrays. Let's start out with the analogue of the familiar range function.
n = np.arange(1, 10)
n
n = np.arange(1, 10, 2)
n
Another very common variation on this is doing a uniform subdivision of an interval:
x = np.linspace(0, 1, 10)
x
This is frequently used to compute a bunch of values over an interval. For example:
x = np.linspace(-1, 1)
y = x**3 - x/2.0
In order to help see what's happening, let's verify that this looks right by plotting it:
%matplotlib inline
import matplotlib.pyplot as plt
plt.figure()
plt.plot(x, y)
plt.show()
Just like we saw with Python lists, we can access and slice elements in arrays. This time, we have more freedom since arrays can be multidimensional. As a starting example, let's try filling in some different regions of a zero array.
Z = np.zeros((10, 10))
Z[2, 1] = 1.0
Z
Z = np.zeros((10, 10))
Z[3, :] = 1.0
Z
Z = np.zeros((10, 10))
Z[:, 3] = 1.0
Z
Z = np.zeros((10, 10))
Z[4, :] += 1.0
Z[:, 3] += 1.0
Z
Just like with lists, I can also take a slice from an array. This time slices can be combined per dimension.
Z = np.zeros((10, 10))
Z[3:7, 5:8] = 1.0
Z
Finally, striding works just like before. For example, if I want an alternating sequence of 1 and -1, I can use:
x = np.empty(12)
x[::2] = 1.0
x[1::2] = -1.0
x
We've already hinted at some basic arthimetic operations using linspace. It turns out that these work generally for arrays. We'll also see that many standard functions also have "vectorized" versions which work with arrays.
A = np.array([
[1., 2., 3.],
[4., 5., 6.],
])
B = np.array([
[1., 5., 2.],
[6., 4., 2.],
])
A + B
A - B
A * B
Notice that this is entrywise multiplication! Perhaps this isn't what you were expecting? In order to do matrix-style multiplication, we instead use the dot function. This works for both matrix-vector and matrix-matrix multiplication.
A = np.array([
[1., 4., 2.],
[5., 2., 1.],
[6., 2., 1.],
])
x = np.array([1., 2., 3.])
np.dot(A, x)
B = np.array([
[1., 4., 1.],
[2., 1., 5.],
[3., 3., 9.],
])
np.dot(A, B)
It turns out that arrays are also a little "smarter" than just this, and support a feature called broadcasting which automatically allows certain arrays of different shapes to be "stretched" to fit another shape.
For very sophisticated uses, this can be tricky to get right so we won't get into it in-depth here. However, it is useful in simple cases which we'll look at here:
A + 3.0 # added to each entry
v = np.array([1., 4., 3.])
A + v # added to each row
v = np.array([1., 2., 3.])
w = np.array([10., 20., 30.])
v[:, np.newaxis] + w[np.newaxis, :] # all pairs sum
As mentioned before, many common functions are availible in "vectorized" form.
x = np.linspace(-5, 5, 100)
y = np.sin(2*np.pi*x) * x
print y
You can see this computes sin at each x value. Although we could do the same using a list, this approach is much faster (and arguably simpler once you get used to it). To verify this, let's plot the function:
%matplotlib inline
import matplotlib.pyplot as plt
plt.figure()
plt.plot(x, y)
plt.show()
This is nice but Numpy's power really starts to show on larger data sets. Let's take a step from 1d to 2d computations. As a starting example, we'd like to evaluating and plot the function $f(x, y) = x^2 - y^2$ on the domain $[-2, 2] \times [-2, 2]$.
To do this, we need the analogous of linspace in higher dimensions. We'll form such a thing using a new function called meshgrid.
x, y = np.meshgrid(np.linspace(-2, 2, 100),
np.linspace(-2, 2, 100))
f = x**2 - y**2
How do we plot this? We'll introduce a new plotting function called pcolormesh.
plt.figure()
plt.xlim(-2, 2)
plt.ylim(-2, 2)
plt.pcolormesh(x, y, f)
plt.show()
This is a good start! However, something about this is asthetically unsatisfying...it doesn't do a good job highlighting an important aspect of the function...the zero set. Let's see how we can make a quick change to improve this.
plt.figure()
plt.xlim(-2, 2)
plt.ylim(-2, 2)
plt.pcolormesh(x, y, f, cmap='RdBu')
plt.show()
As an aside, Matplotlib supports a number of different colormaps. Some of these have particular purposes. There are three general classes called sequential, diverging and qualitative. In this case, I'm using the fact the diverging colormaps are good at highlighting a distinct central value in a dataset. If you're interested, you can learn more about this here!
When working with arrays, it's not uncommon to have to spend some time getting things into the right "shape" or format that you want. Numpy provides a some functions which help with this sort of thing. The first two we'll look at have to do with reshaping an array.
A = np.arange(1, 10)
print A
print A.shape
B = A.reshape(3, 3)
print B
print B.shape
You can see that we've taken our array of 9 elements and "wrapped" it into a 3-by-3 array. To undo this, we could either use reshape again, or we can use the more common ravel:
print np.ravel(B)
print B.ravel() # equivalent but a little shorter.
It's also common to have a few different arrays that you want to merge or "stack" them together in some way or another. Let's look at how to stack arrays "vertically", "horizontally" and "column-wise".
A = np.array([1., 2., 3.])
B = np.array([4., 1., 2.])
np.vstack([A, B])
np.hstack([A, B])
np.column_stack([A, B])
Aside from arthimetic-type operations, Numpy also provides many other things like finding the minimum or maximum value in an array, sorting arrays, adding up all the entries of an array and much, much more. Let's go ahead and take a look at some of the basics.
A = np.random.randint(1, 10, 5)
A
print np.min(A)
print np.max(A)
np.sort(A)
An interesting variation on these is given by the "argument" version. Instead of picking out the min, max or sorting, we can compute the indices which would arise from these.
print np.argmin(A)
print np.argmax(A)
print np.argsort(A)
As a closing topic, let's take a second look at how we can index elements in an array. This time, we're going to use an array to index another.
A = np.arange(1, 10)**2
A
i = np.array([1, 5, 3, 2, 3, 1])
A[i]
I can also do this in multiple dimensions!
A = np.zeros((5, 5))
i = np.arange(5)
A[i, i] = 1.0
A
A = np.arange(1, 10).reshape(3, 3)
A
Finally, I can even index using multi-dimensional arrays!
i = np.array([
[0, 0],
[1, 1],
])
j = np.array([
[0, 1],
[2, 0],
])
A[i, j]
This takes some care to use correctly, however... Since this is just an introduction, we'll leave this as a look at what's to come.