My Numpy cookbook

What is Numpy

Numpy is a Python library, a wrapper around C and Fortran code, which makes it very fast. It focuses on N-dimensional arrays (matrices and beyond) called ndarrays.

I used it a lot during my studies at GA Tech. Recently I had to use it again for some data analysis, and I realized that I had forgotten many of the basic operations.

I had to dive into some old school materials in order to get back on track.

Numpy basic recipes

Below is a very quick and simple cookbook that I prepared for myself, obviously :-) But I also hope it can be helpful to anybody else.

I know, I know.. there is no figure whatsoever. Graphic representation is always the best. You can check lesson 4 of this course on udacity.com, The power of Numpy. It's free :-)

Assume we have imported the Numpy library with import numpy as np, and that nd1 is an ndarray representing a matrix of dimension #row x #col. Assume also that all the following operations stay within the bounds of the matrix.

  • nd1[0,0] is element at row 0, column 0

  • nd1[-1, 1] is element at last row, column 1

  • nd1[-2, 2] is element at the row preceding the last row, column 2

  • nd1[0:3, 1:7] is the region in the matrix from row 0 to the row before 3 (thus row 2) and from column 1 to the column before 7 (thus column 6)

  • a colon (:) with nothing after it means all the way to the end, and with nothing before it means from the start

  • nd1[1:, 2], is the region from row 1 to the last, and column 2

  • nd1[:, 1:4] is the region from row 0 to the last (all the rows) and column 1 to column 3.

  • nd1[:, 1:8:2], selects all elements in every row (row 0 to the last) and columns 1, 3, 5, 7 (start at column 1, increment by 2 each time, and stop before the upper bound 8)

  • nd1[0,:] = 1, assign value 1 to entire row 0

  • nd1[:, 1] = [1,2,5,9], assign the list [1,2,5,9] to column 1, assuming the number of rows for nd1 is 4. Very important to pay attention to the dimension.

  • np.array([1,2,5,0]), convert the list [1,2,5,0] to 1D array

  • np.array([(2,3), (7,10)]), convert a list of tuples (or a list of lists) to an array (a 2D array in this example: [ [2,3], [7,10] ])

  • a = np.array([(2,3), (7,10)]), a.sum() is the sum of all elements, which is 22; a.sum(axis=0) sums down the rows and gives the column sums [9, 13]; a.sum(axis=1) sums across the columns and gives the row sums [5, 17]

  • similarly we can use a.min(), a.max(), a.mean() same way as the example above

  • np.empty(5), create an uninitialized 1D array of 5 elements (the values are whatever happens to be in memory, not zeros)

  • np.empty((2,3)), create an uninitialized 2x3 array (2D array)

  • np.ones((2,4)), create a 2x4 array of ones (by default the datatype is float64, i.e. double)

  • np.ones((2,2), dtype=np.int_), create a 2x2 array of ones (datatype integer)

  • notation: a is in the interval [x, y) means x <= a < y

  • np.random.random((3,4)), create a 3x4 array of random values uniformly sampled from [0.0, 1.0)

  • np.random.rand(3,4), same as line above (similar to matlab syntax)

  • np.random.normal(size=(2,3)), create a 2x3 array of samples from a Gaussian (normal) distribution with mean 0 and standard deviation 1

  • np.random.normal(10, 5, size=(2,3)), same as above but mean=10, standard deviation=5

  • np.random.randint(5), single random integer in [0, 5)

  • np.random.randint(0, 5), single random integer in [0, 5)

  • np.random.randint(0,5, size=5), 5 random integers in [0, 5) as a 1D array

  • np.random.randint(0, 5, size=(2,3)), 2x3 array of random integers between [0, 5)
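The recipes above can be tried out on a small array. Here is a minimal sketch, assuming a made-up 4x8 array for nd1 (built with np.arange just so the values are easy to predict). For the random calls, only the shape and the value range can be predicted.

```python
import numpy as np

# Hypothetical 4x8 matrix holding 0..31, so each element equals row*8 + col.
nd1 = np.arange(32).reshape(4, 8)

first = nd1[0, 0]            # row 0, column 0 -> 0
last_row = nd1[-1, 1]        # last row, column 1 -> 25
block = nd1[0:3, 1:7]        # rows 0..2, columns 1..6 -> shape (3, 6)
odd_cols = nd1[:, 1:8:2]     # all rows, columns 1, 3, 5, 7 -> shape (4, 4)

# Assigning through a slice modifies nd1 in place.
nd1[0, :] = 1                # the whole row 0 becomes 1
nd1[:, 1] = [1, 2, 5, 9]     # column 1; the list length must match the 4 rows

# Aggregations on the small 2D array from the text.
a = np.array([(2, 3), (7, 10)])
total = a.sum()              # 22
col_sums = a.sum(axis=0)     # [9, 13]
row_sums = a.sum(axis=1)     # [5, 17]

# Creation helpers: only shape, dtype and value range are predictable.
ones = np.ones((2, 4))                        # float64 by default
int_ones = np.ones((2, 2), dtype=np.int_)     # integer dtype
rand = np.random.random((3, 4))               # 3x4, values in [0.0, 1.0)
ints = np.random.randint(0, 5, size=(2, 3))   # 2x3, integers in [0, 5)
```

One thing to keep in mind: basic slices such as block and odd_cols are views on nd1, so the assignments that follow them also change what those variables show.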

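Assuming we have the following a=np.random.random((5,4)) (the shape is passed as a single tuple)

  • a.shape, returns the shape of the Numpy array as a tuple, here (5, 4), where a.shape[0]==5 is the number of rows and a.shape[1]==4 is the number of columns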

  • a.size, is the number of elements in the Numpy array
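A short sketch of these two attributes. Note that np.random.random takes the shape as one tuple, so a 5x4 array is requested with (5, 4).

```python
import numpy as np

a = np.random.random((5, 4))   # the shape is passed as a single tuple

rows, cols = a.shape           # a.shape is the tuple (5, 4)
count = a.size                 # total number of elements: 5 * 4 = 20
```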

Assuming we have the following a=np.array([1,5,4,7,9,11,3])

  • a[ [0,2,3] ], returns [1,4,7] where [0,2,3] is an array of indices passed to the 1D array

  • a[ a < 8], returns the array [1,5,4,7,3] (Numpy masking: a < 8 is a boolean array used to pick elements)

  • a[ a < 8] = 0, replaces all the numbers less than 8 by 0 in the original array, in place; a is then [0,0,0,0,9,11,0]
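The index-list and masking recipes above can be sketched as follows. The variable names picked, mask and small are only for illustration.

```python
import numpy as np

a = np.array([1, 5, 4, 7, 9, 11, 3])

picked = a[[0, 2, 3]]     # elements at indices 0, 2, 3 -> [1, 4, 7]
mask = a < 8              # boolean array, True where the element is below 8
small = a[mask]           # -> [1, 5, 4, 7, 3]

# Assigning through a mask changes a in place.
a[a < 8] = 0              # a is now [0, 0, 0, 0, 9, 11, 0]
```

Unlike basic slices, picked and small are copies, so they keep their values after a is modified.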

Further reference

Assume array A and array B are 2D arrays of the same shape. Mathematical operations on Numpy arrays are element-wise. Thus A + B results in a new array where the element at position x,y in array A is added to the element at x,y in array B. It's the same for A-B, A/B and A*B.
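A minimal sketch of these element-wise operations, using two made-up 2x2 arrays (the values are chosen only to keep the arithmetic easy to follow):

```python
import numpy as np

# Hypothetical example arrays of the same shape.
A = np.array([[1, 2], [3, 4]])
B = np.array([[10, 20], [30, 40]])

added = A + B    # [[11, 22], [33, 44]]
diff = B - A     # [[9, 18], [27, 36]]
prod = A * B     # [[10, 40], [90, 160]], element-wise, not a matrix product
quot = B / A     # [[10., 10.], [10., 10.]], true division gives floats
```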

To do proper matrix multiplication, we need to use np.matmul (or the equivalent @ operator).
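To make the difference concrete, here is a small sketch with two made-up 2x2 arrays, comparing np.matmul with the element-wise product:

```python
import numpy as np

# Hypothetical example arrays.
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

product = np.matmul(A, B)   # rows of A times columns of B -> [[19, 22], [43, 50]]
same = A @ B                # the @ operator gives the same result for 2D arrays
elementwise = A * B         # [[5, 12], [21, 32]], a different result
```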

This is not an exhaustive list, but these are the basics I used during my school days and still use whenever I am doing some data analysis in Python. For further Numpy functions you can refer to the Numpy reference page.