Machine Learning for OpenCV

Understanding NumPy arrays

You might already know that Python is a dynamically typed language. This means that you do not have to specify a data type whenever you create a new variable. For example, the following will automatically be represented as an integer:

In [5]: a = 5

You can double-check this by typing as follows:

In [6]: type(a)
Out[6]: int

As the standard Python implementation (CPython) is written in C, every Python object is basically a C structure in disguise. This is true even for integers in Python, which are actually pointers to compound C structures that contain more than just the raw integer value. As a result, a Python integer carries more overhead than a plain C integer, and the size of the underlying C types depends on your system architecture (that is, whether it is a 32-bit or 64-bit platform).
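To make the "C structure in disguise" point concrete, the following sketch compares the memory footprint of a Python integer object with that of a raw 64-bit C integer. It assumes the standard CPython interpreter and uses only the standard library. The exact size reported by sys.getsizeof varies between Python versions and platforms, so no specific value is shown:

```python
import ctypes
import sys

a = 5

# Size in bytes of the complete Python integer object (bookkeeping plus value).
# The exact number depends on the CPython version and platform.
py_int_bytes = sys.getsizeof(a)

# Size in bytes of a raw 64-bit C integer, for comparison.
raw_int_bytes = ctypes.sizeof(ctypes.c_int64)

print("Python int object: ", py_int_bytes, "bytes")
print("Raw 64-bit integer:", raw_int_bytes, "bytes")
```

On CPython, the first number should come out noticeably larger than the second, because the object stores extra information alongside the value itself.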

Going a step further, we can create a list of integers using the list() command. A list is the standard multielement container in Python. The range(x) function produces all integers from 0 up to x-1:

In [7]: int_list = list(range(10))
... int_list
Out[7]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Similarly, we can create a list of strings by telling Python to iterate over all the elements in the integer list, int_list, and applying the str() function to each element:

In [8]: str_list = [str(i) for i in int_list]
... str_list
Out[8]: ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

However, lists are not well suited to doing math. Let's say, for example, we wanted to multiply every element in int_list by a factor of 2. A naive approach might be to do the following, but see what happens to the output:

In [9]: int_list * 2
Out[9]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Python created a new list that simply contains all elements of int_list twice in a row; this is not what we wanted!
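For comparison, doubling every element with a plain Python list means visiting the elements one by one, for example with a list comprehension. A minimal sketch, rebuilding the same int_list as before:

```python
int_list = list(range(10))

# Multiply each element by 2, one element at a time
doubled = [2 * x for x in int_list]
print(doubled)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

This gives the intended result, but the loop has to be spelled out again for every new operation.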

This is where NumPy comes in. NumPy has been designed specifically to make array arithmetic in Python easy. We can quickly convert the list of integers into a NumPy array:

In [10]: import numpy as np
... int_arr = np.array(int_list)
... int_arr
Out[10]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Let's see what happens now when we try to multiply every element in the array:

In [11]: int_arr * 2
Out[11]: array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

Now we got it right! The same works with addition, subtraction, division, and many other functions.
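As a short illustration of those other operations, the following sketch applies a few of them to the same array. The variable names (plus_ten, halved, and so on) are chosen here purely for illustration:

```python
import numpy as np

int_arr = np.array(list(range(10)))

plus_ten = int_arr + 10    # add 10 to every element
minus_one = int_arr - 1    # subtract 1 from every element
halved = int_arr / 2       # true division: the result is a float array
squared = int_arr ** 2     # square every element
roots = np.sqrt(int_arr)   # NumPy functions also work element by element

print(plus_ten)
print(minus_one)
print(halved)
print(squared)
print(roots)
```

Note that dividing an integer array with / produces floating-point values, so the resulting array has a different dtype than int_arr.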

In addition, every NumPy array comes with the following attributes:

  • ndim: The number of dimensions
  • shape: The size of each dimension
  • size: The total number of elements in the array
  • dtype: The data type of the array (for example, int, float, string, and so on)

Let's check these attributes for our integer array:

In [12]: print("int_arr ndim: ", int_arr.ndim)
... print("int_arr shape: ", int_arr.shape)
... print("int_arr size: ", int_arr.size)
... print("int_arr dtype: ", int_arr.dtype)

Out[12]: int_arr ndim: 1
... int_arr shape: (10,)
... int_arr size: 10
... int_arr dtype: int64

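From these outputs, we can see that our array has a single dimension containing ten elements, and that all elements are 64-bit integers. Of course, if you are executing this code on a 32-bit machine, you might find dtype: int32. The same can happen on 64-bit Windows with NumPy versions older than 2.0, where the default integer type was 32 bits wide.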
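Because the default integer type can differ between platforms, it can be useful to request a data type explicitly. The following sketch does this with the dtype argument of np.array, and then uses reshape (a NumPy method not covered above) to show how the attributes change for a two-dimensional array. The names arr32 and arr2d are made up for this example:

```python
import numpy as np

# Request a specific integer width instead of the platform default
arr32 = np.array(list(range(10)), dtype=np.int32)
print("arr32 dtype:", arr32.dtype)   # int32

# Rearrange the ten elements into 2 rows and 5 columns
arr2d = arr32.reshape(2, 5)
print("arr2d ndim: ", arr2d.ndim)    # 2
print("arr2d shape:", arr2d.shape)   # (2, 5)
print("arr2d size: ", arr2d.size)    # 10
```

The total number of elements stays the same after reshaping; only the number of dimensions and the shape change.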