Python Machine Learning Cookbook (Second Edition)

How to do it...

Let's see how to scale data in Python:

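Let's start by defining the data_scaler variable (the import is needed only if preprocessing has not already been imported from sklearn):

>> from sklearn import preprocessing
>> data_scaler = preprocessing.MinMaxScaler(feature_range=(0, 1))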

Now we will use the fit_transform() method, which learns the minimum and maximum of each column and then rescales the data in a single step (we will use the same data as in the previous recipe):

>> data_scaled = data_scaler.fit_transform(data)
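
To make the transformation concrete, the following is a minimal plain-Python sketch of min-max scaling. It is an illustration only, not scikit-learn's implementation: the helper name min_max_scale is hypothetical, and the sample values in data are an assumption, chosen to be consistent with the minimum and maximum values printed later in this recipe.

```python
def min_max_scale(rows, feature_range=(0.0, 1.0)):
    """Linearly rescale each column of rows into feature_range (illustrative helper)."""
    lo, hi = feature_range
    columns = list(zip(*rows))              # transpose rows into columns
    col_min = [min(c) for c in columns]     # per-column minimum
    col_max = [max(c) for c in columns]     # per-column maximum
    scaled = []
    for row in rows:
        scaled_row = []
        for x, cmin, cmax in zip(row, col_min, col_max):
            span = cmax - cmin
            # A constant column has zero span; this sketch maps it to the lower bound
            unit = (x - cmin) / span if span else 0.0
            scaled_row.append(lo + unit * (hi - lo))
        scaled.append(scaled_row)
    return scaled


# Assumed sample data (3 samples, 4 features)
data = [[3, -1.5, 2, -5.4],
        [0, 4, -0.3, 2.1],
        [1, 3.3, -1.9, -4.3]]

data_scaled = min_max_scale(data)
for row in data_scaled:
    print(row)
```

Each value is shifted by its column minimum and divided by the column range, so the smallest value in a column maps to the lower bound of feature_range and the largest to the upper bound.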

A NumPy array with the same shape as the input is returned. To understand how the data has been transformed, we display the minimum and maximum of each column, first for the original data and then for the scaled data:

>> print("Min:", data.min(axis=0))
>> print("Max:", data.max(axis=0))

The following results are returned:

Min: [ 0. -1.5 -1.9 -5.4]
Max: [3. 4. 2. 2.1]

Now, let's do the same for the scaled data using the following code:

>> print("Min:", data_scaled.min(axis=0))
>> print("Max:", data_scaled.max(axis=0))

The following results are returned:

Min: [0. 0. 0. 0.]
Max: [1. 1. 1. 1.]

After scaling, every feature lies within the interval given by feature_range, here from 0 to 1.
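
The target interval is controlled by the feature_range argument. As a hedged sketch, the snippet below rescales the data into the interval from -1 to 1 instead. The sample array and the variable names scaler_sym and data_scaled_sym are assumptions made for illustration; they are not part of the original recipe.

```python
import numpy as np
from sklearn import preprocessing

# Assumed sample data, consistent with the minimum and maximum values shown in this recipe
data = np.array([[3, -1.5, 2, -5.4],
                 [0, 4, -0.3, 2.1],
                 [1, 3.3, -1.9, -4.3]])

# Same scaler class, different target interval
scaler_sym = preprocessing.MinMaxScaler(feature_range=(-1, 1))
data_scaled_sym = scaler_sym.fit_transform(data)

# Each column's minimum should now be -1 and its maximum 1
print("Min:", data_scaled_sym.min(axis=0))
print("Max:", data_scaled_sym.max(axis=0))
```

Only the endpoints of the interval change; the relative spacing of the values within each column is preserved.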

To display the scaled array, we will use the following code:

>> print(data_scaled)

The output will be displayed as follows:

[[ 1.          0.          1.          0.        ] 
 [ 0.          1.          0.41025641  1.        ]
 [ 0.33333333  0.87272727  0.          0.14666667]]

Now every feature lies in the same interval, from 0 to 1, regardless of its original range.