Python Machine Learning Cookbook(Second Edition)
上QQ阅读APP看书,第一时间看更新

There's more...

In the last two recipes, Label encoding and One-hot encoding, we have seen how to transform data. Both methods are suitable for dealing with categorical data. But what are the pros and cons of the two methodologies? Let's take a look:

  • Label encoding can transform categorical data into numeric data, but the imposed ordinality creates problems if the obtained values are submitted to mathematical operations.
  • One-hot encoding has the advantage that the result is binary rather than ordinal, and that everything is in an orthogonal vector space. The disadvantage is that for high cardinality, the feature space can explode.