Machine Learning with Scala Quick Start Guide
General issues in machine learning models
When we use this input data for training, validation, and testing, a learning algorithm usually cannot learn with 100% accuracy, so the model incurs training, validation, and test error (or loss). There are two types of error that one can encounter in a machine learning model:
- Irreducible error
- Reducible error
The irreducible error cannot be reduced even with the most robust and sophisticated model. However, the reducible error, which has two components called bias and variance, can be reduced. Therefore, to understand a model's prediction errors, we need to focus on bias and variance only:
- Bias measures how far the predicted values are from the actual values. Usually, if the average predicted values are very different from the actual values (labels), then the bias is high.
- An ML model has high bias when it is too simple to model the relationship between the input and output variables (that is, it can't capture the complexity of the data well). Thus, a too-simple model with high bias underfits the data.
The following diagram gives some high-level insights and also shows what a just-right fit model should look like:
[Diagram: comparison of model fits, including a just-right fit]
Variance signifies the variability of the predicted values (how scattered they are), that is, how much the predictions change when the model is trained on different samples of the data.
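As a rough illustration of the two components, the following Scala sketch estimates bias and variance for a single input point. It assumes we already have several predictions for that point (for example, from models trained on different samples of the data) together with the actual label. The object and method names are hypothetical, and variance is taken here in the usual statistical sense, as the spread of the predictions around their own mean.

```scala
object BiasVariance {
  // Arithmetic mean of a non-empty sequence.
  def mean(xs: Seq[Double]): Double = xs.sum / xs.size

  // Bias: how far the average prediction is from the actual value.
  def bias(predictions: Seq[Double], actual: Double): Double =
    mean(predictions) - actual

  // Variance: average squared deviation of the predictions from their own mean,
  // that is, how scattered the predictions are.
  def variance(predictions: Seq[Double]): Double = {
    val m = mean(predictions)
    mean(predictions.map(p => (p - m) * (p - m)))
  }
}
```

For instance, predictions that always land on the same wrong value have a non-zero bias and zero variance, whereas predictions scattered evenly around the correct value have zero bias and a non-zero variance.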
An ML model with high variance usually performs very well on the training set but poorly on the test set (its error rate on unseen data is high). Ultimately, this results in an overfit model. We can recap overfitting and underfitting once more:
- Underfitting: If your training and validation errors are roughly equal and both very high, then your model is most likely underfitting your training data.
- Overfitting: If your training error is low and your validation error is high, then your model is most likely overfitting your training data.
- Just-right fit: The just-right fit model learns the training data well and also performs well on unseen data.
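The rules of thumb above can be sketched as a small Scala helper. This is only a minimal illustration: the thresholds `highError` and `maxGap` are hypothetical values chosen for the example, and in practice what counts as a "high" error or a large gap depends on the problem and the metric.

```scala
object FitDiagnosis {
  sealed trait Fit
  case object Underfitting extends Fit
  case object Overfitting extends Fit
  case object JustRight extends Fit
  case object Inconclusive extends Fit

  // highError: training error at or above this value is treated as "high" (assumed threshold).
  // maxGap: largest validation-minus-training gap still treated as "roughly equal" (assumed threshold).
  def diagnose(trainError: Double,
               validationError: Double,
               highError: Double = 0.3,
               maxGap: Double = 0.1): Fit = {
    val trainIsHigh = trainError >= highError
    val gapIsLarge = (validationError - trainError) > maxGap
    (trainIsHigh, gapIsLarge) match {
      case (true, false)  => Underfitting  // both errors high and close together
      case (false, true)  => Overfitting   // low training error, much higher validation error
      case (false, false) => JustRight     // both errors low and close together
      case (true, true)   => Inconclusive  // high training error and a large gap
    }
  }
}
```

With these assumed defaults, `FitDiagnosis.diagnose(0.45, 0.48)` is classified as underfitting and `FitDiagnosis.diagnose(0.02, 0.35)` as overfitting.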
Now we know the basic working principle of an ML algorithm. However, depending on the problem type and the method used to solve it, ML tasks fall into different categories, for example, supervised learning, unsupervised learning, and reinforcement learning. We'll discuss these learning tasks in more detail in the next section.