Information

Section: How to deal with missing values?
Goal: Get an idea of the different ways to deal with missing values and when to use them.
Time needed: 10 min
Prerequisites: AIS data, basics of machine learning

How to deal with missing values?

In general, there are 3 ways of dealing with missing values:

  • recover the value: sometimes it is possible to recover a missing value, either by making an educated guess or by predicting it with another machine learning model. For example, in a broader task that uses both the length and width attributes for prediction, we can predict a missing width from the length attribute with a good model (like the one we built in this course).

  • set a constant value: just as we filled the missing values with zeros, it is possible to replace the missing values with any constant. This works best if the attribute containing the missing values is not the only one used in the model, and if changing its value does not change the task we want to solve too much. For example, the mean or the median of the attribute could work.

  • drop the columns or the rows containing missing values: if an attribute has far too many missing values, it can make sense to ignore it and leave it out of the model we build. On the other hand, if some rows have missing values in many different attributes, it can make sense to drop those rows.
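
The three strategies above can be sketched with pandas as follows. This is only an illustration under stated assumptions: the small DataFrame is made-up toy data rather than the real AIS dataset, the column names (Length, Width, Draught) and the 50% threshold are hypothetical, and a simple linear fit stands in for the model built in this course.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the AIS dataset (values are made up).
df = pd.DataFrame({
    "Length": [100.0, 200.0, 150.0, 300.0, 250.0],
    "Width": [10.0, 20.0, np.nan, 30.0, np.nan],
    "Draught": [np.nan, np.nan, np.nan, 9.5, np.nan],
})

# 1. Recover the value: fit Width as a linear function of Length on the
#    complete rows, then predict Width only where it is missing.
known = df.dropna(subset=["Length", "Width"])
slope, intercept = np.polyfit(known["Length"], known["Width"], deg=1)
recovered = df["Width"].fillna(slope * df["Length"] + intercept)

# 2. Set a constant value: replace every missing Width with the median
#    of the observed widths (the mean or zero would work the same way).
filled_median = df["Width"].fillna(df["Width"].median())

# 3. Drop columns or rows: first drop the columns where more than half
#    of the values are missing (the threshold is an arbitrary assumption),
#    then drop the rows that still contain a missing value.
max_missing_share = 0.5
fewer_columns = df.loc[:, df.isna().mean() <= max_missing_share]
complete_rows = fewer_columns.dropna()

print(recovered.round(2).tolist())
print(filled_median.tolist())
print(list(fewer_columns.columns), len(complete_rows))
```

Comparing the three results on the same rows makes the trade-off visible: the recovered values follow the relationship with Length, the constant fill ignores it, and dropping removes information altogether.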

The right way to deal with missing values depends on the dataset, the number of missing values, and the type of problem we want to solve. There is no perfect answer. What matters most is to build a model that is well thought out and that makes sense in the context in which it will be used.

Use the following quiz to test your newly acquired knowledge:

from IPython.display import IFrame
IFrame("https://h5p.org/h5p/embed/755321", "694", "600")