Information

Section: Part 1 - Numerical data
Goal: Understand how a numerical dataset is structured, using logistics data as an example; gauge the quality of the dataset, clean it, and understand how its quality affects the results on a logistics problem.
Time needed: 3 h 30 min
Prerequisites: Introduction to machine learning

Part 1 - Numerical data

In this part, we will explore the topic of data quality in a numerical dataset. As an example, we are using AIS (Automatic Identification System) data, which come from an automatic tracking system for ships.

The section is organized as follows:

  • Presentation of AIS data: a general introduction to the data, followed by a study of the dataset we will be using

  • Problems with missing data: two different problems will be solved, and ways of dealing with missing data will be discussed

  • Problems with biased data: a few ways in which the data can be misused will be presented and examined

  • General conclusion, quiz and practical task