Information
Section: Part 1 - Numerical data
Goal: Understand how a numerical dataset is structured, using logistics data as an example; gauge the quality of the dataset, clean it, and understand how this affects the results on a logistics problem.
Time needed: 3 h 30 min
Prerequisites: Introduction to machine learning
Part 1 - Numerical data
In this part, we will explore the topic of data quality in a numerical dataset. As an example, we use AIS (Automatic Identification System) data, which comes from an automatic tracking system for ships.
This section is organized as follows:
Presentation of AIS data: general introduction of the data and study of the actual dataset we will be using
Problems with missing data: two different problems are solved, and ways of dealing with missing data are discussed
Problems with biased data: a few ways to misuse the data are developed and questioned
General conclusion, quiz and practical task