Data Quality Explored
Introduction
What is machine learning?
Supervised learning used in this course
Main types of graphs used in this course
.csv files - data format used in the course
Get started with Python
Syntax and arithmetic
Strings
Lists and indexes
Control flow statements
Functions
The Pandas library
Plot graphics
Part 1 - Numerical data
The AIS data - messages sent by a ship along its trip
Understand the data
Examine the datasets
Missing data
Predict the width of a ship
Predict the mean speed from the type of vessel
How to deal with missing values?
Biased data
Type of task and predicted attributes
Attribute selection
Prediction from all data vs. split dataset
Conclusion
Practical task
Part 2 - Image data
Introduction to image classification
Image representation
Histogram of oriented gradients (HOG)
Image classification
Single image quality
Contrast
Edge detection
Color distribution
Quality of an image dataset
Data augmentation
Part 3 - Text data
Sentiment analysis in Tweets
Simple tweet preprocessing
More complex transformations of text
Put everything together
Appendix
Modification to raw AIS data from U.S. Marine Cadastre
Machine learning algorithms and functions used for numerical prediction
Imprint
Data Privacy
Index