module-1

Module 1 – Introduction to Data Science: Introduction to fault-tolerant distributed file systems and computing. The whole data science process, illustrated with industrial case studies. A practical introduction to scalable data processing to ingest, extract, load, transform, and explore (un)structured datasets. Scalable machine learning pipelines to model, train/fit, validate, select, tune, test, and predict or estimate in unsupervised and supervised settings using nonparametric and partitioning methods such as random forests. Introduction to distributed vertex programming.

View the Project on GitHub ScaDaMaLe/module-1

Module 1 of Scalable Data Science and Distributed Machine Learning


COURSE CONTENT