Foundations of Data Science I

Overview

Credit value: 15 credits at Level 4
Convenor: Dr Felix Reidl
Assessment: programming exercises (30%) and a data analysis mini-project (70%)

Module description

In this module we cover fundamental aspects of data science and analytics. You will develop basic mathematical knowledge and skills including elements of linear algebra, preliminaries for calculus, as well as discrete probability theory and fundamentals of statistics.

We will show you how to use Python, a popular and powerful programming language, to solve computational tasks from these mathematical subjects. In particular, you will become acquainted with widely used Python libraries and packages for solving problems in linear algebra, probability theory and statistics.

Indicative syllabus

Taxonomy of data
Data representation (histograms, box plots)
Measures of central tendency (mode and the modal class, mean, median)
Measures of dispersion (range, interquartile range and percentiles, variance and standard deviation)
Counting and combinatorics (factorial, binomial coefficient)
Discrete probability (random variables, expectation, variance, and correlation)
Conditional probability and Bayes’ Rule
Common discrete distribution families (binomial, geometric, Poisson)
Vector spaces (vector operations, scalar product)
Matrix algebra (matrix product, linear transformations)
Metrics
Tools: Python, Jupyter notebooks, pandas, matplotlib
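
To give a flavour of how a few of the syllabus topics might look in Python, here is a minimal sketch using only the standard library. It is not taken from the module materials, which may well use pandas and matplotlib instead. The sample data and the helper names binomial_pmf and scalar_product are hypothetical and chosen purely for illustration.

```python
import math
import statistics

# Hypothetical sample, invented for illustration only.
sample = [2, 3, 3, 5, 7, 8, 9]

# Measures of central tendency
mean = statistics.mean(sample)
median = statistics.median(sample)
mode = statistics.mode(sample)

# Measures of dispersion (population variance and standard deviation)
value_range = max(sample) - min(sample)
variance = statistics.pvariance(sample)
std_dev = statistics.pstdev(sample)

# Counting: the binomial coefficient "5 choose 2"
n_choose_k = math.comb(5, 2)


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each succeeding with probability p (binomial distribution)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def scalar_product(u, v):
    """Scalar (dot) product of two vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


print("mean:", mean, "median:", median, "mode:", mode)
print("range:", value_range, "variance:", variance, "std dev:", std_dev)
print("5 choose 2:", n_choose_k)
print("P(2 heads in 4 fair coin flips):", binomial_pmf(2, 4, 0.5))
print("scalar product:", scalar_product([1, 2, 3], [4, 5, 6]))
```

The sketch uses the population variants pvariance and pstdev, which divide by the number of observations; statistics.variance and statistics.stdev would give the sample versions instead.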

Learning objectives

By the end of this module, you will have:

knowledge of basic linear algebra and matrix theory, basic discrete probability theory and statistics, and relevant Python libraries and packages
skills in Python programming for solving computational tasks in linear algebra and discrete probability theory
an understanding of the link between the basic knowledge acquired from the module and data science/analytics applications.