Data Analytics Using R

Overview

Credit value: 15 credits at Level 7
Convenor: Dr Cen Wan
Assessment: problem-solving worksheets (20%) and a two-hour examination (80%)

Module description

In this module we cover the principal concepts and techniques of data analytics and how to apply them to large-scale data sets. You will develop the core skills and expertise needed by data scientists, including techniques such as linear regression, classification and clustering.

We will show you how to use R, a popular and powerful language and environment for data analysis, to solve practical problems based on use cases drawn from real-world domains.

Indicative syllabus

Introduction to big data analytics: big data overview, data pre-processing, concepts of supervised and unsupervised learning
Basic statistics: mean, median, standard deviation, variance, correlation, covariance
Linear regression: simple linear regression, introduction to multiple linear regression
Classification: logistic regression, decision trees, SVM
Ensemble methods: bagging, random forests, boosting
Clustering: K-means, K-medoids, hierarchical clustering, X-means
Evaluation and validation: cross-validation, assessing the statistical significance of data mining results
Advanced topics (a selection): scalable machine learning, big data techniques, mining stream data, social networks
Tools: R

Learning objectives

By the end of this module, you will be able to:

demonstrate knowledge of advanced aspects of big data analytics
apply appropriate machine learning techniques to analyse big data sets
assess the statistical significance of data mining results
use the open-source tool R to perform basic data mining tasks on big data