Sequence Analysis and Omics

Overview

Credit value: 30 credits at Level 7
Convenor: Dr Irene Nobeli
Assessment: open-book online tests (60%) and coursework (40%)

Module description

The analysis of biological sequences is at the core of much of bioinformatics. Sequence analysis is also central to processing the large-scale datasets of genes, transcripts and proteins that recent advances in experimental methods generate from biological samples, and to linking these data to biological processes, phenotypic differences and disease.

In this module we cover classical methods of biological sequence analysis and their applications to the problems of modern biology. We also discuss different aspects of molecular evolution, from sequence to structure and function. Through a series of practicals, you are introduced to major online bioinformatics resources (e.g. Ensembl) and learn how to query them to answer questions about biological sequences and the data associated with them. Additional practicals reinforce your programming skills through short coding challenges in the context of sequence analysis.

In the second half of the module, you are introduced to experimental methods for surveying molecules in biological samples and receive basic training in the skills required to analyse the large-scale data these methods generate. Currently, approximately one quarter of the module is dedicated to lectures covering the applications and challenges of next-generation sequencing (NGS). The corresponding practicals reinforce your skills in R and introduce you to several popular Bioconductor packages, as well as Unix-based software used to process NGS data. You will also be introduced to proteomics and immunoinformatics.

Indicative syllabus

Introduction to the module; genome organisation and function
Measuring sequence similarity
Optimal vs heuristic methods for pairwise sequence alignment
Multiple sequence alignment
Profiles, position-specific score matrices and motif discovery
Hidden Markov Models in sequence analysis
Models for analysing RNA sequences
Sequence alignment in the next-generation sequencing era
Comparing and classifying protein domain structures
Evolution of protein function
Introduction to mass spectrometry and its application to proteomics
Introduction to high-throughput sequencing (NGS) technologies and the computational analysis of the resulting data:

Genomics
Transcriptomics
Immunoinformatics

Learning objectives

By the end of this module, you will be able to:

describe the basics of genome organisation (with emphasis on the human genome)
understand the differences between genes, transcripts and proteins and be familiar with the biological mechanisms linking them
define and differentiate between the concepts of homology and sequence similarity
describe the general principles of algorithms used to align sequences (both optimal and heuristic approaches)
understand how dynamic programming can be used in sequence analysis and describe basic dynamic programming algorithms
build a position weight matrix from an alignment of biological sequences
describe how Hidden Markov Models (HMMs) are constructed and how they are used to represent protein families
understand the basics of protein structure classification schemes and how such schemes are used to annotate new genes and proteins
use a number of established genome browsers and bioinformatics servers to extract information on genes and proteins
write basic Python programs to carry out simple tasks in sequence analysis
describe the fundamentals and applications of selected high-throughput technologies ('omics')
outline the computational steps required to process and analyse data derived from omics technologies, including, but not limited to, proteomics, genomics and transcriptomics
distinguish between the various file formats common in omics applications and demonstrate an understanding of how they are used
apply bioinformatics pipelines to pre-process and clean raw NGS data, demonstrating competence in using the relevant software and in critically analysing its outputs in different contexts
describe and apply basic statistical methods for analysing high-throughput data across the various omics fields