Sam Robson

Lead Bioinformatician at the Centre for Enzyme Innovation

Centre for Enzyme Innovation, University of Portsmouth

Biography

Dr. Sam Robson is a Senior Research Fellow at the University of Portsmouth and the Bioinformatics Lead at the Centre for Enzyme Innovation. He is a data scientist and computational biologist with an extensive publication history (including 6 Nature papers) and particular expertise in the maintenance, processing and analysis of large whole genome sequencing data sets. Before this appointment, he worked as a Bioinformatician in the group of Prof. Tony Kouzarides at the Wellcome Trust/CRUK Gurdon Institute, where the main research focus of the lab was the role of histone and RNA modifications, in particular in diseases such as cancer. Prior to this, he held a Post-Doctoral Fellowship at the Wellcome Trust Sanger Institute in the lab of Dr. Matt Hurles. This work focused on the analysis of large-scale copy-number variations in the human genome and their role in common diseases such as breast cancer and Crohn’s Disease. His background is in pure Mathematics, having achieved his Bachelor’s degree at the University of Warwick, and he holds a Master’s and PhD in Mathematical Biology and Biophysical Chemistry from the MOAC Doctoral Training Centre. He is a Professional Member of the International Society for Computational Biology, a Fellow of the Royal Statistical Society, and holds CStat and CSci professional qualifications.

Bioinformatics Group

The Bioinformatics Group at the University of Portsmouth was formed in 2017 by Dr. Sam Robson. We collaborate across the faculty on research projects utilising powerful techniques such as high-throughput sequencing, which require extensive processing and rigorous statistical analyses. We also work to build bioinformatics tools for use by the wider research community. We work on a variety of different projects and data sets, in diverse fields such as environmental biology, marine biology, microbiology, clinical research projects and paleogenomics.

Bioinformatics Computing

The University of Portsmouth hosts a Bioinformatics-specific compute cluster, with well-maintained pipelines for RNA-seq, ChIP-seq, CLIP-seq, BS-seq, Exome-seq, amplicon sequencing, and other sequencing data types used by researchers throughout the University. The cluster consists of 4 compute nodes and 1 head node. The compute nodes consist of Dell PowerEdge R630 Servers with Intel Xeon E5-2650 v4 Processors (12 cores, 2.2GHz), 128GB RAM and 2x 240GB flash (SSD) storage. This provides a total of 48 cores, or 96 threads (via Hyperthreading). The head node is a Dell PowerEdge R630 Server with 2x Intel Xeon E5-2650 v4 Processors (12 cores, 2.2GHz), 384GB RAM and 2x 240GB flash (SSD) storage. Local storage is provided by a Synology RS3617RPxs NAS Server with 120TB HDD storage, connected to the compute nodes via a 10GbE Network. We also use both Amazon Web Services (AWS) and Google Cloud Platform for cloud computing resources.

Interests

  • Bioinformatics
  • Next generation sequencing
  • Mathematics and Statistics
  • Machine learning and AI
  • Data science
  • Programming
  • Data curation

Education

  • PhD in Mathematical Biology and Biophysical Chemistry, 2008

    University of Warwick

  • MSc in Mathematical Biology and Biophysical Chemistry, 2004

    University of Warwick

  • MSc in Mathematics (Hons), 2003

    University of Warwick

Skills

R

Python

Perl

MySQL

Linux

HTML/PHP

Systems Admin

Mathematics

Statistics

Machine Learning

Biological Sciences

Genome Sequencing

Experience

Lead Bioinformatician

Centre for Enzyme Innovation, University of Portsmouth

Jun 2019 – Present Portsmouth, UK
I lead a team of bioinformatics researchers using whole-genome and transcriptome sequencing technology to study novel organisms that generate enzymes able to break down substrates such as plastics. We aim to find novel enzymes and work with other researchers to bioengineer enzymes to help combat the worldwide plastic crisis. The Centre for Enzyme Innovation recently received ~£6 million in funding from the Research England Expanding Excellence Fund.

Senior Research Fellow (Bioinformatics)

University of Portsmouth

May 2017 – Present Portsmouth, UK
I am a bioinformatics researcher and Faculty Bioinformatics Lead in the Faculty of Science and Health at the University of Portsmouth. I work in collaboration with other researchers within the department on projects with a high bioinformatics component, in particular working with high-throughput sequencing (e.g. ChIP-seq and RNA-seq) data sets. I also maintain my own research interests, develop tools for the wider bioinformatics community, and supervise and teach PhD and MRes students with an interest in developing bioinformatics skills.

Sam Robson Consulting

Self Employed

May 2014 – Mar 2017 Cambridgeshire, UK
I provided statistical expertise for the largest study of doctor burnout yet conducted. In particular, I took a very large multi-factorial dataset and identified key factors influencing doctor burnout through the use of a variety of statistical methods, including multivariate mixed-effects regression analysis.

Bioinformatician

Wellcome Trust/CRUK Gurdon Institute

Mar 2010 – Mar 2017 Cambridgeshire, UK
I worked as the lab bioinformatician in the lab of Tony Kouzarides at the Gurdon Institute, part of the University of Cambridge. My role was to help design experiments and analyse data in collaboration with other members of the lab. My main focus was the analysis of high-throughput sequencing data. In particular, the group focused on understanding the role of histone modifications in diseases such as cancer.

Statistical/Mathematical Biologist

Wellcome Trust Sanger Institute

Mar 2008 – Mar 2010 Cambridgeshire, UK
I worked under Matt Hurles, and as a member of the Wellcome Trust Case-Control Consortium, on identifying potential associations between copy number variants (CNVs) and common disease. Structural variation, whereby large sections of DNA are lost, duplicated, relocated or inverted from one genome to the next, is prevalent in the human genome. Some of these structural variants can affect the number of copies of genes in the diploid genome (CNVs). We looked to see if any of these variants were causal for any of eight common diseases by performing a large (22,243 samples) case-control genome-wide association study. My main roles were to maintain, process and normalise this extremely large data set; to perform various optimisation and QC steps on these data to ensure we were working with the cleanest data set possible; and ultimately to look for associations between CNVs and common disease using novel statistical tools developed in-house.

Recent Posts

Deep Learning using TensorFlow Through the Keras API in RStudio

Use of TensorFlow through the Keras API in RStudio to explore deep learning model training

Building a Pokémon Recommendation Machine

Use of multiple machine learning techniques to explore a database of Pokémon, including creation of a recommendation machine and …

Portsmouth Heritage Hub Inaugural Workshop

Introduction to the Portsmouth Heritage Hub and results from the inaugural Heritage Hub meeting at the Mary Rose Museum.

Strava Data Mining: Assessing Mimi Anderson's World Record Run Across the USA

Analysis of data from Strava to assess the validity of data for a World Record Run across the USA

How Predictable Are Ultra Runners?

Further exploration of posts from the Ultra Running Community (URC) Facebook page, including using machine learning techniques …

Projects

Biofilm Composition in Prosthetic Joint Infection

Prosthetic joint infection (PJI) represents one of the most common reasons for failure among hip and knee arthroplasty, with an …

Centre for Enzyme Innovation

An innovative new Research Centre to identify novel enzymatic solutions to environmental waste problems such as plastic

Portsmouth Heritage Hub

A network for researchers and stakeholders in local historical monuments and buildings to develop collaborative research projects with …

Development of Bioinformatics Tools

Development of in-house bioinformatics tools and analysis models for use in combination with publicly available data analysis software

Genotyping of Ancient DNA from Crew Members from the Mary Rose

Genotyping of ancient DNA from crew members from the Mary Rose to identify phenotypic traits and disease traits

Effects of Anti-Fouling Coatings on Marine Biofilms

Metatranscriptomic analysis of marine biofilm composition on commercially available and novel anti-fouling substrates

Effects of Radiation Exposure in the Environment

Identification of differentially regulated gene pathways as a result of radiation exposure

Gene Expression Profiling of Duchenne Muscular Dystrophy

Differential gene expression analysis in a model of Duchenne Muscular Dystrophy

Xenopus Development Project

Analysis of development in Xenopus laevis using whole genome analysis

Resources

How To Use Python

A tutorial for the use of the data science programming language Python

Sequencing Facilities

A list of commercial and academic sequencing facilities used by members of the faculty

Bioinformatics Tools

A list of commonly used tools for bioinformatics analyses

How To Use R

A tutorial for the use of the statistical programming language R, with a Bioinformatics leaning

Recent Publications

Complete transcriptome assembly and annotation of a critically important amphipod species in freshwater ecotoxicological risk assessment - Gammarus fossarum

Because of their crucial role in ecotoxicological risk assessment, amphipods (Crustacea) are commonly employed as model species in a …

Subtle effects of radiation on embryo development of the 3-spined stickleback

The Chernobyl and Fukushima nuclear power plant (NPP) accidents that occurred in 1986 and 2011 respectively have led to many years of …

Total absence of dystrophin expression exacerbates ectopic myofiber calcification and fibrosis and alters macrophage infiltration patterns

Duchenne muscular dystrophy (DMD) causes severe disability and death of young men because of progressive muscle degeneration aggravated …

Interaction of Sox2 with RNA binding proteins in mouse embryonic stem cells

Sox2 is a master transcriptional regulator of embryonic development. In this study, we determined the protein interactome of Sox2 in …

METTL1 Promotes let-7 MicroRNA Processing via m7G Methylation

7-methylguanosine (m7G) is present at mRNA caps and at defined internal positions within tRNAs and rRNAs. However, its detection within …

Dystrophic mdx mouse myoblasts exhibit elevated ATP/UTP-evoked metabotropic purinergic responses and alterations in calcium signalling

Pathophysiology of Duchenne Muscular Dystrophy (DMD) is still elusive. Although progressive wasting of muscle fibres is a cause of …

Phosphorylation of histone H4T80 triggers DNA damage checkpoint recovery

In response to genotoxic stress, cells activate a signaling cascade known as the DNA damage checkpoint (DDC) that leads to a temporary …

Inhibition of the acetyltransferase NAT10 normalizes progeric and aging cells by rebalancing the Transportin-1 nuclear import pathway

Hutchinson-Gilford progeria syndrome (HGPS) is an incurable premature aging disease. Identifying deregulated biological processes in …

Contact