Dr. Sam Robson is a Senior Research Fellow at the University of Portsmouth and is the Bioinformatics Lead at the Centre for Enzyme Innovation. He is a data scientist and computational biologist with an extensive publication history (including 6 Nature papers) and particular expertise in the maintenance, processing and analysis of large whole genome sequencing data sets. Before his appointment, he worked as a Bioinformatician in the group of Prof. Tony Kouzarides at the Wellcome Trust/CRUK Gurdon Institute. The main research focus of the lab was to analyse the role of histone and RNA modifications, in particular in diseases such as cancer. Prior to this, he held a Post-Doctoral Fellowship at the Wellcome Trust Sanger Institute in the lab of Dr. Matt Hurles. This work focused on the analysis of large scale copy-number variations in the human genome and their role in common diseases such as breast cancer and Crohn’s Disease. His background is in pure Mathematics: he achieved his Bachelor’s degree at the University of Warwick, and he holds a Masters and PhD in Mathematical Biology and Biophysical Chemistry from the MOAC Doctoral Training Centre. He is a Professional Member of the International Society for Computational Biology and a Fellow of the Royal Statistical Society, and he holds CStat and CSci Professional qualifications.
The Bioinformatics Group at the University of Portsmouth was formed in 2017 by Dr. Sam Robson. We collaborate across the faculty on research projects utilising powerful techniques such as high-throughput sequencing, which require extensive processing and rigorous statistical analyses. We also work to build bioinformatics tools for use by the wider research community. We work on a variety of different projects and data sets, in diverse fields such as environmental biology, marine biology, microbiology, clinical research projects and paleogenomics.
The University of Portsmouth hosts a Bioinformatics-specific compute cluster, with well-maintained pipelines for RNA-seq, ChIP-seq, CLIP-seq, BS-seq, Exome-seq, amplicon sequencing, and other sequencing data types used by researchers throughout the University. The cluster consists of 4 compute nodes and 1 head node. The compute nodes consist of Dell PowerEdge R630 Servers with Intel Xeon E5-2650 v4 Processors (12 cores, 2.2GHz), 128GB RAM and 2x 240GB flash (SSD) storage. This provides a total of 48 cores, or 96 threads (via Hyperthreading). The head node is a Dell PowerEdge R630 Server with 2x Intel Xeon E5-2650 v4 Processors (12 cores, 2.2GHz), 384GB RAM and 2x 240GB flash (SSD) storage. Local storage is provided by a Synology RS3617RPxs NAS Server with 120TB HDD storage, connected to the compute nodes via a 10GbE Network. We also use both Amazon Web Services (AWS) and Google Cloud Platform for cloud computing resources.
PhD in Mathematical Biology and Biophysical Chemistry, 2008
University of Warwick
MSc in Mathematical Biology and Biophysical Chemistry, 2004
University of Warwick
MSc in Mathematics (Hons), 2003
University of Warwick
Prosthetic joint infection (PJI) represents one of the most common reasons for failure among hip and knee arthroplasty, with an …
An innovative new Research Centre to identify novel enzymatic solutions to environmental waste problems such as plastic
A network for researchers and stakeholders in local historical monuments and buildings to develop collaborative research projects with …
Development of in-house bioinformatics tools and analysis models for use in combination with publicly available data analysis software
Genotyping of ancient DNA from crew members from the Mary Rose to identify phenotypic traits and disease traits
Metatranscriptomic analysis of marine biofilm composition on commercially available and novel anti-fouling substrates
Identification of differentially regulated gene pathways as a result of radiation exposure
Differential gene expression analysis in a model of Duchenne Muscular Dystrophy
Analysis of development in Xenopus laevis using whole genome analysis
We recently identified the splicing kinase gene SRPK1 as a genetic vulnerability of acute myeloid leukemia (AML). Here, we show that genetic or pharmacological inhibition of SRPK1 leads to cell cycle arrest, leukemic cell differentiation and prolonged survival of mice transplanted with MLL-rearranged AML. RNA-seq analysis demonstrates that SRPK1 inhibition leads to altered isoform levels of many genes including several with established roles in leukemogenesis such as MYB, BRD4 and MED24. We focus on BRD4 as its main isoforms have distinct molecular properties and find that SRPK1 inhibition produces a significant switch from the short to the long isoform at the mRNA and protein levels. This was associated with BRD4 eviction from genomic loci involved in leukemogenesis including BCL2 and MYC. We go on to show that this switch mediates at least part of the anti-leukemic effects of SRPK1 inhibition. Our findings reveal that SRPK1 represents a plausible new therapeutic target against AML.
N6-methyladenosine (m6A) is an abundant internal RNA modification in both coding and non-coding RNAs that is catalysed by the METTL3–METTL14 methyltransferase complex. However, the specific role of these enzymes in cancer is still largely unknown. Here we define a pathway that is specific for METTL3 and is implicated in the maintenance of a leukaemic state. We identify METTL3 as an essential gene for growth of acute myeloid leukaemia cells in two distinct genetic screens. Downregulation of METTL3 results in cell cycle arrest, differentiation of leukaemic cells and failure to establish leukaemia in immunodeficient mice. We show that METTL3, independently of METTL14, associates with chromatin and localizes to the transcriptional start sites of active genes. The vast majority of these genes have the CAATT-box binding protein CEBPZ present at the transcriptional start site, and this is required for recruitment of METTL3 to chromatin. Promoter bound METTL3 induces m6A modification within the coding region of the associated mRNA transcript, and enhances its translation by relieving ribosome stalling. We show that genes regulated by METTL3 in this way are necessary for acute myeloid leukaemia. Together, these data define METTL3 as a regulator of a chromatin based pathway that is necessary for maintenance of the leukaemic state and identify this enzyme as a potential therapeutic target for acute myeloid leukaemia.
Nucleosomes are decorated with numerous post-translational modifications capable of influencing many DNA processes. Here we describe a new class of histone modification, methylation of glutamine, occurring on yeast histone H2A at position 105 (Q105) and human H2A at Q104. We identify Nop1 as the methyltransferase in yeast and demonstrate that fibrillarin is the orthologue enzyme in human cells. Glutamine methylation of H2A is restricted to the nucleolus. Global analysis in yeast, using an H2AQ105me-specific antibody, shows that this modification is exclusively enriched over the 35S ribosomal DNA transcriptional unit. We show that the Q105 residue is part of the binding site for the histone chaperone FACT (facilitator of chromatin transcription) complex. Methylation of Q105 or its substitution to alanine disrupts binding to FACT in vitro. A yeast strain mutated at Q105 shows reduced histone incorporation and increased transcription at the ribosomal DNA locus. These features are phenocopied by mutations in FACT complex components. Together these data identify glutamine methylation of H2A as the first histone epigenetic mark dedicated to a specific RNA polymerase and define its function as a regulator of FACT interaction with nucleosomes.
Copy number variants (CNVs) account for a major proportion of human genetic polymorphism and have been predicted to have an important role in genetic susceptibility to common disease. To address this we undertook a large, direct genome-wide study of association between CNVs and eight common human diseases. Using a purpose-designed array we typed approximately 19,000 individuals into distinct copy-number classes at 3,432 polymorphic CNVs, including an estimated 50% of all common CNVs larger than 500 base pairs. We identified several biological artefacts that lead to false-positive associations, including systematic CNV differences between DNAs derived from blood and cell lines. Association testing and follow-up replication analyses confirmed three loci where CNVs were associated with disease (IRGM for Crohn’s disease; HLA for Crohn’s disease, rheumatoid arthritis and type 1 diabetes; and TSPAN8 for type 2 diabetes), although in each case the locus had previously been identified in single nucleotide polymorphism (SNP)-based studies, reflecting our observation that most common CNVs that are well-typed on our array are well tagged by SNPs and so have been indirectly explored through SNP studies. We conclude that common CNVs that can be typed on existing platforms are unlikely to contribute greatly to the genetic basis of common human diseases.
Structural variations of DNA greater than 1 kilobase in size account for most bases that vary among human genomes, but are still relatively under-ascertained. Here we use tiling oligonucleotide microarrays, comprising 42 million probes, to generate a comprehensive map of 11,700 copy number variations (CNVs) greater than 443 base pairs, of which most (8,599) have been validated independently. For 4,978 of these CNVs, we generated reference genotypes from 450 individuals of European, African or East Asian ancestry. The predominant mutational mechanisms differ among CNV size classes. Retrotransposition has duplicated and inserted some coding and non-coding DNA segments randomly around the genome. Furthermore, by correlation with known trait-associated single nucleotide polymorphisms (SNPs), we identified 30 loci with CNVs that are candidates for influencing disease susceptibility. Despite this, having assessed the completeness of our map and the patterns of linkage disequilibrium between CNVs and SNPs, we conclude that, for complex traits, the heritability void left by genome-wide association studies will not be accounted for by common CNVs.