DSB Retreat Boasts Diverse Research, Spotlights Young Investigators
A wide range of research topics, from studies related to pediatric cancer and glioblastoma to soil microbial communities and electronic health records analysis, were presented and discussed at this year’s Department of Systems Biology (DSB) retreat.
Eugene Douglass, postdoctoral research scientist in the Andrea Califano lab, was one of the featured presenters at the two-day retreat. For a photo gallery, view the DSB Retreat Photo Album.
Held over two days for the first time, October 3 to 4 in Ellenville, NY, the retreat gave DSB faculty, post-docs, and students a chance to get away from the bustle of New York City, network, and learn about their peers’ research in both Morningside and CUIMC labs. This year the department expanded its annual program to two days, encouraging more peer-to-peer connection and devoting the spotlight specifically to research by young investigators.
DSB researchers and graduate students participated in a poster competition held on the first evening and judged by Systems Biology faculty. At the end of the second day’s program, Dr. Andrea Califano, chair of the department, announced three Best Poster winners. This year’s winners were: Dafni Glinos, PhD, postdoctoral researcher in the lab of Dr. Tuuli Lappalainen at New York Genome Center/Systems Biology; Alexander Kitaygorodsky, a graduate student in the lab of Dr. Yufeng Shen; and Jordan Metz, an MD/PhD graduate student in the lab of Dr. Peter Sims. The poster winners gave presentations on the final day of the retreat and received a cash prize and an award certificate.
Long-Read Sequencing to Study Allelic Effects on Transcriptome Structure
Variation in transcript structure via RNA splicing and differences in the 5’ and 3’ untranslated regions is a key feature of gene regulation, as it allows the production of different protein isoforms and tuning of transcript stability. Disruption of transcript structure is one of the primary functional changes behind a large proportion of disease variants. Indeed, up to 20% of disease-causing variants in Mendelian disease affect splicing, and a large fraction of variants associated with common diseases act through putative expression and splicing mechanisms. Almost all studies to date have used short-read sequencing, which relies on existing transcript annotations and on different proxies for isoform quantification, limiting scientists’ ability to investigate the mechanism behind many of these variants. The advent of long-read technologies for RNA-seq has the potential to transform transcriptome analysis, since they can directly measure full-length isoforms.
In this study, Dafni and her collaborators developed a new computational approach for investigating the effects of variants on the transcriptome using Oxford Nanopore (ONT) long-read RNA-seq data. Using the PCR-based protocol, they generated cDNA sequencing data from 69 samples across 10 different tissues that were assayed as part of the Genotype-Tissue Expression (GTEx) project, for which they also had access to short-read RNA-seq data. They carried out allelic expression (ASE) analysis, in which individuals who are heterozygous for a variant are used to determine whether there is an imbalance in expression between the two alleles. Using only the ONT data, they obtained allelic expression measurements for 17,769 genes, of which 7,894 displayed ASE (median=361 per donor). Dafni and team also analyzed allele-specific transcript structure (ASTS) patterns genome-wide by splitting reads according to the haplotype of a heterozygous variant and determining the transcript to which each read had been assigned, an analysis that is not informative with short-read Illumina data. They had enough reads to analyze a total of 3,037 genes, of which 199 had significant differences in transcript distributions between the two haplotypes. Despite the lower power of the ASTS analysis, 32% of the genes with ASTS did not display ASE. This work provides evidence of the power of long-read technology for investigating the effects of genetic variants on transcriptome structure, and of the importance of devising computational protocols to study transcript structure in conjunction with expression.
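As a rough illustration of the two comparisons described above, the sketch below tests one gene for allelic imbalance and for a difference in transcript usage between haplotypes. The article does not say which statistical tests the team used; the exact binomial test and the Pearson chi-square statistic here are common choices and are assumptions, as are the function names and the read counts, which are made up for the example.

```python
from math import comb


def ase_binomial_p(ref_reads: int, alt_reads: int) -> float:
    """Exact two-sided binomial p-value against a 50/50 allelic ratio.

    Assumed test for illustration; not necessarily the one used in the study.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0
    observed = comb(n, ref_reads)
    # With p = 0.5 every outcome has probability comb(n, k) / 2**n, so we sum
    # the outcomes that are no more likely than the observed one.
    extreme = sum(comb(n, k) for k in range(n + 1) if comb(n, k) <= observed)
    return extreme / 2 ** n


def asts_chi_square(hap1_counts, hap2_counts):
    """Pearson chi-square statistic and degrees of freedom for a
    2 x K table of reads per transcript, split by haplotype."""
    rows = [list(hap1_counts), list(hap2_counts)]
    row_totals = [sum(row) for row in rows]
    col_totals = [a + b for a, b in zip(*rows)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for r, row in enumerate(rows):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / grand_total
            if expected > 0:
                statistic += (observed - expected) ** 2 / expected
    dof = sum(1 for total in col_totals if total > 0) - 1
    return statistic, dof


# Hypothetical long reads overlapping one heterozygous variant in one gene.
p_ase = ase_binomial_p(ref_reads=38, alt_reads=14)
# Hypothetical reads per transcript (three isoforms) on each haplotype.
stat, dof = asts_chi_square([30, 5, 3], [6, 7, 1])
print(f"ASE p-value: {p_ase:.4g}; ASTS chi-square: {stat:.2f} on {dof} df")
```

A gene can show a difference in transcript usage while the total read counts on the two haplotypes stay balanced, which is consistent with the article's observation that some genes with ASTS did not display ASE.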
Prediction of deleterious effect of noncoding variants mediated by RNA-binding proteins in developmental disorders
More than 3% of children are born with developmental disorders such as congenital heart disease (CHD), congenital diaphragmatic hernia (CDH), and autism spectrum disorder (ASD). Understanding the genetic causes of these conditions is critical to improving health care for these children and to advancing human developmental biology and neuroscience. Recently, high-throughput sequencing technologies have enabled the generation of large-scale genomic data in genetic studies of these conditions. However, translating human data into knowledge is challenging due to an incomplete understanding of the biology and a lack of sufficiently powerful analytical methods. Alexander’s work aims to develop new computational methods based on machine learning techniques to interpret genome sequencing data and identify disease-causing genetic variations. Specifically, he is currently focused on the role of regulatory, non-protein-coding mutations, where he and collaborators have found a substantial role for variants disrupting RNA-binding protein (RBP) binding sites in CHD. RBPs oversee normal regulation of gene expression, at both the transcriptional and especially the post-transcriptional stages, so their disruption by mutation represents an important but under-studied noncoding mechanism of action. To better understand the observed enrichment in these sites, they first modeled RBP binding with a convolutional neural network. Then, they designed a gradient boosting super-model to integrate predicted RBP binding scores with multimodal genomic data, allowing them to predict pathogenic disruption of RBP binding and gene regulation caused by individual mutations. Finally, they applied the model back to whole genome sequencing data from autism and CHD cohorts to find new disease risk genes and improve genetic diagnosis.
In summary, Alexander and his collaborators leveraged large genomic datasets with a sophisticated machine learning approach to better analyze sequencing data, predict pathogenicity of individual noncoding variants via RBP disruption, and aid their understanding of developmental disorder genetics.
Much more attention has been paid to the measurement and modeling of gene transcription, the process by which DNA becomes RNA, than to gene translation, by which RNA becomes protein. A better understanding of translational dynamics would be useful in understanding and treating diseases in which translation is specifically affected, but the field lacks a systems-level understanding of translational regulation, and current methods for studying translation are prohibitively expensive for large-scale experiments and systematic screens. riboPLATE-seq is an RNA sequencing method that Jordan and collaborators have developed to address this issue. It measures the association of genes with ribosomes, the major translational machinery in the cell, in a format made scalable by automated liquid handling and pooled library preparation, so that 96 separate samples can be prepared and sequenced simultaneously. They hope that this method will enable studies of translational regulation at a previously unattainable scale.
For photos of the retreat, visit the DSB Retreat Album.
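One simple way to picture a per-gene ribosome association score is as the ratio of a gene's share of ribosome-bound reads to its share of total RNA reads in the same sample. The sketch below is only that picture: the article does not describe how riboPLATE-seq data are quantified, so the log2 ratio, the pseudocount, the function name, the gene names, and the counts are all assumptions made for illustration.

```python
from math import log2


def ribosome_association(ribo_counts, total_counts, pseudocount=1.0):
    """Hypothetical score: log2 ratio of library-size-normalized
    ribosome-bound reads to total RNA reads for each gene."""
    ribo_size = sum(ribo_counts.values())
    total_size = sum(total_counts.values())
    if ribo_size == 0 or total_size == 0:
        raise ValueError("both libraries must contain reads")
    scores = {}
    for gene, total in total_counts.items():
        # The pseudocount keeps genes with zero reads from producing log2(0).
        ribo_fraction = (ribo_counts.get(gene, 0) + pseudocount) / ribo_size
        total_fraction = (total + pseudocount) / total_size
        scores[gene] = log2(ribo_fraction / total_fraction)
    return scores


# Made-up counts for one well of a 96-well plate.
ribo = {"GENE_A": 900, "GENE_B": 150, "GENE_C": 450}
total = {"GENE_A": 500, "GENE_B": 500, "GENE_C": 500}
for gene, score in ribosome_association(ribo, total).items():
    print(f"{gene}: {score:+.2f}")
```

Under these assumptions a positive score means a gene is over-represented among ribosome-bound reads relative to its overall abundance, and comparing scores for the same gene across wells would flag conditions that change its translation.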
-Melanie A. Farmer