Research

Extracellular RNAs

Before coming to graduate school, I worked as a software engineer and data analyst for Dr. Aleksandar Milosavljevic in the Bioinformatics Research Laboratory at Baylor College of Medicine in Houston, TX. In that role, I contributed to several major research projects for the NIH Common Fund’s Extracellular RNA Communication (ERC) program. The program studied extracellular RNAs (exRNAs) across many research areas: understanding exRNA biogenesis, discovering exRNA biomarkers, using exRNAs in therapies, and constructing a large set of reference profiles from healthy subjects as a foundation for future clinical research. As a leading member of the program’s Data Management and Resource Repository (DMRR), I presented at numerous conferences and helped create public resources to aid the wider scientific community in exRNA research.

For example, I played a key role in the design and development of the exRNA Atlas, the primary online repository for extracellular RNA data generated by members of the ERC program. I designed and implemented a web-based JSON-LD API that incorporated FAIR criteria and offered programmatic access to the data and associated metadata stored in the Atlas. I also wrote thorough documentation on how to use the API. As of May 13, 2021, the exRNA Atlas contained data and metadata from 7,756 samples, with contributions from 42 studies and 21 labs. As a co-first author, I helped write a highly collaborative paper (published in Cell) that described the Atlas resource and presented an integrative data analysis, backed by wet lab experimental validation, showing strong evidence for the existence of distinct exRNA cargo types across human biofluids.

I also made significant contributions to the ERC program’s extra-cellular RNA processing toolkit (exceRpt), a small and long RNA-seq data processing pipeline specialized for human and mouse exRNA data. In collaboration with the creators of exceRpt, Dr. Mark Gerstein’s lab at Yale University, I implemented numerous bug fixes and enhancements that improved the robustness of the pipeline. I also made exceRpt more accessible to the wider scientific community by integrating it into the Genboree Workbench, a web-based platform that hosts a variety of bioinformatics tools and stores related data for users. By submitting sample data for processing via the Workbench, users took advantage of a computing cluster and a highly parallel implementation of exceRpt, allowing data from dozens of samples to be processed simultaneously. The Workbench version of exceRpt has been used extensively, with small RNA-seq data from over 26,000 samples processed since the tool was integrated in 2014. A paper describing our work on exceRpt was published in Cell Systems.

Making Predictions to Learn More About Human Health and Disease

Nowadays, as a member of Dr. Olga Troyanskaya’s lab, I’m using machine learning to make predictions in the context of human health and disease. More coming soon!