Projects

Below, I’ll highlight a few projects that have been important to me:

Innate Immune Memory in Influenza Infection (First author, published in Cell Reports)

Viral infections can induce prolonged changes in innate immunity. Here, we use blood samples from a human influenza H3N2 challenge study (NCT03883113) to perform comprehensive multi-omics analyses. We detect remodeling of immune programs in circulating innate immune cells that persists after resolution of the infection. We find changes associated with suppressed inflammation, including decreased cytokine and AP-1 gene expression as well as decreased accessibility at AP-1 targets and interleukin-related gene promoter regions. We also find decreased histone deacetylase gene expression, increased MAP kinase gene expression, and increased accessibility at interferon-related gene promoter regions. Genes involved in inflammation and methylation remodeling show modulation of gene-chromatin site regulatory circuit activity. These results reveal a coordinated rewiring of the molecular landscape in innate immune cells induced by mild influenza virus infection.

To learn more, please visit the associated publication.

SPEEDI (Single-cell Pipeline for End to End Data Integration) (Co-first author, published in Cell Systems)

To facilitate single-cell multi-omics analysis and improve reproducibility, we present SPEEDI (Single-cell Pipeline for End to End Data Integration), a fully automated end-to-end framework for batch inference, data integration, and cell type labeling. SPEEDI introduces data-driven batch inference and transforms the often heterogeneous data matrices obtained from different samples into a uniformly annotated and integrated dataset. Without requiring user input, it automatically selects parameters and executes pre-processing, sample integration, and cell type mapping. It can also perform downstream analyses of differential signals between treatment conditions and gene functional modules. SPEEDI’s data-driven batch inference method works with widely used integration and cell-typing tools. By developing data-driven batch inference, providing full end-to-end automation, and eliminating parameter selection, SPEEDI improves reproducibility and lowers the barrier to obtaining biological insight from these valuable single-cell datasets.

To learn more, please visit the associated publication, the SPEEDI R package, or the SPEEDI interactive web application.

Dog Aging Project (Published in Nature)

As part of the Dog Aging Project, a consortium funded to research dog aging and health and their relevance to human health and disease, I collaborated with a team of at least four researchers to build Google Cloud Platform-based pipelines for genomic data ingestion and processing.

To learn more, please visit the associated publication.

The Extracellular RNA Atlas (Co-first author, published in Cell)

I played a key role in the design and development of the exRNA Atlas, the primary online repository for extracellular RNA data generated by members of the Extracellular RNA Communication (ERC) program. I contributed to several Atlas features and tools, including an ncRNA search feature for exploring the expression of selected ncRNAs across the biofluid data in the Atlas, FTP data submission pipelines for RNA-seq and qPCR data with extensive support for linking associated metadata, and integrated downstream analysis tools for exRNA data (e.g., BioGPS for visualization of ncRNA expression, DESeq2 for pairwise differential expression of miRNAs across certain key metadata fields, and WikiPathways’ Pathway Finder for pathway analysis of differentially expressed miRNAs). I also designed and implemented a web-based JSON-LD API that incorporated various FAIR (Findability, Accessibility, Interoperability, and Reusability) criteria and offered programmatic access to data and associated metadata stored in the Atlas. Metadata were stored in MongoDB and were standardized using clinical ontologies including SNOMED CT and LOINC.

As of July 15, 2021, the Atlas contained data and metadata from 7,756 samples, with contributions stemming from 42 different studies and 21 different labs. Many studies were translational in nature, focusing on ncRNA biomarker discovery for specific conditions (e.g., multiple sclerosis, gastric cancer, and placental dysfunction). As a co-first author, I helped write a highly collaborative paper (published in Cell) that both described the Atlas resource and provided an integrative data analysis, backed by wet lab experimental validation, showing strong evidence for the existence of distinct exRNA cargo types across human biofluids.

To learn more, please visit the associated publication, the Atlas website, or the Atlas JSON-LD REST API documentation.

Extra-cellular RNA Processing Toolkit (exceRpt) (Published in Cell Systems)

I made significant contributions to the Extracellular RNA Communication (ERC) program’s extra-cellular RNA processing toolkit (exceRpt), a processing pipeline for small and long RNA-seq data that is specialized for human and mouse exRNA data. I made exceRpt more accessible to the wider scientific community by helping to integrate it into the Genboree Workbench, a web-based platform that hosts a variety of bioinformatics tools and stores related data for users. By submitting sample data for processing via the Workbench, users took advantage of our computing cluster and a highly parallel implementation of exceRpt, which allowed data from dozens of samples to be processed simultaneously. The Workbench version of exceRpt has been used extensively, with small RNA-seq data from over 26,000 samples processed since the tool was integrated in 2014. A paper describing our work on exceRpt was published in Cell Systems.

To learn more, please visit the associated publication or the exceRpt homepage.

William Thistlethwaite
