Abstract
Purpose :
To create an exhaustively optimized retina single cell transcriptome dataset from all publicly available retina single cell transcriptomes. We use this dataset as the basis for a highly responsive, reactive web app for querying gene expression across retina cell type, study, species, developmental stage, and other factors. We also demonstrate how research groups can project their private single cell experiments onto our reference retina single cell atlas with minimal compute resources.
Methods :
We re-quantified over 1 million single cell transcriptomes across three species, 30 studies, and 7 single cell technologies. After quality control, we retained over 700,000 high-quality transcriptomes. To optimize the batch correction, we tested 12 independent methods and benchmarked them with an unbiased algorithm to identify the optimal method and parameters. After batch correction, we created a two-dimensional UMAP projection, ran thousands of differential gene expression tests across cell types, and calculated developmental trajectories for the major retinal cell types. We then hand-curated over 300,000 published retina cell type labels to create an xgboost-based machine learning tool to label all cells in our dataset.
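The labeling step can be pictured as supervised classification: cells with curated labels train a model, which then predicts a label for every remaining cell. The tool described here is xgboost-based; the sketch below deliberately swaps in a much simpler nearest-centroid classifier that needs only the Python standard library. The function names, the three marker-gene columns, and the expression values are all hypothetical and serve only to illustrate the train-then-predict flow.

```python
import math
from collections import defaultdict


def train_centroids(expression, labels):
    """Average the expression vectors of each labeled cell type."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(expression, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict_label(centroids, vec):
    """Return the cell type whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


# Toy data (assumption): rows are cells, columns are three made-up marker genes.
labeled_expression = [
    [9.0, 0.5, 0.2],
    [8.5, 0.3, 0.1],
    [0.4, 7.8, 0.3],
    [0.2, 8.1, 0.5],
]
curated_labels = ["Rod", "Rod", "Cone", "Cone"]

centroids = train_centroids(labeled_expression, curated_labels)
unlabeled_cell = [0.3, 7.5, 0.4]
predicted = predict_label(centroids, unlabeled_cell)
print(predicted)
```

A gradient-boosted model such as xgboost can capture non-linear marker combinations that a centroid model cannot, which matters when labels must generalize across studies and technologies.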
Results :
After batch correction, the 2D UMAP places the retina cell types in distinct regions, with the progenitor populations in the center. The photoreceptor progenitors flow from the center into the differentiated rods and cones. Likewise, the amacrine / horizontal precursors lead into the terminally differentiated amacrines and horizontals. The retinal ganglion cells and Müller glia also form distinct clusters. Importantly, these cell type positions are generally consistent across species and studies. Our differential testing confirms that our machine learning cell type tool labels the cells correctly with respect to community knowledge.
Conclusions :
Despite substantial technical variation between published single cell transcriptome atlases, we confirm that we can create a coherent, high-quality meta-atlas. Furthermore, we demonstrate that our meta-atlas is a community resource by projecting independent, external single cell transcriptome data onto it with minimal compute and disk resources. Our dataset is available for user-led analysis at plae.nei.nih.gov.
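One way to see why projection can be cheap is that the reference atlas does not need to be recomputed: each new cell only has to be compared against cells that are already embedded. The sketch below is an illustrative stand-in, not the method used for the meta-atlas. It assumes the query cell has already been mapped into the same batch-corrected latent space as the reference, and it places the cell at the mean 2D position of its k nearest reference cells. The function name, the toy coordinates, and the choice of k are hypothetical.

```python
import math


def project_cell(query_latent, reference_latent, reference_umap, k=3):
    """Place a query cell at the mean 2D position of its k nearest reference cells."""
    # Rank reference cells by Euclidean distance in the shared latent space.
    order = sorted(
        range(len(reference_latent)),
        key=lambda i: math.dist(reference_latent[i], query_latent),
    )
    nearest = order[:k]
    # Average the existing 2D coordinates of those neighbors.
    x = sum(reference_umap[i][0] for i in nearest) / len(nearest)
    y = sum(reference_umap[i][1] for i in nearest) / len(nearest)
    return (x, y)


# Toy reference (assumption): four cells with latent vectors and 2D coordinates.
reference_latent = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
reference_umap = [(-5.0, 0.0), (-5.0, 2.0), (5.0, 0.0), (5.0, 2.0)]

query_latent = [0.1, 0.4]
projected = project_cell(query_latent, reference_latent, reference_umap, k=2)
print(projected)
```

The brute-force neighbor search here scales linearly with the reference size per query cell; an atlas with hundreds of thousands of cells would more plausibly use an approximate nearest-neighbor index.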
This is a 2021 ARVO Annual Meeting abstract.