Abstract
Purpose :
The genetic causes of inherited retinal degenerations (IRDs) remain elusive in many patients. This may be due in part to genetic information missing from unresolved regions of the current human reference genome (GRCh38). Using ultra-long DNA reads in combination with short-read sequencing technology, the Telomere-to-Telomere (T2T) Consortium has provided a gapless human haploid genome assembly using the CHM13hTERT cell line. This new reference genome assembly contains 3,604 genes exclusive to the T2T-CHM13 genome. To understand the relevance of these genes to retinal disease, we reanalyzed RNA-sequencing (RNA-Seq) data from non-diseased human retina and retinal pigment epithelium (RPE) using the T2T-CHM13 genome.
Methods :
We reanalyzed previously published short-read RNA-Seq data from 131 healthy human donor retina samples (macular and peripheral) and 36 RPE samples (including macular and peripheral RPE/choroid/sclera), all obtained from non-diseased eyes, and mapped them to the T2T-CHM13 genome. We also included PacBio HiFi long-read RNA-Seq data from 4 healthy retina samples. For short-read sequencing, the STAR aligner was used to align stranded, paired-end (PE) reads, and featureCounts was used to quantify gene expression. For HiFi reads, the Iso-Seq pipeline from PacBio was used to obtain high-quality isoform reads. A standard SQANTI3 workflow was used for isoform identification and classification. StringTie2 was used to merge individual transcripts to create a non-redundant gene model set.
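The short-read alignment and quantification steps can be sketched as command builders. This is a minimal illustration under stated assumptions, not the authors' actual pipeline: the file paths, thread count, gzip-compressed input, and the reverse-stranded setting (`-s 2`) are all hypothetical, since the abstract says only that the reads were stranded and paired-end.

```python
from typing import List


def star_command(genome_dir: str, read1: str, read2: str,
                 out_prefix: str, threads: int = 8) -> List[str]:
    """Build a STAR command that aligns one paired-end sample.

    Assumes gzip-compressed FASTQ input and a STAR index already
    built from the T2T-CHM13 assembly (both are assumptions).
    """
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", read1, read2,
        "--readFilesCommand", "zcat",
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", out_prefix,
    ]


def featurecounts_command(annotation_gtf: str, bams: List[str],
                          out_table: str, strandedness: int = 2,
                          threads: int = 8) -> List[str]:
    """Build a featureCounts command that counts read pairs per gene.

    strandedness: 0 = unstranded, 1 = forward, 2 = reverse.
    The default of 2 is an assumption; the abstract does not state
    the library orientation.
    """
    return [
        "featureCounts",
        "-p", "--countReadPairs",      # paired-end, count fragments
        "-s", str(strandedness),
        "-t", "exon", "-g", "gene_id",  # summarize exons to genes
        "-T", str(threads),
        "-a", annotation_gtf,
        "-o", out_table,
        *bams,
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    align = star_command("chm13_star_index", "s1_R1.fastq.gz",
                         "s1_R2.fastq.gz", "s1_")
    count = featurecounts_command("chm13.gtf",
                                  ["s1_Aligned.sortedByCoord.out.bam"],
                                  "gene_counts.txt")
    print(" ".join(align))
    print(" ".join(count))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once STAR and featureCounts are installed. Keeping command construction separate from execution makes the parameters easy to inspect before a large batch run.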
Results :
Short-read RNA-Seq identified 36,997 expressed genes in RPE and retina. Furthermore, 500 genes (the unique union of RPE and retina) fell into T2T-CHM13-exclusive regions of the genome. Based on liftovers and close paralogs between CHM13 and GRCh38 biotype annotations, ~43% of the newly discovered genes in retina and RPE are lncRNA, 22% are processed_pseudogene, and about 8% are protein coding, with the remainder in other biotypes. Long-read sequencing identified 229,882 total transcripts, including 42,339 novel transcripts. In the T2T-CHM13-exclusive regions, we detected expression of 407 of the 500 genes identified as expressed by the short-read data.
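The 407-of-500 comparison amounts to a set overlap between the genes called expressed by each technology. The sketch below shows that calculation, reading the figure as long-read support for the short-read calls; the gene identifiers are hypothetical placeholders, and only the counts 407 and 500 come from the results above.

```python
from typing import Iterable, Tuple


def long_read_support(short_read_genes: Iterable[str],
                      long_read_genes: Iterable[str]) -> Tuple[int, int, float]:
    """Return (confirmed, total, fraction) of short-read genes also seen in long reads."""
    short = set(short_read_genes)
    confirmed = short & set(long_read_genes)
    fraction = len(confirmed) / len(short) if short else 0.0
    return len(confirmed), len(short), fraction


if __name__ == "__main__":
    # Toy example with hypothetical gene IDs.
    print(long_read_support(["g1", "g2", "g3", "g4"], ["g2", "g4", "g9"]))

    # Applying the same ratio to the reported counts.
    print(f"{407 / 500:.1%}")  # prints 81.4%
```

By this reading, about 81% of the T2T-CHM13-exclusive genes detected by short reads were also detected by long reads.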
Conclusions :
These analyses identified expression of 500 genes located in the T2T-CHM13-exclusive regions of the genome in the retina and RPE. Future exploration of these novel T2T-CHM13 genes and transcripts may help uncover part of the missing genetic causality of IRDs.
This abstract was presented at the 2023 ARVO Annual Meeting, held in New Orleans, LA, April 23-27, 2023.