Investigative Ophthalmology & Visual Science
June 2024
Volume 65, Issue 7
Open Access
ARVO Annual Meeting Abstract  |   June 2024
Harmonization of variants in the eyeGENE® cohort for data sharing
Author Affiliations & Notes
  • Ranya Al Rawi
    National Eye Institute, National Institutes of Health, Bethesda, Maryland, United States
  • Kerry E. Goetz
    National Eye Institute, National Institutes of Health, Bethesda, Maryland, United States
  • Melissa Reeves
    National Eye Institute, National Institutes of Health, Bethesda, Maryland, United States
  • Amelia Naik
    National Eye Institute, National Institutes of Health, Bethesda, Maryland, United States
  • Nia Moore
    National Eye Institute, National Institutes of Health, Bethesda, Maryland, United States
  • Robert B Hufnagel
    Genetics Department, Hawaii Permanente Medical Group, Honolulu, Hawaii, United States
    National Eye Institute, National Institutes of Health, Bethesda, Maryland, United States
  • Bin Guan
    National Eye Institute, National Institutes of Health, Bethesda, Maryland, United States
  • Santa J. Tumminia
    National Eye Institute, National Institutes of Health, Bethesda, Maryland, United States
  • Footnotes
    Commercial Relationships   Ranya Al Rawi None; Kerry Goetz None; Melissa Reeves None; Amelia Naik None; Nia Moore None; Robert Hufnagel Genetics Department Hawaii Kaiser Permanente, Code E (Employment); Bin Guan None; Santa Tumminia None
  • Footnotes
    Support  None
Investigative Ophthalmology & Visual Science June 2024, Vol.65, 2419. doi:
      Ranya Al Rawi, Kerry E. Goetz, Melissa Reeves, Amelia Naik, Nia Moore, Robert B Hufnagel, Bin Guan, Santa J. Tumminia; Harmonization of variants in the eyeGENE® cohort for data sharing. Invest. Ophthalmol. Vis. Sci. 2024;65(7):2419.

      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Purpose : Prevalent issues in the field of genetics include temporal changes and inconsistency in variant representation and nomenclature across the literature, clinical databases, and clinical laboratories, which hinder secondary genetic analyses such as genotype-phenotype association studies. At eyeGENE®, we aim to standardize the variant format in our database by assigning each unique variant an hg19 variant ID in chromosome-position-ref-alt (vcf) format, so that variants can be shared with the wider scientific community.
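As an illustration of the chromosome-position-ref-alt identifier described above, the following minimal Python sketch builds and parses such an ID. The hyphen delimiter, the removal of a "chr" prefix, the upper-casing of alleles, and the names Variant, make_variant_id, and parse_variant_id are assumptions made for this example; the abstract does not specify them. The genome build (hg19) is not encoded in the ID and has to be recorded alongside it.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based position, as in VCF
    ref: str
    alt: str


def make_variant_id(chrom: str, pos: int, ref: str, alt: str) -> str:
    """Build a CHROM-POS-REF-ALT identifier.

    Assumptions (not stated in the abstract): '-' is the delimiter,
    a leading 'chr' is dropped, and alleles are upper-cased.
    """
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]
    return f"{chrom}-{pos}-{ref.upper()}-{alt.upper()}"


def parse_variant_id(variant_id: str) -> Variant:
    """Split an identifier produced by make_variant_id back into its fields."""
    chrom, pos, ref, alt = variant_id.split("-")
    return Variant(chrom, int(pos), ref, alt)
```

The coordinates used to exercise these helpers should be treated as placeholders; for example, make_variant_id("chr1", 12345, "a", "G") yields an ID for a made-up position, not a variant from the eyeGENE® database.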

Methods : We developed a semi-automated variant standardization pipeline to convert eyeGENE® variants from HGVS cDNA format to the vcf format. Multiple tools were employed, including R, Python, VEP, TransVar, InterVar, and VariantValidator. Variants that failed the automatic conversion process were manually reviewed.
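The semi-automated flow described above (try automatic conversion, then send failures to manual review) can be sketched in Python as follows. This is not the authors' pipeline: the function convert_all and the Converter type are hypothetical, and each converter is assumed to be a user-supplied wrapper around one of the tools named above that returns a variant ID string or None. No real tool API is called here.

```python
from typing import Callable, Iterable, Optional

# Hypothetical interface: a converter takes an HGVS string and returns
# a CHROM-POS-REF-ALT ID, or None when it cannot convert the input.
Converter = Callable[[str], Optional[str]]


def convert_all(variants: Iterable[str], converters: list[Converter]):
    """Try each converter in order; collect failures for manual review."""
    converted: dict[str, str] = {}
    needs_review: list[str] = []
    for hgvs in variants:
        for convert in converters:
            try:
                variant_id = convert(hgvs)
            except Exception:
                # A tool error is treated the same as "no result".
                variant_id = None
            if variant_id:
                converted[hgvs] = variant_id
                break
        else:
            # No converter succeeded: queue the variant for a human.
            needs_review.append(hgvs)
    return converted, needs_review
```

Keeping the converters behind one small interface means the order of tools can be changed, or a tool added, without touching the loop that builds the manual review queue.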

Results : There were 10,137 unique variants in the eyeGENE® database as of 2019. Results were collected starting in 2007 from several different clinical testing facilities. TransVar was found to be the most flexible tool, as it accepts input in Gene:Transcript:HGVS cDNA, Gene:HGVS cDNA, or Gene:HGVS protein format. VariantValidator is also sufficient for conversion when the transcript ID is known. The automatic pipeline successfully converted 81.9% (8303/10137) of variants. Variants that failed the automatic process included intronic variants in IVS format, Gene:Transcript:HGVS cDNA combinations reported as incompatible by TransVar, variants without transcript information, variants with inconsistent HGVS nomenclature, and variants with typographical errors. After typos and incorrect HGVS annotations were corrected and other variant information, such as HGVS protein and dbSNP ID, was checked manually, a substantial majority of the variants (84.7%) were successfully converted to vcf format following manual review.
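To make the input formats and failure categories above concrete, the following Python sketch sorts a raw variant string into one of the three formats listed for TransVar, flags legacy IVS notation, and routes everything else to manual review. The regular expressions, the category labels, and the function names classify_input and conversion_rate are assumptions for illustration and are deliberately loose; they do not validate HGVS syntax. The last helper simply reproduces the reported conversion percentage from the counts given above.

```python
import re

# Loose, illustrative patterns (assumptions, not a full HGVS grammar).
IVS_RE = re.compile(r"IVS\d+", re.IGNORECASE)
TRANSCRIPT_RE = re.compile(r"^(NM_|ENST)\d+(\.\d+)?$")


def classify_input(text: str) -> str:
    """Return a coarse label for how a raw variant string is written."""
    parts = text.strip().split(":")
    if IVS_RE.search(text):
        return "ivs_format"  # legacy intronic notation
    if len(parts) == 3 and TRANSCRIPT_RE.match(parts[1]) and parts[2].startswith("c."):
        return "gene:transcript:cdna"
    if len(parts) == 2 and parts[1].startswith("c."):
        return "gene:cdna"  # no transcript information
    if len(parts) == 2 and parts[1].startswith("p."):
        return "gene:protein"
    return "needs_manual_review"


def conversion_rate(converted: int, total: int) -> float:
    """Percentage of variants converted, rounded to one decimal place."""
    return round(100 * converted / total, 1)
```

With placeholder inputs such as "GENE1:NM_000000.1:c.100A>G" or "GENE1:IVS3+1G>A" (hypothetical gene and transcript names), the first is labeled as a full Gene:Transcript:HGVS cDNA entry and the second as IVS format, and conversion_rate(8303, 10137) gives the 81.9 reported above.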

Conclusions : The conversion of HGVS to vcf format is necessary to develop interoperable datasets for genetic and genotype-phenotype correlation studies; however, it is time-consuming and requires multiple tools when performed retrospectively. Missing transcript IDs and inconsistent HGVS annotation are major obstacles in this process. Using the vcf format (CHROM-POS-REF-ALT) facilitates data sharing between clinical labs and reduces the time and burden of reprocessing genetic data.

This abstract was presented at the 2024 ARVO Annual Meeting, held in Seattle, WA, May 5-9, 2024.
