June 2022
Volume 63, Issue 7
Open Access
ARVO Annual Meeting Abstract  |   June 2022
Factors Involved in Developing an Open Electronic Health Record Glaucoma Dataset
Author Affiliations & Notes
  • Jimmy S Chen
    Ophthalmology, Oregon Health & Science University, Portland, Oregon, United States
    Ophthalmology, University of California San Diego, La Jolla, California, United States
  • Michael F Chiang
    Ophthalmology, National Eye Institute, Bethesda, Maryland, United States
  • Michelle Hribar
    Ophthalmology, Oregon Health & Science University, Portland, Oregon, United States
    Clinical Epidemiology & Medical Informatics, Oregon Health & Science University, Portland, Oregon, United States
  • Footnotes
    Commercial Relationships   Jimmy Chen None; Michael Chiang Novartis, Code C (Consultant/Contractor), Genentech, Code F (Financial Support), InTeleretina, Code I (Personal Financial Interest); Michelle Hribar None
  • Footnotes
    Support  This work is supported by grants R21LM013937 and P30EY10572 from the National Institutes of Health (Bethesda, MD) and by unrestricted departmental funding from Research to Prevent Blindness (New York, NY).
Investigative Ophthalmology & Visual Science June 2022, Vol.63, 2297. doi:

      Jimmy S Chen, Michael F Chiang, Michelle Hribar; Factors Involved in Developing an Open Electronic Health Record Glaucoma Dataset. Invest. Ophthalmol. Vis. Sci. 2022;63(7):2297.

      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Purpose : Data science research is dependent on large, well-compiled datasets. However, these datasets are difficult to acquire, and there are currently no best practices for sharing these data. An open dataset could address this gap, allowing researchers to investigate new hypotheses and develop more generalizable studies. This abstract describes factors considered in constructing such a dataset.

Methods : A dataset containing medical record numbers (MRNs) of glaucoma patients, providers and their specialty departments, visit identifiers and dates, raw progress notes and medication lists extracted from the EHR, and statistical analysis from a previously published manuscript was used (Chen et al, Ophthalmology Science, 2021). These progress notes and medication lists had previously been manually annotated for medication names, frequency, route, and indication. Each dataset element was reviewed for protected health information (PHI). If PHI was present, a decision was made to remove or de-identify the data field. Example data fields, including PHI, and their rationale for inclusion/exclusion are described in Table 1.

Results : Patient MRNs, visit identifiers, and visit dates were the only data fields that explicitly contained PHI, and were de-identified using an R library, anonymizer, which uses hash functions to encode identifying variables. Visit dates were shifted and truncated. Provider data were removed, and department data were included as is. Annotated medication data were paired with all data fields as a CSV, and statistical code was included without modification. While medication lists were included as is, progress notes potentially contained PHI and required de-identification using a natural language processing algorithm, Philter (Python), with results verified by a clinician (JSC). These data could be uploaded to online data repositories such as Dryad or Figshare (Table 2), published as a Data Descriptor Article (Zarbin et al, TVST, 2021), and potentially used to develop or validate text-processing algorithms involving medication data.
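
The structured-field steps described in the Results (hashing identifiers, shifting and truncating dates, dropping provider data) could be sketched roughly as follows. This is a minimal Python illustration, not the authors' pipeline, which used the R library anonymizer. The salt, the column names, the per-patient offset of up to 365 days, and truncation to year-month are all assumptions made for the example; the abstract does not specify them.

```python
import hashlib
from datetime import date, timedelta

# Hypothetical secret salt; it must be kept private and never shared with the dataset.
SALT = "replace-with-a-secret-salt"


def hash_identifier(value: str, salt: str = SALT) -> str:
    """Return a salted SHA-256 hex digest of an identifier such as an MRN."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def patient_offset_days(mrn: str, salt: str = SALT, max_days: int = 365) -> int:
    """Derive a stable per-patient shift of 1..max_days days (assumed scheme)."""
    digest = hashlib.sha256((salt + "date" + mrn).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % max_days + 1


def shift_and_truncate(visit: date, offset_days: int) -> str:
    """Shift a visit date back by offset_days, then truncate it to year-month."""
    shifted = visit - timedelta(days=offset_days)
    return shifted.strftime("%Y-%m")


def deidentify_row(row: dict) -> dict:
    """Apply the per-field decisions to one visit record (column names are hypothetical)."""
    offset = patient_offset_days(row["mrn"])
    return {
        "patient_id": hash_identifier(row["mrn"]),      # MRN: hashed
        "visit_id": hash_identifier(row["visit_id"]),   # visit identifier: hashed
        "visit_month": shift_and_truncate(row["visit_date"], offset),
        "department": row["department"],                # included as is
        # "provider" is intentionally not copied: provider data are removed
    }
```

Using one offset per patient keeps the intervals between that patient's visits intact before truncation, which matters if the dataset is meant to support longitudinal analyses. Free-text progress notes are outside the scope of this sketch and would still need a dedicated de-identification tool and manual verification.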

Conclusions : Processing and uploading datasets for open-source dataset publication is a feasible, inexpensive process and could become standard practice to increase collaboration as well as dataset accessibility in vision research.

This abstract was presented at the 2022 ARVO Annual Meeting, held in Denver, CO, May 1-4, 2022, and virtually.

 

Table 1. Examples of Data Fields and Data Files for Consideration when Creating an Open Dataset.


Table 2. Examples of Free or Low Cost Data Repositories.
