June 2017
Volume 58, Issue 8
Open Access
ARVO Annual Meeting Abstract  |   June 2017
Applied Machine Learning to Medicare Utilization Data
Author Affiliations & Notes
  • Paul Lee
    Retina Consultants of WNY, Williamsville, New York, United States
  • Augustine Lee
    Pulmonology, Mayo Clinic Jacksonville, Jacksonville, Florida, United States
  • Nader Moinfar
    Retina Consultants of WNY, Williamsville, New York, United States
  • Mariangela Rivera
    Retina Consultants of WNY, Williamsville, New York, United States
  • Rebecca Metzinger
    Retina Consultants of WNY, Williamsville, New York, United States
  • Footnotes
    Commercial Relationships   Paul Lee, None; Augustine Lee, None; Nader Moinfar, None; Mariangela Rivera, None; Rebecca Metzinger, None
  • Footnotes
    Support  None
Investigative Ophthalmology & Visual Science June 2017, Vol.58, 5075. doi:

      Paul Lee, Augustine Lee, Nader Moinfar, Mariangela Rivera, Rebecca Metzinger; Applied Machine Learning to Medicare Utilization Data. Invest. Ophthalmol. Vis. Sci. 2017;58(8):5075.

      © ARVO (1962-2015); The Authors (2016-present)


Purpose : To evaluate machine learning algorithms for classifying the factors that determine high utilization of Medicare services. Identifying such factors is important for optimizing quality and resource allocation. The confluence of open-source data and high-capacity computing has moved this kind of analysis out of specialized computing environments.

Methods : Multiple CMS and US Census datasets were combined with clinical intuition to identify attributes that might be associated with Medicare utilization patterns: gender, years in practice, population of the provider's zip code, participation in PQRS, and participation in EHR. Attribute identification was limited to these nonclinical factors because of the constraints of the publicly available data sources.

From these data, providers performing intravitreal injections (CPT code 67028) were selected. Utilization data were normalized to reflect treatments per patient rather than raw treatment volume. The resulting values were then split into a high group (above the 50th percentile) and a low group (below the 50th percentile).
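The normalization and median split described above can be sketched as follows. This is a minimal illustration in Python rather than the R workflow the authors used; the provider records, the function name `label_by_median`, and the handling of values exactly at the median are assumptions, not details from the abstract.

```python
from statistics import median

# Hypothetical provider records: (provider_id, injections billed, patients treated).
# These numbers are invented for illustration and do not come from the CMS files.
providers = [
    ("A", 1200, 200),
    ("B", 300, 100),
    ("C", 900, 180),
    ("D", 450, 50),
]


def label_by_median(records):
    """Return {provider_id: (rate, label)} using a median split on injections per patient."""
    # Normalize raw volume to treatments per patient.
    rates = {pid: injections / patients for pid, injections, patients in records}
    cutoff = median(rates.values())
    # Assumption: a rate exactly at the median is labeled "low";
    # the abstract does not say how such ties were handled.
    return {
        pid: (rate, "high" if rate > cutoff else "low")
        for pid, rate in rates.items()
    }


labels = label_by_median(providers)
for pid, (rate, label) in labels.items():
    print(pid, round(rate, 2), label)
```

Splitting on the rate rather than the raw count keeps a provider with many patients from being labeled high utilization on volume alone.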

Linear and nonlinear classification algorithms were run in R, using the caret package for model comparison. Logistic regression, Naïve Bayes, Support Vector Machine (SVM), linear discriminant analysis (LDA), K-nearest neighbors (KNN), Random Forest, and Classification & Regression Trees (CART) were evaluated. Accuracy and kappa scores were used for comparison.
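For readers unfamiliar with the two comparison metrics, the sketch below computes accuracy and Cohen's kappa from a list of true and predicted labels. It is written in plain Python for illustration and is not the caret code used in the study; the function names and the example labels are hypothetical.

```python
from collections import Counter


def accuracy(actual, predicted):
    """Fraction of predictions that match the true label."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def cohen_kappa(actual, predicted):
    """Agreement beyond chance: (observed - expected) / (1 - expected)."""
    n = len(actual)
    observed = accuracy(actual, predicted)
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    # Agreement expected if predictions were independent of the true labels.
    expected = sum(
        actual_counts[c] * predicted_counts[c] for c in actual_counts
    ) / (n * n)
    if expected == 1.0:
        return float("nan")  # kappa is undefined when only one class ever appears
    return (observed - expected) / (1 - expected)


# Invented labels for illustration only.
actual = ["high", "high", "low", "low", "high", "low", "low", "high"]
predicted = ["high", "low", "low", "low", "high", "high", "low", "high"]

acc = accuracy(actual, predicted)
kappa = cohen_kappa(actual, predicted)
print(f"accuracy={acc:.2f} kappa={kappa:.2f}")
```

With a median split the two classes are balanced, so a classifier that guesses at random scores about 50% accuracy and a kappa near zero. Kappa makes that baseline explicit, which is why it is useful alongside accuracy.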

Results :
Figure 1: Min, median, mean, max of the Accuracy and Kappa scores

As shown in Figure 1, K-nearest neighbors was chosen because it gave the best combination of accuracy and kappa values. Further tuning of KNN by increasing the number of neighbors up to 20 (in increments of 1) did not significantly improve the results.
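A sweep over the number of neighbors, like the one described above, might look like the following sketch. It uses a small hand-written KNN in Python with leave-one-out evaluation on synthetic data; the resampling scheme, the data, and all function names are assumptions, since the abstract does not describe how the tuning was validated.

```python
import math
import random
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training points nearest to query (Euclidean distance)."""
    neighbors = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    # On a tied vote, the label seen first (nearest neighbor first) wins.
    return votes.most_common(1)[0][0]


def loo_accuracy(data, k):
    """Leave-one-out accuracy: predict each point from all the others."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, features, k) == label
    return hits / len(data)


# Synthetic two-feature data with overlapping classes (not the study data).
rng = random.Random(0)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), "low") for _ in range(30)]
data += [((rng.gauss(1.5, 1), rng.gauss(1.5, 1)), "high") for _ in range(30)]

# Sweep k from 1 to 20 in steps of 1, as in the abstract.
results = {k: loo_accuracy(data, k) for k in range(1, 21)}
best_k = max(results, key=results.get)
for k, acc in results.items():
    print(k, round(acc, 3))
print("best k:", best_k)
```

In practice the features would be put on a common scale before computing distances, since attributes such as zip code population and years in practice have very different ranges.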

Conclusions : It is possible to predict some of the characteristics associated with high utilization using public data sources. Expanding the analysis with enhanced demographic data and including clinical data would strengthen the predictive ability of such techniques. It is important to note that only quantitative conclusions can be drawn, since the datasets lack any clinical data. Specifically, gender, years since graduation, population of the provider's zip code, and participation in EHR/PQRS can predict high utilization with 68% accuracy under the parameters and limitations reported. The choice of model will depend on the trade-off between bias and variance that suits the need.

This is an abstract that was submitted for the 2017 ARVO Annual Meeting, held in Baltimore, MD, May 7-11, 2017.


