Dr. Rorer, Chief Ophthalmic Medical Officer of the FDA's Division of Ophthalmic and Ear, Nose, and Throat Devices within the Center for Devices and Radiological Health, Office of Device Evaluation, provided the FDA's perspectives on evaluating the performance of diagnostic medical devices used in clinical trials of DR, including those used to measure trial outcomes. In the FDA's guidance document “Design Considerations for Pivotal Clinical Investigations for Medical Devices: Guidance for Industry, Clinical Investigators, Institutional Review Boards and Food and Drug Administration Staff,” issued on November 7, 2013, diagnostic devices are defined as those that provide results used alone or with other information to evaluate a subject's “target condition.” The term “target condition” refers to an identifiable state, such as a state of health or a stage of disease, in a subject that prompts clinical action. Examples of diagnostic devices include imaging systems, nonimaging in vivo diagnostic devices, devices that provide anatomic measures, devices that measure subject function, and algorithms that yield a composite, subject-specific output. The FDA groups devices into three regulatory classes based on their potential risk to patients. Class I devices have a simple design, pose low risk to patients, and are subject to the lowest level of regulation, namely general controls (e.g., medical device listing with the FDA); most are exempt from premarket submission for review. Class II devices have a more complex design, pose a greater potential risk to patients, must meet special controls (e.g., specific performance standards) in addition to general controls, and often require premarket notification, also known as “510(k) clearance,” which depends on demonstrating substantial equivalence to a legally marketed predicate device. Clearance does not imply that the FDA has reviewed clinical evidence supporting all potential clinical uses of the device. Class III devices have a complex design, pose the highest level of risk, are subject to the highest level of regulation, and therefore require approval through the premarket approval (PMA) application process.
In 2006, when the FDA and NEI convened to discuss clinical trial endpoints for DR, visual acuity charts and fundus cameras were the main diagnostic devices used in DR therapeutic trials. Around the same time, optical coherence tomography (OCT) and electronic visual acuity charts were put forth as promising diagnostic devices for measuring structural and functional outcomes, respectively, in future DR trials. Also available around this time were devices such as contrast sensitivity charts, perimeters, color vision testers, scanning laser ophthalmoscopes (SLOs), electroretinographs, and visual evoked potential systems. To be useful in early therapeutic intervention trials for DR, a diagnostic device must be capable of detecting changes that occur early in the natural history of DR, and the device's performance should support such an indication for use. Technological advances since 2006 have yielded a number of methods to “improve” images obtained with available diagnostic device technologies, including wide-field imaging, ultrawide-field imaging, and adaptive optics, a technique that corrects for wavefront distortions in optical imaging systems. Other advances for assessing blood flow, perfusion, and oxygenation in the retina include Doppler imaging, stroboscopic fundus cameras, optical coherence tomography angiography, and retinal oximeters, the latter two of which had not been FDA cleared or approved at the time of the workshop. Metabolic imaging, which measures changes in reflectance or fluorescence elicited by light stimulation during disease-related metabolic stress, is another technique on the horizon that might help detect metabolic changes before the onset of irreversible cell damage in DR; retinal metabolic imaging had not been cleared or approved by the FDA as of the workshop.
Many health-related mobile applications are now available on the market. The majority of these applications do not meet the definition of a medical device, and thus the FDA does not regulate them. Some mobile applications (apps), however, may meet the definition of a medical device, but owing to their low potential risk to patients, the FDA will not enforce requirements under the Federal Food, Drug, and Cosmetic Act for such devices. The FDA exercises regulatory authority over only those mobile apps that are medical devices and whose function could pose safety risks to patients. The February 2015 FDA guidance document on mobile medical applications (MMAs) elaborates these considerations.16
In general, the FDA considers a mobile app to be a medical device when the app meets the definition of a medical device, that is, when it is intended for use in the diagnosis, cure, mitigation, treatment, or prevention of disease or other conditions, or is intended to affect the structure or function of the body, or when the app is intended as an accessory to a regulated medical device or is used to transform a mobile platform into a regulated medical device. The FDA maintains a Web page dedicated to MMAs.17
Next, Dr. Rorer described pathways for bringing diagnostic devices for DR to market. The 510(k) clearance pathway requires manufacturers to show substantial equivalence of the new device to a similar “predicate” device legally marketed in the United States. Substantial equivalence depends on comparing the intended use and indications for use, technological characteristics, and performance measures of the two devices.
When assessing the clinical performance of a diagnostic device for a particular indication for use, it is important for the device to be studied in the context of that use: for the same purpose, in the intended patient population, by similar users, and in the same type of clinical setting.
Dr. Rorer noted other considerations when designing studies to assess the performance of diagnostic devices, using imaging devices for illustration. When a device produces qualitative output such as images, masked graders using preestablished criteria should assess the images obtained with the predicate and new devices in the same retinal location and in the same eye using equivalent parameters. Numerous pairs of images from subjects across the intended population should be assessed, including subjects with various forms of pathology and those who are disease free. Assessments should include image quality as well as identification of relevant pathology. Devices that provide quantitative measurements should be evaluated for agreement, which describes how one device model's output compares with another's (agreement is distinct from accuracy except when the comparator is a gold standard); bias, an estimate of systematic measurement error, defined as the mean difference between the measured value and the reference value and expressed in measurement units or as a percent difference; and precision, an estimate of random measurement error reflecting the closeness of repeated, independent measurements on the same eye under the specified testing conditions. Variability related to devices, operators, settings, and patient alignment can affect precision; repeatability and reproducibility are precision measures that vary with the testing conditions, which must therefore be clearly described.
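To make these definitions concrete, the following minimal Python sketch (using hypothetical paired measurements, for example central retinal thickness in micrometers, and function names that are illustrative rather than drawn from the guidance) estimates bias as the mean difference between a new device and a comparator, Bland-Altman-style 95% limits of agreement, and a repeatability coefficient from repeated measurements on the same eyes.

```python
import numpy as np

def agreement_summary(new_device, comparator):
    """Bland-Altman-style agreement between paired measurements
    from a new device and a comparator (or reference) device."""
    diff = np.asarray(new_device, dtype=float) - np.asarray(comparator, dtype=float)
    bias = diff.mean()                      # systematic error: mean difference
    sd_diff = diff.std(ddof=1)              # spread of the paired differences
    limits = (bias - 1.96 * sd_diff,        # 95% limits of agreement
              bias + 1.96 * sd_diff)
    return bias, limits

def repeatability(repeated):
    """Precision under repeatability conditions: each row holds repeated,
    independent measurements on the same eye under identical conditions."""
    repeated = np.asarray(repeated, dtype=float)
    within_var = repeated.var(axis=1, ddof=1).mean()  # pooled within-eye variance
    sw = np.sqrt(within_var)                          # within-eye standard deviation
    coeff = 1.96 * np.sqrt(2) * sw                    # 95% bound on the difference
    return sw, coeff                                  # between two repeat measurements

# Hypothetical central retinal thickness values (micrometers)
new = [255, 301, 288, 342, 270]
ref = [250, 305, 290, 338, 276]
print(agreement_summary(new, ref))

repeats = [[255, 257, 254],   # eye 1 measured three times
           [301, 299, 303],   # eye 2
           [288, 290, 287]]   # eye 3
print(repeatability(repeats))
```

If bias or precision varies across the measurement range, with image quality, or with disease status, such summaries would need to be estimated and reported separately by stratum rather than pooled.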
Dr. Rorer noted that agreement, bias, and precision measures can be either constant or variable across the measurement range of the device. Further, these measures of device performance may not be identical for healthy subjects and those with pathology. These measures can also vary with image quality. Thus, appropriate measurement validation studies should be carried out. Clinical decision limits, which allow discrimination between different health states of subjects, must be established before conducting pivotal diagnostic clinical performance studies.18
Cross-sectional studies of known normal subjects and of subjects with disease of varying severity can reveal preliminary information about potential decision limits. Once the limits are established, a pivotal diagnostic clinical performance study may be performed. Such a study compares the reported diagnosis or referral decision with the clinical reference standard, that is, the best available method for establishing the true status of a subject with respect to a target condition, and uses a different population of subjects than that used to determine the clinical decision limits. Clinical reference standards may be individual methods or combinations of methods, can evolve over time, and are typically established by evidence of current practice from medical and regulatory communities. Therefore, any report of diagnostic device performance should always include the definition of the clinical reference standard used.
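As an illustration of how the primary comparison in such a pivotal study might be tabulated, the sketch below (with hypothetical data, a hypothetical prespecified decision limit, and illustrative function names not taken from the guidance) computes the sensitivity and specificity of device-based referral decisions against a clinical reference standard.

```python
import numpy as np

def diagnostic_performance(device_positive, reference_positive):
    """Sensitivity and specificity of device referral decisions against
    the clinical reference standard (the best available method for
    establishing the subject's true status for the target condition)."""
    device_positive = np.asarray(device_positive, dtype=bool)
    reference_positive = np.asarray(reference_positive, dtype=bool)
    tp = np.sum(device_positive & reference_positive)    # true positives
    fn = np.sum(~device_positive & reference_positive)   # false negatives
    tn = np.sum(~device_positive & ~reference_positive)  # true negatives
    fp = np.sum(device_positive & ~reference_positive)   # false positives
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical example: the device flags subjects whose measurement exceeds
# a clinical decision limit established in a prior, separate study; the
# reference standard might be adjudicated grading of standard fundus photographs.
measurements = np.array([310, 250, 265, 330, 245, 298, 355, 260])
decision_limit = 300
device_calls = measurements > decision_limit
reference = np.array([True, False, False, True, False, True, True, False])
sensitivity, specificity = diagnostic_performance(device_calls, reference)
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```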
Finally, Dr. Rorer encouraged attendees to solicit input and feedback from the FDA on proposed preclinical testing and clinical trial designs through the presubmission program, both before embarking on studies and during the early stages of device development. This program provides investigators and manufacturers an opportunity to meet with the FDA. Dr. Rorer concluded the talk with a call to action, highlighting the need for well-characterized diagnostic devices with low bias and low imprecision for detecting early-stage DR. She added that diagnostic device performance must be carefully considered when devices are incorporated into therapeutic trials, especially for evaluating endpoints.