Studies of AI health technologies should aim to generate evidence that justifies and directs investment in the next step of an intervention along the translational pathway. The selection of an appropriate study design therefore depends on how far a specific AI technology has progressed along that pathway. This largely explains why the exponential growth in clinical studies of AI consists mostly of preclinical studies and small-scale prospective clinical studies (equivalent to early-phase clinical trials).10 However, whether considering a pharmacotherapy, a physical medical device, or an AI technology, larger scale interventional clinical trials should ultimately inform decisions to implement healthcare interventions into real-world care. Estimating real-world effectiveness from preclinical performance is often challenging, and it is particularly difficult with AI-enabled interventions: they are complex interventions that can have unpredictable impacts on a healthcare pathway, such as underperformance for a specific subpopulation.11,12 Interventional studies provide evidence beyond simple technology performance (e.g., diagnostic accuracy) and can yield vital information about the actual effect on patients and other downstream consequences. The randomized controlled trial (RCT) design is held up as the benchmark for evidence generation because its use of random allocation tackles the major bias of unequal allocation between intervention and control groups. To maximize their value, RCTs should be designed to reflect the intended real-world application as closely as possible.
In contrast to the exponential growth of “early-phase” clinical studies of AI health interventions, the number of larger scale clinical trials of AI health technologies (phase 3 equivalents) remains small.10,13 This scarcity of published late-stage studies is not purely a rational reflection of the translational stage of AI technologies, as such studies number far fewer than the AI-enabled medical devices that have already been granted regulatory approval for clinical use.13,14 Two factors may account for this. First, there may be evidence available to regulators that is not in the public domain. Such publication is not an obligation for AI manufacturers making submissions to regulators, and publishing requires significant resource allocation and may even risk their intellectual property. These failures to publicly report studies are nonetheless unhelpful, and manufacturers should be encouraged to share results openly to support better evaluation decisions across healthcare systems. Second, it appears that many regulatory approvals for AI-enabled interventions are based on non-interventional studies alone.15 This may benefit AI technology manufacturers, who avoid the costs associated with large-scale interventional trials, but it does not support decision makers evaluating AI-enabled interventions for patient and service benefit. It is striking that, despite the huge interest in AI health technologies, a systematic review looking for prospective RCTs of AI health technologies in any clinical setting identified just 65 eligible publications since September 2020.13
Most of these studies (n = 24) took place in China, with Europe (n = 14), the United States (n = 12), and Japan (n = 5) being the other major contributors. When eligible studies were categorized clinically, the largest contributors were gastroenterology (n = 15) and radiology (n = 5), with primary care, emergency medicine, diabetology, and cardiology each contributing four eligible RCTs. Despite their scarcity, this systematic review indicated a good overall quality of study design across these RCTs (Table).
Although well-designed, large-scale RCTs are a valuable source of evidence in the evaluation of an AI health technology, it is important to understand their limitations. To complement the quantitative evidence available from RCTs, researchers should also design qualitative studies that use stakeholder perspectives and experiences to generate evidence regarding the sociotechnical mechanisms by which an AI health technology influences outcomes in a specific healthcare context.16 Many such studies are also preclinical in nature; a recent bibliometric study identified just 20 studies of stakeholder perspectives on AI-enabled interventions in prospective clinical use.17 Ideally, qualitative studies would complement quantitative research methods to help support improvements in intervention design and implementation within various healthcare contexts.18 These facets concerning the mechanism by which complex healthcare interventions exert their impact on the wider health system are addressed elsewhere.19