Choroidalyzer: An Open-Source, End-to-End Pipeline for Choroidal Analysis in Optical Coherence Tomography

Purpose: To develop Choroidalyzer, an open-source, end-to-end pipeline for segmenting the choroid region, vessels, and fovea, and deriving choroidal thickness, area, and vascular index.

Methods: We used 5,600 OCT B-scans (233 subjects, six systemic disease cohorts, three device types, two manufacturers). To generate region and vessel ground-truths, we used state-of-the-art automatic methods following manual correction of inaccurate segmentations, with foveal positions manually annotated. We trained a U-Net deep learning model to detect the region, vessels, and fovea and to calculate choroid thickness, area, and vascular index in a fovea-centered region of interest. We analyzed segmentation agreement (AUC, Dice) and choroid metrics agreement (Pearson, Spearman, mean absolute error [MAE]) in internal and external test sets. We compared Choroidalyzer to two manual graders on a small subset of external test images and examined cases of high error.

Results: Choroidalyzer took 0.299 seconds per image on a standard laptop and achieved excellent region segmentation (Dice: internal 0.9789, external 0.9749), very good vessel segmentation (Dice: internal 0.8817, external 0.8703), and excellent fovea location prediction (MAE: internal 3.9 pixels, external 3.4 pixels). For thickness, area, and vascular index, Pearson correlations were 0.9754, 0.9815, and 0.8285 (internal) / 0.9831, 0.9779, and 0.7948 (external), respectively (all P < 0.0001). Choroidalyzer's agreement with graders was comparable to the inter-grader agreement across all metrics.

Conclusions: Choroidalyzer is an open-source, end-to-end pipeline that accurately segments the choroid and reliably extracts thickness, area, and vascular index. Choroidal vessel segmentation in particular is a difficult and subjective task, and fully automatic methods like Choroidalyzer could provide objectivity and standardization.


Introduction
The retinal choroid is a densely vascularised tissue at the back of the eye, providing essential nutrients and support to the outer retinal pigment epithelium and photoreceptors [1]. The choroid is emerging as a window into systemic vascular health including brain [2], kidney [3], and heart [4]. The choroid is also affected by ophthalmic conditions like myopia [5]. Thus, the choroid is a potential source of biomarkers for ocular and non-ocular disease [6,7,8,9]. This is driven by improvements in optical coherence tomography (OCT) imaging, especially enhanced depth imaging OCT (EDI-OCT) [10]. Previously, only the retinal layers were well-captured whereas the choroid, which sits below the hyper-reflective retinal pigment epithelium, was not imaged well and thus received little attention. Now, the choroid can be captured well and is a promising frontier for systemic health assessment [11], especially as OCT devices become commonplace even at high street optometrists. To compute choroidal metrics that could serve as potential vascular biomarkers like choroidal thickness, area, or vascular index, the choroid region and vasculature must be identified and segmented accurately and reliably.
While choroidal region segmentation is relatively straightforward compared to vessel segmentation, as only a single shape needs to be identified per scan, accurate detection of the lower choroid boundary (Choroid-Sclera, C-S, junction) can be time-consuming and at times ambiguous due to poor contrast or image noise. While semi-automatic methods have been proposed [6,12,13,14,15,16,17,18,19,20], these typically require training and expertise to use and do not remove human error and subjectivity. Fully-automatic, deep learning-based approaches to region segmentation have been proposed and address both the time-intensive and ambiguous nature of region segmentation, drastically improving both the ease and standardisation of choroidal segmentation. Many of these methods are not openly available to the research community [21,22,23,24], but recently DeepGPET, an open-source choroidal region segmentation method, was published that can be freely downloaded from GitHub [25].
Choroidal vessel segmentation is a far more complex and time-consuming task. The choroidal vessels are highly heterogeneous in terms of vessel size, shape, and edge contrast and are sometimes hard to discern due to poor contrast or noise, making manual segmentations prohibitively time-consuming and very subjective. Currently, local thresholding algorithms are commonplace for choroidal vessel segmentation [26,27,28], and the current state-of-the-art is the Niblack algorithm [29,30]. Niblack is a local thresholding technique which segments the vessels using a fixed-size sliding window and a standard deviation offset to determine a pixel-level threshold. However, there is evidence of wide inter-grader disagreement between the two commonly used adaptations of Niblack's algorithm [31]. Deep learning approaches trained on manual annotations or on Niblack's algorithm have been proposed previously [32,33], but are not openly available at the time of writing.
Finally, in addition to region and vessel segmentation, there are two more necessary steps that are often overlooked, namely fovea detection and computation of choroidal metrics. OCT B-scans are not necessarily perfectly centred and the size of a pixel can differ not only between devices but also between scans. Thus, once region, vessels, and fovea are extracted, choroidal metrics should be computed in a fovea-centred region of interest [6], which must account for key details like the pixel-scaling of the scan. Currently, each of these four steps is done by a different tool [34,35], with ad-hoc and non-standardised approaches used especially for fovea detection [36].
We address these issues by proposing Choroidalyzer, an end-to-end pipeline for choroidal analysis. Choroidalyzer consists of a single deep learning model that simultaneously segments the choroidal region and vessels and detects the fovea location, combined with all the code needed to extract choroidal thickness, area, and vascular index in a fovea-centred region of interest. Fig. 1 shows how Choroidalyzer improves on the current state-of-the-art by providing a comprehensive solution for all elements of choroidal analysis. To our knowledge, Choroidalyzer is the first open-source method for comprehensive, automatic analysis of the choroid from a raw OCT B-scan. Choroidalyzer is highly effective, can be run on a standard laptop in less than one-third of a second per image, does not require any specialist training in image processing, and is available on GitHub: https://github.com/justinengelmann/Choroidalyzer.

Study population
Our dataset contains 5,600 OCT B-scans of 233 participants from 6 cohorts of healthy and diseased individuals, unrelated to ocular pathology: OCTANE [37], a longitudinal cohort study investigating choroidal microvascular changes in renal transplant recipients and healthy donors; Diurnal Variation [37], a sub-cohort of OCTANE of young individuals investigating the possible effects of diurnal variation on the relationship between the choroid and markers of renal function; Normative, a detailed OCT examination of one of the authors (J.B.) with informed consent; i-Test [37], a cohort of pregnant women evaluating whether the choroidal microvasculature reflects cardiovascular changes in both healthy and complicated pregnancies; Prevent Dementia, a longitudinal cohort tracking middle-aged individuals with varying risk of developing late onset Alzheimer's dementia [38]; and GCU Topcon [39], an investigation into diurnal variation of the choroid in emmetropic and myopic individuals. All studies adhered to the Declaration of Helsinki and received relevant ethical approval from the host institution, and informed consent was obtained from all subjects in all cases. Table 1 describes the population statistics and image acquisition statistics for each cohort.
Three OCT device types were used from two device manufacturers: the spectral domain OCT SPECTRALIS Standard Module OCT1 system and the spectral domain OCT SPECTRALIS portable FLEX Module OCT2 system (both Heidelberg Engineering, Heidelberg, Germany), and the swept source OCT DRI Triton Plus (Topcon, Tokyo, Japan). For the Heidelberg devices, active eye tracking with built-in Automatic Real Time (ART) software was used with horizontal and vertical line scans capturing a 30° (9 mm) fovea-centred region of interest, with an ART of 100, i.e. each final B-scan is the average of 100 B-scans. Posterior pole macular line scans covered a 30-by-25-degree rectangular region of interest using 31 consecutive scans, each with an ART of 50 (posterior pole scans in the Normative cohort were acquired with an ART of 9). All Heidelberg data was collected at a pixel resolution of 768 × 768 pixels, with a signal quality ≥ 15. The Topcon device imaged the macular region using 12 fovea-centred radial scans, spaced 30° apart and covering a 30° (9 mm) region of interest. Each B-scan had a resolution of 992 × 1024 pixels, which was cropped horizontally by 32 pixels and resized to the resolution of the Heidelberg scans of 768 × 768. All Topcon data had an image quality score > 88 as determined by the built-in TopQ software.
Five of the six cohorts were split into training (4,144 B-scans, 122 subjects), validation (466 B-scans, 28 subjects), and internal test sets (756 B-scans, 37 subjects), containing approximately 75%, 10%, and 15% of the B-scans, respectively. We split the data on the subject level, such that no individual ended up in more than one set. The remaining cohort, OCTANE, was entirely held out as an external test set (168 B-scans, 46 individuals). Supplementary Table S1 gives a detailed overview of population and image characteristics for each of the four sets.

Ground-truth (GT) labels
The fovea coordinate was defined as the horizontal (column) pixel index which aligned with the deepest point of the foveal pit depression [36], i.e. where the central foveal pit was most illuminated, typically aligning with a ridge formed at the photoreceptor layer. The choroidal region was defined as the space posterior to the boundary delineating the retinal pigment epithelium layer and Bruch's membrane complex (RPE-Choroid, RPE-C, junction) and superior to the boundary delineating the sclera from the posterior-most point of Haller's layer (Choroid-Sclera, C-S, junction). Between the choroid and sclera lies the suprachoroidal space, which is rarely visible on OCT B-scans and which we consider not to be part of the choroid itself. The choroidal space is made up of interstitial fluid, or stroma, seen as brightly illuminated strips in the OCT B-scans, with interspersed, irregular areas of darker intensity representing choroidal vasculature. This has been both empirically observed [40,26] and widely accepted among the research community [29]. The choriocapillaris, a dense network of choroidal capillaries, is seen as a small band below Bruch's membrane complex approximately 10 microns thick [1] (roughly 3 pixels deep in OCT B-scans), and is assumed to be part of the choroidal vasculature alongside larger vessels seen in Haller's and Sattler's layers.

Figure 1 A comparison between Choroidalyzer and the existing state of choroidal analysis. To obtain choroidal metrics in a fovea-centred region of interest, researchers currently need to combine many different tools. Choroidalyzer unifies everything into an end-to-end pipeline that is very fast and convenient to use.

Table 1 Overview of population characteristics. SD, standard deviation. Note that one participant's sex from the GCU Topcon cohort was not recorded.
For OCT B-scans centred at the fovea (i.e. horizontal, vertical and radial scans), the foveal column location was detected manually. Those not centred at the fovea do not show the fovea. The GTs for choroidal region segmentation were generated using DeepGPET [25] with the default threshold of 0.5. A total of 897 scans were excluded from the dataset (and removed from Table 1 and supplementary Table S1) because of poor region segmentations; these were primarily Topcon B-scans, which DeepGPET had not been trained on before.
GTs for vessel segmentation were generated using a novel, multi-scale quantisation and clustering-based approach, called multi-scale median cut quantisation (MMCQ), which we found to produce superior results to standard application of Niblack in preliminary analysis on the training set. MMCQ segments the choroidal vasculature by performing patch-wise local contrast enhancement at several scales using median cut clustering (quantisation) [41] and histogram equalisation. The pixels of the subsequently enhanced choroidal space are then clustered globally using median cut clustering once more, classifying the pixels belonging to the clusters with the darkest pixel intensities as vasculature. The code for this algorithm is freely available here [LINK TO BE ADDED UPON ACCEPTANCE].
To improve the fidelity and robustness of our vessel segmentation GTs, we randomly varied the brightness and contrast of each OCT B-scan before application of MMCQ. We used 5 linearly spaced gamma levels to fix the mean brightness of each image between 0.2 and 0.5 and simultaneously altered the contrast using 5 linearly spaced factors between 0.5 and 3. A 3:2 majority vote for vessel label classification was used across all 25 variants. This improves robustness, as spurious over- and under-segmentations contingent on specific image statistics are averaged out.
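This majority-vote scheme can be sketched as follows. The `segment_fn` argument is a stand-in for MMCQ (not reproduced here); only the gamma/contrast grids and the 3:2 vote follow the text, and all function names are ours:

```python
import numpy as np

def adjust(img, target_mean, contrast):
    """Approximately fix the mean brightness via gamma correction,
    then scale the contrast about the (new) mean."""
    img = np.clip(img, 1e-6, 1.0)
    gamma = np.log(target_mean) / np.log(img.mean())
    img = img ** gamma  # mean is now roughly target_mean
    return np.clip((img - img.mean()) * contrast + img.mean(), 0.0, 1.0)

def vessel_gt(img, segment_fn, n_levels=5):
    """Majority vote over 5x5 = 25 brightness/contrast variants.
    A pixel is labelled vessel if at least 3:2 of variants (15/25) agree."""
    means = np.linspace(0.2, 0.5, n_levels)
    contrasts = np.linspace(0.5, 3.0, n_levels)
    votes = sum(segment_fn(adjust(img, m, c)) for m in means for c in contrasts)
    return votes >= int(np.ceil(0.6 * n_levels ** 2))
```

Any per-variant segmenter can be plugged in as `segment_fn`; the vote then suppresses segmentations that only appear under specific image statistics.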

Choroidalyzer's deep learning model
Choroidalyzer segments the choroid region and vessels, and detects the fovea, using a U-Net deep learning model [42] with a depth of 7. This relatively high depth allows our model to better consider the global context. The first 3 blocks increase the internal channel dimension from 8 to 64, after which it is kept constant to reduce memory consumption and parameter count. Blocks consist of two convolutional layers, each followed by BatchNorm [43] and ReLU activation. Our up-blocks use a 1 × 1 convolution to reduce the channel dimension followed by bilinear interpolation, which is more compute- and memory-efficient than the standard transposed convolutions. We train our model for 40 epochs using the AdamW optimizer [44] with a learning rate of 5 × 10⁻⁴ and weight decay of 10⁻⁸ to minimise binary cross-entropy, clamping the maximum gradient norm to 3 before each step. We use automatic mixed precision to speed up training dramatically while reducing memory consumption by almost half. Forward pass and loss computation are done in bfloat16, a half-precision datatype optimised for machine learning.
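A minimal PyTorch sketch of the blocks described above is given below. Kernel sizes, padding, and the exact skip-connection wiring are our assumptions, not taken from the released model:

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    """Two convolutional layers, each followed by BatchNorm and ReLU."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(),
    )

class UpBlock(nn.Module):
    """1x1 conv to reduce channels, then bilinear upsampling (cheaper in
    compute and memory than a transposed convolution), then a conv block
    on the concatenation with the skip connection."""
    def __init__(self, c_in, c_skip, c_out):
        super().__init__()
        self.reduce = nn.Conv2d(c_in, c_out, 1)
        self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
        self.conv = conv_block(c_out + c_skip, c_out)

    def forward(self, x, skip):
        x = self.up(self.reduce(x))
        return self.conv(torch.cat([x, skip], dim=1))
```

The 1 × 1 convolution changes only the channel count, so the expensive 3 × 3 convolutions operate on fewer channels than a transposed-convolution upsampler would require.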
During training, we apply the following data augmentations in random order per sample: horizontal flip (p = 0.5); changing the brightness and contrast independently (factors ∼ U(0.5, 1.5), p = 0.95); random rotation and shear (degrees ∼ U(−25, 25) and ∼ U(−15, 15), respectively, p = 1/3); and scaling the image (factor ∼ U(0.8, 1.2), p = 1/3), where U(a, b) denotes a uniform distribution between a and b, and p the probability of the transform being applied. For peripapillary scans, which have a resolution of 1536 × 768, we use a crop of 768 × 768 using a random multiple of 192 as offset per example and epoch.
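The peripapillary crop can be sketched as follows, assuming the 1536-pixel dimension is the scan width (a sketch, not the authors' exact implementation):

```python
import numpy as np

def random_peripap_crop(img, crop=768, step=192, rng=None):
    """Crop a 768x768 window from a 768x1536 peripapillary scan, using a
    random horizontal offset that is a multiple of 192 pixels
    (offsets 0, 192, 384, 576, or 768)."""
    rng = rng or np.random.default_rng()
    h, w = img.shape
    offset = step * int(rng.integers(0, (w - crop) // step + 1))
    return img[:, offset:offset + crop]
```

Drawing a fresh offset per example and epoch means each scan contributes several overlapping views over the course of training.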
The fovea is only a single point, which would be difficult for a segmentation model to learn, as predicting close to 0 for all pixels would yield virtually the same loss as a perfect prediction. Thus, we create a target 51 pixels high and 19 pixels wide centred at the GT fovea location. The exact fovea location is set to 1, the whole column to 0.95, and adjacent columns to 0.95 − (d × 0.1), where d is the column distance from the fovea. Finally, we employ one-sided label smoothing and set all other pixels to 0.01 instead of 0 to stabilise training. We extract fovea column predictions by applying a 21-width triangular filter to the column-wise sums of our model's predictions and taking the column with the highest value.
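The target construction and column extraction might look like this; the 51 × 19 extent, the 0.95/0.1/0.01 values, and the 21-wide filter follow the text, while the helper names and exact filter shape are ours:

```python
import numpy as np

def fovea_target(h, w, fov_row, fov_col, half_w=9, half_h=25):
    """Build the fovea training target: 51 px high, 19 px wide, centred at
    the GT fovea. Column values decay by 0.1 per column of distance d from
    the fovea column; all other pixels get 0.01 (one-sided label smoothing)."""
    t = np.full((h, w), 0.01)
    rows = slice(max(0, fov_row - half_h), fov_row + half_h + 1)
    for d in range(half_w + 1):
        for col in (fov_col - d, fov_col + d):
            if 0 <= col < w:
                t[rows, col] = 0.95 - d * 0.1
    t[fov_row, fov_col] = 1.0  # exact fovea location
    return t

def extract_fovea_col(pred, filt_w=21):
    """Fovea column = argmax of the column-wise prediction sums after
    smoothing with a 21-wide triangular filter."""
    col_sums = pred.sum(axis=0)
    tri = 1.0 - np.abs(np.arange(filt_w) - filt_w // 2) / (filt_w // 2 + 1)
    return int(np.argmax(np.convolve(col_sums, tri, mode='same')))
```

Feeding a clean target back through the extractor recovers the original column, which is a useful sanity check when adapting the values.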

Statistical analysis
We evaluate agreement in segmentations using the area under the receiver operating characteristic curve (AUC) and Dice coefficient, applying a fixed threshold of 0.5 to binarize our model's predictions. For the fovea column location, we use mean absolute error (MAE) and median absolute error (Median AE). For derived choroid metrics, we evaluate agreement with Pearson and Spearman correlations and further report MAEs.
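For reference, the Dice coefficient with the fixed 0.5 threshold can be computed as follows (a sketch; the function name and edge-case handling are ours):

```python
import numpy as np

def dice(prob, gt, threshold=0.5):
    """Dice coefficient between a thresholded probability map and a
    binary ground-truth mask: 2|A & B| / (|A| + |B|)."""
    pred = prob >= threshold
    gt = gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return 1.0 if denom == 0 else 2.0 * (pred & gt).sum() / denom
```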
All choroidal metrics were computed using a region of interest (ROI) centred at the foveal pit, measuring 3 mm temporally and nasally (the ROI for volume scans was centred at the middle column index of the image), corresponding to the standardised ROI according to the Early Treatment Diabetic Retinopathy Study (ETDRS) macular grid of 6,000 × 6,000 microns [45]. As peripapillary scans do not allow for a fovea-centred region of interest, we only look at segmentation metrics for them and use a threshold of 0.25 for vessel predictions. Area was computed by counting the pixels within the ETDRS grid, while thickness was measured at three linearly spaced locations spanning the ETDRS grid, as point-source micron distances between the RPE-C and C-S junctions, locally perpendicular to the RPE-C junction.
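A small helper illustrating how the fovea-centred ROI columns can be derived from the scan's pixel scaling (names and the rounding/clipping choices are our assumptions):

```python
def etdrs_roi_columns(fovea_col, microns_per_px, width=768, half_roi_um=3000):
    """Return the (first, last) column indices covering 3 mm temporal and
    nasal of the fovea, given the horizontal pixel scaling in microns/px,
    clipped to the image width."""
    half_px = round(half_roi_um / microns_per_px)
    return max(0, fovea_col - half_px), min(width - 1, fovea_col + half_px)
```

For a 9 mm scan over 768 pixels, 3 mm corresponds to 256 pixels on each side of the fovea, which is why the pixel scaling must be read per scan rather than assumed constant.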
Choroid vascular index is the ratio of vessel pixels to total choroid pixels within the ETDRS grid. Our deep learning model outputs probabilities instead of discrete predictions, which capture uncertainty. As capturing uncertainty is desirable, we propose a "soft" vascular index which takes the ratio of predicted probabilities instead of discretized binary predictions. On the validation set, we found that this improves agreement.
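One plausible reading of this soft vascular index is the ratio of summed vessel probabilities to summed region probabilities inside the fovea-centred ROI; the authors' exact formulation may differ, and the names below are ours:

```python
import numpy as np

def soft_vascular_index(vessel_prob, region_prob, roi_mask):
    """'Soft' CVI: ratio of summed vessel probabilities to summed choroid
    region probabilities inside the ROI, instead of counts of thresholded
    binary pixels."""
    v = (vessel_prob * roi_mask).sum()
    r = (region_prob * roi_mask).sum()
    return float(v / r) if r > 0 else 0.0
```

Because no threshold is applied, borderline pixels contribute proportionally to their predicted probability rather than being forced to 0 or 1.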
To examine and characterise the behaviour of our model, we analysed cases of high error in detail. Concretely, for each of the three tasks (region and vessel segmentation, fovea detection), we selected the 15 examples from each test set where Choroidalyzer produced the highest errors. For redundant cases (i.e. adjacent, highly similar slices from a volume scan), only one was retained. For fovea detection, cases of low error were also discarded. This left 28 cases for region, 29 for vessel, and 25 for fovea.
An adjudicating clinical ophthalmologist (I.M.) was provided with the original image, Choroidalyzer's prediction, and the GT while being masked to the identity of the methods. Images and labels were provided individually and as composites. For each example, the adjudicator was asked which label they preferred. They also rated each label qualitatively on a 5-level ordinal scale ("Very bad", "Bad", "Okay", "Good", and "Very good") for region segmentation quality, as well as intravascular and interstitial vessel segmentation quality. The latter two were intended to quantify any potential under-segmentation of vessels and over-segmentation of the interstitial space.
Finally, we selected a random subsample of 20 B-scans at the patient level from the external test set to be manually segmented by two graders, M1 and M2. M1 was a clinical ophthalmologist (I.M.) and M2 was a PhD student who has worked with choroidal OCT data for the last 4 years (J.B.). Manual graders segmented the region and choroidal vessels using ITK-Snap [46]. The manual segmentations were compared to Choroidalyzer and to the current state-of-the-art, namely DeepGPET for region segmentation [25] and Niblack for vessel segmentation using a window size of 51 and a standard deviation offset of −0.05, which mirrors previously published work [33].

Performance on internal and external test sets
Table 2 shows the performance of Choroidalyzer on the internal and external test sets. Our model achieves very good performance in terms of AUC and Dice for region and vessels on both sets. Metrics for region are higher than for vessels, which is expected as choroidal vessel segmentation is much more difficult and ambiguous than region segmentation, and thus the GTs are themselves imperfect. Performance was slightly higher for the internal test set than the external test set, which is expected, but only marginally so, indicating that our model generalises well to new cohorts. For the peripapillary scans, which only exist in the internal test set, our model achieved an AUC of 0.9996 (region) / 0.9925 (vessel) and a Dice of 0.9636 (region) / 0.7155 (vessel). This is reasonable performance but lower than for other scans.
For fovea detection, the model had an MAE of 3.9 px for the internal and 3.4 px for the external test set, with the median absolute error being 3 px for both. This is excellent performance, as an error of 3 px on a 768 px-wide image will not meaningfully change our region of interest or resulting metrics (see the supplementary materials for an analysis of the effects of fovea location on downstream metrics). For the derived choroid metrics, Choroidalyzer shows excellent agreement with the GTs on thickness and area, with Pearson and Spearman correlations of 0.9692 or greater for both internal and external test sets. For the vascular index, performance is a bit lower, with correlations between 0.7948 and 0.8285. Although vascular index depends on both region and vessel segmentation, the other metrics indicate that the differences in vascular index are driven primarily by differences in vessel segmentation. Still, the observed correlations are high in absolute terms. Fig. 2 shows correlation and Bland-Altman plots for the three derived metrics on both test sets, which likewise indicate generally very good agreement. Fig. 3 shows some examples for each of the three imaging devices.

Comparison with manual segmentations
Table 3 shows the results from manual segmentations. For the automated methods, we compared with each manual grader and then averaged the performance across both graders to make the results more concise. The comparisons with individual graders are reported in supplementary Table S2. Interestingly, while vessel Dice for Choroidalyzer (0.7410 vs M1 and 0.7927 vs M2; mean 0.7669) is again much worse than region Dice, and even worse than the vessel Dice on both test sets, it is very similar to the inter-grader agreement of 0.7699. More generally, the inter-grader agreements for all other metrics are similar to Choroidalyzer's agreement with the graders, with the notable exception of vascular index. Here, Choroidalyzer's MAE is better (0.0555 vs M1 and 0.0506 vs M2; mean 0.0531) than the inter-grader agreement (0.0618), as is the Spearman correlation, but Pearson correlation and ICC are worse. Compared to the respective state-of-the-art (SOTA, i.e. DeepGPET for region, Niblack for vessel segmentation), Choroidalyzer has better agreement with the graders for most of the metrics, although the methods are generally comparable.
Table 4 shows the time per scan for the manual graders and automatic approaches. The manual graders on average needed more than 26 and 22 minutes (mean 24), with the vast majority of that time spent on the vessel segmentation. By contrast, the automatic methods on a standard laptop needed about a second per scan and no human time at all. Thus, to get through a dataset of 100 scans, it would take manual graders about 40 hours of work, but with automated methods it would take less than 2 minutes. With GPU acceleration, Choroidalyzer and DeepGPET could achieve throughputs of dozens or hundreds of scans per second even on consumer-grade hardware. Comparing the automated methods with each other, Choroidalyzer took 73% less time than DeepGPET and Niblack, while also detecting the fovea location. All three methods are very fast, but for very large datasets or deployment on edge devices, Choroidalyzer's efficiency is an additional advantage over existing automated methods.

Detailed error analysis
Table 5 shows the results of manual inspection of scans where Choroidalyzer produced the highest error compared to the GT on the test sets. For region segmentation, Choroidalyzer was preferred in 8 cases, the GT in 5, and both methods were considered equally good in 15 cases. In terms of quality, Choroidalyzer was "Very bad" in only one case compared with 2 for the GT, and "Very good" 3 times compared to none for the GT. For the vessels, Choroidalyzer was preferred in 13 cases, the GT in 4, and both were tied in 12 cases. Vessel segmentation is a harder task, with neither method achieving "Very good". However, the intravascular scores for Choroidalyzer are substantially better, with no "Bad" or "Very bad" (vs. 3 and 2, respectively, for the GT) and far more "Good" (17 vs. 5), and the interstitial scores are similarly better. Finally, for the fovea, Choroidalyzer was preferred 23/25 times and the GT only twice, indicating that large fovea errors are almost exclusively due to mistakes in the manual GT labels. Fig. 4 shows the distributions of fovea errors for both test sets along with examples from both sets. For very large residuals (10+ px), the GTs are wrong and Choroidalyzer correctly identifies the fovea location. For errors around 7 px, still twice the MAE, both methods are similar, with either method sometimes being more correct. Further exploration revealed that the majority of incorrectly labelled ground-truths were Topcon OCT B-scans, as the scans in each 12-stack of radial scans are not all centred at the fovea, and initial manual annotation detected the fovea for only one scan to represent each stack. Despite this oversight, Choroidalyzer learned to detect the fovea robustly and accurately.

Discussion
We developed Choroidalyzer, an end-to-end pipeline for choroidal analysis. Choroidalyzer shows excellent performance on the internal and external test sets. The cases where Choroidalyzer produced the highest errors were primarily cases of imperfect GTs, and Choroidalyzer was generally preferred by a blinded adjudicating ophthalmologist (I.M.), further indicating robustness and good performance. Its agreement with manual segmentations, which demand substantial time and attention from a human expert, is comparable to the inter-grader agreement. This suggests that Choroidalyzer performs well compared to laborious manual segmentation and also highlights the subjectivity introduced by manual graders. Choroidalyzer not only produces results similar to those of a skilled manual grader, it also does so fully automatically without introducing subjectivity, and thus increases standardisation and reproducibility. If researchers use Choroidalyzer, their results are repeatable and would be much more comparable to other studies also using Choroidalyzer than if different manual graders were used in each case.
Additionally, Choroidalyzer saves a substantial amount of time per image over manual segmentation, freeing up researcher time and enabling large-scale analyses that otherwise would not be possible. Even compared to the current state-of-the-art for automated methods, DeepGPET and Niblack, Choroidalyzer can do the analysis in roughly a quarter of the time. More importantly, Choroidalyzer provides an end-to-end pipeline, which makes it easier to implement and use than having to combine multiple methods like Niblack and DeepGPET. Ease-of-use is often underappreciated in the literature but key in saving researchers time and allowing them to focus on the science.
Choroidalyzer performed well against the manual graders relative to the state-of-the-art methods, reaching or surpassing the levels of agreement even between the two manual graders, particularly for vascular index, a far more difficult metric to calculate accurately than area and thickness. The inter-grader agreement between manual graders for these metrics indicates a potential lower bound on the effect sizes we might expect from these metrics. This has important downstream impact on the statistical confidence of results from cohort studies, particularly when assessing the choroidal vasculature [31,28].
It is often difficult to visualise the choroid due to imaging noise, poor eye tracking and patient fixation, or operator inexperience. Thus, in some cases vessel boundaries can be hard to discern. This is why we proposed to use a soft version of the choroid vascular index, where the probabilities that Choroidalyzer outputs are used instead of thresholded, binarized segmentations. The probabilities capture uncertainty about the precise location of the vessel wall and are thus more robust than using a single, somewhat arbitrary threshold. Users could also tune the binarization threshold for their own images, if desired, which might help in instances of poor visibility of the choroidal vasculature.
Segmentation performance for peripapillary scans was reasonable but much worse than for other scan types. This could be due to those scans being relatively rare in our dataset and showing parts of the retina on the nasal side of the optic disc that are not captured in fovea-centred scans. More peripapillary training data would likely increase performance. In our opinion, at present Choroidalyzer can be used for these scans but requires subsequent manual inspection and potential correction. Furthermore, adjusting the binarization threshold for the vessel predictions can improve results.
Our model detected the fovea well, and the largest errors were cases where ground-truths were incorrectly labelled, with the model correctly identifying the fovea location as confirmed by masked adjudication. Thus, the model performed even better than the quantitative results suggest. In the present work, we have focused on identifying the fovea column, which is needed to define the fovea-centred region of interest. However, after selecting and evaluating our final model, we realised that in relatively rare, highly myopic cases, the retina and choroid can be at a steep angle relative to the image axes. For those, it would be best to define the region of interest along the choroid axes rather than the image axes, most easily done by drawing a centre line from the fovea perpendicularly through the retina and choroid. Thus, it could be useful to also segment the retina and to determine both the row and column of the fovea. While not our initial objective, we did some preliminary analyses and found that we can derive the fovea row well with our current model (data not shown). Furthermore, to understand the effect of fovea location error on downstream choroidal metrics, we simulated random per-sample deviations of ±6 px, twice the Median AE, and found that they yielded virtually identical results (see supplementary Fig. S1 and Fig. S2).
The dataset in the present work was substantially larger than the one used for DeepGPET and, importantly, contains both Heidelberg and Topcon scans. As a result, Choroidalyzer can segment even difficult Topcon scans where DeepGPET failed (Fig. 5). Choroidalyzer was trained on region and vessel GTs generated by fully- and semi-automatic methods, respectively, which were then checked for errors and only manually improved where needed. Recent work argues that such approaches to generating GTs are preferable as they reduce subjectivity and thus bias and inconsistency [47].
Choroidalyzer also has limitations. Most importantly, there is no quality scoring component to reject B-scans that do not show the choroid in sufficient detail to allow for reasonable analysis. While modern OCT devices typically show the choroid in good detail, especially if EDI is used, this is not always the case. Most devices provide some quality indicators, but we have not investigated quality thresholds for specific devices below which Choroidalyzer would not function. Furthermore, OCT quality indicators are typically focused on the retina, and although poor visualisation of the retina might imply poor visualisation of the choroid, the reverse is not necessarily the case. A quality scoring method specific to the choroid would be a useful addition to the field. Another limitation is that Choroidalyzer was trained only on cohorts relating to systemic health, but not on ocular disease or data acquired during routine clinical practice.
Future work could improve the underlying deep learning model of Choroidalyzer, e.g. by training and evaluating it on data from more diverse sources. Data with ocular pathology, e.g. abnormally sized choroids due to myopia, age-related macular degeneration or central serous chorioretinopathy, could be used to investigate whether Choroidalyzer is robust in those contexts and to train an improved version if needed. Moreover, automated quality scoring methods relating to the choroid would address a key need in choroidal analysis. Finally, Choroidalyzer could be extended to measure additional choroidal metrics, such as macular thickness and vessel density maps across a volume, or metrics relating to choroidal curvature.

Conclusion
Choroidal thickness, area, and especially vascular index are highly interesting metrics and potential biomarkers for both systemic and ocular health. However, calculating them used to be laborious and, when done manually, subjective. Choroidalyzer provides an efficient, end-to-end pipeline to alleviate these problems. We hope that by making Choroidalyzer openly accessible, we will enable researchers and clinicians to conveniently calculate these metrics and use them for their research, while improving reproducibility and standardisation in the field.

Figure S2 Correlation and Bland-Altman plots for choroidal metrics (thickness, area, and vascular index) for the poorest-performing simulation of perturbing the fovea column coordinate on a random 10% subsample of the dataset.

Figure 2
Figure 2 Agreement in thickness, area, and vascular index for (a) the internal and (b) the external test sets. The top row shows scatterplots with best regression fit and identity lines; the bottom row shows Bland-Altman plots. Note that we chose to fit each plot to the data range, and thus the scales of the axes are not exactly the same between internal and external test sets, especially for vascular index.

Figure 3
Figure 3 Examples of Choroidalyzer being applied to scans from different imaging devices. Six fovea-centred OCT B-scans, two per imaging device type, from the internal test set showing region segmentations (left), vessel segmentations (middle), and fovea column location (right).

Figure 4
Figure 4 Histogram of absolute errors for fovea column detection for the internal (left) and external (right) test sets. Examples for different levels of error are shown, with dotted lines indicating which part of the distribution they come from. In the examples, the teal line indicates the GT label and the dashed orange line the prediction.

Figure 5
Figure 5 Three example Topcon OCT B-scans with successful region segmentations from Choroidalyzer (right) and failed segmentations from DeepGPET (middle).

Figure S1
Figure S1 Distribution of Pearson correlation coefficients for each choroidal metric when perturbing the fovea coordinate column. Note the scale of the y-axis; even the lowest correlation we observed was > 0.99.

Table 2
Metrics for Choroidalyzer against ground-truth annotations from the internal and external test sets.

Table 3
Comparison metrics for the 20 images assessed manually and algorithmically from the external test set. Comparisons were made between the two manual graders, the proposed model, and the current state of the art. SOTA: current state-of-the-art, i.e. DeepGPET for region and Niblack for vessel segmentation.

Table 4
Mean (standard deviation) execution time of the four different approaches to region and vessel segmentation for the 20 images assessed manually and algorithmically from the external test set. SOTA: current state-of-the-art, i.e. DeepGPET for region and Niblack for vessel segmentation. Automated methods were run on a standard laptop with a 4-year-old i5 CPU and 16 GB of RAM but no GPU.