Our study applied the latest Conformité Européenne (CE)-marked version of a deep learning system (DLS) designed for DR detection; the CE mark indicates conformance with European Union product legislation. The algorithm’s development is described in detail by Krause et al. In brief, a deep neural network with an Inception-v4 architecture was trained to predict a 5-point DR grade, referable diabetic macular oedema (DMO), and gradability for both DR and DMO. The input to the network was a colour retinal photograph with a resolution of 779×779 pixels. For each prediction, the network outputs a number between 0 and 1 indicating its confidence. This value is computed through multiple stages of computation, parameterised by millions of numbers.
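The text states only that the network outputs a number between 0 and 1 for each prediction. A common way to obtain such values, assumed here rather than taken from Krause et al., is a softmax over the five DR grades and a sigmoid for each binary output. This is a minimal sketch: the raw scores (logits), the grade labels and the function `to_confidences` are hypothetical illustrations, not the deployed model's interface.

```python
import math

# Assumption: a softmax head for the 5-point DR grade and sigmoid heads for the
# binary outputs. The deployed model's actual output layers are not described here.
DR_GRADES = ["none", "mild", "moderate", "severe", "proliferative"]


def softmax(logits):
    """Convert raw scores into probabilities in (0, 1) that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Convert one raw score into a confidence in (0, 1), avoiding overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def to_confidences(dr_logits, dmo_logit, dr_gradable_logit, dmo_gradable_logit):
    """Map hypothetical raw network scores to one confidence per prediction."""
    return {
        "dr_grade": dict(zip(DR_GRADES, softmax(dr_logits))),
        "referable_dmo": sigmoid(dmo_logit),
        "dr_gradable": sigmoid(dr_gradable_logit),
        "dmo_gradable": sigmoid(dmo_gradable_logit),
    }


# Illustrative raw scores for a single photograph (not real model output).
example = to_confidences([2.0, 0.5, 0.1, -1.0, -2.0], -1.5, 3.0, 0.0)
```

With a softmax, the five grade confidences sum to 1, whereas each sigmoid output is an independent confidence for its own yes/no question.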

The model was trained on a set of 2.3 million retinal photographs, each with a known DR severity grade. For each photograph, the model predicted its confidence in the known severity grade, and its parameters were gradually adjusted to improve its accuracy over time. A separate tuning dataset was used to evaluate the model throughout training and to select model hyperparameters. An ensemble of five individual models was then created, and their predictions were combined to produce the final output. To convert the model’s confidence scores into discrete predictions, a threshold was applied to each binary output (DMO, DR gradability and DMO gradability), and a cascade of thresholds was used to output a single DR severity level. Operating thresholds were optimised for the high sensitivity suited to a screening setting, as previously described, and were locked before this study began.
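The ensembling and thresholding steps can be sketched as follows. This is a hypothetical illustration: equal-weight averaging of the five models’ outputs is an assumption, the threshold values are placeholders rather than the study’s locked operating points, and the cascade is interpreted here as checking the cumulative confidence for each grade "or worse", from most to least severe, until one clears its threshold.

```python
from statistics import mean

# Placeholder operating thresholds; the locked values used in the study are not given here.
BINARY_THRESHOLDS = {"referable_dmo": 0.5, "dr_gradable": 0.5, "dmo_gradable": 0.5}
# Cascade of (grade index, threshold) pairs, checked from most to least severe.
# Grade indices: 0 none, 1 mild, 2 moderate, 3 severe, 4 proliferative.
DR_CASCADE = [(4, 0.3), (3, 0.3), (2, 0.3), (1, 0.3)]


def ensemble(predictions):
    """Average each output across member models (equal weighting is assumed)."""
    n_grades = len(predictions[0]["dr_grade"])
    combined = {
        "dr_grade": [mean(p["dr_grade"][i] for p in predictions) for i in range(n_grades)]
    }
    for key in BINARY_THRESHOLDS:
        combined[key] = mean(p[key] for p in predictions)
    return combined


def discretise(combined):
    """Turn averaged confidences into binary calls and a single DR grade."""
    binary = {key: combined[key] >= t for key, t in BINARY_THRESHOLDS.items()}
    probs = combined["dr_grade"]
    grade = 0  # default to "no DR" if no cascade threshold is met
    for g, threshold in DR_CASCADE:
        # Cumulative confidence that the image shows grade g or worse.
        if sum(probs[g:]) >= threshold:
            grade = g
            break
    return grade, binary


def member(dr_grade, referable_dmo):
    """Build one hypothetical member-model output for the example below."""
    return {"dr_grade": dr_grade, "referable_dmo": referable_dmo,
            "dr_gradable": 0.9, "dmo_gradable": 0.8}


# Five hypothetical member-model outputs for one photograph.
members = [
    member([0.5, 0.2, 0.2, 0.05, 0.05], 0.2),
    member([0.3, 0.3, 0.3, 0.05, 0.05], 0.4),
    member([0.4, 0.2, 0.3, 0.05, 0.05], 0.6),
    member([0.4, 0.3, 0.2, 0.05, 0.05], 0.3),
    member([0.4, 0.2, 0.2, 0.1, 0.1], 0.5),
]
grade, flags = discretise(ensemble(members))
```

Checking the most severe grades first lets a modest but non-trivial confidence in severe disease still produce a higher grade, which is one way to favour sensitivity; lowering the placeholder thresholds would raise sensitivity at the cost of specificity.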

Despite generally performing well, deep learning systems (DLSs) have an important limitation: their performance tends to fall when they are applied to populations that differ from those in which they were developed. These discrepancies may arise for several reasons, such as differences between populations in normal features or in disease characteristics.

Because the large training datasets required to develop a DLS tend to favour well-resourced populations, there are concerns that poor generalisability could exacerbate healthcare inequities. Furthermore, there is evidence that existing structural biases can be carried into the performance of algorithms during training. Numerous examples exist within medical imaging of AI systems underperforming among racial and ethnic minority groups.

Recent work has demonstrated a possible mechanism for such bias: DLSs learn to predict racial identity even when it is unrelated to the task at hand. More concerning still, this cannot currently be prevented, because the basis for these predictions is unknown.

The overall implication of these findings is that explicit assessment of model performance within racial and ethnic subgroups is critical. This is particularly important for disadvantaged communities, where the benefits of improved efficiency are likely to have the greatest impact.
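A minimal sketch of the kind of subgroup analysis this implies is to compute sensitivity and specificity separately for each subgroup from per-patient referral decisions. The record format, the function name `subgroup_metrics` and the group labels are hypothetical, and a real analysis would also report confidence intervals, since small subgroups give unstable estimates.

```python
from collections import defaultdict


def subgroup_metrics(records):
    """Sensitivity and specificity per subgroup.

    records: iterable of (subgroup, predicted_referable, truly_referable) tuples,
    a hypothetical format where the reference standard supplies truly_referable.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for group, predicted, actual in records:
        c = counts[group]
        if actual:
            c["tp" if predicted else "fn"] += 1
        else:
            c["fp" if predicted else "tn"] += 1

    results = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["tn"] + c["fp"]
        results[group] = {
            # None marks a metric that is undefined because the subgroup has no such cases.
            "sensitivity": c["tp"] / positives if positives else None,
            "specificity": c["tn"] / negatives if negatives else None,
            "n": positives + negatives,
        }
    return results


# Toy records for two hypothetical subgroups "A" and "B".
example = subgroup_metrics([
    ("A", True, True), ("A", False, True), ("A", False, False), ("A", False, False),
    ("B", True, True), ("B", True, False),
])
```

Comparing these per-group figures, rather than a single pooled value, is what shows whether performance holds up in a particular population.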

Careful consideration of how DLSs are integrated into clinical care pathways is critical, especially for Indigenous Australians. In addition to lower screening rates, Indigenous patients have reduced rates of follow-up after referral. Proposed explanations for this include:

  1. a higher proportion of patients living in areas serviced by visiting specialists;
  2. reduced accessibility through conventional communication pathways, such as mail and telephone; and
  3. poor understanding of the need for attendance.

A key benefit of a DLS is its ability to provide an immediate referral decision at the time of screening, which facilitates in-person education and appointment planning. Although there is some supporting evidence from other settings that such a pathway would increase referral adherence, further work in this area is needed.

Our study shows that a DLS can detect DR in an Indigenous Australian cohort with higher sensitivity and similar specificity compared with a retina specialist. This demonstrates the potential of the system to support DR screening among Indigenous Australians, an underserved population with a high burden of diabetic eye disease. Inadequate DR screening is an important source of healthcare inequity, and addressing it is therefore an urgent priority for Australia.
