Today, there are about a dozen approved AI systems for breast cancer screening. But so far it has been difficult to objectively assess the accuracy of the algorithms.
Now researchers at Karolinska Institutet have developed a national validation platform that can compare how well different AI systems detect signs of breast cancer.
So far, the platform has been used to begin evaluating algorithms from three different companies, based on nearly 40,000 mammograms from three regions in Sweden.
– It is important to evaluate the diagnostic accuracy of AI algorithms that may be used clinically. Even if they meet regulatory requirements, that doesn't mean they work well in all contexts, says researcher Fredrik Strand at Karolinska Institutet.
Code provided
In a study, the researchers describe how they proceeded.
– We hope more people can create similar platforms. By making our code freely available, we hope it will be useful in assessing algorithms for more types of cancer than breast cancer, says Fredrik Strand.
He believes that the possibility of an objective assessment of the different algorithms is long overdue. Several regions of the country are in the early stages of using AI in mammography screening.
It is therefore necessary to develop systems that can evaluate algorithms under relevant local conditions. Fredrik Strand says it is important for each hospital to choose the right system for its circumstances, so that it does not risk missing breast cancers or recalling an unnecessarily large number of healthy women.
Differences can be detected in training and technique
Today, there is no standardized assessment of the accuracy of algorithms in medical diagnosis. This is because manufacturers of AI systems have trained and tested their algorithms on different mammogram images.
On the platform, all algorithms process the same images, and the results are then compared with the "answer key": the actual cancer diagnoses recorded in the National Breast Cancer Quality Registry.
This makes it possible to reveal differences between algorithms that may depend on how they were trained and on the techniques and methods used in mammography screening. Because existing AI algorithms are trained on data from the specific populations their developers had access to, they can contain bias* that skews results.
– The platform will also be able to show the biases present in an algorithm, especially in terms of age, geographical origin and socioeconomic status, says Fredrik Strand.
* Bias refers to systematic errors that can arise, for example, from how participants are selected in research. Such errors can distort the results.
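As a rough illustration of this kind of comparison, the sketch below scores two hypothetical algorithms on the same set of exams against registry-confirmed outcomes, first overall and then split by age. It is not the VAI-B code: the records, the algorithm names `algo_a` and `algo_b`, the age cutoff and the function names are all made up for the example.

```python
from collections import defaultdict

# Hypothetical data: every algorithm has scored the SAME exams.
# `cancer` stands in for the registry-confirmed diagnosis; `flags` holds
# each algorithm's yes/no call. All values are invented for illustration.
exams = [
    {"age": 45, "cancer": True,  "flags": {"algo_a": True,  "algo_b": False}},
    {"age": 48, "cancer": False, "flags": {"algo_a": False, "algo_b": False}},
    {"age": 52, "cancer": True,  "flags": {"algo_a": True,  "algo_b": True}},
    {"age": 50, "cancer": False, "flags": {"algo_a": True,  "algo_b": False}},
    {"age": 62, "cancer": True,  "flags": {"algo_a": True,  "algo_b": True}},
    {"age": 66, "cancer": False, "flags": {"algo_a": False, "algo_b": False}},
    {"age": 70, "cancer": True,  "flags": {"algo_a": False, "algo_b": True}},
    {"age": 68, "cancer": False, "flags": {"algo_a": False, "algo_b": True}},
]


def sensitivity_specificity(records, algo):
    """Return (sensitivity, specificity) of one algorithm on the given exams."""
    tp = fn = tn = fp = 0
    for rec in records:
        flagged = rec["flags"][algo]
        if rec["cancer"]:
            if flagged:
                tp += 1  # cancer correctly flagged
            else:
                fn += 1  # cancer missed
        else:
            if flagged:
                fp += 1  # healthy woman flagged unnecessarily
            else:
                tn += 1  # healthy woman correctly cleared
    sens = tp / (tp + fn) if (tp + fn) else None
    spec = tn / (tn + fp) if (tn + fp) else None
    return sens, spec


def by_age_group(records, algo, cutoff=55):
    """Compute the same metrics separately for women under / at or over `cutoff`."""
    groups = defaultdict(list)
    for rec in records:
        key = f"under_{cutoff}" if rec["age"] < cutoff else f"{cutoff}_and_over"
        groups[key].append(rec)
    return {key: sensitivity_specificity(recs, algo) for key, recs in groups.items()}


if __name__ == "__main__":
    for algo in ("algo_a", "algo_b"):
        print(algo, "overall:", sensitivity_specificity(exams, algo))
        print(algo, "by age:", by_age_group(exams, algo))
```

In this invented data the two algorithms look identical overall but behave differently in the two age groups, which is the kind of subgroup difference that only shows up when all algorithms are run on the same images.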
Help for manufacturers
The hope is that this way of testing algorithms will help manufacturers improve their products.
– However, it is important that the healthcare sector contributes to this development by requiring manufacturers to take part in independent tests, for example before a purchase, says Fredrik Strand.
Regions can use the platform
The platform was developed as part of a research project that will end in 2024. The researchers will now draw up proposals for how to make the platform permanent for national use. In the meantime, more regions are already invited to use the platform.
– We would like more regions to take this opportunity to participate, including for evaluating algorithms other than the three already tested, says Fredrik Strand.
The pilot project started in 2021 and the platform was completed last year.
Study:
VAI-B: a multicenter platform for external validation of AI algorithms in breast imaging. Journal of Medical Imaging.
Contact:
Fredrik Strand, researcher at the Department of Oncology and Pathology, Karolinska Institutet and radiologist at Karolinska University Hospital, [email protected]