The deep learning model
was developed with 1868 eligible NCCT scans with non-traumatic ICH collected
between January 2011 and April 2018. We tested the model on two independent
datasets (TT200 and SD98) collected after April 2018. The model’s diagnostic
performance was compared with clinicians’ performance. We further designed a
simulated study to compare the clinicians’ performance with and without
augmentation by the deep learning system.
The proposed deep
learning system achieved an area under the receiver operating characteristic
curve (AUC) of
0.986 (95% CI 0.967–1.000) on aneurysms, 0.952 (0.917–0.987) on hypertensive
hemorrhage, 0.950 (0.860–1.000) on arteriovenous malformation (AVM), 0.749
(0.586–0.912) on Moyamoya disease (MMD), 0.837 (0.704–0.969) on cavernous
malformation (CM), and 0.839 (0.722–0.959) on other causes in the TT200 dataset.
At a specificity of 90%, the sensitivities of our model were 97.1% and
90.9% for aneurysm and AVM diagnosis, respectively. The model also
generalized well to the independent SD98 dataset. With the proposed system's
augmentation, the clinicians achieved significant improvements in the
sensitivity, specificity, and accuracy of diagnoses for certain hemorrhage
etiologies.
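
For readers unfamiliar with reporting sensitivity at a fixed specificity, the following minimal sketch shows one common way such a figure can be computed for a single etiology treated as a one-vs-rest problem. It is an illustration under stated assumptions, not the procedure used in this study: the function name, the thresholding rule (predict positive when the score is at or above the threshold), and the toy labels and scores are all hypothetical.

```python
def sensitivity_at_specificity(labels, scores, target_specificity=0.90):
    """Return the highest sensitivity reachable while keeping
    specificity >= target_specificity.

    labels: 1 for the etiology of interest, 0 for every other case.
    scores: model scores for that etiology (higher means more likely).
    Hypothetical helper; the thresholding rule is an assumption.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")

    best = 0.0
    # Try every distinct score as a threshold, plus +inf (predict nothing).
    for threshold in sorted(set(scores)) + [float("inf")]:
        # Specificity: fraction of negatives scored below the threshold.
        specificity = sum(s < threshold for s in negatives) / len(negatives)
        if specificity >= target_specificity:
            # Sensitivity: fraction of positives at or above the threshold.
            sensitivity = sum(s >= threshold for s in positives) / len(positives)
            best = max(best, sensitivity)
    return best


# Toy data (made up for illustration): 4 positive and 10 negative cases.
toy_labels = [1, 1, 1, 1] + [0] * 10
toy_scores = [0.90, 0.80, 0.70, 0.20,
              0.85, 0.10, 0.15, 0.05, 0.30, 0.25, 0.12, 0.11, 0.13, 0.14]
toy_result = sensitivity_at_specificity(toy_labels, toy_scores, 0.90)
```

With few positive cases, as in the toy data, the result moves in large steps, so a fixed-specificity sensitivity is best read alongside its sample size.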