Investigated deep learning approaches for classifying knee osteoarthritis severity from X-ray images using the Kellgren-Lawrence (KL) grading scale. Evaluated seven architectures and built a weighted ensemble of the top performers.
Key Results
- Xception performed best: 71.7% accuracy, 0.69 F1, 0.64 Cohen's kappa
- Evaluated 7 architectures: 2 custom CNNs (baseline and attention) and 5 pretrained models
- Built a weighted ensemble of the top 3 models
- Failure analysis identified model collapse and adjacent-grade confusion patterns
- Type: AI Coursework
- Date: February 2026
- Stack: Python, TensorFlow/Keras, Transfer Learning
- Dataset: 352 knee X-ray images (KL grading scale)
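For reference, the three reported metrics can be computed from predicted and true grades as in the sketch below. It assumes grades are encoded as integers, that F1 is macro-averaged, and that kappa is the unweighted form; the write-up does not state which variants were used, and the function names are illustrative only.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted grade equals the true grade."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class that is absent and never predicted scores 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def cohen_kappa(y_true, y_pred):
    """Unweighted Cohen's kappa: agreement corrected for chance."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement from the marginal label frequencies.
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_expected == 1:
        return 0.0
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical toy labels, not data from the project.
toy_true = [0, 0, 1, 1, 2, 2]
toy_pred = [0, 0, 1, 2, 2, 2]
print(accuracy(toy_true, toy_pred))
print(macro_f1(toy_true, toy_pred, labels=[0, 1, 2]))
print(cohen_kappa(toy_true, toy_pred))
```

Because KL grades are ordinal, a weighted kappa that penalises distant disagreements more heavily is a common alternative; whether it was used here is not stated.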
Architectures Evaluated
- Baseline CNN: Custom convolutional architecture
- Attention CNN: Squeeze-and-Excitation + spatial attention modules
- VGG16: Pretrained on ImageNet with fine-tuning
- ResNet50: Deep residual network
- DenseNet121: Dense connectivity pattern
- EfficientNetB0: Compound-scaled architecture
- Xception: Depthwise separable convolutions (best performer)
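A weighted ensemble of the top three models can be sketched as a weighted average of each model's class probabilities, followed by an argmax. This is a minimal illustration under stated assumptions: the model names, probabilities, and weights below are hypothetical, and the write-up does not say which three models were combined or how the weights were chosen.

```python
def weighted_ensemble(prob_sets, weights):
    """Combine per-model class probabilities with normalised weights.

    prob_sets: one entry per model, each a list of per-sample
               probability rows (same samples, same class order).
    weights:   one non-negative weight per model.
    """
    total = sum(weights)
    norm = [w / total for w in weights]
    n_models = len(prob_sets)
    n_samples = len(prob_sets[0])
    n_classes = len(prob_sets[0][0])
    combined = []
    for i in range(n_samples):
        row = [
            sum(norm[m] * prob_sets[m][i][c] for m in range(n_models))
            for c in range(n_classes)
        ]
        combined.append(row)
    return combined


def predict(probs):
    """Index of the highest-probability class for each sample."""
    return [max(range(len(row)), key=row.__getitem__) for row in probs]


# Hypothetical softmax outputs from three models on two images.
model_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
model_b = [[0.2, 0.7, 0.1], [0.1, 0.3, 0.6]]
model_c = [[0.5, 0.4, 0.1], [0.2, 0.2, 0.6]]

# Hypothetical weights, e.g. proportional to validation performance.
weights = [0.5, 0.3, 0.2]

ensemble_probs = weighted_ensemble([model_a, model_b, model_c], weights)
print(predict(ensemble_probs))
```

Averaging probabilities rather than hard votes lets a confident model outweigh two uncertain ones, which can matter when neighbouring grades receive similar scores.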
Failure Analysis
Identified model collapse in ResNet50 and EfficientNetB0, both of which defaulted to predicting the dominant classes, and adjacent-grade confusion, where models struggled to distinguish neighbouring KL grades. Addressed challenges including class imbalance, small dataset size (352 images), and subtle visual differences between grades.
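Both failure modes can be checked directly from predictions. The sketch below is one possible approach, not the analysis code used in the project: it assumes integer-encoded grades, and the helper names and toy labels are hypothetical. It builds a confusion matrix, lists grades a model never predicts (a simple sign of collapse), and measures what share of errors fall on a neighbouring grade.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true grades, columns are predicted grades."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def never_predicted(y_pred, n_classes):
    """Grades that never appear in the predictions."""
    seen = set(y_pred)
    return [c for c in range(n_classes) if c not in seen]


def adjacent_error_share(y_true, y_pred):
    """Share of misclassifications that are off by exactly one grade."""
    errors = [(t, p) for t, p in zip(y_true, y_pred) if t != p]
    if not errors:
        return 0.0
    return sum(abs(t - p) == 1 for t, p in errors) / len(errors)


# Hypothetical labels for a model that over-predicts grades 0 and 2.
toy_true = [0, 1, 2, 3, 4, 2, 3]
toy_pred = [0, 0, 2, 2, 2, 2, 0]

for row in confusion_matrix(toy_true, toy_pred, n_classes=5):
    print(row)
print(never_predicted(toy_pred, n_classes=5))
print(adjacent_error_share(toy_true, toy_pred))
```

A collapsed model shows up as empty columns in the confusion matrix, while adjacent-grade confusion appears as mass concentrated just off the diagonal.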