Testing Classification Algorithms on Wisconsin Breast Cancer Dataset
Breast cancer (BC) is one of the most common cancers among
women worldwide, accounting for a large share of new cancer
cases and cancer-related deaths according to global statistics,
which makes it a significant public health problem.
Early diagnosis of BC can significantly improve the prognosis and chance
of survival, because it allows patients to receive timely clinical
treatment. Accurate classification of benign tumors, in turn, can spare
patients unnecessary treatment. The correct diagnosis of BC and the
classification of patients into malignant or benign groups are therefore
the subject of much research. Because it is well suited to detecting
critical features in complex BC datasets, machine learning (ML) is
widely recognized as the methodology of choice for BC pattern classification
and forecast modelling.
Developer
Sushil Patil
Coding Language Used
Python
The objective of this project is to compare the performance of different classification techniques on the Wisconsin Breast Cancer Dataset when classifying patients into benign or malignant groups. I used the following techniques:
- Artificial Neural Networks
- Logistic Regression
- Linear Support Vector Machines
- Non-Linear Support Vector Machines
- Random Forest
- K Nearest Neighbours
- Decision Trees
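The comparison of the techniques listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code. It assumes scikit-learn is installed and uses the copy of the Wisconsin Diagnostic dataset bundled with scikit-learn, which may differ from the file used in this project. `MLPClassifier` stands in for the artificial neural network, and the hyperparameters, the 75/25 split, and the `evaluate` helper are all assumptions made for the example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: scikit-learn's bundled Wisconsin Diagnostic data, not the project's own file.
X, y = load_breast_cancer(return_X_y=True)
MALIGNANT = 0  # in this copy, target 0 = malignant and 1 = benign

# Assumption: a stratified 75/25 hold-out split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# One estimator per technique in the list; all settings are illustrative.
models = {
    "Artificial Neural Network": MLPClassifier(
        hidden_layer_sizes=(16,), max_iter=1000, random_state=0
    ),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Linear SVM": SVC(kernel="linear"),
    "Kernel SVM": SVC(kernel="rbf"),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "K Nearest Neighbours": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}


def evaluate(models):
    """Fit each model on scaled features and score it on the held-out split."""
    results = {}
    for name, model in models.items():
        pipeline = make_pipeline(StandardScaler(), model)
        pipeline.fit(X_train, y_train)
        predicted = pipeline.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, predicted),
            # Recall for the malignant class: share of malignant cases that were caught.
            "recall": recall_score(y_test, predicted, pos_label=MALIGNANT),
        }
    return results


results = evaluate(models)
for name, scores in results.items():
    print(f"{name:26s} accuracy={scores['accuracy']:.3f} recall={scores['recall']:.3f}")
```

Scaling is applied inside the pipeline because the SVM, KNN and neural network models are sensitive to feature magnitudes. The numbers this prints depend on the split and settings, so they should not be expected to match the results reported below.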
Results
The following results were obtained after running all the techniques on the training dataset.
Based on the above analysis, Kernel SVM has the highest accuracy. In this problem, however, a Type II error (a false negative), i.e. classifying someone with breast cancer as healthy, is very costly because it puts a human life at risk. Recall is therefore the more important metric here, and the classification technique with the highest recall should be chosen even if that means compromising on accuracy. On that basis, decision-makers should go with the KNN classification method.
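The trade-off between accuracy and recall can be made concrete with a small worked example. The confusion-matrix counts below are invented purely for illustration and are not this project's results. `model_a`, `model_b` and both helper functions are hypothetical names.

```python
def accuracy(tp, fn, fp, tn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fn + fp + tn)


def recall(tp, fn):
    """Share of truly malignant cases that the model caught."""
    return tp / (tp + fn)


# Hypothetical counts, with malignant as the positive class.
# fn = malignant patients predicted as benign (the Type II errors).
candidates = {
    "model_a": {"tp": 50, "fn": 5, "fp": 1, "tn": 87},
    "model_b": {"tp": 54, "fn": 1, "fp": 8, "tn": 80},
}

scores = {
    name: {
        "accuracy": accuracy(**counts),
        "recall": recall(counts["tp"], counts["fn"]),
    }
    for name, counts in candidates.items()
}

# Select by recall first, using accuracy only to break ties.
best = max(scores, key=lambda name: (scores[name]["recall"], scores[name]["accuracy"]))

for name, s in scores.items():
    print(f"{name}: accuracy={s['accuracy']:.3f} recall={s['recall']:.3f}")
print("chosen by recall:", best)
```

In this made-up case `model_a` is more accurate overall but misses five malignant cases, while `model_b` misses only one, so selecting on recall prefers `model_b` despite its lower accuracy.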