cesarliz10/spam_classifier_kaggle

Prediction model for SMS Spam classification

This project builds a prediction model for the SMS Spam Collection dataset hosted on Kaggle: https://www.kaggle.com/uciml/sms-spam-collection-dataset.
The corpus consists of 5574 SMS messages collected for spam research. Each message is tagged as either legitimate or spam. Further details and proper acknowledgements for the dataset collection can be found at the link above. The model was developed in Python (scipy, pandas, scikit-learn, matplotlib, seaborn).

Most of the classifier features come from a Bag-of-Words approach. Because this generates a large number of features, most of which are zero for any given message, sparse matrices (scipy) are used to store them.
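
As a rough illustration of the Bag-of-Words idea, the sketch below turns a few made-up messages into a sparse count matrix with scikit-learn's `CountVectorizer` and fits a simple classifier on it. The example messages, the variable names, and the choice of `MultinomialNB` are assumptions made for this illustration only; they are not taken from the project's code, which may use different features and a different model.

```python
from scipy import sparse
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy messages invented for this sketch (not from the real dataset).
messages = [
    "Free entry win a prize now",
    "Are we still meeting for lunch today",
    "Win cash now call this number",
    "See you at home tonight",
]
labels = ["spam", "ham", "spam", "ham"]

# Bag-of-Words: one column per vocabulary word, one row per message.
# fit_transform returns a scipy sparse matrix, so zero counts are not stored.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)

print(sparse.issparse(X))  # whether X is a scipy sparse matrix
print(X.shape)             # (number of messages, vocabulary size)

# MultinomialNB is an assumed choice here; it accepts sparse input directly.
classifier = MultinomialNB()
classifier.fit(X, labels)

# New text must be transformed with the same fitted vocabulary.
new_message = vectorizer.transform(["win a free prize"])
prediction = classifier.predict(new_message)[0]
print(prediction)
```

On the full corpus the vocabulary runs to thousands of words while each SMS contains only a handful of them, which is why a sparse representation keeps memory use manageable.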