SMS Spam detection and classification using ML

Thousands, probably millions, of messages and emails are sent every day. How many of them are spam? Is there a way to classify them at first glance?

Course Duration
Approx 10

Course Price
₹ 5500

Course Level
Advanced

Course Content

We worked on the SMS Spam Collection data set. It consists of 5574 messages, of which only 747 are spam. The data set is therefore imbalanced, and the final model may be biased toward the majority 'ham' class.
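To make the imbalance concrete, here is a minimal sketch that uses only the two counts quoted above. The "always ham" baseline is an illustration added here, not a model from the course.

```python
# Counts quoted in the text: 5574 messages in total, 747 of them spam.
total_messages = 5574
spam_messages = 747
ham_messages = total_messages - spam_messages

spam_share = spam_messages / total_messages
ham_share = ham_messages / total_messages

# A model that always predicts the majority class already reaches this
# accuracy, which is why plain accuracy can hide bias on imbalanced data.
majority_baseline_accuracy = max(spam_share, ham_share)

print(f"ham:  {ham_messages} ({ham_share:.1%})")
print(f"spam: {spam_messages} ({spam_share:.1%})")
print(f"always-ham baseline accuracy: {majority_baseline_accuracy:.1%}")
```

For this reason it is usually worth looking at per-class metrics such as precision and recall for the spam class, not only at overall accuracy.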

Exploratory Data Analysis

The data set has only two columns, 'label' and 'text'. By creating a new column named 'text_length', we can visualize the length distribution of the two categories, 'ham' and 'spam'. The longest message in the 'ham' subset is 910 characters. Below we cite the two plots of the distributions, with and without this value.
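The 'text_length' column can be sketched as follows. The rows are hypothetical toy messages invented for illustration, and the helper name length_summary is likewise an assumption. Only the column names 'label', 'text' and 'text_length' come from the text.

```python
from statistics import mean

# Hypothetical toy rows, invented for illustration. The real data set
# has the same two columns, 'label' and 'text'.
rows = [
    {"label": "ham", "text": "Are we still meeting for lunch today?"},
    {"label": "ham", "text": "Ok, see you later."},
    {"label": "spam", "text": "WINNER! You have been selected for a free prize. "
                              "Call now to claim your reward before it expires."},
    {"label": "spam", "text": "Urgent: your account needs verification. "
                              "Reply YES to receive your exclusive cash bonus today."},
]

# Add the derived 'text_length' column (number of characters).
for row in rows:
    row["text_length"] = len(row["text"])


def length_summary(rows, label):
    """Return (min, mean, max) of 'text_length' for one label."""
    lengths = [r["text_length"] for r in rows if r["label"] == label]
    return min(lengths), mean(lengths), max(lengths)


for label in ("ham", "spam"):
    shortest, average, longest = length_summary(rows, label)
    print(f"{label}: min={shortest}, mean={average:.1f}, max={longest}")
```

On the real data, the same per-label summary would show the 910-character 'ham' message as an outlier, which is why the distributions are plotted with and without it.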

Insights from the distributions and the statistical analysis

These insights could help us answer the first question. Message length appears to carry a pattern that helps separate spam from ham. We cannot be 100% sure, but a message classified as spam on the basis of its length has a good chance of really being spam.
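As a rough illustration of classifying by length alone, here is a minimal sketch of a threshold rule. The threshold of 80 characters, the function names and the toy messages are all assumptions made for this example. None of them is a result from the analysis.

```python
def classify_by_length(text, threshold=80):
    """Label a message 'spam' if it is longer than `threshold` characters.

    The default threshold is a hypothetical value chosen for this sketch.
    """
    return "spam" if len(text) > threshold else "ham"


def accuracy(samples, threshold=80):
    """Fraction of (label, text) pairs that the length rule labels correctly."""
    correct = sum(
        classify_by_length(text, threshold) == label for label, text in samples
    )
    return correct / len(samples)


# Hypothetical toy messages, invented for illustration.
samples = [
    ("ham", "Are we still meeting for lunch today?"),
    ("ham", "Ok, see you later."),
    # A long but legitimate message: the length rule gets this one wrong.
    ("ham", "Hey, I will be a bit late tonight because the train is delayed "
            "again, so please start dinner without me."),
    ("spam", "WINNER! You have been selected for a free prize. "
             "Call now to claim your reward before it expires."),
    ("spam", "Urgent: your account needs verification. "
             "Reply YES to receive your exclusive cash bonus today."),
]

for label, text in samples:
    print(f"{label:>4} -> {classify_by_length(text)} ({len(text)} chars)")
print(f"accuracy of the length rule: {accuracy(samples):.0%}")
```

The misclassified long 'ham' message shows why length is a useful signal but not a sufficient one on its own.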

Natural Language Processing

Using NLP, the first thing we worked on was to look at the most common and most significant words for each message category. Below we refer to the tools we use for NLP.
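Counting the most common words per category can be sketched with the Python standard library alone. This is not necessarily the tooling used in the course. The stop-word list, the function names and the toy messages are assumptions made for this example.

```python
import re
from collections import Counter

# Tiny hypothetical stop-word list for this sketch. NLP toolkits ship
# much larger ones.
STOP_WORDS = {"a", "the", "to", "you", "your", "for", "is", "are", "we",
              "i", "and", "of", "it", "be"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def most_common_words(messages, label, n=5):
    """Return the n most frequent non-stop-words among messages with `label`."""
    counts = Counter()
    for message_label, text in messages:
        if message_label == label:
            counts.update(w for w in tokenize(text) if w not in STOP_WORDS)
    return counts.most_common(n)


# Hypothetical toy messages, invented for illustration.
messages = [
    ("spam", "Free prize! Claim your free reward now"),
    ("spam", "Call now to win a cash prize"),
    ("ham", "Are we meeting for lunch today"),
    ("ham", "See you at lunch"),
]

for label in ("ham", "spam"):
    print(label, most_common_words(messages, label, n=3))
```

Raw counts show the most common words. To find the most significant ones, a weighting such as TF-IDF is a common next step.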
