E-mail spam filtering is a widely discussed and studied topic in the field of pattern classification. E-mails can be classified as spam or non-spam based on features such as the frequency or occurrence of certain words in the e-mail, the length of the e-mail, or the domain from which it is sent. Based on these basic characteristics, researchers have come up with many techniques to distinguish a spam e-mail from a non-spam e-mail. While most of these techniques rest on strong foundations, their efficiencies differ, sometimes subtly and sometimes widely. By efficiency, I mean the accuracy, the time required to reach a result, and other factors that can give one algorithm or technique an edge over the rest. In this project, I aim to implement and evaluate three major e-mail spam filtering algorithms.

• Given this problem, there is an urgent need to develop more accurate and effective spam detection models for e-mail platforms.

• In this paper, an efficient e-mail spam detection model based on machine learning is proposed.

Hardware Requirements:

• System: Pentium IV 2.4 GHz.

• Hard Disk: 500 GB.

• RAM: 4 GB.

• Any desktop or laptop system with the above configuration or higher.

Software Requirements:

• Operating system: Windows XP / 7

• Coding language: Python

• Interpreter: Python 3.6

• IDE: Python IDE

• ML APIs: sklearn, NumPy, pandas, Matplotlib, and standard machine learning algorithms.

METHODOLOGY

· An e-mail spam dataset is obtained.

· The dataset is loaded and preprocessed using various data preprocessing techniques.

· The preprocessed data is split into training and testing sets.

· Prediction models are built using several machine learning algorithms, such as KNN, logistic regression, SVM and Naïve Bayes.

· Each model is trained on the training set; once it has been trained successfully, it has to be tested.

· Each trained model is tested on the testing set and its accuracy is calculated.

· The algorithm that gives the best accuracy is taken as the final prediction model.

· The finalized model can be used to classify e-mails as normal or spam, and so to detect malicious e-mails.
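The steps above can be sketched with sklearn roughly as follows. This is a minimal illustration, not the project's actual code: the tiny in-line corpus, the TF-IDF preprocessing, the 75/25 split, the hyperparameters and the helper name `compare_models` are all assumptions made for the example, and a real run would load a proper spam dataset instead.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy corpus standing in for a real e-mail spam dataset.
SPAM = [
    "win a free prize now", "claim your free cash reward today",
    "cheap loans approved instantly", "free offer click here to win",
    "you won the lottery claim now", "limited offer buy cheap pills",
    "urgent cash prize waiting for you", "click here for a free gift card",
]
HAM = [
    "meeting moved to three pm tomorrow", "please review the attached report",
    "lunch on friday with the team", "notes from the project call today",
    "can you send the invoice again", "the build passed on the main branch",
    "see you at the seminar next week", "thanks for the feedback on the draft",
]
texts = SPAM + HAM
labels = [1] * len(SPAM) + [0] * len(HAM)  # 1 = spam, 0 = normal


def compare_models(texts, labels, test_size=0.25, seed=0):
    """Train each candidate on the training split and score it on the test split."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, random_state=seed, stratify=labels
    )
    candidates = {
        "KNN": KNeighborsClassifier(n_neighbors=3),
        "Logistic regression": LogisticRegression(max_iter=1000),
        "Linear SVM": LinearSVC(),
        "Naive Bayes": MultinomialNB(),
    }
    scores, fitted = {}, {}
    for name, clf in candidates.items():
        # Preprocessing (TF-IDF) and classifier are fitted together,
        # so the vectorizer only ever sees the training split.
        model = make_pipeline(TfidfVectorizer(), clf)
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
        fitted[name] = model
    best = max(scores, key=scores.get)  # highest test accuracy wins
    return scores, best, fitted[best]


scores, best_name, final_model = compare_models(texts, labels)
for name, acc in scores.items():
    print(f"{name}: accuracy = {acc:.2f}")
print("Selected model:", best_name)
print("Prediction for a new e-mail:", final_model.predict(["win a free prize now"])[0])
```

On a corpus this small the accuracies are not meaningful; the point is only the shape of the workflow. With a real dataset, cross-validation would give a more reliable basis for picking the final model than a single train/test split.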

Spam e-mail is one of the most demanding and troublesome internet issues in today's world of communication and technology. By generating spam, spammers misuse this communication facility and affect organizations and many e-mail users. To improve the system's performance and results, a boosting approach could be considered for future work. Boosting combines many weak classifiers into a single stronger one, typically by giving more weight to the training examples that earlier classifiers misclassified, and could thereby enhance the overall performance of the system.
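As a rough illustration of the boosting idea mentioned for future work, the sketch below uses sklearn's `AdaBoostClassifier`, whose default weak learner is a depth-1 decision tree. The choice of AdaBoost, the bag-of-words features, the toy messages and the helper name `build_boosted_filter` are assumptions made for this example; the project itself does not specify a particular boosting method.

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical toy messages; a real experiment would reuse the project's dataset.
train_texts = [
    "win a free prize now", "claim your cash reward today",
    "cheap loans approved instantly", "click here for a free gift",
    "meeting moved to three pm", "please review the attached report",
    "lunch on friday with the team", "notes from the project call",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = spam, 0 = normal


def build_boosted_filter(texts, labels, n_estimators=50, seed=0):
    """Fit a bag-of-words + AdaBoost pipeline on the given messages."""
    # Each boosting round fits a weak learner (a decision stump by default)
    # on reweighted data; the ensemble is a weighted vote of those learners.
    model = make_pipeline(
        CountVectorizer(),
        AdaBoostClassifier(n_estimators=n_estimators, random_state=seed),
    )
    model.fit(texts, labels)
    return model


boosted_filter = build_boosted_filter(train_texts, train_labels)
print(boosted_filter.predict(["free prize, claim your cash now"]))
```

Whether boosting actually improves on the best single classifier would have to be checked on held-out data, in the same way as the other models.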

E-mail Spam Detection
