Netspam Detection using Machine Learning
Price : 10000
Nowadays, many people rely on content available on social media when making decisions (e.g., reviews and feedback on a topic or product). Because anybody can leave a review, spammers have a golden opportunity to write spam reviews about products and services for their own interests. Identifying these spammers and their spam content is a hot research topic. Although a considerable number of studies have been done recently toward this end, the methodologies put forth so far still barely detect spam reviews, and none of them show the importance of each extracted feature type. In this paper, we propose a novel framework, named Net Spam, which uses spam features to model review data sets as heterogeneous information networks and maps the spam detection procedure onto a classification problem in such networks. The results show that Net Spam outperforms the existing methods and that, among four categories of features (review-behavioral, user-behavioral, review-linguistic, and user-linguistic), the first category performs better than the others.
In the modern world of informatics, social media portals play an important role in knowledge gathering. Today many people rely on reviews written by other users when selecting services and products. Reviews written by customers also help service providers improve the quality of their products and services, so they play a vital role in growing a business. While positive reviews can boost a business, negative reviews can seriously damage its reputation and even cause it to fail. Since any person can write a review, spammers have a chance to post spam reviews that mislead users' choices. Many techniques have been used to identify spam reviews based on language patterns and behavior patterns, and graph-based algorithms have also been used to identify spammers. The concept of Net Spam is to build a dataset from retrieved reviews and to convert the problem of spam identification into a classification problem.
Nowadays, online reviews have become one of the vital elements customers rely on when shopping online. Organizations and individuals use this information to buy the right products and to make business decisions. This has encouraged spammers and unethical business people to create false reviews that promote their own products and undercut the competition. Spammers have developed sophisticated systems that can post spam reviews in bulk on any website within hours. To tackle this problem, studies have been conducted to formulate effective ways of detecting spam reviews. Various spam detection methods have been introduced, most of which extract meaningful features from the text or use machine learning techniques.
We propose a system that uses various machine learning techniques and algorithms to detect new spam in Twitter data.
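As an illustration of what extracting features from review text can look like, the following is a minimal sketch in Python. The function name `linguistic_features` and the four features it computes (word count, exclamation marks, uppercase ratio, first-person pronoun ratio) are assumptions chosen for illustration; they are not the feature set of this project or of Net Spam.

```python
import re

# Hypothetical pronoun list used only for this illustration.
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our", "ours"}


def linguistic_features(text: str) -> dict:
    """Compute a few simple text-based features for one review or tweet."""
    # Words are runs of ASCII letters and apostrophes.
    words = re.findall(r"[A-Za-z']+", text)
    letters = [c for c in text if c.isalpha()]
    n_words = len(words)
    n_upper = sum(c.isupper() for c in letters)
    n_first = sum(w.lower() in FIRST_PERSON for w in words)
    return {
        "n_words": n_words,
        "exclamations": text.count("!"),
        # Share of letters that are uppercase (0.0 for text without letters).
        "uppercase_ratio": n_upper / len(letters) if letters else 0.0,
        # Share of words that are first-person pronouns (0.0 for empty text).
        "first_person_ratio": n_first / n_words if n_words else 0.0,
    }


if __name__ == "__main__":
    print(linguistic_features("BUY NOW!!! I love it"))
```

Numeric features of this kind can be placed alongside behavioral features (for example, how often a user posts) before being passed to a classifier.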
Hardware Requirement
- System: Pentium IV 2.4 GHz
- Hard Disk: 500 GB
- RAM: 4 GB
- Any desktop or laptop system with the above configuration or higher
Software Requirements
- Operating system: Windows 7 or later (Python 3.6 does not support Windows XP)
- Coding Language: Python
- Interpreter: Python 3.6
- IDE: Python IDE
- ML APIs: scikit-learn (sklearn), NumPy, pandas, machine learning algorithms
We propose a system that detects spam tweets with good accuracy using machine learning techniques and algorithms such as SVM, KNN, Naïve Bayes, Logistic Regression, and Decision Tree.
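The listed classifiers can be tried side by side with scikit-learn. The sketch below is only an illustration under stated assumptions: the eight example tweets and their labels are invented toy data, TF-IDF is assumed as the text representation, and the helper names `build_models` and `fit_all` are hypothetical. It is not the project's actual pipeline, and a toy set this small says nothing about real accuracy.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Invented toy data for illustration only: 1 = spam, 0 = not spam.
texts = [
    "win a free iphone now click this link",
    "free followers instantly click here",
    "earn money fast from home click now",
    "claim your free prize today limited offer",
    "had a great lunch with friends today",
    "the match last night was really exciting",
    "reading a good book on machine learning",
    "traffic was terrible on the way to work",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]


def build_models():
    """Return one untrained instance of each classifier named in the text."""
    return {
        "SVM": LinearSVC(),
        "KNN": KNeighborsClassifier(n_neighbors=3),
        "Naive Bayes": MultinomialNB(),
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Decision Tree": DecisionTreeClassifier(random_state=0),
    }


def fit_all(train_texts, train_labels):
    """Fit a TF-IDF + classifier pipeline for every model."""
    fitted = {}
    for name, clf in build_models().items():
        pipeline = make_pipeline(TfidfVectorizer(), clf)
        pipeline.fit(train_texts, train_labels)
        fitted[name] = pipeline
    return fitted


if __name__ == "__main__":
    models = fit_all(texts, labels)
    sample = ["click now to win a free prize"]
    for name, model in models.items():
        print(name, model.predict(sample)[0])
```

On real data, the tweets would be split into training and test sets (for example with `sklearn.model_selection.train_test_split`) and each model compared on held-out accuracy, precision, and recall.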