Introduction
Social networking websites have become increasingly popular in recent years and now play a significant role in people’s daily lives. These services make it convenient for people to communicate with friends, publish posts about their lives, and follow trending topics as they emerge. Among these sites, Twitter has grown faster than other social networks, with more than 284 million active users recording their daily lives. Twitter allows individuals to connect with one another and share ideas through short messages called tweets.
The nature of such a social network enables its users to express their views and opinions freely without fear of disclosing their identity, which can lead to undesirable consequences. Social networks are characterized by diversity, wide popularity, real-time operation, and interactivity. However, they also offer many opportunities to spammers, as more people are involved than ever. Such websites become platforms for spammers to distribute spam messages, which leads to an unpleasant or even deteriorating social network environment. Unfortunately, Twitter has become a new target platform for social spammers pursuing malicious goals such as sending spam [3], spreading malware [4], and performing other illicit activities [5,6]. Studies show that more than six percent of messages on Twitter are spam [2]. As a result, many users are overwhelmed by these messages, and some are even forced to leave these networks. These spamming activities seriously threaten normal users’ personal privacy and information security. Moreover, malicious messages consume network resources, interfere with normal data mining and analysis, and increase the operational burden on the social network. In addition, a number of spammers profit from marketing promotion behaviors, including malicious likes, comments, votes, and replies, which severely harm the social network’s credibility evaluation system and the trust relationships among users. Therefore, it is important to detect social spammers in order to protect users’ privacy, information security, and the quality of social networking.
Problem statement
With the advent of social networking sites, spammers began using new techniques to make sites such as Twitter part of their spamming activities. Even though these sites offer an option to report spam or abusive activity, spammers frequently change their addresses or accounts to hide their identities.
A major goal of this research is to detect the accounts involved in spamming activities on Twitter. The researcher aims to design and implement a system to detect spammers on Twitter, which will provide valuable insights into spam detection and defense in various social networks. A number of studies have analyzed the characteristics and behavior of spammers in social networks. However, spammers adopt new strategies to bypass known detection systems. Hence, new features and characteristics of spammers need to be explored in order to identify spam accounts effectively. This research will investigate the content and behavior of social spammers and propose an effective machine learning model for spammer detection. The major novelty of the work is to study a set of the most important features related to message content and user behavior and to apply them to a number of widely used classification algorithms for spammer detection, all of which will be trained on a real labeled Twitter dataset. Experiments and a comparative evaluation will be carried out to show that the proposed solution provides higher accuracy.
Motivation
Purifying the network environment and providing a better experience for users have become pressing tasks for social network operators. However, existing studies on spam detection mostly use only textual features or word-level characteristics, which still do not yield a satisfactory spam detection method, and many spam accounts survive for a long time. Clearly, considering text content alone, without new features of spammers, is not enough to detect spammers effectively. Furthermore, spammers use many new techniques to conceal their spamming behavior, and most studies in the area have not considered these techniques. Spammers usually adapt their strategies to bypass known detection systems. The first generation of spammers on Twitter was generally naive and had obvious characteristics that helped separate it from the rest of the population, whereas spammers nowadays resort to cheap automated techniques to gain trust and credibility and go unnoticed in the larger crowd.
Consequently, the performance of such detection methods still needs to be improved. According to the literature, there is high demand for a highly accurate detection system that outperforms previous models.
Proposed Work
In this work, we will study the contemporary population of spammers on Twitter. An overview of the complete spam detection process is given below.
4.1 Data Collection
The preliminary step in detecting spammers is data collection and the preprocessing needed to convert the data into a form that the learning algorithms can use. Since off-the-shelf datasets exist for the study of social spammer and spam message detection, the researcher will use one of the known labeled datasets of English tweets in the field.
4.2 Feature Identification
Since spammers behave differently from non-spammers, we can identify features or characteristics on which the two categories differ. According to the literature, various features are used to detect spam accounts, including the number of followers, the number of followees, and the number of URLs. The researcher will propose new features better suited to detecting Twitter’s current spammer population, to help develop a highly accurate detection system that outperforms previous models.
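As an illustration only, the features named in the literature (followers, followees, and URLs) could be computed per account roughly as follows. This is a minimal sketch: the function name, its arguments, and the two derived ratios are assumptions made for the example and are not the new features this proposal will introduce.

```python
import re
from typing import Dict, List

# Simple URL pattern; real tweets may need a more careful matcher.
URL_PATTERN = re.compile(r"https?://\S+")


def extract_account_features(
    followers: int, followees: int, tweets: List[str]
) -> Dict[str, float]:
    """Build a small feature dictionary for one account.

    The argument names and derived ratios are illustrative assumptions,
    not the final feature set of the proposed system.
    """
    url_count = sum(len(URL_PATTERN.findall(text)) for text in tweets)
    tweet_count = len(tweets)
    return {
        "followers": float(followers),
        "followees": float(followees),
        # Guard against division by zero for accounts that follow nobody.
        "follower_followee_ratio": followers / followees if followees else 0.0,
        "urls_total": float(url_count),
        # Guard against division by zero for accounts with no tweets.
        "urls_per_tweet": url_count / tweet_count if tweet_count else 0.0,
    }
```

Each account then maps to one dictionary of numeric values, which can be flattened into a row of a feature table for the learning step.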
4.3 Preprocessing Step
In the preprocessing step, we will extract a feature vector for each user and each message, taking into account the new detection features that adapt to current evasion techniques. We will then analyze the differences between spammers and non-spammers from both the content and the behavior points of view, based on the collected dataset, and use these features to train a machine learning-based system whose goal is to detect spammers.
4.4 Learning Algorithms
Various classification algorithms can be used to classify an account as “Spammer” or “Non-Spammer”. In our work, we will use three algorithms (not yet selected) as implemented in WEKA, a powerful open-source tool that is widely used in the area. To corroborate the accuracy of the results, the outputs obtained from each algorithm will be compared.
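The comparison step can be sketched independently of the algorithms eventually chosen. The Python example below is a hedged illustration, not the WEKA workflow itself: the two classifiers are hypothetical stand-ins (a constant baseline and a single-threshold rule), and the four labeled records are made-up toy data used only to show how per-algorithm accuracy would be computed and ranked.

```python
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]
Classifier = Callable[[Features], str]
LabelledSet = List[Tuple[Features, str]]


def accuracy(classifier: Classifier, labelled: LabelledSet) -> float:
    """Fraction of accounts whose predicted label matches the true label."""
    if not labelled:
        raise ValueError("labelled set is empty")
    correct = sum(1 for features, label in labelled if classifier(features) == label)
    return correct / len(labelled)


def compare(classifiers: Dict[str, Classifier], labelled: LabelledSet) -> List[Tuple[str, float]]:
    """Return (name, accuracy) pairs sorted from most to least accurate."""
    scores = [(name, accuracy(clf, labelled)) for name, clf in classifiers.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical stand-ins for the three algorithms that are not yet selected.
def always_non_spammer(features: Features) -> str:
    return "Non-Spammer"


def url_threshold(features: Features) -> str:
    # The 0.5 cut-off is an arbitrary value chosen for this illustration.
    return "Spammer" if features["urls_per_tweet"] >= 0.5 else "Non-Spammer"


# Made-up toy records, not drawn from any real dataset.
toy_data: LabelledSet = [
    ({"urls_per_tweet": 0.9}, "Spammer"),
    ({"urls_per_tweet": 0.1}, "Non-Spammer"),
    ({"urls_per_tweet": 0.7}, "Spammer"),
    ({"urls_per_tweet": 0.6}, "Non-Spammer"),
]

ranking = compare(
    {"always_non_spammer": always_non_spammer, "url_threshold": url_threshold},
    toy_data,
)
print(ranking)
```

In the actual study, the same ranking would be produced from held-out or cross-validated predictions of the selected algorithms rather than from hand-written rules.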