
A DYNAMIC APPROACH TO SPAM FILTERING

Indumathi.J 1, Gitanjali.J 2

1 Department of Information Science and Technology, Anna University, Chennai 600 025, Tamil Nadu, India.

2 School of Information Technology and Engineering, VIT University, Vellore-632014, Tamil Nadu, India.

*Corresponding author E-mail: [email protected]

July 13, 2018

International Journal of Pure and Applied Mathematics, Volume 120 No. 6 2018, 179-200
ISSN: 1314-3395 (on-line version)
url: http://www.acadpubl.eu/hub/
Special Issue

Abstract

Spam, the unsolicited guest in our inboxes, has proliferated remarkably in recent years, and every internet user encounters it at one point or another. Spammers are growing by leaps and bounds and tirelessly equip themselves with the latest technologies, forcing the research community to devise new anti-spam techniques. Email service providers, in spite of providing filters, find it challenging to keep up with the shifting sands of spam technology. Hence, the development of anti-spam filtering techniques must continually keep pace. Existing algorithms frequently detect spam using statistical or heuristic schemes. This paper proposes a system with improved facilities that can overcome the limitations of the existing system. The proposed Factors Hyperbolic Tree (FHT) based algorithm, merged with the Bayesian algorithm (unlike lexical matching algorithms), handles spam filtering in a dynamic environment by considering various relevant factors such as age, temperature, etc. A decision is normally made by considering a number of factors, together with specific reasoning and the keywords. To express the relationship among decisions, factors and objects, an FHT is used, which can describe every object of interest with its related factors. The outcomes of the experiments show that this approach filters e-mail effectively and improves the degree of filtering precision. In this paper, the proposed FHT algorithm, which filters e-mail by conditional factors instead of word matching, is enhanced. The enhanced FHT is tested and shown to be apt for semantic analysis and the decision-support field. The experimental results show that the proposed enhanced technique filters e-mail more proficiently, and with a better degree of precision, than the FHT algorithm. Experimentally, the enhanced FHT is a more practical technique for use in anti-spam filters; it outperforms the previous techniques, which indicates that it is more robust.

Key Words: Spam, Ranked Term Frequency, Factors Hyperbolic Tree, Bayesian algorithm

1 INTRODUCTION

In the hands of a cybercriminal, spam not only brings personal gain; it also opens Pandora's box, leading to the theft of a person's identity, the sale of contraband, the stalking of victims, the disruption of operations with malicious programs, or the dissemination of malicious content. While the entire world strives to make a difference in every line of work, spam upends everything with its payloads. Spam, the unsolicited bulk mail, is usually sent to many recipients, with payloads of viruses and Trojans. Let us take a quick look at the significance and history of email spam and the filtering techniques used.

1.1 HISTORY OF EMAIL SPAM

The first documented spam was a note promoting the availability of a new model of Digital Equipment Corporation (DEC) computers; it was sent by Gary Thuerk to 393 recipients on ARPANET in 1978. His assistant, Carl Gartley, wrote a single mass e-mail, and the feedback from the net community was violently negative. On USENET, the world's biggest online conferencing system, the computer nerds (who were big fans of Monty Python) recognized the mass mailing as spam, and the name caught on. Since this first recorded instance, spam incidents have grown by quantum jumps. This unsolicited email was called spam. Table 1 in the annexure gives a quick view of the spam timeline.

1.2 WHY DO WE CALL UNWANTED EMAILS "SPAM"?

The genesis of the term is amusing. The word originates from Monty Python's famous Spam sketch, which takes place in a greasy spoon cafe with a menu in which every single dish features spam. The inexorable refrain of the skit ("spam, spam, spam, spam") and the way unwanted mail just keeps pouring out of the PC forged the connection between the sketch and email. As the scene develops, you can see why the sketch is a great analogy for uninvited and unwelcome email. Moreover, the enthusiasts who populated the PC world were ardent Python fans. To date, the term spam is used for the uninvited junk mail that makes up around 80 to 85% of email.

1.3 MOTIVATION

Surf Control's Anti-Spam Prevalence Study (2002) points out that there are many reasons why unsolicited commercial e-mail is such a problem:

• Consumer perception: For many people, accessing e-mail is still a bit of a struggle, especially at peak traffic times, and network congestion can make it a laborious task simply to download one's e-mail. [Surf Control "Anti-Spam Prevalence Study" (2002) (last accessed 19 June 2005)]

• Cost shifting: The low cost of e-mail is the main reason for its soaring use. From the advertiser's perspective, sending hundreds or thousands of messages per hour is cheaper still. But the costs of receiving it can be quite substantial: they range from long-distance or per-minute access charges for dialing into an Internet service provider (ISP), to the cost of connectivity and disk storage space at the ISP, to the inevitable administrative costs when the incoming flood outstrips capacity and results in system outages. There are also opportunity costs, since opportunities may be lost because of system outages, delayed services, and overflowing mailboxes. [Surf Control "Anti-Spam Prevalence Study" (2002) (last accessed 19 June 2005)]

• Fraud: Several surveys have found that consumers hate receiving spam, and many ISPs have taken a variety of costly steps to reduce the volume of spam transmitted through their systems, including building up extra capacity to accommodate the demands of filtering and storage.

• Global implications: Email is a wonderful tool for professional and personal communication, and it has even more far-reaching potential that may be lost if the medium's functionality and utility are destroyed by the proliferation of junk e-mail. The Internet is an incredible tool for spreading information critical to the development of freedom and democracy around the world. [Surf Control "Anti-Spam Prevalence Study" (2002) (last accessed 19 June 2005)]

• Harm to the marketplace: An email message from a spammer travels to millions of people via numerous other systems en route to its destinations, once again shifting cost away from the originator. The carriers in between suddenly find themselves bearing the burden of carrying advertisements for the spammer. [Surf Control "Anti-Spam Prevalence Study" (2002) (last accessed 19 June 2005)]

• Theft: Sending spam results in one party imposing costs on another, against that party's will and without permission. Some have called unsolicited e-mail a form of postage-due marketing or a form of theft. [Surf Control "Anti-Spam Prevalence Study" (2002) (last accessed 19 June 2005)]

This paper is organized as follows. In Section 2, the perils of spam are briefly reviewed, and the relationship of spam with privacy and security is presented in Section 3. Possible solutions and anti-spam techniques are dealt with in Sections 4 and 5. Related work is reviewed in Section 6. The notions of the Hyperbolic Tree, the Factor Hyperbolic Tree (FHT) and Ranked Term Frequency (RTF) are explained in Sections 7 and 8. The analysis and results, the performance evaluation and the conclusions are given in Sections 9, 10 and 11 respectively.


2 WHY IS SPAM A PROBLEM?

The most hostile aspect of spam e-mail is that it not only assaults users without their assent but also invades the user's e-mail space, wastes network capacity, and consumes considerable time in checking and deleting spam mails. To look at the tip of the iceberg, a few issues arising from spam are discussed below.

2.1 DAMAGE CAUSED BY SPAM

The direct damages caused by spam are loss of productivity and the use of corporate network resources such as bandwidth and disk space. The indirect damages include the risk of spam accountability shifting from one person or domain to another, the risk of being identified as a spammer by servers to which spam has been sent without one's knowledge, and the risk of accidentally deleting important valid messages when eradicating spam.

Spam is also a means of propagation for malware that has no propagation mechanism of its own: Trojans, key loggers, backdoors, etc.

There are cases in which criminals send emails with attachments containing malware and use it to access information stored on the recipient's computer. This scam, known as the 'invoice' email scam, steals bank details: the malicious software logs the victim's online banking details, along with other financial information, and sends them to the criminals.

Trolls (http://www.digitalethics.org/essays/end-anonymous - emails) are individuals who hide their identity behind an anonymous email address and derive pleasure from creating an atmosphere of hate and discontent, ranging from infrequent off-color remarks to full-blown daily bullying with far-reaching and sometimes tragic consequences. Just this year, a young girl committed suicide after being cyberbullied by anonymous posters on the site www.ask.fm.

3 SECURITY, PRIVACY AND SPAM

Security and privacy are like two eyes: both are indispensable. There are several mechanisms for preserving privacy [Gitanjali J. (2007, 2008, 2009), Indumathi J. (2012, 2013a,b,c), Indumathi J., Uma G.V. (2007a,b, 2008a,b,c,d), Murugesan K. (2009, 2010a,b), Prakash D. (2009), Satheesh Kumar K. (2008)].

ePrivacy Directive

The Directive on Privacy and Electronic Communications (2002/58/EC), part of the EU's eCommunications regulatory framework, came into force in July 2002. The ePrivacy Directive protects the privacy and the personal data of natural persons (and the legitimate interests of legal persons) when using communications services. It also bans spam and spyware. (https://iapp.org/media/pdf/resource center/ePR 2018-04-13 .pdf)

Does the Spam Filter Compromise the Privacy of My Email?

A spam filtering mechanism has to be devised so that the spam control system does not jeopardize the privacy of clients' email messages. The spam filter software should run on mail servers, be fully automated and require minimal human intervention, and should not save any data about the messages after computing their spam scores.

4 SOLUTIONS

We can keep the spam problem at bay and limit it to tolerable levels. The research community has been striving continuously to detect and eliminate spam for quite a few years. Several techniques have shown beneficial but limited results, and most only for a short time. Hence, any new anti-spam proposal needs an array of complementary techniques and sustained effort to adapt them, as spammers continue to adapt their own methods. Each solution should be judged in terms of its likely incremental benefit, rather than as a candidate for the Final Ultimate Solution to the Spam Problem (FUSSP).

4.1 How Anti-Spam Can Save Your Business?

(https://afterglowprod.com/security/4-important-reasons-to-use-anti-spam-filtering-in-your-business/)

Block threats: The purpose of a spam filter is to block spam from ever reaching the email client. The best solution is to detect spam-borne malware automatically and delete it or hold it securely. Such malware may be activated either instantly or after a delay.


Filter legitimate emails: Genuine mail has to be recognized as such and shouldn't be trashed. Anti-spam filtering has sophisticated recognition capabilities that block only spam and let genuine email land securely in mailboxes.

Meet data regulations: Businesses are subject to strict privacy and data storage regulations, and have to meet conditions that include using spam filtering to reduce the risk of a data breach.

Protect your business reputation: Unless clients' data is fully protected, the company faces financial loss, its business reputation takes a nosedive, and it has to admit the breach. Anti-spam filtering can help ensure that these scenarios don't happen to you.

5 ANTI-SPAM TECHNIQUES

To tackle spam we have to devise protection mechanisms. A quick glance at the anti-spam techniques follows.

Various anti-spam techniques are used to prevent email spam. No technique is a complete solution to the spam problem, and each involves a trade-off between incorrectly rejecting legitimate email (false positives) and failing to reject spam (false negatives), along with the associated costs in time and effort.
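The trade-off between false positives and false negatives can be made concrete with a few standard measures. The following is a minimal sketch, not part of the proposed system; the class name `FilterOutcome` and all counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class FilterOutcome:
    """Counts from running a spam filter on a labelled test set."""
    true_positives: int   # spam correctly rejected
    false_positives: int  # legitimate mail incorrectly rejected
    true_negatives: int   # legitimate mail correctly delivered
    false_negatives: int  # spam that reached the inbox

    def precision(self) -> float:
        """Share of rejected messages that really were spam."""
        flagged = self.true_positives + self.false_positives
        return self.true_positives / flagged if flagged else 0.0

    def recall(self) -> float:
        """Share of all spam that the filter rejected."""
        spam = self.true_positives + self.false_negatives
        return self.true_positives / spam if spam else 0.0

    def false_positive_rate(self) -> float:
        """Share of legitimate mail that was wrongly rejected."""
        legit = self.false_positives + self.true_negatives
        return self.false_positives / legit if legit else 0.0


# Hypothetical counts: a strict filter and a lenient one on 100 spam
# and 100 legitimate messages.
strict = FilterOutcome(true_positives=95, false_positives=10,
                       true_negatives=90, false_negatives=5)
lenient = FilterOutcome(true_positives=80, false_positives=1,
                        true_negatives=99, false_negatives=20)

for name, outcome in (("strict", strict), ("lenient", lenient)):
    print(name, round(outcome.precision(), 3), round(outcome.recall(), 3),
          round(outcome.false_positive_rate(), 3))
```

In this illustration the strict filter catches more spam but rejects more legitimate mail, while the lenient filter does the opposite; which is preferable depends on the relative cost of each kind of error.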

The anti-spam techniques can also be discussed under four broad groups:

• Anti-spam techniques which assist individual actions,
• Anti-spam techniques programmed by email administrators,
• Anti-spam techniques automated by email senders and
• Anti-spam techniques employed by researchers and law enforcement officials.

The different approaches which filter data by automatically identifying and eradicating the unwanted messages are:

• Knowledge-based techniques,
• Clustering techniques,
• Learning-based techniques,
• Heuristic processes.

Existing email spam filtering systems draw on:

• Machine Learning Techniques (MLT) such as Naive Bayes, SVM, K-Nearest Neighbor, Bayes Additive Regression, KNN Tree, and rules.

An email comprises information about the sender, the receiver, and the message, and there are different mechanisms to safeguard each of them. Header forging is dealt with by providers such as Google, Yahoo, and Microsoft, which spend money and resources on it. Our protection is usually determined by the email service we use, as each service uses different types of protection.

Types of Filters based on location

Depending on where the filter sits on the message's path from the sender to the subscriber's inbox, the various types of filters that influence deliverability and inbox placement can be categorized as gateway spam filters, third-party (or hosted) spam filters and desktop spam filters.

Types of filtering analysis

The main types of filtering analysis look at four main aspects of mail when making filtering decisions: the source of the mail, the reputation of the sender, the content of the mail they send, and subscriber engagement.

6 RELATED WORK

An extensive comparison of the different methods has been made in many papers [Androutsopoulos et al. (2000), H. Katirai (1999), K. Mock (2001), Zhang et al. (2004)].

Among the several techniques proposed are the methods of Fiaidhi et al. [2013] and Arora et al. [2014]; it is estimated that 70% of today's business emails are spam [Scholar, M. (2010)].

Knowledge engineering and machine learning are two subdivisions of spam filtering. Knowledge engineering relies on rules defined in advance to identify spam emails, whereas machine learning does not need any rules defined beforehand.

As the majority of spam filtering methods utilize text techniques [Chang et al. (2009)], features extracted from an email are passed to classifiers. Many available models use machine learning algorithms [Asuncion et al. (2007), Ian H. Witten et al. (2005), T.S. Guzella (2009)]. There are systems that use automatic classification of emails [4], decision-based systems [C. Wu (2009)], Bayesian classifiers [Yue Yang et al. (2007)], support vector machines [I. Anderouysopoulos et al. (2000a,b)], neural networks [Yue Yang et al. (2007)] and sample-based methods [F. Fdez Riverola et al. (2007)].

The literature also indicates that various feature selection methods are used in e-mail classification. However, classifying with only a small set of discriminative features is preferred in view of processing complexity.

Three notable information visualization innovations are tree maps, cone trees, and hyperbolic trees. Lamping et al. [1995] introduced the hyperbolic tree, which the Factors Hyperbolic Tree (FHT) uses. Hyperbolic trees employ hyperbolic space, which innately has "more room" than Euclidean space. Hyperbolic trees have been patented in the U.S. by Xerox [US patent 5590250, Lamping].

Charts are ideal for exhibiting single values and simple sequences of values. The case is different when hierarchical data needs to be displayed. An array of problems arises when users need to visualize the information relating to each individual node and the existence (and possibly the nature) of the relationships between nodes. The problem is further compounded with very large trees, where the display becomes cluttered. To overcome this limitation a Hyperbolic Tree (or Hypertree) is used.

7 HYPERBOLIC TREE

The Hypertree reduces the screen space used to display nodes exponentially as the distance from the centre of the chart increases. Thus the Hypertree can pack much more information into the limited available space than a standard tree graph. Hyperbolic Trees (or Hyper Trees, or H-Trees) are a metaphor for the graphical representation of ontologies. A hyperbolic tree originates from a central point (node/general category) and then branches off into subcategories/outgoing relations. Every subcategory continues to branch out, so that there are multiple levels of specificity.
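The exponential shrinking of nodes can be illustrated numerically. The sketch below assumes that the hyperbolic plane (curvature -1) is drawn in the unit Poincare disk, which is one common way of laying out a hypertree; the function names `disk_radius` and `display_scale` are introduced here for illustration only.

```python
import math


def disk_radius(hyperbolic_distance: float) -> float:
    """Euclidean radius, in the unit Poincare disk, of a point at the
    given hyperbolic distance from the centre (curvature -1)."""
    return math.tanh(hyperbolic_distance / 2.0)


def display_scale(hyperbolic_distance: float) -> float:
    """Factor by which a small object shrinks on screen at that distance.

    The Poincare metric is ds = 2|dx| / (1 - r^2), so a small hyperbolic
    length s is drawn with Euclidean length s * (1 - r^2) / 2.
    """
    r = disk_radius(hyperbolic_distance)
    return (1.0 - r * r) / 2.0


# Nodes one unit of hyperbolic distance apart, moving away from the focus.
for depth in range(6):
    print(f"depth={depth}  radius={disk_radius(depth):.4f}  "
          f"scale={display_scale(depth):.5f}")
```

Under this assumption the scale factor equals 1 / (2 cosh^2(d/2)), which behaves like 2e^(-d) for large d: every node stays inside the unit disk, and nodes far from the focus occupy exponentially less screen space.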

Advantages - The hypertree is dynamic and has good focus: it provides the user with a main focal point in the whole hierarchy, and users can select their own navigation direction. It is useful for showing diverse discrete levels of detail for one category.

Disadvantages - It has copyright issues. Because of the hyperbolic distortion, some nodes are too close to the border and therefore may not be visible to all.

8 IMPLEMENTATION

8.1 FACTOR HYPERBOLIC TREE (FHT)

In view of these trade-offs, we have adopted the Factor Hyperbolic Tree (FHT) for anti-spam filtering. The FHT-based algorithm considers dynamic factors to filter spam, and it avoids the use and upkeep of white-lists and black-lists. Objects frequently have relevant factors that affect their value. The FHT model expresses the relationship between objects, their factors and decisions. To implement the model a large database is essential, and it should be periodically updated with knowledge about the various factors. A decision is taken by considering the factors together with their reasoning. Issues such as digitizing the relationship between objects and factors, retrieving relevant factors from a huge database, and choosing the types of factor to include should be addressed before implementation.

8.1.1 CLASSIFIED FACTORS

The factors can be classified into two types:

• Normal Factors - The factors that change according to time, naturally or periodically.
• Abnormal Factors - The factors that occur suddenly and affect the object quickly.

Fuzzy logic is used to score the e-mail. This score is compared with a predefined value to check whether the mail is spam or not. The influencing factor in the algorithm is the object's attribute, which is unique to every object.


[Source:https://www.ontology4.us/Ontology4/Visualization/Hyperbolic-Trees/index.html]

8.2 RANKED TERM FREQUENCY [RTF]

The Ranked Term Frequency algorithm extracts features from the mail. Generally, nouns are retrieved from the mail and checked for how important each word is to the document. A term is ranked according to the number of times it appears in the document, and this count is normalized. A high weight in RTF is reached by a high term frequency, so the weights tend to single out important noun terms. Hence, to capture the real meaning of the document, the first m nouns with the greatest weights are selected for the FHT process.
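The ranking step can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation used in the experiments: the function name `ranked_term_frequency` is hypothetical, the nouns are assumed to have been extracted already, and the count is normalized by the total number of noun tokens, which is only one possible normalization since the text does not fix one.

```python
from collections import Counter


def ranked_term_frequency(nouns: list[str], m: int) -> list[tuple[str, float]]:
    """Return the m highest-weighted terms with normalized frequencies.

    `nouns` is assumed to be the list of noun tokens already extracted
    from one e-mail (the extraction step itself is not shown).
    """
    counts = Counter(term.lower() for term in nouns)
    total = sum(counts.values())
    if total == 0:
        return []
    # Assumption: normalize each count by the total number of noun tokens.
    weights = {term: count / total for term, count in counts.items()}
    # Rank by weight (descending), breaking ties alphabetically.
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:m]


# Hypothetical noun list from a single message.
nouns = ["offer", "loan", "offer", "bank", "loan", "offer", "weather"]
top = ranked_term_frequency(nouns, m=2)
print(top)
```

For this hypothetical message the two selected terms are "offer" and "loan", which would then be passed to the FHT as features.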

a. FHT based algorithm

1. Extract features using RTF and input them to the FHT.

2. Search the FHT for factors related to the features and output the factors to the Fuzzy Logic Unit in two types: normal factors and abnormal factors.

3. The Fuzzy Logic Unit computes on the normal factors to obtain one partial result; the abnormal factors are then processed to obtain another partial result.

4. The Weighted Computation Unit combines the two partial results using predefined weights to obtain a final decision.

Factorial analysis is applied during step 3 to set up the rule base optimally. We test the final output against previously known spam to verify the spam e-mails identified by the FHT.
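Steps 2 to 4 can be sketched in a few lines. The sketch below is purely illustrative and rests on several assumptions that are not taken from the experiments: a small dictionary (`FACTOR_TABLE`) stands in for the FHT and its factor database, a plain average of hypothetical "spam degrees" stands in for the Fuzzy Logic Unit and its rule base, and the weights and threshold are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class Factor:
    name: str
    kind: str           # "normal" or "abnormal"
    spam_degree: float  # hypothetical degree of "spam-likeness", 0.0 to 1.0


# Hypothetical stand-in for the FHT lookup: feature -> related factors.
FACTOR_TABLE = {
    "loan": [Factor("interest-rate season", "normal", 0.4),
             Factor("sudden mass campaign", "abnormal", 0.9)],
    "offer": [Factor("holiday sales period", "normal", 0.3)],
}


def partial_result(factors, kind):
    """Average the spam degrees of one factor type (0.0 if none found).

    A real Fuzzy Logic Unit would apply a rule base here; averaging is
    only a placeholder.
    """
    degrees = [f.spam_degree for f in factors if f.kind == kind]
    return sum(degrees) / len(degrees) if degrees else 0.0


def fht_decision(features, w_normal=0.5, w_abnormal=0.5, threshold=0.5):
    """Steps 2 to 4: look up factors, score each type, combine with weights."""
    factors = [f for term in features for f in FACTOR_TABLE.get(term, [])]
    normal = partial_result(factors, "normal")
    abnormal = partial_result(factors, "abnormal")
    score = w_normal * normal + w_abnormal * abnormal
    return score, score >= threshold


score, is_spam = fht_decision(["loan", "offer"])
print(score, is_spam)
```

Changing `w_normal` and `w_abnormal` changes how strongly sudden (abnormal) factors influence the decision, which corresponds to varying the ratio of normal to abnormal factor weights.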

b. Bayesian Algorithm

Bayesian spam filtering is a commonly used technique to identify and filter spam. It entails manual intervention, as a user trains the filter on what is spam. The tool applies Bayes' theorem, using probabilities to weigh a message. Gmail and Yahoo use Bayesian spam filtering techniques. A Bayesian spam filter looks for combinations of words that are statistically likely to occur in spam messages, and for words that are statistically likely to occur in legitimate messages, in order to determine the probability that an e-mail is spam or legitimate.

The Bayesian filter works on the principle that most events are dependent and that the probability of an event occurring in the future can be inferred from the occurrences of this event in the past. This approach is used to identify spam: if some piece of text occurs mostly in spam emails but not in legitimate mail, then it is reasonable to suppose that an email containing it is probably spam.
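The principle can be illustrated with a small word-level naive Bayes classifier. This is a minimal sketch of the general technique, not the filter used in the experiments; the class name `NaiveBayesSpamFilter`, the whitespace tokenization, the add-one smoothing and the training messages are all assumptions made for illustration.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Word-level naive Bayes with add-one (Laplace) smoothing.

    At least one message of each label must be trained before scoring.
    """

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        self.word_counts[label].update(text.lower().split())
        self.message_counts[label] += 1

    def spam_probability(self, text: str) -> float:
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Log prior: fraction of training messages with this label.
            score = math.log(self.message_counts[label] / total_messages)
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                count = self.word_counts[label][word]
                # Add-one smoothing keeps unseen words from zeroing the product.
                score += math.log((count + 1) / (total_words + len(vocabulary)))
            log_scores[label] = score
        # Convert the two log scores back to a normalized probability.
        peak = max(log_scores.values())
        exp = {k: math.exp(v - peak) for k, v in log_scores.items()}
        return exp["spam"] / (exp["spam"] + exp["ham"])


# Hypothetical training data.
spam_filter = NaiveBayesSpamFilter()
spam_filter.train("win free money now", "spam")
spam_filter.train("free prize claim now", "spam")
spam_filter.train("meeting agenda for monday", "ham")
spam_filter.train("project report attached", "ham")

print(spam_filter.spam_probability("free money"))
print(spam_filter.spam_probability("monday meeting report"))
```

With this toy training set, a message made of words seen mostly in spam receives a spam probability above 0.5, and a message made of words seen mostly in legitimate mail receives one below 0.5.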

9 ANALYSIS AND RESULT

The experimental results show that the FHT algorithm, when combined with the Bayesian algorithm, filters out spam with high precision. Furthermore, the FHT algorithm is more efficient than other methods when it filters e-mails with complex influencing factors, and when it is paired with Bayesian techniques spam is detected more easily and with more precision. The main feature is that the FHT-based algorithm filters e-mails based on influencing factors instead of matched words, which allows dynamic filtering of spam e-mails. The Bayesian algorithm also matches keywords, so a spam message that gets past the FHT algorithm can be caught by the Bayesian algorithm, and vice versa. Pairing the FHT and Bayesian algorithms thus helps ensure that spam does not go undetected.


10 PERFORMANCE AND EVALUATION

In order to measure the performance of the FHT-based algorithm and compare it with the Bayesian algorithm, the following experimental setup is used.

• Experiment environment: Windows XP Professional
• Development tools: VS2005, Microsoft Access 2003.

A. Results and evaluation

The result of the enhanced e-mail filter is based on the weights of normal and abnormal factors. Varied ratios of the normal and abnormal factors produced different outputs. In this experiment, the ratio is set to a static value and several groups of e-mails are tested. The outputs are verified for x1 spam e-mails and x2 legitimate e-mails. Based on the correct classifications of spam and legitimate e-mail, it is found that the proposed enhanced FHT algorithm shows a lower correct filtering percentage than the FHT and Bayesian algorithm. At the same time, the enhanced FHT algorithm presents a higher filtering percentage than the FHT algorithm, which indicates that the enhanced FHT algorithm is more efficient for filtering spam than the FHT algorithm in our experiment.


11 CONCLUSION

The outcomes of the experiments show that this approach filters e-mail effectively and improves the degree of filtering precision. In this paper, the proposed FHT algorithm, which filters e-mail by conditional factors instead of word matching, is enhanced. The enhanced FHT is tested and shown to be apt for semantic analysis and the decision-support field. The experimental results show that the proposed enhanced technique filters e-mail more proficiently and improves the degree of filtering precision. Experimentally, the enhanced FHT is a more practical technique for use in anti-spam filters, particularly when information related to existing environment factors is not precisely captured by lexical matching algorithms.

References

[1] Anderouysopoulos, J. Koutsias, K.V. Chandrianos, G. Paliouras, C. Spyrpolous, An evaluation of naive Bayesian anti-spam filtering, in: Proceedings of the 11th European Conference on Machine Learning, 2000.

[2] Anderouysopoulos, J. Koutsias, K.V. Chandrianos, G. Paliouras, C. Spyrpolous, An experimental comparison of naive Bayesian and keyword-based anti-spam filtering with personal e-mail messages, in: Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2000.

[3] Androutsopoulos, G. Paliouras, V. Karkaletsis, G. Sakkis, C.D. Spyropoulos, and P. Stamatopoulos. Learning to filter spam e-mail: A comparison of a naive Bayesian and a memory-based approach. In H. Zaragoza, P. Gallinari, and M. Rajman, editors, Proceedings of the Workshop on Machine Learning and Textual Information Access, 4th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 2000), pages 1-13, 2000.

[4] Yue Yang and Sherif Elfayoumy. Anti-Spam filtering using neural networks and Bayesian classifiers. Proceedings of the 2007 IEEE International Symposium on Computational Intelligence in Robotics and Automation.

[5] Asuncion, D. Newman, UCI Machine Learning Repository, 2007. http://www.ics.uci.edu/mlearn/MLRRepository.html.

[6] Saadat Nazirova, Institute of Infotech Technology of Azerbaijan National Academy of Science, Baku, Azerbaijan. Communication and Network, 2011, 3, 153-160. Accepted May 15, 2011; published online August 2011 (http://www. SciRP.org/journal/cn).

[7] Bin Wang, Gareth J. F. Jones, Wenfeng Pan, "Using Online Linear Classifiers to Filter Spam Emails", Springer-Verlag London Limited 2006. Published online: 3 October 2006.

[8] Enrico Blanzieri, Anton Bryl, "A Survey Of Learning-Based Techniques of Email Spam Filtering", Springer Science+Business Media B.V. 2009. Published online: 10 July 2009.

[9] F. Fdez Riverola, E. Iglesias, F. Diaz, J.R. Mendez, J.M. Chorchodo, Spam hunting: an instance-based reasoning system for spam labeling and filtering, Decis. Support Syst. 43 (3) (2007) 722-736.

[10] Gitanjali J., Banu S.N., Indumathi J., Uma G.V. (2008), A Panglossian Solitary-Skim Sanitization for Privacy Preserving Data Archaeology, International Journal of Electrical and Power Engineering, Vol. 2, No. 3, pp. 154-165.

[11] Gitanjali J., Md. Rukunuddin Ghalib, Murugesan K., Indumathi J., Manjula D. (2009), An Object-Oriented Scaffold Premeditated For Privacy Preserving Data Mining of Outsourced Medical Data, International Journal of Software Engineering and Its Applications. Accepted for publication. In press. 2009.

[12] Gitanjali J., Md. Rukunuddin Ghalib, Murugesan K., Indumathi J., Manjula D. (2009), A Hybrid Scheme Of Data Camouflaging For Privacy Preserved Electronic Copyright Publishing Using Cryptography And Watermarking Technologies, International Journal of Security and Its Applications.

[13] Gitanjali J., Shaik Nusrath Banu, Geetha Mary A., Indumathi J., Uma G.V. (2007), An Agent Based Burgeoning Framework for Privacy Preserving Information Harvesting Systems, International Journal of Computer Science and Network Security, Vol. 7, No. 11, pp. 268-276.

[14] H. Katirai (1999). Filtering junk e-mail: A performance comparison between genetic programming and naive Bayes.

[15] Harisinghaney, A., Dixit, A., Gupta, S., & Arora, A. (2014, February). Text and image based spam email classification using KNN, Naive Bayes and Reverse DBSCAN algorithm. In Optimization, Reliability, and Information Technology (ICROIT), 2014 International Conference on (pp. 153-155). IEEE.

[16] Harisinghaney, A., Dixit, A., Gupta, S., & Arora, A. (2014, February). Text and image based spam email classification using KNN, Naive Bayes and Reverse DBSCAN algorithm. In Optimization, Reliability, and Information Technology (ICROIT), 2014 International Conference on (pp. 153-155). IEEE.

[17] Ian H. Witten, Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques, second ed., Elsevier, 2005.

[18] Indumathi J. (2012), A Generic Scaffold Housing The Innovative Modus Operandi For Selection Of The Superlative Anonymisation Technique For Optimized Privacy Preserving Data Mining, Chapter 6 of the book Data Mining Applications in Engineering and Medicine, edited by Adem Karahoca, InTech; ISBN: 9535107200 9789535107200; 335 pages; pp. 133-156.

[19] Indumathi J. (2013a), Amelioration of Anonymity Modus Operandi for Privacy Preserving Data Publishing, Chapter 7 of book Network Security Technologies: Design and Applications. Abdelmalek Amine (Tahar Moulay University, Algeria), Otmane Ait Mohamed (Concordia University, USA) and Boualem Benatallah (University of New South Wales, Australia). Release Date: November, 2013. Copyright 2014. 330 pages; pp. 96-107.

Indumathi J. (2013b), An Enhanced Secure Agent-Oriented Burgeoning Integrated Home Tele Health Care Framework for the Silver Generation, Int. J. Advanced Networking and Applications, Volume: 04, Issue: 04, Pages: 16-21, Special Issue on Computational Intelligence A Research Perspective held on 21st-22nd February, 2013.

[20] Indumathi J. (2013c), State-of-the-Art in Reconstruction-Based Modus Operandi for Privacy Preserving Data Dredging, Int. J. Advanced Networking and Applications, Volume: 04, Issue: 04, Pages: 9-15, Special Issue on Computational Intelligence A Research Perspective held on 21st-22nd February, 2013.

[21] Indumathi J., Uma G.V. (2007a), Customized Privacy Preservation Using Unknowns to Stymie Unearthing Of Association Rules, Journal of Computer Science, Vol. 3, No. 12, pp. 874-881.

[22] Indumathi J., Uma G.V. (2007b), Using Privacy Preserving Techniques to Accomplish a Secure Accord, International Journal of Computer Science and Network Security, Vol.7, No.8, pp. 258-266.

[23] Indumathi J., Uma G.V. (2008a), A Bespoked Secure Framework for an Ontology-Based Data-Extraction System, Journal of Software Engineering, Vol. 2, No. 2, pp. 1-13.

[24] Indumathi J., Uma G.V. (2008b), A New flustering approach for Privacy Preserving Data Fishing in Tele-Health Care Systems, International Journal of Healthcare Technology and Management. Special Issue on: "Tele-Healthcare System Implementation, Challenges and Issues." Vol.9 No.5-6, pp.495-516(22).

[25] Indumathi J., Uma G.V. (2008c), A Novel Framework for Optimized Privacy Preserving Data Mining Using the innovative Desultory Technique, International Journal of Computer Applications in Technology; Special Issue on: "Computer Applications in Knowledge-Based Systems". In press. 2008. Vol.35 Nos.2/3/4, pp.194-203.

[26] Indumathi J., Uma G.V. (2008d), An Aggrandized Framework For Genetic Privacy Preserving Pattern Analysis Using Cryptography And Contravening - Conscious Knowledge Management Systems, International Journal of Molecular Medicine and Advance Sciences. Vol. 4, No. 1, pp.33-40.

[27] Int. J. Communications, Network and System Sciences, 2013, 6, 88-99, http://dx.doi.org/10.4236/ijcns.2013.62011, Engineering Department, Faculty of Engineering, Aswan University, Aswan, Egypt.

[28] Rekha, Sandeep Negi, A Review on Different Spam Detection Approaches, International Journal of Engineering Trends and Technology (IJETT), Volume 11, Number 6, May 2014, ISSN: 2231-5381, http://www.ijettjournal.org, Page 315.

[29] R. Malarvizhi, K. Saraswathi, Content-Based Spam Filtering and Detection Algorithms - An Efficient Analysis & Comparison, International Journal of Engineering Trends and Technology (IJETT), Volume 4, Issue 9, Sep 2013, ISSN: 2231-5381, http://www.ijettjournal.org, Page 4237.

[30] Savita Teli, Santoshkumar Biradar, Effective Spam Detection Method for Email, IOSR Journal of Computer Science (IOSR-JCE), e-ISSN: 2278-0661, p-ISSN: 2278-8727, PP 68-72, www.iosrjournals.org, International Conference on Advances in Engineering & Technology 2014 (ICAET-2014).

[31] K. Mock. An experimental framework for email categorization and management. In 24th Annual ACM International Conference on Research and Development in Information Retrieval, New Orleans, LA, September 2001.

[32] L. Zhang, J. Zhu, and T. Yao. An evaluation of statistical spam filtering techniques. ACM Transactions on Asian Language Information Processing (TALIP), 3(4):243-269, 2004.

[33] Lamping, John; Rao, Ramana; Pirolli, Peter (1995). A focus+context technique based on hyperbolic geometry for visualizing large hierarchies. Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI 1995). pp. 401-408.

[34] M. Chang, C.K. Poon, Using phrase as features in email classification, J. Syst. Softw. 82 (2009) 1036-1045.

[35] Mohammed, S., Mohammed, O., Fiaidhi, J., Fong, S. J., & Kim, T. H. (2013). Classifying Unsolicited Bulk Email (UBE) using Python Machine Learning Techniques.

[36] Mohammed, S., Mohammed, O., Fiaidhi, J., Fong, S. J., & Kim, T. H. (2013). Classifying Unsolicited Bulk Email (UBE) using Python Machine Learning Techniques.

[37] Murugesan K., Gitanjali J., Indumathi J., Manjula D. (2009), Sprouting Modus Operandi for Selection of the Best PPDM Technique for Health Care Domain, International Journal Conference in recent trends in computer science. Vol.1, No.1, pp. 627-629.

[38] Murugesan K., Indumathi J., Manjula D. (2010a), An Optimised Intellectual Agent Based Secure Decision System For Health Care, International Journal of Engineering Science and Technology, Vol. 2(8), 2010, 3662-3675.

[39] Murugesan K., Indumathi J., Manjula D. (2010b), A Framework for an Ontology-Based Data-Gleaning and Agent Based Intelligent Decision Support PPDM System Employing Generalization Technique for Health Care, International Journal on Computer Science and Engineering, Vol. 02, No. 05, 2010, 1588-1596.

[40] Prakash D., Murugesan K., Indumathi J., Manjula D. (2009), A Novel Cardiac Attack Prediction and Classification using Supervised Agent Techniques, In the CiiT International Journal of Artificial Intelligent Systems and Machine Learning, May 2009. Vol.1, No.2, P.59.

[41] Satheesh Kumar K., Indumathi J., Uma G.V. (2008), Design of Smoke Screening Techniques for Data Surreptitiousness in Privacy Preserving Data Snooping Using Object Oriented Approach and UML, IJCSNS International Journal of Computer Science and Network Security, Vol.8 No.4, pp.106-115.

[42] Scholar, M. (2010). Supervised learning approach for spam classification analysis using data mining tools. organization, 2(8), 2760-2766.

[43] Scholar, M. (2010). Supervised learning approach for spam classification analysis using data mining tools. organization, 2(8), 2760-2766.

[44] Surf-Controls Anti-Spam Prevalence Study 2002, URL: http://www.surfcontrol.com/resources/Anti-Spam Study v2.pdf.

[45] T.S. Guzella, T.M. Caminhas, A review of machine learning approaches to spam filtering, Expert Syst. Appl. 36 (2009) 10206-10222.

[46] Vasudevan V., Sivaraman N*., SenthilKumar S*., Muthuraj R*., Indumathi J., Uma G.V. (2007), A Comparative Study of SPKI/SDSI and K-SPKI/SDSI Systems, Information Technology Journal 6(8); pp.1208-1216.

[47] Venkatesh Ramanathan and Harry Wechsler, "phishGILLNET: Phishing Detection Methodology Using Probabilistic Latent Semantic Analysis, AdaBoost, and Co-training", EURASIP Journal on Information Security, 2012:1, 2012.

[48] Wikipedia.org. 2012. Fingerprint (computing). http://en.wikipedia.org/wiki/Fingerprint (computing). Andrew G. West, Avantika Agrawal, etc. 2011. Autonomous link spam detection in purely collaborative environments. In WikiSym 2011: The 7th International Symposium on Wikis and Open Collaboration. Network Security, pp. 1517.

[49] Wu, Behavior-based spam detection using a hybrid method of rule-based techniques and neural networks, Expert Syst. (2009).

[50] http://www.detritus.org/spam/skit.html

[51] http://www.digitalethics.org/essays/end-anonymous-emails

[52] http://www3.cs.stonybrook.edu/ ay-chakrabort/courses/cse508/

[53] https://afterglowprod.com/security/4-important-reasons-to-use-anti-spam-filtering-in-your-business/

[54] https://iapp.org/media/pdf/resource center/ePR 2018-04-13.pdf

[55] https://www.drh.net/blog/2013/09/spam-complaints-cause-effect-cure/

[56] https://www.matchilling.com/comparison-of-machine-learning-methods-in-email-spam-detection/

[57] https://www.semanticscholar.org/paper/Filtering-Spam-by-Using-Factors-Hyperbolic-Trees-Hou-Chen/200d1ea917da5e883504e064b443d5f33682c42e/figure/2

[58] https://www.semanticscholar.org/paper/Filtering-Spam-by-Using-Factors-Hyperbolic-Tree-Hou-Chen/460b9927afd2846d7a566821d2d0cf5fd1f41bea/figure/1

[59] US patent 5590250, Lamping; John O. & Rao; Ramana B., "Layout of node-link structures in space with negative curvature", assigned to Xerox Corporation.
