Privacy-Aware Collaborative Spam Filtering document


Abstract: A filter is the core of decision making in the Privacy-Aware Collaborative Spam Filtering document. A filter decides on a per-mail basis whether a message should be downloaded or not. A pipeline of filters is set up in the configuration, and each message that is a candidate for download is passed through this pipeline. At any point in the pipeline, a filter can indicate that the message should not be processed any further. For example, a sender-based filter could find a match in its list of spammers and reject the message.

There are two kinds of filters: global and local. This is not an attribute of the filter itself, but depends on how the filter is used. Local filters are associated with a single maildrop, whereas global filters apply to all maildrops. For example, you might want a Message-ID filter to apply to all maildrops but keep a sender-based filter only for the maildrop where you expect mail from that sender. A maildrop represents the mailbox from which you want to download your mail.

Each filter has the single job of deciding whether or not to download a single message; the actual decision is made by the sequence of filters, which can combine a global set with a per-maildrop set. This project focuses on the main, basic filters: HeaderMailFilter, MessageIDMailFilter, NullFilter, ReceipientMailFilter, SenderMailFilter, SizeMailFilter and SubjectMailFilter. More filters can be defined as requirements demand.
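The pipeline idea described in the abstract can be sketched in a few lines. The project itself is written in Java; the sketch below uses Python only for brevity, and every name in it (Message, Pipeline, reject_senders, max_size, should_download) is a hypothetical illustration rather than one of the project's classes. It assumes that a filter returns True to let a message continue and False to stop the pipeline.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Message:
    sender: str
    recipients: List[str]
    subject: str
    size: int          # size in bytes
    message_id: str


# Assumption of this sketch: a filter returns True to let the message
# continue through the pipeline and False to reject it.
Filter = Callable[[Message], bool]


def reject_senders(blocked: Set[str]) -> Filter:
    """Sender-based filter: reject mail from a list of known spammers."""
    return lambda msg: msg.sender.lower() not in blocked


def max_size(limit_bytes: int) -> Filter:
    """Size-based filter: reject mail larger than limit_bytes."""
    return lambda msg: msg.size <= limit_bytes


@dataclass
class Pipeline:
    # Global filters apply to every maildrop; local filters to one maildrop.
    global_filters: List[Filter] = field(default_factory=list)
    local_filters: Dict[str, List[Filter]] = field(default_factory=dict)

    def should_download(self, maildrop: str, msg: Message) -> bool:
        chain = self.global_filters + self.local_filters.get(maildrop, [])
        # all() short-circuits, so the first rejecting filter ends the run.
        return all(f(msg) for f in chain)
```

Whether a filter is global or local is decided only by where it is registered in the Pipeline, which mirrors the statement that the distinction is a matter of usage and not a property of the filter.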








The Privacy-Aware Collaborative Spam Filtering document is proposed to be developed with Windows 2000 Server as the operating system and the JavaMail API of the J2EE technologies. The package will have provision for creating your own filters and using them in the appropriate places.

System Analysis:

Existing System: The existing system is not computerized; all mail management is done manually. The administrator maintains the mailboxes of all employees of the organization and is responsible for organizing them. To delete unwanted mails, he checks which mails are unwanted based on facts such as large size, user ID and so on, marks them, and deletes them by hand. This laborious job is to be computerized to make it simple.

Proposed System: The first step of the analysis process involves the identification of need. The success of a system depends largely on how accurately a problem is defined, thoroughly investigated and properly carried out through the choice of solution. This package has been developed to overcome the difficulties encountered while using the manual system. Faster and more timely deletion of unwanted mails is another motivating factor for the development of this package.

Project Scope and Objectives:

The Privacy-Aware Collaborative Spam Filtering document is a tool to delete unwanted mails. A lot of effort was put into making it user friendly.

Optimum utilization of the tool is possible. All basic filters are provided.

It reduces user interaction and wasted time. It also helps in optimum distribution of funds by the management among user groups for procurement of new equipment.

It is flexible: the user (administrator) can easily add any number of his own filters if he wishes.

Company Profile: Global Interactive Solutions has emerged as a world-class solutions and products organization with clientele spread across geographies. It has time and again taken up challenges to accomplish its mission of customer satisfaction, armed with a focused vision and technical expertise. Our growth and success have evolved from our ability to foresee customer challenges and address them with apt solutions. Our teams, comprising research innovators, architects and developers, have constantly worked on developing products, solutions and mission-critical applications. We started with Visual SHIFT, our initial product that addressed the Y2K problem. It received global acclaim and was awarded "Product of the Year" by Datamation under the Y2K product category. Gartner Group, the research and consulting organization, rated Visual SHIFT as "Best in Class". It also won the "Best Product" award from HYSEA (Hyderabad Software Exporters Association). Global Interactive Solutions Technologies is a global software development firm specializing in software testing and product development services, catering to technology companies across diverse industry segments.

Our flexible delivery model helps us offer focused IT solutions, which help our clients respond quickly to their business opportunities. Our clients engage with us to stay ahead in the technology adoption curve and to develop and protect their Intellectual Property Assets. Our Technology Excellence Groups embrace new technologies as they emerge, to provide clients with solutions that give them a competitive edge in their businesses. Leveraging our strengths in Research & Development and our expertise in Component Based Application Development, we have been successfully providing our global clientele with software testing and product development services. With such technology foresight and sophisticated product development and testing expertise, we credit our success to commitment, performance, delivery and customer delight. Our world-class practices and methodologies make us the preferred technology partner for many technology companies.

Our growth comes from our unique business model and the integration of people, processes and technology. We have continually demonstrated our commitment to develop cost-effective, quality products and custom applications built on strict timelines by adopting industry-standard processes. People, experience and skill sets are the ultimate competitive differentiators when it comes to finalizing a Strategic Offshore Outsourcing deal. Global Interactive Solutions is an IT services company that adapts solutions to market requirements. Its people are well qualified and experienced in the technology platforms they work on. Personnel are trained and retrained, which makes them masters in their chosen area.

Consolidating our capabilities in diverse technologies and our solid foundation in product and application development, we built expertise in delivering end-to-end solutions providing Enterprise Application Integration.

Our technical expertise, coupled with functional know-how, has equipped us to collaborate with global organizations to deliver enterprise-wide solutions for business verticals such as Insurance, Retail and Distribution, Consumer Electronics, Healthcare and Utilities. Our clientele comprises organizations of varied sizes, from small and medium companies to Fortune 100 corporations. We act as strategic technology partners for global conglomerates and also provide R&D outsourcing services to international technology labs.

Requirements Specification Document

The Privacy-Aware Collaborative Spam Filtering document is developed with the aim of automatically deleting unwanted mails from the specified maildrops, based on our definitions. It takes all the necessary definitions, in which we state the facts on which mails are to be deleted automatically. The administrator can define those facts to delete the unwanted mails.

1. Introduction

1.1 Purpose: The purpose of this document is to describe all external requirements for the Privacy-Aware Collaborative Spam Filtering document. It also describes the interfaces for the system.

a. To implement the Privacy-Aware Collaborative Spam Filtering document we need a mail server which is capable of storing mail in the corresponding mailboxes. In our project we implemented and tested our filters on the James server, as it is openly available.

b. As a user interface we used Microsoft Outlook Express, because it is user-friendly and makes it easy to access, read and maintain our mails.

c. To send mails we need a protocol capable of sending or delivering mails, and to receive mails we need another protocol to fetch them from our mailboxes. In our project we used SMTP for sending mails and POP3 for receiving them. Both are available in a single mail server, the James mail server we used.

d. Using XML and basic Java we can write the script or code for filters, because XML provides application interoperability.

1.2 Scope: This document describes the requirements of the system. It is meant for use by the developers, and will also be the basis for validating the final system. Any changes made to the requirements in the future will have to go through a formal change approval process. The developer is responsible for asking for clarifications when necessary, and will not make any alterations without the permission of the client.

This project intends to delete unwanted mails from the mailboxes of the organization's personnel. A lot of effort was put into getting this right. The workload of deleting mails is avoided, and the time for processing and deleting mails is considerably reduced. This saves the administrator valuable time, which he can allot to other important activities. The system is also extensible: besides the existing filters, the administrator can easily add his own filters if needed in future. The filters can be applied on other mail servers to drop unwanted mails from specified maildrops. The administrator has two options for deleting mails: he can run the filters manually whenever he wants, or he can set the filters to run automatically on a schedule.


1.3 Definition: A filter is the core of decision making in the Privacy-Aware Collaborative Spam Filtering document. A filter decides on a per-mail basis whether the message should be downloaded or not.

1.4 Reference: Not applicable.

1.5 Developer's responsibilities overview:

The points covered in the system requirements specification are:

1. An introduction, describing mainly the purpose of the system requirements specification document and outlining the scope of the envisaged application.

2. A description of the interactions of the system with its environment, without going into the internals of the system. It also describes the constraints imposed on the system, and is thus outside the envisaged application. The assumptions made are also listed. It is supported by UML diagrams.

3. A description of the internal behaviour of the system in response to the inputs and while generating the outputs. This is also supported by detailed UML diagrams, a list of inputs, a process explanation and a list of outputs.

4. External interface requirements, which include the user, hardware and software interfaces.

5. Performance requirements of the system, and the design constraints, comprising software constraints and hardware constraints.

1.6 Product functions overview: In the organization every employee has a mailbox. Anyone can send any number of mails to the owner of that mailbox. Sometimes we suffer from spam mails, or from lengthy mails which may occupy all the storage allotted to the mailbox. These kinds of mails are controlled by the company administrator, as he is responsible for managing all the mailboxes. He can set constraints on those mailboxes, such as dropping mails of a given kind. These constraints are our filters. By embedding the filters in the company's mail server he can restrict the mails, so there is no need to mark and delete them manually.

In this project the administrator can run the filters on specified mailboxes manually whenever he wants. Alternatively, he can set the filters to run periodically without any further action on his part. Whenever the filters run, they apply the logic already written in a Java file to every mail in all mailboxes or in the specified mailboxes. Based on this logic, the system decides whether to download each mail or not. This automates the deletion of mails.

1.7 In our project the user is an administrator. He must know how to implement or embed these filters on the mail server.



1.8 General constraints: The system should run on a Pentium processor under Windows NT/2000 Professional or Server, or later versions of Microsoft operating systems, with a minimum of 16 MB RAM for better performance. These filters can in fact be applied to any kind of mail server.

1.9 Assumptions and Dependencies:

a. It is assumed that James is a real mail server resource and that the required information already exists in the system.

b. It is assumed that the mail client is Microsoft Outlook Express or Netscape Communicator.

c. All the details supplied by the user are correct.

d. The user will ask for new filters when he wants to filter mails more deeply, or when a situation calling for such filtering arises.

2. Functional Requirements

Functional requirements specify which outputs should be produced from the given inputs. They describe the relationship between the input and output of the system. For each functional requirement, a detailed description of all data inputs, their source and the range of valid inputs must be specified.

All the operations to be performed on the input data to obtain the output should be specified.

2.1 Null Filter: It deletes all kinds of mails irrespective of their characteristics. This filter consumes every message and marks them all for deletion.

Header Mail Filter: Matches a header in the message. This requires the name of the header and the value of the header.

MessageID Mail Filter: Filters messages if they contain a duplicate Message-ID. This filter stores the list of downloaded Message-IDs in the specified file.

Recipient Mail Filter: This filter matches the recipients of the message against those provided in a list.

Sender Mail Filter: This filter matches the sender of the message against those provided in a list.

Size Mail Filter: This filters messages based on their size.
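The matching rules of these six filters can be illustrated with a short sketch. It is written in Python with the standard email package, not in the project's Java code, and the function names and parameters are hypothetical. It assumes that each function returns True when the message matches the filter, and leaves open what the caller then does with a match.

```python
from email.message import EmailMessage
from email.utils import getaddresses, parseaddr
from pathlib import Path

# Assumption of this sketch: each function returns True when the message
# MATCHES the filter. The project may use a different convention.


def null_filter(msg: EmailMessage) -> bool:
    """Matches every message, whatever its characteristics."""
    return True


def header_filter(msg: EmailMessage, name: str, value: str) -> bool:
    """Matches when header `name` contains `value` (case-insensitive)."""
    return value.lower() in str(msg.get(name, "")).lower()


def sender_filter(msg: EmailMessage, senders: set) -> bool:
    """Matches when the From address is in the given set."""
    _, addr = parseaddr(str(msg.get("From", "")))
    return addr.lower() in senders


def recipient_filter(msg: EmailMessage, recipients: set) -> bool:
    """Matches when any To/Cc address is in the given set."""
    fields = msg.get_all("To", []) + msg.get_all("Cc", [])
    found = {addr.lower() for _, addr in getaddresses([str(f) for f in fields])}
    return bool(found & recipients)


def size_filter(msg: EmailMessage, max_bytes: int) -> bool:
    """Matches when the serialized message is larger than max_bytes."""
    return len(msg.as_bytes()) > max_bytes


def message_id_filter(msg: EmailMessage, store: Path) -> bool:
    """Matches duplicates; remembers each new Message-ID in the file `store`."""
    seen = set(store.read_text().split()) if store.exists() else set()
    mid = str(msg.get("Message-ID", "")).strip()
    if mid in seen:
        return True
    if mid:
        with store.open("a") as fh:
            fh.write(mid + "\n")
    return False
```

The Message-ID filter is the only stateful one: like the filter described above, it keeps the IDs it has already seen in a file, so a second delivery of the same message is recognized as a duplicate.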



Log Files:

The server writes log files according to the operations it handles. It also writes an error message if any failure occurs, to indicate where the fault happened. All this information is represented in the form of codes assigned to each operation.

3. External Interface Requirements

3.1 User Interface: Once the filters are embedded in the mail server and all of them are working properly, no user interaction is needed if the administrator has set the filters to run periodically. Otherwise it is the responsibility of the administrator to run them when required. Overall, user interaction is very low.

These interface requirements should specify the interface with other software which the system will use or which will use the system. This includes the interface with the operating system and other applications. The message content and format of each interface should be given.

The hardware interface is very important to the documentation. If the software is to execute on existing hardware or on pre-determined hardware, all the characteristics of the hardware, including memory restrictions, should be specified. In addition, the current use and load characteristics of the hardware should be given.


4. Performance Requirements

All the requirements relating to the performance characteristics of the system must be clearly specified. There are two types of performance requirements: static and dynamic. Static requirements are those that do not impose constraints on the execution characteristics of the system. These include requirements like the number of terminals to be supported, the number of simultaneous users to be supported, and the number of files, and their sizes, that the system has to process. These are also called the capacity requirements of the system. Dynamic requirements specify constraints on the execution behaviour of the system. These typically include response time and throughput constraints on the system.

Performance is measured by processing speed, resource consumption, throughput and efficiency. To achieve good performance, a few practices are to be followed, such as reducing code, making less use of controls, and minimizing repeated data. For a real-time system, software that provides the required function but does not conform to the performance requirements is unacceptable. These requirements are used to test the run-time performance of the software within the context of an integrated system.

5. Design constraints

5.1 Software constraints

Operating System   : Windows2000 Server/NT
Reports            : Log files
Other Applications : James Server or any Mail server

5.2 Hardware Constraints

Pentium Processor : Pentium IV 2.0 GHZ
RAM               : 256 MB
Hard Disk         : 40 GB
Floppy Disk       : 1.44 MB
CD/ROM Drive      : 52 Bit
VDU               : VGA
Key Board         : 101 Standards

6. Acceptance Criteria

Before accepting the system, the developer must demonstrate that the system works on the details of the user email IDs entered in the corresponding files. The developer will have to show through test cases that all conditions are satisfied.

The Java Apache Mail Enterprise Server (a.k.a. Apache James) is a 100% pure Java SMTP and POP3 Mail server and NNTP News server designed to be a complete and portable enterprise mail engine solution. James is based on currently available open protocols. The James server also serves as a mail application platform. The James project hosts the Apache Mailet API, and the James server is a Mailet container. This feature makes it easy to design, write, and deploy custom applications for mail processing. This modularity and ease of customization is one of James' strengths, and can allow administrators to produce powerful applications surprisingly easily. James is built on top of version 4.1.3 of the Avalon Application Framework. This framework encourages a set of good development practices such as Component Oriented Programming and Inversion of Control. The standard distribution of James includes version 4.0.1 of the Phoenix Avalon Framework container. This stable and robust container provides a strong foundation for the James server.

This documentation is intended to be an introduction to the concepts behind the James implementation, as well as a guide to installing, configuring, (and for developers) building the James server.

The James Server

James is an open source project intended to produce a robust, flexible, and powerful enterprise class server that provides email and email-related services. It is also designed to be highly customizable, allowing administrators to configure James to process email in a nearly endless variety of fashions. The James server is built on top of the Avalon Framework. The standard James distribution deploys inside the Phoenix Avalon Framework container. In addition to providing a robust server architecture for James, the use of Phoenix allows James administrators to deploy their own applications inside the container. These applications can then be accessed during mail processing. The James server is implemented as a complete collection of servers and related components that, taken together, provide an email solution. These components are described below.

POP3 Service

The POP3 protocol allows users to retrieve email messages. It is the method most commonly used by email clients to download and manage email messages. The James version of the POP3 service is a simple and straightforward implementation that provides full compliance with the specification and maximum compatibility with common POP3 clients. In addition, James can be configured to require SSL/TLS connections for POP3 clients connecting to the server.

SMTP Service

SMTP (Simple Mail Transfer Protocol) is the standard method of sending and delivering email on the internet. James provides a full-function implementation of the SMTP specification, with support for some optional features such as message size limits, SMTP auth, and encrypted client/server communication.

NNTP Service

NNTP is used by clients to store messages on and retrieve messages from news servers. James provides the server side of this interaction by implementing the NNTP specification as well as an appropriate repository for storing news messages. The server implementation is simple and straightforward, but supports some additional features such as NNTP authentication and encrypted client/server communication.

Fetch POP

Fetch POP, unlike the other James components, is not an implementation of an RFC. Instead, it is a component that allows the administrator to configure James to retrieve email from a number of POP3 servers and deliver it to the local spool. This is useful for consolidating mail delivered to a number of accounts on different machines into a single account.

The Spool Manager, Matchers, and Mailets

James separates the services that deliver mail to James (i.e. SMTP, Fetch POP) from the engine that processes mail after it is received by James. The Spool Manager component is James' mail processing engine. James' Spool Manager component is a Mailet container. It is these mailets and matchers that actually carry out mail processing.

Repositories

James uses a number of different repositories to store both message data (email, news messages) and user information. User repositories store user information, including user names, authentication information, and aliases. Mail repositories store messages that have been delivered locally. Spool repositories store messages that are still being processed. Finally, news repositories are used to store news messages. Aside from what type of data they store, repositories are distinguished by where they store data. There are three types of storage: File, Database, and DBFile.

Remote Manager

James provides a simple telnet-based interface for control. Through this interface you can add and delete users, configure per-user aliases and forward addresses, and shut down the server.

Mailet API: The Mailet API is a simple API used to build mail processing applications. James is a Mailet container, allowing administrators to deploy Mailets (both custom and pre-made) to carry out a variety of complex mail processing tasks. In the default configuration James uses Mailets to carry out a number of tasks that are carried out deep in the source code of other mail servers (e.g. list processing, remote and local delivery). As it stands today, the Mailet API defines interfaces for both Matchers and Mailets.

Matchers, as their name would suggest, match mail messages against certain conditions. They return some subset (possibly the entire set) of the original recipients of the message if there is a match. An inherent part of the Matcher contract is that a Matcher should not induce any changes in a message under evaluation.

Mailets are responsible for actually processing the message. They may alter the message in any fashion, or pass the message to an external API or component. This can include delivering a message to its destination repository or SMTP server. The Mailet API is currently in its second revision. Although the Mailet API is expected to undergo substantial changes in the near future, it is our aim that existing Mailets that abided purely by the prior Mailet API interfaces will continue to run with the revised specification. James bundles a number of Matchers and Mailets in its distribution.
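The division of labour between Matchers and Mailets can be shown with a small model. This is not the Mailet API, which is a Java API; it is a Python illustration, and the names Mail, RecipientDomainMatcher, FooterMailet and process are hypothetical. It assumes a simplified container that, when only some recipients match, hands a copy of the mail for those recipients to the mailet and leaves the rest untouched.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Mail:
    sender: str
    recipients: List[str]
    body: str


class RecipientDomainMatcher:
    """Matcher: returns the recipients in one domain, never alters the mail."""

    def __init__(self, domain: str):
        self.domain = domain.lower()

    def match(self, mail: Mail) -> List[str]:
        suffix = "@" + self.domain
        return [r for r in mail.recipients if r.lower().endswith(suffix)]


class FooterMailet:
    """Mailet: processes the mail, here by appending a footer to the body."""

    def __init__(self, footer: str):
        self.footer = footer

    def service(self, mail: Mail) -> None:
        mail.body += "\n" + self.footer


def process(mail: Mail, matcher, mailet) -> List[Mail]:
    """Simplified container step: apply the mailet to matched recipients only."""
    matched = matcher.match(mail)
    if not matched:
        return [mail]
    rest = [r for r in mail.recipients if r not in matched]
    hit = Mail(mail.sender, matched, mail.body)   # copy for the matched subset
    mailet.service(hit)
    return [hit] + ([Mail(mail.sender, rest, mail.body)] if rest else [])
```

The matcher only reads the mail and reports a subset of recipients, while every change is confined to the mailet, which is the contract described above.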


The objective of Simple Mail Transfer Protocol (SMTP) is to transfer mail reliably and efficiently. SMTP is independent of the particular transmission subsystem and requires only a reliable ordered data stream channel. An important feature of SMTP is its capability to relay mail across transport service environments. A transport service provides an interprocess communication environment (IPCE). An IPCE may cover one network, several networks, or a subset of a network. It is important to realize that transport systems (or IPCEs) are not one-to-one with networks. A process can communicate directly with another process through any mutually known IPCE. Mail is an application or use of interprocess communication. Mail can be communicated between processes in different IPCEs by relaying through a process connected to two (or more) IPCEs. More specifically, mail can be relayed between hosts on different transport systems by a host on both transport systems.

2. THE SMTP MODEL

The SMTP design is based on the following model of communication: as the result of a user mail request, the sender-SMTP establishes a two-way transmission channel to a receiver-SMTP. The receiver-SMTP may be either the ultimate destination or an intermediate. SMTP commands are generated by the sender-SMTP and sent to the receiver-SMTP. SMTP replies are sent from the receiver-SMTP to the sender-SMTP in response to the commands.

Once the transmission channel is established, the SMTP-sender sends a MAIL command indicating the sender of the mail. If the SMTP-receiver can accept mail it responds with an OK reply. The SMTP-sender then sends a RCPT command identifying a recipient of the mail. If the SMTP-receiver can accept mail for that recipient it responds with an OK reply; if not, it responds with a reply rejecting that recipient (but not the whole mail transaction). The SMTP-sender and SMTP-receiver may negotiate several recipients. When the recipients have been negotiated the SMTP-sender sends the mail data, terminating with a special sequence. If the SMTP-receiver successfully processes the mail data it responds with an OK reply. The dialog is purposely lock-step, one-at-a-time.

            +----------+                +----------+
+------+    |          |                |          |
| User |<-->|          |      SMTP      |          |
+------+    |  Sender- |Commands/Replies| Receiver-|
+------+    |   SMTP   |<-------------->|    SMTP  |    +------+
| File |<-->|          |    and Mail    |          |<-->| File |
|System|    |          |                |          |    |System|
+------+    +----------+                +----------+    +------+

             Sender-SMTP                Receiver-SMTP

                       Model for SMTP Use

                            Figure 1

The SMTP provides mechanisms for the transmission of mail: directly from the sending user's host to the receiving user's host when the two hosts are connected to the same transport service, or via one or more relay SMTP-servers when the source and destination hosts are not connected to the same transport service. To be able to provide the relay capability the SMTP-server must be supplied with the name of the ultimate destination host as well as the destination mailbox name.

The argument to the MAIL command is a reverse-path, which specifies who the mail is from. The argument to the RCPT command is a forward-path, which specifies who the mail is to. The forward-path is a source route, while the reverse-path is a return route (which may be used to return a message to the sender when an error occurs with a relayed message). When the same message is sent to multiple recipients the SMTP encourages the transmission of only one copy of the data for all the recipients at the same destination host.

The mail commands and replies have a rigid syntax. Replies also have a numeric code. Commands and replies are not case sensitive. That is, a command or reply word may be upper case, lower case, or any mixture of upper and lower case. Note that this is not true of mailbox user names. For some hosts the user name is case sensitive, and SMTP implementations must take care to preserve the case of user names as they appear in mailbox arguments. Host names are not case sensitive.

Commands and replies are composed of characters from the ASCII character set [1]. When the transport service provides an 8-bit byte (octet) transmission channel, each 7-bit character is transmitted right justified in an octet with the high order bit cleared to zero.

When specifying the general form of a command or reply, an argument or special symbol will be denoted by a meta-linguistic variable (or constant), for example, "<string>" or "<reverse-path>". Here the angle brackets indicate these are meta-linguistic variables. However, some arguments use the angle brackets literally. For example, an actual reverse-path is enclosed in angle brackets, i.e., "<John.Smith@USC-ISI.ARPA>" is an instance of <reverse-path> (the angle brackets are actually transmitted in the command or reply).

3. THE SMTP PROCEDURES

This section presents the procedures used in SMTP in several parts. First comes the basic mail procedure defined as a mail transaction. Following this are descriptions of forwarding mail, verifying mailbox names and expanding mailing lists, sending to terminals instead of or in combination with mailboxes, and the opening and closing exchanges. At the end of this section are comments on relaying, a note on mail domains, and a discussion of changing roles.

3.1 MAIL

There are three steps to SMTP mail transactions. The transaction is started with a MAIL command which gives the sender identification. A series of one or more RCPT commands follows giving the receiver information. Then a DATA command gives the mail data. And finally, the end of mail data indicator confirms the transaction.

The first step in the procedure is the MAIL command. The <reverse-path> contains the source mailbox.

MAIL FROM:<reverse-path>

This command tells the SMTP-receiver that a new mail transaction is starting and to reset all its state tables and buffers, including any recipients or mail data. It gives the reverse-path which can be used to report errors. If accepted, the receiver-SMTP returns a 250 OK reply.

The <reverse-path> can contain more than just a mailbox. The <reverse-path> is a reverse source routing list of hosts and source mailbox. The first host in the <reverse-path> should be the host sending this command.

The second step in the procedure is the RCPT command.

RCPT TO:<forward-path>

This command gives a forward-path identifying one recipient. If accepted, the receiver-SMTP returns a 250 OK reply, and stores the forward-path. If the recipient is unknown the receiver-SMTP returns a 550 Failure reply. This second step of the procedure can be repeated any number of times.

The <forward-path> can contain more than just a mailbox. The <forward-path> is a source routing list of hosts and the destination mailbox. The first host in the <forward-path> should be the host receiving this command.

The third step in the procedure is the DATA command.

DATA

If accepted, the receiver-SMTP returns a 354 Intermediate reply and considers all succeeding lines to be the message text. When the end of text is received and stored, the receiver-SMTP sends a 250 OK reply.

Since the mail data is sent on the transmission channel, the end of the mail data must be indicated so that the command and reply dialog can be resumed. SMTP indicates the end of the mail data by sending a line containing only a period. A transparency procedure is used to prevent this from interfering with the user's text.

Please note that the mail data includes the memo header items such as Date, Subject, To, Cc, From [2].

The end of mail data indicator also confirms the mail transaction and tells the receiver-SMTP to now process the stored recipients and mail data. If accepted, the receiver-SMTP returns a 250 OK reply. The DATA command should fail only if the mail transaction was incomplete (for example, no recipients), or if resources are not available.

The above procedure is an example of a mail transaction. These commands must be used only in the order discussed above. Example 1 (below) illustrates the use of these commands in a mail transaction.

Example of the SMTP Procedure

This SMTP example shows mail sent by Smith at host Alpha.ARPA to Jones, Green, and Brown at host Beta.ARPA. Here we assume that host Alpha contacts host Beta directly.

    S: MAIL FROM:<Smith@Alpha.ARPA>
    R: 250 OK
    S: RCPT TO:<Jones@Beta.ARPA>
    R: 250 OK
    S: RCPT TO:<Green@Beta.ARPA>
    R: 550 No such user here
    S: RCPT TO:<Brown@Beta.ARPA>
    R: 250 OK
    S: DATA
    R: 354 Start mail input; end with <CRLF>.<CRLF>
    S: Blah blah blah...
    S: ...etc. etc. etc.
    S: <CRLF>.<CRLF>
    R: 250 OK

The mail has now been accepted for Jones and Brown. Green did not have a mailbox at host Beta.

Example 1

3.2 FORWARDING

There are some cases where the destination information in the <forward-path> is incorrect, but the receiver-SMTP knows the correct destination. In such cases, one of the following replies should be used to allow the sender to contact the correct destination.

    251 User not local; will forward to <forward-path>

This reply indicates that the receiver-SMTP knows the user's mailbox is on another host and indicates the correct forward-path to use in the future. Note that either the host or user or both may be different. The receiver takes responsibility for delivering the message.

    551 User not local; please try <forward-path>

This reply indicates that the receiver-SMTP knows the user's mailbox is on another host and indicates the correct forward-path to use. Note that either the host or user or both may be different. The receiver refuses to accept mail for this user, and the sender must either redirect the mail according to the information provided or return an error response to the originating user.
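The mail transaction of Section 3.1 can be sketched as a small receiver-side state machine. This is a minimal illustration in Java, not part of the project code: the class name SmtpReceiverSketch, the set of known mailboxes, and the choice of a 503 reply for commands that arrive out of order are assumptions made for the example. It also shows the receiver half of the transparency procedure, in which a leading period that the sender doubled is removed again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Sketch of the receiver side of one SMTP mail transaction (MAIL, RCPT, DATA). */
class SmtpReceiverSketch {
    private final Set<String> knownMailboxes;                    // hypothetical local mailbox list
    private String reversePath;                                  // set by MAIL
    private final List<String> forwardPaths = new ArrayList<>(); // stored by RCPT
    private final StringBuilder text = new StringBuilder();      // collected mail data
    private boolean readingData;

    SmtpReceiverSketch(Set<String> knownMailboxes) {
        this.knownMailboxes = knownMailboxes;
    }

    /** Handles one line from the sender; returns the reply, or null while mail data is collected. */
    String handle(String line) {
        if (readingData) {
            if (line.equals(".")) {          // end of mail data indicator confirms the transaction
                readingData = false;
                return "250 OK";
            }
            // Transparency: drop the extra leading period added by the sender.
            text.append(line.startsWith(".") ? line.substring(1) : line).append("\r\n");
            return null;
        }
        String upper = line.toUpperCase();
        if (upper.startsWith("MAIL FROM:")) {
            // A new transaction resets recipients and mail data.
            reversePath = path(line);
            forwardPaths.clear();
            text.setLength(0);
            return "250 OK";
        }
        if (upper.startsWith("RCPT TO:")) {
            if (reversePath == null) return "503 Bad sequence of commands";
            String mailbox = path(line);
            if (!knownMailboxes.contains(mailbox)) return "550 No such user here";
            forwardPaths.add(mailbox);
            return "250 OK";
        }
        if (upper.equals("DATA")) {
            // Assumed reply code for an incomplete transaction (no MAIL or no accepted recipient).
            if (reversePath == null || forwardPaths.isEmpty()) return "503 Bad sequence of commands";
            readingData = true;
            return "354 Start mail input; end with <CRLF>.<CRLF>";
        }
        return "500 Syntax error, command unrecognized";
    }

    List<String> acceptedRecipients() { return forwardPaths; }

    String messageText() { return text.toString(); }

    /** Extracts the text between angle brackets, or everything after the colon if there are none. */
    private static String path(String line) {
        int open = line.indexOf('<');
        int close = line.lastIndexOf('>');
        if (open >= 0 && close > open) return line.substring(open + 1, close);
        return line.substring(line.indexOf(':') + 1).trim();
    }
}
```

Feeding the lines of Example 1 to handle() in order yields the same sequence of reply codes, and acceptedRecipients() then holds the mailboxes of Jones and Brown only.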

3.3 VERIFYING AND EXPANDING

SMTP provides, as additional features, commands to verify a user name or expand a mailing list. This is done with the VRFY and EXPN commands, which have character string arguments. For the VRFY command, the string is a user name, and the response may include the full name of the user and must include the mailbox of the user. For the EXPN command, the string identifies a mailing list, and the multiline response may include the full names of the users and must give the mailboxes on the mailing list.

"User name" is a fuzzy term and is used purposely. If a host implements the VRFY or EXPN commands, then at least local mailboxes must be recognized as "user names". If a host chooses to recognize other strings as "user names", that is allowed.

In some hosts the distinction between a mailing list and an alias for a single mailbox is a bit fuzzy, since a common data structure may hold both types of entries, and it is possible to have mailing lists of one mailbox. If a request is made to verify a mailing list, a positive response can be given if, on receipt of a message so addressed, it will be delivered to everyone on the list; otherwise an error should be reported (e.g., "550 That is a mailing list, not a user"). If a request is made to expand a user name, a positive response can be formed by returning a list containing one name, or an error can be reported (e.g., "550 That is a user name, not a mailing list").

In the case of a multiline reply (normal for EXPN), exactly one mailbox is to be specified on each line of the reply. In the case of an ambiguous request, for example "VRFY Smith" where there are two Smiths, the response must be "553 User ambiguous".

The case of verifying a user name is straightforward, as shown in example 3.

The character string arguments of the VRFY and EXPN commands cannot be further restricted due to the variety of implementations of the user name and mailbox list concepts. On some systems it may be appropriate for the argument of the EXPN command to be a file name for a file containing a mailing list, but again there is a variety of file naming conventions in the Internet.

The VRFY and EXPN commands are not included in the minimum implementation (Section 4.5.1), and are not required to work across relays when they are implemented.
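The reply rules for VRFY and EXPN described above can be sketched as follows. This is an illustrative Java sketch only: the class VerifySketch, its two in-memory directories, and the rule that a user name matches either the local part of a mailbox or a word of the full name are assumptions for the example, since the text deliberately leaves "user name" open. The reply codes and error texts are the ones quoted in this section.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of VRFY and EXPN replies over two hypothetical in-memory directories. */
class VerifySketch {
    private final Map<String, String> users = new LinkedHashMap<>();       // mailbox -> full name
    private final Map<String, List<String>> lists = new LinkedHashMap<>(); // list name -> member mailboxes

    void addUser(String mailbox, String fullName) { users.put(mailbox, fullName); }

    void addList(String name, List<String> members) { lists.put(name, members); }

    /** VRFY: a single 250 line containing the mailbox, or a 550/553 error line. */
    List<String> vrfy(String name) {
        if (lists.containsKey(name)) {
            return Collections.singletonList("550 That is a mailing list, not a user");
        }
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> user : users.entrySet()) {
            String mailbox = user.getKey();
            int at = mailbox.indexOf('@');
            String localPart = at < 0 ? mailbox : mailbox.substring(0, at);
            // Assumed matching rule: the local part, or any word of the full name.
            List<String> nameWords = Arrays.asList(user.getValue().toLowerCase().split("\\s+"));
            if (localPart.equalsIgnoreCase(name) || nameWords.contains(name.toLowerCase())) {
                matches.add("250 " + user.getValue() + " <" + mailbox + ">");
            }
        }
        if (matches.isEmpty()) return Collections.singletonList("550 String does not match anything");
        if (matches.size() > 1) return Collections.singletonList("553 User ambiguous");
        return matches;
    }

    /** EXPN: exactly one mailbox per reply line; "250-" marks a continuation line. */
    List<String> expn(String name) {
        List<String> members = lists.get(name);
        if (members == null) {
            boolean isUser = vrfy(name).get(0).startsWith("250");
            return Collections.singletonList(isUser
                    ? "550 That is a user name, not a mailing list"
                    : "550 String does not match anything");
        }
        List<String> reply = new ArrayList<>();
        for (int i = 0; i < members.size(); i++) {
            String prefix = (i == members.size() - 1) ? "250 " : "250-";
            reply.add(prefix + "<" + members.get(i) + ">");
        }
        return reply;
    }
}
```

The sketch takes the stricter of the two options the text allows for expanding a user name: it reports an error instead of returning a one-entry list.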

3.4 SENDING AND MAILING

The main purpose of SMTP is to deliver messages to users' mailboxes. A very similar service provided by some hosts is to deliver messages to users' terminals (provided the user is active on the host). The delivery to the user's mailbox is called "mailing"; the delivery to the user's terminal is called "sending". Because in many hosts the implementation of sending is nearly identical to the implementation of mailing, these two functions are combined in SMTP. However, the sending commands are not included in the required minimum implementation (Section 4.5.1). Users should have the ability to control the writing of messages on their terminals. Most hosts permit the users to accept or refuse such messages.

The following three commands are defined to support the sending options. These are used in the mail transaction instead of the MAIL command and inform the receiver-SMTP of the special semantics of this transaction:

    SEND FROM:<reverse-path>

The SEND command requires that the mail data be delivered to the user's terminal. If the user is not active (or not accepting terminal messages) on the host, a 450 reply may be returned to a RCPT command. The mail transaction is successful if the message is delivered to the terminal.

    SOML FROM:<reverse-path>

The Send or Mail command requires that the mail data be delivered to the user's terminal if the user is active (and accepting terminal messages) on the host. If the user is not active (or not accepting terminal messages), then the mail data is entered into the user's mailbox. The mail transaction is successful if the message is delivered either to the terminal or the mailbox.

    SAML FROM:<reverse-path>

The Send and Mail command requires that the mail data be delivered to the user's terminal if the user is active (and accepting terminal messages) on the host. In any case the mail data is entered into the user's mailbox. The mail transaction is successful if the message is delivered to the mailbox.

The same reply codes that are used for the MAIL command are used for these commands.
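The difference between MAIL, SEND, SOML and SAML comes down to where the mail data ends up, depending on whether the user's terminal is available. The following Java sketch tabulates that rule; the names SendingSketch, Command and Target are invented for the illustration, and an empty result stands for the case in which the transaction cannot succeed for that recipient.

```java
import java.util.EnumSet;
import java.util.Set;

/** Sketch of the delivery semantics of MAIL, SEND, SOML and SAML. */
class SendingSketch {
    enum Command { MAIL, SEND, SOML, SAML }

    enum Target { TERMINAL, MAILBOX }

    /**
     * Returns where the mail data is delivered. terminalAvailable means the user is
     * active on the host and accepting terminal messages. An empty set means failure.
     */
    static Set<Target> deliver(Command command, boolean terminalAvailable) {
        switch (command) {
            case MAIL:
                return EnumSet.of(Target.MAILBOX);
            case SEND:
                // Terminal only; nothing is delivered if the terminal is unavailable.
                return terminalAvailable ? EnumSet.of(Target.TERMINAL) : EnumSet.noneOf(Target.class);
            case SOML:
                // Terminal if possible, otherwise the mailbox.
                return terminalAvailable ? EnumSet.of(Target.TERMINAL) : EnumSet.of(Target.MAILBOX);
            case SAML:
                // Mailbox in any case, plus the terminal if possible.
                return terminalAvailable ? EnumSet.of(Target.TERMINAL, Target.MAILBOX)
                                         : EnumSet.of(Target.MAILBOX);
            default:
                throw new IllegalArgumentException("Unknown command: " + command);
        }
    }
}
```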

INTRODUCTION:

The JavaMail API provides a set of abstract classes defining objects that comprise a mail system. The API defines classes like Message, Store and Transport. The API can be extended and can be subclassed to provide new protocols and to add functionality when necessary. In addition, the API provides concrete subclasses of the abstract classes. These subclasses, including MimeMessage and MimeBodyPart, implement widely used Internet mail protocols and conform to specifications RFC822 and RFC2045. They are ready to be used in application development.

GOALS AND DESIGN PRINCIPLES:

The JavaMail API is designed to make adding electronic mail capability to simple applications easy, while also supporting the creation of sophisticated user interfaces. It includes appropriate convenience classes which encapsulate common mail functions and protocols. It fits with other packages for the Java platform in order to facilitate its use with other Java APIs, and it uses familiar programming models. The JavaMail API is therefore designed to satisfy the following development and runtime requirements:

    Simple, straightforward class design that is easy for a developer to learn and implement.
    Use of familiar concepts and programming models to support code development that interfaces well with other Java APIs.
    Use of familiar exception-handling and JDK 1.1 event-handling programming models.
    Use of features from the JavaBeans Activation Framework (JAF) to handle access to data based on data type and to facilitate the addition of data types and commands on those data types. The JavaMail API provides convenience functions to simplify these coding tasks.
    Lightweight classes and interfaces that make it easy to add basic mail-handling tasks to any application.
    Support for the development of robust mail-enabled applications that can handle a variety of complex mail message formats, data types, and access and transport protocols.

The JavaMail API draws heavily from IMAP, MAPI, CMC, c-client and other messaging system APIs: many of the concepts present in these other systems are also present in the JavaMail API. It is simpler to use because it uses features of the Java programming language not available to these other APIs, and because it uses the Java programming language's object model to shelter applications from implementation complexity.

The JavaMail API supports many different messaging system implementations: different message stores, different message formats, and different message transports. The JavaMail API provides a set of base classes and interfaces that define the API for client applications. Many simple applications will only need to interact with the messaging system through these base classes and interfaces. JavaMail subclasses can expose additional messaging system features. For instance, the MimeMessage subclass exposes and implements common characteristics of an Internet mail message, as defined by RFC822 and MIME standards. Developers can subclass JavaMail classes to provide the implementations of particular messaging systems, such as IMAP4, POP3, and SMTP.

ARCHITECTURE

The JavaMail architectural components are layered as shown below:

1. The Abstract Layer declares classes, interfaces and abstract methods intended to support mail handling functions that all mail systems support. API elements comprising the Abstract Layer are intended to be subclassed and extended as necessary in order to support standard data types, and to interface with message access and message transport protocols as necessary.
2. The Internet implementation layer implements part of the abstract layer using Internet standards: RFC822 and MIME.
3. JavaMail uses the JavaBeans Activation Framework (JAF) in order to encapsulate message data, and to handle commands intended to interact with that data. Interaction with message data should take place via JAF-aware JavaBeans, which are not provided by the JavaMail API.

JavaMail clients use the JavaMail API, and Service Providers implement the JavaMail API. The layered design architecture allows clients to use the same JavaMail API calls to send, receive and store a variety of messages using different data types from different message stores and using different message transport protocols.

JAVA MAIL CLASS HIERARCHY: The figure below shows major classes and interfaces comprising the JavaMail API.


The JavaMail API is intended to perform the following functions, which comprise the standard mail handling process for a typical client application:

MAJOR JAVA MAIL API COMPONENTS:

This section reviews major components comprising the JavaMail architecture.

The Message Class

The Message class is an abstract class that defines a set of attributes and a content for a mail message. Attributes of the Message class specify addressing information and define the structure of the content, including the content type. The content is represented as a DataHandler object that wraps around the actual data. The Message class implements the Part interface. The Part interface defines attributes that are required to define and format data content carried by a Message object, and to interface successfully to a mail system. The Message class adds From, To, Subject, Reply-To, and other attributes necessary for message routing via a message transport system. When contained in a folder, a Message object has a set of flags associated with it. JavaMail provides Message subclasses that support specific messaging implementations.

Message Storage and Retrieval

Messages are stored in Folder objects. A Folder object can contain subfolders as well as messages, thus providing a tree-like folder hierarchy. The Folder class declares methods that fetch, append, copy and delete messages. A Folder object can also send events to components registered as event listeners.

Message Composition and Transport

A client creates a new message by instantiating an appropriate Message subclass. It sets attributes like the recipient addresses and the subject, and inserts the content into the Message object. Finally, it sends the Message by invoking the Transport.send method.

The Session Class

The Session class defines global and per-user mail-related properties that define the interface between a mail-enabled client and the network. JavaMail system components use the Session object to set and get specific properties. The Session class also provides a default authenticated session object that desktop applications can share. The Session class is a final concrete class. It cannot be subclassed.

Using the JavaMail API

This section defines the syntax and lists the order in which a client application calls some JavaMail methods in order to access and open a message located in a folder:

1. A JavaMail client typically begins a mail handling task by obtaining the default JavaMail Session object.

    Session session = Session.getDefaultInstance(props, authenticator);

2. The client uses the Session object's getStore method to connect to the default store. The getStore method returns a Store object subclass that supports the access protocol defined in the user properties object, which will typically contain per-user preferences.

    Store store = session.getStore();
    store.connect();

3. If the connection is successful, the client can list available folders in the Store, and then fetch and view specific Message objects.

    // get the INBOX folder
    Folder inbox = store.getFolder("INBOX");
    // open the INBOX folder
    inbox.open(Folder.READ_WRITE);
    Message m = inbox.getMessage(1);    // get Message # 1
    String subject = m.getSubject();    // get Subject
    Object content = m.getContent();    // get content
    ...

4. Finally, the client closes all open folders, and then closes the store.

    inbox.close(false);  // Close the INBOX without expunging deleted messages
    store.close();       // Close the Store

DESIGN PRINCIPLES & METHODOLOGY

Producing the design for a large module can be an extremely complex task. Design principles are used to handle the complexity of the design process effectively; they not only reduce the effort needed for design but also reduce the scope for introducing errors during design.

Large problems are solved with the time-tested principle of divide and conquer: the system problem is divided into smaller pieces so that each piece can be conquered separately. For software design, the problem is divided into manageable pieces that can be solved separately. Dividing reduces the cost of solving the entire problem, because the cost of solving the entire problem at once is more than the sum of the costs of solving all the pieces. When the number of partitions is high, however, the cost of partitioning itself becomes a problem, so the designer has to judge when to stop partitioning.

In design, the most important quality criteria are simplicity and understandability. Each part should be easily related to the application, and each piece should be modifiable separately. Proper partitioning makes the system easier to maintain because it helps the designer understand the problem; partitioning also aids design verification.

Abstraction is essential for problem partitioning and is used for existing components as well as components that are being designed. Abstraction of existing components plays an important role in the maintenance phase and in the design process of the system. In functional abstraction, the four main modules take in the details and compute the results needed for further actions. Data abstraction provides some services.

The system is a collection of modules, that is, components. The highest-level component corresponds to the total system. To design this system, we first follow the top-down approach to divide the problem into modules. Top-down design methods often result in some form of stepwise refinement. After the main modules are divided, the bottom-up approach is used to build from the most basic or primitive components up to higher-level components; the bottom-up method starts from the very bottom. The system is modular because it consists of discrete components such that each component supports a well-defined abstraction and a change to one component has minimal impact on other components. The modules are highly cohesive and coupling is reduced in the system, because the relationships among elements in different modules are minimized.

Design Objectives

These are some of the currently implemented features:

Complete portability: Apache James is a 100% pure Java application based on the Java 2 platform and the Java Mail 1.3 API.

Protocol abstraction: Unlike other mail engines, protocols are seen only as "communication languages" ruling communications between clients and the server. Apache James is not tied to any particular protocol but follows an abstracted server design (as Java Mail did on the client side).

Complete solution: The mail system is able to handle both mail transport and storage in a single server application. Apache James works alone without the need for any other server or solution.

Mailet support: Apache James supports the Apache Mailet API. A Mailet is a discrete piece of mail-processing logic which is incorporated into a Mailet-compliant mail server's processing. This easy-to-write, easy-to-use pattern allows developers to build powerful customized mail systems. Examples of the services a Mailet might provide include a mail-to-fax or mail-to-phone transformer, a filter, a language translator, a mailing list manager, etc. Several Mailets are included in the JAMES distribution.

Resource abstraction: Like protocols, resources are abstracted and accessed through defined interfaces (Java Mail for transport, JDBC for spool storage or user accounts in RDBMSs, Apache Mailet API). The server is highly modular and reuses solutions from other projects.

Secure and multi-threaded design: Based on the technology developed for the Apache JServ servlet engine, Apache James has a careful, security-oriented, fully multi-threaded design, to allow performance, scalability and mission-critical use.

System design is the process of applying various techniques and principles for the purpose of defining a system in sufficient detail to permit its physical realization. Software design is the kernel of the software engineering process. Once the software requirements have been analyzed and specified, the design is the first activity. The flow of information during this process is as follows.

[Figure: flow of information during software design. Labels: Information domain details, Function specification, Behavioral specification, Other requirement modules, Design, Procedural design, Code]



Software design is the process through which requirements are translated into a representation of software.

Preliminary design is concerned with the transformation of requirements into data and software architecture. Detailed design focuses on refinements to the architectural representations that lead to detailed data structure and algorithmic representation for software. In the present project report, preliminary design is given more emphasis.

System design is the bridge between system and requirements analysis and system implementation. Some of the essential fundamental concepts involved in the design of applications are:

    Abstraction
    Modularity
    Verification

Abstraction is used to construct solutions to problems without having to take account of the intricate details of the various component subprograms. Abstraction allows the system designer to make step-wise refinements, so that at each stage of the design unnecessary details associated with representation or implementation may be hidden from the surrounding environment.

Modularity is concerned with decomposing the main module into well-defined, manageable units with well-defined interfaces among the units. This enhances design clarity, which in turn eases implementation, debugging, testing, documentation and maintenance of the software product. Modularity viewed in this sense is a vital tool in the construction of large software projects.

Verification is a fundamental concept in software design. A design is verifiable if it can be demonstrated that the design will result in an implementation which satisfies the customer's requirements.

Some of the important factors of quality that are to be considered in the design of an application are:

The software should behave strictly according to the original specification, satisfying the customer's requirements, and should function smoothly under normal and possible abnormal conditions. This product is reliable and can handle any number of mails to filter.

The design of the system must be such that any new additions to the information, functional and behavioral domains may be made easily and adapted to new specifications. We have provided this extensibility in the product: you can add any number of filters to it in the future.

System design is the process of developing specification for the candidate system that meets the criteria established during the phase of system analysis. Major step in the design is the preparation of input forms and design of output reports in a form acceptable to the user. These steps in turn lead to a successful implementation of the system.

In this project we focus on Privacy-Aware Collaborative Spam Filtering document, which is a part of our James Server. We configure our logic there. Privacy-Aware Collaborative Spam Filtering document is the key to implementing our filters: it first considers our filters and then, based on the logic in those filters, decides whether or not to drop each message. Following is the design document.

Privacy-Aware Collaborative Spam Filtering document:

Privacy-Aware Collaborative Spam Filtering document is an application to download your email through protocols like POP3 and IMAP. It also allows you to retrieve your news messages through NNTP. In addition to the simple feature of downloading mail, Mail Fetch has the concept of mail filters. A filter has the single job of deciding whether or not to download a single message. The actual decision of whether to download a mail or not is made through a sequence of filters. There can be a global set of filters as well as a per-maildrop one. A maildrop represents your mailbox from which you want to download your mail.

Privacy-Aware Collaborative Spam Filtering document is written in the Java programming language and has an extensible XML-based configuration. It is very easy to configure: all that has to be done is to edit the plain text configuration file. A fair amount of documentation has been written, so that should help. Privacy-Aware Collaborative Spam Filtering document can process multiple maildrops with individual filter mechanisms and poll times.
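The idea of a filter sequence, with a global set that applies to every maildrop and a local set per maildrop, can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not the project's actual classes: MailSummary, DownloadFilter, DuplicateIdFilter, BlockedSenderFilter, MaxSizeFilter and FilterPipeline are hypothetical names, and the duplicate check records an id as soon as it is seen, which is a simplification.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical message summary; a real application would read these values from the mail headers. */
final class MailSummary {
    final String messageId;
    final String sender;
    final int sizeBytes;

    MailSummary(String messageId, String sender, int sizeBytes) {
        this.messageId = messageId;
        this.sender = sender;
        this.sizeBytes = sizeBytes;
    }
}

/** Hypothetical filter contract: returning false stops the pipeline and skips the download. */
interface DownloadFilter {
    boolean allows(MailSummary mail);
}

/** Rejects a message whose Message-ID was seen before (simplified: ids are recorded when checked). */
final class DuplicateIdFilter implements DownloadFilter {
    private final Set<String> seen = new HashSet<>();

    public boolean allows(MailSummary mail) {
        return seen.add(mail.messageId); // add() returns false for an id already in the set
    }
}

/** Rejects messages from any sender on a block list. */
final class BlockedSenderFilter implements DownloadFilter {
    private final Set<String> blocked;

    BlockedSenderFilter(String... senders) {
        this.blocked = new HashSet<>(Arrays.asList(senders));
    }

    public boolean allows(MailSummary mail) {
        return !blocked.contains(mail.sender);
    }
}

/** Rejects messages larger than a configured size. */
final class MaxSizeFilter implements DownloadFilter {
    private final int maxBytes;

    MaxSizeFilter(int maxBytes) {
        this.maxBytes = maxBytes;
    }

    public boolean allows(MailSummary mail) {
        return mail.sizeBytes <= maxBytes;
    }
}

/** Runs the global filters first, then the maildrop's local filters; the first rejection wins. */
final class FilterPipeline {
    private final List<DownloadFilter> globalFilters;

    FilterPipeline(List<DownloadFilter> globalFilters) {
        this.globalFilters = globalFilters;
    }

    boolean shouldDownload(MailSummary mail, List<DownloadFilter> localFilters) {
        List<DownloadFilter> all = new ArrayList<>(globalFilters);
        all.addAll(localFilters);
        for (DownloadFilter filter : all) {
            if (!filter.allows(mail)) {
                return false;
            }
        }
        return true;
    }
}
```

With a DuplicateIdFilter configured globally and a BlockedSenderFilter configured only for one maildrop, a repeated Message-ID is rejected on every maildrop, while the blocked sender is rejected only on the maildrop that lists it.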

Features:

Following is the list of features provided by Privacy-Aware Collaborative Spam Filtering document:

    POP3 and IMAP protocol support
    Can handle any number of maildrops
    Polling mechanism to periodically check maildrops for new messages
    Filtering system for downloading mail
    Standard filters provided, like Size, Message-id, Sender
    Easy pluggability of user-defined filters
    Runs on all platforms supported by Java 2
    Configurable logging mechanism to keep track of mails downloaded
    Multiple delivery options provided, like Mailbox and SMTP Delivery
    Delivery options accessible at the filter level
    Experimental NNTP support

Modules:

Core Module: This module interacts with the XML configuration and reads the required information. After reading the information, it interacts with the specified mailboxes as required and downloads the mails. It also co-ordinates the other modules.

Filtering Module: This module applies the filters to the specified mailboxes to screen out unwanted mails. It applies the filters in the sequence specified in the XML file. It also allows filters to be applied globally and locally according to each user's customization.

Delivery Agents Module: This module sends the remaining mails in the specified mailboxes to a delivery agent. Each delivery agent keeps the backup copy of the mails at a targeted location.

Configuring and Extending Privacy-Aware Collaborative Spam Filtering document:

Privacy-Aware Collaborative Spam Filtering document uses XML for configuration. The configuration file is MailFetch.xml. This file exists in the conf directory. The configuration file is accompanied by a detailed document explaining how to configure Privacy-Aware Collaborative Spam Filtering document. I would recommend referring to that document whenever you have a problem following what I'm saying. This document is called Configuration.txt.

Essentially, there are maildrops to download mail from; they contain all the information about accessing a maildrop. There is a global sequence of filters, which are checked for each maildrop before the maildrop-local sequence of filters. Filters can be configured through the configuration file. For example, a size-based filter needs to know what size it should filter at and also what action it should take when a message is of a greater size. Each of the filters may have some additional configuration options. The additional configuration is totally dependent on the filter itself. You could add your own filter and have it configured from the configuration file. I shall expand on that later in this section.

There is also the option of delivery agents. After a message passes through all the filters and none of them objects to it being downloaded, it is downloaded and sent to a Delivery Agent, which is responsible for delivering it (to a mailbox, maildir, SMTP host etc). Mail Fetch supports different kinds of delivery agents and you can choose one of them for delivery of your mail. You can go so far as to make each of your maildrops deliver messages to a different delivery agent! A Maildrop itself needs to specify the delivery agent to use when all the filters let the message pass through. Each delivery agent has an id; it can thereafter be referred to by its id. Some filters support delivering messages to a delivery agent specified in their configuration. For example, all messages from a particular sender address would go to the execve mailbox if the SenderMailFilter is configured that way.

You can implement your own filters: by implementing certain interfaces, a user can very easily add his or her own filter to the current set of provided filters. Examples of filters are spam control, size restrictions etc. Mail Fetch downloads the email if it matches the criteria and then can deliver it using one of its delivery options.
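The relationship between maildrops, filters and delivery agents identified by id can be sketched as follows. This Java sketch is an illustration under assumptions, not the application's real API: RecordingAgent, AgentRegistry, SenderRedirectFilter and MaildropSketch are hypothetical names, and the agent here only records what it was asked to deliver instead of writing to a mailbox or an SMTP host.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical delivery agent that only records the messages it was asked to deliver. */
final class RecordingAgent {
    final List<String> delivered = new ArrayList<>();

    void deliver(String message) {
        delivered.add(message);
    }
}

/** Hypothetical registry that resolves delivery agents by the id used in the configuration. */
final class AgentRegistry {
    private final Map<String, RecordingAgent> agents = new HashMap<>();

    void register(String id, RecordingAgent agent) {
        agents.put(id, agent);
    }

    RecordingAgent byId(String id) {
        RecordingAgent agent = agents.get(id);
        if (agent == null) {
            throw new IllegalArgumentException("No delivery agent with id " + id);
        }
        return agent;
    }
}

/** Hypothetical sender filter that redirects matching mail to the agent named in its configuration. */
final class SenderRedirectFilter {
    private final String sender;
    private final String agentId;

    SenderRedirectFilter(String sender, String agentId) {
        this.sender = sender;
        this.agentId = agentId;
    }

    /** Returns the id of the agent to deliver to, or null when the sender does not match. */
    String agentFor(String from) {
        return sender.equalsIgnoreCase(from) ? agentId : null;
    }
}

/** A maildrop with a default delivery agent and filters that may pick a different agent. */
final class MaildropSketch {
    private final AgentRegistry registry;
    private final String defaultAgentId;
    private final List<SenderRedirectFilter> filters;

    MaildropSketch(AgentRegistry registry, String defaultAgentId, List<SenderRedirectFilter> filters) {
        this.registry = registry;
        this.defaultAgentId = defaultAgentId;
        this.filters = filters;
    }

    void process(String from, String message) {
        for (SenderRedirectFilter filter : filters) {
            String id = filter.agentFor(from);
            if (id != null) {
                registry.byId(id).deliver(message); // the filter itself chooses the agent
                return;
            }
        }
        registry.byId(defaultAgentId).deliver(message); // no filter matched: use the default agent
    }
}
```

Looking agents up by id keeps filters and maildrops independent of the concrete delivery mechanism, which is the same reason the configuration refers to agents by id.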
Currently, one can choose to deliver mail to a mailbox or to an SMTP server.

You will need to specify the name of the class you have implemented in the configuration, so that Privacy-Aware Collaborative Spam Filtering document can initialize it as required. Note that the class has to be in the system classpath. This can be easily achieved by putting the class in a jar and putting the jar in the lib directory. The script picks up all the jars from the directory and places them in the classpath before invoking Privacy-Aware Collaborative Spam Filtering document.

All the delivery agents specified in the configuration are available to the filters through a DeliveryManager object in the application's delivery package. This object allows access to these agents based on their ids. NOTE that the id of the agent has to be known by the filter requesting the agent.

A Delivery Event is generated when a message is delivered after passing through all the filters. NOTE that there is no event generated when a filter itself delivers a message through an agent.

The easiest way to get the hang of implementing the filter of your choice is to get hold of the source and check out some of the implemented filters (like NullMailFilter!!).

That's about it in terms of Privacy-Aware Collaborative Spam Filtering document configuration. Go on, open the conf/JFetch.xml file in the Mail directory and play with it. Do let me know of any problems you face; let me know even if you don't.

Table of Contents
===========
1. Introduction
2. Some Definitions
   2.1 Maildrops
   2.2 Filters
   2.3 DeliveryAgents
   2.4 Events
3. Detailed Configuration
   3.1 Maildrop
   3.2 Mailfilters
       3.2.1 Global filters
       3.2.2 Local filters
       3.2.3 All filters explained
   3.3 Delivery Agents
   3.4 Miscellaneous Configuration
4. Sample configuration file
5. Advanced usage

1. Introduction:

Privacy-Aware Collaborative Spam Filtering document is an application to access your remote email. It supports popular mail protocols like POP3 and IMAP. You can download your mail to your local machine and use an email client to read it. Mail Fetch also comes with a very powerful and flexible filtering system. In fact, Privacy-Aware Collaborative Spam Filtering document comes with a range of filters out of the box, so you can get started immediately. These filters range from those which prevent you from getting the same message twice to those which help in spam filtering. Privacy-Aware Collaborative Spam Filtering document is written in Java and so has the advantage of running on most platforms. Configuration is text-based and is an XML file. This document tells you how to configure Privacy-Aware Collaborative Spam Filtering document. It details the various configuration options available and also provides a sample configuration file.

2. Some Definitions:

2.1 Maildrops

A Maildrop is the mailbox from where you download your mail. Characteristics of a maildrop are the protocol (POP3, IMAP, NNTP), the username, password, hostname, port number, the default Delivery Agent for that maildrop, any filters for that maildrop and finally any protocol-specific configuration for the maildrop. For example, an NNTP maildrop would contain newsgroup information, which is not used by a POP3 or IMAP maildrop.

2.2 Filters

A Filter is the core of the decision making in Privacy-Aware Collaborative Spam Filtering document. A filter decides on a per-mail basis whether the message should be downloaded or not. A pipeline of filters is set up (yes, again set up in the configuration) and a message which needs to be downloaded is passed through this pipeline. At any point of the pipeline, a filter could indicate that the message should not be processed through the pipeline anymore. For example, a SPAM filter (sender based) could find a match from the list of spammers it has and reject the message.

There are two kinds of filters -- global and local. These are not an attribute of a filter itself, but rather depend on the usage of a filter. Local filters are associated with a maildrop, whereas global filters are applicable to all maildrops. For example, you might want a Message-ID filter to be applicable to all maildrops, whereas you would keep a sender-based filter only for the maildrop where you expect mail from that sender.

2.3 DeliveryAgents

A Delivery Agent has the responsibility of delivering mail. The currently supported mediums are SMTP and mailbox. Delivery Agents are identified by unique ids in the configuration. Maildrops have a default Delivery Agent configured, which is used if the message passes through the Filter pipeline successfully. Some filters also accept a Delivery Agent attribute in the configuration. What this implies is that if the message matches the Filter's criteria, the Filter delivers the message using this Delivery Agent. This also allows for simple filtering mechanisms. For example, you might want all likely SPAM to be delivered to a special mailbox where you can later check for any false positives.

2.4 Events

Event is an internal concept of Privacy-Aware Collaborative Spam Filtering document. If you are only going to use Privacy-Aware Collaborative Spam Filtering document and the filters it provides out of the box, you don't need to understand this concept. If you are extending Privacy-Aware Collaborative Spam Filtering document by developing your own Filters, you will need to understand this concept. Whether you actually use it depends on the functionality of the filter itself.

Events are a mechanism by which a filter can be notified when something interesting happens in Privacy-Aware Collaborative Spam Filtering document. Currently, we only generate events for the delivery of a message.

Let us take the example of the MessageIDMailFilter. This filter rejects messages with message-ids which have already been downloaded. This avoids receiving duplicate messages, for example when you are subscribed to two mailing lists and a cross-posting happens. It maintains a list of message-ids which we have already downloaded. The list is saved on the disk after the download of every message so that if the session is interrupted for any reason, the message is not re-downloaded. So, this filter implements a DeliveryLister and hence gets the delivery event.

3. Detailed Configuration:

3.1 Maildrop

You can have more than one maildrop for Privacy-Aware Collaborative Spam Filtering document to download mail from. Privacy-Aware Collaborative Spam Filtering document downloads mail for them in the order in which they are configured. Here is a sample maildrop configuration:

    110 myusername mypass true

The protocol attribute can be one of pop3, imap or nntp (EXPERIMENTAL). The mda attribute specifies the default delivery agent used when the message is ready to be downloaded. See Delivery Agents for more information. This requires a delivery agent called "smtp" to be configured. Host, port, user and password are attributes for the connection and authentication. Setting delete to true makes Privacy-Aware Collaborative Spam Filtering document delete messages from the maildrop once they are downloaded. The filters configured *inside* the Maildrop element are the local maildrop filters and will not affect other maildrops. Some maildrops, such as NNTP, have extra configuration parameters, for example the newsgroups to be downloaded. Please note that NNTP support is EXPERIMENTAL and is not yet stable.

3.2 Mailfilters:

3.3 Global filters

All filters which are configured outside the maildrop elements are called global filters. These filters affect all the maildrops. The configuration is the same for both global and local filters.

Here is a sample global filters configuration:

See "All filters explained" for a detailed explanation of all provided filters.

3.4 Local filters

All filters which are configured inside the maildrop elements are called local filters. These filters affect only the maildrop associated with them. The configuration is the same for both global and local filters. Here is a sample local filters configuration:

3.5 All filters explained

FILTER NAME: HeaderMailFilter
DESCRIPTION: Matches a header in the message. This requires the name of the header and the value of the header.
CLASS NAME: MailFetch.filters.HeaderMailFilter
SAMPLE CONFIGURATION:

EXPLANATION: This filter lets you filter messages based on the value of a particular header. The mda attribute is optional and allows you to direct the message to the specified delivery agent if the message matches the criteria.

FILTER NAME: MessageIDMailFilter
DESCRIPTION: Filters messages if they contain a duplicate message-id. This filter stores the list of downloaded message-ids in the specified file.
CLASS NAME: MailFetch.filters.MessageIDMailFilter
SAMPLE CONFIGURATION:
EXPLANATION: The name of the storage element is a friendly name for the repository. limit specifies the maximum number of elements allowed in the list. The destination attribute is the actual file in which the list is stored.

FILTER NAME: NullMailFilter
DESCRIPTION: This filter consumes all messages. It also marks them for deletion.
CLASS NAME: MailFetch.filters.NullMailFilter
SAMPLE CONFIGURATION:
EXPLANATION: This is a special filter; it could be used to clean up the maildrop, for example. It is also a DANGEROUS filter. You have been warned.

FILTER NAME: RecipientMailFilter
DESCRIPTION: This filter matches the recipients of the message against those provided in a list.
CLASS NAME: MailFetch.filters.RecipientMailFilter
SAMPLE CONFIGURATION:

EXPLANATION: This filter checks whether the recipients of the message (TO and CC) exist in the defined list. The mda attribute is optional. blocklist is the file containing the recipient addresses (one on each line).

FILTER NAME: SenderMailFilter
DESCRIPTION: This filter matches the sender of the message against those provided in a list.
CLASS NAME: MailFetch.filters.SenderMailFilter
SAMPLE CONFIGURATION:
EXPLANATION: This filter checks whether the sender of the message exists in the defined list. The mda attribute is optional. blocklist is the file containing the sender addresses (one on each line).

FILTER NAME: SizeMailFilter
DESCRIPTION: This filters messages based on their size.
CLASS NAME: MailFetch.filters.SizeMailFilter
SAMPLE CONFIGURATION:
EXPLANATION: max-size is the maximum size of a message that is permitted to be downloaded. The size is in bytes. A max-size of 0 indicates that the size restriction is lifted.

FILTER NAME: SubjectMailFilter
DESCRIPTION: This filter does subject-based filtering based on a list.
CLASS NAME: MailFetch.filters.SubjectMailFilter
SAMPLE CONFIGURATION:

EXPLANATION: This filter is similar to the sender/recipient filters except that it filters on the subject of the message. The mda attribute is optional.

3.3 Delivery Agents

After passing through the filter pipeline, mail is delivered using a DeliveryAgent. Currently we provide two main delivery mechanisms: mbox and smtp. SMTP is the most reliable mechanism, although it requires that you have an MTA configured for delivery.

DELIVERY AGENT NAME: Mailbox
DESCRIPTION: Delivers a message to the specified mbox.
CLASS NAME:
SAMPLE CONFIGURATION: /home/gautam/Mail/junkmail
EXPLANATION: The destination element identifies the location of the mbox to which the delivery is made. Some basic dot-locking functionality is provided by the mbox provider to avoid multiple simultaneous access to the mbox.

DELIVERY AGENT NAME: SMTP
DESCRIPTION: Delivers the message to a configured SMTP host.
CLASS NAME: MailFetch.filters.SMTPDeliveryAgent
SAMPLE CONFIGURATION: localhost.localdomain 25 gautam localhost
EXPLANATION: The localuser element defines who the email is directed to. The domain is the domain of the local user. In this case, the email is dispatched to the user gautam at the domain localhost. The user and password are used if your server requires SMTP authentication.

DELIVERY AGENT NAME: NULL
DESCRIPTION: A Null Delivery Agent does nothing. It is basically equivalent to dumping the message into /dev/null.
CLASS NAME: MailFetch.filters.NullDeliveryAgent
SAMPLE CONFIGURATION: None
EXPLANATION: There is no configuration for this delivery agent. Please use it with care, as you could very easily lose all your mail through a misconfiguration.

3.4 Miscellaneous Configuration

Polling: Polling time is the time Mail Fetch waits between mail downloading sessions. For example, a value of 120 specifies the polling time as 120 seconds (2 minutes). A nonpositive polling time indicates that Mail Fetch should just run through the maildrop list and download messages once.

Logging: I would recommend turning logging on, as it gives you a very good idea of what is happening in the system. All exceptions are logged, so nothing will escape your eye. Mail Fetch does light to medium logging in the DEBUG state. The target attribute specifies the file where Privacy-Aware Collaborative Spam Filtering document should log all its data. The priority attribute specifies the logging priority. The logging priorities are DEBUG, INFO, WARN, ERROR and FATAL_ERROR. The enabled attribute is optional and is treated as true by default.
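The polling rule described above can be illustrated with a small Java sketch. This is not the project's code: the class name PollingSketch, the use of Runnable to stand in for a maildrop download, and the maxSessions bound (added only so the sketch terminates) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class PollingSketch {

    /**
     * Runs download sessions over the maildrops in their configured order.
     * A nonpositive pollingSeconds means "run through the list once and stop".
     * maxSessions is a hypothetical safety bound so that this sketch terminates.
     * Returns the number of sessions that were run.
     */
    static int run(List<Runnable> maildrops, long pollingSeconds, int maxSessions) {
        int sessions = 0;
        while (sessions < maxSessions) {
            for (Runnable maildrop : maildrops) {
                maildrop.run(); // stands in for "download mail from this maildrop"
            }
            sessions++;
            if (pollingSeconds <= 0) {
                break; // single pass requested
            }
            if (sessions < maxSessions) {
                try {
                    Thread.sleep(pollingSeconds * 1000L); // wait between sessions
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and stop
                    break;
                }
            }
        }
        return sessions;
    }

    public static void main(String[] args) {
        List<String> visited = new ArrayList<>();
        List<Runnable> maildrops = List.of(
                () -> visited.add("first maildrop"),
                () -> visited.add("second maildrop"));
        int sessions = run(maildrops, 0, 5); // polling time 0: one pass only
        System.out.println(sessions + " session(s), visited: " + visited);
    }
}
```

With a polling time of 120 the loop would sleep two minutes between sessions; with 0 or a negative value it makes a single pass over the maildrop list.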

4. Sample configuration file

A sample configuration file is included with the Privacy-Aware Collaborative Spam Filtering document distribution. You will need to customize the configuration file according to your needs and requirements. Refer to this document to configure the file. Below is a small configuration file to give you an idea of how to go about modifying the configuration.

600 localhost 25 gautam localhost /home/gautam/Mail/spam /home/gautam/Mail/personal 110 myusername mypass true

The above is just a sample configuration file. You will need to customize your configuration depending on what kind of filtering meets your requirements.

5. Advanced usage

If you find that you need some customized filtering, you may want to write your own Filters. The easiest way to understand how to do this is to look at the filters which are available in the Privacy-Aware Collaborative Spam Filtering document distribution. Good filters to start with are NullMailFilter, SizeMailFilter, SubjectMailFilter and MessageIDMailFilter. Those should cover most common uses. Once you have written your Filter, you need to include it in the filter configuration. In addition, Privacy-Aware Collaborative Spam Filtering document requires the Filter to be on the system classpath to be able to load it. This can be achieved simply by putting the relevant classes in a jar and putting the jar in the lib directory. The run scripts load all the jars in that directory onto the classpath.

Privacy-Aware Collaborative Spam Filtering document is an application to download your email through protocols like POP3 and IMAP. The decision of whether to download a mail or not is made through a sequence of filters. By implementing certain interfaces, a user can very easily add his/her own filter to the current set of provided filters. Examples of filters are spam control, size restrictions etc. Privacy-Aware Collaborative Spam Filtering document downloads the email if it matches the criteria and can then deliver it using one of its delivery options. One can choose to deliver mail to a mailbox or to an SMTP server. Privacy-Aware Collaborative Spam Filtering document can process multiple maildrops with individual filter mechanisms and delivery options. Privacy-Aware Collaborative Spam Filtering document is written in the Java programming language and has an extensible XML-based configuration.

Configuration

TO SET UP Privacy-Aware Collaborative Spam Filtering document FOLLOW THE FOLLOWING STEPS:

* If you have the source, compile using the build.bat batch file.
* Now, enter the dist directory and edit the conf/JFetch.xml file. This is the configuration file for Privacy-Aware Collaborative Spam Filtering document. Refer to the Configuration.txt file in the docs directory for a detailed description of the configuration file.
* Now you can run Privacy-Aware Collaborative Spam Filtering document by executing the run.bat file in the dist directory.
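To make the idea of writing your own filter more concrete, here is a minimal Java sketch of a filter pipeline. It does not use the project's real interfaces, which are not shown in this document: the names Mail, MailFilter, SenderBlocklistFilter, DuplicateIdFilter, onDelivered and shouldDownload are hypothetical, and running global filters before local ones is an assumption. The sketch shows two things described earlier: a message stops travelling down the pipeline as soon as one filter rejects it, and a duplicate message-id filter can learn about delivered messages through a delivery callback.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FilterPipelineSketch {

    /** Hypothetical, minimal stand-in for a mail message. */
    static final class Mail {
        final String messageId;
        final String sender;

        Mail(String messageId, String sender) {
            this.messageId = messageId;
            this.sender = sender;
        }
    }

    /** Hypothetical filter contract: returning true lets the message continue. */
    interface MailFilter {
        boolean accept(Mail mail);
    }

    /** Sender-based filter backed by an in-memory blocklist. */
    static final class SenderBlocklistFilter implements MailFilter {
        private final Set<String> blocked;

        SenderBlocklistFilter(Set<String> blocked) {
            this.blocked = blocked;
        }

        @Override
        public boolean accept(Mail mail) {
            return !blocked.contains(mail.sender);
        }
    }

    /** Rejects messages whose message-id has already been delivered. */
    static final class DuplicateIdFilter implements MailFilter {
        private final Set<String> delivered = new HashSet<>();

        @Override
        public boolean accept(Mail mail) {
            return !delivered.contains(mail.messageId);
        }

        /** Delivery callback: remember the id once the message has been delivered. */
        void onDelivered(Mail mail) {
            delivered.add(mail.messageId);
        }
    }

    /** Runs the global filters, then the local ones, stopping at the first rejection. */
    static boolean shouldDownload(Mail mail, List<MailFilter> globalFilters, List<MailFilter> localFilters) {
        for (MailFilter filter : globalFilters) {
            if (!filter.accept(mail)) {
                return false;
            }
        }
        for (MailFilter filter : localFilters) {
            if (!filter.accept(mail)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        DuplicateIdFilter duplicates = new DuplicateIdFilter();
        List<MailFilter> global = List.of(duplicates);
        List<MailFilter> local = List.of(new SenderBlocklistFilter(Set.of("spammer@example.com")));

        Mail mail = new Mail("<id-1@example.com>", "friend@example.com");
        System.out.println("first attempt:  " + shouldDownload(mail, global, local));
        duplicates.onDelivered(mail); // simulate the delivery event
        System.out.println("second attempt: " + shouldDownload(mail, global, local));
    }
}
```

A real filter would additionally persist the delivered message-ids to disk, as the MessageIDMailFilter description explains, so that an interrupted session does not download the same message again.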

Input design is the process of converting user-originated information to a computer-based format. The goal of designing input data is to make data entry as easy and error free as possible. An input format should be easy to understand. In this product the inputs are simply messages, i.e. mails. Every mail has properties such as sender, subject line, body, message-id and so on. These inputs are taken automatically from the messages inside the mailbox, and the product uses them to decide whether to drop a message or not. The output design relies on the input, which is used to produce the output. Hence input design needs special attention.

Output reflects the image of the organization. Output design involves designing form layouts, making lists, producing well-designed reports etc., and reports are the main outputs of the proposed system. Here the outputs are log files, which record everything handled by the server that is relevant to this project, including error messages.

Database design deals with databases and database management systems, and explores how to use relationships in a pool of data when developing methods for data storage and retrieval. Databases allow data to be shared among different applications. A database is not used in this product. We simply record the details of how a particular transaction is handled by the server in log files. We store those log files on permanent disk at a specified location.

UML Diagrams


Testing

Testing is one of the most important phases in the software development activity. In the software development life cycle (SDLC), the main aim of the testing process is quality: the developed software is tested to check that it attains the required functionality and performance. During the testing process the software is run with particular test cases, and the output of the test cases is analyzed to determine whether the software is working according to expectations.

The success of the testing process in detecting errors depends mostly on the test case criteria. To test any software we need a description of the expected behaviour of the system and a method of determining whether the observed behaviour conforms to the expected behaviour. Since errors can be introduced into the software at any stage, we have to carry out testing at different levels during development. The basic levels of testing are Unit, Integration, System and Acceptance Testing.

Unit testing is carried out on the code. Here different modules are tested against the specifications produced for them during design. In integration testing, the tested modules are combined into subsystems and tested. In system testing the full software is tested, and in the next level of testing the system is tested against the user requirements document prepared during the SRS phase.

There are two basic approaches to testing.

In Functional Testing, test cases are decided solely on the basis of the requirements of the program or module, and the internals of the program or module are not considered when selecting test cases. This is also called Black Box Testing.

In Structural Testing, test cases are generated from the actual code of the program or module to be tested. This is also called White Box Testing.

A number of activities must be performed when testing software. Testing starts with the test plan. The test plan identifies all testing-related activities that need to be performed, along with the schedule and guidelines for testing. The plan also specifies the levels of testing that need to be done, by identifying the different testing units. For each unit specified in the plan, the test cases and reports are produced first. These reports are then analyzed.

The test plan is a general document for the entire project which defines the scope, the approach to be taken and the personnel responsible for the different testing activities. The inputs for forming the test plan are:

* Project plan
* Requirements document
* System design

Although there is one test plan for the entire project, test cases have to be specified separately for each unit. The test case specification gives, for each item to be tested, all test cases and the outputs expected for those test cases. The steps to be performed for executing the test cases are specified in a separate document called the test procedure specification. This document specifies any special requirements that exist for setting up the test environment and describes the methods and formats for reporting the results of testing.

Unit testing focused first on the smallest, lowest-level modules, proceeding one at a time. Bottom-up testing was performed on each module. A driver program that tests the modules was developed or used. For the purpose of testing, however, the modules themselves were used as stubs to print verification of the actions performed. After the lower-level modules were tested, the modules at the next higher level that make use of them were tested. Each module was tested against the required functionality, and test cases were developed to test the boundary values.

Integration testing is a systematic technique for constructing the program structure while at the same time conducting tests to uncover errors associated with interfacing. As the system consists of a number of modules, the interfaces to be tested were those between pairs of modules. The software was tested using an incremental bottom-up approach. The bottom-up integration strategy was implemented with the following steps.

* Low-level modules were combined into clusters that perform specific software sub-functions.
* The clusters were then tested.

System testing is a series of different tests whose primary purpose is to fully exercise the computer-based system. It also looks for discrepancies between the system and its original objectives and current specifications.
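The boundary-value test cases mentioned for unit testing can be illustrated with the size restriction, since its rule is stated precisely: max-size is in bytes, and a max-size of 0 lifts the restriction. The following Java sketch is an assumption-laden illustration, not the project's test code: the class SizeBoundarySketch and the method permitted are hypothetical stand-ins for the real SizeMailFilter, and treating a message exactly at the limit as permitted is an interpretation of "maximum size ... permitted to be downloaded".

```java
class SizeBoundarySketch {

    /** Hypothetical stand-in for the size check; a maxSize of 0 lifts the restriction. */
    static boolean permitted(long sizeInBytes, long maxSize) {
        return maxSize == 0 || sizeInBytes <= maxSize;
    }

    /** Tiny assertion helper so the sketch does not depend on a test framework. */
    static void check(boolean condition, String description) {
        if (!condition) {
            throw new AssertionError("failed: " + description);
        }
    }

    public static void main(String[] args) {
        long max = 1024; // example limit in bytes

        check(permitted(max - 1, max), "one byte below the limit is downloaded");
        check(permitted(max, max), "exactly at the limit is downloaded");
        check(!permitted(max + 1, max), "one byte above the limit is rejected");
        check(permitted(Long.MAX_VALUE, 0), "max-size 0 lifts the restriction");

        System.out.println("all boundary cases passed");
    }
}
```

The three values around the limit (just below, exactly at, just above) are the usual boundary cases; the fourth case covers the special meaning of 0.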

Privacy-Aware Collaborative Spam Filtering document System Test Cases & System Test Report

The system test cases listed below are expected to work and give the expected behaviour if the explorer is configured to run jar files as mentioned in the project folder. The necessary library files and standard jar files are in the appropriate project directories, and the path and classpath environment variables are set appropriately.

Columns: Test Case No., Test case, Expected behaviour, Observed behaviour, Status (P = Passed, F = Failed)

Test Case No. 1
Test case: Send a mail with a size less than what we specify in the .xml file and apply the size filter.
Expected behaviour: The mail should reach the destination without any hurdles.

Test Case No. 2
Test case: Send a mail with a size more than what we specify in the .xml file and apply the size filter.
Expected behaviour: The mail should not reach the destination, because the size mail filter has to delete it.

Test Case No. 3
Test case: Check the log file for the above two cases.
Expected behaviour: It should contain information about the mail sizes and about which mail was deleted.

Test Case No. 4
Test case: Add one more maildrop to the xml file by adding one more maildrop tag.
Expected behaviour: Our application should interact with the specified mailboxes and download all the mails from them.

Test Case No. 5
Test case: Add a subject filter to the xml file by adding one more filter tag inside the filters tag, i.e. in the global filters area.
Expected behaviour: Our application should filter the mails whose subject contains the words we specify in the subject blocklist file.

Test Case No. 6
Test case: Add a sender filter to the xml file by adding one more filter tag inside the filters tag, i.e. in the global filters area.
Expected behaviour: Our application should filter the mails whose senders we specify in the sender blocklist file.

Test Case No. 7
Test case: Add a null filter to the xml file by adding one more filter tag inside the filters tag, i.e. in the global filters area.
Expected behaviour: Our application should delete all the mails irrespective of the criteria.

Test Case No. 8
Test case: Add a header filter to the xml file by adding one more filter tag inside the filters tag, i.e. in the global filters area.
Expected behaviour: Our application should filter the mails in which the named header is equal to the header value we specify in the xml file.

Test Case No. 9
Test case: Add an SMTP delivery agent to the xml file and give its id in the maildrop tag.
Expected behaviour: For each and every non-deleted mail, another copy should be sent to the other user we specify in the SMTP delivery agent.

Test Case No. 10
Test case: Add a Mailbox delivery agent to the xml file and give its id in the maildrop tag.
Expected behaviour: For each and every non-deleted mail, another copy should be sent to a directory.

Configuring Filters

It is the duty of the administrator to configure the filters. First, place the JFetch directory on the mail server the administrator requires. You will then find an XML file in a subdirectory named conf. That file is easily readable, and the administrator can change the corresponding values to configure it for the chosen mail server. You can see the main part of that file below:

localhost 110 stud2 pass2 false

Here you can see that we configured it for a James server, which uses the POP3 protocol and runs on our local system at port number 110. These filters are applied only to the stud2 maildrop (mailbox). After the configuration is complete, the administrator has to create mailboxes for company personnel on the mail server using the Telnet tool, and then configure those mailboxes in the local mail client in line with the configuration made earlier. Open the mail client you use and follow its instructions to configure the mailboxes created earlier on the mail server. At one point it asks you to specify the incoming mail server and the outgoing mail server; specify the IP address of the server on which you configured your filters. In the case of MS Outlook Express, the screen looks like this. After that, your local mail client creates new accounts for the mailboxes you specified. You can then access those mailboxes from your local mail client and organize them as you like. Apart from this configuration, your installed filters work on all the mailboxes you specified in the configuration file above, here named conf.xml.

Privacy-Aware Collaborative Spam Filtering document is a tool, and a lot of effort was put into making it filter correctly and efficiently. The developed system was tested with real data, and the users are satisfied with the performance of the system and its reports. This project was developed using the Java Mail API, one of the J2EE technologies, with the help of XML. Using this tool we can drop unwanted mails or messages automatically by specifying our restrictions in the corresponding files. This considerably reduces the administrator's workload, and a copy of each deleted message can be directed to a specified location for verification. This tool is very useful for the administration department of our company. It is also extensible, so you can add your own filters in future very simply without disturbing the existing code. This tool reduces manual work and saves time as well as manpower. The time for processing and producing reports is considerably reduced. All the features are implemented and developed as per the requirements.
