Privacy-Aware Collaborative Spam Filtering document

1. ABSTRACT

2. INTRODUCTION

3. ABOUT ORGANISATION

4. SRS DOCUMENT

5. DESIGN PRINCIPLES & EXPLANATION

6. DESIGN DOCUMENT

6.1 SYSTEM DESIGN

7. PROJECT DICTIONARY

7.1 UML DIAGRAMS

8. FORMS & REPORTS

8.1 I/O SPECIMENS

8.2 I/O SAMPLES

9. TESTING

9.1 TEST CRITERIA & TEST CASES

9.2 TEST REPORT & ANALYSIS

10. IMPLEMENTATION & USER MANUALS

11. CONCLUSION

12. BIBLIOGRAPHY

Abstract:

A filter is the core of decision making in the Privacy-Aware Collaborative Spam Filtering document. A filter decides on a per-mail basis whether the message should be downloaded or not. A pipeline of filters is set up in the configuration, and each message that needs to be downloaded is passed through this pipeline. At any point in the pipeline, a filter can indicate that the message should not be processed any further. For example, a sender-based filter could find a match in its list of spammers and reject the message.

There are two kinds of filters: global and local. This is not an attribute of the filter itself but depends on how the filter is used. Local filters are associated with a single maildrop, whereas global filters apply to all maildrops. For example, you might want a Message-ID filter to apply to all maildrops but keep a sender-based filter only for the maildrop where you expect mail from that sender.

A filter has the single job of deciding whether or not to download a single message. The overall decision on whether to download a mail is made by a sequence of filters. There can be a global set of filters as well as a set per maildrop. A maildrop represents the mailbox from which you want to download your mail.

This project defines a basic set of filters, and more can be added as requirements demand. The focus is on the main filters: HeaderMailFilter, MessageIDMailFilter, NullFilter, ReceipientMailFilter, SenderMailFilter, SizeMailFilter and SubjectMailFilter.

The project titled “Privacy-Aware Collaborative Spam Filtering document” is proposed to be developed with Windows 2000 Server as the operating system, using the JavaMail API of the J2EE technologies. The package will have provision for creating your own filters and using them in the appropriate places.

System Analysis:

Existing System:

The existing system is not computerized. All mail cleanup is done manually, and computerizing it would simplify this laborious job.

The administrator maintains the mailboxes of all employees of the organization and is responsible for organizing them. To delete unwanted mails, he checks each mailbox, marks the mails that are unwanted on the basis of facts such as a large mail size or the user ID, and then deletes them manually.

Proposed System:

The first step of analysis process involves the identification of need. The success of a system depends largely on how accurately a problem is defined, thoroughly investigated and properly carried out through the choice of solution.

This package has been developed in order to overcome the difficulties encountered while using the manual system. Faster and timely deletion of mails which are unwanted is another motivating factor for the development of this package.

Project Scope and Objectives:

Privacy-Aware Collaborative Spam Filtering document is a tool to delete unwanted mails. A lot of effort was put into making it user friendly.

Optimum utilization of the tool is possible. All basic filters are provided.

It reduces the amount of user interaction.

It reduces wasted time.

It also helps in optimum distribution of funds by the management among user groups for procurement of new equipment.

It is flexible: the user (administrator) can easily add any number of his own filters.

Company Profile:

Global Interactive Solutions has emerged to be a world-class solutions and

products organization with clientele spread across geographies. It has time-

and-again taken up challenges for accomplishing the mission of customer

satisfaction armored with a focused vision and technical expertise.

Our growth and success has evolved from our ability to foresee customer

challenges and address them with apt solutions. Our teams, comprising of

research innovators, architects and developers have constantly worked on

developing products, solutions and mission-critical applications.

We started with Visual SHIFT, our initial product that addressed the Y2K

problem. It received global acclamation and was awarded "Product of the

Year" by Datamation under Y2K product category. Gartner Group, the

research and consulting organization, rated Visual SHIFT as "Best in Class". It

also won the accolades of being the "Best Product" from HYSEA (Hyderabad

Software Exporters Association).

Global Interactive Solutions Technologies is a global software development

firm specializing in software testing and product development services

catering to technology companies across diverse industry segments.

Our flexible delivery model helps us offer focused IT solutions, which help our

clients respond quickly to their business opportunities. Our clients engage

with us to enable them stay ahead in the technology adoption curve and to

develop and protect their Intellectual Property Assets. Our Technology

Excellence Groups embrace new technologies as they emerge to provide

clients with solutions that give them a competitive edge in their businesses.

Leveraging our strengths in Research & Development and expertise in

Component Based Application Development, we have been successfully

providing our global clientele with software testing and product development

services. With such technology foresight and sophisticated product

development and testing expertise, we credit our success to commitment,

performance, delivery and customer delight.

Our world-class practices and methodologies make us the preferred

technology partner for many technology companies. Our growth comes from

the unique business model and integration of people, processes and

technology. We have continually demonstrated our commitment to develop

Cost-effective, quality products and custom-applications built on strict time-

lines by adopting industry standard processes.

People, experience and skill sets are the ultimate competitive differentiators

when it comes to finalizing a Strategic Offshore Outsourcing deal. Global

Interactive Solutions is an IT services company that adapts solutions to the

market requirements. Its people are well qualified and experienced in the

technology platforms they work on. Personnel are trained and retrained, which

makes them masters of their chosen areas.

Consolidating our capabilities in diverse technologies, and our solid

foundation in product and application development, we built expertise in

delivering end-to-end solutions and providing Enterprise Application

Integration.

Our technical expertise coupled with functional know-how equipped us to

collaborate with global organizations to deliver enterprise-wide solutions for

business verticals such as Insurance, Retail and Distribution, Consumer

Electronics, Healthcare and Utilities. Our clientele comprise of organizations

of varied sizes - from small and medium companies to Fortune 100

corporations. We act as strategic technology partners for global

conglomerates and also provide R&D outsourcing services to international

technology labs.

Requirements Specification Document

The Privacy-Aware Collaborative Spam Filtering document is developed with the aim of automatically deleting unwanted mails from the specified maildrops on the basis of our definitions. The definitions state the facts (criteria) by which mails are selected for automatic deletion. The administrator can define these facts to delete the unwanted mails.

1. Introduction

1.1 Purpose: The purpose of this document is to describe all external requirements for the Privacy-Aware Collaborative Spam Filtering document. It also describes the interfaces for the system, as follows:

a. To implement the Privacy-Aware Collaborative Spam Filtering document we need a mail server capable of storing mail in the corresponding mailboxes. In this project we implemented and tested our filters on the James server, as it is openly available.

b. As the user interface we used Microsoft Outlook Express, because it is user friendly and makes it easy to access, read and maintain our mails.

c. To send mails we need a protocol capable of sending or delivering mail, and to receive mails we need another protocol to fetch them from our mailboxes. In this project we used SMTP for sending mails and POP3 for receiving them. Both are available in a single mail server, the James mail server we used.

d. The script or code for the filters can be written using XML and basic Java; XML is used because it provides application interoperability.

1.2 Scope: This document describes the requirements of the system. It is meant for use by the developers, and will also be the basis for validating the final system. Any changes made to the requirements in the future will have to go through a formal change approval process. The developer is responsible for asking for clarifications when necessary and will not make any alterations without the permission of the client.

This project intends to delete mails that are not required from the mailboxes of the organization's personnel. A lot of effort was put into making it reliable. The workload of deleting mails is removed, and the time for processing and deleting mails is considerably reduced. This saves the administrator valuable time, which he can allot to other important activities. The system is also extensible: besides the existing filters, the administrator can easily add his own filters if needed in future. The filters can be applied on other mail servers to drop unwanted mails from specified maildrops. The administrator has two options for deleting mails: he can run the filters manually whenever he wants, or he can set them to run automatically on a schedule.

1.3 Definition: A Filter is the core of the decision making in Privacy-Aware Collaborative Spam Filtering document. A filter decides on a per-mail basis whether the message should be downloaded or not.

1.4 Reference: Not Applicable.

1.5 Developers Responsibilities overview: The system requirements specification covers the following points:

1. An introduction, mainly describing the purpose of the system requirements specification document and outlining the scope of the envisaged application.

2. A description of the interactions of the system with its environment, without going into the internals of the system. It also describes the constraints imposed on the system, and is thus concerned with what lies outside the envisaged application. The assumptions made are also listed. It is supported by the UML diagrams.

3. A description of the internal behaviour of the system in response to the inputs and while generating the outputs. This part is also supported by detailed UML diagrams, a list of inputs, a process explanation and a list of outputs.

4. The external interface requirements, which include the user, hardware and software interfaces.

5. The performance requirements of the system, together with the design constraints, comprising software constraints and hardware constraints.

1.6 Product functions overview: In the organization every employee has a mailbox, and anyone can send any number of mails to its owner. Sometimes we suffer from spam mails, or from lengthy mails that may occupy all the storage allotted to a mailbox, and so on. These kinds of mails are controlled by the company administrator, who is responsible for managing all the mailboxes. He can set constraints on the mailboxes, such as dropping mails of a given kind. These constraints are our filters. By embedding the filters in the company's mail server he can restrict the mails, so there is no need to mark mails and delete them manually. In this project the administrator can run the filters on specified mailboxes manually whenever he wants. Alternatively, he can set the filters to run periodically, without further intervention from the administrator.

Whenever the filters are run, they apply the logic already written in a Java file to every mail in all mailboxes or in the specified mailboxes. Based on this logic, each filter decides whether to download the mail or not. This automates the task of deleting mails.
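The way a sequence of filters reaches a download decision, with global filters applying to every maildrop and local filters to one maildrop, can be sketched roughly as follows. This is only an illustrative sketch: the names MailSummary, MailFilter, FilterPipeline and FilterPipelineDemo are hypothetical and are not taken from the project's source, and running the global filters before the local ones is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical message summary; the fields are illustrative only. */
final class MailSummary {
    final String sender;
    final int sizeBytes;

    MailSummary(String sender, int sizeBytes) {
        this.sender = sender;
        this.sizeBytes = sizeBytes;
    }
}

/** Hypothetical filter contract: true means "keep going, download is still allowed". */
interface MailFilter {
    boolean accept(MailSummary mail);
}

/** Combines global filters with the local filters of one maildrop. */
final class FilterPipeline {
    private final List<MailFilter> filters = new ArrayList<>();

    FilterPipeline(List<MailFilter> globalFilters, List<MailFilter> localFilters) {
        // Assumption: global filters are consulted before local ones.
        filters.addAll(globalFilters);
        filters.addAll(localFilters);
    }

    /** Returns false as soon as any filter rejects the message. */
    boolean shouldDownload(MailSummary mail) {
        for (MailFilter filter : filters) {
            if (!filter.accept(mail)) {
                return false; // stop here: later filters are never consulted
            }
        }
        return true;
    }
}

final class FilterPipelineDemo {
    public static void main(String[] args) {
        MailFilter sizeLimit = mail -> mail.sizeBytes <= 1_000_000;                // used as a global filter
        MailFilter blockSpammer = mail -> !mail.sender.equals("spam@example.com"); // used as a local filter

        FilterPipeline pipeline = new FilterPipeline(
                Arrays.asList(sizeLimit), Arrays.asList(blockSpammer));

        System.out.println(pipeline.shouldDownload(new MailSummary("friend@example.com", 2_000)));
        System.out.println(pipeline.shouldDownload(new MailSummary("spam@example.com", 2_000)));
    }
}
```

Keeping each filter to a single yes/no decision means a new filter can be added to either list without changing the pipeline itself.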

1.7 User characteristics: In this project the user is an administrator. He must know how to implement or embed these filters on the mail server.

1.8 General constraints: The system should run on a Pentium machine under Windows NT/2000 Professional or Server, or later versions of Microsoft operating systems, with a minimum of 16 MB RAM. The filters themselves can be applied to any kind of mail server.

1.9 Assumptions and Dependencies:

a. It is assumed that James is a real mail server resource and that the required information already exists in the system.

b. It is assumed that the mail client is Microsoft Outlook Express or Netscape Communicator.

c. All the details provided by the user are correct.

d. The user will ask for new filters when he wants to filter mails more finely, or when any situation arises that calls for such filtering.

2. Functional Requirements

Functional requirements specify which outputs should be produced from the given inputs. They describe the relationship between the input and output of the system. For each functional requirement, a detailed description of all data inputs, their source and the range of valid inputs must be specified.

All the operations to be performed on the input data to obtain the output should be specified.

2.1 Inputs:

1. Null Filter: Deletes all mails irrespective of their characteristics. This filter consumes all messages and also marks them for deletion.

2. Header Mail Filter: Matches a header in the message. This requires the name of the header and the value of the header.

3. MessageID Mail Filter: Filters messages if they contain a duplicate Message-ID. This filter stores the list of downloaded Message-IDs in the specified file.

4. Recipient Mail Filter: Matches the recipients of the message against those provided in a list.

5. Sender Mail Filter: Matches the sender of the message against those provided in a list.

6. Size Mail Filter: Filters messages based on their size.
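A few of the filters listed above might look roughly like the following. This is a hedged sketch rather than the project's code: SimpleMessage, DropRule, SizeRule, HeaderRule and DuplicateMessageIdRule are hypothetical names. The sketch also assumes that the size filter drops messages above a limit, and it keeps the seen Message-IDs in memory instead of in the file described above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory message: a header map plus a size in bytes. */
final class SimpleMessage {
    private final Map<String, String> headers = new HashMap<>();
    final int sizeBytes;

    SimpleMessage(int sizeBytes) {
        this.sizeBytes = sizeBytes;
    }

    /** Sets a header; names are stored in lower case so lookups ignore case. */
    SimpleMessage header(String name, String value) {
        headers.put(name.toLowerCase(), value);
        return this;
    }

    /** Returns the header value, or null when the header is absent. */
    String header(String name) {
        return headers.get(name.toLowerCase());
    }
}

/** Hypothetical contract: true means the message should be dropped. */
interface DropRule {
    boolean matches(SimpleMessage message);
}

/** Assumption: "filtering on size" means dropping messages above a limit. */
final class SizeRule implements DropRule {
    private final int maxBytes;

    SizeRule(int maxBytes) {
        this.maxBytes = maxBytes;
    }

    public boolean matches(SimpleMessage message) {
        return message.sizeBytes > maxBytes;
    }
}

/** Matches when the named header has exactly the configured value. */
final class HeaderRule implements DropRule {
    private final String name;
    private final String value;

    HeaderRule(String name, String value) {
        this.name = name;
        this.value = value;
    }

    public boolean matches(SimpleMessage message) {
        return value.equals(message.header(name));
    }
}

/** Matches the second and later messages carrying the same Message-ID. */
final class DuplicateMessageIdRule implements DropRule {
    private final Set<String> seen = new HashSet<>(); // in memory here, a file in the project

    public boolean matches(SimpleMessage message) {
        String id = message.header("Message-ID");
        if (id == null) {
            return false; // nothing to compare against
        }
        return !seen.add(id); // add() returns false when the id was already present
    }
}

final class FilterRulesDemo {
    public static void main(String[] args) {
        DropRule duplicate = new DuplicateMessageIdRule();
        SimpleMessage first = new SimpleMessage(200).header("Message-ID", "<1@example.com>");
        SimpleMessage again = new SimpleMessage(200).header("Message-ID", "<1@example.com>");

        System.out.println(duplicate.matches(first));
        System.out.println(duplicate.matches(again));
        System.out.println(new SizeRule(1024).matches(new SimpleMessage(5000)));
    }
}
```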

2.2 Outputs:

1. Log Files: The system writes log files recording the operations the server has handled. If a failure occurs, it also writes an error message indicating where the fault happened. All of this information is represented by the codes assigned to each operation.

3. External Interface Requirements

3.1 User Interface: Once the filters are embedded in the mail server and all of them are working properly, no user interaction is needed if the administrator has set the filters to run periodically. Otherwise it is the responsibility of the administrator to run them when required. Overall, the amount of user interaction is very low.

3.2 Software Interfaces: These interface requirements should specify the interfaces with other software that the system will use or that will use the system. This includes the interface with the operating system and other applications.

The message content and format of each interface should be given.

3.3 Hardware Interfaces: The hardware interface is very important to the documentation. If the software is to be executed on existing hardware or on pre-determined hardware, all the characteristics of the hardware, including memory restrictions, should be specified. In addition, the current use and load characteristics of the hardware should be given.

4. Performance Requirements

All the requirements relating to the performance characteristics of the system must be clearly specified. There are two types of performance requirements: static and dynamic. Static requirements are those that do not impose constraints on the execution characteristics of the system. These include requirements like the number of terminals to be supported, the number of simultaneous users to be supported, and the number of files, and their sizes, that the system has to process. These are also called the capacity of the system. Dynamic requirements specify constraints on the execution behavior of the system. These typically include response time and throughput constraints on the system.

Performance is measured by processing speed, resource consumption, throughput and efficiency. To achieve good performance, a few practices are to be followed, such as reducing code, making less use of controls, and minimizing repeated data. For a real-time system, software that provides the required function but does not conform to the performance requirements is unacceptable. These requirements are used to test the run-time performance of the software within the context of an integrated system.

5. Design constraints

5.1 Software constraints :

Operating System : Windows2000 Server/ NT or any Mail server

Reports : Log files

Other Applications : James Server

5.2 Hardware Constraints :

Pentium Processor : Pentium IV 2.0 GHZ

RAM : 256 MB

Hard Disk : 40 GB

Floppy Disk : 1.44 MB

CD/ROM Drive : 52 Bit

VDU : VGA

Key Board : 101 Standards

6. Acceptance Criteria

Before accepting the system, the developer must demonstrate that the system works on the details of the user email-ids entered in the corresponding files. The developer will have to show through test cases that all conditions are satisfied.

The Java Apache Mail Enterprise Server (a.k.a. Apache James) is a 100% pure Java SMTP and POP3 Mail server and NNTP News server designed to be a complete and portable enterprise mail engine solution. James is based on currently available open protocols.

The James server also serves as a mail application platform. The James project hosts the Apache Mailet API, and the James server is a Mailet container. This feature makes it easy to design, write, and deploy custom applications for mail processing. This modularity and ease of customization is one of James' strengths, and can allow administrators to produce powerful applications surprisingly easily.

James is built on top of version 4.1.3 of the Avalon Application Framework. This framework encourages a set of good development practices such as Component Oriented Programming and Inversion of Control. The standard distribution of James includes version 4.0.1 of the Phoenix Avalon Framework container. This stable and robust container provides a strong foundation for the James server.

This documentation is intended to be an introduction to the concepts behind the James implementation, as well as a guide to installing, configuring, (and for developers) building the James server.

The James Server

James is an open source project intended to produce a robust, flexible, and powerful enterprise class server that provides email and email-related services. It is also designed to be highly customizable, allowing administrators to configure James to process email in a nearly endless variety of fashions.

The James server is built on top of the Avalon Framework (http://avalon.apache.org/). The standard James distribution deploys inside the Phoenix Avalon Framework container (http://avalon.apache.org/phoenix). In addition to providing a robust server architecture for James, the use of Phoenix allows James administrators to deploy their own applications inside the container. These applications can then be accessed during mail processing.

The James server is implemented as a complete collection of servers and related components that, taken together, provide an email solution. These components are described below.

POP3 Service

The POP3 protocol allows users to retrieve email messages. It is the method most commonly used by email clients to download and manage email messages.

The James version of the POP3 service is a simple and straightforward implementation that provides full compliance with the specification and maximum compatibility with common POP3 clients. In addition, James can be configured to require SSL/TLS connections for POP3 clients connecting to the server.

SMTP Service

SMTP (Simple Mail Transfer Protocol) is the standard method of sending and delivering email on the internet. James provides a full-function implementation of the SMTP specification, with support for some optional features such as message size limits, SMTP auth, and encrypted client/server communication.

NNTP Service

NNTP is used by clients to store messages on and retrieve messages from news servers. James provides the server side of this interaction by implementing the NNTP specification as well as an appropriate repository for storing news messages. The server implementation is simple and straightforward, but supports some additional features such as NNTP authentication and encrypted client/server communication.

Fetch POP

Fetch POP, unlike the other James components, is not an implementation of an RFC. Instead, it's a component that allows the administrator to configure James to retrieve email from a number of POP3 servers and deliver them to the local spool. This is useful for consolidating mail delivered to a number of accounts on different machines to a single account.

The Spool Manager, Matchers, and Mailets

James separates the services that deliver mail to James (i.e. SMTP, Fetch POP) from the engine that processes mail after it is received by James. The Spool Manager component is James' mail processing engine. James' Spool Manager component is a Mailet container. It is these mailets and matchers that actually carry out mail processing.

Repositories

James uses a number of different repositories to both store message data (email, news messages) and user information. User repositories store user information, including user names, authentication information, and aliases. Mail repositories store messages that have been delivered locally. Spool repositories store messages that are still being processed. Finally, news repositories are used to store news messages. Aside from what type of data they store, repositories are distinguished by where they store data. There are three types of storage - File, Database, and DBFile.

Remote Manager

James provides a simple telnet-based interface for control. Through this interface you can add and delete users, configure per-user aliases and forward addresses, and shut down the server.

Mailet API:

The Mailet API is a simple API used to build mail processing applications. James is a Mailet container, allowing administrators to deploy Mailets (both custom and pre-made) to carry out a variety of complex mail processing tasks. In the default configuration James uses Mailets to carry out a number of tasks that are carried out deep in the source code of other mail servers (e.g. list processing, remote and local delivery).

As it stands today, the Mailet API defines interfaces for both Matchers and Mailets.

Matchers, as their name would suggest, match mail messages against certain conditions. They return some subset (possibly the entire set) of the original recipients of the message if there is a match. An inherent part of the Matcher contract is that a Matcher should not induce any changes in a message under evaluation.

Mailets are responsible for actually processing the message. They may alter the message in any fashion, or pass the message to an external API or component. This can include delivering a message to its destination repository or SMTP server.
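The division of labour between Matchers and Mailets can be illustrated with a small self-contained sketch. The types below (SketchMail, SketchMatcher, SketchMailet and the two implementations) are simplified stand-ins written for this illustration. They are not the actual Mailet API interfaces, and a real Mailet would be deployed through the James configuration rather than called from a main method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

/** Simplified stand-in for a mail object; not the real Mailet API type. */
final class SketchMail {
    final String sender;
    final List<String> recipients;
    String state = "received";

    SketchMail(String sender, List<String> recipients) {
        this.sender = sender;
        this.recipients = recipients;
    }
}

/** Stand-in matcher contract: return the matching recipients, never modify the mail. */
interface SketchMatcher {
    Collection<String> match(SketchMail mail);
}

/** Stand-in mailet contract: process the mail, possibly changing it. */
interface SketchMailet {
    void service(SketchMail mail);
}

/** Matches the recipients whose address ends with the given domain. */
final class RecipientDomainMatcher implements SketchMatcher {
    private final String suffix;

    RecipientDomainMatcher(String domain) {
        this.suffix = "@" + domain.toLowerCase();
    }

    public Collection<String> match(SketchMail mail) {
        List<String> matched = new ArrayList<>();
        for (String recipient : mail.recipients) {
            if (recipient.toLowerCase().endsWith(suffix)) {
                matched.add(recipient); // a subset of the original recipients
            }
        }
        return matched;
    }
}

/** Processes a mail by changing its state; stands in for real delivery. */
final class MarkDeliveredMailet implements SketchMailet {
    public void service(SketchMail mail) {
        mail.state = "delivered";
    }
}

final class MailetSketchDemo {
    public static void main(String[] args) {
        SketchMail mail = new SketchMail("alice@example.org",
                Arrays.asList("bob@example.com", "carol@example.net"));
        SketchMatcher matcher = new RecipientDomainMatcher("example.com");
        SketchMailet mailet = new MarkDeliveredMailet();

        Collection<String> matched = matcher.match(mail);
        if (!matched.isEmpty()) {
            mailet.service(mail); // only the mailet is allowed to change the mail
        }
        System.out.println(matched + " -> " + mail.state);
    }
}
```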

The Mailet API is currently in its second revision. Although the Mailet API is expected to undergo substantial changes in the near future, it is our aim that existing Mailets that abided purely by the prior Mailet API interfaces will continue to run with the revised specification.

James bundles a number of Matchers and Mailets in its distribution.

1. INTRODUCTION

The objective of Simple Mail Transfer Protocol (SMTP) is to

transfer mail reliably and efficiently. SMTP is independent of the

particular transmission subsystem and requires only a reliable ordered

data stream channel.

An important feature of SMTP is its capability to relay mail

across transport service environments. A transport service provides an

interprocess communication environment (IPCE). An IPCE may cover one

network, several networks, or a subset of a network. It is important to

realize that transport systems (or IPCEs) are not one-to-one with

networks. A process can communicate directly with another process

through any mutually known IPCE. Mail is an application or use of

interprocess communication. Mail can be communicated between

processes in different IPCEs by relaying through a process connected

to two (or more) IPCEs. More specifically, mail can be relayed between

hosts on different transport systems by a host on both transport

systems.

2. THE SMTP MODEL

The SMTP design is based on the following model of communication: as the result of a user mail request, the sender-SMTP establishes a two-way transmission channel to a receiver-SMTP. The receiver-SMTP may be either the ultimate destination or an intermediate. SMTP commands are generated by the sender-SMTP and sent to the receiver-SMTP. SMTP replies are sent from the receiver-SMTP to the sender-SMTP in response to the commands.

Once the transmission channel is established, the SMTP-sender sends a MAIL command indicating the sender of the mail. If the SMTP-receiver can accept mail it responds with an OK reply. The SMTP-sender then sends a RCPT command identifying a recipient of the mail. If the SMTP-receiver can accept mail for that recipient it responds with an OK reply; if not, it responds with a reply rejecting that recipient (but not the whole mail transaction). The SMTP-sender and SMTP-receiver may negotiate several recipients. When the recipients have been negotiated the SMTP-sender sends the mail data, terminating with a special sequence. If the SMTP-receiver successfully processes the mail data it responds with an OK reply. The dialog is purposely lock-step, one-at-a-time.

            +----------+                +----------+
+------+    |          |                |          |
| User |<-->|          |      SMTP      |          |
+------+    |  Sender- |Commands/Replies| Receiver-|
+------+    |   SMTP   |<-------------->|    SMTP  |    +------+
| File |<-->|          |    and Mail    |          |<-->| File |
|System|    |          |                |          |    |System|
+------+    +----------+                +----------+    +------+

             Sender-SMTP                Receiver-SMTP

                      Model for SMTP Use

                           Figure 1

The SMTP provides mechanisms for the transmission of mail; directly from the sending user's host to the receiving user's host when the two hosts are connected to the same transport service, or via one or more relay SMTP-servers when the source and destination hosts are not connected to the same transport service.

To be able to provide the relay capability the SMTP-server must be supplied with the name of the ultimate destination host as well as the destination mailbox name.

The argument to the MAIL command is a reverse-path, which specifies who the mail is from. The argument to the RCPT command is a forward-path, which specifies who the mail is to. The forward-path is a source route, while the reverse-path is a return route (which may be used to return a message to the sender when an error occurs with a relayed message).

When the same message is sent to multiple recipients the SMTP encourages the transmission of only one copy of the data for all the recipients at the same destination host. The mail commands and replies have a rigid syntax. Replies also have a numeric code. Commands and replies are not case sensitive. That is, a command or reply word may be upper case, lower case, or any mixture of upper and lower case. Note that this is not true of mailbox user names. For some hosts the user name is case sensitive, and SMTP implementations must take care to preserve the case of user names as they appear in mailbox arguments. Host names are not case sensitive.
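The case rules above can be illustrated with a small parsing sketch: the command word is normalised because it is not case sensitive, while the mailbox is kept exactly as received. The class SmtpCommandLine and its methods are hypothetical names chosen for this illustration, and the parsing is deliberately simplified.

```java
import java.util.Locale;

/** Hypothetical helper that splits one SMTP command line into verb and argument. */
final class SmtpCommandLine {
    final String verb;     // normalised to upper case, since command words ignore case
    final String argument; // kept exactly as received, so mailbox case is preserved

    private SmtpCommandLine(String verb, String argument) {
        this.verb = verb;
        this.argument = argument;
    }

    static SmtpCommandLine parse(String line) {
        String trimmed = line.trim();
        int space = trimmed.indexOf(' ');
        if (space < 0) {
            return new SmtpCommandLine(trimmed.toUpperCase(Locale.ROOT), "");
        }
        return new SmtpCommandLine(
                trimmed.substring(0, space).toUpperCase(Locale.ROOT),
                trimmed.substring(space + 1));
    }

    /** Returns the text between the angle brackets unchanged, or null if there is none. */
    String path() {
        int open = argument.indexOf('<');
        int close = argument.lastIndexOf('>');
        if (open < 0 || close < open) {
            return null;
        }
        return argument.substring(open + 1, close);
    }

    public static void main(String[] args) {
        SmtpCommandLine command = SmtpCommandLine.parse("mail from:<Smith@Alpha.ARPA>");
        System.out.println(command.verb + " " + command.path());
    }
}
```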

Commands and replies are composed of characters from the ASCII character set [1]. When the transport service provides an 8-bit byte (octet) transmission channel, each 7-bit character is transmitted right justified in an octet with the high order bit cleared to zero.

When specifying the general form of a command or reply, an argument or special symbol will be denoted by a meta-linguistic variable (or constant), for example, "<string>" or "<reverse-path>". Here the angle brackets indicate these are meta-linguistic variables.

However, some arguments use the angle brackets literally. For example, an actual reverse-path is enclosed in angle brackets, i.e., "<John.Smith@USC-ISI.ARPA>" is an instance of <reverse-path> (the angle brackets are actually transmitted in the command or reply).

3 THE SMTP PROCEDURES

This section presents the procedures used in SMTP in several parts. First comes the basic mail procedure defined as a mail transaction. Following this are descriptions of forwarding mail, verifying mailbox names and expanding mailing lists, sending to terminals instead of or in combination with mailboxes, and the opening and closing exchanges. At the end of this section are comments on relaying, a note on mail domains, and a discussion of changing roles.

3.1 MAIL

There are three steps to SMTP mail transactions. The transaction is started with a MAIL command which gives the sender identification. A series of one or more RCPT commands follows giving the receiver information. Then a DATA command gives the mail data. And finally, the end of mail data indicator confirms the transaction.

The first step in the procedure is the MAIL command. The <reverse-path> contains the source mailbox.

MAIL <SP> FROM:<reverse-path> <CRLF>

This command tells the SMTP-receiver that a new mail transaction is starting and to reset all its state tables and buffers, including any recipients or mail data. It gives the reverse-path which can be used to report errors. If accepted, the receiver-SMTP returns a 250 OK reply.

The <reverse-path> can contain more than just a mailbox. The <reverse-path> is a reverse source routing list of hosts and source mailbox. The first host in the <reverse-path> should be the host sending this command.

The second step in the procedure is the RCPT command.

RCPT <SP> TO:<forward-path> <CRLF>

This command gives a forward-path identifying one recipient. If accepted, the receiver-SMTP returns a 250 OK reply, and stores the forward-path. If the recipient is unknown the receiver-SMTP returns a 550 Failure reply. This second step of the procedure can be repeated any number of times.

The <forward-path> can contain more than just a mailbox. The <forward-path> is a source routing list of hosts and the destination mailbox. The first host in the <forward-path> should be the host receiving this command.

The third step in the procedure is the DATA command.

DATA <CRLF>

If accepted, the receiver-SMTP returns a 354 Intermediate reply and considers all succeeding lines to be the message text. When the end of text is received and stored the SMTP-receiver sends a 250 OK reply. Since the mail data is sent on the transmission channel the end of the mail data must be indicated so that the command and reply dialog can be resumed. SMTP indicates the end of the mail data by sending a line containing only a period. A transparency procedure is used to prevent this from interfering with the user's text.
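The transparency procedure is not spelled out in this section. In RFC 821 the sender inserts one extra period in front of any line of mail data that begins with a period, and the receiver removes one leading period from every line that is not the lone-period terminator. A minimal sketch of that idea follows; the class and method names (DotStuffing, stuff, unstuff) are hypothetical, and lines are modelled as strings without their CRLF.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper illustrating SMTP data transparency (dot-stuffing). */
final class DotStuffing {

    /** Sender side: prefix an extra period to any line that starts with one. */
    static List<String> stuff(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            out.add(line.startsWith(".") ? "." + line : line);
        }
        return out;
    }

    /** Receiver side: stop at the lone period, strip one leading period elsewhere. */
    static List<String> unstuff(List<String> wireLines) {
        List<String> out = new ArrayList<>();
        for (String line : wireLines) {
            if (line.equals(".")) {
                break; // end of mail data indicator
            }
            out.add(line.startsWith(".") ? line.substring(1) : line);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> body = Arrays.asList("Hello", ".", ".hidden", "Bye");
        List<String> wire = new ArrayList<>(stuff(body));
        wire.add("."); // terminator appended after the stuffed body
        System.out.println(wire);
        System.out.println(unstuff(wire).equals(body));
    }
}
```

Because a user line consisting of a single period is sent as two periods, it can never be mistaken for the terminator.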

Please note that the mail data includes the memo header items such as Date, Subject, To, Cc, From [2].

The end of mail data indicator also confirms the mail transaction and tells the receiver-SMTP to now process the stored recipients and mail data. If accepted, the receiver-SMTP returns a 250 OK reply. The DATA command should fail only if the mail transaction was incomplete (for example, no recipients), or if resources are not available.

The above procedure is an example of a mail transaction.

These commands must be used only in the order discussed above. Example 1 (below) illustrates the use of these commands in a mail transaction.

Example of the SMTP Procedure

This SMTP example shows mail sent by Smith at host Alpha.ARPA, to Jones, Green, and Brown at host Beta.ARPA. Here we assume that host Alpha contacts host Beta directly.

S: MAIL FROM:<Smith@Alpha.ARPA>
R: 250 OK
S: RCPT TO:<Jones@Beta.ARPA>
R: 250 OK
S: RCPT TO:<Green@Beta.ARPA>
R: 550 No such user here
S: RCPT TO:<Brown@Beta.ARPA>
R: 250 OK
S: DATA
R: 354 Start mail input; end with <CRLF>.<CRLF>
S: Blah blah blah...
S: ...etc. etc. etc.
S: <CRLF>.<CRLF>
R: 250 OK

The mail has now been accepted for Jones and Brown. Green did not have a mailbox at host Beta in Example 1.
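The ordering of MAIL, RCPT and DATA described above can be sketched as a small receiver-side state holder. This is only an illustration under stated assumptions: the class name, the set of known mailboxes, and the choice of a 503 reply for out-of-order commands are assumptions made for the sketch, not part of the project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch of a receiver-SMTP tracking one mail transaction.
// MailTransactionSketch is a hypothetical name.
final class MailTransactionSketch {
    private final Set<String> knownMailboxes;
    private String reversePath;                                 // set by MAIL
    private final List<String> recipients = new ArrayList<>();  // accepted by RCPT

    MailTransactionSketch(Set<String> knownMailboxes) {
        this.knownMailboxes = knownMailboxes;
    }

    // MAIL starts a new transaction and resets any earlier state.
    String mail(String path) {
        reversePath = path;
        recipients.clear();
        return "250 OK";
    }

    // RCPT may be repeated; unknown recipients get a 550 reply.
    String rcpt(String forwardPath) {
        if (reversePath == null) {
            return "503 Bad sequence of commands"; // assumption: MAIL must come first
        }
        if (!knownMailboxes.contains(forwardPath)) {
            return "550 No such user here";
        }
        recipients.add(forwardPath);
        return "250 OK";
    }

    // DATA is refused if the transaction is incomplete (for example, no recipients).
    String data() {
        if (reversePath == null || recipients.isEmpty()) {
            return "503 Bad sequence of commands";
        }
        return "354 Start mail input; end with <CRLF>.<CRLF>";
    }

    List<String> acceptedRecipients() {
        return new ArrayList<>(recipients);
    }
}
```

Replaying the example dialog against this sketch accepts Jones and Brown and rejects Green, which matches the outcome stated above.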

3.2. FORWARDING

There are some cases where the destination information in the <forward-path> is incorrect, but the receiver-SMTP knows the correct destination. In such cases, one of the following replies should be used to allow the sender to contact the correct destination.

251 User not local; will forward to <forward-path>

This reply indicates that the receiver-SMTP knows the user's mailbox is on another host and indicates the correct forward-path to use in the future. Note that either the host or user or both may be different. The receiver takes responsibility for delivering the message.

551 User not local; please try <forward-path>

This reply indicates that the receiver-SMTP knows the user's mailbox is on another host and indicates the correct forward-path to use. Note that either the host or user or both may be different. The receiver refuses to accept mail for this user, and the sender must either redirect the mail according to the information provided or return an error response to the originating user.

3.3. VERIFYING AND EXPANDING

SMTP provides, as additional features, commands to verify a user name or expand a mailing list. This is done with the VRFY and EXPN commands, which have character string arguments. For the VRFY command, the string is a user name, and the response may include the full name of the user and must include the mailbox of the user. For the EXPN command, the string identifies a mailing list, and the multiline response may include the full name of the users and must give the mailboxes on the mailing list.

"User name" is a fuzzy term and is used purposely. If a host implements the VRFY or EXPN commands, then at least local mailboxes must be recognized as "user names". If a host chooses to recognize other strings as "user names", that is allowed.

In some hosts the distinction between a mailing list and an alias for a single mailbox is a bit fuzzy, since a common data structure may hold both types of entries, and it is possible to have mailing lists of one mailbox. If a request is made to verify a mailing list, a positive response can be given if, on receipt of a message so addressed, it will be delivered to everyone on the list; otherwise an error should be reported (e.g., "550 That is a mailing list, not a user"). If a request is made to expand a user name, a positive response can be formed by returning a list containing one name, or an error can be reported (e.g., "550 That is a user name, not a mailing list").

In the case of a multiline reply (normal for EXPN) exactly one mailbox is to be specified on each line of the reply. In the case of an ambiguous request, for example, "VRFY Smith", where there are two Smiths, the response must be "553 User ambiguous". The case of verifying a user name is straightforward, as shown in Example 3.
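A minimal sketch of how a host might answer VRFY, including the ambiguous case, is given here. The matching rule (a case-insensitive match against any word of the full name) and all names and sample mailboxes are assumptions for illustration; real hosts decide for themselves which strings count as user names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of VRFY handling. VrfySketch and its matching rule are hypothetical.
final class VrfySketch {
    private final Map<String, String> fullNameByMailbox = new LinkedHashMap<>();

    void addUser(String mailbox, String fullName) {
        fullNameByMailbox.put(mailbox, fullName);
    }

    String vrfy(String userName) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> entry : fullNameByMailbox.entrySet()) {
            // Assumption: a user matches if any word of the full name equals the argument.
            for (String word : entry.getValue().split("\\s+")) {
                if (word.equalsIgnoreCase(userName)) {
                    matches.add(entry.getValue() + " <" + entry.getKey() + ">");
                    break;
                }
            }
        }
        if (matches.isEmpty()) {
            return "550 String does not match anything";
        }
        if (matches.size() > 1) {
            return "553 User ambiguous";
        }
        // The reply may include the full name and must include the mailbox.
        return "250 " + matches.get(0);
    }
}
```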

The character string arguments of the VRFY and EXPN commands cannot be further restricted due to the variety of implementations of the user name and mailbox list concepts. On some systems it may be appropriate for the argument of the EXPN command to be a file name for a file containing a mailing list, but again there is a variety of file naming conventions in the Internet.

The VRFY and EXPN commands are not included in the minimum implementation (Section 4.5.1), and are not required to work across relays when they are implemented.

3.4. SENDING AND MAILING

The main purpose of SMTP is to deliver messages to users' mailboxes. A very similar service provided by some hosts is to deliver messages to users' terminals (provided the user is active on the host). The delivery to the user's mailbox is called "mailing"; the delivery to the user's terminal is called "sending". Because in many hosts the implementation of sending is nearly identical to the implementation of mailing, these two functions are combined in SMTP. However, the sending commands are not included in the required minimum implementation (Section 4.5.1). Users should have the ability to control the writing of messages on their terminals. Most hosts permit the users to accept or refuse such messages.

The following three commands are defined to support the sending options. These are used in the mail transaction instead of the MAIL command and inform the receiver-SMTP of the special semantics of this transaction:

SEND <SP> FROM: <reverse-path> <CRLF>

The SEND command requires that the mail data be delivered to the user's terminal. If the user is not active (or not accepting terminal messages) on the host, a 450 reply may be returned to a RCPT command.


The mail transaction is successful if the message is delivered to the terminal.

SOML <SP> FROM: <reverse-path> <CRLF>

The Send or Mail command requires that the mail data be delivered to the user's terminal if the user is active (and accepting terminal messages) on the host. If the user is not active (or not accepting terminal messages) then the mail data is entered into the user's mailbox. The mail transaction is successful if the message is delivered either to the terminal or the mailbox.

SAML <SP> FROM: <reverse-path> <CRLF>

The Send and Mail command requires that the mail data be delivered to the user's terminal if the user is active (and accepting terminal messages) on the host. In any case the mail data is entered into the user's mailbox. The mail transaction is successful if the message is delivered to the mailbox. The same reply codes that are used for the MAIL command are used for these commands.
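The difference between MAIL, SEND, SOML and SAML comes down to where the data ends up for an active or inactive user. The sketch below tabulates that rule; the enum, class and target names are hypothetical, and an empty result stands for a recipient that cannot be served (the 450 case for SEND).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// The four transaction-opening commands discussed in this section.
enum SendMode { MAIL, SEND, SOML, SAML }

// Sketch of the delivery rule per command. Names are illustrative only.
final class SendingSketch {

    // Returns the delivery targets for one recipient; an empty list means
    // the message cannot be delivered under the chosen command.
    static List<String> targets(SendMode mode, boolean userActive) {
        switch (mode) {
            case MAIL:
                return Arrays.asList("mailbox");
            case SEND:
                return userActive ? Arrays.asList("terminal") : Collections.<String>emptyList();
            case SOML:
                return userActive ? Arrays.asList("terminal") : Arrays.asList("mailbox");
            case SAML:
                return userActive ? Arrays.asList("terminal", "mailbox") : Arrays.asList("mailbox");
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }
}
```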

INTRODUCTION:

The JavaMail API provides a set of abstract classes defining objects that comprise a system. The API defines classes like Message, Store and Transport. The API can be extended and can be subclassed to provide new protocols and to add functionality when necessary. In addition, the API provides concrete subclasses of the abstract classes. These subclasses, including MimeMessage and MimeBodyPart, implement widely used Internet mail protocols and conform to specifications RFC822 and RFC2045. They are ready to be used in application development.

GOALS AND DESIGN PRINCIPLES:

The JavaMail API is designed to make adding electronic mail capability to simple applications easy, while also supporting the creation of sophisticated user interfaces. It includes appropriate convenience classes which encapsulate common mail functions and protocols. It fits with other packages for the Java platform in order to facilitate its use with other Java APIs, and it uses familiar programming models.

The JavaMail API is therefore designed to satisfy the following development and runtime requirements:

Simple, straightforward class design is easy for a developer to learn and implement.

Use of familiar concepts and programming models supports code development that interfaces well with other Java APIs.

Uses familiar exception-handling and JDK 1.1 event-handling programming models.

Uses features from the JavaBeans Activation Framework (JAF) to handle access to data based on data-type and to facilitate the addition of data types and commands on those data types. The JavaMail API provides convenience functions to simplify these coding tasks.

Lightweight classes and interfaces make it easy to add basic mail-handling tasks to any application.

Supports the development of robust mail-enabled applications, that can handle a variety of complex mail message formats, data types, and access and transport protocols.

The JavaMail API draws heavily from IMAP, MAPI, CMC, c-client and other messaging system APIs: many of the concepts present in these other systems are also present in the JavaMail API. It is simpler to use because it uses features of the Java programming language not available to these other APIs, and because it uses the Java programming language’s object model to shelter applications from implementation complexity.

The JavaMail API supports many different messaging system implementations: different message stores, different message formats, and different message transports. The JavaMail API provides a set of base classes and interfaces that define the API for client applications. Many simple applications will only need to interact with the messaging system through these base classes and interfaces. JavaMail subclasses can expose additional messaging system features. For instance, the MimeMessage subclass exposes and implements common characteristics of an Internet mail message, as defined by RFC822 and MIME standards. Developers can subclass JavaMail classes to provide the implementations of particular messaging systems, such as IMAP4, POP3, and SMTP.

ARCHITECTURE

The JavaMail architectural components are layered as shown below:

1. The Abstract Layer declares classes, interfaces and abstract methods intended to support mail handling functions that all mail systems support. API elements comprising the Abstract Layer are intended to be subclassed and extended as necessary in order to support standard data types, and to interface with message access and message transport protocols as necessary.

2. The internet implementation layer implements part of the abstract layer using internet standards - RFC822 and MIME.

3. JavaMail uses the JavaBeans Activation Framework (JAF) in order to encapsulate message data, and to handle commands intended to interact with that data. Interaction with message data should take place via JAF-aware JavaBeans, which are not provided by the JavaMail API.

JavaMail clients use the JavaMail API and Service Providers implement the JavaMail API. The layered design architecture allows clients to use the same JavaMail API calls to send, receive and store a variety of messages using different data-types from different message stores and using different message transport protocols.

JAVA MAIL CLASS HIERARCHY:

The figure below shows major classes and interfaces comprising the JavaMail API.

THE JAVA MAIL FRAME WORK:

The JavaMail API is intended to perform the following functions, which comprise the standard mail handling process for a typical client application:

MAJOR JAVA MAIL API COMPONENTS:

This section reviews major components comprising the JavaMail architecture.

The Message Class

The Message class is an abstract class that defines a set of attributes and a content for a mail message. Attributes of the Message class specify addressing information and define the structure of the content, including the content type. The content is represented as a DataHandler object that wraps around the actual data.

The Message class implements the Part interface. The Part interface defines attributes that are required to define and format data content carried by a Message object, and to interface successfully to a mail system. The Message class adds From, To, Subject, Reply-To, and other attributes necessary for message routing via a message transport system. When contained in a folder, a Message object has a set of flags associated with it. JavaMail provides Message subclasses that support specific messaging implementations.

Message Storage and Retrieval

Messages are stored in Folder objects. A Folder object can contain subfolders as well as messages, thus providing a tree-like folder hierarchy. The Folder class declares methods that fetch, append, copy and delete messages. A Folder object can also send events to components registered as event listeners.

Message Composition and Transport

A client creates a new message by instantiating an appropriate Message subclass. It sets attributes like the recipient addresses and the subject, and inserts the content into the Message object. Finally, it sends the Message by invoking the Transport.send method.

The Session Class

The Session class defines global and per-user mail-related properties that define the interface between a mail-enabled client and the network. JavaMail system components use the Session object to set and get specific properties. The Session class also provides a default authenticated session object that desktop applications can share. The Session class is a final concrete class. It cannot be subclassed.

Using the JavaMail API

This section defines the syntax and lists the order in which a client application calls some JavaMail methods in order to access and open a message located in a folder:

1. A JavaMail client typically begins a mail handling task by obtaining the default JavaMail Session object.

    Session session = Session.getDefaultInstance(props, authenticator);

2. The client uses the Session object’s getStore method to connect to the default store. The getStore method returns a Store object subclass that supports the access protocol defined in the user properties object, which will typically contain per-user preferences.

    Store store = session.getStore();
    store.connect();

3. If the connection is successful, the client can list available folders in the Store, and then fetch and view specific Message objects.

    // get the INBOX folder
    Folder inbox = store.getFolder("INBOX");
    // open the INBOX folder
    inbox.open(Folder.READ_WRITE);
    Message m = inbox.getMessage(1);   // get Message # 1
    String subject = m.getSubject();   // get Subject
    Object content = m.getContent();   // get content
    ...

4. Finally, the client closes all open folders, and then closes the store.

    inbox.close(); // Close the INBOX
    store.close(); // Close the Store

DESIGN PRINCIPLES & METHODOLOGY

Producing the design for a large module can be an extremely complex task. Design principles are used to handle the complexity of the design process effectively; they not only reduce the effort needed for design but also reduce the scope for introducing errors during design.

Large problems are solved using the time-tested principle of “divide and conquer”: the problem is divided into smaller pieces so that each piece can be conquered separately. For software design, the goal is to divide the problem into manageably small pieces that can be solved separately. Partitioning reduces cost because the cost of solving the entire problem at once is more than the sum of the costs of solving all the pieces.

When the number of partitions is high, however, the cost of partitioning itself becomes a problem, so the designer must judge when to stop partitioning.

In design, the most important quality criteria are simplicity and understandability. Each part should be easily related to the application, and each piece should be modifiable separately. Proper partitioning makes the system easier to maintain by making the design easier to understand, and it also aids design verification.

Abstraction is essential for problem partitioning and is used for existing components as well as components that are being designed. Abstraction of existing components plays an important role in the maintenance phase, and abstraction of components under design supports the design process of the system.

With functional abstraction, each of the four main modules is described by the function it performs: it takes in the details it needs and computes results for further action. With data abstraction, a module is described by the services it provides.

The system is a collection of modules (components). The highest-level component corresponds to the total system. To design this system, we first follow the top-down approach to divide the problem into modules; top-down design methods often result in some form of stepwise refinement. After the main modules have been divided, the bottom-up approach is used to build from the most basic or primitive components up to higher-level components. The bottom-up method starts from the very bottom.

This system is modular, because it consists of discrete components such that each component supports a well-defined abstraction and a change to one component has minimal impact on other components. The modules are highly cohesive and coupling is reduced in the system, because the relationships among elements in different modules are minimized.

Design Objectives

These are some of the currently implemented features:

Complete portability: Apache James is a 100% pure Java application based on the Java 2 platform and the Java Mail 1.3 API.

Protocol abstraction: unlike other mail engines, protocols are seen only as "communication languages" ruling communications between clients and the server. Apache James is not tied to any particular protocol but follows an abstracted server design (as Java Mail did on the client side).

Complete solution: the mail system is able to handle both mail transport and storage in a single server application. Apache James works alone without the need for any other server or solution.

Mailet support: Apache James supports the Apache Mailet API. A Mailet is a discrete piece of mail-processing logic which is incorporated into a Mailet-compliant mail-server's processing. This easy-to-write, easy-to-use pattern allows developers to build powerful customized mail systems. Examples of the services a Mailet might provide include: a mail-to-fax or mail-to-phone transformer, a filter, a language translator, a mailing list manager, etc. Several Mailets are included in the JAMES distribution.

Resource abstraction: like protocols, resources are abstracted and accessed through defined interfaces (Java Mail for transport, JDBC for spool storage or user accounts in RDBMS's, Apache Mailet API). The server is highly modular and reuses solutions from other projects.

Secure and multi-threaded design: based on the technology developed for the Apache JServ servlet engine, Apache James has a careful, security-oriented, fully multi-threaded design, to allow performance, scalability and mission-critical use.

System design is the process of applying various techniques and principles for the purpose of defining a system in sufficient detail to permit its physical realization.

Software design is the kernel of the software engineering process. Once the software requirements have been analyzed and specified, the design is the first activity. The flow of information during this process is as follows.

[Figure: flow of information during design. Labels: information domain details, function specification, behavioral specification, other requirements, procedural design, program modules.]

Software design is the process through which requirements are translated into a representation of software.

[Figure: development phases Design, Code, Test.]

Preliminary design is concerned with the transformation of requirements into data and software architecture.

Detailed design focuses on refinements to the architectural representation that lead to detailed data structure and algorithmic representation for software. In the present project report, preliminary design is given more emphasis.

System design is the bridge between system and requirements analysis on one side and system implementation on the other. Some of the essential fundamental concepts involved in the design of applications are abstraction, modularity, and verification.

Abstraction is used to construct solutions to problems without having to take account of the intricate details of the various component sub-programs. Abstraction allows the system designer to make step-wise refinements, by which, at each stage of the design, unnecessary details associated with representation or implementation may be hidden from the surrounding environment.

Modularity is concerned with decomposing the main module into well-defined, manageable units with well-defined interfaces among the units. This enhances design clarity, which in turn eases implementation, debugging, testing, documentation, and maintenance of the software product. Modularity viewed in this sense is a vital tool in the construction of large software projects.

Verification is a fundamental concept in software design. A design is verifiable if it can be demonstrated that the design will result in an implementation which satisfies the customer’s requirements.

Some of the important factors of quality that are to be considered in the design of application are:

The software should behave strictly according to the original specification, satisfying the customer’s requirements, and should function smoothly under normal and possible abnormal conditions. This product is designed to be reliable and places no fixed limit on the number of mails it filters.

The system must be designed in such a way that new additions to the information, functional and behavioral domains can be made easily and the system can be adapted to new specifications. We have provided this extensibility in the product: you can add any number of filters to it in the future.

System design is the process of developing specifications for the candidate system that meet the criteria established during the phase of system analysis. A major step in the design is the preparation of input forms and the design of output reports in a form acceptable to the user. These steps in turn lead to a successful implementation of the system.

In this project we focus on Privacy-Aware Collaborative Spam Filtering document, which is a part of our James Server. We configure our logic to work in that place. Privacy-Aware Collaborative Spam Filtering document is the key to implementing our filters. It first consults our filters and then, based on the logic in those filters, decides whether or not to drop each message. Following is the design document:

Privacy-Aware Collaborative Spam Filtering document: Privacy-Aware Collaborative Spam Filtering document is an application to download your email through protocols like POP3 and IMAP. It also allows you to retrieve your news messages through NNTP. In addition to the simple feature of downloading mail, Mail Fetch has the concept of mail filters. A filter has the single job of deciding whether or not to download a single message. The actual decision of whether to download a mail or not is made through a sequence of filters. There can be a global set of filters as well as a per maildrop one. A maildrop represents your mailbox from which you want to download your mail.

Privacy-Aware Collaborative Spam Filtering document is written in the Java programming language and has an extensible XML based configuration. Privacy-Aware Collaborative Spam Filtering document is very easy to configure. All that has to be done is to edit the plain text configuration file. I have written a fair amount of documentation, so that should help.

Privacy-Aware Collaborative Spam Filtering document can process multiple maildrops with individual filter mechanisms and poll times.


Features:

The following features are provided by Privacy-Aware Collaborative Spam Filtering document:

POP3 and IMAP Protocol Support

Can handle any number of Maildrops

Polling mechanism to periodically check maildrops for new messages

Filtering system for downloading mail

Standard filters provided like Size, Message-id, Sender

Easy pluggability of user defined filters

Runs on all platforms supported by Java2

Configurable logging mechanism to keep track of mails downloaded

Multiple delivery options provided - like Mailbox and SMTP Delivery

Delivery options accessible at the filter level

Experimental NNTP Support

Modules:

Core Module: This module interacts with the XML configuration and reads the required information. After reading the information, it connects to the specified mailboxes as required and downloads the mails. It also co-ordinates the other modules.

Filtering Module: This module applies the filters to the specified mailboxes to screen out unwanted mails. It applies the filters in the sequence specified in the XML file. It also allows filters to be applied globally or locally according to the user's customization.

Delivery Agents Module: This module sends the mails that remain after filtering in the specified mailboxes to a delivery agent. Each delivery agent stores a backup copy of the mails at a targeted location.

Configuring and Extending Privacy-Aware Collaborative Spam Filtering document:

Privacy-Aware Collaborative Spam Filtering document uses XML for configuration. The configuration file is MailFetch.xml. This file exists in the conf directory.

The configuration file is accompanied by a detailed document explaining how to configure Privacy-Aware Collaborative Spam Filtering document. I would recommend referring to that document whenever you have a problem following what I’m saying. This document is called Configuration.txt.

Essentially, there are maildrops to download mail from; they contain all the information about accessing a maildrop. There is a global sequence of filters, which are checked for each maildrop before the maildrop-local sequence of filters. Filters can be configured through the configuration file. For example, a size-based filter would need to know what size it should filter at and also what action it should take when a message is of a greater size. Each of the filters may have some additional configuration options. The additional configuration is totally dependent on the filter itself. You could add your own filter and want it to be configured from the configuration file. I shall expand on that later in this section.
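The order of evaluation described here (global filters first, then the maildrop-local ones, stopping at the first rejection) can be sketched as follows. The interface, the message holder and the method names are hypothetical stand-ins chosen for this illustration; they are not the project's actual filter API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical filter contract: return true to let the message continue.
interface MailFilterSketch {
    boolean accept(MessageInfo message);
}

// Minimal stand-in for the message attributes a filter might inspect.
final class MessageInfo {
    final String sender;
    final int sizeBytes;

    MessageInfo(String sender, int sizeBytes) {
        this.sender = sender;
        this.sizeBytes = sizeBytes;
    }
}

// Sketch of the pipeline: global filters run before maildrop-local filters.
final class FilterPipelineSketch {

    static boolean shouldDownload(MessageInfo message,
                                  List<MailFilterSketch> globalFilters,
                                  List<MailFilterSketch> localFilters) {
        List<MailFilterSketch> pipeline = new ArrayList<>(globalFilters);
        pipeline.addAll(localFilters);
        for (MailFilterSketch filter : pipeline) {
            if (!filter.accept(message)) {
                return false; // first rejection ends processing
            }
        }
        return true; // no filter objected, so the message is downloaded
    }
}
```

A size-based filter in this sketch would simply compare `sizeBytes` with its configured limit; a sender-based filter would compare `sender` with its list.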

There is also the option of delivery agents. After a message passes through all the filters and none of them objects to it being downloaded, it is downloaded and sent to a Delivery Agent, which is responsible for delivering it (to a mailbox, maildir, SMTP host etc). Mail Fetch supports different kinds of delivery agents and you can choose one of them for delivery of your mail. You can go so far as to make each of your maildrops deliver messages to a different delivery agent! A Maildrop itself needs to specify the delivery agent to use when all the filters let the message pass through. Each delivery agent has an id and can thereafter be referred to by that id. Some filters support delivering messages to a delivery agent specified in their configuration. For example, all messages from a particular sender address would go to the execve mailbox if the SenderMailFilter is configured that way.

You can implement your own filters: by implementing certain interfaces, a user can very easily add his/her own filter to the current set of provided filters. Examples of filters are spam control, size restrictions etc. Mail Fetch downloads the email if it matches the criteria and can then deliver it using one of its delivery options. Currently, one can choose to deliver mail to a mailbox or to an SMTP server.

You will need to specify the name of the class you have implemented in the configuration, so that Privacy-Aware Collaborative Spam Filtering document can initialize it as required. Note that the class has to be in the system classpath. This can be easily achieved by putting the class in a jar and putting it in the lib directory. The script picks up all the jars from the directory and places them in the classpath before invoking Privacy-Aware Collaborative Spam Filtering document. All the delivery agents specified in the configuration are available to the filters through a DeliveryManager object in the delivery package. This object allows access to these agents based on their ids. NOTE that the id of the agent has to be known by the filter requesting the agent.

A Delivery Event is generated when a message is delivered after passing through all the filters. NOTE that there is no event generated when a filter itself delivers a message through an agent. The easiest way to get the hang of how to implement the filter of your choice is to get hold of the source and check out some of the implemented filters (like NullMailFilter!!)

That’s about it in terms of Privacy-Aware Collaborative Spam Filtering document configuration. Go on, open the conf/JFetch.xml file in the Mail directory and play with it. Do let me know of any problems you face; let me know even if you don’t.

Table of Contents
=================

1. Introduction
2. Some Definitions
   2.1 Maildrops
   2.2 Filters
   2.3 DeliveryAgents
   2.4 Events
3. Detailed Configuration
   3.1 Maildrop
   3.2 Mailfilters
       3.2.1 Global filters
       3.2.2 Local filters
       3.2.3 All filters explained
   3.3 Delivery Agents
   3.4 Miscellaneous Configuration
4. Sample configuration file
5. Advanced usage

1. Introduction:

Privacy-Aware Collaborative Spam Filtering document is an application to access your remote email. It supports popular mail protocols like POP3 and IMAP. You can download your mail to your local machine and use an email client to read it. Mail Fetch also comes with a very powerful and flexible filtering system. In fact, Privacy-Aware Collaborative Spam Filtering document comes with a range of filters out of the box, so you can get started immediately. These filters range from those which prevent you from getting the same message twice to those which help in spam filtering.

Privacy-Aware Collaborative Spam Filtering document is written in Java and so has the advantage of running on most platforms. Configuration is text-based and is an XML file. This document tells you how to configure Privacy-Aware Collaborative Spam Filtering document. It details the various configuration options available and also provides a sample configuration file.

2. Some Definitions:

2.1 Maildrops

A Maildrop is the mailbox from where you download your mail. Characteristics of a maildrop are the protocol (POP3, IMAP, NNTP), the username, password, hostname, port number, the default Delivery Agent for that maildrop, any filters for that maildrop and finally any protocol-specific configuration for the maildrop. For example an NNTP maildrop would contain newsgroup information, which is not used by a POP3 or IMAP maildrop.

2.2 Filters

A Filter is the core of the decision making in Privacy-Aware Collaborative Spam Filtering document. A filter decides on a per-mail basis whether the message should be downloaded or not. A pipeline of filters is set up (yes, again set up in the configuration) and a message which needs to be downloaded is passed through this pipeline. At any point of the pipeline, a filter could indicate that the message should not be processed through the pipeline anymore. For example, a SPAM filter (sender based) could find a match from the list of spammers it has and reject the message.

There are two kinds of filters -- global and local. These are not an attribute of a filter itself, but rather depend on the usage of a filter. Local filters are associated to a maildrop whereas global filters are applicable to all maildrops. For example, you might want a Message-ID filter to be applicable to all maildrops whereas keep a sender-based filter only for the maildrop where you expect mail from that sender.

2.3 DeliveryAgents

A Delivery Agent has the responsibility of delivering mail. The currently supported mediums are SMTP and mailbox. Delivery Agents are identified by unique ids in the configuration. Maildrops have a default Delivery Agent configured, which is used if the message passes through the Filter pipeline successfully. Some filters also accept a Delivery Agent attribute in the configuration. What this implies is that if the message matches the Filter's criteria, the Filter delivers the message using this Delivery Agent. This also allows for simple filtering mechanisms. For example, you might want all likely SPAM to be delivered to a special mailbox where you can later check for any false positives.
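The id-based lookup and the "likely SPAM goes to a special mailbox" idea can be sketched as below. This is an illustration only: the interface, the in-memory agent and the registry class are hypothetical names, and the real application may organise its delivery classes differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical delivery contract.
interface DeliveryAgentSketch {
    void deliver(String message);
}

// In-memory stand-in for a mailbox delivery agent.
final class MailboxAgentSketch implements DeliveryAgentSketch {
    final List<String> delivered = new ArrayList<>();

    @Override
    public void deliver(String message) {
        delivered.add(message);
    }
}

// Sketch of a registry that hands out agents by their configured id.
final class DeliveryRegistrySketch {
    private final Map<String, DeliveryAgentSketch> agentsById = new HashMap<>();

    void register(String id, DeliveryAgentSketch agent) {
        agentsById.put(id, agent);
    }

    DeliveryAgentSketch byId(String id) {
        DeliveryAgentSketch agent = agentsById.get(id);
        if (agent == null) {
            throw new IllegalArgumentException("No delivery agent with id " + id);
        }
        return agent;
    }

    // A filter that matched delivers via its own agent; otherwise the
    // maildrop's default agent is used.
    void route(String message, boolean filterMatched, String filterAgentId, String defaultAgentId) {
        byId(filterMatched ? filterAgentId : defaultAgentId).deliver(message);
    }
}
```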

2.4 Events

Event is an internal concept of Privacy-Aware Collaborative Spam Filtering document. If you are only going to use Privacy-Aware Collaborative Spam Filtering document and the filters it provides out of the box, you don't need to understand this concept. If you are extending Privacy-Aware Collaborative Spam Filtering document by developing your own Filters, you will need to understand it. Whether you actually use it depends on the functionality of your filter.

Events are a mechanism by which a component of Privacy-Aware Collaborative Spam Filtering document, such as a Filter, can be notified when something interesting happens in the system. Currently, events are generated only for the delivery of a message. Take the example of the MessageIDMailFilter. This filter rejects messages whose message-ids have already been downloaded. This avoids receiving duplicate messages, for example when you are subscribed to two mailing lists and a cross-posting happens. The filter maintains a list of the message-ids which have already been downloaded. The list is saved to disk after the download of every message so that, if the session is interrupted for any reason, the message is not downloaded again. To keep this list current, the MessageIDMailFilter implements a DeliveryLister and hence gets the delivery event.
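The delivery event described above follows the usual listener pattern. A minimal sketch, assuming a hypothetical DeliveryListener interface with a single messageDelivered method (the actual listener type is named in the text but its methods are not shown):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical types for illustration; the real event classes are not shown in this document.
final class DeliveryEventSketch {

    /** Assumed shape of the listener that receives delivery events. */
    interface DeliveryListener {
        void messageDelivered(String messageId);
    }

    /** Rejects message-ids that were already delivered. */
    static final class DuplicateIdFilter implements DeliveryListener {
        private final Set<String> delivered = new HashSet<>();

        boolean accept(String messageId) {
            return !delivered.contains(messageId);
        }

        @Override
        public void messageDelivered(String messageId) {
            delivered.add(messageId);
            // A real filter would also save the list to disk at this point.
        }
    }

    /** Fires the delivery event to every registered listener. */
    static final class Dispatcher {
        private final List<DeliveryListener> listeners = new ArrayList<>();

        void addListener(DeliveryListener listener) {
            listeners.add(listener);
        }

        void delivered(String messageId) {
            for (DeliveryListener listener : listeners) {
                listener.messageDelivered(messageId);
            }
        }
    }

    public static void main(String[] args) {
        DuplicateIdFilter filter = new DuplicateIdFilter();
        Dispatcher dispatcher = new Dispatcher();
        dispatcher.addListener(filter);

        System.out.println(filter.accept("<1@list.example.com>")); // prints true
        dispatcher.delivered("<1@list.example.com>");
        System.out.println(filter.accept("<1@list.example.com>")); // prints false
    }
}
```

Recording the id only when the delivery event arrives, rather than when the filter first sees the message, means an interrupted session does not mark an undelivered message as downloaded.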

3. Detailed Configuration:

3.1 Maildrop

You can have more than one maildrop for Privacy-Aware Collaborative Spam Filtering document to download mail from. Privacy-Aware Collaborative Spam Filtering document downloads mail from them in the order in which they are configured. Here is a sample maildrop configuration:

<maildrop protocol="pop3" mda="smtp">
  <host>mail.somepopserver.com</host>
  <port>110</port>
  <user>myusername</user>
  <password>mypass</password>
  <delete>true</delete>
  <filters>
  </filters>
</maildrop>

The protocol attribute can be one of pop3, imap or nntp (EXPERIMENTAL).

The mda attribute specifies the default delivery agent when the message is ready to be downloaded. See Delivery Agents for more information. This requires a delivery agent called "smtp" to be configured.

Host, port, user and password are the settings for the connection and authentication. Setting delete to true makes Privacy-Aware Collaborative Spam Filtering document delete messages from the maildrop once they are downloaded.

The filters configured *inside* the Maildrop element are the local maildrop filters and will not affect other maildrops. Some Maildrops like NNTP, have some extra configuration parameters like the newsgroups which have to be downloaded. Please note that NNTP support is EXPERIMENTAL and is not yet stable.

3.2 Mailfilters

3.2.1 Global filters

All filters which are configured outside the maildrop elements are called global filters. These filters affect all the maildrops. The configuration is the same for both global and local filters.

Here is a sample global filters configuration:

<filters>
  <filter class="MailFetch.filters.SizeMailFilter" max-size="1548576" delete="false">
  </filter>
  <filter class="MailFetch.filters.SenderMailFilter" delete="true" blocklist="/home/gautam/MailFetch/spool/blocklist" mda="junk">
  </filter>
  <filter class="MailFetch.filters.MessageIDMailFilter" delete="true">
    <storage name="msgid.cache" limit="8192" destination="spool/msgid.cache"/>
  </filter>
  <filter class="MailFetch.filters.SubjectMailFilter" delete="true" blocklist="/home/gautam/MailFetch/spool/subject.blocklist" mda="junk">
  </filter>
</filters>

See "All filters explained" for detailed explanation of all provided filters.

3.2.2 Local filters

All filters which are configured inside the maildrop elements are called local filters. These filters affect only the maildrop associated with them. The configuration is the same for both global and local filters.

Here is a sample local filters configuration:

<maildrop protocol="pop3" mda="smtp">
  <filters>
    <filter class="MailFetch.filters.SenderMailFilter" delete="true" blocklist="/home/gautam/MailFetch/spool/linuxlist" mda="linux">
    </filter>
  </filters>
</maildrop>

3.2.3 All filters explained

FILTER NAME : HeaderMailFilter
DESCRIPTION : Matches a header in the message. This requires the name of the header and the value of the header.
CLASS NAME : MailFetch.filters.HeaderMailFilter
SAMPLE CONFIGURATION :

<filter class="MailFetch.filters.HeaderMailFilter" delete="true" name="X-Spam-Rating" value="SPAM" mda="spambox">
</filter>

EXPLANATION : This filter allows you to filter messages based on the value of a particular header. The mda attribute is optional and allows you to direct the message to the specified delivery agent if the message matches the criteria.

FILTER NAME : MessageIDMailFilter
DESCRIPTION : Filters messages if they contain a duplicate Message-id. This filter stores the list of downloaded message-ids in the specified file.
CLASS NAME : MailFetch.filters.MessageIDMailFilter
SAMPLE CONFIGURATION :

<filter class="MailFetch.filters.MessageIDMailFilter" delete="true">
  <storage name="msgid.cache" limit="8192" destination="spool/msgid.cache"/>
</filter>

EXPLANATION : The name of the storage element is a friendly name for the repository. limit specifies the maximum number of elements to allow in the list. The destination attribute is the actual file in which the list is stored.
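The limit attribute caps the number of stored message-ids. What happens when the cap is reached is not described here; the sketch below assumes the oldest id is dropped, and the class name MessageIdCacheSketch is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded store for illustration; the eviction policy is an assumption.
final class MessageIdCacheSketch {

    private final Map<String, Boolean> ids;

    MessageIdCacheSketch(final int limit) {
        // LinkedHashMap keeps insertion order, so the eldest entry is the oldest id.
        this.ids = new LinkedHashMap<String, Boolean>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > limit;
            }
        };
    }

    boolean contains(String messageId) {
        return ids.containsKey(messageId);
    }

    void remember(String messageId) {
        ids.put(messageId, Boolean.TRUE);
    }

    int size() {
        return ids.size();
    }

    public static void main(String[] args) {
        MessageIdCacheSketch cache = new MessageIdCacheSketch(2);
        cache.remember("<1@example.com>");
        cache.remember("<2@example.com>");
        cache.remember("<3@example.com>"); // evicts <1@example.com>
        System.out.println(cache.contains("<1@example.com>")); // prints false
        System.out.println(cache.contains("<3@example.com>")); // prints true
    }
}
```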

FILTER NAME : NullMailFilter
DESCRIPTION : This filter consumes all messages. It also marks them for deletion.
CLASS NAME : MailFetch.filters.NullMailFilter
SAMPLE CONFIGURATION :

<filter class="MailFetch.filters.NullMailFilter" />

EXPLANATION : This is a special filter; it could be used to clean up the maildrop, for example. It is also a DANGEROUS filter. You have been warned.

FILTER NAME : RecipientMailFilter
DESCRIPTION : This filter matches the recipients of the message against those provided in a list.
CLASS NAME : MailFetch.filters.RecipientMailFilter
SAMPLE CONFIGURATION :

<filter class="MailFetch.filters.RecipientMailFilter" delete="true" blocklist="/home/gautam/MailFetch/spool/pers" mda="personal">
</filter>

EXPLANATION : This filter checks if the recipients of the message (TO and CC) exist in the defined list. The mda attribute is optional. blocklist is the file containing the recipient addresses (one on each line).

FILTER NAME : SenderMailFilter
DESCRIPTION : This filter matches the sender of the message against those provided in a list.
CLASS NAME : MailFetch.filters.SenderMailFilter
SAMPLE CONFIGURATION :

<filter class="MailFetch.filters.SenderMailFilter" delete="true" blocklist="/home/gautam/MailFetch/spool/block" mda="junk">
</filter>

EXPLANATION : This filter checks if the sender of the message exists in the defined list. The mda attribute is optional. blocklist is the file containing the sender addresses (one on each line).
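A blocklist file with one address per line can be read and matched as sketched below. The class BlocklistSketch and its methods are hypothetical; ignoring blank lines and comparing addresses case-insensitively are assumptions, not documented behaviour.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; not part of the documented API.
final class BlocklistSketch {

    /** Reads a blocklist file that holds one address per line. */
    static Set<String> load(Path file) throws IOException {
        return parse(Files.readAllLines(file));
    }

    /** Normalises the lines: trims them, lower-cases them, skips blank lines. */
    static Set<String> parse(List<String> lines) {
        Set<String> entries = new HashSet<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                entries.add(trimmed.toLowerCase(Locale.ROOT));
            }
        }
        return entries;
    }

    /** True when the sender address appears in the blocklist. */
    static boolean isBlocked(String sender, Set<String> blocklist) {
        return blocklist.contains(sender.trim().toLowerCase(Locale.ROOT));
    }
}
```

The same lookup would serve the recipient filter, applied to each TO and CC address in turn.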

FILTER NAME : SizeMailFilter
DESCRIPTION : This filters messages based on their size.
CLASS NAME : MailFetch.filters.SizeMailFilter
SAMPLE CONFIGURATION :

<filter class="MailFetch.filters.SizeMailFilter" max-size="1548576" delete="false">
</filter>

EXPLANATION : max-size is the maximum size of a message which is permitted to be downloaded. The size is in bytes. A max-size of 0 indicates that the size restriction is lifted.
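The max-size rule reduces to a single comparison with a special case for 0. A minimal sketch under a hypothetical class name; how a negative max-size is treated is not documented, and this sketch gives it no special handling.

```java
// Hypothetical helper for illustration; not part of the documented API.
final class SizeFilterSketch {

    /** Returns true when a message of the given size may be downloaded. */
    static boolean accept(long messageSizeBytes, long maxSizeBytes) {
        if (maxSizeBytes == 0) {
            return true; // 0 lifts the size restriction
        }
        return messageSizeBytes <= maxSizeBytes;
    }

    public static void main(String[] args) {
        System.out.println(accept(1548576, 1548576));  // prints true (exactly at the limit)
        System.out.println(accept(1548577, 1548576));  // prints false
        System.out.println(accept(50000000, 0));       // prints true (no restriction)
    }
}
```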

FILTER NAME : SubjectMailFilter
DESCRIPTION : This filter does subject-based filtering based on a list.
CLASS NAME : MailFetch.filters.SubjectMailFilter
SAMPLE CONFIGURATION :

<filter class="MailFetch.filters.SubjectMailFilter" delete="true" blocklist="spool/virus_list" mda="possible.virus">
</filter>

EXPLANATION : This filter is similar to the sender/recipient filters except that it filters on the subject of the message. The mda attribute is optional.

3.3 Delivery Agents

After passing through the filter pipeline, mail is delivered using a DeliveryAgent. Currently we provide two main delivery mechanisms: mbox and smtp. SMTP is the most reliable mechanism although it requires that you have an MTA configured for delivery.

DELIVERY AGENT NAME : Mailbox
DESCRIPTION : Delivers a message to the specified mbox.
CLASS NAME : MailFetch.delivery.MailboxDeliveryAgent
SAMPLE CONFIGURATION :

<mda class="MailFetch.delivery.MailboxDeliveryAgent" id="junk">
  <destination>/home/gautam/Mail/junkmail</destination>
</mda>

EXPLANATION : The destination element identifies the location of the mbox where the delivery is made. Some basic dot-locking functionality is provided by the mbox provider to avoid concurrent access to the mbox.

DELIVERY AGENT NAME : SMTP
DESCRIPTION : Delivers the message to a configured SMTP host.
CLASS NAME : MailFetch.delivery.SMTPDeliveryAgent
SAMPLE CONFIGURATION :

<mda class="MailFetch.delivery.SMTPDeliveryAgent" id="smtp">
  <host>localhost.localdomain</host>
  <port>25</port>
  <localuser>gautam</localuser>
  <domain>localhost</domain>
  <user></user>
  <password></password>
</mda>

EXPLANATION : The localuser element defines who the email is directed to. The domain is the domain of the local user. In this case, the email is dispatched to gautam@localhost. user and password are used if your server requires SMTP Authentication.

DELIVERY AGENT NAME : NULL
DESCRIPTION : A Null Delivery Agent does nothing, so it is basically equivalent to dumping the message into /dev/null.
CLASS NAME : MailFetch.delivery.NullDeliveryAgent
SAMPLE CONFIGURATION :

<mda class="MailFetch.delivery.NullDeliveryAgent" id="null">
</mda>

EXPLANATION : There is no configuration for this Delivery Agent. Please use it with care, as you could very easily lose all your mail through a misconfiguration.

3.4 Miscellaneous Configuration

Polling: Polling time is the time Mail Fetch waits between mail downloading sessions. For example,

<poll>120</poll>

specifies a polling time of 120 seconds (2 minutes). A non-positive polling time indicates that Mail Fetch should just run through the maildrop list and download messages once.
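The polling rule above can be sketched as a loop that runs one download session, then either stops (non-positive polling time) or sleeps and repeats. The names are hypothetical, and the maxSessions cap exists only so the example terminates; a real polling loop would run until shut down.

```java
// Hypothetical helper for illustration; not part of the documented API.
final class PollingSketch {

    /**
     * Runs download sessions separated by pollSeconds and returns how many ran.
     * A non-positive pollSeconds means exactly one session.
     */
    static int poll(long pollSeconds, int maxSessions, Runnable session) {
        int runs = 0;
        while (true) {
            session.run();
            runs++;
            if (pollSeconds <= 0 || runs >= maxSessions) {
                return runs;
            }
            try {
                Thread.sleep(pollSeconds * 1000L);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop polling
                return runs;
            }
        }
    }

    public static void main(String[] args) {
        int runs = poll(0, 10, () -> System.out.println("downloading from all maildrops"));
        System.out.println(runs); // prints 1
    }
}
```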

Logging: I would recommend turning logging on as it gives you a very good idea as to what is happening in the system. All exceptions are logged, so nothing would escape your eye. Mail Fetch does a light-medium logging in the DEBUG state.

<log target="logs/MailFetch.log" priority="DEBUG" enabled="true" />

The target attribute specifies the file where Privacy-Aware Collaborative Spam Filtering document should log all its data. The priority attribute specifies the logging priority. Priorities of logging are DEBUG, INFO, WARN, ERROR, FATAL_ERROR. The enabled attribute is optional and is treated as true by default.

4. Sample configuration file

A sample configuration file is included with the Privacy-Aware Collaborative Spam Filtering document distribution. You will need to customize the configuration file according to your needs and requirements. Refer to this document when configuring the file. Below is a small configuration file to give you some idea of how to go about modifying the configuration.

<MailFetch>
  <poll>600</poll>
  <log target="logs/MailFetch.log" priority="DEBUG" enabled="true" />

  <mda class="MailFetch.delivery.SMTPDeliveryAgent" id="smtp">
    <host>localhost</host>
    <port>25</port>
    <localuser>gautam</localuser>
    <domain>localhost</domain>
    <user></user>
    <password></password>
  </mda>
  <mda class="MailFetch.delivery.MailboxDeliveryAgent" id="spam">
    <destination>/home/gautam/Mail/spam</destination>
  </mda>
  <mda class="MailFetch.delivery.MailboxDeliveryAgent" id="pers">
    <destination>/home/gautam/Mail/personal</destination>
  </mda>

  <filters>
    <filter class="MailFetch.filters.SizeMailFilter" max-size="102400" delete="true">
    </filter>
    <filter class="MailFetch.filters.SenderMailFilter" delete="true" blocklist="/home/gautam/MailFetch/conf/blocklist" mda="spam">
    </filter>
  </filters>

  <maildrop protocol="pop3" mda="smtp">
    <host>mail.somepopserver.com</host>
    <port>110</port>
    <user>myusername</user>
    <password>mypass</password>
    <delete>true</delete>
    <filters>
      <filter class="MailFetch.filters.SenderMailFilter" delete="true" blocklist="/home/gautam/MailFetch/conf/friendlist" mda="pers">
      </filter>
    </filters>
  </maildrop>
</MailFetch>

The above section is just a sample configuration file. You will need to customize your configuration depending on what kind of filtering meets your requirements.
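To check a customized configuration file, it can help to read it back with the standard Java DOM parser. The sketch below is not the application's own loader; ConfigReaderSketch and its methods are hypothetical, and it relies only on the element names used in the sample above (poll, filter, maildrop).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical reader for illustration; the application's real loader is not shown here.
final class ConfigReaderSketch {

    /** Parses the configuration text into a DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid configuration XML", e);
        }
    }

    /** Reads the polling time from the first <poll> element. */
    static int pollSeconds(Document doc) {
        Node poll = doc.getElementsByTagName("poll").item(0);
        return Integer.parseInt(poll.getTextContent().trim());
    }

    /** True when the <filter> element is nested inside a <maildrop>. */
    static boolean isLocal(Node filter) {
        for (Node n = filter.getParentNode(); n != null; n = n.getParentNode()) {
            if ("maildrop".equals(n.getNodeName())) {
                return true;
            }
        }
        return false;
    }

    /** Counts local filters (local == true) or global filters (local == false). */
    static int countFilters(Document doc, boolean local) {
        NodeList filters = doc.getElementsByTagName("filter");
        int count = 0;
        for (int i = 0; i < filters.getLength(); i++) {
            if (isLocal(filters.item(i)) == local) {
                count++;
            }
        }
        return count;
    }
}
```

Applied to the sample file above, pollSeconds returns 600, and countFilters reports two global filters and one local filter.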

5. Advanced usage

If you find that you need some customized filtering, you may want to write your own Filters. The easiest way to understand how to do this is to look at the filters which are available in the Privacy-Aware Collaborative Spam Filtering document distribution. Good filters to start with are NullMailFilter, SizeMailFilter, SubjectMailFilter and MessageIDMailFilter. That should cover most common uses. Once you have written your Filter, you need to include it in the filter configuration. In addition, Privacy-Aware Collaborative Spam Filtering document requires it to be on the system classpath to be able to load it. This can be achieved simply by putting the relevant classes in a jar and putting the jar in the lib directory. The run scripts load all the jars in that directory onto the classpath.
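The classpath requirement follows from the way a class named in a configuration attribute has to be located at run time. The sketch below shows one common way to do this with reflection; it assumes the filter class has a public no-argument constructor, and it is not necessarily how the application itself loads filters.

```java
// Hypothetical loader for illustration; the application's real loading code is not shown here.
final class FilterLoaderSketch {

    /** Creates an instance of the class named in a filter's class attribute. */
    static Object instantiate(String className) {
        try {
            // Class.forName only finds classes that are on the classpath.
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load filter class: " + className, e);
        }
    }

    public static void main(String[] args) {
        // A JDK class stands in for a filter class here.
        Object loaded = instantiate("java.util.ArrayList");
        System.out.println(loaded.getClass().getName()); // prints java.util.ArrayList
    }
}
```

A class that is missing from the classpath surfaces as an IllegalArgumentException in this sketch, which is the failure to expect when a custom filter jar has not been placed in the lib directory.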

Privacy-Aware Collaborative Spam Filtering document is an application to download your email through protocols like POP3 and IMAP. The decision of whether to download a mail or not is made through a sequence of filters. By implementing certain interfaces, a user can very easily add his/her own filter to the current set of provided filters. Examples of filters are spam control, size restrictions etc. Privacy-Aware Collaborative Spam Filtering document downloads the email if it matches the criteria and then can deliver it using one of its delivery options. One can choose to deliver mail to a mailbox or to an SMTP Server.

Privacy-Aware Collaborative Spam Filtering document can process multiple maildrops with individual filter mechanisms and delivery options. Privacy-Aware Collaborative Spam Filtering document is written in the Java programming language and has an extensible XML-based configuration.

Configuration

To set up Privacy-Aware Collaborative Spam Filtering document, follow these steps:

* If you have the source, compile it using the build.bat batch file.

* Now, enter the dist directory and edit the conf/JFetch.xml file. This is the configuration file for Privacy-Aware Collaborative Spam Filtering document. Refer to the Configuration.txt file in the docs directory for a detailed description of the configuration file.

* Now you can run Privacy-Aware Collaborative Spam Filtering document by executing the run.bat file in the dist directory.

Input design is the process of converting user-originated information to a computer-based format. The goal of input design is to make data entry as easy and error-free as possible. An input format should be easy to understand.

In this product the inputs are simply messages, i.e. mails. Every mail has properties such as sender, subject line, body, message-id and so on. These inputs are taken automatically from the messages in the mailbox, and they are processed to decide whether to drop each message or not. The output design relies on the input, so input design needs special attention.

Output reflects the image of the organization. Output design involves designing form layouts, making lists, producing well-designed reports and so on; reports are the main outputs of the proposed system. Here the outputs are log files, which record everything handled by the server that is relevant to this project, including error messages.

Database design explores how to use relationships in a pool of data when developing methods for data storage and retrieval. Databases allow data to be shared among different applications. A database is not used in this product; we simply record the details of how each transaction is handled by the server in log files, which are stored on permanent disk at a specified location.

UML Diagrams

Screens

Testing

Testing is one of the most important phases of software development. In the software development life cycle (SDLC), the main aim of testing is quality: the developed software is tested to confirm that it attains the required functionality and performance.

During testing, the software is run against particular test cases, and the output of each test case is analyzed to determine whether the software works as expected.

The success of testing in uncovering errors depends mostly on the test case criteria. To test any software we need a description of the expected behaviour of the system and a method of determining whether the observed behaviour conforms to the expected behaviour.

Since errors can be injected into the software at any stage, testing has to be carried out at different levels during development. The basic levels of testing are Unit, Integration, System and Acceptance Testing.

Unit testing is carried out on the code: individual modules are tested against the specifications produced for them during design. In integration testing, the tested modules are combined into subsystems, which are then tested. In system testing, the full software is tested. At the final level, acceptance testing, the system is tested against the user requirements document prepared during the SRS phase.

There are two basic approaches to testing: functional and structural.

In Functional Testing, test cases are decided solely on the basis of the requirements of the program or module; the internals of the program or module are not considered when selecting test cases. This is also called Black Box Testing.

In Structural Testing, test cases are generated from the actual code of the program or module to be tested. This is also called White Box Testing.

A number of activities must be performed to test software. Testing starts with a test plan. The test plan identifies all testing-related activities that need to be performed, along with the schedule and guidelines for testing. The plan also specifies the levels of testing that need to be done by identifying the different testing units. For each unit specified in the plan, the test cases are produced first, followed by the test reports, which are then analyzed.

The test plan is a general document for the entire project. It defines the scope, the approach to be taken and the personnel responsible for the different testing activities. The inputs for forming the test plan are:

* Project plan
* Requirements document
* System design

Although there is one test plan for the entire project, test cases have to be specified separately for each test unit. The test case specification gives, for each item to be tested, all test cases and the outputs expected for those test cases.

The steps to be performed to execute the test cases are specified in a separate document called the test procedure specification. This document specifies any special requirements that exist for setting up the test environment and describes the methods and formats for reporting the results of testing.

Unit testing focused first on the smallest, lowest-level modules, proceeding one at a time. Bottom-up testing was performed on each module. A driver program that tests the modules can be developed or used; for the purpose of testing, however, the modules themselves were used as stubs that print verification of the actions performed. After the lower-level modules were tested, the modules at the next higher level, which make use of the lower modules, were tested.

Each module was tested against the required functionality, and test cases were developed to test the boundary values.

Integration testing is a systematic technique for constructing the program structure while at the same time conducting tests to uncover errors associated with interfacing. As the system consists of a number of modules, the interfaces tested were those between pairs of modules. The software was tested using an incremental bottom-up approach.

The bottom-up integration strategy was implemented with the following steps. Low-level modules were combined into clusters that perform specific software sub-functions. The clusters were then tested.

System testing is a series of different tests whose primary purpose is to fully exercise the computer-based system. It also looks for discrepancies between the system and its original objectives and current specifications.

Privacy-Aware Collaborative Spam Filtering document: System Test Cases & System Test Report

The system test cases listed below are expected to work and give the expected behaviour if the explorer is configured to run jar files as mentioned in the project folder, the necessary library files and standard jar files are in the appropriate project directories, and the path and classpath environment variables are appropriately set.

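STATUS : P = Passed, F = Failed

TEST CASE NO. : 1
INPUT : Send a mail smaller than the size specified in the .xml file and apply the size filter.
EXPECTED BEHAVIOUR : The mail should reach the destination without any hurdles.
OBSERVED BEHAVIOUR : As expected
STATUS : P

TEST CASE NO. : 2
INPUT : Send a mail larger than the size specified in the .xml file and apply the size filter.
EXPECTED BEHAVIOUR : The mail should not reach the destination, because the size mail filter has to delete it.
OBSERVED BEHAVIOUR : As expected
STATUS : P

TEST CASE NO. : 3
INPUT : Check the log file for the above two mails.
EXPECTED BEHAVIOUR : It should contain information about the mail sizes and which mail was deleted.
OBSERVED BEHAVIOUR : As expected
STATUS : P

TEST CASE NO. : 4
INPUT : Add one more maildrop to the xml file by adding one more maildrop tag.
EXPECTED BEHAVIOUR : Our application should interact with the specified mailboxes and download all the mails from them.
OBSERVED BEHAVIOUR : As expected
STATUS : P

TEST CASE NO. : 5
INPUT : Add a subject filter to the xml file by adding one more filter tag inside the filters tag, i.e. the global filters area.
EXPECTED BEHAVIOUR : Our application should filter the mails whose subjects contain the words specified in the subject blocklist file.
OBSERVED BEHAVIOUR : As expected
STATUS : P

TEST CASE NO. : 6
INPUT : Add a sender filter to the xml file by adding one more filter tag inside the filters tag, i.e. the global filters area.
EXPECTED BEHAVIOUR : Our application should filter the mails whose senders are specified in the sender blocklist file.
OBSERVED BEHAVIOUR : As expected
STATUS : P

TEST CASE NO. : 7
INPUT : Add a null filter to the xml file by adding one more filter tag inside the filters tag, i.e. the global filters area.
EXPECTED BEHAVIOUR : Our application should delete all the mails irrespective of the criteria.
OBSERVED BEHAVIOUR : As expected
STATUS : P

TEST CASE NO. : 8
INPUT : Add a header filter to the xml file by adding one more filter tag inside the filters tag, i.e. the global filters area.
EXPECTED BEHAVIOUR : Our application should filter the mails in which the named header is equal to the header value specified in the xml file.
OBSERVED BEHAVIOUR : As expected
STATUS : P

TEST CASE NO. : 9
INPUT : Add an SMTP Delivery Agent to the xml file and give its id in the maildrop tag.
EXPECTED BEHAVIOUR : A copy of every non-deleted mail should be sent to the user specified in the SMTP Delivery Agent.
OBSERVED BEHAVIOUR : As expected
STATUS : P

TEST CASE NO. : 10
INPUT : Add a Mailbox Delivery Agent to the xml file and give its id in the maildrop tag.
EXPECTED BEHAVIOUR : A copy of every non-deleted mail should be delivered to a directory.
OBSERVED BEHAVIOUR : As expected
STATUS : P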

Configuring Filters

It is the duty of the administrator to configure the filters. First, place the Jfetch directory on the mail server machine chosen by the administrator. The XML configuration file is in the subdirectory named "conf". The file is easy to read, and the administrator can change the corresponding values to configure it for the chosen mail server. The main part of that file is shown below:

<maildrop protocol="pop3" mda="ld">
  <host>localhost</host>
  <port>110</port>
  <user>stud2</user>
  <password>pass2</password>
  <delete>false</delete>
  <filters>
  </filters>
</maildrop>

Here the maildrop is configured for a James server which speaks the POP3 protocol and runs on our local system at port number 110. These filters are applied only to the stud2 maildrop (mailbox).

After the configuration is complete, the administrator has to create mailboxes for company personnel on the mail server using the Telnet tool, and then configure those mailboxes in the local mail client to match the configuration above. Open your mail client and follow its instructions to configure the mailboxes created earlier on the mail server. At one point it asks for the incoming and outgoing mail servers; specify the IP address of the server on which you configured your filters. In the case of MS Outlook Express, the screen looks like this:

[Screenshot: MS Outlook Express mail server settings]

After that, your local mail client creates new accounts for the mailboxes you specified. You can then access those mailboxes from your local mail client and organize them as you like. Apart from this configuration, your installed filters work on all the mailboxes you specified in the configuration file above, here named conf.xml.

Privacy-Aware Collaborative Spam Filtering document is a tool into which a lot of effort was put to make it filter accurately and efficiently. The developed system was tested with real data, and the users are satisfied with the performance of the system and its reports.

This project is developed using the JavaMail API, one of the J2EE technologies, with the help of XML. Using this tool we can drop unwanted mails or messages automatically by specifying our restrictions in the corresponding files. This greatly reduces the administrator's workload, and a copy of each deleted message can be directed to a specified location for verification. This tool is very useful for the administration department of our company. It also provides extensibility, so you can add your own filters in future very simply without disturbing the existing code. The tool reduces manual work, saving both time and manpower. The time for processing and producing reports is considerably reduced. All the features are implemented and developed as per the requirements.

