7/29/2019 combining documentation.docx
1/56
INTRODUCTION
SYNOPSIS
PROJECT DESCRIPTION
Module Names:
1. Search (user)
2. Query result page
3. Novel data extraction & alignment
4. Nested structure algorithm
5. Accurate data extraction
Module Description:
1. Search (user):
Web users gather information from web databases using keyword queries. In response, a web database returns a number of result pages to the querying user. A user often wants to collect useful information from several of these pages, but no single page is guaranteed to contain exactly the information sought: result pages typically mix the relevant records with auxiliary or irrelevant content, and the complete, accurate information may not be disclosed on any one page.
2. Query result page:
Web databases generate query result pages in response to a user's query. Automatically extracting the data from these pages is important for many applications, such as data integration, that need to work with multiple web databases. The data on a query result page may be accurate or merely auxiliary, and separating the two is a difficult task for users.
3. Novel data extraction & alignment:
In general, a query result page contains not only the actual data, but also other
information, such as navigational panels, advertisements, comments, information about
hosting sites, and so on. The goal of web database data extraction is to remove any
irrelevant information from the query result page, extract the query result records (QRRs)
from the page, and align the extracted QRRs into a table such that the data values
belonging to the same attribute are placed into the same table column.
We employ the following two-step method, called Combining Tag and Value Similarity
(CTVS), to extract the QRRs from a query result page p.
1. Record extraction identifies the QRRs in p and involves data region identification and
the actual segmentation step.
2. Record alignment aligns the data values of the QRRs in p into a table so that data
values for the same attribute are aligned into the same table column.
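As a rough sketch (the helper names and the region representation below are illustrative assumptions, not the system's actual code), the two steps can be expressed as:

```python
# Illustrative sketch of the two-step CTVS flow. The region dictionaries and
# helper names are invented for this example, not taken from the paper.

def extract_records(page_regions):
    """Step 1: record extraction -- keep data regions and split them into QRRs."""
    records = []
    for region in page_regions:
        if region["is_data_region"]:            # data region identification
            records.extend(region["segments"])  # actual segmentation step
    return records

def align_records(records, attributes):
    """Step 2: record alignment -- one table column per attribute."""
    table = []
    for record in records:
        # Missing attributes become empty cells so columns stay aligned.
        table.append([record.get(attr, "") for attr in attributes])
    return table
```

Running `extract_records` over a page with one data region and one advertisement region keeps only the two QRRs, and `align_records` places each attribute's values in the same column, padding missing values with empty cells.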
4. Nested structure algorithm:
The proposed nested structure algorithm is mainly used for extracting more than one relevant data item from the query result pages. When a query result page contains additional relevant information for the search keyword, values that are similar and share the same tag are stored in the same column.
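A minimal sketch of this idea, assuming a record is a dictionary and the nested attribute holds a list of repeated values (the function name is hypothetical):

```python
# Hypothetical sketch of handling a nested (repeating) structure inside one
# record: repeated values of the same attribute are kept together so that they
# land in the same table column rather than spawning extra rows.

def flatten_nested(record, nested_attr):
    """Join repeated values of nested_attr so the record stays one table row."""
    flat = dict(record)
    values = record.get(nested_attr, [])
    if isinstance(values, list):
        flat[nested_attr] = "; ".join(values)
    return flat
```

For example, a record with two authors collapses to a single row whose authors cell holds both values.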
5. Accurate data extraction:
The proposed techniques extract accurate data even when the QRRs are not contiguous, which may be due to the presence of auxiliary information such as a comment, recommendation, or advertisement, and they handle any nested structure that may exist in the QRRs. We also design a new record alignment algorithm that aligns the attributes in a record, first pairwise and then holistically, by combining the tag and data value similarity information. Our results show that CTVS achieves high precision and outperforms existing data extraction methods.
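The combined tag-and-value similarity score can be illustrated with a small sketch; the equal weighting and the exact-match tag score below are assumptions made for illustration, not the algorithm's actual parameters:

```python
from difflib import SequenceMatcher

# Illustrative combination of tag similarity and data-value similarity into one
# score. The weight w and the binary tag comparison are simplifying assumptions.

def value_sim(a, b):
    """String similarity of two data values, in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()

def combined_sim(tag_a, tag_b, val_a, val_b, w=0.5):
    """Weighted mix of tag similarity and value similarity."""
    tag_score = 1.0 if tag_a == tag_b else 0.0  # simplistic tag similarity
    return w * tag_score + (1 - w) * value_sim(val_a, val_b)
```

Two values in identical tags with identical text score 1.0; differing tags and unrelated text score near 0, so pairwise alignment can prefer the highest-scoring column assignment.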
SYSTEM ANALYSIS
Existing System:
Online databases, called web databases, comprise the deep web. Compared with
webpages in the surface web, which can be accessed by a unique URL, pages in the deep web
are dynamically generated in response to a user query submitted through the query interface of a
web database. Upon receiving a user's query, a web database returns the relevant data, either structured or semistructured, encoded in HTML pages. Many web applications, such as metaquerying, data integration, and comparison shopping, need the data from multiple web databases. For these applications, automatic data extraction is necessary.
Only when the data are extracted and organized in a structured manner, such as tables, can they be compared and aggregated. Hence, accurate data extraction is vital for these applications to perform correctly. The problem here is that of automatically extracting the data records encoded in the query result pages generated by web databases; state-of-the-art data extraction methods do not produce the expected results.
Disadvantages:
1. Auxiliary content and advertisements are also extracted into the results.
2. Data regions cannot be identified reliably.
3. Nested information is not stored in the results.
4. Methods that employ wrapper induction can perform poorly when the format of a query result page changes.
Proposed System:
This paper focuses on the problem of automatically extracting data records that are
encoded in the query result pages generated by web databases. In general, a query result page
contains not only the actual data, but also other information, such as navigational panels,
advertisements, comments, information about hosting sites, and so on. The goal of web database
data extraction is to remove any irrelevant information from the query result page, extract the
query result records (referred to as QRRs in this paper) from the page, and align the extracted
QRRs into a table such that the data values belonging to the same attribute are placed into the
same table column.
We employ the following two-step method, called Combining Tag and Value Similarity (CTVS),
to extract the QRRs from a query result page p.
1. Record extraction identifies the QRRs in p and involves two substeps: data region
identification and the actual segmentation step.
2. Record alignment aligns the data values of the QRRs in p into a table so that data values for
the same attribute are aligned into the same table column.
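One plausible sketch of the data region identification substep, under the simplifying assumption that each sibling subtree is summarized by its tag sequence (the function and threshold below are illustrative, not the adapted method itself):

```python
# Assumed sketch of data region identification: runs of adjacent sibling
# subtrees with identical tag sequences are grouped into one data region.
# Real methods use tag-tree similarity rather than exact equality.

def find_data_regions(siblings, min_size=2):
    """siblings: list of tag-sequence tuples, one per subtree, in page order.
    Returns lists of indices, each list being one candidate data region."""
    regions, current = [], [0]
    for i in range(1, len(siblings)):
        if siblings[i] == siblings[current[-1]]:  # same tag structure
            current.append(i)
        else:
            if len(current) >= min_size:
                regions.append(current)
            current = [i]
    if len(current) >= min_size:
        regions.append(current)
    return regions
```

On a page whose tag tree contains a header, three structurally identical table rows, and a footer, only the run of repeated rows is reported as a data region.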
Advantages:
CTVS improves data extraction accuracy in three ways:
1. An adapted data region identification and merge method.
2. A novel record alignment method.
3. A new nested-structure processing algorithm.
Together, these produce accurate data extraction results.
TECHNICAL FEASIBILITY
SYSTEM SPECIFICATION
SOFTWARE SPECIFICATION
FRONT END
ASP.NET
ASP.NET is part of the .NET framework. ASP.NET programs are centralized
applications hosted on one or more Web servers that respond dynamically to client requests. The
responses are dynamic because ASP.NET intercepts requests for pages with a specific extension
(.aspx or .ascx) and hands off the responsibility for answering those requests to just-in-time (JIT)
compiled code files that can build a response on-the-fly.
ASP.NET deals specifically with configuration (web.config and machine.config) files, Web Services (ASMX) files, and Web Forms (ASPX) files. The server doesn't serve any of these file types directly; it returns the appropriate content type to the client. The configuration file types contain initialization and settings for a specific application or portion of an application. Another configuration file, machine.config, contains machine-level initialization and settings. The server ignores requests for the configuration files themselves, because serving them might constitute a security breach.
Client requests for these file types cause the server to load, parse, and execute code
to return a dynamic response. For Web Forms, the response usually consists of HTML or WML.
Web Forms maintain state by round-tripping user interface and other persistent values between
the client and the server automatically for each request.
A request for a Web Form can use View State, Session State, or Application State to maintain values between requests. Both Web Forms and Web Services requests can take advantage of ASP.NET's integrated security and data access through ADO.NET, and can run code that uses system services to construct the response. The major difference between a static request and a dynamic request is that a typical static Web request references a static file: the server reads the file and responds with its contents.
4.1.2 ASP.NET Events:
Every time an ASP.NET page is viewed, many tasks are performed behind the scenes. Tasks are performed at key points ("events") of the page's execution lifecycle. The most common events are:
In Figure 1-1, the layer on top of the CLR is a set of framework
base classes, followed by an additional layer of data and XML classes, plus another
layer of classes intended for web services, Web Forms, and Windows Forms.
Collectively, these classes are known as the Framework Class Library (FCL), one of the
largest class libraries in history and one that provides an object-oriented API to all the
functionality that the .NET platform encapsulates. With more than 4,000 classes, the
FCL facilitates rapid development of desktop, client/server, and other web services and
applications.
4.1.2.1 OnInit
The first event in our list to be raised is OnInit. When this event is raised, all of the page's server controls are initialized with their property values. Postback values are not applied to the controls at this time.
ASP.NET supports all the .NET languages (currently C#, C++, VB.NET, and JScript,
but there are well over 20 different languages in development for .NET), so you will eventually
be able to write Web applications in your choice of almost any modern programming language.
The .NET Framework sits on top of the operating system, which can be any flavor of Windows, and consists of a number of components.
Fig 4.1.1 Interoperability
Beyond increases in speed and power, ASP.NET provides substantial development improvements, like seamless server-to-client debugging and automatic validation of form data.
4.1.2.2 OnLoad
The next event to be raised is OnLoad, the most important event of them all, as all the page's server controls will have their postback values by now. The OnLoad method is used when the page posts values to the database; the user triggers it through an event (such as clicking a button).
4.1.2.3 Postback Events
Rich library of Web Controls
Separation of layout (HTML) and logic (e.g. C#)
Compiled languages instead of interpreted languages
GUI can be composed interactively with Visual Studio .NET
Better state management
4.1.4 Namespaces
ASP.NET uses a concept called namespaces. Namespaces are hierarchical object models that support various properties and methods. For example, HTML server controls reside in the "System.Web.UI.HtmlControls" namespace, web server controls reside in the "System.Web.UI.WebControls" namespace, and ADO.NET resides in the "System.Data" namespace.
4.1.4 Language Independent
An ASP.NET page can be created in any language supported by .NET framework.
Currently .NET framework supports VB, C#, JScript and Managed C++. .NET includes a
Common Language Specification (CLS), which provides a series of basic rules that are required
for language integration.
4.1.5 ASP.NET Server controls
Using ASP.NET server controls, browser variation is handled automatically because these controls output the HTML themselves based on the browser requesting the page. Even if we plan to use web controls exclusively, it's worth reading through this section to master the basics of HTML controls. Along the way, you'll get an introduction to a few ASP.NET essentials that apply to all kinds of server controls, including view state, postbacks, and event handling.
4.1.6 Types of controls
ASP.NET has two basic types of controls: HTML server controls and Web server controls. HTML server controls are generated around specific HTML elements, and the ASP.NET engine changes the attributes of the elements based on server-side code that you provide. Web server controls revolve more around the functionality you need on the page; the ASP.NET engine takes the extra step of deciding, based upon the container of the requester, what HTML to output.
Figure: web controls
Server Explorer
The Server Explorer window enables you to perform a number of functions, such as database connectivity, performance monitoring, and interacting with event logs. By using Server Explorer you can log on to a remote server and view database and system data about that server. Many of the functions that are performed with the Enterprise Manager in SQL Server can now be executed in the Server Explorer.
Solution Explorer
Solution Explorer provides an organized view of the projects in the application. The
toolbar within the Solution Explorer enables to
Properties Window
The Properties window provides the properties of an item that is part of the application. This enables you to control the style and behavior of the selected item.
Dynamic Help
The Dynamic Help window shows a list of help topics. The help topics change based on the item selected or the action being taken. For example, when a Button control on the page is selected, the Dynamic Help window shows help items targeted at that control. The topics are organized as a list of links; clicking one of the links opens the selected help topic in the Document window, and the result is produced in the lower pane of the Document window.
Document window
The Document window is the main window within Visual Studio.NET where the
applications are built. The Document window shows open files in either Design or HTML mode.
Each open file is represented by a tab at the top of the Document window. Any number of files
can be kept open at the same time, and you can switch between the open files by clicking the
appropriate tab.
Design mode versus HTML mode
Visual Studio.NET offers two modes for viewing and building files: Design and
HTML. By clicking the Design tab at the bottom of the Document window, you can see how the page will appear to the user. The page is built in Design mode by dragging and dropping elements directly onto the design page or form; Visual Studio .NET automatically generates the appropriate code. When the page is viewed in HTML mode, it shows the code for the page, and you can directly modify the code to change the way the page is presented.
Working with SQL Server through the Server Explorer
Using Visual Studio .NET, there is no need to open the Enterprise Manager from SQL Server. Visual Studio .NET has a SQL Servers tab within the Server Explorer that lists all the connected servers running SQL Server. Opening up a particular server tab gives five options:
Database Diagrams
Tables
Views
Stored Procedures
Functions
Database Diagrams
To create a new diagram, right-click Database Diagrams and select New Diagram. The Add Tables dialog enables you to select any or all of the tables that you want in the visual diagram you are going to create. Visual Studio .NET looks at all the relationships between the tables and then creates a diagram that opens in the Document window. Each table is represented in the diagram, along with a list of all the columns available in that particular table. Each relationship between tables is represented by a connection line between those tables; the properties of a relationship can be viewed by right-clicking the relationship line.
Tables
The Server Explorer allows working directly with the tables in SQL Server. It
gives a list of tables contained in the particular database selected.
By double-clicking one of the tables, the table is opened in the Document window. This grid
of data shows all the columns and rows of data contained in the particular table. The data
can be added or deleted from the table grid directly in the Document window. To add a
new row of data, move to the bottom of the table and type in a new row of data after
selecting the first column of the first blank row. You can also delete a row of data from
the table by right-clicking the gray box at the left end of the row and selecting Delete.
By right-clicking the gray box at the far left end of the row, the primary key can be set for that particular column. The relationships to columns in other tables can be set by selecting the Relationships option. To create a new table, right-click the Tables section within the Server Explorer and select New Table. This opens a design view that enables you to specify the columns and column details of the table.
To run queries against the tables in Visual Studio .NET, open the query toolbar by choosing View > Toolbars > Query. To query a specific table, open that table in the Document window.
Views
To create a new view, right-click the Views node and select New View. The Add Table dialog box enables you to select the tables from which the view is produced, and the next pane enables you to customize the appearance of the data in the view.
C#.NET:
The C# language is disarmingly simple, with only about 80 keywords and a dozen
built-in data types, but C# is highly expressive when it comes to implementing modern
programming concepts. C# includes all the support for structured, component-based, object-
oriented programming that one expects of a modern language built on the shoulders of C++ and
Java.
The C# language was developed by a small team led by two distinguished
Microsoft engineers, Anders Hejlsberg and Scott Wiltamuth. Hejlsberg is also known for
creating Turbo Pascal, a popular language for PC programming, and for leading the team that
designed Borland Delphi, one of the first successful integrated development environments for
client/server programming. At the heart of any object-oriented language is its support for
defining and working with classes. Classes define new types, allowing you to extend the
language to better model the problem you are trying to solve.
C# contains keywords for declaring new classes and their methods and properties,
and for implementing encapsulation, inheritance, and polymorphism, the three pillars of object-
oriented programming. In C# everything pertaining to a class declaration is found in the
declaration itself. C# class definitions do not require separate header files or Interface Definition
Language (IDL) files. Moreover, C# supports a new XML style of inline documentation that
greatly simplifies the creation of online and print reference documentation for an application. C#
also supports interfaces, a means of making a contract with a class for services that the interface
stipulates. In C#, a class can inherit from only a single parent, but a class can implement multiple
interfaces. When it implements an interface, a C# class in effect promises to provide the
functionality the interface specifies.
The two dominant languages for Windows development in the pre-.NET world were
C++ and Visual Basic 6 (VB6). Both had sizable user populations (although VB6's user base was
much larger), and so Microsoft needed to find a way to make both groups as happy as possible
with their new environment. How, for example, could the large number of Windows developers
who know (and love) C++ be brought forward to use the .NET Framework? One answer is to
extend C++, an option described later. Another approach, one that has proven more appealing for
most C++ developers, is to create a new language based on the CLR but with a syntax derived
from C++. This is exactly what Microsoft did in creating C#.
The results of this commitment to date are impressive. For one thing, the scope of .NET is huge. The platform consists of four separate product groups:
1. A set of languages, including C# and Visual Basic .NET; a set of development tools, including Visual Studio .NET; a comprehensive class library for building web services and web and Windows applications; as well as the Common Language Runtime (CLR) to execute objects built within this framework.
2. A set of .NET Enterprise Servers, formerly known as SQL Server 2000, Exchange 2000, BizTalk 2000, and so on, that provide specialized functionality for relational data storage, email, B2B commerce, etc.
3. An offering of commercial web services, called .NET My Services; for a fee, developers can use these services in building applications that require knowledge of user identity, etc.
4. New .NET-enabled non-PC devices, from cell phones to game boxes.
C# provides component-oriented features, such as properties, events, and declarative constructs (called attributes). Component-oriented programming is supported by the CLR's support for storing metadata with the code for the class. The metadata describes the class, including its methods and properties, as well as its security needs and other attributes, such as whether it can be serialized; the code contains the logic necessary to carry out its functions.
The Common Language Runtime
The Common Language Runtime (CLR) is the foundation for everything else in the .NET Framework. To understand .NET languages such as C# and Visual Basic (VB), you must understand the CLR. To understand the .NET Framework class library (ASP.NET, ADO.NET, and the rest), you must understand the CLR. And since the .NET Framework has become the default foundation for new Windows software, anybody who plans to work in the Microsoft environment needs to come to grips with the CLR.
Software built on the CLR is referred to as managed code, and the CLR provides a
range of things that support creating and running this code. Perhaps the most fundamental is a
standard set of types that are used by languages built on the CLR, along with a standard format
for metadata, which is information about software built using those types. The CLR also
provides technologies for packaging managed code and a runtime environment for executing
managed code. As the most elemental part of the .NET Framework, the CLR is unquestionably
the place to start in understanding what the Framework offers.
FORMS
A form is used to view and edit information in the database, record by record. A form displays only the information we want to see, in the way we want to see it. Forms use familiar controls such as textboxes and checkboxes, which makes viewing and entering data easy.
Views of Form
We can work with forms in several views; primarily there are two:
1. Design View
2. Form View
Design View
To build or modify the structure of a form, we work in the form's Design view. We can add controls to the form that are bound to fields in a table or query, including textboxes, option buttons, graphs, and pictures. Form view displays the whole form as the user will see it.
Main Features of ASP.NET
Successor of Active Server Page (ASP), but completely different architecture.
1. Object-oriented, Event-based
2. Rich library of web controls, Better state management
3. Separation of layout (HTML) and logic
4. Compiled languages instead of interpreted languages
5. GUI can be composed interactively with Visual Studio .NET
An assembly is a collection of files that appear to the programmer to be a single dynamic link library (DLL) or executable (EXE). In .NET, an assembly is the basic unit of reuse, versioning, security, and deployment. The CLR provides a number of classes for manipulating assemblies.
REPORT
A report is used to view and print information from the database. A report can group records into many levels and compute totals and averages by checking values from many records at once. A report can also be made attractive and distinctive, because we have control over its size and appearance.
MODULE & MACRO
A macro is a set of actions, each of which does something, such as opening a form or printing a report. We write macros to automate common tasks, making the work easy and saving time.
Modules are units of code written in the Access Basic language. We can write and use modules to automate and customize the database in very sophisticated ways.
A final note about C# is that it also provides support for directly accessing
memory using C++ style pointers and keywords for bracketing such operations as
unsafe, and for warning the CLR garbage collector not to collect objects referenced by
pointers until they are released.
4.2 BACK END
4.2.1 SQL Server 2005:
Several new features and capabilities have been added to SQL Server 2005. Some of
the most notable features include native XML storage and query support, and integration with
the .NET Common Language Runtime. The comparative editions of this version of SQL Server
haven't really changed much. In addition to the Standard, Developer, and Enterprise editions,
there is a variety of the product called the SQL Server 2005 Express Edition. This is essentially
the replacement for the SQL Server 2000 Desktop Engine (MSDE) that shipped with versions of
Office and Access in the past. It's a lightweight version of the SQL Server engine, intended to
run on a desktop computer with a limited number of connections. As our friends at Microsoft
continue to gently nudge users away from the Access JET database engine and toward SQL
Server, their products will continue to become more aligned and standardized. Like the more serious editions, SQL Server Express can be managed from within Access, Visual Studio, or the SQL Server client tools. The SQL language has been enhanced in a few places but is
generally unchanged. Because Transact-SQL conforms to the ANSI SQL industry standard, you will find only a few minor additions to the supported syntax in SQL Server 2005.
A generation of smaller-scale database products evolved to fill the void for the casual application developer and power user. Products such as the following became the norm for department-level applications because they were accessible and inexpensive:
1. dBase
2. FoxPro
3. Paradox
4. Clipper
5. Clarion
6. FileMaker
7. Access
The big databases were in another class and were simply not available outside of formal IT circles. They were complicated and expensive. Database administrators and designers used cumbersome command-line scripts to create and manage databases.
It was a full-time job; DBAs wrote the scripts to manage the databases, and application developers wrote the code for the applications that ran against them. Life was good. Everyone was happy.
However, there is only one real constant in the IT world and that is change. In the
past five years, there have been significant changes in the world of application development,
database design, and management.
The most popular language for querying and manipulating databases is SQL,
usually pronounced "sequel." SQL is a declarative language, as opposed to a procedural
language, and it can take a while to get used to working with a declarative language when you
are used to languages such as C#. The heart of SQL is the query. A query is a statement that
returns a set of records from the database. For example, you might like to see the CompanyName and CustomerID of every record in the Customers table in which the customer's address is in London. To do so, write:
SELECT CustomerID, CompanyName FROM Customers WHERE City = 'London'
SQL joins are inner joins by default, so writing JOIN is the same as writing INNER JOIN. A more complex statement might go on to ask the database to create an inner join with Products, getting every row in which the ProductID in the Products table is the same as the ProductID in the Order Details table, and then create an inner join with Customers for those rows where the CustomerID is the same in both the Orders table and the Customers table.
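A runnable illustration of the filtered query and the default inner-join behavior, using an in-memory SQLite database whose tables are a simplified stand-in for the Northwind schema:

```python
import sqlite3

# Minimal stand-in tables (an assumption for illustration, not the full
# Northwind schema) to demonstrate the WHERE filter and a default inner join.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID TEXT, CompanyName TEXT, City TEXT);
    CREATE TABLE Orders    (OrderID INTEGER, CustomerID TEXT);
    INSERT INTO Customers VALUES ('ALFKI', 'Alfreds', 'Berlin'),
                                 ('AROUT', 'Around the Horn', 'London');
    INSERT INTO Orders VALUES (1, 'AROUT'), (2, 'ALFKI');
""")

# The WHERE-filtered projection from the text:
london = conn.execute(
    "SELECT CustomerID, CompanyName FROM Customers WHERE City = 'London'"
).fetchall()

# An inner join: only rows with a matching CustomerID in both tables survive.
joined = conn.execute("""
    SELECT o.OrderID, c.CompanyName
    FROM Orders o JOIN Customers c ON o.CustomerID = c.CustomerID
    ORDER BY o.OrderID
""").fetchall()
```

Here `london` contains only the London customer, and `joined` pairs each order with its customer's company name.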
The ADO.NET Object Model
The ADO.NET object model is rich, but at its heart it is a fairly straightforward set of
classes. The most important of these is the DataSet. The DataSet represents a subset of the entire
database, cached on your machine without a continuous connection to the database. Periodically,
you'll reconnect the DataSet to its parent database, update the database with changes you've
made to the DataSet, and update the DataSet with changes in the database made by other
processes.
This is highly efficient, but to be effective the DataSet must be a robust subset of the database, capturing not just a few rows from a single table, but also a set of tables with all the metadata necessary to represent the relationships and constraints of the original database. This is, not surprisingly, what ADO.NET provides.
The DataSet is composed of DataTable objects as well as DataRelation objects. These are accessed as properties of the DataSet object. The Tables property returns a DataTableCollection, which in turn contains all the DataTable objects.
DataTables and DataColumns
A DataTable can be created programmatically or as the result of a query against the database. The DataTable has a number of public properties, including the Columns collection, which returns a DataColumnCollection object, which in turn consists of DataColumn objects. Each DataColumn object represents a column in a table.
DataRelations
In addition to the Tables collection, the DataSet has a Relations property, which
returns a DataRelationCollection consisting of DataRelation objects. Each DataRelation
represents a relationship between two tables through DataColumn objects. For example, in the Northwind database the Customers table is in a relationship with the Orders table through the CustomerID column. The nature of the relationship is one-to-many, or parent-to-child: for any given order there will be exactly one customer, but any given customer might be represented in any number of orders.
Rows
The DataTable's Rows collection returns a set of rows for the table. Use this collection to examine the results of queries against the database, iterating through the rows to examine each record in turn. Programmers experienced with ADO are often confused by the absence of the Recordset, with its MoveNext and MovePrevious commands. With ADO.NET, you do not iterate through the DataSet; instead, you access the table you need and then iterate through its Rows collection, typically with a foreach loop.
DataAdapter
The DataSet is an abstraction of a relational database. ADO.NET uses a DataAdapter as a bridge between the DataSet and the data source, which is the underlying database. The DataAdapter provides the Fill() method to retrieve data from the database and populate the DataSet.
DBCommand and DBConnection
The DBConnection object represents a connection to a data source. This connection can be
shared among different command objects. The DBCommand object allows you to send a
command (typically a SQL statement or a stored procedure) to the database. Often these objects
are implicitly created when you create your DataSet, but you can explicitly access these objects,
as you'll see in a subsequent example.
The DataAdapter Object
Rather than tie the DataSet object too closely to your database architecture, ADO.NET
uses a DataAdapter object to mediate between the DataSet object and the database. This
decouples the DataSet from the database and allows a single DataSet to represent more than one
database or other data source.
SYSTEM DESIGN
SYSTEM DESIGN
DATA FLOW DIAGRAM
1. Search (user)
2. Query result page
3. Novel data extraction & alignment
4. Nested structure algorithm
5. Accurate data extraction
UML DIAGRAMS
USE CASE DIAGRAM
[Use case diagram: the user sets a query, the browser extracts the user query, and the web database is searched.]
SEQUENCE DIAGRAM:
[Sequence diagram between User, Browser and Web database: search user query; search results/response; extract query result records; perform actual segmentation; align data values; exact results.]
COLLABORATION DIAGRAM:
[Collaboration diagram: 1 search user query (User to Browser); 1.1 search query (Browser to Web database); 2.1 and 2.2 result pages (extract query result page); 3.1 actual segmentation and 3.2 align data values (CTVS, Combining Tag and Value Similarity); 4.1 final search results (accurate data extraction).]
SYSTEM TESTING
Debugging is the process of locating and eliminating the cause of known errors. Commonly used debugging
techniques are induction, deduction and backtracking. Debugging by induction involves the
following steps:
1. Collect all the information about test details and test results.
2. Look for patterns.
3. Form one or more hypotheses and rank/classify them.
4. Prove or disprove the hypotheses; re-examine as needed.
5. Implement the appropriate corrections.
6. Verify the corrections: re-run the system and test again until the results are satisfactory.
Debugging by deduction involves the following steps:
1. List possible causes for observed failure
2. Use the available information to eliminate various hypotheses
3. Prove/disprove the remaining hypotheses
4. Determine the appropriate corrections
5. Carry out the corrections and verify
Debugging by backtracking involves working backward in the source code from the point
where the error was observed, running additional test cases and collecting more information.
SYSTEM TESTING
System testing involves two activities: integration testing and acceptance testing. The
integration strategy stresses the order in which modules are written, debugged and unit tested.
Acceptance testing involves functional tests, performance tests and stress tests to verify
requirements fulfillment. System testing checks the interfaces, decision logic, control flow,
recovery procedures, throughput, capacity and timing characteristics of the entire system.
INTEGRATION TESTING
Integration testing strategies include bottom-up (traditional), top-down and sandwich
strategies. Bottom-up integration consists of unit testing, followed by subsystem testing,
followed by testing of the entire system. Unit testing tries to discover errors in modules. Modules are
tested independently in an artificial environment known as a test harness. Test harnesses
provide data environments and calling sequences for the routines and subsystems that are being
tested in isolation.
A disadvantage of bottom-up testing is harness preparation, which can
sometimes account for 50% or more of the coding and debugging effort for a smaller product.
After all the modules have been tested independently and in isolation, they are linked and
executed in one single integration run. This is known as the big-bang approach to integration
testing. Isolating the sources of errors is difficult in the big-bang approach.
Top-down integration starts with the main routine and one or two immediately subordinate
routines. After thorough checking, the top level becomes a test harness for its immediate
subordinate routines. Top-down integration offers the following advantages:
1. System integration is distributed throughout the implementation phase; modules are
integrated as they are developed.
2. Top-level interfaces are tested first.
3. Top-level routines provide a natural test harness for lower-level routines.
4. Errors are localized to the new modules and interfaces that are being added.
Though top-down integration seems to offer better advantages, it may not be applicable
in certain situations. Sometimes it may be necessary to test certain critical low-level modules
first. In such situations, a sandwich strategy is preferable. Sandwich integration is mostly top-down, but bottom-up techniques are used on some modules and subsystems. This mixed
approach retains the advantages of both strategies.
ACCEPTANCE TESTING
Acceptance testing involves planning and execution of functional tests, performance tests
and stress tests in order to check whether the implemented system satisfies the requirements
specification. Quality assurance people as well as customers may simultaneously develop
acceptance tests and run them. In addition to functional and performance tests, stress tests are
performed to determine the limits of the system developed. For example, a compiler
may be tested for symbol table overflow, or a real-time system may be tested to find how it
responds to multiple interrupts of different or the same priorities.
Acceptance test tools include a test coverage analyzer, a timing analyzer and a coding standards
checker. The test coverage analyzer records the control paths followed for each test case. The timing
analyzer reports the time spent in various regions of the source code under different test cases.
Coding standards are stated in the product requirements; manual inspection is usually not an
adequate mechanism for detecting violations of coding standards.
SYSTEM TESTING
Software testing is an important element of software quality assurance and represents the
ultimate review of specification, design and coding. The user tests the developed system and
changes are made according to the user's needs. The testing phase involves testing the developed
system using various kinds of data. Elaborate test data is prepared and the system is exercised
using that data. The whole testing process is recorded and corrections are made.
Testing Objectives
Testing is a process of executing a program with the intent of finding errors.
A good test is one that has a high probability of finding an undiscovered error.
Testing is vital to the success of the system. System testing is the stage of implementation
which ensures that the system works accurately before live operation commences. System testing
proceeds on the logical assumption that if all parts of the system are correct, the goals will be
successfully achieved.
EFFECTIVE TESTING PREREQUISITES
1) Types of Testing Done
Integration Testing
An overall test plan for the project is prepared before the start of coding.
Validation Testing
The project is tested with sample data to verify that it produces the correct sample output.
RECOVERY TESTING
The project is tested to confirm that, with correct input data, it produces the correct, valid
output without any errors.
SECURITY TESTING
The project uses a password to secure its data.
TEST DATA AND INPUT
The above tests are carried out with various types of data. Preparation of test data plays a
vital role in system testing. After the test data is prepared, the system under study is tested using
it, applying the testing and correction methods described above. The system has been verified
and validated by running it in both of the following ways:
i) Run with live data
ii) Run with test data
RUN WITH TEST DATA
In this case the system was run with some sample data. Specification testing was also
done for each condition and for combinations of conditions.
RUN WITH LIVE DATA
The system was tested with data from the old system for a particular period. The new
reports were then verified against the old ones.
TEST CASES
A test case in software engineering is a set of conditions or variables under which a
tester will determine whether an application or software system is working correctly or not. The
mechanism for determining whether a software program or system has passed or failed such a
test is known as a test oracle. In some settings, an oracle could be a requirement or use case,
while in others it could be a heuristic. It may take many test cases to determine that a software
program or system has been sufficiently scrutinized to be released. Test cases are often
referred to as test scripts, particularly when written. Written test cases are usually collected into
test suites.
FORMAT TEST CASES
In order to fully test that all the requirements of an application are met, there must be at
least two test cases for each requirement: one positive test and one negative test. If a requirement
has sub-requirements, each sub-requirement must have at least two test cases. Keeping track of
the link between the requirement and the test is frequently done using a traceability matrix.
Written test cases should include a description of the functionality to be tested and the
preparation required to ensure that the test can be conducted. A formal written test case is
characterized by a known input and by an expected output, which is worked out before the test is
executed. The known input should test a precondition and the expected output should test a
postcondition.
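The idea of a known input paired with an expected output worked out in advance, plus one positive and one negative test per requirement, can be made concrete with a small sketch; the parse_age function below is a hypothetical example, not part of this project:

```python
def parse_age(text):
    """Hypothetical function under test: parse a non-negative age."""
    value = int(text)            # precondition: text encodes an integer
    if value < 0:
        raise ValueError("age cannot be negative")
    return value                 # postcondition: a non-negative integer

# Positive test case: a known valid input and its expected output,
# worked out before the test is executed.
assert parse_age("42") == 42

# Negative test case: a known invalid input must be rejected rather
# than silently accepted.
try:
    parse_age("-5")
    rejected = False
except ValueError:
    rejected = True
assert rejected
```

Together the two cases cover the requirement from both sides, which is the minimum a written test case set should provide.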
IMPLEMENTATION
SAMPLE CODING
SCREEN LAYOUTS
FUTURE ENHANCEMENT
CONCLUSION
BIBLIOGRAPHY
BOOK REFERENCE
WEB REFERENCE