7/30/2019 Logic Based Pattern Discovery New
1/73
LOGIC BASED PATTERN DISCOVERY
A Project Report Submitted In Partial Fulfillment of the Requirements for the Award Of
MASTER OF TECHNOLOGY
IN
SOFTWARE ENGINEERING
BY
E.SREE LAKSHMI
(09C31D2503)
UNDER THE GUIDANCE OF
MRS. RAZIYA
Assoc. Prof.
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
BALAJI INSTITUTE OF TECHNOLOGY & SCIENCE
NARSAMPET, WARANGAL - 506331
2011-2012
DEPARTMENT OF INFORMATION TECHNOLOGY
BALAJI INSTITUTE OF TECHNOLOGY & SCIENCE
NARSAMPET, WARANGAL- 506331
CERTIFICATE
This is to certify that E.SREE LAKSHMI, Roll No. 09C31D2503 of the M.Tech has
satisfactorily completed the dissertation work entitled LOGIC BASED PATTERN
DISCOVERY in partial fulfillment of the requirements of the M.Tech degree during the academic year 2011-2012.
Mr. D. Venkateshwarlu
Supervisor(s)
Mr. M. Srinivas
Head of the Department
External
Abstract
Previous studies have presented convincing arguments that a frequent
pattern mining algorithm should not mine all frequent patterns but only the
closed ones, because the latter leads not only to a more compact yet complete
result set but also to better efficiency. However, most of the previously developed
closed pattern mining algorithms work under the candidate maintenance-and-test
paradigm, which is inherently costly in terms of runtime and space usage when the
support threshold is low or the patterns become long. A new pattern mining
algorithm is proposed to discover domain knowledge reported as coherent rules,
where coherent rules are discovered based on the properties of an inference
analysis approach. In this approach we use the Back Scan pruning technique.
CHAPTER 1
INTRODUCTION
Data is stored in databases, data warehouses and other information repositories. As
the size of data increases, there is a pressing need for data mining, which is required to
extract knowledge of interest for users. Thus, data mining is a process of
extracting knowledge from information repositories by extracting interesting data
patterns representing knowledge. We get these interesting data patterns by
evaluating data patterns (task-relevant information) obtained from various databases and data warehouses.
Data Mining is carried out using various data mining functionalities in which
Association Rule Mining (ARM) is commonly used to extract interesting data
patterns. It is used by marketing and retail communities in order to find
interesting association rules between frequent item sets which can boost the
sales of an item set in the market in order to make profits. Mining association
rules are useful for discovering relationships among items from large databases.
Association rule mining deals with market basket data analysis for finding
frequent item sets to generate valid and important association rules from them.
Association Rule Mining finds interesting data patterns based on association
relationship between various items of a data set by using association rules which
are used to specify the association relationship between various items of a data set.
{milk, bread} => {butter}; in this association rule, there is a correlation
between the two itemsets {milk, bread} and {butter}.
A frequent itemset is a set of items that appear together frequently in a data
set. Only the itemsets whose frequency of occurrence is >= the min_support threshold
value given by the domain experts are considered to be frequent patterns.
Hence, Association Rule Mining is also called Frequent Pattern Mining.
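As a minimal sketch of this idea, the following Python snippet counts itemset frequencies and keeps those at or above a support threshold (the toy transactions and the 0.5 threshold are invented for illustration):

```python
from itertools import combinations
from collections import Counter

# Hypothetical toy transaction data set; item names are illustrative only.
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
]

min_support = 0.5  # the threshold a domain expert would normally supply

# Count every itemset of size 1 and 2 across all transactions.
counts = Counter()
for t in transactions:
    for k in (1, 2):
        for itemset in combinations(sorted(t), k):
            counts[frozenset(itemset)] += 1

n = len(transactions)
# Keep only itemsets whose relative support meets the threshold.
frequent = {s: c / n for s, c in counts.items() if c / n >= min_support}
for itemset, support in sorted(frequent.items(), key=lambda kv: -kv[1]):
    print(set(itemset), round(support, 2))
```

Raising or lowering `min_support` changes which itemsets survive, which is exactly the sensitivity the later chapters argue against.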
An association between these frequent itemsets is interesting if it satisfies
two interestingness measures called support and confidence. By using a
min_support value given by a domain expert we lose some interesting association
rules, as this threshold value cannot always be correct. So we should shift from the
existing support and confidence framework to a framework that uses a logic
principle to check the interestingness of an association rule. As this
framework relies completely on logic instead of a min_support threshold value given
by a domain expert, all the interesting association rules are discovered. This
framework is further enhanced by applying a pruning technique to reduce the
search space, so that time and space complexity are reduced.
1.1 OBJECTIVE
To eliminate the need for a minimum support threshold when discovering
interesting association rules, by obtaining association rules based on their support
value (i.e. their frequency of occurrence) as observed in the transactional data set
and then evaluating these association rules based on certain logic principles. This
process is followed in order to get only strong and interesting association rules and
to completely eliminate the uninteresting association rules obtained when mining is
performed based on a minimum support threshold value given by a domain expert.
This process is to be further enhanced in order to reduce space and time
complexity.
1.2 PROBLEM STATEMENT
The use of a minimum support threshold generally assumes that: -
A domain expert can provide this threshold value accurately, which is not always
the case.
The knowledge of interest, i.e. an interesting data pattern in the form of an
interesting association rule, can be obtained within this threshold value.
This single threshold value is enough to obtain the knowledge of interest
required by the user.
Because of these assumptions, we have the following disadvantages: -
Loss of association rules involving frequently observed items.
Loss of association rules involving infrequently observed items.
In order to overcome these disadvantages, we need to use a framework other
than the existing support and confidence framework, one which discovers
interesting association rules.
1.3 DEFINITIONS
Data mining: - It is the process of discovering knowledge from various
information repositories by extracting interesting data patterns representing
knowledge. These interesting data patterns are obtained by evaluating data
patterns (i.e. task-relevant information) obtained from various data sources like
databases and data warehouses. Task-relevant information obtained from such
data sources is called a data pattern.
Association Rule Mining: - It is a data mining functionality which is used to
find interesting data patterns based on an association or correlation relationship
between various items of a transactional data set.
Association Rule: - It is a rule used to specify the association relationship between
items of a frequent itemset obtained from a transactional data set.
Let I = {i1, i2, ..., im} be a set of items. Let D, the task-relevant data, be a
set of database transactions where each transaction T is a set of items such that
T ⊆ I. Each transaction is associated with an identifier, called TID. Let A be a set
of items. A transaction T is said to contain A if and only if A ⊆ T. An association
rule is an implication of the form A => B, where A ⊂ I, B ⊂ I, and A ∩ B = ∅.
The rule A => B holds in the transaction set D with support s, where s is
the percentage of transactions in D that contain A ∪ B (i.e., the union of sets A
and B, or say, both A and B). This is taken to be the probability P(A ∪ B). The
rule A => B has confidence c in the transaction set D, where c is the percentage of
transactions in D containing A that also contain B. This is taken to be the
conditional probability P(B|A).
Support (A => B) = P(A ∪ B) ............ (1.3.1)
Confidence (A => B) = P(B|A) ........... (1.3.2)
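Equations (1.3.1) and (1.3.2) can be computed directly from a transaction set. A small sketch (the transactions are invented, and `support` and `confidence` are illustrative helper names):

```python
# Support and confidence for a rule A => B, as in Eqs. (1.3.1)-(1.3.2).
# The transaction set and item names are invented for the example.
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "butter"},
]

def support(itemset, db):
    """P(itemset): fraction of transactions containing every item."""
    return sum(itemset <= t for t in db) / len(db)

def confidence(a, b, db):
    """P(B|A) = support(A u B) / support(A)."""
    return support(a | b, db) / support(a, db)

A, B = {"milk", "bread"}, {"butter"}
print(support(A | B, transactions))   # 0.25
print(confidence(A, B, transactions)) # 0.5
```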
Itemset: - A frequent itemset refers to a set of items that appear together
frequently in a transactional data set. For ex: {milk, bread}.
Subsequence: - A data pattern that appears in a sequential order in a data set
is called as a frequent sequential pattern or a frequently occurring subsequence.
For ex:- Pattern which shows that
customers tend to purchase first a PC followed by a digital camera and then a
memory card is a frequent sequential pattern or subsequence.
Implication: - If an association rule meets certain logic principles based on the
values of a truth table, then it is called an implication.
An implication is formed using two propositions p and q, from which we have
four implications: -
p -> q
~p -> q
p -> ~q
~p -> ~q
The symbol -> is used to describe the relation between p and q,
and the symbol ~ denotes the negation of a proposition (a false proposition).
Thus, an association rule X => Y is mapped to p -> q iff both X and Y are observed;
here X is mapped to p and Y is mapped to q.
Equivalence: - An equivalence is a mode of implication, where an implication
has to satisfy the following condition in order to qualify as an equivalence: -
p <-> q iff ~(p xor q) ............ (1.3.3)
Here p and q are propositions and <-> is the equivalence symbol. The truth table
for equivalence is given below.
Table 1.3.1 Truth Table for Logical Equivalence
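The truth table for equivalence, which is true exactly when p and q have the same truth value (i.e. the negation of exclusive-or), can be regenerated as a quick check:

```python
# Regenerates the truth table for logical equivalence (p <-> q),
# which holds exactly when p and q agree, i.e. not (p xor q).
print("p     q     p<->q")
for p in (True, False):
    for q in (True, False):
        equiv = not (p ^ q)  # ^ is exclusive-or on booleans
        print(f"{p!s:<5} {q!s:<5} {equiv}")
```

The four rows printed are (T, T) -> T, (T, F) -> F, (F, T) -> F and (F, F) -> T.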
Thus, the association rule whose implication satisfies the equivalence condition
given above is considered to be an interesting association rule.
CHAPTER 2
LITERATURE SURVEY
This section presents information about the knowledge discovery process, the architecture
of a data mining system, data warehouses, association rule mining, the Apriori algorithm
and the FP-Growth algorithm.
2.1 DATA MINING
It is the process of extracting knowledge from large amounts of data. Data
mining is an essential step in the process of knowledge discovery from data (KDD),
which is shown as follows: -
Figure 2.1.1 KNOWLEDGE DISCOVERY FROM DATA (KDD)
The essential steps involved in data mining are:-
Data Preprocessing: - Data cleaning and data integration are the steps involved in
data preprocessing. In data cleaning, noisy data (data errors) is removed, and in
data integration, data from multiple data sources is merged into a single unified
format.
Data selection: - Where data relevant to the analysis task are retrieved from
the database.
Data transformation: - Where data are transformed or consolidated into forms
appropriate for mining by performing summary or aggregation operations.
Data mining: - It is an essential process where intelligent methods are applied
in order to extract data patterns.
Pattern evaluation: - In this step we identify the truly interesting patterns
representing knowledge based on some interestingness measures.
Knowledge presentation: - Here visualization and knowledge representation
techniques are used to present the mined knowledge to the user.
Figure 2.1.2 Architecture of a Typical Data Mining System
Database, data warehouse and other information repository: - This is a
set of databases, data warehouses and other kinds of information repositories.
Data cleaning and data integration techniques are performed on the data.
Database or data warehouse server: - The database or data warehouse
server is responsible for fetching the relevant data, based on the user's data
mining request.
Knowledge base: -This is the domain knowledge that is used to evaluate the
interestingness of resulting patterns. Such knowledge can include concept
hierarchies, used to organize attributes or attribute values into different levels of
abstraction.
Data mining engine: - This is essential to the data mining system and ideally
consists of a set of functional modules for tasks such as characterization,
association and correlation analysis, classification, prediction, cluster analysis,
outlier analysis, and evolution analysis.
Pattern evaluation module: -This component uses interestingness measures
and interacts with the data mining modules so as to focus the search towards
interesting patterns. It uses interestingness thresholds to filter out discovered
patterns.
User interface: - This module communicates between users and the data
mining system, allowing the user to interact with the system by specifying a
data mining query or task, providing information to help focus the search, and
performing exploratory data mining based on the intermediate data mining
results. In addition, this component allows the user to browse database and data
warehouse schemas or data structures, evaluate mined patterns, and visualize
the patterns in different forms.
2.2 DATA WAREHOUSE
A Data Warehouse is a subject oriented, integrated, time variant and non
volatile collection of data which supports managerial decision making process.
The four keywords specified in the above definition can be described as follows:-
Subject Oriented: - A data warehouse provides a simple and concise view
about particular subject issues by excluding data which is not useful for decision
making process. Thus, it is specially designed to focus on the modeling and
analysis of data for decision makers. For ex: - A data warehouse is organized
around major subjects like customer, supplier, product and sales, rather than
concentrating on day-to-day operations.
Integrated: - A data warehouse is constructed by integrating multiple
heterogeneous sources like relational databases and data warehouses.
Time-Variant: - Data are stored to provide information from historical
perspective like a period from 5 to 10 years.
Non-Volatile: - A data warehouse is a permanent storage of data; it is a
physically separate store of data transformed from the application data found in
the operational environment. Due to this separation, a data warehouse does not
require transaction processing, recovering and concurrency control mechanisms.
It requires only two operations on data: -
Initial loading of data.
Access of data
Figure 2.2.1 Three-Tier Data Warehouse Architecture
1) Bottom Tier: - It is a data warehouse server. Back-end tools are used to feed
data into the bottom tier from operational databases or other external sources.
These tools perform data extraction, cleaning, integration and transformation. The
data is extracted using application program interface known as gateways, which is
supported by the underlying DBMS and allows client programs to generate SQL
code to be executed at a server. For ex: - ODBC (Open Database Connection).
2) Middle Tier: - The middle tier is an OLAP server implemented by using either a
relational OLAP model i.e. an extended relational DBMS that maps operations on
multi dimensional data to standard relational operations or a multidimensional
OLAP model, that is, a special-purpose server that directly implements
multidimensional data and operations.
3) Top Tier: -The top tier is a front-end client layer, which contains query and
reporting tools, analysis tools, and data mining tools.
2.3 ASSOCIATION RULE MINING
It is a data mining functionality or a data mining method used to find interesting
data patterns based on association or correlation relationship among a large set
of data items by using association rules which specify the association
relationship among data items. For ex: - {milk, bread} => {butter}
The itemsets whose frequency of occurrence is greater than or equal to the
min_support count threshold given by domain experts are the only ones considered to be frequent patterns or frequent itemsets. This threshold value is provided to start
the pattern discovery process. Thus, ARM is also called frequent pattern
mining.
An association or correlation between the items of these frequent itemsets is
said to be interesting if it satisfies two interestingness measures called support
and confidence that are used to evaluate the interestingness of an association
rule.
Figure 2.3.1 Example Of Association Rule Mining
For an association rule A => B:
Support = support_count(A ∪ B) / total no. of transactions = 2 / 4 = 50%
Confidence = support_count(A ∪ B) / support_count(A) = 2 / 3 = 66.6%
So this rule is considered an interesting one.
The above example shows that the confidence of rule A => B can easily be
derived from the support counts of A and A ∪ B. That is, once the support counts of
A, B and A ∪ B are known, it is straightforward to derive the corresponding
association rules
A => B and B => A and check whether they are strong. Thus the problem of mining
association rules can be reduced to that of mining frequent itemsets.
Association rule mining can be viewed as a two-step process:
1. Find all frequent itemsets: - Each of these itemsets will occur at least as
frequently as a predetermined minimum support count (min_sup).
2. Generate strong association rules from the frequent itemsets: - These
rules must satisfy minimum support and minimum confidence and then only they
are considered to be interesting association rules.
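The two-step process can be sketched as follows; the data set, the thresholds, and the restriction to size-2 itemsets are simplifications for illustration, not the full Apriori algorithm:

```python
from itertools import combinations

# Sketch of the two-step process: (1) find frequent itemsets,
# (2) generate rules meeting min_support and min_confidence.
# The data and thresholds are invented for the example.
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
]
min_sup, min_conf = 0.5, 0.7
n = len(transactions)

def sup(s):
    """Relative support of an itemset."""
    return sum(s <= t for t in transactions) / n

items = sorted(set().union(*transactions))
# Step 1: frequent itemsets of size 2 (one level only, for brevity).
frequent = [frozenset(c) for c in combinations(items, 2) if sup(set(c)) >= min_sup]

# Step 2: strong rules A => B from each frequent itemset.
for itemset in frequent:
    for a in itemset:
        A, B = frozenset([a]), itemset - {a}
        conf = sup(A | B) / sup(A)
        if conf >= min_conf:
            print(f"{set(A)} => {set(B)} (conf={conf:.2f})")
```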
Types of Association Rules: The different types of association rules used in operational and relational databases are: -
1. Quantitative Association rules
2. Single-Dimensional Association rules
3. Multi-Dimensional Association rules
4. Multi level Association rules
1) Quantitative Association rules: - This approach considers individual
numerical attributes as quantities so it is called dynamic multidimensional
association rule.
Here, Aquan1 ∧ Aquan2 => Acat, where Aquan1 and Aquan2 are tests on
quantitative attribute intervals (where the intervals are dynamically determined),
and Acat tests a categorical attribute from the task-relevant data. Such rules have
been referred to as two-dimensional quantitative association rules. For ex: -
age(X, "30...39") ∧ income(X, "42K...48K") => buys(X, HDTV)
2) Single-Dimensional Association rules: - They involve only a single
dimension or predicate, which is used multiple times. For ex: -
buys(X, laptop) => buys(X, HP printer)
3) Multi-Dimensional Association rules: - Association rules that involve two or
more dimensions or predicates can be referred to as multidimensional association
rules.
For ex: - age(X, "20...29") ∧ occupation(X, student) => buys(X, laptop)
4) Multilevel Association rules: - When data mining is performed at multiple
levels of abstraction, the rules which are extracted are referred to as multilevel
association rules. This is done by using a concept hierarchy.
CHAPTER 3
PROBLEM ANALYSIS
3.1 PROBLEM DESCRIPTION
Previous frequent pattern mining algorithms such as Apriori and FP-Growth use a
minimum support threshold to find frequent itemsets in order to discover
interesting association rules, and these algorithms are based on the following
assumptions: -
The threshold value provided by the domain expert is very accurate.
The frequent patterns (frequent itemsets) must have occurred frequently at least
equal to the threshold.
Because of these assumptions, we have the following disadvantages: -
Loss of association rules involving frequently observed items.
Loss of association rules involving infrequently observed items.
No consideration for negative association rules.
Loss of Association rules involving Frequently Observed Items :-
Use of a minimum support threshold assumes that an ideal minimum support
threshold exists for frequent patterns, and that a user can identify this threshold
accurately. But it is unclear how to find this threshold as there is no universal
standard for setting this threshold value. Different minimum support thresholds
would result in inconsistent mining results, even when the mining process is
performed on the same data set. That is, a lower minimum support threshold would
result in more unnecessary association rules being found, and a higher minimum
support threshold would result in fewer association rules being found. We consider
this situation as a case of losing association rules involving frequent items. The
problem of losing frequent association rules can be solved only by lifting the
minimum support threshold value.
Loss of Association Rules involving infrequently Observed Items :-
Typically, a data set contains items that appear frequently while other items
rarely occur. For Ex: - In a retail fruit market, fruits are frequently observed but
occasionally bread is also observed. Some items are rare in nature or infrequently
found in a data set. These items are called rare items. If a single minimum support
threshold is used and is set high, those association rules involving rare items will
not be discovered. Use of a single and lower minimum support threshold, on the
other hand, would result in too many uninteresting association rules having that
rare item. This is called the rare item problem.
No consideration for negative association rules:-
Algorithms like Apriori and FP Growth do not give importance to the absence of
items within a transactional data set. They can discover only positive association
rules.
For ex: - A negative association rule like ~{milk, bread} => ~{butter}, which tells
about the absence of both the antecedent and consequent parts of an association rule, is not considered for discovery during the mining process.
3.2 EXISTING SYSTEM
The existing system can either implement Apriori algorithm or FP Growth algorithm.
The main input parameter given to the existing system is the minimum support
threshold value in order to get frequent itemsets. An association or correlation
between these frequent itemsets is said to be interesting, if it satisfies two
interestingness measures called support and confidence, which are used to
evaluate the interestingness of an association rule. In this way we can discover
interesting association rules.
Demerits of existing system are:-
It is assumed that the domain expert provides an accurate minimum support
threshold value.
It is also assumed that the frequent patterns or frequent itemsets have occurred
frequently at least equal to the threshold.
Negative association rules are not given any importance.
Loss of association rules involving frequently observed items.
Loss of association rules involving infrequently observed items.
3.3 PROPOSED SYSTEM
A novel framework is proposed which removes the demerits of the existing system
by removing the need for minimum support threshold value. Here associations are
discovered based on logical implications. The principle of this approach considers
that an association rule should be reported only when there is enough logical
evidence about it in the data set. To do this, we should consider both the presenceand absence of items during the mining process.
For ex: An association such as A B will be reported only when there are fewer
occurrences of A B, A B but more occurrences of A B
Figure 3.3.1 Framework of Association Rules Based On Pseudo
implications
In the first step, the association rules that are observed in the data set are
mapped to their implications based on comparison between their support count
values (i.e. their frequency of occurrence). The implications which are obtained in
this way are called as pseudo implications.
In the second step, these pseudo implications are mapped to a mode of
implication called equivalence based on some conditions. The pseudo implications
which satisfy all these conditions are called as pseudo implications of equivalences.
If only a pair of pseudo implications satisfy the same conditions, then they together
form a coherent rule.
Coherent rule: - If a pair of pseudo implications satisfies all four conditions of
equivalence, then those two pseudo implications form a coherent rule. These four
conditions are: -
S(X, Y) > S(X, ~Y)
S(X, Y) > S(~X, Y)
S(~X, ~Y) > S(X, ~Y)
S(~X, ~Y) > S(~X, Y)
Association rules decoupled from coherent rules are interesting association
rules, as they are related to true implications based only on logic and not on domain
knowledge. So coherent rules don't need users to preset a minimum support
threshold to get frequent patterns, as they can be identified via truth table values.
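The four equivalence conditions can be expressed as a small predicate; the function name and the two sets of example counts are illustrative:

```python
# Sketch of the four equivalence conditions that make a pair of
# pseudo implications a coherent rule. The arguments are the supports
# of the four co-occurrence combinations; the counts are illustrative.
def is_coherent(s_xy, s_x_noty, s_notx_y, s_notx_noty):
    """True iff X => Y and ~X => ~Y would form a coherent rule."""
    return (s_xy > s_x_noty and          # S(X, Y)   > S(X, ~Y)
            s_xy > s_notx_y and          # S(X, Y)   > S(~X, Y)
            s_notx_noty > s_x_noty and   # S(~X, ~Y) > S(X, ~Y)
            s_notx_noty > s_notx_y)      # S(~X, ~Y) > S(~X, Y)

print(is_coherent(3, 1, 2, 95))   # all four conditions hold
print(is_coherent(8, 5, 33, 55))  # fails: S(X, Y) < S(~X, Y)
```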
Pseudo implications: - Association rules mapped to implications based on
comparison between their support count values are called pseudo implications.
These implications are called pseudo because they resemble real implications. A
pseudo implication is judged true or false based on a comparison between supports,
whereas an implication is judged true or false based on the binary values 1 or 0.
Mapping association rules to Equivalences
The association rules can be mapped to equivalences in two ways: -
Mapping an association rule to equivalence using a single transaction record
Mapping an association rule to equivalence using multiple transaction records
Mapping an association rule to equivalence using a single transaction
record
The steps involved in this process are: -
Itemsets of an association rule are mapped to propositions of an implication:
For ex: - The presence of an itemset X is mapped to proposition p = T iff X is
observed. The absence of an itemset X is mapped to p = F, i.e. ~p, iff X is not
observed. Thus, itemsets X and Y are mapped to p = T and q = T iff both X and Y
are observed.
Mapping association rules to implications:
For ex: - An association rule X => ~Y is mapped to the implication p -> ~q iff
X is observed but Y is not observed.
Mapping association rules to equivalences:
Association rules are mapped to equivalences based on the truth table values
(T, F, F, T) for the implications p -> q, p -> ~q, ~p -> q and ~p -> ~q.
For ex: - Association rule X => Y is mapped to p <-> q iff
X => Y is true (as p = T and q = T, therefore p <-> q = T),
X => ~Y is false,
~X => Y is false, and ~X => ~Y is true.
Mapping an Association rule to equivalence using multiple transaction
records
In multiple transaction records, an itemset X is observed in many transaction
records. So based on comparison between presence and absence of an itemset,
each itemset can be mapped to propositions p and q as follows:-
If S(X) > S(~X), then itemset X is mapped to p = T. X is considered interesting
as it is mostly observed in the dataset.
But if a union of itemsets such as (X, Y) is involved, then X => Y can be mapped
to an implication p -> q only when the union of itemsets (X, Y) is observed in more
transactions than (X, ~Y), (~X, Y) and (~X, ~Y).
Thus, Association rules that are mapped to their implications by comparing their
support count values are called pseudo implications.
These pseudo implications can be mapped to equivalence when the following
conditions are met: -
S(X, Y) > S(X, ~Y)
S(X, Y) > S(~X, Y)
Predator (Boolean attribute)
Toothed (Boolean attribute)
Backbone (Boolean attribute)
Breathes (Boolean attribute)
Venomous (Boolean attribute)
Fins (Boolean attribute)
Legs (numeric attribute, integer value range: [0, 2, 4, 5, 6, 8])
Tail ( Boolean attribute)
Domestic (Boolean attribute)
Cat size (Boolean attribute)
Type (numeric attribute, integer value range: [1, 2, 3, 4, 5, 6, 7], which
represents each class of animals)
Table 3.3.1 Total Frequency of Occurrence for Each Class of Animals
Let the minimum support threshold be 5%, now the classes of animals whose
frequency of occurrence is greater than minimum support threshold are considered
to be frequent. And the classes of animals whose frequency of occurrence is less
than minimum support threshold are considered to be infrequent.
Thus, the classes of animals: Mammals, Birds, Fishes, Insects and Invertebrates are
considered to be frequent. And the classes of animals: Amphibians and Reptiles are
considered to be infrequent.
Problem with infrequent association rules
The rules that involve infrequent items, i.e. infrequent classes of animals like
Amphibians and Reptiles, are not discovered via the Apriori approach because their
support count is less than the minimum support threshold value, even though they
are interesting. But by using the Coherent Rule Mining approach we can discover
these kinds of association rules.
Let us consider an association rule:-
{eggs (1), toothed (1), breathes (1), tail (1)} => {Reptile (1)}
Let X = {eggs (1), toothed (1), breathes (1), tail (1)}
and Y = {Reptile (1)}.
Now the association rule X => Y is reported as an interesting association rule
only when there is enough logical evidence about it in the data set, i.e. if the
association rule satisfies all four conditions for equivalence: -
S(X, Y) > S(X, ~Y)
S(X, Y) > S(~X, Y)
S(~X, ~Y) > S(X, ~Y)
S(~X, ~Y) > S(~X, Y)
Table 3.3.2 This can be shown with the help of the table given below: -
Frequency of co-occurrences                                  | Y = {Reptile(1)} | Not Y = {Reptile(0)} | Total
Antecedent X = {eggs(1), toothed(1), breathes(1), tail(1)}   | 3                | 1                    | 4
Not X = {eggs(0), toothed(0), breathes(0), tail(0)}          | 2                | 95                   | 97
Total                                                        | 5                | 96                   | 101
Thus, the table given above shows that: -
S(X ∪ Y) > S(X ∪ ~Y) (3 > 1)
S(X ∪ Y) > S(~X ∪ Y) (3 > 2)
S(~X ∪ ~Y) > S(X ∪ ~Y) (95 > 1)
S(~X ∪ ~Y) > S(~X ∪ Y) (95 > 2)
Thus the coherent rule formed is: -
{eggs (1), toothed (1), breathes (1), tail (1)} => {Reptile (1)}
~{eggs (1), toothed (1), breathes (1), tail (1)} => ~{Reptile (1)}
Thus, the given association rule is considered to be interesting as its interestingness
is based on pure logic i.e. it is logically correct. This coherent rule specifies that an
animal which lays eggs, which has teeth, which breathes through lungs and has a
tail is a reptile but an animal which does not have all these four attributes is not a
reptile. Problem with Frequent Association Rules
Let us consider the comparison between the two approaches: Coherent Rule Mining
and Apriori.
Table 3.3.3 Frequent rules found for class Mammal using the two
approaches
Now by using the approach called coherent rule mining, we get 5 coherent rules,
from which we get 10 association rules which are logically correct. But by using
Apriori approach, we get some unnecessary association rules that are not
interesting.
By using the Apriori approach we get an association rule: -
Domestic (1) => Mammal (1) (whose support = 7.9%, confidence = 61.5%);
this can be shown through the table given below: -
Table 3.3.4 Table for the above given association rule
Frequency of co-occurrences    | Y = {Mammal(1)} | Not Y = {Mammal(0)} | Total
Antecedent X = {Domestic(1)}   | 8               | 5                   | 13
Not X = {Domestic(0)}          | 33              | 55                  | 88
Total                          | 41              | 60                  | 101
The above table shows that the association rule does not satisfy one condition for
equivalence: -
S(X ∪ Y) < S(~X ∪ Y) (i.e. 8 < 33)
So this association rule is not considered interesting, as it does not satisfy the
equivalence condition S(X ∪ Y) > S(~X ∪ Y). We can also observe that 33 out of 41
mammals, i.e. 80.5% of mammals, are not domestic; but this fact is ignored and a
weak association rule like
Domestic (1) => Mammal (1) is reported, which when used in a business
application leads to wrong decisions.
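The failed condition can be checked against the counts in Table 3.3.4; the variable names below are illustrative:

```python
# Checking the Domestic(1) => Mammal(1) rule against the four
# equivalence conditions, using the counts from Table 3.3.4.
s_xy, s_x_noty = 8, 5            # S(X u Y), S(X u ~Y)
s_notx_y, s_notx_noty = 33, 55   # S(~X u Y), S(~X u ~Y)

checks = {
    "S(X,Y) > S(X,~Y)": s_xy > s_x_noty,
    "S(X,Y) > S(~X,Y)": s_xy > s_notx_y,        # fails: 8 < 33
    "S(~X,~Y) > S(X,~Y)": s_notx_noty > s_x_noty,
    "S(~X,~Y) > S(~X,Y)": s_notx_noty > s_notx_y,
}
for cond, ok in checks.items():
    print(cond, ok)
```

Three of the four conditions hold, but the single failure is enough to reject the rule.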
Enhancement to Proposed System
Here we are using the concept of pruning; Pruning is the process of removing
the super sets of the item sets that do not satisfy any one of the four conditions for
equivalence.
For ex: - The association rule Domestic (1) => Mammal (1) is not considered
interesting, as it is logically incorrect. Therefore all its supersets are pruned;
this is called the downward closure property.
This downward closure property is used within the Forecast To Prune Technique,
where we calculate the coherent rule measure H for an itemset. Now by
considering an opening window value w% (i.e. minimum support threshold) we
calculate a moving window value mv, where mv = H (H * w %). If the H value ofa superset is not within the range (mv, H) of the itemset, then that superset is
pruned.
We calculate the coherent rule measure H of an association rule as follows:

H = [min(Cov Y, ~Cov Y) - min(q1, q2) - min(q3, q4)] / min(Cov Y, ~Cov Y)

Here Cov Y = q1 + q3, ~Cov Y = q2 + q4, q1 = S(X U Y), q2 = S(X U ~Y),
q3 = S(~X U Y) and q4 = S(~X U ~Y).
Table 3.3.5 Contingency table for the association rule Milk(1) => Mammal(1)

Frequency of             Consequent        Not                Total
co-occurrences           Y={Mammal(1)}     Y={Mammal(0)}
Antecedent
X={Milk(1)}                   41                 0               41
Not X={Milk(0)}                0                60               60
Total                         41                60              101
We calculate the coherent rule measure H of the above association rule, as per the
above table, as follows:

H = [min(41, 60) - min(41, 0) - min(0, 60)] / min(41, 60) = 1

and mv = H - (H * 5%) = 1 - (1 * 5%) = 1 - 0.05 = 0.95.

Now, we consider a superset of the itemset {Milk(1), Mammal(1)}, which is
{Milk(1), Feathers(0), Mammal(1)}, whose association rule is
Milk(1), Feathers(0) => Mammal(1), whose H value is 1. This is within the range
(mv, H), i.e., (0.95, 1). Therefore it is not pruned.
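The worked example above can be checked with a small Java sketch. The H formula and moving window follow the definitions in this section; the upper bound of the window is treated as inclusive so that a superset whose H equals the itemset's H is kept, as in the example. Names are illustrative.

```java
// Sketch: coherent rule measure H and the moving window mv used by
// the Forecast To Prune technique.
public class PruneMeasure {
    // q1=S(X U Y), q2=S(X U ~Y), q3=S(~X U Y), q4=S(~X U ~Y)
    public static double h(int q1, int q2, int q3, int q4) {
        int covY = q1 + q3;        // Cov Y
        int notCovY = q2 + q4;     // ~Cov Y
        int m = Math.min(covY, notCovY);
        return (m - Math.min(q1, q2) - Math.min(q3, q4)) / (double) m;
    }

    public static double movingWindow(double h, double w) {
        return h - h * w;          // mv = H - (H * w%)
    }

    // A superset survives pruning when its H value lies in (mv, H]
    public static boolean keepSuperset(double hSuperset, double h, double w) {
        double mv = movingWindow(h, w);
        return hSuperset > mv && hSuperset <= h;
    }

    public static void main(String[] args) {
        double h = h(41, 0, 0, 60);                 // Milk(1) => Mammal(1)
        System.out.println(h);                      // 1.0
        System.out.println(movingWindow(h, 0.05));  // 0.95
        System.out.println(keepSuperset(1.0, h, 0.05)); // true: not pruned
    }
}
```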
CHAPTER 4
SYSTEM STUDY
4.1 Feasibility Study
4.1.1 Technical Feasibility
Evaluating the technical feasibility is the trickiest part of a feasibility study.
This is because, at this point in time, not much detailed design of the system has
been done, making it difficult to assess issues like performance and costs (on
account of the kind of technology to be deployed). A number of issues have to be
considered while doing a technical analysis.
i) Understand the different technologies involved in the proposed system:
Before commencing the project, we have to be very clear about the technologies
that are required for the development of the new system.
ii) Find out whether the organization currently possesses the required
technologies:
Is the required technology available with the organization?
If so, is the capacity sufficient?
For instance, will the current printer be able to handle the new reports and forms
required for the new system?
4.1.2 Operational Feasibility
A proposed project is beneficial only if it can be turned into an information
system that will meet the organization's operating requirements. Simply stated,
this test of feasibility asks whether the system will work when it is developed
and installed. Are there major barriers to implementation? Here are questions that
will help test the operational feasibility of a project.
Is there sufficient support for the project from management and from users? If the
current system is well liked and used to the extent that persons will not be able
to see reasons for change, there may be resistance. Are the current business
methods acceptable to the users? If they are not, users may welcome a change that
will bring about a more operational and useful system. Have the users been
involved in the planning and development of the project? Early involvement reduces
the chances of resistance to the system and increases the likelihood of a
successful project. Since the proposed system helps reduce the hardships
encountered in the existing manual system, the new system was considered
operationally feasible.
4.1.3 Economical Feasibility
Economic feasibility attempts 2 weigh the costs of developing and
implementing a new system, against the benefits that would accrue from having the
new system in place. This feasibility study gives the top management the economic
justification for the new system.
A simple economic analysis which gives the actual comparison of costs and
benefits are much more meaningful in this case. In addition, this proves to be a
useful point of reference to compare actual costs as the project progresses. There
could be various types of intangible benefits on account of automation. These could
include increased customer satisfaction, improvement in product quality better
decision making timeliness of information, expediting activities, improved accuracy
of operations, better documentation and record keeping, faster retrieval of
information, better employee morale.
The system is completely based on the Model View Controller (MVC) architecture.
This architecture defines a pattern in which three individual components work
together. The model contains all the business logic, the view handles the user
interface design, and the controller transfers data between the model and the
view.
In our system the view is designed using Java swing components provided
with java programming language. The model and controller are developed using
pure core java classes.
The following block diagram will show the MVC architecture.
CHAPTER 5
REQUIREMENT ANALYSIS
5.1 Functional Requirements
Inputs:
The input to the system will be a dataset. The Zoo dataset has been taken
as the input dataset in this project. The inputs will be as follows.
Select type of animal:
Select one animal type among the given seven animal types.
Processing
The input data i.e. zoo data is processed by the model.
Output
The output will be coherent rules which satisfy the
propositional logic.
Performance requirements
Due to the high scope of the software, the performance
requirements are high. The speed at which the software is
required to operate is nominal.
Error message design
The design of error messages is an important part of the user interface design. As
the user is bound to commit some error or other while using the system, the system
should be designed to be helpful by providing the user with information regarding
the error he/she has committed.
Error detection:
Even though every effort is made to avoid the occurrence of errors, a small number
of errors are still likely to occur; these types of errors can be discovered by
using validations to check the input data.
The system is designed to be a user-friendly one. In other words, the system has
been designed to communicate effectively with the user through its buttons.
5.2 Non Functional Requirements
The major non-functional Requirements of the system are as follows
Usability
The system is designed as a completely automated process; hence there is little or
no user intervention.
Reliability
The system is more reliable because of the qualities inherited from the chosen
platform, Java. Code built using Java is more reliable.
Performance
This system is developed in a high-level language using advanced front-end and
back-end technologies, so it gives a response to the end user on the client system
within a very short time.
Supportability
The system is designed to be cross-platform. It is supported on a wide range of
hardware and on any software platform that has a JVM built into it.
5.3 Hardware Requirements
The hardware used for the development of the project is:
PROCESSOR : Intel Core 2 Duo
RAM : 2 GB
MONITOR : 17" COLOR
HARD DISK : 80 GB
5.4 Software Requirements
The software used for the development of the project is:
OPERATING SYSTEM : ANY OS
USER INTERFACE : AWT AND SWINGS
PROGRAMMING LANGUAGE : JAVA
IDE/WORKBENCH : MY ECLIPSE 6.0
CHAPTER 6
SYSTEM DESIGN
Design is a multi-step process that focuses on data structures, software
architecture, procedural details (algorithms, etc.), and the interfaces between
modules. The design process also translates the requirements into a presentation
of the software that can be assessed for quality before coding begins.
Computer software design changes continuously as new methods, better analysis, and
broader understanding evolve. Software design is at a relatively early stage in
its evolution.
Therefore, software design methodology lacks the depth, flexibility, and
quantitative nature that are normally associated with more classical engineering
disciplines. However, techniques for software design do exist, criteria for design
quality are available, and design notation can be applied.
6.1 Modules:
The system after careful analysis has been identified to be presented with the
following modules:
The Modules involved are
User Interface Module
Mapping Association Rules
Deriving Coherent Rules from Mapped Association Rules
6.1.1 User Interface Module:
A rich user interface was developed in order to select the type of animal from the
drop-down list, with a button for generating coherent rules of that type.
6.1.2 Mapping Association rule
In this module we implement the approach of mapping association rules to
equivalences. A complete mapping between the two is realized in three progressive
steps. Each step depends on the success of a previous step. In the first step, item
sets are mapped to propositions in an implication. Item sets can be either observed
or not observed in an association rule. Similarly, a proposition can either be true or
false in an implication. Analogously, the presence of an item set can be mapped to
a true proposition because this item set can be observed in transactional records.
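The first mapping step can be illustrated with a short Java sketch: an item set observed in a transaction record maps to a true proposition, and an unobserved one to false. The class name and the item names are illustrative, not part of the project code.

```java
import java.util.Set;

// Sketch: step one of the mapping — the presence of an item set in a
// transaction record is read as a true proposition, its absence as false.
public class ItemSetProposition {
    public static boolean toProposition(Set<String> transaction, Set<String> itemSet) {
        return transaction.containsAll(itemSet); // observed => true
    }

    public static void main(String[] args) {
        Set<String> record = Set.of("milk", "hair");        // one transaction record
        System.out.println(toProposition(record, Set.of("milk")));     // true
        System.out.println(toProposition(record, Set.of("feathers"))); // false
    }
}
```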
6.1.3 Deriving Coherent Rules from mapped association rules
The pseudo implications of equivalence can be further refined into a concept
called coherent rules. We highlight that not all pseudo implications of
equivalence can be created using item sets X and Y. Nonetheless, if one pseudo
implication of equivalence can be created, then another pseudo implication of
equivalence also coexists. Two pseudo implications of equivalence always exist as
a pair because they are created from the same item sets and share the same
conditions. Coherent rules meet the necessary and sufficient conditions and have
the truth table values of logical equivalence. By definition, a coherent rule
consists of a pair of pseudo implications of equivalence that have higher support
values compared to the other two pseudo implications of equivalence. Each pseudo
implication of equivalence is an association rule with the additional property
that it can be mapped to a logical equivalence.
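As a minimal sketch, this pair selection reduces to a test on the four quadrant supports of a 2x2 contingency table; the same condition (q1 > q2 && q1 > q3 && q4 > q2 && q4 > q3) appears in the PowerSet listing later in this report. The class name here is illustrative.

```java
// Sketch of the coherent-rule test: both X => Y and ~X => ~Y must have
// higher support than the opposing pair X => ~Y and ~X => Y.
public class CoherentRuleTest {
    // q1=S(X U Y), q2=S(X U ~Y), q3=S(~X U Y), q4=S(~X U ~Y)
    public static boolean isCoherent(int q1, int q2, int q3, int q4) {
        return q1 > q2 && q1 > q3 && q4 > q2 && q4 > q3;
    }

    public static void main(String[] args) {
        System.out.println(isCoherent(41, 0, 0, 60)); // true: Milk(1) <=> Mammal(1)
        System.out.println(isCoherent(8, 5, 33, 55)); // false: Domestic(1) fails
    }
}
```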
6.2 Module Diagrams:
6.2.1 UML Diagrams
Use Case Diagram
Sequence Diagram:
Activity Diagram:
6.3 Algorithm used: Search Algorithm
We propose to search for coherent rules by exploiting the antimonotone property
found on the condition S(X, Y) > S(~X, Y), targeting a preselected consequent
item set Y.
6.3.1 Distinct Features of ChSearch
We list some features of ChSearch compared to Apriori. Unlike Apriori, ChSearch:
Does not require a preset minimum support threshold. ChSearch does not
require a preset minimum support threshold to find association rules.
Coherent rules are found based on mapping to logical equivalences. From the
coherent rules, we can decouple the pair of two pseudo implications of
equivalence. The latter can be used as association rules with the property
that each rule can be further mapped to a logical equivalence.
Does not need to generate frequent item sets. ChSearch does not need to
generate frequent item sets, nor does it need to generate the association
rules within each item set. Instead, ChSearch finds coherent rules directly.
Coherent rules are found within the small number of candidate coherent rules
allowed through its constraints.
Identifies negative association rules. ChSearch, by default, also identifies
negative association rules. Given a set of transaction records that does not
indicate item absence, Apriori cannot identify negative association rules.
ChSearch finds the negative pseudo implications of equivalence and uses them to
complement both the positive and negative rules found.
6.3.2 Quality of Logic-Based Association Rules
Coherent rules are defined based on logic. This improves the quality of the
association rules discovered because there are no missing association rules due to
threshold setting. A user can discover all association rules that are logically
correct without having to know the domain knowledge. This is fundamental to
various application domains. For example, one can discover the relations in a
retail business without having to study the possible relations among items. Any
association rule that is not captured by coherent rules can be denied its
importance. These rules are either in contradiction with others (among the
positive and negative association rules) or less stringent compared to the
definition of logical equivalences.
As an example, consider that a non-logic-based association rule is found within
100 transaction records between item i1 and item i2 with confidence at 75 percent
and support at 30 percent. This association rule is not important if the absence
of the same item i1 (i.e., ~i1) is found associated with item i2 with a higher
confidence at 85 percent and a higher support at 51 percent. Without further
analysis, the first discovery misleads decision makers to conclude that item i1 is
associated with item i2, whereas the relation involving item ~i1 is, in fact,
stronger. Coherent rules avoid this problem altogether based on logic.
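The example can be put in numbers with a small Java sketch. The raw counts below are chosen to reproduce the stated percentages over 100 records and are otherwise hypothetical.

```java
// Sketch: the misleading-rule example. Over 100 records, i1 => i2
// (confidence 75%, support 30%) looks interesting until compared with
// ~i1 => i2 (confidence 85%, support 51%).
public class RuleComparison {
    public static double confidence(int coOccurrences, int antecedentCount) {
        return coOccurrences / (double) antecedentCount;
    }

    public static void main(String[] args) {
        // Hypothetical counts matching the stated percentages (100 records)
        int i1AndI2 = 30, i1Count = 40;        // support 30%, confidence 0.75
        int notI1AndI2 = 51, notI1Count = 60;  // support 51%, confidence 0.85
        double c1 = confidence(i1AndI2, i1Count);
        double c2 = confidence(notI1AndI2, notI1Count);
        System.out.println(c2 > c1); // true: the rule with ~i1 is stronger
    }
}
```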
CHAPTER 6
IMPLEMENTATION
Implementation is the most crucial stage in achieving a successful system and
giving the users confidence that the new system is workable and effective. Here it
means the implementation of a modified application to replace an existing one.
This type of conversion is relatively easy to handle, provided there are no major
changes in the system.
Each program was tested individually at the time of development using test data,
and it was verified that the programs link together in the way specified in the
program specifications; the computer system and its environment were tested to the
satisfaction of the user. The system that has been developed is accepted and
proved to be satisfactory for the user, and so the system is going to be
implemented very soon. A simple operating procedure is included so that the user
can understand the different functions clearly and quickly.
Initially, as a first step, the executable form of the application is created and
loaded on the common server machine, which is accessible to all users, and the
server is connected to a network. The final stage is to document the entire
system, covering the components and the operating procedures of the system.
6.1 SCREEN SHOTS
Fig.1.Animal Table
Screen description: The above figure shows the table used in this project, which
is used for retrieval and comparison of attributes.
Fig.2. Main Window
Screen description: The above figure shows the main window of this project,
through which we can select the animal type, i.e., mammal, reptile, etc.
Fig.3. Selecting Mammal Type
Screen description: The above figure shows the Mammal type selected from the
dropdown list.
Fig.4. Coherent rules generated for Mammal type
Screen description: The above figure shows the output (coherent rules) generated
for the Mammal type.
6.2 SAMPLE CODE
Orderedpowerset.java
package coherent;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.StringTokenizer;
public class OrderedPowerSet {
private ArrayList list=new ArrayList();
Iterator it1=null;
ArrayList result=new ArrayList();
public ArrayList getSet(String[] src)
{
result.add(" ");
int source[]=new int[src.length];
// reconstructed: the loop body was truncated in the original listing
for(int var=0;var<src.length;var++)
{
source[var]=var+1;
list.add(String.valueOf(source[var]));
result.add(src[var]);
}
Iterator it1=list.iterator();
while(it1.hasNext())
{
ArrayList list1=getSetList(list,source,src);
list=new ArrayList();
list=list1;
list1=new ArrayList();
it1.next();
}
return result;
}
public ArrayList getSetList(ArrayList list,int[]
source,String[] src)
{
ArrayList res=new ArrayList();
it1=list.iterator();
while(it1.hasNext())
{
String s=(String)it1.next();
String ss=s;
int x=Integer.parseInt(getLastToken(s,","));
for(int i=x;i<source.length;i++)
{
String s1=ss+","+source[i];
res.add(s1);
addToResult(src,s1);
}
}
return res;
}
public void addToResult(String src[],String str)
{
StringTokenizer st=new StringTokenizer(str);
StringBuffer sb=new StringBuffer();
while(st.hasMoreTokens())
{
int loc=Integer.parseInt(st.nextToken(","));
if(st.hasMoreTokens())
{
sb=sb.append(src[loc-1]+",");
}
else
{
sb=sb.append(src[loc-1]);
}
}
String r=new String(sb);
result.add(r);
}
private String getLastToken(String strValue,String token )
{
String strlttoken = null;
String []strArray = strValue.split(token);
strlttoken = strArray[strArray.length-1];
return strlttoken;
}
}
Powerset.java
package coherent;
import java.sql.SQLException;
import java.util.*;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.io.FileWriter;
import java.io.IOException;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
public class PowerSet {
ArrayList list;
int values[]=new int[4];
public void getSet(String args[],String filename) throws IOException {
list=new ArrayList();
OrderedPowerSet ops=new OrderedPowerSet();
list=ops.getSet(args);
FileWriter fw=new FileWriter(filename);
Iterator itr = list.iterator();
StringBuffer s=new StringBuffer();
int powercount=0;
while(itr.hasNext())
{
powercount+=1;
String item=itr.next().toString().replace("{","[").replace("}","]");
s=s.append("{"+item+"},");
}
String result=new String(s);
fw.write(result);
fw.flush();
fw.close();
System.out.println(powercount);
System.out.println("With "+filename+" ,PowerSet is generated
in current working directory");
s=new StringBuffer();
setList(list);
result=null;
}
int len;
Connection con=null;
Statement stmt=null;
CompareTable ct = new CompareTable();
public void doCalculation(ArrayList list1,ArrayList list2,int
selectedatr) throws SQLException{
ct.connectionEstablish();
Iterator it2=list2.iterator();
String qryatr2=new String();
while(it2.hasNext()){
String s=it2.next().toString();
s=s.replace('[', ' ');
s=s.replace(']', ' ');
s=s.trim();
if(!s.equals("")){
qryatr2=new String(s);
}
}
DBConnection db=new DBConnection();
int q1=0,q2=0,q3=0,q4=0;
try {
con = db.getConnection();
stmt=con.createStatement();
} catch (ClassNotFoundException ex) {
Logger.getLogger(PowerSet.class.getName()).log(Level.SEVERE, null, ex);
}
int totalcount=0;
Iterator it1=list1.iterator();
int lines=0;
while(it1.hasNext())
{
String s=it1.next().toString();
s=s.replace('[', ' ');
s=s.replace(']',' ');
s=s.trim();
if(!s.equals(""))
{
StringTokenizer st=new StringTokenizer(s,",");
int i=0;
len=0;
while(st.hasMoreTokens()){
st.nextToken();
len=len+1;
}
StringTokenizer st1=new StringTokenizer(s,",");
String qryatrs1[]=new String[len];
boolean legbo=false;
while(st1.hasMoreTokens())
{
qryatrs1[i]=st1.nextToken().trim();
if(qryatrs1[i].equals("LEGS"))
{
legbo=true;
}
i=i+1;
}
int legatr[]={0,2,4,5,6,8};
if(legbo)
{
for(int j=0;j<legatr.length;j++)
{
// reconstructed from the else branch below; the original lines were truncated
// in the listing, and the query-builder name is assumed
String qry1=ct.prepareQueryForLeg(qryatrs1, qryatr2, 1, selectedatr, legatr[j], true, true);
String qry2=ct.prepareQueryForLeg(qryatrs1, qryatr2, 1, selectedatr, legatr[j], true, false);
String qry3=ct.prepareQueryForLeg(qryatrs1, qryatr2, 1, selectedatr, legatr[j], false, true);
String qry4=ct.prepareQueryForLeg(qryatrs1, qryatr2, 1, selectedatr, legatr[j], false, false);
String qry=qry1+" UNION ALL "+qry2+" UNION ALL "+qry3+" UNION ALL "+qry4;
ResultSet rs=stmt.executeQuery(qry);
rs.next();
q1=rs.getInt(1);
rs.next();
q2=rs.getInt(1);
rs.next();
q3=rs.getInt(1);
rs.next();
q4=rs.getInt(1);
if(((q1>q2)&&(q1>q3))&&((q4>q2)&&(q4>q3)))
{
totalcount+=1;
String rel1=ct.displayOutput1ForLeg(qryatrs1,qryatr2,legatr[j]);
System.out.println(q1+" "+q2+" "+q3+" "+q4);
System.out.println(rel1);
values[0]=q1;
values[1]=q2;
values[2]=q3;
values[3]=q4;
this.setValues(values);
this.setResult(rel1);
}
}
}
else
{
String qry1=ct.prepareQuery(qryatrs1, qryatr2, 1, selectedatr,true,true);
String qry2=ct.prepareQuery(qryatrs1, qryatr2, 1, selectedatr, true, false);
String qry3=ct.prepareQuery(qryatrs1, qryatr2, 1, selectedatr, false, true);
String qry4=ct.prepareQuery(qryatrs1, qryatr2, 1, selectedatr, false, false);
String qry=qry1+" UNION ALL "+qry2+" UNION ALL "+qry3+" UNION ALL "+qry4;
ResultSet rs=stmt.executeQuery(qry);
rs.next();
q1=rs.getInt(1);
rs.next();
q2=rs.getInt(1);
rs.next();
q3=rs.getInt(1);
rs.next();
q4=rs.getInt(1);
if(((q1>q2)&&(q1>q3))&&((q4>q2)&&(q4>q3)))
{
String re1=ct.displayOutput1(qryatrs1,qryatr2);
System.out.println(q1+" "+q2+" "+q3+" "+q4);
System.out.println(re1);
values[0]=q1;
values[1]=q2;
values[2]=q3;
values[3]=q4;
this.setValues(values);
this.setResult(re1);
this.setX(qryatrs1);
this.setY(qryatr2);
totalcount+=1;
}
}
}
}
System.out.println("Total Count = "+totalcount);
ct.closeConnection();
}
public void setList(ArrayList list)
{
this.list=list;
}
public ArrayList getList()
{
return list;
}
public void setValues(int[] values)
{
this.values=values;
}
public int[] getValues()
{
return values;
}
private String[] x;
private String y;
private String result;
public String getResult() {
return result;
}
public void setResult(String result) {
this.result = result;
}
public String[] getX() {
return x;
}
public void setX(String[] x) {
this.x = x;
}
public String getY() {
return y;
}
public void setY(String y) {
this.y = y;
}
}
CompareTable.java
package coherent;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
public class CompareTable {
Connection con=null;
Statement stmt=null;
ResultSet rs=null;
CompareTable()
{
}
public void connectionEstablish() throws SQLException
{
try {
con = DBConnection.getConnection();
} catch (ClassNotFoundException ex) {
}
}
public void closeConnection() throws SQLException
{
con.close();
}
public String prepareQuery(String atr1[],String atr2,int i,int j,boolean
b1,boolean b2) throws SQLException
{
StringBuffer sb=new StringBuffer();
if(!b1)
{
sb=sb.append("NOT(");
}
else
{
sb=sb.append("(");
}
for(int k=0;k<atr1.length;k++)
{
if(k>0)
{
sb=sb.append(" and ");
}
sb=sb.append(atr1[k]+"="+i);
}
sb=sb.append(")");
if(b2)
{
sb=sb.append(" and (TYPE="+j+")");
}
else
{
sb=sb.append(" and NOT(TYPE="+j+")");
}
String str=new String(sb);
String qry="select count(*) from animal where "+str;
return qry;
}
public String displayOutput1(String atr1[],String atr2)
{
StringBuffer s=new StringBuffer();
s=s.append("{");
// reconstructed: the loop body was truncated in the original listing
for(int i=0;i<atr1.length;i++)
{
if(i>0)
{
s=s.append(",");
}
s=s.append(atr1[i]+"(1)");
}
s=s.append(" } ");
s=s.append("==> { "+atr2+"(1) }\n");
s=s.append("Not{ ");
// reconstructed: the remainder of displayOutput1 and the signature of the
// leg-variant query builder were truncated in the original listing
for(int i=0;i<atr1.length;i++)
{
if(i>0)
{
s=s.append(",");
}
s=s.append(atr1[i]+"(0)");
}
s=s.append(" } ==> Not{ "+atr2+"(1) }\n");
return new String(s);
}
public String prepareQueryForLeg(String atr1[],String atr2,int i,int j,int legatr,boolean b1,boolean b2) throws SQLException
{
StringBuffer sb=new StringBuffer();
if(!b1)
{
sb=sb.append("NOT(");
}
else
{
sb=sb.append("(");
}
for(int k=0;k<atr1.length;k++)
{
if(k>0)
{
sb=sb.append(" and ");
}
if(atr1[k].equals("LEGS"))
{
sb=sb.append(atr1[k]+"="+legatr);
}
else
{
sb=sb.append(atr1[k]+"="+i);
}
}
sb=sb.append(")");
if(b2)
{
sb=sb.append(" and (TYPE="+j+")");
}
else
{
sb=sb.append(" and NOT(TYPE="+j+")");
}
String str=new String(sb);
String qry="select count(*) from animal where "+str;
return qry;
}
}
CHAPTER 7
SCOPE FOR FUTURE DEVELOPMENT
Every application has its own merits and demerits. The project has covered almost
all the requirements. Further requirements and improvements can easily be
accommodated, since the coding is mainly structured or modular in nature. Changing
the existing modules or adding new modules can bring improvements.
Further enhancements:
Further enhancements can be made to the application so that it functions in a more
attractive and useful manner than the present one. We applied the logic to the zoo
dataset. We can apply the same logic to any transaction dataset with slight
modifications as well.
CHAPTER 8
CONCLUSION
We used mapping to logical equivalences according to propositional logic to
discover all interesting association rules without loss. These association rules
include item sets that are frequently and infrequently observed in a set of
transaction records. In addition to a complete set of rules being considered, these
association rules can also be reasoned as logical implications because they inherit
propositional logic properties. Having considered infrequent items, as well as being
implicational, these newly discovered association rules are distinguished from
typical association rules. These new association rules reduce the risks associated
with using an incomplete set of association rules for decision making, as follows:
Our new set of association rules avoids reporting that item A is associated
with item B if there is a stronger association between item A and the absence
of item B. Using prior association rules that do not consider this situation
could lead a user to erroneous conclusions about the relationships among
items in a data set. Again, identifying the strongest rule among the same
items will promote information correctness and appropriate decision making.
The risks associated with incomplete rules are reduced fundamentally
because our association rules are created without the user having to identify
a minimum support threshold. Among the large number of association rules,
only those that can be mapped to logical equivalences according to
propositional logic are considered interesting and reported.
CHAPTER 9
BIBLIOGRAPHY
Books:
1. Java 2: The Complete Reference, Herbert Schildt
2. Software Engineering: A Practitioner's Approach, 6th Edition, Tata
McGraw-Hill
3. Software Testing: Principles and Practices, Srinivasan Desikan, Gopalaswami
Ramesh, Pearson Education, India.
4. The Unified Modeling Language User Guide, 2nd Edition, Grady Booch,
James Rumbaugh, Ivar Jacobson, for UML concepts and models.
References:
Logic-Based Pattern Discovery Alex Tze Hiang Sim, Maria Indrawan, Samar
Zutshi and Bala Srinivasan.
R. Agrawal, T. Imielinski, and A. Swami, Mining Association Rules between
Sets of Items in Large Databases,
Sim, A.T.H, Indrawan, M., Srinivasan, B., Mining Infrequent and Interesting
Rules from Transaction Records
S. Brin, R. Motwani, J.D. Ullman, and S. Tsur, Dynamic Itemset Counting and
Implication Rules for Market Basket Data,
X. Wu, C.Zhang & S.Zhang Mining Both Positive and Negative Association
Rules
CHAPTER 10
APPENDIX
10.1 List of Symbols
S.NO  SYMBOL NAME         SYMBOL DESCRIPTION
1     Class               Classes represent a collection of similar entities
                          grouped together.
2     Association         Association represents a static relationship between
                          classes.
3     Aggregation         Aggregation is a form of association. It aggregates
                          several classes into a single class.
4     Actor               Actors are the users of the system and other external
                          entities that interact with the system.
5     Use Case            A use case is an interaction between the system and the
                          external environment.
6     Relation (Uses)     It is used for additional process communication.
7     Communication       It is the communication between various use cases.
8     State               It represents the state of a process. Each state goes
                          through various flows.
9     Initial State       It represents the initial state of the object.
10    Final State         It represents the final state of the object.
11    Control Flow        It represents the various control flows between the
                          states.
12    Decision Box        It represents the decision-making process from a
                          constraint.
13    Node                Deployment diagrams use nodes to represent physical
                          modules; a node is a collection of components.
14    Data Process/State  A circle in a DFD represents a state or process that has
                          been triggered by some event or action.
15    External Entity     It represents any external entity, such as a keyboard or
                          sensors, used in the system.
16    Transition          It represents any communication that occurs between the
                          processes.
17    Object Lifeline     Object lifelines represent the vertical dimension along
                          which objects communicate.
18    Message             It represents the messages exchanged.
10.2 List of Abbreviations
S.NO ABBREVIATION DESCRIPTION
1 DFD Data Flow Diagram
2 API Application Programming Interface
3 UML Unified Modelling Language
4 GUI Graphical User Interface
5 IDE Integrated Development Environment
6 LBPD Logic based Pattern Discovery
7 AR Association Rule
8 PD Pattern Discovery