Data Mining Techniques: Cluster Analysis, Induction, Neural Networks, OLAP, Data Visualization
Data Mining Techniques
• Cluster Analysis
• Induction
• Neural Networks
• OLAP
• Data Visualization
Association Rule
• An association rule is a rule that implies certain association relationships among a set of objects in a database (such as “occur together” or “one implies the other”).
• Given a set of transactions, where each transaction is a set of literals (called items), an association rule is an expression of the form X ⇒ Y, where X and Y are disjoint sets of items.
• The intuitive meaning of such a rule is that transactions in the database that contain X tend to also contain Y.
Support
• The support of an itemset S is the percentage of transactions in the transaction database T that contain S.
• If U is the set of all transactions that contain all items in S, then support(S) = (|U| / |T|) × 100%, where |U| and |T| are the numbers of elements in U and T, respectively.
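As a minimal sketch, assuming each transaction is stored as a Python frozenset of item names (a representation chosen here for illustration, not one the slides prescribe), the support formula translates directly into code:

```python
from typing import FrozenSet, Iterable, List


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Return support(S) = (|U| / |T|) * 100%, where U is the set of
    transactions containing every item of S and T is all transactions."""
    if not transactions:
        raise ValueError("support is undefined for an empty transaction set")
    s = frozenset(itemset)
    u = [t for t in transactions if s <= t]  # transactions containing all of S
    return 100.0 * len(u) / len(transactions)
```

For example, with the four transactions {bread, butter}, {bread}, {cheese, coke} and {bread, butter, salt}, `support({"bread", "butter"}, T)` counts two matching transactions out of four and returns 50.0.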
Confidence
• The confidence of a candidate rule X ⇒ Y is calculated as support(X ∪ Y) / support(X).
• The confidence of rule X ⇒ Y represents the percentage of transactions containing all items in X that also contain all items in Y.
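Because support(X ∪ Y) and support(X) share the same denominator |T|, it cancels, and confidence reduces to a ratio of two transaction counts. A small sketch of that reasoning, again assuming transactions are frozensets of item names:

```python
from typing import FrozenSet, Iterable, List


def confidence(x: Iterable[str], y: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """Confidence of X => Y as a percentage.

    support(X u Y) / support(X) = (n_xy / |T|) / (n_x / |T|) = n_xy / n_x,
    so only the two counts are needed."""
    x_set = frozenset(x)
    xy_set = x_set | frozenset(y)
    n_x = sum(1 for t in transactions if x_set <= t)    # contain all of X
    if n_x == 0:
        raise ValueError("confidence is undefined when no transaction contains X")
    n_xy = sum(1 for t in transactions if xy_set <= t)  # contain all of X and Y
    return 100.0 * n_xy / n_x
```

Note that confidence is not symmetric: in a dataset where every butter buyer also bought bread but not every bread buyer bought butter, butter ⇒ bread scores higher than bread ⇒ butter.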
Example: Association Rule
• In a store we might have I = {cheese, ham, bread, butter, salt, coke}.
• A transaction could look like t = {bread, butter} for a customer who bought bread and butter.
• An association rule could be bread ⇒ butter with support 60% and confidence 80%: 60% of all transactions contain both bread and butter, and 80% of the customers who bought bread also bought butter.
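To check the arithmetic, here is a hypothetical set of 20 transactions (invented for illustration, not taken from any real store) built so that bread ⇒ butter has exactly the support and confidence quoted above: 12 transactions contain both items and 15 contain bread.

```python
from typing import FrozenSet, List

# Hypothetical data, constructed to reproduce the example's numbers.
transactions: List[FrozenSet[str]] = (
    [frozenset({"bread", "butter"})] * 12   # bought bread and butter
    + [frozenset({"bread", "cheese"})] * 3  # bought bread without butter
    + [frozenset({"coke", "salt"})] * 5     # bought no bread
)


def count(itemset: FrozenSet[str]) -> int:
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


n = len(transactions)
both = count(frozenset({"bread", "butter"}))
rule_support = 100.0 * both / n                                  # 12 / 20
rule_confidence = 100.0 * both / count(frozenset({"bread"}))    # 12 / 15
print(f"bread => butter: support {rule_support:.0f}%, confidence {rule_confidence:.0f}%")
```

Running it prints `bread => butter: support 60%, confidence 80%`.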
Apriori Algorithm
• Find all combinations of items that have transaction support above minimum support. Call those combinations frequent itemsets.
• Use the frequent itemsets to generate the desired rules.
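The two phases can be sketched end to end as follows. This is a simplified illustration, not the exact procedure on the next slide: candidates are formed by taking the union of any two frequent (k-1)-itemsets of total size k, instead of the ordered join, and then pruned with the same subset check. Supports are percentages, as defined above; the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]


def frequent_itemsets(transactions: List[Itemset], min_support: float) -> Dict[Itemset, float]:
    """Phase 1: every itemset whose support (in %) is >= min_support."""
    if not transactions:
        return {}
    n = len(transactions)

    def support(s: Itemset) -> float:
        return 100.0 * sum(1 for t in transactions if s <= t) / n

    level = {frozenset({i}) for t in transactions for i in t}  # candidate 1-itemsets
    result: Dict[Itemset, float] = {}
    k = 1
    while level:
        frequent = {c: s for c in level if (s := support(c)) >= min_support}
        result.update(frequent)
        k += 1
        prev = set(frequent)
        # Size-k unions whose (k-1)-subsets are all frequent (Apriori property).
        level = {
            a | b for a in prev for b in prev
            if len(a | b) == k
            and all(frozenset(sub) in prev for sub in combinations(a | b, k - 1))
        }
    return result


def association_rules(freq: Dict[Itemset, float],
                      min_conf: float) -> List[Tuple[Itemset, Itemset, float, float]]:
    """Phase 2: rules X => Y (support %, confidence %) with confidence >= min_conf."""
    rules = []
    for itemset, sup in freq.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                x = frozenset(lhs)
                conf = 100.0 * sup / freq[x]  # subsets of frequent sets are frequent
                if conf >= min_conf:
                    rules.append((x, itemset - x, sup, conf))
    return rules
```

The lookup `freq[x]` is safe because every subset of a frequent itemset is itself frequent and was therefore found in an earlier pass. The walrus operator requires Python 3.8 or later.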
Apriori Algorithm (cont’d)
Pass 1
1. Generate the candidate itemsets in C1.
2. Save the frequent itemsets in L1.
Pass k
1. Generate the candidate itemsets in Ck from the frequent itemsets in Lk-1:
   a. Join Lk-1 with Lk-1, as follows:
      insert into Ck
      select p.item1, p.item2, . . . , p.itemk-1, q.itemk-1
      from Lk-1 p, Lk-1 q
      where p.item1 = q.item1, . . . , p.itemk-2 = q.itemk-2, p.itemk-1 < q.itemk-1
   b. Generate all (k-1)-subsets of each candidate itemset in Ck.
   c. Prune all candidate itemsets from Ck where some (k-1)-subset of the candidate itemset is not in the frequent itemset Lk-1.
2. Scan the transaction database to determine the support for each candidate itemset in Ck.
3. Save the frequent itemsets in Lk.
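Steps 1a to 1c (join, then prune) can be sketched in Python, assuming each itemset is stored as a sorted tuple so that `p[:-1]` corresponds to p.item1 … p.itemk-2 and `p[-1]` to p.itemk-1. The function name `apriori_gen` is used here for illustration.

```python
from itertools import combinations
from typing import Set, Tuple

Itemset = Tuple[str, ...]  # items kept in sorted order: item1 < item2 < ...


def apriori_gen(l_prev: Set[Itemset]) -> Set[Itemset]:
    """Build the candidate set Ck from the frequent itemsets Lk-1."""
    candidates: Set[Itemset] = set()
    # Join: p and q agree on their first k-2 items and p's last item < q's last.
    for p in l_prev:
        for q in l_prev:
            if p[:-1] == q[:-1] and p[-1] < q[-1]:
                candidates.add(p + (q[-1],))
    # Prune: drop a candidate if any of its (k-1)-subsets is not in Lk-1.
    # combinations() of a sorted tuple yields sorted tuples, matching Lk-1.
    return {
        c for c in candidates
        if all(sub in l_prev for sub in combinations(c, len(c) - 1))
    }
```

For instance, with L3 = {abc, abd, acd, ace, bcd}, the join produces abcd and acde; acde is then pruned because its subset ade is not in L3, leaving C4 = {abcd}. Step 2 (the database scan) is what decides which members of Ck survive into Lk.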
Smart Web Search Agents
• Data Search Engines >> Information Search Agents
- Traditional searching on the Web is done using one of the following three:
- Directories (Yahoo, Lycos, etc.)
- Search Engines (AltaVista, NorthernLight, etc.)
- Metasearch Engines (MetaCrawler, SavvySearch, AskJeeves, etc.)
All of these involve keyword searches. Drawbacks: the results are not easily personalized, and there are too many of them (although many engines give relevancy factors).
- local cache databases (containing frequently asked queries/results; possibly updated periodically - nightly!)
- local cache information base (containing mined information and discovered knowledge for efficient personal use)
- domain-based agents (e.g. Job Search; Sports-NBA Stats, Bibliography-Digital Libraries)
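The local cache database idea can be illustrated with a small sketch. Everything here is hypothetical: `QueryCache`, the `search` callback standing in for a remote search engine, and the one-day freshness window (chosen to echo "nightly") are assumptions for illustration only.

```python
import time
from typing import Callable, Dict, List, Tuple

ONE_DAY = 24 * 60 * 60  # seconds; assumed refresh window ("nightly")


class QueryCache:
    """Hypothetical local cache of frequently asked queries and their results.

    `search` stands in for a call to a remote search engine. A stale entry is
    refreshed lazily on its next lookup; refresh_all() is the batch variant."""

    def __init__(self, search: Callable[[str], List[str]],
                 max_age: float = ONE_DAY,
                 clock: Callable[[], float] = time.time) -> None:
        self._search = search
        self._max_age = max_age
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[str]]] = {}
        self.hits: Dict[str, int] = {}  # how often each query has been asked

    def lookup(self, query: str) -> List[str]:
        key = " ".join(query.lower().split())  # normalize case and whitespace
        self.hits[key] = self.hits.get(key, 0) + 1
        now = self._clock()
        entry = self._entries.get(key)
        if entry is None or now - entry[0] > self._max_age:
            entry = (now, self._search(key))  # miss or stale: ask the engine
            self._entries[key] = entry
        return entry[1]

    def refresh_all(self) -> None:
        """Re-run every cached query, e.g. from a nightly scheduled job."""
        now = self._clock()
        for key in list(self._entries):
            self._entries[key] = (now, self._search(key))
```

Injecting `clock` keeps the class testable without waiting a day; the `hits` counter is one simple way to tell which queries are "frequently asked" and worth keeping.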
Intelligent Tools for E-Business
• Computational Intelligence, Neural Networks, Fuzzy Logic, Genetic Algorithms, Hybrid Systems
• Learning Algorithms, Heuristic Searching
• Data Analysis and Modeling, Data Fusion and Mining, Knowledge Discovery
• Prediction & Time Series Analysis
• Information Retrieval, Intelligent User Interface
• Intelligent Agents, Distributed IA and Multi-Agents, Cooperative Knowledge-based Systems
Enhancing E-Business Process Through Data Mining
• Quality of discovered knowledge
– Having right data
– Having appropriate data mining tools!!!
[Figure: Data Mining (Knowledge Discovery); Data Warehouse (three boxes); Failure Patterns; Success Patterns]
• Traditional Data Mining Tools
– Simple query and reporting
– Visualization driven data exploration tools, OLAP
– Discovery process is user driven
Intelligent Data Mining Tools
• Automate the process of discovering patterns/knowledge in data
• Require hypothesis, exploration
• Derive business knowledge (patterns) from data
• Combine business knowledge of users with results of discovery algorithms
[Figure: Data Warehouse (three boxes); Failure Patterns; Success Patterns]
Intelligent Information Agents
• The Data Mining Problem:
– Clustering/Classification
– Association
– Sequencing
• Viewed as an Optimization Problem
• Tools: Genetic Algorithms
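To make "data mining as an optimization problem" concrete, here is a sketch of a genetic algorithm searching for an association rule X ⇒ {target}. The encoding and objective are assumptions chosen for illustration: each chromosome is a bit string selecting which items form the antecedent X, and fitness is the rule's confidence, set to 0 when the rule occurs in fewer than `min_count` transactions. Tournament selection, one-point crossover, bit-flip mutation and elitism are standard GA operators, but the specific parameters are arbitrary.

```python
import random
from typing import FrozenSet, List, Sequence, Tuple

Chromosome = Tuple[int, ...]  # gene i == 1 means items[i] is in the antecedent X


def fitness(chrom: Chromosome, items: Sequence[str], target: str,
            transactions: List[FrozenSet[str]], min_count: int) -> float:
    """Assumed objective: confidence of X => {target} as a fraction, or 0.0
    when X is empty or X u {target} occurs in fewer than min_count transactions."""
    x = frozenset(item for item, gene in zip(items, chrom) if gene)
    if not x:
        return 0.0
    n_x = sum(1 for t in transactions if x <= t)
    n_xy = sum(1 for t in transactions if x <= t and target in t)
    if n_xy < max(1, min_count):
        return 0.0
    return n_xy / n_x  # n_x >= n_xy >= 1, so no division by zero


def evolve(items: Sequence[str], target: str, transactions: List[FrozenSet[str]],
           min_count: int = 2, pop_size: int = 20, generations: int = 30,
           mutation_rate: float = 0.1, seed: int = 0) -> Tuple[Chromosome, float]:
    """Search for a high-confidence antecedent X with a simple genetic algorithm."""
    rng = random.Random(seed)
    n = len(items)

    def score(c: Chromosome) -> float:
        return fitness(c, items, target, transactions, min_count)

    pop: List[Chromosome] = [tuple(rng.randint(0, 1) for _ in range(n))
                             for _ in range(pop_size)]

    def pick() -> Chromosome:
        """Tournament selection: the fitter of two random individuals."""
        a, b = rng.choice(pop), rng.choice(pop)
        return a if score(a) >= score(b) else b

    for _ in range(generations):
        children: List[Chromosome] = [max(pop, key=score)]  # elitism
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n) if n > 1 else 0  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each gene with probability mutation_rate.
            child = tuple(1 - g if rng.random() < mutation_rate else g for g in child)
            children.append(child)
        pop = children
    best = max(pop, key=score)
    return best, score(best)
```

Elitism guarantees that the best rule found so far is never lost between generations; the fixed `seed` makes runs reproducible.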
Fuzzy Rule Discovery
• Rule discovery: the discovery of associations between business events, i.e., which items are purchased together
• To support flexible querying and intelligent searching, fuzzy queries are used to uncover potentially valuable knowledge
• A fuzzy query uses fuzzy terms such as tall, small, and near to define linguistic concepts and formulate a query
• The automated search for fuzzy rules is carried out by discovering fuzzy clusters or segments in the data
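A fuzzy query such as "tall AND near" can be sketched with membership functions that map a crisp value to a degree between 0 and 1. The breakpoints below (170 to 190 cm for tall, 1 to 10 km for near) and the `Person` record are hypothetical, and taking AND as the minimum of the degrees is one common choice among several.

```python
from typing import List, NamedTuple, Tuple


class Person(NamedTuple):  # hypothetical record type for the example
    name: str
    height_cm: float
    distance_km: float


def ramp_up(x: float, low: float, high: float) -> float:
    """Membership that rises linearly from 0 at `low` to 1 at `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


# Hypothetical linguistic terms; the breakpoints are illustrative, not standard.
def tall(height_cm: float) -> float:
    return ramp_up(height_cm, 170.0, 190.0)


def near(distance_km: float) -> float:
    return 1.0 - ramp_up(distance_km, 1.0, 10.0)


def tall_and_near(people: List[Person], threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Fuzzy query 'tall AND near' with AND taken as min.
    Returns (name, degree) pairs with degree >= threshold, best match first."""
    scored = [(p.name, min(tall(p.height_cm), near(p.distance_km))) for p in people]
    return sorted((s for s in scored if s[1] >= threshold), key=lambda s: s[1], reverse=True)
```

For example, a 185 cm person 2 km away is tall to degree 0.75 and near to degree about 0.89, so matches the query to degree 0.75; a 165 cm person matches to degree 0 however close they are. Unlike a crisp `height > 180` filter, the answer is a ranking rather than a yes/no cut.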