Data Mining: Concepts & Techniques. Motivation: Necessity is the Mother of Invention Data explosion...
Data Mining: Concepts & Techniques
Motivation: Necessity is the Mother of Invention
• Data explosion problem
– Automated data collection tools and mature database technology lead to tremendous amounts of data stored in databases, data warehouses and other information repositories
• We are drowning in data, but starving for knowledge!
• Solution: data warehousing and data mining
– Data warehousing and on-line analytical processing
– Extraction of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases
Evolution of Database Technology
What Is Data Mining?
• Data mining (knowledge discovery in databases):
– Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases
• Alternative names and their "inside stories":
– Data mining: a misnomer?
– Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
• What is not data mining?
– (Deductive) query processing
– Expert systems or small ML/statistical programs
Data Mining: A KDD Process
Data mining: the core of the knowledge discovery process
Steps of a KDD Process
• Learning the application domain:
– relevant prior knowledge and goals of the application
• Creating a target data set: data selection
• Data cleaning and preprocessing (may take 60% of the effort!)
• Data reduction and transformation:
– find useful features, dimensionality/variable reduction, invariant representation
• Choosing the functions of data mining:
– summarization, classification, regression, association, clustering
• Choosing the mining algorithm(s)
• Data mining: search for patterns of interest
• Pattern evaluation and knowledge presentation:
– visualization, transformation, removing redundant patterns, etc.
• Use of the discovered knowledge
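The steps above form a chain in which each stage consumes the previous stage's output. A minimal Python sketch of that chaining, assuming toy in-memory records; the field names, the lambda stages and the `run_kdd` helper are hypothetical illustrations, and the "mining" stage is deliberately trivial:

```python
from collections import Counter
from typing import Any, Callable

Stage = Callable[[Any], Any]

def run_kdd(data: Any, stages: list[tuple[str, Stage]]) -> Any:
    """Run the stages in order; each stage's output feeds the next stage."""
    for _name, stage in stages:
        data = stage(data)
    return data

# Hypothetical raw records; the field names are illustrative only.
raw = [
    {"id": 1, "subscriber": True, "age": 25, "magazine": "cars"},
    {"id": 2, "subscriber": True, "age": None, "magazine": "comics"},
    {"id": 3, "subscriber": False, "age": 40, "magazine": "music"},
    {"id": 4, "subscriber": True, "age": 31, "magazine": "cars"},
]

stages: list[tuple[str, Stage]] = [
    # Creating a target data set: keep only subscribers.
    ("selection", lambda rs: [r for r in rs if r["subscriber"]]),
    # Cleaning and preprocessing: drop records with a missing age.
    ("cleaning", lambda rs: [r for r in rs if r["age"] is not None]),
    # Reduction/transformation: keep only the feature to be mined.
    ("transformation", lambda rs: [r["magazine"] for r in rs]),
    # Data mining: a deliberately trivial pattern search (frequency counts).
    ("mining", Counter),
    # Pattern evaluation: keep only patterns seen at least twice.
    ("evaluation", lambda counts: {k: v for k, v in counts.items() if v >= 2}),
]

result = run_kdd(raw, stages)
print(result)  # {'cars': 2}
```

Keeping the stages as a plain list makes it easy to reorder or repeat them, which matters because real KDD work loops back through earlier steps.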
Knowledge Discovery Process
• The whole process of extracting implicit, previously unknown and potentially useful knowledge from a large database
– It includes data selection, cleaning, enrichment, coding, data mining, and reporting
• Data mining is the key stage of the knowledge discovery process
– the process of finding the desired information in a large database
Knowledge Discovery Process
• Example: the database of a magazine publisher which sells five types of magazines: on cars, houses, sports, music and comics
• Data mining: find interesting categorical properties
• Questions:
– What is the profile of a reader of a car magazine?
– Is there any correlation between an interest in cars and an interest in comics?
• The knowledge discovery process consists of six stages
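The cars/comics question becomes concrete once each reader is coded as 0/1 per magazine. One common measure for two binary variables is the phi coefficient (the Pearson correlation applied to 0/1 data). The sketch below, with hypothetical data and a hypothetical `phi` helper, is one possible way to answer the question, not the method used in the original example:

```python
from math import sqrt

def phi(xs: list[int], ys: list[int]) -> float:
    """Phi coefficient of two equally long 0/1 lists, in [-1, 1]."""
    pairs = list(zip(xs, ys))
    n11 = sum(1 for x, y in pairs if x and y)          # reads both
    n10 = sum(1 for x, y in pairs if x and not y)      # cars only
    n01 = sum(1 for x, y in pairs if not x and y)      # comics only
    n00 = sum(1 for x, y in pairs if not x and not y)  # neither
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # Undefined when a variable is constant; report 0.0 in that case.
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0

# Hypothetical subscriptions, one position per reader (1 = subscribes).
cars   = [1, 1, 0, 0, 1, 0]
comics = [1, 1, 0, 0, 0, 0]
print(round(phi(cars, comics), 3))  # 0.707
```

A value near +1 suggests the two interests go together, near 0 suggests no linear association; on a handful of records such a number says little.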
Data Selection
• Select the information about people who have subscribed to a magazine
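Data selection can be sketched as keeping only the people who appear in at least one subscription record. The table layouts and the `client_no` key below are assumptions for illustration:

```python
def select_subscribers(people: list[dict], subscriptions: list[dict]) -> list[dict]:
    """Keep only people who appear in at least one subscription record."""
    subscribed = {s["client_no"] for s in subscriptions}
    return [p for p in people if p["client_no"] in subscribed]

# Hypothetical tables: everyone in the database, and subscription records.
people = [
    {"client_no": 1, "name": "Ann Lee"},
    {"client_no": 2, "name": "Bob Ray"},
    {"client_no": 3, "name": "Cy Park"},
]
subscriptions = [
    {"client_no": 1, "magazine": "car"},
    {"client_no": 3, "magazine": "comic"},
    {"client_no": 3, "magazine": "music"},
]
selected = select_subscribers(people, subscriptions)
print([p["name"] for p in selected])  # ['Ann Lee', 'Cy Park']
```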
Cleaning
• Pollution: typing errors, people moving from one place to another without notifying a change of address, and people giving incorrect information about themselves
– Pattern recognition algorithms
Cleaning
• Lack of domain consistency
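A minimal cleaning sketch, assuming each record carries a name, a birth date and an address. It swaps in simple stand-ins for the techniques named on the slides: fuzzy string matching from Python's `difflib` in place of a pattern recognition algorithm, and a hypothetical plausible birth-date range for the domain-consistency check. The 0.85 threshold and the merge rule are illustrative assumptions:

```python
from datetime import date
from difflib import SequenceMatcher

# Hypothetical lower bound for a plausible birth date.
EARLIEST_BIRTH = date(1900, 1, 1)

def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """Near-identical names (e.g. a typing error) count as the same person."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def clean(records: list[dict], today: date) -> list[dict]:
    cleaned: list[dict] = []
    for rec in map(dict, records):  # copy so the input is not modified
        # Domain consistency: an impossible birth date becomes "unknown".
        bd = rec["birth_date"]
        if bd is not None and not (EARLIEST_BIRTH <= bd <= today):
            rec["birth_date"] = None
        # Pollution: same known birth date plus a near-identical name is
        # assumed to be one client who moved; records are assumed to be in
        # chronological order, so the later address wins.
        match = next(
            (c for c in cleaned
             if rec["birth_date"] is not None
             and c["birth_date"] == rec["birth_date"]
             and similar(c["name"], rec["name"])),
            None,
        )
        if match is None:
            cleaned.append(rec)
        else:
            match["address"] = rec["address"]
    return cleaned

records = [
    {"name": "Johnson", "birth_date": date(1960, 4, 13), "address": "1 Main St"},
    {"name": "Jonson", "birth_date": date(1960, 4, 13), "address": "9 Oak Ave"},
    {"name": "Smith", "birth_date": date(2090, 1, 1), "address": "5 Elm Rd"},
]
result = clean(records, today=date(2024, 1, 1))
print(result)
```

Real cleaning rules depend heavily on the domain; merging on name similarity alone would wrongly join different people with similar names, which is why the sketch also requires a matching birth date.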
Enrichment
• We need extra information about the clients: date of birth, income, amount of credit, and whether or not an individual owns a car or a house
Enrichment
• The new information needs to be easily joined to the existing client records
– to extract more knowledge
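Enrichment is easiest when the new data shares a key with the existing records. Assuming a hypothetical `client_no` key in both sources, a left join keeps every client and adds the purchased attributes where they exist:

```python
def enrich(clients: list[dict], extra: dict[int, dict]) -> list[dict]:
    """Left join: add external attributes to each client record by client_no."""
    return [{**c, **extra.get(c["client_no"], {})} for c in clients]

clients = [
    {"client_no": 1, "name": "Ann Lee", "address": "1 Main St"},
    {"client_no": 2, "name": "Bob Ray", "address": "9 Oak Ave"},
]
# Hypothetical external data keyed by the same client number.
extra = {
    1: {"birth_date": "1965-04-13", "income": 18500, "credit": 17800,
        "car_owner": "yes", "house_owner": "no"},
}
enriched = enrich(clients, extra)
print(enriched)
```

Clients without a match (here client 2) keep their original fields, leaving the decision about incomplete records to the next stage.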
Coding
• We select only those records that have enough information to be of value (rows)
• We project the fields in which we are interested (columns)
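Selection (rows) and projection (columns) map directly onto two small filters. The field names, and the rule "every required field has a value" standing in for "enough information to be of value", are assumptions:

```python
def select_rows(records: list[dict], required: list[str]) -> list[dict]:
    """Selection: keep rows that have a value for every required field."""
    return [r for r in records if all(r.get(f) is not None for f in required)]

def project(records: list[dict], fields: list[str]) -> list[dict]:
    """Projection: keep only the columns of interest."""
    return [{f: r[f] for f in fields} for r in records]

records = [
    {"client_no": 1, "name": "Ann Lee", "income": 18500, "credit": 17800},
    {"client_no": 2, "name": "Bob Ray", "income": None, "credit": 5000},
    {"client_no": 3, "name": "Cy Park", "income": 32000, "credit": 12000},
]
fields = ["client_no", "income", "credit"]
table = project(select_rows(records, fields), fields)
print(table)  # rows for clients 1 and 3, without the name column
```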
Coding
• Code the information that is too detailed:
– Address to region
– Birth date to age
– Divide income by 1000
– Divide credit by 1000
– Convert car ownership (yes/no) to 1/0
– Convert purchase date to month numbers starting from 1990
• The way in which we code the information will determine the type of patterns we find
• Coding has to be performed repeatedly in order to get the best results
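The coding rules listed above can be sketched as a single function over one record. The postcode-to-region table is hypothetical, amounts are divided by 1000 with integer division (rounding down, an assumption), and month numbers count January 1990 as month 1:

```python
from datetime import date

# Hypothetical mapping from the first postcode digit to a region number.
REGION_BY_POSTCODE_PREFIX = {"1": 1, "2": 2, "3": 3, "4": 4}

def age_on(birth: date, ref: date) -> int:
    """Whole years between birth and the reference date."""
    return ref.year - birth.year - ((ref.month, ref.day) < (birth.month, birth.day))

def month_number(d: date, base_year: int = 1990) -> int:
    """Months counted from January of base_year, which is month 1."""
    return (d.year - base_year) * 12 + d.month

def code(rec: dict, ref: date) -> dict:
    return {
        "region": REGION_BY_POSTCODE_PREFIX[rec["postcode"][0]],  # address -> region
        "age": age_on(rec["birth_date"], ref),                    # birth date -> age
        "income": rec["income"] // 1000,                          # thousands, rounded down
        "credit": rec["credit"] // 1000,
        "car": 1 if rec["car_owner"] == "yes" else 0,              # yes/no -> 1/0
        "purchase": month_number(rec["purchase_date"]),
    }

rec = {"postcode": "4512", "birth_date": date(1965, 4, 13), "income": 18500,
       "credit": 17800, "car_owner": "yes", "purchase_date": date(1991, 3, 2)}
print(code(rec, ref=date(1995, 1, 1)))
# {'region': 4, 'age': 29, 'income': 18, 'credit': 17, 'car': 1, 'purchase': 15}
```

Because the coding determines which patterns can be found, each of these choices (bucket size, reference date, rounding) is a parameter one would expect to revisit.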
Coding
• We are interested in the relationships between readers of different magazines
– Perform a flattening operation
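Flattening turns one row per subscription into one row per client, with a 0/1 column for each magazine type, so that readers of different magazines can be compared directly. The input layout and the type labels below are hypothetical:

```python
def flatten(subscriptions: list[dict], magazine_types: list[str]) -> dict[int, dict[str, int]]:
    """One row per client, one 0/1 column per magazine type."""
    rows: dict[int, dict[str, int]] = {}
    for s in subscriptions:
        # Start each new client with every magazine column set to 0.
        row = rows.setdefault(s["client_no"], {m: 0 for m in magazine_types})
        row[s["magazine"]] = 1
    return rows

TYPES = ["car", "house", "sport", "music", "comic"]
subs = [
    {"client_no": 1, "magazine": "car"},
    {"client_no": 1, "magazine": "comic"},
    {"client_no": 2, "magazine": "music"},
]
flat = flatten(subs, TYPES)
print(flat)
```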
Data mining
• We may find the following rules:
– A customer with credit > 13000 and aged between 22 and 31 who has subscribed to a comics magazine at time T will very likely subscribe to a car magazine five years later
– The number of house magazines sold to customers with credit between 12000 and 31000 living in region 4 is increasing
– A customer with credit between 5000 and 10000 who reads a comics magazine will very likely become a customer with credit between 12000 and 31000 who reads a sports and a house magazine after 12 years
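One standard way to quantify a rule like the first one above is by its support and confidence. The sketch below drops the time dimension (it checks car readership in the same snapshot rather than five years later) and uses hypothetical coded rows, with credit in thousands:

```python
from typing import Callable

Row = dict[str, int]

def support_confidence(rows: list[Row],
                       antecedent: Callable[[Row], bool],
                       consequent: Callable[[Row], bool]) -> tuple[float, float]:
    """Support: share of all rows matching both sides of the rule.
    Confidence: share of antecedent-matching rows that also match the consequent."""
    ante = sum(1 for r in rows if antecedent(r))
    both = sum(1 for r in rows if antecedent(r) and consequent(r))
    support = both / len(rows) if rows else 0.0
    confidence = both / ante if ante else 0.0
    return support, confidence

# Hypothetical coded, flattened rows (credit in thousands).
rows = [
    {"age": 25, "credit": 15, "comic": 1, "car": 1},
    {"age": 30, "credit": 20, "comic": 1, "car": 1},
    {"age": 28, "credit": 14, "comic": 1, "car": 0},
    {"age": 45, "credit": 30, "comic": 0, "car": 1},
]
s, c = support_confidence(
    rows,
    antecedent=lambda r: r["credit"] > 13 and 22 <= r["age"] <= 31 and r["comic"] == 1,
    consequent=lambda r: r["car"] == 1,
)
print(s, round(c, 3))  # 0.5 0.667
```

A phrase such as "will very likely" corresponds to a high confidence; support indicates how many customers the rule actually covers.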
Knowledge Discovery Process
Business-Question-Driven Process
Data Mining and Business Intelligence
(Diagram: layers listed from top to bottom; the potential to support business decisions increases toward the top. Typical users, from top to bottom: End User, Business Analyst, Data Analyst, DBA.)
• Making Decisions
• Data Presentation: Visualization Techniques
• Data Mining: Information Discovery
• Data Exploration: OLAP, MDA; Statistical Analysis, Querying and Reporting
• Data Warehouses / Data Marts
• Data Sources: Paper, Files, Information Providers, Database Systems, OLTP
Architecture of a Typical Data Mining System
Data Mining: On What Kind of Data?
• Relational databases
• Data warehouses
• Transactional databases
• Advanced DB and information repositories
– Object-oriented and object-relational databases
– Spatial databases
– Time-series data and temporal data
– Text databases and multimedia databases
– Heterogeneous databases
– WWW