Introduction to Data Mining

  • 8/14/2019 Introduction to Data Mining

    Introduction to Data Mining

    Jiang Li

    Department of Computer Science & Information Technology

    Austin Peay State University

    Outline

    Data Collected

    Knowledge Discovery: An Iterative Process

    Data Mining Examples

    Data Mining Functions and Algorithms

    Data Collected

    Business
      Wal-Mart: 20 million transactions a day
      Mobil Oil Corporation: a 100-terabyte data warehouse

    Science
      The human genome database project: gigabytes of data
      NASA Earth Observing System (EOS): 50 gigabytes of data per hour

    Radio, television, and film studios: multimedia databases

    WWW: the infinite resources

    Email: huge digital libraries

    Data vs. Knowledge

    Technology is available to help us collect data: bar codes, cameras, scanners, radars, satellites, etc.

    Technology is available to help us store data: databases, data warehouses, and a variety of repositories.

    We are swamped by data that pours in on us. We need to interpret this data in search of new knowledge.

    Our need is to extract interesting knowledge (rules, regularities, patterns, constraints) from data in large collections.

    "We are drowning in information, but starving for knowledge." (John Naisbitt)

    Evolution of Database Technology

    1960s: Data collection, database creation (hierarchical and network models)

    1970s: Relational data model, relational DBMS implementation

    1980s: Ubiquitous RDBMS, advanced data models (extended-relational, object-oriented, deductive, etc.) and application-oriented DBMS (spatial, scientific, engineering, etc.)

    1990s: Data mining and data warehousing, multimedia databases, and Web-based database technology

    Knowledge Discovery

    Steps of a KDD Process

    Learning the application domain: relevant prior knowledge and goals of the application

    Gathering and integrating data

    Cleaning and preprocessing data (may take 60% of effort!)

    Reducing and projecting data: find useful features, dimensionality/variable reduction, ...

    Choosing mining functions and algorithms: summarization, classification, regression, association, ...

    Data mining: search for patterns of interest

    Evaluating results
      Interpretation: analysis of results
      Visualization, alteration, removing redundant patterns, ...

    Use of discovered knowledge
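
The steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the record layout, the toy data, and the function names (`clean`, `project`, `mine`, `evaluate`) are hypothetical and do not come from the slides. The mining step here is a simple summarization (counting customers per item).

```python
from collections import Counter

# Hypothetical raw records: (customer_id, item, price). None marks a missing value.
RAW = [
    (1, "bread", 2.0),
    (1, "butter", 3.5),
    (2, "bread", 2.0),
    (2, None, 1.0),       # dirty record: missing item
    (3, "bread", 2.0),
    (3, "butter", 3.5),
    (3, "butter", 3.5),   # exact duplicate record
]

def clean(records):
    """Cleaning and preprocessing: drop incomplete rows and exact duplicates."""
    seen, out = set(), []
    for rec in records:
        if None in rec or rec in seen:
            continue
        seen.add(rec)
        out.append(rec)
    return out

def project(records):
    """Reducing and projecting: keep only the features the mining step needs."""
    return [(customer, item) for customer, item, _price in records]

def mine(pairs):
    """Data mining (summarization): count how many customers bought each item."""
    return Counter(item for _customer, item in set(pairs))

def evaluate(counts, min_customers):
    """Evaluating results: keep only patterns that meet a minimum frequency."""
    return {item: n for item, n in counts.items() if n >= min_customers}

patterns = evaluate(mine(project(clean(RAW))), min_customers=2)
print(patterns)
```

In practice the process is iterative: an unsatisfying result at the evaluation step usually sends the analyst back to cleaning or feature selection.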

    Data Mining: On What Kind of Data?

    Flat files

    Generic data
      Relational and object-relational databases
      Object-oriented databases

    Multimedia data
      Text databases
      Audio, image, and video databases

    Business data
      Transactional databases

    Engineering data
      Spatial databases
      Temporal and time-series databases

    WWW data

    Data Mining Examples

    Data mining is primarily used today by companies with a strong consumer focus: retail, financial, communication, and marketing organizations.

    It enables these companies to determine relationships among "internal" factors such as price, product positioning, or staff skills, and "external" factors such as economic indicators, competition, and customer demographics.

    It also enables them to determine the impact on sales, customer satisfaction, and corporate profits.

    Finally, it enables them to "drill down" into summary information to view detailed transactional data.
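
A rough sketch of what "drilling down" can look like in code: start from a summary (total sales per region) and then retrieve the transactions behind one summary row. The transaction layout and the names `summarize` and `drill_down` are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical transactions: (region, product, amount).
TRANSACTIONS = [
    ("East", "tent", 120.0),
    ("East", "stove", 40.0),
    ("West", "tent", 110.0),
    ("West", "tent", 130.0),
]

def summarize(transactions):
    """Summary level: total sales per region."""
    totals = defaultdict(float)
    for region, _product, amount in transactions:
        totals[region] += amount
    return dict(totals)

def drill_down(transactions, region):
    """Detail level: the individual transactions behind one summary row."""
    return [t for t in transactions if t[0] == region]

summary = summarize(TRANSACTIONS)
west_detail = drill_down(TRANSACTIONS, "West")
print(summary)
print(west_detail)
```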

    Data Mining Examples

    With data mining, a retailer could use point-of-sale records of customer purchases to send targeted promotions based on an individual's purchase history.

    By mining demographic data from comment or warranty cards, the retailer could develop products and promotions to appeal to specific customer segments.

    Blockbuster Entertainment mines its video rental history database to recommend rentals to individual customers.

    American Express can suggest products to its cardholders based on analysis of their monthly expenditures.

    Data Mining Examples

    Wal-Mart is pioneering massive data mining to transform its supplier relationships.

    Wal-Mart captures point-of-sale transactions from over 2,900 stores in 6 countries and continuously transmits this data to its massive 7.5 terabyte Teradata data warehouse.

    Wal-Mart allows more than 3,500 suppliers to access data on their products and perform data analyses.

    These suppliers use this data to identify customer buying patterns at the store display level.

    They use this information to manage local store inventory and identify new merchandising opportunities.

    Business Data Mining Examples

    The NBA is exploring a data mining application that can be used in conjunction with image recordings of basketball games.

    The Advanced Scout software analyzes the movements of players to help coaches orchestrate plays and strategies.

    For example, an analysis of the play-by-play sheet of the game played between the New York Knicks and the Cleveland Cavaliers on January 6, 1995 reveals that when Mark Price played the guard position, John Williams attempted four jump shots and made each one!

    A coach can automatically bring up the video clips showing each of the jump shots attempted by Williams with Price on the floor, without needing to comb through hours of video footage.

    Those clips show a very successful pick-and-roll play in which Price draws the Knicks' defense and then finds Williams for an open jump shot.

    http://www.research.ibm.com/scout/home.html

    Data Mining Functions and Algorithms

    Association Rules
      Data can be mined to identify associations. The butter->bread rule is an example of associative mining.
      To find rules like inside(x, city) -> near(x, highway).
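
To make the butter->bread example concrete, here is a minimal sketch that scores a candidate rule with the two standard measures, support and confidence. The slides do not name these measures or give any data, so the baskets and the function names are assumptions for illustration only.

```python
# Hypothetical market baskets; each set is one transaction.
BASKETS = [
    {"butter", "bread", "milk"},
    {"butter", "bread"},
    {"bread", "jam"},
    {"butter", "milk"},
    {"bread", "milk"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

rule_support = support({"butter", "bread"}, BASKETS)
rule_confidence = confidence({"butter"}, {"bread"}, BASKETS)
print(f"butter -> bread: support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

A rule is usually reported only if both numbers clear user-chosen thresholds; algorithms such as Apriori exist to avoid scoring every possible itemset.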

    Classification and Prediction
      Classify data based on the values in a classifying attribute, e.g., classify countries based on climate, or classify cars based on gas mileage.
      Stored data is used to locate data in predetermined groups.
      A restaurant chain could mine customer purchase data to determine when customers visit and what they typically order. This information could be used to increase traffic by having daily specials.
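
As one possible illustration of classifying cars by gas mileage, the sketch below learns a single cut point on miles per gallon from labeled examples (a one-level decision rule). The training data, the labels "low" and "high", and the function names are hypothetical; the slides do not prescribe a particular classifier.

```python
# Hypothetical training data: (miles per gallon, class label).
CARS = [(15, "low"), (18, "low"), (22, "low"), (28, "high"), (33, "high"), (40, "high")]

def fit_threshold(examples):
    """Pick the mpg cut point that misclassifies the fewest training examples."""
    values = sorted(mpg for mpg, _ in examples)
    # Candidate cut points: midpoints between neighbouring mpg values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(cut):
        return sum((mpg > cut) != (label == "high") for mpg, label in examples)

    return min(candidates, key=errors)

def classify(mpg, cut):
    """Predict the class of an unseen car from the learned cut point."""
    return "high" if mpg > cut else "low"

cut = fit_threshold(CARS)
print(cut, classify(30, cut), classify(20, cut))
```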

    Data Mining Functions and Algorithms

    Clustering
      Data items are grouped according to logical relationships or consumer preferences.
      Data can be mined to identify market segments or consumer affinities.
      To cluster houses to find distribution patterns.
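
The house-clustering example could be approached with k-means, sketched below. The choice of k-means, the coordinates, and the hand-picked starting centroids are all assumptions for this example; the slides do not name a clustering algorithm.

```python
import math

# Hypothetical house locations as (x, y) coordinates.
HOUSES = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep the old one if the cluster is empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Starting centroids are chosen by hand here to keep the run deterministic.
centroids, clusters = kmeans(HOUSES, centroids=[HOUSES[0], HOUSES[3]])
print(centroids)
```

Real implementations usually pick starting centroids at random and stop once assignments no longer change, rather than running a fixed number of iterations.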

    Sequential patterns
      Data is mined to anticipate behavior patterns and trends.
      An outdoor equipment retailer could predict the likelihood of a backpack being purchased based on a consumer's purchase of sleeping bags and hiking shoes.
      To find and characterize similar sequences and deviation data, e.g., stock analysis.
      To find segment-wise or total cycles or periodic behaviors in time-related data.
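
The backpack example can be sketched as a simple frequency estimate over time-ordered purchase histories: among customers who bought both a sleeping bag and hiking shoes, what share later bought a backpack? The histories and the function names `follows` and `likelihood` are hypothetical, and this is a counting sketch rather than a full sequential-pattern mining algorithm.

```python
# Hypothetical purchase histories, each ordered by time.
HISTORIES = [
    ["sleeping bag", "hiking shoes", "backpack"],
    ["sleeping bag", "hiking shoes", "tent"],
    ["hiking shoes", "sleeping bag", "backpack"],
    ["sleeping bag", "stove"],
    ["sleeping bag", "hiking shoes", "stove", "backpack"],
]

def follows(history, prerequisites, target):
    """True if `target` is bought after all `prerequisites` have been bought."""
    remaining = set(prerequisites)
    for item in history:
        if not remaining and item == target:
            return True
        remaining.discard(item)
    return False

def likelihood(histories, prerequisites, target):
    """Share of customers with all prerequisites who later bought the target."""
    eligible = [h for h in histories if set(prerequisites) <= set(h)]
    if not eligible:
        return 0.0
    return sum(follows(h, prerequisites, target) for h in eligible) / len(eligible)

p = likelihood(HISTORIES, {"sleeping bag", "hiking shoes"}, "backpack")
print(p)
```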

    Data Mining: Linear Classification

    [Figure: loan dataset plotted on Income and Debt axes, with points labeled Loan and No Loan.]

    A simple linear classification boundary for the loan dataset: shaded region denotes class "no loan".
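
A linear boundary of this kind can be learned in several ways. The sketch below uses the perceptron rule, which is a choice made for this example and not something the slide specifies. The (income, debt) values are invented, and the names `score`, `train_perceptron`, and `predict` are hypothetical.

```python
# Hypothetical loan data: (income, debt) in arbitrary units; label +1 = loan, -1 = no loan.
DATA = [
    ((8, 2), 1), ((9, 3), 1), ((7, 1), 1), ((10, 4), 1),
    ((3, 6), -1), ((2, 5), -1), ((4, 8), -1), ((1, 3), -1),
]

def score(weights, bias, point):
    """Value of the linear function; its sign says which side of the boundary a point is on."""
    return sum(w * x for w, x in zip(weights, point)) + bias

def train_perceptron(data, max_epochs=1000):
    """Perceptron rule: shift the boundary toward every misclassified point."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for point, label in data:
            if label * score(weights, bias, point) <= 0:
                weights = [w + label * x for w, x in zip(weights, point)]
                bias += label
                mistakes += 1
        if mistakes == 0:  # every training point is on the correct side
            break
    return weights, bias

def predict(weights, bias, point):
    return "loan" if score(weights, bias, point) > 0 else "no loan"

weights, bias = train_perceptron(DATA)
print(weights, bias, predict(weights, bias, (6, 2)))
```

The perceptron only settles on a boundary when the two classes can be separated by a straight line, as they can in this toy data; on overlapping classes a method such as logistic regression is the more usual choice.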

    Data Mining: Confluence of Multiple Disciplines