885 Fall 2009

Copyright 2009, The Ohio State University. Data Mining Research at Ohio State. Srinivasan Parthasarathy, DL 693, [email protected]

description

slide of data mining

Transcript of 885 Fall 2009

  • Items: Beer, Cheese, Diapers, Eggs

    People (number of items marked with a 1): Dan (2), Kathy (2), Chuck (3), Bob (2)

    Step 2B: Algorithm Selection

    Data-Oriented: Boolean vs. quantitative associations
      • Association on discrete vs. continuous data
    Result-Oriented: Single-level vs. multiple-level analysis
      • E.g., [Coors, Huggies] or [Beer, Diapers]
    Result-Oriented: Simple vs. constraint-based
      • E.g., small sales (sum < 100) trigger big buys (sum > 1,000)?
    Performance-Oriented Selection
      • Scalable parallel and sequential algorithms
      • Sampling-based methods for fast approximate results
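The single-level vs. multiple-level distinction can be illustrated with a small sketch. The brand-to-category hierarchy and the transactions below are hypothetical (only the Coors/Beer and Huggies/Diapers pairing comes from the slide), and `generalize` and `pair_support` are illustrative helper names, not part of any particular mining library.

```python
from collections import Counter
from itertools import combinations

# Hypothetical brand -> category hierarchy; only Coors and Huggies appear on the slide.
HIERARCHY = {
    "Coors": "Beer",
    "Budweiser": "Beer",
    "Huggies": "Diapers",
    "Pampers": "Diapers",
}

# Hypothetical transactions, invented purely for illustration.
TRANSACTIONS = [
    {"Coors", "Huggies"},
    {"Budweiser", "Pampers"},
    {"Coors", "Pampers"},
    {"Budweiser"},
]


def generalize(transaction, hierarchy):
    """Replace each item with its parent category (items without a parent stay as-is)."""
    return {hierarchy.get(item, item) for item in transaction}


def pair_support(transactions):
    """Return the fraction of transactions that contain each unordered item pair."""
    counts = Counter()
    for transaction in transactions:
        for pair in combinations(sorted(transaction), 2):
            counts[pair] += 1
    total = len(transactions)
    return {pair: count / total for pair, count in counts.items()}


# Brand level: every pair is rare. Category level: the same purchases reinforce each other.
brand_level = pair_support(TRANSACTIONS)
category_level = pair_support([generalize(t, HIERARCHY) for t in TRANSACTIONS])
```

In this toy data each brand pair occurs in one transaction out of four, while the generalized pair (Beer, Diapers) occurs in three of four, which is why the level of analysis can change which associations clear a support threshold.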

    Step 3: Knowledge Interpretation

    A. Post-processing of mining results
      When you have too many patterns, you need to:
      • Order them using some interestingness metric
      • Pass them to the visualization tool incrementally
    B. Visualization
      • Render the patterns in an easy-to-use, intuitive manner
      • Highlight the most relevant patterns
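The post-processing step above can be sketched as follows. The slide does not name a specific interestingness metric, so lift is used here as one common choice; the rules and their numbers are hypothetical, and `ranked_batches` is an illustrative name rather than an existing API.

```python
from itertools import islice

# Hypothetical mined rules: (antecedent, consequent, confidence, consequent_support).
RULES = [
    ("Beer", "Diapers", 0.60, 0.40),
    ("Cheese", "Eggs", 0.50, 0.50),
    ("Diapers", "Cheese", 0.60, 0.30),
    ("Eggs", "Beer", 0.20, 0.50),
]


def lift(rule):
    """Lift = confidence / support of the consequent; values above 1 suggest a positive association."""
    _, _, confidence, consequent_support = rule
    return confidence / consequent_support


def ranked_batches(rules, batch_size):
    """Yield rules in fixed-size batches, highest lift first."""
    ordered = iter(sorted(rules, key=lift, reverse=True))
    while True:
        batch = list(islice(ordered, batch_size))
        if not batch:
            return
        yield batch


# A visualization tool could consume one batch at a time instead of all patterns at once.
batches = list(ranked_batches(RULES, batch_size=2))
```

Because `ranked_batches` is a generator, a consumer can stop after the first batch when the top-ranked patterns are enough, which is one way to read "incrementally" on the slide.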

    Step 3B: Visualization

    T2: Classification

    Data categorization based on a set of training objects.
    Applications: credit approval, target marketing, medical diagnosis, treatment effectiveness analysis, automatic text categorization, etc.
    Goal: Develop a description for each class, to support classification of future test data, better understanding of each class, and prediction of certain properties.
    Engine data example horsepower 21
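As a minimal sketch of the classification goal stated above, the following builds a description of each class from training objects and uses it to label new data. It assumes a nearest-centroid classifier, which is only one of many possible methods and is not necessarily the one used in the course; the engine measurements, the weight feature, and the class labels are all hypothetical.

```python
from collections import defaultdict
from math import dist

# Hypothetical training objects: (horsepower, weight in kg) -> class label.
TRAINING = [
    ((70.0, 900.0), "economy"),
    ((90.0, 1100.0), "economy"),
    ((200.0, 1500.0), "performance"),
    ((240.0, 1700.0), "performance"),
]


def describe_classes(training):
    """Build a per-class description: the mean feature vector (centroid) of each class."""
    grouped = defaultdict(list)
    for features, label in training:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def classify(features, descriptions):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(descriptions, key=lambda label: dist(features, descriptions[label]))


descriptions = describe_classes(TRAINING)
prediction = classify((100.0, 1150.0), descriptions)
```

The centroids double as a human-readable summary of each class. In practice the features would usually be rescaled first, since here weight dominates the distance simply because its values are larger.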