Chapter 11 2 Applications and Trends in Data Mining


  • 7/31/2019 Chapter 11 2 Applications and Trends in Data Mining

    Data mining system products and research prototypes

    How to Choose a Data Mining System?

    Commercial data mining systems have little in common

    o Different data mining functionality or methodology

    o May even work with completely different kinds of data sets

    Selection requires a multidimensional view

    Data types: relational, transactional, text, time sequence, spatial?

    System issues

    o Running on only one or on several operating systems?

    o A client/server architecture?

    o Provide Web-based interfaces and allow XML data as input and/or output?

    Data sources

    o ASCII text files, multiple relational data sources

    o Support database connectivity such as ODBC, OLE DB, or JDBC?

    Data mining functions and methodologies

    o One vs. multiple data mining functions

    o One vs. a variety of methods per function

    More data mining functions and methods per function provide the user with greater flexibility and analysis power

    Coupling with DB and/or data warehouse systems

    o Four forms of coupling: no coupling, loose coupling, semitight coupling, and tight coupling

    Ideally, a data mining system should be tightly coupled with a database system

    Scalability

    o Row (or database size) scalability

    o Column (or dimension) scalability

    o Curse of dimensionality: it is much more challenging to make a system column scalable than row scalable
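    One way to see the asymmetry, offered as a simplified illustration rather than a model of any particular system: one pass over a table grows linearly in both rows and columns, but the number of attribute subsets a method may have to consider (for example, the cuboids of a data cube whose dimensions have no concept hierarchies) grows as 2^d in the number of columns d. The function names are hypothetical.

```python
def cells_scanned(n_rows: int, n_cols: int) -> int:
    """Cells touched by one full pass over an n_rows x n_cols table (linear in both)."""
    return n_rows * n_cols


def attribute_subsets(n_cols: int) -> int:
    """Number of subsets of n_cols attributes, including the empty one: 2 ** n_cols."""
    return 2 ** n_cols


if __name__ == "__main__":
    for rows, cols in [(1_000, 10), (2_000, 10), (1_000, 20)]:
        print(rows, cols, cells_scanned(rows, cols), attribute_subsets(cols))
    # 1000 10 10000 1024
    # 2000 10 20000 1024
    # 1000 20 20000 1048576
```

    Doubling the rows doubles the scan cost and leaves the subset count unchanged; doubling the columns also doubles the scan cost but multiplies the number of attribute subsets by 1024.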

    Visualization tools

    o A picture is worth a thousand words

    o Visualization categories: data visualization, mining result visualization, mining process visualization, and visual data mining

    Data mining query language and graphical user interface

    o Easy-to-use and high-quality graphical user interface

    o Essential for user-guided, highly interactive data mining

    Examples of Data Mining Systems

    Microsoft SQL Server 2005

    o Integrates DB and OLAP with mining

    o Supports the OLE DB for Data Mining standard

    SAS Enterprise Miner

    o A variety of statistical analysis tools

    o Data warehouse tools and multiple data mining algorithms

    IBM Intelligent Miner

    o A wide range of data mining algorithms

    o Scalable mining algorithms

    o Toolkits: neural network algorithms, statistical methods, data preparation, and data visualization tools

    o Tight integration with IBM's DB2 relational database system

    SGI MineSet

    o Multiple data mining algorithms and advanced statistics

    o Advanced visualization tools

    Clementine (SPSS)

    o An integrated data mining development environment for end-users and developers

    o Multiple data mining algorithms and visualization tools

    11.3 Additional Themes on Data Mining

    Due to the broad scope of data mining and the large variety of data mining methodologies, not all themes on data mining can be covered thoroughly here.

    11.3.1 Theoretical Foundations of Data Mining

    A solid and systematic theoretical foundation is important because it can help provide a coherent framework for the development, evaluation, and practice of data mining technology.

    1. Data reduction:

    In this theory, the basis of data mining is to reduce the data representation.

    Data reduction trades accuracy for speed in response to the need to obtain quick approximate answers to queries on very large databases.

    Data reduction techniques include singular value decomposition (the driving element behind principal components analysis), wavelets, regression, log-linear models, histograms, clustering, sampling, and the construction of index trees.
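    As a minimal sketch of the accuracy-for-speed trade-off, the code below answers an average query approximately from a simple random sample instead of scanning every value. The data set, sample size, and function names are hypothetical, and sampling is only one of the reduction techniques listed above.

```python
import random


def exact_mean(values):
    """Full scan: touches every value."""
    return sum(values) / len(values)


def sampled_mean(values, sample_size, seed=0):
    """Approximate answer from a simple random sample drawn without replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sample = rng.sample(values, sample_size)
    return sum(sample) / sample_size


values = list(range(1_000_000))          # hypothetical "very large" column
exact = exact_mean(values)               # 499999.5
approx = sampled_mean(values, 10_000)    # reads only 1% of the values
relative_error = abs(approx - exact) / exact
print(f"exact={exact:.1f} approx={approx:.1f} relative error={relative_error:.4f}")
```

    A larger sample lowers the expected error (for a mean, roughly in proportion to 1/sqrt(sample size)) at the cost of reading more data.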

    2. Data compression:

    According to this theory, the basis of data mining is to compress the given data by encoding in terms of bits, association rules, decision trees, clusters, and so on.

    Under the minimum description length (MDL) principle, for example, the best theory to infer from a set of data is the one that minimizes the sum of the length of the theory and the length of the data when encoded using the theory as a predictor for the data.

    This encoding is typically in bits.
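    A toy two-part code makes the principle concrete. Under simplifying assumptions chosen for this sketch (a 0/1 sequence, a "fair coin" theory that costs nothing to state, and a "biased coin" theory whose parameter is sent as a count of ones costing log2(n + 1) bits), the total description length is the theory cost plus the data cost, the sum of -log2 P(x) over the sequence, and MDL picks the smaller total. The theory names and coding scheme are hypothetical.

```python
import math


def data_bits(bits, p_one):
    """Length in bits of a 0/1 sequence under a Bernoulli(p_one) code: sum of -log2 P(x)."""
    return sum(-math.log2(p_one if b == 1 else 1.0 - p_one) for b in bits)


def description_length(bits, theory):
    """Two-part code length: bits to state the theory plus bits for the data given the theory."""
    n = len(bits)
    if theory == "fair":
        theory_bits, p_one = 0.0, 0.5        # assumed fixed in advance, nothing to send
    elif theory == "biased":
        theory_bits = math.log2(n + 1)       # send the count of ones, one of 0..n
        p_one = sum(bits) / n
    else:
        raise ValueError(f"unknown theory: {theory}")
    return theory_bits + data_bits(bits, p_one)


def mdl_choice(bits, theories=("fair", "biased")):
    """Pick the theory with the shortest total description length."""
    return min(theories, key=lambda t: description_length(bits, t))


skewed = [1] * 10 + [0] * 90      # mostly zeros
balanced = [0, 1] * 50            # half ones
print(mdl_choice(skewed), round(description_length(skewed, "biased"), 1))   # biased 53.6
print(mdl_choice(balanced))                                                 # fair
```

    For the balanced sequence the biased theory saves nothing on the data, so its extra parameter cost makes the fair theory win; for the skewed one the shorter data encoding more than pays for the parameter.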

    3. Pattern discovery:

    In this theory, the basis of data mining is to discover patterns occurring in the database, such as associations, classification models, sequential patterns, and so on.

    Areas such as machine learning, neural networks, association mining, sequential pattern mining, clustering, and several other subfields contribute to this theory.
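    As one concrete instance of pattern discovery, the sketch below counts frequent item pairs in a handful of transactions and derives association rules from them. It enumerates all pairs directly instead of using a scalable algorithm such as Apriori, and the transactions and thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations

transactions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"beer", "diapers", "milk"},
]
MIN_SUPPORT = 2        # absolute count (hypothetical threshold)
MIN_CONFIDENCE = 0.7   # hypothetical threshold

# Support of single items and of item pairs (pairs kept in sorted order).
item_counts = Counter(item for t in transactions for item in t)
pair_counts = Counter(pair for t in transactions for pair in combinations(sorted(t), 2))
frequent_pairs = {pair: c for pair, c in pair_counts.items() if c >= MIN_SUPPORT}

# Each frequent pair {a, b} yields two candidate rules: a -> b and b -> a.
strong_rules = []
for (a, b), count in frequent_pairs.items():
    for lhs, rhs in ((a, b), (b, a)):
        confidence = count / item_counts[lhs]   # P(rhs | lhs) estimated from counts
        if confidence >= MIN_CONFIDENCE:
            strong_rules.append((lhs, rhs, confidence))

for lhs, rhs, conf in sorted(strong_rules):
    print(f"{lhs} -> {rhs} (confidence {conf:.2f})")
```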

    4. Probability theory:

    This is based on statistical theory.

    In this theory, the basis of data mining is to discover joint probability distributions of random variables, for example, Bayesian belief networks or hierarchical Bayesian models.
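    A Bayesian belief network represents a joint distribution as a product of local conditional probabilities. The sketch below uses a hypothetical three-variable chain Cloudy -> Rain -> WetGrass with made-up probabilities, so P(C, R, W) = P(C) P(R | C) P(W | R). It computes a normalization check, a marginal, and a posterior by brute-force enumeration, which is only practical for tiny networks.

```python
from itertools import product

# Hypothetical parameters for the chain Cloudy -> Rain -> WetGrass.
P_CLOUDY = 0.5                                   # P(Cloudy = True)
P_RAIN_GIVEN_CLOUDY = {True: 0.8, False: 0.2}    # P(Rain = True | Cloudy = c)
P_WET_GIVEN_RAIN = {True: 0.9, False: 0.1}       # P(WetGrass = True | Rain = r)


def prob(p_true, value):
    """Probability of a Boolean outcome given P(outcome = True)."""
    return p_true if value else 1.0 - p_true


def joint(cloudy, rain, wet):
    """P(C, R, W) = P(C) * P(R | C) * P(W | R), the network's factorization."""
    return (prob(P_CLOUDY, cloudy)
            * prob(P_RAIN_GIVEN_CLOUDY[cloudy], rain)
            * prob(P_WET_GIVEN_RAIN[rain], wet))


BOOLS = (True, False)
total = sum(joint(c, r, w) for c, r, w in product(BOOLS, repeat=3))
p_wet = sum(joint(c, r, True) for c, r in product(BOOLS, repeat=2))
p_rain_given_wet = sum(joint(c, True, True) for c in BOOLS) / p_wet

print(f"sum of joint = {total:.3f}")                     # 1.000
print(f"P(WetGrass) = {p_wet:.3f}")                      # 0.500
print(f"P(Rain | WetGrass) = {p_rain_given_wet:.3f}")    # 0.900
```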

    5. Microeconomic view:

    The microeconomic view considers data mining as the task of finding patterns that are interesting only to the extent that they can be used in the decision-making process of some enterprise (e.g., regarding marketing strategies and production plans).

    This view is one of utility, in which patterns are considered interesting if they can be acted on.

    Enterprises are regarded as facing optimization problems, where the objective is to maximize the utility or value of a decision.

    In this theory, data mining becomes a nonlinear optimization problem.
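    The utility view can be sketched by scoring each mined pattern with the expected profit of acting on it and treating as interesting only those whose action has positive value. This toy version, with hypothetical patterns, response rates, revenue, and contact cost, reduces each decision to "act or do not act"; it illustrates utility-based interestingness and does not attempt the general nonlinear optimization described above.

```python
from dataclasses import dataclass

REVENUE_PER_RESPONSE = 40.0   # hypothetical value of one response
COST_PER_CONTACT = 5.0        # hypothetical cost of contacting one customer


@dataclass
class Pattern:
    rule: str
    customers_matched: int    # customers the rule lets us target
    response_rate: float      # estimated fraction of targeted customers who respond


def expected_profit(p: Pattern) -> float:
    """Expected value of the decision 'target every matched customer'."""
    return p.customers_matched * (p.response_rate * REVENUE_PER_RESPONSE - COST_PER_CONTACT)


patterns = [
    Pattern("bought_tent -> buys_sleeping_bag", 1_000, 0.30),
    Pattern("age_20_30 -> buys_phone_case", 5_000, 0.10),
    Pattern("bought_printer -> buys_ink", 800, 0.05),
]

# Interesting (actionable) patterns are those whose action has positive expected utility.
actionable = [p for p in patterns if expected_profit(p) > 0]
best = max(patterns, key=expected_profit)

for p in patterns:
    print(f"{p.rule}: expected profit {expected_profit(p):.0f}")
```

    The rule matching the most customers is rejected here: under the utility view, how many customers a pattern covers matters only through the value of the decision it supports.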

    6. Inductive databases:

    According to this theory, a database schema consists of data and patterns that are stored in the database.

    Data mining is therefore the problem of performing induction on databases, where the task is to query the data and the theory (i.e., patterns) of the database.

    This view is popular among many researchers in database systems.
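    The idea of querying data and patterns together can be sketched with Python's built-in sqlite3 module: an induction step derives item-pair patterns from the data with SQL and stores them as an ordinary table, after which a single query can mix patterns and raw data. The table names, items, and support threshold are hypothetical, and keeping patterns in a plain relational table is a simplification chosen for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Data: one row per (transaction id, item).
cur.execute("CREATE TABLE purchases (tid INTEGER, item TEXT)")
cur.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [(1, "milk"), (1, "bread"),
     (2, "milk"), (2, "bread"), (2, "eggs"),
     (3, "bread"), (3, "eggs"),
     (4, "milk"), (4, "bread")],
)

# Induction step: derive item-pair patterns from the data and store them in the database.
cur.execute("""
    CREATE TABLE pair_patterns AS
    SELECT a.item AS item1, b.item AS item2, COUNT(*) AS support
    FROM purchases AS a
    JOIN purchases AS b ON a.tid = b.tid AND a.item < b.item
    GROUP BY a.item, b.item
""")

# Query the patterns like any other table.
frequent = cur.execute(
    "SELECT item1, item2, support FROM pair_patterns "
    "WHERE support >= 2 ORDER BY support DESC, item1, item2"
).fetchall()

# Query data and patterns together: which transactions contain a pattern with support >= 3?
supporting_tids = [row[0] for row in cur.execute("""
    SELECT DISTINCT a.tid
    FROM purchases AS a
    JOIN purchases AS b ON a.tid = b.tid
    JOIN pair_patterns AS p ON a.item = p.item1 AND b.item = p.item2
    WHERE p.support >= 3
    ORDER BY a.tid
""")]

print(frequent)          # [('bread', 'milk', 3), ('bread', 'eggs', 2)]
print(supporting_tids)   # [1, 2, 4]
conn.close()
```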