Chapter 11 4 Topic data mining

download Chapter 11 4 Topic data mining

of 10

Transcript of Chapter 11 4 Topic data mining

  • 7/31/2019 Chapter 11 4 Topic data mining

    1/10

    Social impacts of data mining

  • 7/31/2019 Chapter 11 4 Topic data mining

    2/10

    Is Data Mining a Hype or Will It Be Persistent?

    Data mining is a technology

    Technological life cycle

    Innovators

    Early adopters

    Chasm

    Early majority

    Late majority Laggards

  • 7/31/2019 Chapter 11 4 Topic data mining

    3/10

    Life Cycle of Technology Adoption

    Data mining is at Chasm!? Existing data mining systems are too generic Need business-specific data mining solutions and smooth

    integration of business logic with data mining functions

  • 7/31/2019 Chapter 11 4 Topic data mining

    4/10

    Data Mining: Managers' Business or Everyone's?

    Data mining will surely be an important tool formanagers decision making Bill Gates: Business @ the speed of thought

    The amount of the available data is increasing, and

    data mining systems will be more affordable Multiple personal uses

    Mine your family's medical history to identify genetically-related medical conditions

    Mine the records of the companies you deal with

    Mine data on stocks and company performance, etc.

    Invisible data mining

    Build data mining functions into many intelligent tools

  • 7/31/2019 Chapter 11 4 Topic data mining

    5/10

    Social Impacts: Threat to Privacy and Data

    Security?

    Is data mining a threat to privacy and data security? Big Brother, Big Banker, and Big Business are carefully

    watching you

    Profiling information is collected every time

    credit card, debit card, supermarket loyalty card, or frequent flyercard, or apply for any of the above

    You surf the Web, rent a video, fill out a contest entry form,

    You pay for prescription drugs, or present you medical care numberwhen visiting the doctor

    Collection of personal data may be beneficial for companies

    and consumers, there is also potential for misuse Medical Records, Employee Evaluations, etc.

  • 7/31/2019 Chapter 11 4 Topic data mining

    6/10

    Protect Privacy and Data Security

    Fair information practices International guidelines for data privacy protection

    Cover aspects relating to data collection, purpose, use,quality, openness, individual participation, andaccountability

    Purpose specification and use limitation

    Openness: Individuals have the right to know whatinformation is collected about them, who has access to thedata, and how the data are being used

    Develop and use data security-enhancing techniques Blind signatures

    Biometric encryption

    Anonymous databases

  • 7/31/2019 Chapter 11 4 Topic data mining

    7/10

    11.5 Trends in Data Mining

    Application exploration development of application-specific data mining system

    Invisible data mining (mining as built-in function)

    Scalable data mining methods

    Constraint-based mining: use of constraints to guide datamining systems in their search for interesting patterns

    Integration of data mining with database systems,

    data warehouse systems, and Web database

    systems This will ensure data availability, data mining portability,

    scalability, high performance, and an integrated

    information processing environment for multidimensional

    data analysis and exploration.

  • 7/31/2019 Chapter 11 4 Topic data mining

    8/10

    Standardization of data mining language A standard will facilitate systematic development, improve

    interoperability, and promote the education and use of

    data mining systems in industry and society

    Visual data mining The systematic study and development of visual data

    mining techniques will facilitate the promotion and use of

    data mining as a tool for data analysis.

    New methods for mining complex types of data More research is required towards the integration of data

    mining methods with existing data analysis techniques for

    the complex types of data

  • 7/31/2019 Chapter 11 4 Topic data mining

    9/10

    Web mining Web content mining, Weblog mining, and data mining services

    on the Internet will become one of the most important andflourishing subfields in data mining.

    Distributed data mining:

    Traditional data mining methods, designed to work at acentralized location, do not work well in many of the distributedcomputing environments present today (e.g., the Internet,intranets, local area networks, high-speed wireless networks,and sensor networks).

    Real-time or time-critical data mining: Many applications involving stream data (such as e-commerce,

    Web mining, stock analysis, intrusion detection, mobile datamining, and data mining for counterterrorism) require dynamicdata mining models to be built in real time. Additionaldevelopment is needed in this area.

  • 7/31/2019 Chapter 11 4 Topic data mining

    10/10

    Privacy protection and information security in data mining An abundance of recorded personal information available in

    electronic forms and on the Web, coupled with increasingly

    powerful data mining tools, poses a threat to our privacy and

    data security. Growing interest in data mining for

    counterterrorism also adds to the threat. Graph mining, link analysis, and social network analysis:

    Graph mining, link analysis, and social network analysis are

    useful for capturing sequential, topological, geo- metric, and

    other relational characteristics of many scientific data sets and

    social data sets

    Multirelational and multidatabase data mining:

    Most data mining approaches search for patterns in a single

    relational table or in a single database.