Data Mining

Data Mining By: Chad Gregg, John Wilder, John Mary Lugemwa, Yao Yao Bu

Transcript of Data Mining

Page 1: Data Mining

Data Mining

By: Chad Gregg, John Wilder, John Mary Lugemwa, Yao Yao Bu

Page 2: Data Mining

Our Presentation:

• General Overview/Brief History
• The Brushing Technique
• The Technique of Using Neural Networks
• Data Mining & Privacy
• Current Applications & Future Possibilities

Page 3: Data Mining

Overview & History

Data Mining: The process of finding patterns or correlations among data
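As a hedged illustration of what "finding correlations among data" can mean in practice, the sketch below computes a Pearson correlation coefficient between two variables. The data set and the variable names (promo_hours, units_sold) are hypothetical and not from the presentation; real data mining tools search for far richer patterns than a single pairwise correlation.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical toy data: hours of promotion vs. units sold.
promo_hours = [1, 2, 3, 4, 5]
units_sold = [12, 15, 19, 22, 26]

r = pearson(promo_hours, units_sold)
print(f"correlation = {r:.3f}")
```

A value close to 1 or -1 suggests a strong linear relationship; a value near 0 suggests none. A strong correlation is a pattern worth investigating, not proof of cause and effect.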

Evolution: Classical Statistics → Artificial Intelligence → Machine Learning

Page 4: Data Mining

Goals of Data Mining

• Prediction
• Identification
• Classification
• Optimization

Page 5: Data Mining

The Brushing Technique

Graphical exploratory data analysis

--visualize relations between variables

Could be 2D lines or 3D surfaces

Animated brushing and Automatic function refitting
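A minimal sketch of the idea behind brushing, assuming the usual meaning of the term: the analyst selects ("brushes") a rectangular region in one plot, and the same records are highlighted in the other linked views. The records, variable names, and the brush function are hypothetical, chosen only for illustration; a real tool would do this interactively on screen.

```python
# Hypothetical records: each row is one observation with three variables.
records = [
    {"age": 23, "income": 31, "spend": 4},
    {"age": 35, "income": 52, "spend": 9},
    {"age": 41, "income": 75, "spend": 15},
    {"age": 48, "income": 38, "spend": 5},
    {"age": 62, "income": 90, "spend": 20},
]

def brush(rows, x_var, y_var, x_range, y_range):
    """Return the indices of rows whose (x_var, y_var) point lies inside the brush rectangle."""
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    return {
        i
        for i, row in enumerate(rows)
        if x_lo <= row[x_var] <= x_hi and y_lo <= row[y_var] <= y_hi
    }

# Brush a rectangle in the age-vs-income view.
selected = brush(records, "age", "income", (30, 50), (40, 80))

# The same records are then highlighted in a second, linked view
# (income vs. spend); a "*" stands in for visual highlighting.
for i, row in enumerate(records):
    marker = "*" if i in selected else " "
    print(marker, row["income"], row["spend"])
```

The key design point is that the selection is stored as a set of record indices rather than as screen coordinates, so any number of views can highlight the same records.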

Page 6: Data Mining

Pictures are easier to read

Page 7: Data Mining

Types of visualization

Page 8: Data Mining

Automated graphics

Page 9: Data Mining

Applications of the Brushing

Page 10: Data Mining

Neural Networks

A technique with roots in Cognitive Science and Artificial Intelligence

Data Mining adopted this technique as its own

What exactly are they?

Page 11: Data Mining

Learning

• Repeatedly show inputs together with the correct outputs
• Similar to how the human brain learns
• Changes neuron weights
• Changes firing thresholds

Over-learning Problem
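The learning loop described on this slide can be sketched with a single artificial neuron. This is a hedged illustration using the classic perceptron update rule, which is an assumption on our part and not necessarily the model the presenters had in mind; the function names, learning rate, and training task (logical AND) are hypothetical.

```python
def fires(weights, threshold, inputs):
    """The neuron fires (outputs 1) when its weighted input reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

def train(examples, epochs=20, rate=0.5):
    """Repeatedly show each input with its correct output, adjusting
    the weights and the firing threshold whenever the neuron is wrong."""
    weights = [0.0, 0.0]
    threshold = 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            error = target - fires(weights, threshold, inputs)
            # Strengthen or weaken each connection in proportion to its input.
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            # Lower the threshold if the neuron should have fired, raise it otherwise.
            threshold -= rate * error
    return weights, threshold

# Training set: inputs paired with the correct output (logical AND).
examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, threshold = train(examples)

for inputs, target in examples:
    print(inputs, "->", fires(weights, threshold, inputs), "(expected", target, ")")
```

This toy problem is too small to show over-learning. That problem arises when a larger network memorizes the quirks of its training examples and then performs poorly on data it has not seen.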

Page 12: Data Mining
Page 13: Data Mining
Page 14: Data Mining
Page 15: Data Mining

Possible Improvements

• Make them autonomous
• Use a more accurate model of the brain

Questions?

Page 16: Data Mining

Data Mining & Privacy

• What is privacy?
• Data being mined
• Data Mining: Tracking Terrorist Activities
• A heated debate: what are the issues?
• Proposed solutions
• Challenges

Page 17: Data Mining

The kinds of personal data being mined

Data involves information such as:
• Credit reports
• Credit card information and transactions
• Student loan applications
• Bank account numbers
• Taxpayer identification numbers and similar information
• Medical records

Page 18: Data Mining

Data Mining: Tracking Terrorist Activities

Federal Government’s efforts to hunt terrorists after 9/11: data is collected from both federal agency and private-sector databases.

United States General Accounting Office (GAO) report:

• Of the 199 data mining efforts identified, 122 involved personal information.
• Private sector: of the 54 efforts mining private-sector data, 36 involved personal information.
• Federal agencies: of the 77 efforts mining data from other federal agencies, 46 involved personal information.
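To put the GAO counts above in proportion, the short sketch below converts each pair of figures into a share. The counts come from the slide; the dictionary labels are ours and are only shorthand.

```python
# (efforts involving personal information, total efforts) as quoted on the slide.
efforts = {
    "all efforts": (122, 199),
    "private sector": (36, 54),
    "federal agencies": (46, 77),
}

# Fraction of efforts in each group that involved personal information.
shares = {label: personal / total for label, (personal, total) in efforts.items()}

for label, share in shares.items():
    print(f"{label}: {share:.1%}")
```

In every group, roughly six in ten efforts or more involved personal information: about 61% overall, 67% for the private sector, and 60% for federal agencies.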

Page 19: Data Mining

Privacy Concerns

Violation of privacy and sense of personal freedom protected by the Fourth Amendment.

Some information is too personal to be used for any purpose other than the one for which it was originally collected.

People’s private lives are subjected to unreasonable public scrutiny.

Imminent danger that some patterns may inaccurately match a criminal profile, which may lead to unwarranted charges and arrests.

Security concerns

Page 20: Data Mining

Federal Govt. Data Mining Programs

MATRIX: The Multistate Anti-Terrorism Information Exchange

TIA: Total Information Awareness

Page 21: Data Mining

What can be done

What Canada is proposing: three optional waivers for customers to choose from when giving out their information:

i. State that no data mining is allowed on the customer’s data;

ii. Allow data mining only for internal use;

iii. Authorize internal or external use of the data.
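One way a database system might enforce the three waiver options is sketched below. This is a hypothetical illustration, not part of the proposal itself: the Waiver enum, its member names, and the mining_allowed function are assumptions made for the example.

```python
from enum import Enum

class Waiver(Enum):
    """The customer's choice, mirroring options i-iii above (names are hypothetical)."""
    NO_MINING = 1             # i.   no data mining allowed on the customer's data
    INTERNAL_ONLY = 2         # ii.  data mining allowed only for internal use
    INTERNAL_OR_EXTERNAL = 3  # iii. internal or external use authorized

def mining_allowed(waiver, external_use):
    """Return True if a mining job may include this customer's data."""
    if waiver is Waiver.NO_MINING:
        return False
    if waiver is Waiver.INTERNAL_ONLY:
        return not external_use
    return True  # INTERNAL_OR_EXTERNAL permits both kinds of use

# Example: a customer who chose option ii.
choice = Waiver.INTERNAL_ONLY
print(mining_allowed(choice, external_use=False))
print(mining_allowed(choice, external_use=True))
```

Storing the choice with each customer record lets every mining job filter its input before any analysis runs.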

Page 22: Data Mining

Challenges to the solution

Even if customers are told beforehand what their data will be used for, data mining extracts hidden patterns and rules, so it is hard to predict what relationships will emerge.

The Federal Government’s exploitation of private information in the interest of national security will continue to raise conflicting concerns about privacy.

If privacy issues are not adequately addressed, users of data mining technologies will be exposed to a wide range of legal challenges as the general public becomes more alert to the potential misuse of personal data generated from data mining.

Page 23: Data Mining

Current Applications

Public Sector: Law Enforcement, Policy Analysis

Private Sector: Business Analysis Tool

Page 24: Data Mining

Future Possibilities

• Government Policies
• Commercial Database Systems of the Future
• Growing Research Opportunities

Page 25: Data Mining

Learn More:

Questions?

Additional Resources

Chapter 27 of our Text

Page 26: Data Mining

Additional Resources:

Overview/History/Current Applications/Future Possibilities:

United States. Congress. House. Committee on Government Reform. Subcommittee on Technology, Information Policy, Intergovernmental Relations, and the Census. Data Mining: Current Applications and Future Possibilities: hearing before the Subcommittee on Technology, Information Policy, Intergovernmental Relations and the Census of the Committee on Government Reform, House of Representatives, One Hundred Eighth Congress, 25 March 2003. <http://purl.access.gpo.gov/GPO/LPS38964> Accessed on 3 December 2005.

This source is from a government hearing and contains an excellent overview of what data mining is, its history, data mining techniques, current applications, and thoughts on what data mining will be like in the future. The document is lengthy, but it is easy to skip between sections and follow the discussion. It gives a solid understanding of data mining from a reliable source. I found the discussion of the current applications of data mining especially useful.

Page 27: Data Mining

The Brushing Technique:

Wong, Pak Chung, and R. Daniel Bergeron. "Brushing Techniques for Exploring Volume Datasets." VIS '97: Proceedings of the 8th Conference on Visualization '97. Phoenix, Arizona, United States. <http://portal.acm.org/citation.cfm?id=267114>

This article, written in the Department of Computer Science at the University of New Hampshire, describes several brushing techniques: qualitative brushing, planar brushing, and volume brushing.

Page 28: Data Mining

The Technique of Using Neural Networks:

Hill, T. & Lewicki, P. STATISTICS Methods and Applications. (2006) StatSoft, Tulsa, OK

This is a book about statistics that includes a chapter on data mining, providing a general overview and a short introduction to several techniques used in data mining. It also includes chapters on several of those techniques that explain them in more depth. The chapter of focus for me is Neural Networks. It gives a basic overview of artificial neural networks and their relation to the human brain and nervous system, with a brief discussion of human nerve function and firing thresholds. The authors then discuss how neural networks learn and roughly how much information must be shown for it to be learned well. The book then touches on how over-learning can cause problems, comparing it to fitting a polynomial of too high a degree in regression. The authors also give examples of several uses of neural networks and some problems associated with them.

Roy, Asim. Artificial Neural Networks: A Science in Trouble. (2000) SIGKDD Explorations, volume 1, issue 2, 33.

This article discusses shortcomings of the current concept of neural networks. The author discusses inaccuracies in the current neural network model and argues that the model must change to resemble the human brain more closely. He also discusses what he believes to be over-optimism about what neural networks can do under the current model, and the need for a neural network that is autonomous and does not always depend on humans to feed in the inputs and outputs in order to learn.

Page 29: Data Mining

Data Mining & Privacy:

United States General Accounting Office. Data Mining: Federal Efforts Cover a Wide Range of Uses. Report to the Ranking Minority Member, Subcommittee on Financial Management, the Budget, and International Security, Committee on Governmental Affairs, U.S. Senate. 24 May 2004. <http://frwebgate.access.gpo.gov/cgi-bin/getdoc.cgi?dbname=gao&docid=f:d04548.pdf> Accessed on 2 December 2005.

This source is a government report on data mining. It reviews the understanding and current use of data mining technologies and highlights the purposes of data mining efforts in government departments and agencies. It provides inventories of the data mining efforts of various departments. It also considers privacy concerns about the government’s use of data mining technology, particularly the exploitation of personal information. The report gave me a good understanding of the privacy concerns from a reliable source.

American Civil Liberties Union Technology and Liberty Program. Total Information Compliance: The TIA’s Burden Under the Wyden Amendment. May 2003. <http://www.aclu.org/privacy/index.html> Accessed on 2 December 2005.

This source reviews a newly proposed information system called Total Information Awareness (TIA) and tries to alert the public to the dangers the program could pose to personal privacy. TIA would monitor all transactions made by Americans in both corporate and government databases around the world. The program would have the capacity to track personal medical records, credit records, shopping patterns, travel arrangements, personal finances, and related information. Data mining techniques would then exploit this data to find patterns that supposedly link to potential terrorist activities.

Cavoukian, Ann. Data Mining: Staking a Claim on Your Privacy. Information and Privacy Commissioner/Ontario, January 1998. <http://www.ipc.on.ca/docs/datamine.pdf>

This document is a reaction to the use of data mining in ways that violate personal privacy rights. Cavoukian is an information and privacy commissioner in Canada, and she provides some suggestions on how to address these challenges. The report recommends at least three waiver options for customers giving out their information: first, to state that no data mining be allowed on the customer’s data; second, to allow data mining only for internal use; and third, to authorize internal or external use of the data. It helped me get an idea of what can be done to address the problem.