Data Mining
By: Chad Gregg, John Wilder, John Mary Lugemwa, Yao Yao Bu
Our Presentation:
General Overview/Brief History
The Brushing Technique
The Technique of Using Neural Networks
Data Mining & Privacy
Current Applications & Future Possibilities
Overview & History
Data Mining: The process of finding patterns or correlations among data
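As a minimal illustration of "finding correlations among data", the sketch below computes the Pearson correlation coefficient between two columns. The function name `pearson` and the sample values are assumptions made for illustration; they do not come from the presentation.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical records: number of store visits vs. amount spent.
visits = [1, 2, 3, 4, 5]
spend = [12, 19, 33, 41, 48]
r = pearson(visits, spend)  # close to 1: the two columns rise together
```

A value of `r` near 1 or -1 suggests a strong linear relation worth a closer look; a value near 0 suggests none.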
Evolution: Classical Statistics, Artificial Intelligence, Machine Learning
Goals of Data Mining
Prediction
Identification
Classification
Optimization
The Brushing Technique
Graphical exploratory data analysis
--visualize relations between variables
Could be 2D lines or 3D surfaces
Animated brushing and Automatic function refitting
Pictures are easier to read
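The core of brushing can be sketched without any graphics: the user selects a region in one view, and the same records are highlighted in every linked view. The example below is a rough sketch under that assumption; the function `brush`, the field names, and the records are all hypothetical.

```python
def brush(records, x_key, y_key, x_range, y_range):
    """Return indices of records falling inside the brushed rectangle."""
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    return [
        i for i, rec in enumerate(records)
        if x_lo <= rec[x_key] <= x_hi and y_lo <= rec[y_key] <= y_hi
    ]

# Hypothetical records with three variables each.
records = [
    {"age": 23, "income": 31, "spend": 4},
    {"age": 35, "income": 52, "spend": 9},
    {"age": 41, "income": 58, "spend": 11},
    {"age": 58, "income": 47, "spend": 6},
]

# Brush the age/income view: ages 30 to 45, incomes 50 to 60.
selected = brush(records, "age", "income", (30, 45), (50, 60))

# A linked view of another variable would highlight the same records.
highlighted_spend = [records[i]["spend"] for i in selected]
```

In an interactive tool the rectangle would follow the mouse and the linked plots would redraw on every move; here the selection is computed once.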
Types of visualization
Automated graphics
Applications of the Brushing
Neural Networks
A technique with roots in Cognitive Science and Artificial Intelligence
Data Mining adopted this technique as its own
What exactly are they?
Learning
Repeatedly show input with correct output
Similar to a human brain
Changes neuron weights
Changes firing threshold
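The learning loop described here can be sketched with a single threshold neuron trained by the classic perceptron rule. This is one simple learning rule chosen for illustration, not necessarily the one the presenters had in mind; the function names, the learning rate, and the AND training data are assumptions.

```python
def train_perceptron(samples, epochs=20, rate=1.0):
    """Train a single threshold neuron with the perceptron rule."""
    weights = [0.0] * len(samples[0][0])
    threshold = 0.0
    for _ in range(epochs):
        # Repeatedly show each input together with its correct output.
        for inputs, target in samples:
            activation = sum(w * x for w, x in zip(weights, inputs))
            output = 1 if activation > threshold else 0
            error = target - output
            # Change the neuron weights toward the correct output.
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            # Change the firing threshold: lower it if the neuron should
            # have fired, raise it if it fired when it should not have.
            threshold -= rate * error
    return weights, threshold

def predict(weights, threshold, inputs):
    """Fire (1) when the weighted input exceeds the threshold."""
    activation = sum(w * x for w, x in zip(weights, inputs))
    return 1 if activation > threshold else 0

# Hypothetical training set: the logical AND of two inputs.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, threshold = train_perceptron(and_samples)
outputs = [predict(weights, threshold, x) for x, _ in and_samples]
```

When the neuron already gives the correct output, `error` is zero and nothing changes, so training settles once every example is classified correctly.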
Over-learning Problem
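Over-learning can be illustrated with the polynomial analogy mentioned in the Hill & Lewicki resource: a model flexible enough to pass through every training point reproduces the noise as well as the trend. The sketch below swaps a polynomial in for a neural network to keep the example short; the data and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares straight line; returns a function x -> prediction."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_interpolant(xs, ys):
    """Polynomial of degree len(xs) - 1 through every point (Lagrange form)."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict

def max_error(model, xs, ys):
    """Largest absolute prediction error over the given points."""
    return max(abs(model(x) - y) for x, y in zip(xs, ys))

# Hypothetical training data: the true relation is y = x, with noise of +/-0.5.
train_x = [0, 1, 2, 3, 4, 5]
train_y = [0.5, 0.5, 2.5, 2.5, 4.5, 4.5]

line = fit_line(train_x, train_y)
poly = fit_interpolant(train_x, train_y)

train_err_line = max_error(line, train_x, train_y)
train_err_poly = max_error(poly, train_x, train_y)

# Unseen points lying on the true relation y = x.
test_x = [0.5, 4.5]
test_y = [0.5, 4.5]
test_err_line = max_error(line, test_x, test_y)
test_err_poly = max_error(poly, test_x, test_y)
```

The interpolating polynomial has no error on the training points but a much larger error than the straight line on the unseen points, which is the pattern over-learning produces.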
Possible Improvements
Make them autonomous
Use a more correct model of the brain
Questions?
Data Mining & Privacy
What is Privacy?
Data being mined
Data Mining: Tracking Terrorist Activities
A heated debate; what are the issues?
Proposed solutions
Challenges
The kinds of data being mined
Data involves information such as:
• credit reports
• credit card information and transactions
• student loan applications
• bank account numbers
• taxpayer identification numbers and similar information
• medical records
Data Mining: Tracking Terrorist Activities
Federal Government's efforts to hunt terrorists after 9/11: data is collected from both federal agency and private sector databases.
United States General Accounting Office (GAO) report:
Of the 199 data mining efforts identified, 122 involved personal information.
Private sector data: of the 54 data mining efforts, 36 involved personal information.
Federal agency data: of the 77 efforts identified, 46 relied on personal information.
Privacy Concerns
Violation of privacy and sense of personal freedom protected by the Fourth Amendment.
Some information is too personal to be used for purposes other than those for which it was originally intended.
People's private lives are subjected to unreasonable public scrutiny.
Imminent danger that some patterns may inaccurately match a criminal profile, which may lead to unreasonable charges and arrests.
Security concerns
Federal Govt. Data Mining Programs
MATRIX: The Multistate Anti-Terrorism Information Exchange
TIA: Total Information Awareness
What can be done
What Canada is proposing: Three optional waivers for customers when giving out their information:
i. State that no data mining be allowed on the customer's data;
ii. Data mining be allowed only for internal use;
iii. Authorize internal or external use of the data.
Challenges to the solution
Even if customers are informed beforehand about the purpose of the data, the challenge is that data mining extracts hidden patterns and rules; it is not easy to speculate what relationships will emerge.
The Federal Government's exploitation of private information in the interest of national security will continue to raise mixed concerns about privacy.
If privacy issues are not adequately addressed, users of data mining technologies will be exposed to a wide range of legal challenges as the general public becomes more alert to the potential misuse of personal data generated from data mining.
Current Applications
Public Sector: Law Enforcement, Policy Analysis
Private Sector: Business Analysis Tool
Future Possibilities
Government Policies
Commercial Database Systems of the Future
Growing Research Opportunities
Learn More:
Questions?
Additional Resources
Chapter 27 of our Text
Additional Resources:
Overview/History/Current Applications/Future Possibilities:
United States. Congress. House. Committee on Government Reform. Subcommittee on Technology, Information Policy, Intergovernmental Relations, and the Census. Data Mining: Current Applications and Future Possibilities: Hearing before the Subcommittee on Technology, Information Policy, Intergovernmental Relations and the Census of the Committee on Government Reform, House of Representatives, One Hundred Eighth Congress, 25 March 2003. <http://purl.access.gpo.gov/GPO/LPS38964> Accessed on 3 December 2005.
This source is from a government hearing and contains an excellent overview of what data mining is, its history, data mining techniques, current applications, and thoughts on what data mining will be like in the future. It is a lengthy document, but it is easy to skip to different sections and understand what is being discussed. It gives an excellent understanding of data mining from a reliable source. I found the discussion of the current applications of data mining especially useful.
The Brushing Technique:
Wong, Pak Chung, and R. Daniel Bergeron. "Brushing Techniques for Exploring Volume Datasets." VIS '97: Proceedings of the 8th Conference on Visualization '97. Phoenix, Arizona, United States. <http://portal.acm.org/citation.cfm?id=267114>
This article, written at the Department of Computer Science at the University of New Hampshire, describes several brushing techniques: qualitative brushing, planar brushing, and volume brushing.
The Technique of Using Neural Networks:
Hill, T. & Lewicki, P. Statistics: Methods and Applications. (2006) StatSoft, Tulsa, OK
This is a book about statistics that includes a chapter on data mining, providing a general overview and a short introduction to several techniques used in data mining. It also includes chapters on several of the techniques that give a more in-depth explanation. The chapter of focus for me is Neural Networks. It gives a basic overview of artificial neural networks and their relation to the human brain and nervous system, with a brief discussion of human nerve function and firing thresholds. The authors then discuss how neural networks learn and roughly how many examples must be shown for the information to be learned well. The book then touches on how over-learning can cause problems, comparing it to fitting a polynomial of too high a degree in regression. The authors also give examples of several uses of neural networks and some problems associated with them.
Roy, Asim. Artificial Neural Networks- A Science in Trouble. (2000) SIGKDD Explorations, volume 1, issue 2, 33.
This is an article discussing shortcomings of the current concept of neural networks. The author discusses the inaccuracy of the current neural network model and claims that the model needs to change to become more similar to the human brain. The author also discusses what he believes to be over-optimism about what neural networks can do under the current model. He also argues for creating a neural network that is autonomous, instead of always needing humans to feed in the inputs and outputs in order to learn.
Data Mining & Privacy:
United States General Accounting Office. Data Mining: Federal Efforts Cover a Wide Range of Uses. "Report to the Ranking Minority Member, Subcommittee on Financial Management, the Budget, and International Security, Committee on Governmental Affairs, U.S. Senate" 24 May 2004. <http://frwebgate.access.gpo.gov/cgi-bin/getdoc.cgi?dbname=gao&docid=f:d04548.pdf> Accessed on 2 December 2005.
This source is a government report on data mining. It reviews the understanding and current use of data mining technologies. The report highlights the purposes of data mining efforts in government departments and agencies, and provides the various departments' inventories of data mining efforts. It also considers privacy concerns about the government's use of data mining technology, particularly the exploitation of personal information. The source gave me a good understanding of the privacy concerns from a reliable source.
American Civil Liberties Union Technology and Liberty Program. Total Information Compliance: The TIA's Burden Under the Wyden Amendment. May 2003. <http://www.aclu.org/privacy/index.html> Accessed on 2 December 2005.
This source reviews a proposed information system called Total Information Awareness (TIA) and tries to alert the public to the dangers the program could pose to personal privacy. TIA would monitor all transactions made by Americans in both corporate and government databases around the world. The program would have the capacity to track personal medical records, credit records, shopping patterns, travel arrangements, personal finances, and related information. Data mining techniques would then exploit this data to find patterns that supposedly link to potential terrorist activities.
Cavoukian, Ann. Data Mining: Staking a Claim on Your Privacy. Information and Privacy Commissioner/Ontario, January 1998. <http://www.ipc.on.ca/docs/datamine.pdf>
This document is a reaction to the use of data mining in ways that violate personal privacy rights. Cavoukian is the Information and Privacy Commissioner of Ontario, Canada, and she provides some suggestions on how to address these challenges. The report recommends offering customers three waiver options when they give out their information: first, to state that no data mining be allowed on the customer's data; second, to allow data mining only for internal use; and lastly, to authorize internal or external use of the data. It helped me to get an idea of what can be done to address the problem.