Data Mining & Data Warehousing PresentedBy: Group 4 Kirk Bishop Joe Draskovich Amber Hottenroth...

16
Data Mining Data Mining & & Data Warehousing Data Warehousing Presented Presented By: By: Group 4 Group 4 Kirk Bishop Joe Draskovich Amber Kirk Bishop Joe Draskovich Amber Hottenroth Hottenroth Brandon Lee Stephen Pesavento Brandon Lee Stephen Pesavento

Transcript of Data Mining & Data Warehousing PresentedBy: Group 4 Kirk Bishop Joe Draskovich Amber Hottenroth...

Page 1: Data Mining & Data Warehousing PresentedBy: Group 4 Kirk Bishop Joe Draskovich Amber Hottenroth Brandon Lee Stephen Pesavento.

Data MiningData Mining&&

Data WarehousingData Warehousing

Presented Presented

By:By:

Group 4Group 4

Kirk Bishop Joe Draskovich Amber Hottenroth Kirk Bishop Joe Draskovich Amber Hottenroth

Brandon Lee Stephen PesaventoBrandon Lee Stephen Pesavento

Page 2: Data Mining & Data Warehousing PresentedBy: Group 4 Kirk Bishop Joe Draskovich Amber Hottenroth Brandon Lee Stephen Pesavento.

IntroductionIntroduction

What is Data Mining?

Data Mining is the process of collecting large amounts of raw data and transforming that data into useful information.

Data Warehousing?

A Data Warehouse is a computerized collection of mined data.

Page 3: Data Mining & Data Warehousing PresentedBy: Group 4 Kirk Bishop Joe Draskovich Amber Hottenroth Brandon Lee Stephen Pesavento.

ObjectivesObjectives

• Explore the business applications of data mining & Explore the business applications of data mining & warehousingwarehousing

• Explain the advantages & disadvantages Explain the advantages & disadvantages

• Uncover software used in data mining.Uncover software used in data mining.

• Find what data mining is used for.Find what data mining is used for.

• Discover current trends, regulation, and future uses of the Discover current trends, regulation, and future uses of the technology.technology.

Page 4: Data Mining & Data Warehousing PresentedBy: Group 4 Kirk Bishop Joe Draskovich Amber Hottenroth Brandon Lee Stephen Pesavento.

What is Data Mining?What is Data Mining?

Data mining is the practice of Data mining is the practice of searching through large amounts searching through large amounts

of computerized data to find useful of computerized data to find useful patterns or trends (American patterns or trends (American Heritage Dictionary, 2008). Heritage Dictionary, 2008).

Page 5: Data Mining & Data Warehousing PresentedBy: Group 4 Kirk Bishop Joe Draskovich Amber Hottenroth Brandon Lee Stephen Pesavento.

Data Mining ApplicationsData Mining Applications

BankingDetect Fraudulent Activity

InsuranceRisk Assessment

Medicine/HealthcareEnhance Research

RetailTrack consumer buying trends

Page 6: Data Mining & Data Warehousing PresentedBy: Group 4 Kirk Bishop Joe Draskovich Amber Hottenroth Brandon Lee Stephen Pesavento.

Data Mining Software ApplicationsData Mining Software Applications

-SPSS/ SPSS Clementine is a data mining toolkit with a wide variety of mining methods to choose from. This package is designed to be used by the normal person with the desire to do their own data mining projects

-Previously known as yale, RapidMiner is a data mining suite which makes a wide range of techniques available. . Its easy to use interface makes it accessible for general use, while its flexibility and extensibility make it suitable for academic use and adds a number of useful visualization methods.

-‘SAS Enterprise Miner,’ a tool provided in the package suite, has an easy to use data-flow GUI. A wide variety of data mining techniques are supported including: Decision Trees; Neural Networks; Market Basket Analysis; Clustering; Web Log Analysis; Regression; and Rule Induction.

Page 7: Data Mining & Data Warehousing PresentedBy: Group 4 Kirk Bishop Joe Draskovich Amber Hottenroth Brandon Lee Stephen Pesavento.

Cross-Industry Standard Process for Data MiningCross-Industry Standard Process for Data Mining

- Understanding the businessUnderstanding the business

- Understanding the dataUnderstanding the data

- Data preparationData preparation

- ModelingModeling

- EvaluationEvaluation

- DeploymentDeployment

CRISP-DMCRISP-DM

Page 8: Data Mining & Data Warehousing PresentedBy: Group 4 Kirk Bishop Joe Draskovich Amber Hottenroth Brandon Lee Stephen Pesavento.

Data WarehousingData Warehousing

AdvantagesAdvantages•Access to information Access to information •Data InconsistencyData Inconsistency•Decrease Computing CostDecrease Computing Cost•Productivity IncreaseProductivity Increase• Increase company profitsIncrease company profits

Page 9: Data Mining & Data Warehousing PresentedBy: Group 4 Kirk Bishop Joe Draskovich Amber Hottenroth Brandon Lee Stephen Pesavento.

Data WarehousingData Warehousing

DisadvantagesDisadvantages• Data must be cleaned, loaded, and Data must be cleaned, loaded, and

extractedextracted• 80% of the overall process80% of the overall process

• User VariabilityUser Variability• Proper TrainingProper Training

• Difficult to Maintain Difficult to Maintain • Incongruence among systemsIncongruence among systems

Page 10: Data Mining & Data Warehousing PresentedBy: Group 4 Kirk Bishop Joe Draskovich Amber Hottenroth Brandon Lee Stephen Pesavento.

Data MiningData Mining

AdvantagesAdvantages

• Improves Customer Satisfaction/serviceImproves Customer Satisfaction/service

• Saves Time and MoneySaves Time and Money

• Increases Sales EffectivenessIncreases Sales Effectiveness

• Increases profitabilityIncreases profitability

Page 11: Data Mining & Data Warehousing PresentedBy: Group 4 Kirk Bishop Joe Draskovich Amber Hottenroth Brandon Lee Stephen Pesavento.

Data MiningData Mining

DisadvantagesDisadvantages• Require skilled technical users to Require skilled technical users to

interpret and analyze data from interpret and analyze data from warehousewarehouse

• Validity of the patterns Validity of the patterns • Related to real world circumstancesRelated to real world circumstances

• Unable to Identify Casual RelationshipsUnable to Identify Casual Relationships• Reserved for the few instead of the manyReserved for the few instead of the many

Page 12: Data Mining & Data Warehousing PresentedBy: Group 4 Kirk Bishop Joe Draskovich Amber Hottenroth Brandon Lee Stephen Pesavento.

Current IssuesCurrent Issues

Data QualityData Quality• Duplicated recordsDuplicated records• Lack of Data StandardsLack of Data Standards• Human ErrorHuman Error

InoperabilityInoperability• Lack of communications among existing Lack of communications among existing

systemssystems

Mission CreepMission Creep

Page 13: Data Mining & Data Warehousing PresentedBy: Group 4 Kirk Bishop Joe Draskovich Amber Hottenroth Brandon Lee Stephen Pesavento.

Trends & Current IssuesTrends & Current Issues•4 Major Trends

•Data – growing amount collected to be sifted•Hardware – growing performance & storage•Scientific Computing – theory, experiment, simulation•Business – Meet higher standard in order to foresee risks, opportunities, and benefits for the company

•Growing quickly due to renewal with new methodology frequently discovered•Applications of uses & methodology to medical, marketing, operations, & others

•The government is closely reviewing the uses of data mining, due to the possibilities both good and bad

•Counterterrorism data mining has been done, but in some instances has been deemed a violation of privacy

Page 14: Data Mining & Data Warehousing PresentedBy: Group 4 Kirk Bishop Joe Draskovich Amber Hottenroth Brandon Lee Stephen Pesavento.

Future Research PossibilitiesFuture Research Possibilities

•Government’s uses for data mining.Government’s uses for data mining.•National SecurityNational Security•Terrorism DetectionTerrorism Detection

•Identity theft through data mining.Identity theft through data mining.

Page 15: Data Mining & Data Warehousing PresentedBy: Group 4 Kirk Bishop Joe Draskovich Amber Hottenroth Brandon Lee Stephen Pesavento.

Conclusion/AnalysisConclusion/Analysis

• Data mining is the extraction of information that can Data mining is the extraction of information that can predict future trends & behaviorspredict future trends & behaviors

• Requires a large amount of data to be collected, and Requires a large amount of data to be collected, and then stored in data warehousethen stored in data warehouse

• Possible violation of privacy in some circumstancesPossible violation of privacy in some circumstances

• Government is getting involved with regulation, Government is getting involved with regulation, despite the counterterrorism program being a despite the counterterrorism program being a possible violationpossible violation

Page 16: Data Mining & Data Warehousing PresentedBy: Group 4 Kirk Bishop Joe Draskovich Amber Hottenroth Brandon Lee Stephen Pesavento.

QuestionsQuestions

• QuestionsQuestions

• QuestionsQuestions

• QuestionsQuestions

• QuestionsQuestions

• QuestionsQuestions

• QuestionsQuestions

• QuestionsQuestions