Chapter 01. Introduction to Data Mining
Data Mining IKO42351
Course Plan Material (Bahan Rancangan Pengajaran, BRP)
Mohamad Ivan Fanany, Dr. Eng.
Lecture Introduction
● Goals and Objectives
● Textbooks
● Syllabus
● Evaluation
● Lecture Plans
● Rules
Goals and Objectives
● After finishing this course, students are expected to understand the concepts, tools, and techniques of machine learning for data mining.
● Besides acquiring a general picture of the most recent developments in data mining, students are also expected to understand the techniques in depth and to appreciate their strengths and applicability by actively running their own experiments, both individually and as members of a team.
Textbooks
Major textbook before the midterm exam (UTS)
Programming book
1. Introduction to Data Mining, Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Addison-Wesley, 2006
2. R and Data Mining: Examples and Case Studies, Yanchang Zhao, 2013
Syllabus (Weekly)
1) Introduction
2) Data
3) Exploring Data
4) Classification: Basic Concepts, Decision Tree, and Model Evaluation
5) Classification: Alternative Techniques
6) Association: Basic Concepts and Algorithms
7) Association Analysis: Advanced Concepts
8) Cluster Analysis
9) Anomaly Detection
Midterm exam (UTS): Witten Ch. 1-7
Final exam (UAS): Kumar Ch. 6-8, Witten Ch. 8+
Evaluation
1. Individual assignments (homework, PR): 8 assignments = 16%
2. Group assignment (TK): 1 assignment = 14%
3. Midterm exam (UTS) = 35%
4. Final exam (UAS) = 35%
5. Bonus (class participation, pop quizzes, etc.) = ++
6. Total: 100% ++
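As a rough illustration of how the weights above combine, here is a minimal Python sketch. It assumes every component is scored on a 0-100 scale and that the eight homework assignments count equally. Neither assumption is stated in the slides. The bonus is left out because the slides do not say how it is applied, and the function name and sample scores are hypothetical.

```python
# Weights from the evaluation list, as fractions of the final grade (bonus excluded).
WEIGHTS = {"homework": 0.16, "group": 0.14, "midterm": 0.35, "final": 0.35}


def final_grade(homework_scores, group_score, midterm_score, final_score):
    """Combine component scores (each on a 0-100 scale) into a weighted total."""
    # Assumption: the homework assignments count equally, so they are averaged.
    homework_avg = sum(homework_scores) / len(homework_scores)
    return (
        WEIGHTS["homework"] * homework_avg
        + WEIGHTS["group"] * group_score
        + WEIGHTS["midterm"] * midterm_score
        + WEIGHTS["final"] * final_score
    )


# Hypothetical student: 80 on every homework, 90 on the group assignment,
# 70 on the midterm, and 75 on the final.
grade = final_grade([80] * 8, 90, 70, 75)
print(round(grade, 2))
```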
Rules
● Lateness tolerance: 15 minutes
● Mobile phones must be switched off
● Regarding homework (PR):
◆ All homework and assignments must use Python(x,y)
◆ For homework, write the teaching assistant's code on each homework file, and submit it according to that assistant code.
◆ Late penalty → see the BRP
R and RStudio
http://www.rstudio.com/
http://www.r-project.org/
Why Mine Data? Commercial Viewpoint
● Lots of data is being collected and warehoused
◆ Web data, e-commerce
◆ Purchases at department/grocery stores
◆ Bank/credit card transactions
● Computers have become cheaper and more powerful
● Competitive pressure is strong
◆ Provide better, customized services for an edge (e.g. in Customer Relationship Management)
Why Mine Data? Scientific Viewpoint
● Data collected and stored at enormous speeds (GB/hour)
◆ Remote sensors on a satellite
◆ Telescopes scanning the skies
◆ Microarrays generating gene expression data
◆ Scientific simulations generating terabytes of data
● Traditional techniques infeasible for raw data
● Data mining may help scientists
◆ in classifying and segmenting data
◆ in hypothesis formation
Mining Large Data Sets - Motivation
● There is often information “hidden” in the data that is not readily evident
● Human analysts may take weeks to discover useful information
● Much of the data is never analyzed at all
[Figure: chart; surviving label: Number of analysts]
What is Data Mining?
● Many definitions
◆ Non-trivial extraction of implicit, previously unknown and potentially useful information from data
◆ Exploration & analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns
What is (not) Data Mining?
What is Data Mining?
– Certain names are more prevalent in certain US locations (O’Brien, O’Rourke, O’Reilly… in the Boston area)
– Group together similar documents returned by a search engine according to their context (e.g. Amazon rainforest, Amazon.com)
What is not Data Mining?
– Look up a phone number in a phone directory
– Query a Web search engine for information about “Amazon”
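The contrast between a lookup and a discovered pattern can be sketched in a few lines of Python. This is only an illustration: the directory records, the phone numbers, and both function names are invented for this example and come from no real data set.

```python
from collections import Counter, defaultdict

# Hypothetical toy directory: (surname, city, phone) records invented for illustration.
directory = [
    ("O'Brien", "Boston", "555-0101"),
    ("O'Reilly", "Boston", "555-0102"),
    ("O'Brien", "Boston", "555-0103"),
    ("Smith", "Boston", "555-0104"),
    ("Smith", "Denver", "555-0105"),
    ("Garcia", "Denver", "555-0106"),
    ("Garcia", "Denver", "555-0107"),
    ("O'Brien", "Denver", "555-0108"),
]


def lookup_phone(surname, city):
    """Not data mining: retrieve records that match a known key."""
    return [phone for s, c, phone in directory if s == surname and c == city]


def surname_share_by_city(records):
    """Closer to data mining: summarise which surnames dominate in which city."""
    counts = defaultdict(Counter)
    for surname, city, _phone in records:
        counts[city][surname] += 1
    return {
        city: {s: n / sum(c.values()) for s, n in c.items()}
        for city, c in counts.items()
    }


shares = surname_share_by_city(directory)
for city, by_name in shares.items():
    top_name = max(by_name, key=by_name.get)
    print(city, top_name, by_name[top_name])
```

The lookup answers a question whose form is known in advance. The second function surfaces a regularity (which surname is most common where) that nobody asked for by name, although real data mining would do this at a far larger scale and test whether the pattern is significant.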
Origins of Data Mining
● Draws ideas from machine learning/AI, pattern recognition, statistics, and database systems
● Traditional techniques may be unsuitable due to
◆ Enormity of data
◆ High dimensionality of data
◆ Heterogeneous, distributed nature of data
[Figure labels: Machine Learning/Pattern Recognition; Statistics/AI; Data Mining; Database systems]
Search vs. Discovery
                           Search (goal-oriented)   Discover (opportunistic)
Structured Data            Data Retrieval           Data Mining
Unstructured Data (Text)   Information Retrieval    Text Mining
© 2002, AvaQuest Inc.
Data Mining = KDD: Knowledge ‘Discovery’ from DB
Data Mining Tasks
● Prediction Methods
◆ Use some variables to predict unknown or future values of other variables.
● Description Methods
◆ Find human-interpretable patterns that describe the data.
From [Fayyad et al.], Advances in Knowledge Discovery and Data Mining, 1996
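To make the two families concrete, the sketch below pairs one predictive method (a least-squares line that predicts an unseen value) with one descriptive method (splitting values into two groups with a 1-D two-means procedure). Both are stand-ins chosen for brevity, not methods the slides prescribe. The data and the function names are hypothetical, and the grouping function assumes at least two distinct values.

```python
def fit_line(xs, ys):
    """Predictive: fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def two_means_1d(values, iterations=10):
    """Descriptive: split 1-D values into a low group and a high group.

    Assumes at least two distinct values, so neither group ends up empty.
    """
    low_center, high_center = min(values), max(values)
    for _ in range(iterations):
        # Assign each value to the nearer of the two centers.
        low = [v for v in values if abs(v - low_center) <= abs(v - high_center)]
        high = [v for v in values if abs(v - low_center) > abs(v - high_center)]
        # Move each center to the mean of its group.
        low_center = sum(low) / len(low)
        high_center = sum(high) / len(high)
    return sorted(low), sorted(high)


# Hypothetical data: x and y follow y = 2x exactly, so the fit is easy to check.
slope, intercept = fit_line([1, 2, 3, 4], [2, 4, 6, 8])
prediction = slope * 5 + intercept  # predicted y for the unseen x = 5

# Hypothetical measurements that visibly fall into two groups.
low_group, high_group = two_means_1d([1.0, 1.2, 0.8, 9.0, 9.5, 10.1])
print(prediction, low_group, high_group)
```

The first function outputs a value for a case that was never observed. The second only reorganises the observed data into a form a person can read.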
Data Mining Tasks...
● Classification [Predictive]
● Clustering [Descriptive]
● Association Rule Discovery [Descriptive]
● Sequential Pattern Discovery [Descriptive]
● Regression [Predictive]
● Deviation Detection [Predictive]
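As one example from the list above, association rule discovery scores a rule such as {milk, diapers} → {beer} by its support and confidence. The sketch below computes both for a toy market-basket data set. The transactions are invented for illustration and the function names are hypothetical; this is not an algorithm given in the slides.

```python
# Hypothetical market-basket transactions, each a set of purchased items.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def count_containing(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    return sum(1 for basket in baskets if itemset <= basket)


def support(antecedent, consequent, baskets):
    """Fraction of all baskets that contain both sides of the rule."""
    return count_containing(antecedent | consequent, baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Among baskets containing the antecedent, the fraction that also contain the consequent."""
    both = count_containing(antecedent | consequent, baskets)
    return both / count_containing(antecedent, baskets)


rule_lhs, rule_rhs = {"milk", "diapers"}, {"beer"}
rule_support = support(rule_lhs, rule_rhs, transactions)
rule_confidence = confidence(rule_lhs, rule_rhs, transactions)
print(rule_support, round(rule_confidence, 3))
```

Here two of the five baskets contain milk, diapers, and beer together, and three contain milk and diapers, so the rule has support 2/5 and confidence 2/3.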
Do you want to be a Miner?
[Figure: diagram, top to bottom: Wisdom, Knowledge, Information, Data; additional label: Pattern]
Why Do We Need Data Mining?
[Figure labels: The Internet; Storage (Increased Capacity, Lower Cost, Faster... & Faster...); DATA EXPLOSION; DATA MINING; Wisdom, Knowledge, Information, Data; Competitive Advantages]
Data Mining and Machine Learning
[Figure labels: MACHINE LEARNING; DATA MINING; MULTI-SOURCE; MULTI-TYPE; ENSEMBLE LEARNING; MULTI-DIMENSION; SPATIO-TEMPORAL; BIG DATA; DEEP LEARNING]
Data Mining and Database
[Figure labels: DATABASE; DATA MINING; DATA WAREHOUSE; DATA CLEANING; CLUSTER ANALYSIS; DATA CUBE OLAP; ASSOCIATION ANALYSIS; BIG DATA]
Evolution of Database Technology
Financial Reporting
Another Dashboard