Optimizing data mining process using graphic processors


Data Mining: An interdisciplinary field

“Extracting Knowledge from the Data”

Machine Learning

Database Systems

Statistics

Information Science

Pattern Recognition


CRISP-DM: CRoss Industry Standard Process for Data Mining

http://www.crisp-dm.org/ (founded in 1996)

Six phases


Financial data analysis

Telecommunications

Retail Industry

Healthcare and biomedical research

Web Data Mining


Scalability

Dimensionality

Complex Data

Data Quality

Data Ownership


Architecture difference between GPU and CPU

• More transistors for data processing
• Many-core (hundreds of cores)


General-purpose computation using the GPU, in applications “other than 3D graphics”

• Flexible and programmable: the GPU fully supports vectorized floating-point operations at IEEE single precision
• Additional levels of programmability are emerging with every generation of GPU (about every 18 months)
• An attractive platform for general-purpose computation


Thread block: “a batch of threads that can cooperate together by efficiently sharing data through some fast shared memory and synchronizing their execution to coordinate memory accesses.”

Example of block ID: a block (x, y) of a grid of dimension (X, Y) has block ID x + y · X
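
The block ID formula above is a row-major linearization of the 2D block coordinates. A minimal sketch in plain Python (not CUDA) may make the arithmetic concrete; the function names `block_id` and `block_coords` are hypothetical and chosen only for this illustration.

```python
from typing import Tuple


def block_id(x: int, y: int, grid_dim_x: int) -> int:
    """Linear ID of block (x, y) in a grid that is grid_dim_x blocks wide."""
    return x + y * grid_dim_x


def block_coords(linear_id: int, grid_dim_x: int) -> Tuple[int, int]:
    """Inverse mapping: recover (x, y) from the linear block ID."""
    return linear_id % grid_dim_x, linear_id // grid_dim_x


# Enumerate every block of a grid with X = 3 columns and Y = 2 rows.
X, Y = 3, 2
ids = [[block_id(x, y, X) for x in range(X)] for y in range(Y)]
print(ids)  # [[0, 1, 2], [3, 4, 5]]
```

Each row of the printed grid corresponds to one value of y, so consecutive IDs walk along x first and then move to the next row.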


GPU Miner http://code.google.com/p/gpuminer/

SVM for Estimation of Aqueous Solubility

Data Mining on Cloud (Nov 22nd ‘10)


An itemset is frequent if its support is not less than a threshold specified by the user.

Thresholds:

• Minimum Confidence (in %): the bond between the items of an itemset
• Minimum Support Count (a number): how many times an itemset occurs in the database
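
To make the two thresholds concrete, here is a small sketch on a made-up transaction database. The slide does not spell out a formula for confidence, so the code assumes the standard association-rule definition, confidence(A → B) = support(A ∪ B) / support(A). The data and the function names are illustrative only.

```python
# Toy transaction database (illustrative data, not from the slides).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support_count(itemset, db):
    """Number of transactions that contain every item of the itemset."""
    wanted = frozenset(itemset)
    return sum(1 for t in db if wanted <= t)


def is_frequent(itemset, db, min_support_count):
    """An itemset is frequent if its support is not less than the threshold."""
    return support_count(itemset, db) >= min_support_count


def confidence(antecedent, consequent, db):
    """Assumed standard definition: support(A and B together) / support(A)."""
    a = frozenset(antecedent)
    a_count = support_count(a, db)
    if a_count == 0:
        return 0.0
    return support_count(a | frozenset(consequent), db) / a_count


print(support_count({"diapers", "beer"}, transactions))      # 3
print(is_frequent({"diapers", "beer"}, transactions, 3))     # True
print(confidence({"diapers"}, {"beer"}, transactions))       # 0.75
```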


“if an itemset is not frequent, any of its superset is never frequent”

Apriori: an influential algorithm for mining frequent itemsets for association rules.

Proposed by Agrawal & Srikant at VLDB’94
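
The pruning rule quoted above is what keeps the level-wise search tractable: a candidate of size k is only counted if all of its (k-1)-subsets were already found frequent. The following is a minimal CPU sketch of that idea, not the authors' implementation. It uses a simplified join (any two frequent (k-1)-itemsets whose union has size k) rather than the prefix-based join of the original paper, and the function name `apriori` and the sample data are assumptions for illustration.

```python
from itertools import combinations


def apriori(db, min_support_count):
    """Return {itemset: support count} for all frequent itemsets in db."""
    db = [frozenset(t) for t in db]

    def count(candidate):
        return sum(1 for t in db if candidate <= t)

    def keep_frequent(candidates):
        counted = {c: count(c) for c in candidates}
        return {c: n for c, n in counted.items() if n >= min_support_count}

    # Level 1: frequent single items.
    current = keep_frequent({frozenset([i]) for t in db for i in t})
    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join step (simplified): unions of frequent (k-1)-itemsets of size k.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = keep_frequent(candidates)
    return frequent


sample_db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c", "d"}, {"a", "b", "c"}]
result = apriori(sample_db, min_support_count=3)
for itemset in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), result[itemset])
```

In this sample, item "d" occurs only once, so no itemset containing "d" is ever generated as a candidate at level 2 or above.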


Horizontal data layout

Vertical data layout

Bitmap Representation
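
The three representations named above can be compared on a tiny example. In this sketch the horizontal layout lists the items of each transaction, the vertical layout lists the transaction IDs of each item, and the bitmap stores one bit per (item, transaction) pair. The data, the helper names, and the choice of bit t for transaction t are assumptions made for illustration.

```python
# Horizontal layout: one row per transaction; the list index is the tid.
horizontal = [
    ["a", "b"],       # tid 0
    ["b", "c", "d"],  # tid 1
    ["a", "c", "d"],  # tid 2
    ["a", "b", "d"],  # tid 3
]


def to_vertical(rows):
    """Vertical layout: item -> list of tids that contain the item."""
    vertical = {}
    for tid, items in enumerate(rows):
        for item in items:
            vertical.setdefault(item, []).append(tid)
    return vertical


def to_bitmap(rows):
    """Bitmap layout: item -> integer whose bit `tid` is set if the item occurs there."""
    bitmap = {}
    for tid, items in enumerate(rows):
        for item in items:
            bitmap[item] = bitmap.get(item, 0) | (1 << tid)
    return bitmap


def bitmap_row(bits, n_transactions):
    """Render a bitmap as a 0/1 string, tid 0 first."""
    return "".join("1" if (bits >> tid) & 1 else "0" for tid in range(n_transactions))


vertical = to_vertical(horizontal)
bitmap = to_bitmap(horizontal)
for item in sorted(vertical):
    print(item, vertical[item], bitmap_row(bitmap[item], len(horizontal)))
```

The bitmap has a fixed width per item regardless of how often the item occurs, which gives the regular memory layout that data-parallel hardware tends to favor.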


Agrawal & Srikant @ VLDB’94


o We have presented a GPU-based implementation of the Apriori algorithm for frequent itemset mining.

o This implementation employs a bitmap data structure to encode the transaction database on the GPU and utilizes the GPU's SIMD parallelism for support counting.

o Our implementation stores the itemsets in a bitmap, and runs entirely on the GPU.
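
Bitmap-based support counting, as summarized above, reduces to a bitwise AND of the item bitmaps followed by a count of the set bits. The sketch below shows that reduction on the CPU in plain Python; it is not the presented GPU code. The 32-bit word size, the helper names, and the data are assumptions. Each word is processed independently of the others, which is the kind of work that could be spread across GPU threads.

```python
WORD_BITS = 32  # assumed word size for this illustration


def pack(tids, n_transactions):
    """Pack a list of transaction IDs into a list of WORD_BITS-wide words."""
    n_words = (n_transactions + WORD_BITS - 1) // WORD_BITS
    words = [0] * n_words
    for tid in tids:
        words[tid // WORD_BITS] |= 1 << (tid % WORD_BITS)
    return words


def and_bitmaps(a, b):
    """Word-by-word AND; every word is independent of the others."""
    return [x & y for x, y in zip(a, b)]


def popcount(words):
    """Total number of set bits across all words."""
    return sum(bin(w).count("1") for w in words)


def support(itemset, item_bitmaps):
    """Support count of an itemset: popcount of the AND of its items' bitmaps."""
    items = iter(itemset)
    acc = item_bitmaps[next(items)]
    for item in items:
        acc = and_bitmaps(acc, item_bitmaps[item])
    return popcount(acc)


n = 40  # number of transactions, spans two 32-bit words
item_bitmaps = {
    "a": pack(range(0, n, 2), n),  # item a occurs in every 2nd transaction
    "b": pack(range(0, n, 3), n),  # item b occurs in every 3rd transaction
}
print(support(["a"], item_bitmaps))       # 20
print(support(["a", "b"], item_bitmaps))  # 7 (tids 0, 6, 12, 18, 24, 30, 36)
```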
