Optimizing Data Mining Process Using Graphic Processors
Uploaded by gurupad-hegde
Data Mining: an interdisciplinary field
“Extracting Knowledge from the Data”
• Machine learning
• Database systems
• Statistics
• Information science
• Pattern recognition
CRISP-DM: CRoss Industry Standard Process for Data Mining
http://www.crisp-dm.org/, founded in 1996
Six phases: business understanding, data understanding, data preparation, modeling, evaluation, deployment
Applications:
• Financial data analysis
• Telecommunications
• Retail industry
• Healthcare and biomedical research
• Web data mining
Challenges:
• Scalability
• Dimensionality
• Complex data
• Data quality
• Data ownership
Architecture difference between GPU and CPU
• More transistors for data processing
• Many-core (hundreds of cores)
GPGPU: general-purpose computation on the GPU, in applications “other than 3D graphics”
• Flexible and programmable: the GPU fully supports vectorized floating-point operations at IEEE single precision
• Additional levels of programmability emerge with every GPU generation (about every 18 months)
• An attractive platform for general-purpose computation
Thread block: “a batch of threads that can cooperate together by efficiently sharing data through some fast shared memory and synchronizing their execution to coordinate memory accesses.”
Example of block ID: a block (x, y) of a grid of DIM(X, Y) has block ID x + y·X
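The block ID formula above can be illustrated with a small sketch. It is plain Python rather than GPU code, and the function name `linear_block_id` and its parameters are assumptions made for illustration only.

```python
def linear_block_id(x: int, y: int, dim_x: int, dim_y: int) -> int:
    """Row-major linear ID of block (x, y) in a grid of dim_x by dim_y blocks."""
    if not (0 <= x < dim_x and 0 <= y < dim_y):
        raise ValueError("block index lies outside the grid")
    # Each full row of the grid contributes dim_x blocks.
    return x + y * dim_x


# IDs for a grid that is 3 blocks wide (X = 3) and 2 blocks high (Y = 2).
grid_ids = [[linear_block_id(x, y, 3, 2) for x in range(3)] for y in range(2)]
print(grid_ids)  # [[0, 1, 2], [3, 4, 5]]
```

Every block in the grid receives a distinct ID between 0 and X·Y − 1, which is what makes the formula usable for indexing per-block data.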
GPU Miner http://code.google.com/p/gpuminer/
SVM for Estimation of Aqueous Solubility
Data Mining on Cloud (Nov 22nd ‘10)
An itemset is frequent if its support is not less than a user-specified threshold.
Thresholds:
• Minimum confidence (in %): how strong the bond between the items of an itemset must be
• Minimum support count (a number): how many times an itemset must occur in the database
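A minimal sketch of the two thresholds, assuming the usual association-rule definitions: the support count is the number of transactions that contain the itemset, and the confidence of a rule A → B is support(A ∪ B) / support(A). The function names and the toy transactions are illustrative and do not come from the slides.

```python
def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))


def confidence(antecedent, consequent, transactions):
    """Fraction of transactions containing `antecedent` that also contain `consequent`."""
    base = support_count(antecedent, transactions)
    if base == 0:
        return 0.0
    both = support_count(set(antecedent) | set(consequent), transactions)
    return both / base


# Toy database (an assumption for illustration).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

min_support_count = 2
count = support_count({"bread", "milk"}, transactions)
print(count, count >= min_support_count)  # 2 True
print(f"{confidence({'bread'}, {'milk'}, transactions):.2f}")  # 0.67
```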
“if an itemset is not frequent, any of its superset is never frequent”
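The quoted property is what lets a level-wise miner discard candidates without counting them. The sketch below shows one way to express that check; `has_infrequent_subset` and the example itemsets are hypothetical names and data chosen for illustration.

```python
from itertools import combinations


def has_infrequent_subset(candidate, frequent_prev):
    """True if some (k-1)-subset of the k-item `candidate` is not in `frequent_prev`."""
    k = len(candidate)
    return any(
        frozenset(subset) not in frequent_prev
        for subset in combinations(candidate, k - 1)
    )


# Frequent 2-itemsets assumed to have been found in an earlier pass.
frequent_2 = {frozenset("ab"), frozenset("ac"), frozenset("bc"), frozenset("bd")}

print(has_infrequent_subset(frozenset("abc"), frequent_2))  # False: keep and count it
print(has_infrequent_subset(frozenset("abd"), frequent_2))  # True: {a, d} is missing, prune
```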
Apriori: an influential algorithm for mining frequent itemsets for association rules, proposed by Agrawal & Srikant at VLDB ’94
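As a rough illustration of the level-wise idea behind this algorithm, the following CPU-only sketch counts candidates of size k, keeps the frequent ones, and builds size k+1 candidates from them. It is a simplified reading, not the authors' original formulation or the GPU version discussed in these slides; the function name `apriori` and the toy data are assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support_count):
    """Return {itemset: support count} for all frequent itemsets (simplified sketch)."""
    transactions = [frozenset(t) for t in transactions]
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count the support of every candidate of size k.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support_count}
        frequent.update(level)
        k += 1
        # Join frequent itemsets into candidates of size k ...
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # ... and keep only those whose (k-1)-subsets are all frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


transactions = [set("abc"), set("ab"), set("ac"), set("bc"), set("abc")]
result = apriori(transactions, min_support_count=3)
print(len(result))                # 6
print(result[frozenset("ab")])    # 3
print(frozenset("abc") in result) # False: {a, b, c} occurs only twice
```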
Horizontal data layout
Vertical data layout
Bitmap Representation
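The three layouts can be compared on a toy database. In this sketch, which assumes transaction IDs are simply list positions, the horizontal layout maps each transaction to its items, the vertical layout maps each item to the IDs of the transactions containing it, and the bitmap stores the same information as one bit per transaction. The helper names are illustrative.

```python
def to_vertical(horizontal):
    """Map each item to the list of transaction IDs (tids) that contain it."""
    vertical = {}
    for tid, items in enumerate(horizontal):
        for item in items:
            vertical.setdefault(item, []).append(tid)
    return vertical


def to_bitmaps(horizontal):
    """Map each item to an integer whose bit `tid` is set if transaction tid has the item."""
    bitmaps = {}
    for tid, items in enumerate(horizontal):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


# Horizontal layout: one entry per transaction.
horizontal = [{"a", "b"}, {"b", "c"}, {"a", "b", "c"}]

vertical = to_vertical(horizontal)
bitmaps = to_bitmaps(horizontal)
print(vertical["a"])        # [0, 2]
print(bin(bitmaps["a"]))    # 0b101
print(bin(bitmaps["c"]))    # 0b110
```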
o We have presented a GPU-based implementation of the Apriori algorithm for frequent itemset mining.
o The implementation uses a bitmap data structure to encode the transaction database on the GPU and exploits the GPU's SIMD parallelism for support counting.
o It stores the itemsets in a bitmap and runs entirely on the GPU.
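To make the bitmap-based support counting concrete, here is a CPU-only sketch. It is not the presented GPU implementation. It assumes each item has a bitmap with one bit per transaction, so the support of an itemset is the number of set bits in the bitwise AND of its items' bitmaps. On a GPU the AND and the bit count would presumably run over many machine words in parallel, while Python's arbitrary-precision integers stand in for those words here. `bitmap_support` and the sample bitmaps are hypothetical.

```python
from functools import reduce


def bitmap_support(itemset, bitmaps):
    """Support count of a non-empty `itemset`: popcount of the AND of its item bitmaps."""
    items = list(itemset)
    if not items:
        raise ValueError("itemset must contain at least one item")
    combined = reduce(lambda x, y: x & y, (bitmaps[i] for i in items))
    return bin(combined).count("1")


# Bit t of each bitmap is 1 if transaction t contains the item (3 transactions).
bitmaps = {"a": 0b101, "b": 0b111, "c": 0b110}

print(bitmap_support({"a", "b"}, bitmaps))       # 2
print(bitmap_support({"a", "c"}, bitmaps))       # 1
print(bitmap_support({"a", "b", "c"}, bitmaps))  # 1
```

Counting support this way replaces a scan over transactions with a few bitwise operations per candidate, which is the kind of regular, data-parallel work that suits SIMD hardware.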