INSTITUTE OF COMPUTING TECHNOLOGY How to Use BigDataBench 4.0 Jianfeng Zhan, Chen Zheng, and Wanling Gao http://prof.ict.ac.cn ICT, Chinese Academy of Sciences ASPLOS 2018, Williamsburg, VA, USA




BigDataBench ASPLOS2018

General Steps to Use BigDataBench

• Current release
  • Version 4.0 on http://prof.ict.ac.cn
• General steps to run the benchmarks
  • Prepare the BigDataBench package
  • Prepare the environment of the selected software stack
  • Generate data sets as needed
    • Each benchmark directory contains a genData* or prepare* shell script
  • Run the scripts or commands (see the User Manual)


Directory Structure

Root directory
• MicroBenchmark
• ComponentBenchmark
• Data Generator (BDGS)

Workload categories and software stacks:
• AI: TensorFlow, Caffe2
• Offline analytics: Hadoop, Spark, Flink, MPI
• Graph analytics: Hadoop, Spark, Flink, GraphLab, MPI
• NoSQL: HBase, MongoDB
• Online service: Xapian
• Data warehouse: Hive, SparkSQL, Impala
• Streaming: Spark Streaming, JStorm


BDGS - Text

• Text_datagen
  • Wikipedia generator: 3 trained models
    • lda_wiki1w, wiki_1w5, wiki_noSW_90_Sampling
  • Amazon movie review generator: 2 models
    • amazonMR1, AMR1_noSW_95_Sampling
  • Use "gen_text_data.sh"

Wiki example: e.g. lda_wiki1w, e.g. 10, e.g. 100, e.g. 10000

Amazon example: e.g. amazonMR1, e.g. 10, e.g. 100, e.g. 10000


BDGS - Graph

• Graph_datagen
  • Kronecker model
    • Weighted graph
    • Unweighted graph

e.g. Kronecker model parameter, vertices: 2^16
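
To make the Kronecker model concrete, here is a minimal Python sketch of stochastic Kronecker edge sampling for a graph with 2^scale vertices. It is not the BDGS implementation: the function name, the 2x2 initiator probabilities (borrowed from common R-MAT settings), the edge count, and the uniform random weights are all assumptions chosen for illustration.

```python
import random

def kronecker_edges(scale, num_edges, initiator=(0.57, 0.19, 0.19, 0.05),
                    weighted=False, seed=0):
    """Sample edges of a stochastic Kronecker graph with 2**scale vertices.

    Each edge is placed by descending `scale` levels of the adjacency matrix;
    at every level one quadrant is picked with the (assumed) initiator
    probabilities. Duplicate edges and self-loops are not filtered out.
    """
    rng = random.Random(seed)
    a, b, c, _d = initiator
    edges = []
    for _edge in range(num_edges):
        src = dst = 0
        for _level in range(scale):
            r = rng.random()
            src <<= 1
            dst <<= 1
            if r < a:
                pass                 # top-left quadrant: both bits stay 0
            elif r < a + b:
                dst |= 1             # top-right quadrant
            elif r < a + b + c:
                src |= 1             # bottom-left quadrant
            else:
                src |= 1             # bottom-right quadrant
                dst |= 1
        if weighted:
            edges.append((src, dst, rng.random()))  # hypothetical weight in [0, 1)
        else:
            edges.append((src, dst))
    return edges

# Example: an unweighted graph on 2^16 vertices, as in the slide's parameter.
sample = kronecker_edges(scale=16, num_edges=1000)
```

Setting weighted=True attaches a weight to each edge, mirroring the weighted and unweighted variants listed above.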


BDGS - Table

• Table_datagen
  • E-commerce data generation
    • PDGF: uses XML configuration files for data description and distribution
  • Personal resume generation


Micro Benchmark

• Offline analytics & Graph analytics
• Streaming


Offline Analytics - RandSample

• Target: run the RandSample micro benchmark
• General steps:
  • Prepare the Hadoop environment
  • Prepare input data
    • Use the Wikipedia text data generator
  • ./run_RandSample.sh
    • hadoop jar RandSample.jar RandSample <input> <output> <sample_ratio>
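
As a rough single-machine illustration of what a <sample_ratio> argument could mean, the Python sketch below keeps each input line independently with probability sample_ratio. That sampling rule and the function name are assumptions; the slide does not describe how RandSample.jar samples.

```python
import random

def rand_sample(lines, sample_ratio, seed=None):
    """Keep each line independently with probability `sample_ratio` (assumed rule)."""
    if not 0.0 <= sample_ratio <= 1.0:
        raise ValueError("sample_ratio must be between 0 and 1")
    rng = random.Random(seed)
    # rng.random() is in [0, 1), so ratio 0 keeps nothing and ratio 1 keeps everything.
    return [line for line in lines if rng.random() < sample_ratio]

# Example with made-up input lines.
text = ["line %d" % i for i in range(100)]
subset = rand_sample(text, sample_ratio=0.1, seed=42)
```

The output size is only approximately sample_ratio times the input size, because every line is decided independently.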


Offline Analytics – FFT example

• Target: run the "FFT" micro benchmark using Hadoop
• General steps:
  • Prepare the Hadoop environment
  • Prepare matrix data
    • cd /BigDataBench_V4.0_Hadoop/MicroBenchmark/OfflineAnalytics/FFT
    • sh genData_FFT.sh
      sh generate-matrix <mat_row> <mat_col> <sparsity>
  • Run FFT:
    • sh run_FFT.sh
      hadoop jar fft.jar org.fft.fft <inputfile> <outputfile1> <outputfile2> <log2_col>
      <log2_col>: auto-generated by run_FFT.sh


Streaming – Grep example

• Target: run the grep benchmark using Spark Streaming
• General steps:
  • Prepare the Spark Streaming environment
  • cd /BigDataBench_V4.0_Streaming/MicroBenchmark/Streaming/Grep
  • ./run-sparkstreaming-grep.sh
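
The idea behind a streaming grep can be sketched without Spark: input arrives in micro-batches, and each batch is filtered against a pattern. The Python generator below is only an illustration of that idea; the function name, the sample batches, and the pattern are hypothetical and do not come from the benchmark script.

```python
import re

def grep_stream(batches, pattern):
    """For each micro-batch of lines, yield the lines that match `pattern`."""
    regex = re.compile(pattern)
    for batch in batches:
        yield [line for line in batch if regex.search(line)]

# Example: two made-up micro-batches filtered for the word "error".
incoming = [["error: disk full", "ok"], ["ok", "error: timeout"]]
for matches in grep_stream(incoming, r"error"):
    print(matches)
```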


Micro Benchmark

• AI


AI – Conv2d example

• Target: run the conv2d micro benchmark using TensorFlow
• General steps:
  • Prepare the TensorFlow environment
  • Prepare image data
  • Configure the image directory in conv2d.py
  • python conv2d.py
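
For readers unfamiliar with the operation being measured, the following pure-Python sketch computes a single-channel 2D convolution with stride 1 and no padding, in the cross-correlation form (no kernel flip) that deep-learning frameworks generally use. It is not the contents of conv2d.py; the function name and the tiny input are assumptions for illustration.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (both lists of rows), stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            # Multiply the kernel with the image patch whose top-left corner is (i, j).
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        output.append(row)
    return output

# Example: a 3x3 image and a 2x2 kernel give a 2x2 output.
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, 1]]
print(conv2d_valid(image, kernel))  # [[6.0, 8.0], [12.0, 14.0]]
```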


Micro Benchmark

• NoSQL


NoSQL – Write example

• Target: run "write" operations using HBase
• General steps:
  • Prepare HBase according to the official guide
    • sh /hbase-0.94.5/bin/hbase shell
    • create 'usertable','f1','f2','f3'
  • Prepare YCSB as the workload generator
    • YCSB is in the directory BasicDatastoreOperations/ycsb-0.1.4
  • Run YCSB commands like this:
    • sh bin/ycsb load hbase -P workloads/workloadc -p threads=<thread-numbers> -p columnfamily=<family> -p recordcount=<recordcount-value> -p hosts=<hostip> -s > load.dat
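
When the load command has to be repeated with different settings, it can help to assemble it programmatically. The Python sketch below only builds the argument list using the flags shown on the slide; it does not run YCSB. The helper name and the example values (4 threads, family f1, 1000 records, host 127.0.0.1) are hypothetical.

```python
import shlex

def ycsb_load_command(threads, column_family, record_count, host_ip,
                      workload="workloads/workloadc"):
    """Build the YCSB load command from the slide as an argument list."""
    return [
        "sh", "bin/ycsb", "load", "hbase",
        "-P", workload,
        "-p", f"threads={threads}",
        "-p", f"columnfamily={column_family}",
        "-p", f"recordcount={record_count}",
        "-p", f"hosts={host_ip}",
        "-s",
    ]

# Example with made-up values; the redirect to load.dat is appended for display only.
cmd = ycsb_load_command(threads=4, column_family="f1",
                        record_count=1000, host_ip="127.0.0.1")
print(shlex.join(cmd) + " > load.dat")
```

The column family passed here should be one of those created in the HBase shell step (f1, f2 or f3 in the slide's example).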


Component Benchmark

• AI


AI – Alexnet example

• Target: run the "Alexnet" component benchmark using TensorFlow
• General steps:
  • Prepare the TensorFlow environment
  • Run Alexnet:
    • cd /BigDataBench_V4.0_Tensorflow/ComponentBenchmark/AI/Alexnet
    • python alexnet_cifar10.py
    • Choose the CPU or GPU environment


Component Benchmark

• Offline analytics & Graph analytics
• Streaming


Offline Analytics – SIFT example

• Target: run the "SIFT" component benchmark using Hadoop
• General steps:
  • Prepare the Hadoop environment
  • Prepare SIFT data
    • cd /BigDataBench_V4.0_Hadoop/ComponentBenchmark/OfflineAnalytics/SIFT
    • Put the image data under the SIFT directory
    • sh genData_SIFT.sh
      hadoop jar $jarFile/hibImport.jar -h /testimage /out.hib
  • Run SIFT:
    • sh run_SIFT.sh
      hadoop jar sift.jar <out.hib> <outsif>
      <out.hib>: the data generated by genData_SIFT.sh
      <outsif>: the path where the result is saved


Streaming – Kmeans example

• Target: run the kmeans benchmark using Spark Streaming
• General steps:
  • Prepare the Spark Streaming environment
  • cd /BigDataBench_V4.0_Streaming/ComponentBenchmark/Streaming/Kmeans
  • ./run-sparkstreaming-kmeans.sh
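
One way to picture k-means on a stream is the sequential (online) update: each arriving point moves its nearest centroid toward it by a step of 1/count. The Python sketch below shows that textbook rule for one micro-batch. It is an assumption that this resembles what the benchmark script does; the function name and the sample data are hypothetical.

```python
def streaming_kmeans_update(centroids, counts, batch):
    """Update centroids in place with one micro-batch of points.

    centroids: list of coordinate lists; counts: points seen so far per centroid.
    """
    for point in batch:
        # Find the nearest centroid by squared Euclidean distance.
        nearest = min(
            range(len(centroids)),
            key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
        )
        counts[nearest] += 1
        step = 1.0 / counts[nearest]
        # Move the centroid toward the point; this keeps it at the running mean.
        centroids[nearest] = [
            c + step * (p - c) for p, c in zip(point, centroids[nearest])
        ]
    return centroids, counts

# Example: two centroids, one micro-batch of two points.
centers, seen = streaming_kmeans_update(
    [[0.0, 0.0], [10.0, 10.0]], [1, 1], [[2.0, 0.0], [10.0, 12.0]]
)
print(centers)  # [[1.0, 0.0], [10.0, 11.0]]
```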


Graph Analytics – PageRank

• Target: run the "PageRank" component benchmark using Hadoop
• General steps:
  • Prepare the Hadoop environment
  • Run the data generation script
    • cd /BigDataBench_V4.0_Hadoop/ComponentBenchmark/GraphAnalytics/PageRank
    • sh genData_PageRank.sh
  • Run PageRank:
    • sh run_PageRank.sh
      hadoop jar pegasus.PagerankNaive <inputfile> pr_tempmv pr_output <Internation> <reducers> <1024> <makesym> <new>
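
For orientation, the computation that a PageRank job performs can be sketched on a single machine as plain power iteration. The Python code below is a generic textbook version, not the Pegasus PagerankNaive implementation; the function name, the damping factor 0.85, the iteration count, and the handling of dangling nodes are assumptions.

```python
def pagerank(edges, num_nodes, damping=0.85, iterations=20):
    """Power-iteration PageRank over a list of (src, dst) edges."""
    out_degree = [0] * num_nodes
    for src, _dst in edges:
        out_degree[src] += 1
    rank = [1.0 / num_nodes] * num_nodes
    for _ in range(iterations):
        # Every node receives the teleport share first.
        new_rank = [(1.0 - damping) / num_nodes] * num_nodes
        # Rank held by nodes without out-links is spread evenly (assumed policy).
        dangling = sum(rank[v] for v in range(num_nodes) if out_degree[v] == 0)
        for src, dst in edges:
            new_rank[dst] += damping * rank[src] / out_degree[src]
        share = damping * dangling / num_nodes
        rank = [r + share for r in new_rank]
    return rank

# Example: on a 3-node cycle every node ends up with the same rank.
print(pagerank([(0, 1), (1, 2), (2, 0)], num_nodes=3))
```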


Online Service – Xapian

• Target: run searching using Xapian
• General steps:
  • 1) Install Xapian according to the user manual
    • ./build.sh to install the harness (gcc version > 4.8)
    • xapian/build.sh to install Xapian
  • 2) Configuration
    • vim xapian/run_networked.sh
  • 3) Online searching
    • Run xapian/run_networked.sh


Component Benchmark

• Data warehouse


Data Warehouse – Select example

• Target: run the "Select" benchmark using Hadoop and Hive
• General steps:
  • Prepare the Hadoop and Hive environment
  • Run the data generation script
    • cd /BigDataBench_V4.0_Hadoop/ComponentBenchmark/Datawarehouse/Select/
    • sh genData_Select.sh
  • Run Select like this:
    • sh run_Select.sh


Conclusion

• Website: http://prof.ict.ac.cn

• Please refer to the user manual for more details.
