DataBench Benchmarks and Communities


Transcript of DataBench Benchmarks and Communities

Page 1: DataBench Benchmarks and Communities

EUROPEAN BIG DATA VALUE FORUM 2020

DataBench Benchmarks and Communities

2020 BenchCouncil International Symposium on Benchmarking, Measuring and Optimizing (Bench’20)

Nov 15-16, 2020

Todor Ivanov

Lead Consult

Page 2: DataBench Benchmarks and Communities


DataBench Pipeline Methodology (four steps):

▪ Data Acquisition/Collection (including Data Ingestion processing, Streaming, Data Extraction, Ingestion Storage, Different data types)
▪ Data Storage/Preparation (including Storage Retrieval/Access/Queries, Data Protection, Curation, Integration, Publication)
▪ Data Analytics/ML (including data processing for analysis, AI and Machine Learning)
▪ Data Visualisation, Action/Interaction (including data presentation environment/boundary/user action and interaction)

Processing, governance and analytics categories shown in the diagram:

▪ Streaming/Realtime Processing
▪ Interactive Processing
▪ Batch Processing
▪ Data Privacy/Security
▪ Data Governance/Mgmt
▪ Data Storage
▪ Industrial Analytics (Descriptive, Diagnostic, Predictive, Prescriptive)
▪ Machine Learning, AI, Data Science
▪ Visual Analytics

Other labels in the diagram:

▪ BDVA Reference Model
▪ Big Data and AI Landscape
▪ Big Data and AI Benchmark Ecosystem
▪ ICT Big Data PPP
▪ ICT-13-14 Projects

Projects named:

▪ I-BiDaaS
▪ TheyBuyForYou (TBFY)
▪ Track&Know
▪ DataBio
▪ DeepHealth

Page 3: DataBench Benchmarks and Communities


BenchCouncil Benchmarks

[Diagram: benchmarks and projects placed against the four steps of the DataBench Pipeline Methodology (Data Acquisition/Collection, Data Storage/Preparation, Data Analytics/ML, Data Visualisation, Action/Interaction). The step each name is placed under is not recoverable from this transcript; the number in parentheses is how many times the name appears in the diagram.]

▪ AIBench (4)
▪ BigDataBench (4)
▪ HPC AI500 (3)
▪ Edge AIBench (2)
▪ AIoTBench (2)
▪ MLPerf (2)
▪ ABench (3)
▪ Hobbit Benchmark Platform (4)
▪ Linked Data Benchmark Council (LDBC) (1)
▪ Graphalytics (2)
▪ Semantic Publishing Benchmark (SPB) (3)
▪ TheyBuyForYou (TBFY) (1)
▪ TPC-H (2)
▪ HiBench (3)
▪ I-BiDaaS (1)

Page 4: DataBench Benchmarks and Communities


[Matrix: rows are labelled PI-DV, PI-DA, PI-DS and PI-DI; columns are benchmarks, with the year labels 2015 to 2019 along the column axis; an "x" marks a row that applies to a benchmark. The individual marks are not recoverable from this transcript.]

Benchmarks named in the matrix columns:

▪ SparkBench
▪ TPCx-V
▪ IoTAbench
▪ BigFUN
▪ TPC-DS v2
▪ TPCx-BB
▪ CityBench
▪ Graphalytics
▪ Yahoo Streaming Benchmark (YSB)
▪ ShenZhen Transportation System (SZTS)
▪ DeepBench
▪ DeepMark
▪ TensorFlow Benchmarks
▪ Fathom
▪ AdBench
▪ RIoTBench
▪ Hobbit Benchmark
▪ TPCx-HS v2
▪ BigBench V2
▪ Sanzu
▪ AIM Benchmark
▪ GARDENIA
▪ Penn machine learning benchmark (PMLB)
▪ OpenML benchmark suites
▪ BenchIP
▪ Deep Learning Benchmarking Suite (DLBS)
▪ TPCx-IoT
▪ Senska
▪ DAWNBench
▪ BlockBench
▪ IDEBench
▪ ABench
▪ Stream WatDiv
▪ TERMinator Suite
▪ HERMIT
▪ MLBench Services
▪ MLBench Distributed
▪ MLPerf
▪ Training Benchmark for DNNs (TBD)
▪ PolyBench
▪ NNBench-X
▪ GDPRbench
▪ BenchIoT
▪ IoTBench
▪ Visual Road
▪ AdaBench
▪ MiDBench
▪ CBench-Dynamo
▪ Edge AIBench
▪ AIBench
▪ HPC AI500
▪ SparkAIBench
▪ AI Matrix

Mapping between the DataBench Pipeline Steps and the Benchmark Ecosystem → matrix available in the DataBench ToolBox
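A mapping of this kind can be held as a small lookup structure. The sketch below is only an illustration of how such a matrix might be queried: the row labels PI-DI, PI-DS, PI-DA and PI-DV come from the slide, but the benchmark names (`benchmark_a` etc.), the marks assigned to them, and the function names are hypothetical placeholders, not the contents or API of the DataBench Toolbox.

```python
from typing import Dict, Iterable, List, Set

# Row labels as they appear in the matrix on this slide.
PIPELINE_ROWS = ("PI-DI", "PI-DS", "PI-DA", "PI-DV")

# Hypothetical placeholder entries: each benchmark maps to the set of rows
# that would carry an "x" for it. These are NOT the real marks.
matrix: Dict[str, Set[str]] = {
    "benchmark_a": {"PI-DI", "PI-DS"},
    "benchmark_b": {"PI-DA"},
    "benchmark_c": {"PI-DI", "PI-DS", "PI-DA", "PI-DV"},
}


def benchmarks_for_step(m: Dict[str, Set[str]], step: str) -> List[str]:
    """Return the benchmarks marked for a single row label, sorted by name."""
    if step not in PIPELINE_ROWS:
        raise ValueError(f"unknown row label: {step}")
    return sorted(name for name, steps in m.items() if step in steps)


def benchmarks_covering(m: Dict[str, Set[str]], steps: Iterable[str]) -> List[str]:
    """Return the benchmarks marked for every one of the given row labels."""
    wanted = set(steps)
    unknown = wanted - set(PIPELINE_ROWS)
    if unknown:
        raise ValueError(f"unknown row labels: {sorted(unknown)}")
    return sorted(name for name, marked in m.items() if wanted <= marked)


if __name__ == "__main__":
    print(benchmarks_for_step(matrix, "PI-DA"))
    print(benchmarks_covering(matrix, ["PI-DI", "PI-DS"]))
```

Storing a set of row labels per benchmark keeps both directions of the lookup (by step, or by a combination of steps) to a single pass over the entries.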


Page 6: DataBench Benchmarks and Communities

The DataBench Toolbox includes:

Searchable

• HiBench
• SparkBench
• YCSB
• TPCx-IoT
• Yahoo Streaming Benchmark
• BigBench V2
• TPC-H
• TPC-DS
• Hadoop Workload Examples
• PigMix
• Social Network Benchmark
• WatDiv
• Sanzu
• BigDataBench
• CLASS Benchmark
• ...

Integrated & Runnable

• HiBench
• YCSB
• Yahoo! Streaming
• CLASS
• BigBench V2
• TPCx-BB (in progress)

• Classified 80+ benchmarks developed between 1999 and 2019!

• More than 80 are already searchable in the Toolbox!
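The two lists on this page can be compared directly. The following is a minimal sketch using only the names shown on the slide; the alias table is an assumption (that "Yahoo! Streaming" and "CLASS" refer to the same entries as "Yahoo Streaming Benchmark" and "CLASS Benchmark"), and the searchable list is incomplete because the slide elides it with "...".

```python
# Names copied from the slide; the searchable list is truncated there ("...").
SEARCHABLE = [
    "HiBench", "SparkBench", "YCSB", "TPCx-IoT", "Yahoo Streaming Benchmark",
    "BigBench V2", "TPC-H", "TPC-DS", "Hadoop Workload Examples", "PigMix",
    "Social Network Benchmark", "WatDiv", "Sanzu", "BigDataBench",
    "CLASS Benchmark",
]
RUNNABLE = [
    "HiBench", "YCSB", "Yahoo! Streaming", "CLASS", "BigBench V2",
    "TPCx-BB",  # marked "in progress" on the slide
]

# Assumption: the shorter names in the runnable list denote these entries.
ALIASES = {
    "Yahoo! Streaming": "Yahoo Streaming Benchmark",
    "CLASS": "CLASS Benchmark",
}


def canonical(name: str) -> str:
    """Map a short name to its longer form when an alias is known."""
    return ALIASES.get(name, name)


runnable = {canonical(name) for name in RUNNABLE}

# Listed as searchable on the slide but not as integrated and runnable.
search_only = [name for name in SEARCHABLE if name not in runnable]

# Listed as runnable but absent from the visible part of the searchable list.
runnable_not_listed = sorted(runnable - set(SEARCHABLE))

if __name__ == "__main__":
    print("Searchable only:", search_only)
    print("Runnable, not in the visible searchable list:", runnable_not_listed)
```

Because the searchable list is cut off on the slide, an entry reported by `runnable_not_listed` may simply fall in the elided part rather than be missing from the Toolbox.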
