A Map-Reduce System with an Alternate API for Multi-Core Environments
Wei Jiang, Vignesh T. Ravi and Gagan Agrawal
April 20, 2023

Outline
• Introduction
• MapReduce
• Generalized Reduction
• System Design and Implementation
• Experiments
• Related Work
• Conclusion
Motivation
• Growing need for analysis of large-scale data, both scientific and commercial
• Data-Intensive Supercomputing (DISC)
• Map-Reduce has received a lot of attention in the database, data mining, and high-performance computing communities (e.g., this conference!)
Map-Reduce: Positives and Questions
• Positives:
  - Simple API, based on functional languages; very easy to learn
  - Support for fault tolerance, important for very large-scale clusters
• Questions:
  - Performance? Comparison with other approaches?
  - Suitability for different classes of applications?
Class of Data-Intensive Applications
• Many different types of applications
  - Data-center applications: data scans, sorting, indexing
  - More "compute-intensive" data-intensive applications: machine learning, data mining, NLP; Map-Reduce/Hadoop is widely used for this class
  - Standard database operations: a SIGMOD 2009 paper compares Hadoop with databases and OLAP systems
• What is Map-Reduce suitable for? What are the alternatives?
  - MPI/OpenMP/Pthreads: too low-level?
This Paper
• Proposes MATE (a Map-Reduce system with an AlternaTE API) based on Generalized Reduction
  - Phoenix implemented Map-Reduce for shared-memory systems
  - MATE adopts Generalized Reduction, first proposed in FREERIDE, developed at Ohio State (2001-2003)
  - Observed API similarities and subtle differences between MapReduce and Generalized Reduction
• Comparison for data mining applications
  - Compare performance and API
  - Understand performance overheads
• Will an alternative API be better for "Map-Reduce"?
Map-Reduce Execution
Phoenix Implementation
• Based on the same principles as MapReduce, but targets shared-memory systems
• Consists of:
  - A simple API visible to application programmers: users define functions such as splitter, map, and reduce
  - An efficient runtime that handles low-level details: parallelization, resource management, fault detection and recovery
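The split/map/reduce division of labor that Phoenix exposes can be sketched with a sequential word count. This is an illustrative Python model of the programming model only, not Phoenix's actual C API; all function names here are placeholders:

```python
from collections import defaultdict

def splitter(data, num_splits):
    # Divide the input into roughly equal chunks for the workers.
    n = max(1, len(data) // num_splits)
    return [data[i:i + n] for i in range(0, len(data), n)]

def map_func(chunk):
    # Emit one (key, 1) pair per word in the chunk.
    return [(word, 1) for word in chunk]

def reduce_func(key, values):
    return key, sum(values)

def map_reduce(data, num_splits=4):
    # Map phase: each split produces intermediate (key, value) pairs,
    # which the runtime groups by key.
    intermediate = defaultdict(list)
    for chunk in splitter(data, num_splits):
        for key, value in map_func(chunk):
            intermediate[key].append(value)
    # Reduce phase: each key's value list is reduced independently.
    return dict(reduce_func(k, v) for k, v in intermediate.items())
```

In Phoenix the map and reduce invocations run on worker threads and the grouping of intermediate pairs is done by the runtime; the sketch above keeps only the dataflow.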
Phoenix Runtime
Comparing Processing Structures
• The Reduction Object represents the intermediate state of the execution
• The reduce function is commutative and associative
• Sorting and grouping overheads are eliminated by the reduction function/object
Observations on Processing Structure
• Map-Reduce is based on a functional idea: it does not maintain state
  - This can lead to overheads in managing intermediate results between map and reduce
  - Map could generate intermediate results of very large size
• The MATE API is based on a programmer-managed reduction object
  - Not as 'clean', but avoids sorting of intermediate results
  - Can also help shared-memory parallelization
  - Helps better fault recovery
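The generalized-reduction structure can be sketched as a fold of each input element directly into a programmer-managed reduction object, so no intermediate (key, value) pairs need to be grouped or sorted. A minimal Python sketch, with illustrative names:

```python
def reduction(element, reduction_object):
    # Fold one input element directly into the shared state. Because the
    # update is commutative and associative, elements can be processed in
    # any order (and, in MATE, by any thread) without grouping or sorting.
    reduction_object[element] = reduction_object.get(element, 0) + 1

def generalized_reduction(data):
    reduction_object = {}  # programmer-managed intermediate state
    for element in data:
        reduction(element, reduction_object)
    return reduction_object
```

Compared with map-reduce, the intermediate state here is a single fixed structure updated in place, which is the source of the memory and sorting savings claimed on this slide.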
An Example
Apriori pseudo-code using MATE
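As a hedged sketch of the idea behind this example, one Apriori counting pass in the generalized-reduction style keeps all candidate support counts in a single reduction object that each transaction updates in place; the function and data shapes below are illustrative assumptions, not the paper's code:

```python
def apriori_pass(transactions, candidates):
    # One Apriori iteration, MATE-style: the support counts for all
    # candidate itemsets form one reduction object, and every transaction
    # accumulates into it directly (no intermediate pairs are emitted).
    support = {cand: 0 for cand in candidates}
    for txn in transactions:
        items = set(txn)
        for cand in candidates:
            if set(cand) <= items:     # candidate contained in transaction
                support[cand] += 1
    return support
```

A later step would keep only the candidates whose count meets the support threshold and generate the next generation of candidates, as in standard Apriori.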
Example – Now with Phoenix
Apriori pseudo-code using MapReduce
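For contrast, the same counting pass in map-reduce style first materializes intermediate (candidate, 1) pairs and then reduces them by key, which is the source of the intermediate-result management overhead discussed earlier. Again a hedged Python sketch, not Phoenix's actual code:

```python
from collections import defaultdict

def apriori_pass_mapreduce(transactions, candidates):
    # Map phase: each transaction emits one (candidate, 1) pair per
    # candidate itemset it contains; these pairs are the intermediate
    # results that the runtime must later group by key.
    intermediate = defaultdict(list)
    for txn in transactions:
        items = set(txn)
        for cand in candidates:
            if set(cand) <= items:
                intermediate[cand].append(1)
    # Reduce phase: sum the emitted counts for every candidate.
    return {cand: sum(ones) for cand, ones in intermediate.items()}
```

The results match the generalized-reduction version, but the size of `intermediate` grows with the number of matches rather than staying fixed at one counter per candidate.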
System Design and Implementation
• Basic dataflow of MATE (MapReduce with AlternaTE API) in the Full Replication scheme
• Data structures in the MATE scheduler: used to communicate between the user code and the runtime
• Three sets of functions in MATE:
  - Internally used functions
  - MATE API provided by the runtime
  - MATE API to be defined by the users
• Implementation considerations
MATE Runtime Dataflow
• Basic one-stage dataflow (Full Replication scheme)
Data Structures (I): scheduler_args_t basic fields

Field       Description
Input_data  Input data pointer
Data_size   Input dataset size
Data_type   Input data type
Stage_num   Computation-stage number (iteration number)
Splitter    Pointer to the splitter function
Reduction   Pointer to the reduction function
Finalize    Pointer to the finalize function
Data Structures (II): scheduler_args_t optional fields for performance tuning

Field                  Description
Unit_size              # of bytes for one element
L1_cache_size          # of bytes of L1 data cache
Model                  Shared-memory parallelization model
Num_reduction_workers  Max # of reduction worker threads
Num_procs              Max # of processor cores used
Functions (I): transparent to users (R = required; O = optional)

Function                                                     R/O
static inline void * schedule_tasks(thread_wrapper_arg_t *)  R
static void * combination_worker(void *)                     R
static int array_splitter(void *, int, reduction_args_t *)   R
void clone_reduction_object(int num)                         R
static inline int isCpuAvailable(unsigned long, int)         R
Functions (II): APIs provided by the runtime

Function                                                      R/O
int mate_init(scheduler_args_t * args)                        R
int mate_scheduler(void * args)                               R
int mate_finalize(void * args)                                O
void reduction_object_pre_init()                              R
int reduction_object_alloc(int size) (returns the object id)  R
void reduction_object_post_init()                             R
void accumulate(int id, int offset, void * value)             O
void reuse_reduction_object()                                 O
void * get_intermediate_result(int iter, int id, int offset)  O
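The reduction-object calls in the table above can be modeled with a toy store: reduction_object_alloc returns an object id, and accumulate applies a commutative update at an (id, offset) slot. The Python sketch below only illustrates these semantics; MATE's actual runtime is C and manages per-thread object copies and combination, which this toy omits:

```python
class ReductionObjectStore:
    # Toy model of the reduction-object API: objects are flat buffers
    # addressed by (id, offset), mirroring the names in the table above.
    # The implementation is an illustrative guess, not MATE's code.
    def __init__(self):
        self._objects = []

    def reduction_object_alloc(self, size):
        # Allocate a zero-filled object and return its id.
        self._objects.append([0] * size)
        return len(self._objects) - 1

    def accumulate(self, obj_id, offset, value):
        # Commutative, associative update of one slot.
        self._objects[obj_id][offset] += value

    def get(self, obj_id, offset):
        return self._objects[obj_id][offset]
```

Because every accumulate is commutative and associative, any interleaving of updates from different worker threads yields the same final object, which is what lets MATE skip sorting and grouping.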
Functions (III): APIs defined by the user

Function                                            R/O
int (*splitter_t)(void *, int, reduction_args_t *)  O
void (*reduction_t)(reduction_args_t *)             R
void (*combination_t)(void *)                       O
void (*finalize_t)(void *)                          O
Implementation Considerations
• Focus on the API differences in the programming models
• Data partitioning: splits are dynamically assigned to worker threads, as in Phoenix
• Buffer management: two temporary buffers, one for reduction objects and the other for combination results
• Fault tolerance: re-executes failed tasks; checkpointing may be a better solution, since the reduction object maintains the computation state
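The dynamic data-partitioning scheme described above (splits handed out to worker threads on demand) can be sketched as follows; the split size, the shared counter, and the sum-reduction payload are illustrative assumptions, not MATE's actual implementation:

```python
import threading

def dynamic_partition(data, num_threads, split_size):
    # Splits are claimed dynamically: each worker atomically takes the next
    # split index, so faster threads simply process more splits.
    splits = [data[i:i + split_size] for i in range(0, len(data), split_size)]
    next_split = [0]
    lock = threading.Lock()
    totals = [0] * num_threads   # one private partial result per thread

    def worker(tid):
        while True:
            with lock:
                idx = next_split[0]
                next_split[0] += 1
            if idx >= len(splits):
                return
            totals[tid] += sum(splits[idx])   # stand-in for the reduction

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Final combination of the per-thread partial results.
    return sum(totals)
```

Because the per-split reduction is commutative and associative, the final combined result is the same regardless of which thread processed which split.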
Experiments Design
• For comparison, we used three data mining applications: KMeans, PCA, Apriori
• Also evaluated the single-node performance of Hadoop on KMeans and Apriori
• Experiments on two multi-core platforms:
  - 8 cores on one WCI node (Intel CPUs)
  - 16 cores on one AMD node (AMD CPUs)
Results: Data Mining (I)
• K-Means: 400MB dataset, 3-dimensional points, k = 100, on one WCI node with 8 cores
• [Chart: avg. time per iteration (sec) vs. # of threads (1, 2, 4, 8) for Phoenix, MATE, and Hadoop; 2.0 speedup]
Results: Data Mining (II)
• K-Means: 400MB dataset, 3-dimensional points, k = 100, on one AMD node with 16 cores
• [Chart: avg. time per iteration (sec) vs. # of threads (1, 2, 4, 8, 16) for Phoenix, MATE, and Hadoop; 3.0 speedup]
Results: Data Mining (III)
• PCA: 8000 x 1024 matrix, on one WCI node with 8 cores
• [Chart: total time (sec) vs. # of threads (1, 2, 4, 8) for Phoenix and MATE; 2.0 speedup]
Results: Data Mining (IV)
• PCA: 8000 x 1024 matrix, on one AMD node with 16 cores
• [Chart: total time (sec) vs. # of threads (1, 2, 4, 8, 16) for Phoenix and MATE; 2.0 speedup]
Results: Data Mining (V)
• Apriori: 1,000,000 transactions, 3% support, on one WCI node with 8 cores
• [Chart: avg. time per iteration (sec) vs. # of threads (1, 2, 4, 8) for Phoenix, MATE, and Hadoop]
Results: Data Mining (VI)
• Apriori: 1,000,000 transactions, 3% support, on one AMD node with 16 cores
• [Chart: avg. time per iteration (sec) vs. # of threads (1, 2, 4, 8, 16) for Phoenix, MATE, and Hadoop]
Observations
• The MATE system achieves reasonable to significant speedups for all three data mining applications
• In most cases it outperforms Phoenix and Hadoop significantly; it is only slightly better in one case
• The reduction object helps reduce the memory overhead associated with managing the large set of intermediate results between map and reduce
Related Work
• Lots of work on improving and generalizing MapReduce's API
  - Industry: Dryad/DryadLINQ from Microsoft, Sawzall from Google, Pig/Map-Reduce-Merge from Yahoo!
  - Academia: CGL-MapReduce, Mars, MITHRA, Phoenix, Disco, Hive, ...
• These address MapReduce limitations:
  - The one-input, two-stage dataflow is extremely rigid
  - Only two high-level primitives
Conclusions
• MapReduce is simple and robust in expressing parallelism
• The two-stage computation style may cause performance losses for some subclasses of data-intensive applications
• MATE provides an alternate API based on generalized reduction
• This variation can reduce the overheads of data management and communication between Map and Reduce