
SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets
Lecturers: Kayvan Zarei, Shahed Mahmoodi (Azad University of Sanandaj). Professor: Dr. Kyumars Sheykh Esmaili


  • Slide 1
  • SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets. Lecturers: Kayvan Zarei, Shahed Mahmoodi. Azad University of Sanandaj. Professor: Dr. Kyumars Sheykh Esmaili
  • Slide 2
  • Outline:
    1. About SCOPE
    2. Platform Overview (Availability, Reliability, Scalability, Performance, Cost)
       2.1 Cosmos Storage System
       2.2 Cosmos Execution Environment
    3. SCOPE Scripting Language
       3.1 Input and Output
       3.2 Select and Join
       3.3 Expressions and Functions
       3.4 User-Defined Operators
    4. SCOPE Execution
       4.1 SCOPE Compilation
       4.2 SCOPE Optimization
       4.3 Example Query Plan
       4.4 Runtime Optimization
    5. Experimental Evaluation
       5.1 Experimental Setup
       5.2 TPC-H Queries
       5.3 Scalability
    6. Related Work
  • Slide 3
  • SCOPE (Structured Computations Optimized for Parallel Execution):
    - A new declarative and extensible scripting language
    - Targeted for this type of massive data analysis
    - Amenable to efficient parallel execution on large clusters
    - Data is modeled as sets of rows composed of typed columns, as in SQL
  • Slide 4
  • SCOPE
  • Slide 5
  • Users can easily define their own functions and implement their own versions of operators:
    - Extractors (parsing and constructing rows from a file)
    - Processors (row-wise processing)
    - Reducers (group-wise processing)
    - Combiners (combining rows from two inputs)
  • Slide 6
  • Large-scale Distributed Computing
    - Large data centers (thousands of machines): storage and computation
    - Key technology for search (Bing, Google, Yahoo)
    - Web data analysis, user log analysis, relevance studies, etc.
    - How to program the beast?
  • Slide 7
  • Internet companies store and analyze massive data sets, for example:
    - search logs
    - web content collected by crawlers
    - click streams collected from a variety of web services
    Such analysis is becoming increasingly valuable for business in a variety of ways:
    - to improve service quality and support novel features
    - to detect changes in patterns over time
    - to detect fraudulent activity
  • Slide 8
  • Parallel Processing
  • Slide 9
  • Matrix Multiplication
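  The slide presents matrix multiplication as a classic data-parallel workload: each output row depends only on the two input matrices, so rows can be computed independently. A minimal single-machine sketch in C# using Parallel.For (my illustration, not taken from the slides):

      using System.Threading.Tasks;

      class MatrixMultiply
      {
          // Each output row is independent of the others,
          // so rows can be distributed across cores.
          static double[,] Multiply(double[,] a, double[,] b, int n)
          {
              var c = new double[n, n];
              Parallel.For(0, n, i =>
              {
                  for (int j = 0; j < n; j++)
                  {
                      double sum = 0;
                      for (int k = 0; k < n; k++)
                          sum += a[i, k] * b[k, j];
                      c[i, j] = sum;
                  }
              });
              return c;
          }
      }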
  • Slide 10
  • Parallel Processing Architecture in Database Systems
    - Inter-query: different queries execute concurrently
    - Inter-operation: different operators of the same query execute concurrently
    - Intra-operation: a single operator runs in parallel over partitions of its input
  • Slide 11
  • Parallel Processing in Business?
    "Massively Parallel Processing Database for Business Intelligence"
    http://www.computerworld.com/pdfs/mpp_wp.pdf
  • Slide 12
  • Companies have developed distributed data storage and processing systems on large clusters of shared-nothing commodity servers, including Google's File System, Bigtable, Map-Reduce, Hadoop, Yahoo!'s Pig system, Ask.com's Neptune, and Microsoft's Dryad. A typical cluster consists of hundreds or thousands of commodity machines connected via a high-bandwidth network. It is challenging to design a programming model that enables users to easily write programs that can efficiently and effectively utilize all resources in such a cluster and achieve the maximum degree of parallelism.
  • Slide 13
  • Map-Reduce / GFS
    - GFS / Bigtable provide distributed storage
    - The Map-Reduce programming model
      - Good abstraction of group-by-aggregation operations
      - Map function -> grouping
      - Reduce function -> aggregation
    - Very rigid: every computation has to be structured as a sequence of map-reduce pairs
    - Not completely transparent: users still have to use a parallel mindset
    - Error-prone and suboptimal: writing map-reduce programs is equivalent to writing physical execution plans in a DBMS
    (a single-machine analogy of the model follows below)
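  The group-by-aggregation pattern that Map-Reduce abstracts can be mimicked in a few lines of C# with LINQ. This is only a single-machine analogy of the model (map emits key/value records, the system groups them by key, reduce aggregates each group), not Google's implementation; the input data is invented:

      using System;
      using System.Linq;

      class MapReduceAnalogy
      {
          static void Main()
          {
              string[] docs = { "a b a", "b c" };
              var counts = docs
                  .SelectMany(d => d.Split(' '))   // "map": emit one record per word
                  .GroupBy(w => w)                 // grouping: bring equal keys together
                  .Select(g => new { Word = g.Key, Count = g.Count() }); // "reduce": aggregate
              foreach (var wc in counts)
                  Console.WriteLine($"{wc.Word}: {wc.Count}");
          }
      }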
  • Slide 14
  • Pig Latin / Hadoop
    - Hadoop: distributed file system and map-reduce execution engine
    - Pig Latin: a dataflow language using a nested data model
    - Imperative programming style
    - Relational data manipulation primitives and plug-in code to customize processing
    - New syntax: users need to learn a new language
    - Queries are mapped to the map-reduce engine
  • Slide 15
  • An Example: QCount
    - Compute the popular queries that have been requested at least 1000 times
    - Data model: a relational rowset with a well-defined schema
    (the SCOPE script, reproduced from slide 32, follows below)
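  The QCount script itself, as it appears on slide 32:

      SELECT query, COUNT(*) AS count
      FROM "search.log" USING LogExtractor
      GROUP BY query
      HAVING count > 1000
      ORDER BY count DESC;
      OUTPUT TO "qcount.result";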
  • Slide 16
  • SCOPE / Cosmos
    Microsoft has developed a distributed computing platform called Cosmos.
    [Figure 1: Cosmos Software Layers]
  • Slide 17
  • Microsoft has developed a distributed computing platform, called Cosmos, for storing and analyzing massive data sets. Cosmos is designed to run on large clusters consisting of thousands of commodity servers. Disk storage is distributed, with each server having one or more direct-attached disks.
  • Slide 18
  • Cosmos Storage System
    - Append-only distributed file system for storing petabytes of data
    - Optimized for sequential I/O
    - Data is compressed and replicated
    Cosmos Execution Environment
    - Flexible model: a job is a DAG (directed acyclic graph)
    - Vertices -> processes, edges -> data flows
    - The job manager schedules and coordinates vertex execution
    - Provides runtime optimization, fault tolerance, and resource management
    (a toy sketch of the DAG job model follows below)
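  A toy C# sketch of the "job as a DAG" model (illustrative only; this is not the Cosmos job manager API): vertices are units of work, edges are data flows, and a vertex becomes runnable once all of its inputs have completed.

      using System;
      using System.Collections.Generic;
      using System.Linq;

      class Vertex
      {
          public string Name;
          public List<Vertex> Inputs = new List<Vertex>();
          public bool Done;
      }

      class JobManagerSketch
      {
          // Repeatedly run every vertex whose inputs have all finished.
          static void Run(List<Vertex> dag)
          {
              while (dag.Any(v => !v.Done))
                  foreach (var v in dag.Where(v => !v.Done && v.Inputs.All(i => i.Done)))
                  {
                      Console.WriteLine($"executing {v.Name}");
                      v.Done = true; // a real job manager runs vertices in parallel and retries failures
                  }
          }

          static void Main()
          {
              var extract = new Vertex { Name = "extract" };
              var aggregate = new Vertex { Name = "aggregate", Inputs = { extract } };
              Run(new List<Vertex> { extract, aggregate });
          }
      }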
  • Slide 19
  • High-level design objectives for the Cosmos platform include:
    1. Availability: Cosmos is resilient to multiple hardware failures to avoid whole-system outages.
    2. Reliability: Cosmos is architected to recognize transient hardware conditions to avoid corrupting the system.
    3. Scalability: Cosmos is designed from the ground up to be a scalable system, capable of storing and processing petabytes of data.
    4. Performance: Cosmos runs on clusters comprised of thousands of individual servers.
    5. Cost: Cosmos is cheaper to build, operate, and expand, per gigabyte, than traditional approaches to the same problem.
  • Slide 20
  • Cosmos Storage System
    - An append-only file system that reliably stores petabytes of data
    - The system is optimized for large sequential I/O
    - All writes are append-only
    - Data is distributed
    A Cosmos store provides a directory with a hierarchical namespace and stores sequential files of unlimited size. A file is physically composed of a sequence of extents. Extents are the unit of space allocation and are typically a few hundred megabytes in size. A unit of computation generally consumes a small number of collocated extents.
  • Slide 21
  • Cosmos Execution Environment
    The lowest-level primitives of the Cosmos execution environment provide only the ability to run arbitrary executable code on a server. Clients upload application code and resources onto the system via a Cosmos execution protocol. A recipient server assigns the task a priority and executes it at an appropriate time. Programming at this lowest level to build an efficient, fault-tolerant application is difficult, tedious, error-prone, and time-consuming.
  • Slide 22
  • Input and Output
    - SCOPE works on both relational and nonrelational data sources
    - The EXTRACT and OUTPUT commands provide a relational abstraction of underlying data sources
    - Built-in/customized extractors and outputters (C# classes)

    EXTRACT column[:<type>] [, ...]
    FROM <input_stream(s)>
    USING <Extractor> [(args)]
    [HAVING <predicate>]

    OUTPUT [<input>]
    TO <output_stream>
    [USING <Outputter> [(args)]]

    public class LineitemExtractor : Extractor
    {
        public override Schema Produce(string[] requestedColumns, string[] args)
        { ... }
        public override IEnumerable<Row> Extract(StreamReader reader, Row outputRow, string[] args)
        { ... }
    }
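  A minimal sketch of what the empty method bodies above could contain, assuming '|'-separated text input; the Schema constructor and Row column setter used here are assumptions made for illustration, not the documented Cosmos interfaces:

      public class LineitemExtractor : Extractor
      {
          public override Schema Produce(string[] requestedColumns, string[] args)
          {
              // Expose exactly the columns the script asked for.
              return new Schema(requestedColumns);   // hypothetical constructor
          }

          public override IEnumerable<Row> Extract(StreamReader reader, Row outputRow, string[] args)
          {
              string line;
              while ((line = reader.ReadLine()) != null)
              {
                  string[] fields = line.Split('|');   // TPC-H tables use '|'-separated fields
                  for (int i = 0; i < fields.Length; i++)
                      outputRow[i].Set(fields[i]);     // hypothetical column setter
                  yield return outputRow;              // one output row per input line
              }
          }
      }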
  • Slide 23
  • Select and Join
    - Supports standard aggregate functions: COUNT, COUNTIF, MIN, MAX, SUM, AVG, STDEV, VAR, FIRST, LAST
    - No subqueries (but the same functionality is available via outer joins)

    SELECT [DISTINCT] [TOP count] select_expression [AS <name>] [, ...]
    FROM { <input stream(s)> USING <Extractor> |
           { <input> [<joined input> [...]] } [, ...] }
    [WHERE <predicate>]
    [GROUP BY <grouping_columns> [, ...]]
    [HAVING <predicate>]
    [ORDER BY <select_list_item> [ASC | DESC] [, ...]]

    joined input: <join_type> JOIN <input> [ON <equijoin>]
    join_type: [INNER | {LEFT | RIGHT | FULL} OUTER]

    (an illustrative example follows below)
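  As an illustration of the syntax above, a hypothetical script (table and column names invented, not from the slides) that joins a search log against a whitelist and keeps only frequent queries:

      SELECT query, COUNT(*) AS count
      FROM searchlog
           INNER JOIN whitelist ON searchlog.domain == whitelist.domain
      GROUP BY query
      HAVING count > 100
      ORDER BY count DESC;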
  • Slide 24
  • Deep Integration with .NET (C#)
    - SCOPE supports C# expressions and built-in .NET functions/libraries
    - User-defined scalar expressions
    - User-defined aggregation functions

    R1 = SELECT A+C AS ac, B.Trim() AS B1
         FROM R
         WHERE StringOccurs(C, "xyz") > 2

    #CS
    public static int StringOccurs(string str, string ptrn)
    { ... }
    #ENDCS
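  One possible body for the StringOccurs helper declared above (the slide leaves it empty; this counting logic is my sketch, but it is plain C# and runs as written):

      #CS
      public static int StringOccurs(string str, string ptrn)
      {
          // Count non-overlapping occurrences of ptrn in str.
          int count = 0, pos = 0;
          while ((pos = str.IndexOf(ptrn, pos)) >= 0)
          {
              count++;
              pos += ptrn.Length;
          }
          return count;
      }
      #ENDCS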
  • Slide 25
  • User-Defined Operators
    - SCOPE supports three highly extensible commands: PROCESS, REDUCE, and COMBINE
    - Complements SELECT for complicated analysis
    - Easy to customize by extending built-in C# components
    - Easy to reuse code in other SCOPE scripts
  • Slide 26
  • Process
    - The PROCESS command takes a rowset as input, processes each row, and outputs a sequence of rows

    PROCESS [<input>]
    USING <Processor> [(args)]
    [PRODUCE column [, ...]]
    [WHERE <predicate>]
    [HAVING <predicate>]

    public class MyProcessor : Processor
    {
        public override Schema Produce(string[] requestedColumns, string[] args, Schema inputSchema)
        { ... }
        public override IEnumerable<Row> Process(RowSet input, Row outRow, string[] args)
        { ... }
    }
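  A sketch of a row-wise processor body; the Row/RowSet accessors flagged in the comments are illustrative assumptions, not the documented Cosmos API:

      public override IEnumerable<Row> Process(RowSet input, Row outRow, string[] args)
      {
          foreach (Row row in input.Rows)              // assumed enumeration over input rows
          {
              row[0].CopyTo(outRow[0]);                // pass the first column through (hypothetical)
              outRow[1].Set(row[0].String.ToUpper());  // derived column (hypothetical accessor)
              yield return outRow;                     // zero or more output rows per input row
          }
      }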
  • Slide 27
  • Reduce
    - The REDUCE command takes a grouped rowset, processes each group, and outputs zero, one, or multiple rows per group
    - Map/Reduce can be easily expressed by Process/Reduce

    REDUCE [<input> [PRESORT column [ASC|DESC] [, ...]]]
    ON grouping_column [, ...]
    USING <Reducer> [(args)]
    [PRODUCE column [, ...]]
    [WHERE <predicate>]
    [HAVING <predicate>]

    public class MyReducer : Reducer
    {
        public override Schema Produce(string[] requestedColumns, string[] args, Schema inputSchema)
        { ... }
        public override IEnumerable<Row> Reduce(RowSet input, Row outRow, string[] args)
        { ... }
    }
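  A sketch of a reducer body that counts the rows in each group; as with the processor sketch, the row accessors are assumptions made for illustration:

      public override IEnumerable<Row> Reduce(RowSet input, Row outRow, string[] args)
      {
          int count = 0;
          foreach (Row row in input.Rows)       // all rows in this rowset share one grouping key
          {
              if (count == 0)
                  row[0].CopyTo(outRow[0]);     // copy the group key once (hypothetical accessor)
              count++;
          }
          outRow[1].Set(count);                 // emit the aggregate (hypothetical accessor)
          yield return outRow;                  // one output row per group
      }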
  • Slide 28
  • Combine
    - The COMBINE command takes two matching input rowsets, combines them in some way, and outputs a sequence of rows

    COMBINE <input1> [AS <alias1>] [PRESORT columns]
    WITH <input2> [AS <alias2>] [PRESORT columns]
    ON <equijoin>
    USING <Combiner> [(args)]
    PRODUCE column [, ...]
    [HAVING <expression>]

    public class MyCombiner : Combiner
    {
        public override Schema Produce(string[] requestedColumns, string[] args,
                                       Schema leftSchema, string leftTable,
                                       Schema rightSchema, string rightTable)
        { ... }
        public override IEnumerable<Row> Combine(RowSet left, RowSet right, Row outputRow, string[] args)
        { ... }
    }

    COMBINE S1 WITH S2
    ON S1.A==S2.A AND S1.B==S2.B AND S1.C==S2.C
    USING MyCombiner
    PRODUCE D, E, F
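  A sketch of a combiner body: both rowsets contain only the rows that matched on the ON columns, so the sketch simply pairs them up. The buffering and the Clone/CopyTo calls are assumptions made for illustration, not the documented Cosmos API:

      public override IEnumerable<Row> Combine(RowSet left, RowSet right, Row outputRow, string[] args)
      {
          var rightRows = new List<Row>();
          foreach (Row r in right.Rows)
              rightRows.Add(r.Clone());         // buffer one side; rows may be reused by the runtime
          foreach (Row l in left.Rows)
              foreach (Row r in rightRows)
              {
                  l[0].CopyTo(outputRow[0]);    // hypothetical column copies
                  r[1].CopyTo(outputRow[1]);
                  yield return outputRow;
              }
      }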
  • Slide 29
  • Importing Scripts
    - Combines the benefits of virtual views and stored procedures in SQL
    - Enables modularity and information hiding
    - Improves reusability and allows parameterization
    - Provides a security mechanism

    IMPORT <script_file>
    [PARAMS <name> = <value> [, ...]]
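  A hypothetical usage (script name and parameter names invented): a parameterized view script is imported with its parameters bound at the call site, much like invoking a stored procedure:

      Q = IMPORT "MyView.script"
          PARAMS inputName = "Queries_Jan.log", limit = 1000;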
  • Slide 30
  • Life of a SCOPE Query...
    [Figure: SCOPE queries flow through the parser, compiler, and security layer.]
  • Slide 31
  • Optimizer and Runtime
    [Figure: the transformation engine applies optimization rules to logical operator trees, using cardinality and cost estimation to map SCOPE queries to optimal query plans (a vertex DAG of physical operators).]
    SCOPE optimizer:
    - Transformation-based optimizer
    - Reasons about plan properties (partitioning, grouping, sorting, etc.)
    - Chooses an optimal plan based on cost estimates
    - Vertex DAG: each vertex contains a pipeline of operators
    SCOPE runtime:
    - Provides a rich class of composable physical operators
    - Operators are implemented using the iterator model
    - Executes a series of operators in a pipelined fashion
    (a generic sketch of the iterator model follows below)
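  The iterator model mentioned above can be shown with a generic C# sketch (not the actual SCOPE runtime classes): every operator consumes and produces row streams through the same interface, so operators compose into pipelines and rows flow through without being materialized between steps.

      using System;
      using System.Collections.Generic;

      static class IteratorModelDemo
      {
          // A filter operator in iterator style: pulls rows from its child one at a time.
          static IEnumerable<string[]> Filter(IEnumerable<string[]> child, Func<string[], bool> pred)
          {
              foreach (var row in child)
                  if (pred(row))
                      yield return row;
          }

          static void Main()
          {
              var scan = new List<string[]> { new[] { "a", "1" }, new[] { "b", "2" } };
              foreach (var row in Filter(scan, r => r[1] == "2"))
                  Console.WriteLine(string.Join(",", row));
          }
      }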
  • Slide 32
  • Example Query Plan (QCount)

    SELECT query, COUNT(*) AS count
    FROM "search.log" USING LogExtractor
    GROUP BY query
    HAVING count > 1000
    ORDER BY count DESC;
    OUTPUT TO "qcount.result";

    1. Extract the input Cosmos file
    2. Partially aggregate at the rack level
    3. Partition on query
    4. Fully aggregate
    5. Apply the filter on count
    6. Sort results in parallel
    7. Merge results
    8. Output as a Cosmos file
  • Slide 33
  • TPC-H Query 2

    // Extract region, nation, supplier, partsupp, part
    RNS_JOIN =
        SELECT s_suppkey, n_name
        FROM region, nation, supplier
        WHERE r_regionkey == n_regionkey
          AND n_nationkey == s_nationkey;
    RNSPS_JOIN =
        SELECT p_partkey, ps_supplycost, ps_suppkey, p_mfgr, n_name
        FROM part, partsupp, rns_join
        WHERE p_partkey == ps_partkey
          AND s_suppkey == ps_suppkey;
    SUBQ =
        SELECT p_partkey AS subq_partkey, MIN(ps_supplycost) AS min_cost
        FROM rnsps_join
        GROUP BY p_partkey;
    RESULT =
        SELECT s_acctbal, s_name, p_partkey, p_mfgr, s_address, s_phone, s_comment
        FROM rnsps_join AS lo, subq AS sq, supplier AS s
        WHERE lo.p_partkey == sq.subq_partkey
          AND lo.ps_supplycost == min_cost
          AND lo.ps_suppkey == s.s_suppkey
        ORDER BY acctbal DESC, n_name, s_name, partkey;
    OUTPUT RESULT TO "tpchQ2.tbl";
  • Slide 34
  • Sub-Execution Plan of TPC-H Q2
    1. Join on suppkey
    2. Partially aggregate at the rack level
    3. Partition on the group-by column
    4. Fully aggregate
    5. Partition on partkey
    6. Merge corresponding partitions
    7. Partition on partkey
    8. Merge corresponding partitions
    9. Perform the join
  • Slide 35
  • A Real Example
  • Slide 36
  • Current/Future Work
  • Slide 37
  • Conclusions
    - SCOPE: a new scripting language for large-scale analysis
    - Strong resemblance to SQL: easy to learn and to port existing applications
    - Very extensible
      - Fully benefits from the .NET library
      - Supports built-in C# templates for customized operations
    - Highly composable
      - Supports a rich class of physical operators
      - Great reusability with views and user-defined operators
    - Improves productivity
      - High-level declarative language
      - Implementation details (including parallelism and system complexity) are transparent to users
      - Allows sophisticated optimization
    - Good foundation for performance study and improvement