Google Dremel

  • date post

    19-Nov-2014
  • Category

    Technology

Transcript of Google Dremel

  • 1. Google Dremel maruyama097

2. Dremel: Web; Android; Google; Google Books OCR; Google Map (map); Bigtable Tablet; Google; Disk I/O; Google
3. Dremel: Web; Dremel: SQL-like; Hive, Pig: Map-Reduce; Dremel: CPU
4. Google Dremel
5. Dremel: Web; 1000~4000
6. Dremel; Map Reduce -> billions of records
   Dremel SQL:
     DEFINE TABLE t AS /path/to/data/*
     SELECT TOP(signal1, 100), COUNT(*) FROM t
   Dashboard; Catalog; GFS
7. message Document {
     required int64 DocId;
     optional group Links {
       repeated int64 Backward;
       repeated int64 Forward;
     }
     repeated group Name {
       repeated group Language {
         required string Code;
         optional string Country;
       }
       optional string Url;
     }
   }
   Protocol Buffer. Document tree: Document; DocID; Links? (Backward*, Forward*); Name* (Language* (Code, Country?), Url?)
8. Document tree: Document; DocID; Links? (Backward*, Forward*); Name* (Language* (Code, Country?), Url?)
9. Repetition Level: Document; DocID; Links? (Backward* 1, Forward* 1); Name* 1 (Language* 2 (Code, Country?), Url?)
10. Definition Level: Document; DocID; Links? 1 (Backward* 2, Forward* 2); Name* 1 (Language* 2 (Code 2, Country? 3), Url? 2)
11. Repetition Level and Definition Level
    Field       Repetition  Definition
    DocID
    Links?                  1
    Backward*   1           2
    Forward*    1           2
    Name*       1           1
    Language*   2           2
    Code                    2
    Country?                3
    Url?                    2
12. Sample records:
    DocId: 10
    Links
      Forward: 20
      Forward: 40
      Forward: 60
    Name
      Language
        Code: 'en-us'
        Country: 'us'
      Language
        Code: 'en'
      Url: 'http://A'
    Name
      Url: 'http://B'
    Name
      Language
        Code: 'en-gb'
        Country: 'gb'

    DocId: 20
    Links
      Backward: 10
      Backward: 30
      Forward: 80
    Name
      Url: 'http://C'
13-47. The same records, with one value per slide annotated as repetition level, definition level:
    DocId: 10               0,0
    Links
      Backward: NULL        0,1
      Forward: 20           0,2
      Forward: 40           1,2
      Forward: 60           1,2
    Name
      Language
        Code: 'en-us'       0,2
        Country: 'us'       0,3
      Language
        Code: 'en'          2,2
        Country: NULL       2,2
      Url: 'http://A'       0,2
    Name
      Language
        Code: NULL          1,1
        Country: NULL       1,1
      Url: 'http://B'       1,2
    Name
      Language
        Code: 'en-gb'       1,2
        Country: 'gb'       1,3
      Url: NULL             1,1

    DocId: 20               0,0
    Links
      Backward: 10          0,2
      Backward: 30          1,2
      Forward: 80           0,2
    Name
      Language
        Code: NULL          0,1
        Country: NULL       0,1
      Url: 'http://C'       0,2
48-50. Sample records as in slide 12 (DocId: 10 and DocId: 20)
51. Level 0. Document tree as in slide 8
52-53. Level 1: Name* 1; Backward* 1; Forward* 1
54. Level 0, Level 1: Name* 1; Backward* 1; Forward* 1
55-56. Level 2: Language* 2
57. Level 0, Level 1, Level 2: Name* 1; Backward* 1; Forward* 1; Language* 2
58-59. Document tree as in slide 8
60. Level 0. Document tree as in slide 8
61. Level 0, Level 1, Level 2. Document tree as in slide 8
62.
Level 0, Level 1, Level 2. Document tree: Document; DocID; Links? (Backward*, Forward*); Name* (Language* (Code, Country?), Url?)
63. Finite State Machine CONSTRUCTION ALGORITHM
64. Finite State Machine CONSTRUCTION ALGORITHM: Links.Backward, Links.Forward
65. 1. procedure ConstructFSM(Field[] fields):
    2.
    3. maxLevel
    4. barrier; FSM
66. (Lines 6-10) FieldReader
67. 6. preField
    7. preField; barrierLevel
    8. backLevel; preField
    9. (field, backLevel) -> preField
    10.
68. 2 (Lines 11-14) Line 8
69. 11. [barrierLevel+1..maxLevel]
    12.
    13. level-1
    14.
70. 3 (Lines 15-17) barrierLevel; barrier; FieldReader; barrier
71. 15. [0..barrierLevel]
    16. (field, level) -> barrier
    17.
72. COLUMN STRIPING ALGORITHM
73. DissectRecord; RecordDecoder; RecordDecoder; FieldWriters; FieldWriter; DissectRecord
74. writer; While (Line 5)
75. seenFields; chRepetitionLevel (Lines 9-13); (Line 18)
76. Section 4.2; FieldWriters; writer; non-leaf writer( ); writer; writer; writer; non-null
77. (2 Name.Language) non-atomic; non-NULL
78. 1. procedure DissectRecord( RecordDecoder decoder, FieldWriter writer, int ):
    2. writer
    3. seenFields = {} //
    4. decoder
    5. FieldWriter chWriter = decoder writer
79. 6.
    7. int = if seenFields writerID = writer
    8. else
    9. writerID seenFields
    10. end if
80. 11. if Writer atomic field
    12. writer
    13. else
    14. DissectRecord( RecordDecoder, writer,
    15. end if
    16. end while
81. RECORD ASSEMBLY ALGORITHM
82. In their on-the-wire representation, records are laid out as pairs of a field identifier followed by a field value. Nested records can be thought of as having an opening tag and a closing tag, similar to XML (actual binary encoding may differ, see [21] for details). In the following, writing opening tags is referred to as starting the record, and writing closing tags is called ending it.
83. AssembleRecord procedure takes as input a set of FieldReaders and (implicitly) the FSM with state transitions between the readers. Variable reader holds the current FieldReader in the main routine (Line 4).
Variable lastReader holds the last reader whose value we appended to the record and is available to all three procedures shown in Figure 17. The main while-loop is at Line 5.
84. We fetch the next value from the current reader. If the value is not NULL, which is determined by looking at its definition level, we synchronize the record being assembled to the record structure of the current reader in the method MoveToLevel, and append the field value to the record. Otherwise, we merely adjust the record structure without appending any value, which needs to be done if empty records are present.
85. On Line 12, we use a full definition level. Recall that the definition level factors out required fields (only repeated and optional fields are counted). Full definition level takes all fields into account.
86. Procedure MoveToLevel transitions the record from the state of the lastReader to that of the nextReader (see Line 22). For example, suppose the lastReader corresponds to Links.Backward in Figure 2 and nextReader is Name.Language.Code. The method ends the nested record Links and starts new records Name and Language, in that order. Procedure ReturnToLevel (Line 30) is a counterpart of MoveToLevel that only ends current records without starting any new ones.
87. SELECT-PROJECT-AGGREGATE EVALUATION ALGORITHM
88. The algorithm addresses a general case when a query may reference repeated fields; a simpler optimized version is used for flat-relational queries, i.e., those referencing only required and optional fields. The algorithm has two implicit inputs: a set of FieldReaders, one for each field appearing in the query, and a set of scalar expressions, including aggregate expressions, present in the query. The repetition level of a scalar expression (used in Line 8) is determined as the maximum repetition level of the fields used in that expression.
89.
In essence, the algorithm advances the readers in lockstep to the next set of values, and, if the selection conditions are met, emits the projected values. Selection and projection are controlled by two variables, fetchLevel and selectLevel. During execution, only readers whose next repetition level is no less than fetchLevel are advanced (see Fetch method at Line 19). In a similar vein, only expressions whose current repetition level is no less than selectLevel are emitted (Lines 7-10).
90. The algorithm ensures that expressions at a higher level of nesting, i.e., those having a smaller repetition level, get evaluated and emitted only once for each deeper nested expression.
91. Google Dremel
92. Column values of the sample records, each listed as value (repetition level, definition level):
    DocID:        10 (0,0); 20 (0,0)
    Backward*:    NULL (0,1); 10 (0,2); 30 (1,2)
    Forward*:     20 (0,2); 40 (1,2); 60 (1,2); 80 (0,2)
    Url?:         http://A (0,2); http://B (1,2); NULL (1,1); http://C (0,2)
    Code:         en-us (0,2); en (2,2); NULL (1,1); en-gb (1,2); NULL (0,1)
    Country?:     us (0,3); NULL (2,2); NULL (1,1); gb (1,3); NULL (0,1)
93. Dremel
    SELECT DocId AS Id,
      COUNT(Name.Language.Code) WITHIN Name AS Cnt,
      Name.Url + ',' + Name.Language.Code AS Str
    FROM t
    WHERE REGEXP(Name.Url, '^http') AND DocId < 20;

    Id: 10
    Name
      Cnt: 2
      Language
        Str: 'http://A,en-us'
        Str: 'http://A,en'
    Name
      Cnt: 0

    message QueryResult {
      required int64 Id;
      repeated group Name {
        optional uint64 Cnt;
        repeated group Language {
          optional string Str;
        }
      }
    }
94. SELECT A, COUNT(B) FROM T GROUP BY A
    SELECT A, SUM(c) FROM (R11 UNION ALL ... R1n) GROUP BY A
    R1i = SELECT A, COUNT(B) AS c FROM T1i GROUP BY A
    T1i: 1, i, T, Tablet
97. MapReduce; Dremel: numRecs: table sum of int; numWords: table sum of int; emit numRecs
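The repetition and definition levels listed per column in slide 92 can be reproduced with a short recursive walk over the two sample records. The Python sketch below is a simplified illustration only and is not the DissectRecord procedure from the slides: the SCHEMA tuples, the stripe function, and the dict encoding of the records are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical schema encoding: (name, kind, children); children is None for
# atomic fields. kind is "required", "optional" or "repeated".
SCHEMA = [
    ("DocId", "required", None),
    ("Links", "optional", [
        ("Backward", "repeated", None),
        ("Forward", "repeated", None),
    ]),
    ("Name", "repeated", [
        ("Language", "repeated", [
            ("Code", "required", None),
            ("Country", "optional", None),
        ]),
        ("Url", "optional", None),
    ]),
]


def stripe(record, fields, columns, path=(), r=0, d=0, rep_depth=0):
    """Append (value, repetition level, definition level) for each leaf."""
    for name, kind, children in fields:
        col = path + (name,)
        # Repetition level of this field: repeated fields on its path.
        field_rep = rep_depth + (kind == "repeated")
        # Definition level of this field: non-required fields on its path.
        field_def = d + (kind != "required")
        value = None if record is None else record.get(name)
        if value is None:
            values = []
        elif kind == "repeated":
            values = value
        else:
            values = [value]
        if not values:
            # Missing field: every leaf below gets NULL with the parent's r, d.
            if children is None:
                columns[".".join(col)].append((None, r, d))
            else:
                stripe(None, children, columns, col, r, d, field_rep)
            continue
        for i, v in enumerate(values):
            # The first value inherits r; later ones repeat at this field.
            cur_r = r if i == 0 else field_rep
            if children is None:
                columns[".".join(col)].append((v, cur_r, field_def))
            else:
                stripe(v, children, columns, col, cur_r, field_def, field_rep)


r1 = {
    "DocId": 10,
    "Links": {"Forward": [20, 40, 60]},
    "Name": [
        {"Language": [{"Code": "en-us", "Country": "us"}, {"Code": "en"}],
         "Url": "http://A"},
        {"Url": "http://B"},
        {"Language": [{"Code": "en-gb", "Country": "gb"}]},
    ],
}
r2 = {
    "DocId": 20,
    "Links": {"Backward": [10, 30], "Forward": [80]},
    "Name": [{"Url": "http://C"}],
}

columns = defaultdict(list)
for rec in (r1, r2):
    stripe(rec, SCHEMA, columns)

for col_name, triples in columns.items():
    print(col_name, triples)
```

For the two sample records this yields, for instance, the Name.Language.Country column as us (0,3), NULL (2,2), NULL (1,1), gb (1,3), NULL (0,1), which agrees with the values shown in the slides.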
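The query rewrite on slide 94 (per-tablet COUNT(B) as c, then SUM(c) over the union of partial results) can be checked on a toy example. The Python sketch below assumes each tablet is simply a list of (A, B) rows; leaf_count, root_sum, and the sample tablets are hypothetical names and data invented for illustration, not part of Dremel.

```python
def leaf_count(tablet):
    """R1i = SELECT A, COUNT(B) AS c FROM T1i GROUP BY A (one partition)."""
    counts = {}
    for a, b in tablet:
        # COUNT(B) ignores NULLs, but the group for A still exists.
        counts[a] = counts.get(a, 0) + (0 if b is None else 1)
    return counts


def root_sum(partials):
    """SELECT A, SUM(c) FROM (R11 UNION ALL ... R1n) GROUP BY A."""
    total = {}
    for partial in partials:
        for a, c in partial.items():
            total[a] = total.get(a, 0) + c
    return total


# Three hypothetical partitions of table T, as (A, B) rows.
tablets = [
    [("x", 1), ("y", None)],
    [("x", 2), ("x", None), ("z", 5)],
    [("y", 7)],
]

# Two-level evaluation: each leaf aggregates its partition, the root merges.
merged = root_sum(leaf_count(t) for t in tablets)

# Single-level evaluation over the whole table, for comparison.
all_rows = [row for t in tablets for row in t]
direct = leaf_count(all_rows)

print(merged)
print(direct)
```

The rewrite works because COUNT decomposes into a sum of per-partition counts; an aggregate without that property would need a different partial result at the leaves.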