GCPUG-FUKUOKA Data Processing & Visualization Hands-on

Transcript of GCPUG-FUKUOKA Data Processing & Visualization Hands-on

  1. Google Cloud Dataprep / Google Data Studio, 2017/11/11, Wasaburo Miyata
  2. Wasaburo Miyata: BigQuery / CTO
  3. Agenda: 1. / 2. Google Cloud Dataprep / 3. Google Data Studio. 11/11GA
  5. 48 16481 https://ckan.open-governmentdata.org/dataset/atmospheric48
  7. Data Studio, Cloud Dataprep, Cloud Shell, Cloud Storage, Cloud Storage, CSV. https://datasut
  8. Google Cloud Dataprep
  10. Excel / sed, awk, grep / R, Python / DB (DWH), SQL / Hadoop / Cloud Dataprep
  11. Dataprep: 1. 2. 3. Trifacta (https://www.trifacta.com/), GCP 4. & Cloud Dataflow 5. Shift_JIS (UTF-8). https://www.trifacta.com/support/articles/topics/139873-transforms https://cloud.google.com/dataprep/docs/ 11/11GA
  12. Dataprep Dataprep Google Cloud Dataflow: Cloud Dataprep Cloud Dataflow Cloud Dataflow: Google Cloud Storage Cloud Storage
  13. 1. 2. 3.
  14. Input: 1. BigQuery 2. Excel 3. CSV 4. JSON 5. PLAIN TEXT 6. LOG 7. TSV 8. Avro. Output: 1. BigQuery 2. CSV 3. JSON 4. Avro. https://cloud.google.com/dataprep/docs/html/Supported-File-Formats_57344528
  15. Dataprep
  16. https://cloud.google.com/dataprep/docs/quickstarts/quickstart-dataprep 1. 2. GCP 3. 4. API: a. Cloud Dataflow b. BigQuery c. Cloud Storage
  17. 1. Google Cloud Shell 2. a. b. UTF-8 c. GCS. https://github.com/wamiya/dataprep-handson/blob/master/uplaodgcs.txt Shell
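Slides 11 and 17 mention Shift_JIS and UTF-8 before the upload to GCS, which suggests the source CSV is re-encoded first. The actual commands are in the linked uplaodgcs.txt and are not reproduced here. As a minimal sketch of the re-encoding step alone, assuming the input really is Shift_JIS (the function name and file paths are hypothetical):

```python
def convert_to_utf8(src_path, dst_path, src_encoding="shift_jis"):
    """Re-encode a text file to UTF-8, leaving line endings untouched.

    src_encoding is an assumption; the slides only name Shift_JIS and UTF-8.
    """
    # newline="" disables newline translation on both read and write,
    # so CRLF or LF terminators are copied through as-is.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)
```

The converted file would then be copied to the Cloud Storage bucket, for example with `gsutil cp` from Cloud Shell.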
  19. 1. 2. 3. Suggestion 4. 5. 6.
  20. Input: DATASETS, Import Data, GCS, -dataprep, import, Wrangle in new Flow
  21. Random, First rows, Filter-based, Anomaly-based, Stratified, Cluster-based
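Slide 21 lists the sampling methods offered in the Dataprep UI. To make the difference between three of them concrete, here is a generic sketch of first-rows, random, and stratified sampling over a list of row dictionaries. These are textbook definitions, not Dataprep's implementation, and all function names are hypothetical.

```python
import random
from collections import defaultdict


def first_rows(rows, n):
    """Take the first n rows in file order."""
    return rows[:n]


def random_sample(rows, n, seed=0):
    """Take n rows uniformly at random (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(rows, min(n, len(rows)))


def stratified_sample(rows, key, per_group, seed=0):
    """Take up to per_group random rows for every distinct value of key."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    sample = []
    for members in groups.values():
        sample.extend(rng.sample(members, min(per_group, len(members))))
    return sample
```

A first-rows sample is cheap but can miss values that only appear late in the file, while a stratified sample guarantees that every distinct value of the chosen column is represented.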
  22. // Add
  25. Dataflow
  27. 1. 2. 3. 3.1. column_ 3.2. 24 to 0 4. 4.1. 4.2. (24) 4.3. 4.4. 5. 6.
  28. Before: 20171101, 33.6723, 130.437, ppd, 8 (hour 1), 12 (hour 24). After: 2017-11-01 01:00:00 ppd 8; 2017-11-01 02:00:00 ppd 10; 2017-11-01 23:00:00 ppd 7; 2017-11-02 00:00:00 ppd 12
  29. Columns: column1, Column24 (Windows: Shift). Action: Restructure, Unpivot
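The Unpivot action on slide 29 turns the hourly columns into rows, producing the `key` and `value` columns used on the following slides. A minimal sketch of that reshaping for a single row is shown here. The function and the hourly column names `column_1` and `column_24` are assumptions chosen for illustration; the slides do not show the exact names.

```python
def unpivot(row, id_columns, value_columns):
    """Turn one wide row into one long row per value column.

    Each output row keeps the id columns and adds:
      key   -> the name of the source column
      value -> the cell that column held
    """
    long_rows = []
    for name in value_columns:
        long_row = {col: row[col] for col in id_columns}
        long_row["key"] = name
        long_row["value"] = row[name]
        long_rows.append(long_row)
    return long_rows


# Hypothetical wide row: a date, a unit, and two of the 24 hourly readings.
wide = {"column": "20171101", "column5": "ppd", "column_1": "8", "column_24": "12"}
for long_row in unpivot(wide, ["column", "column5"], ["column_1", "column_24"]):
    print(long_row)
```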
  30. Add
  31. (mismatched) mismatched values to NULL, Add
  32. (column_) column_ column with in key, Add
  33. (24 to 0) column_ 24 with in key, edit
  34. (24 to 0) New value: 0, Add
  35. derive DATEADD(column, 1, day) as nextday, Add
  36. (24) set column: IF(key == 0, nextday, column), Add
  37. set column: DATETIME(year(column), month(column), day(column), key, 0, 0), Add
  38. column: Format, Change format: dateformat($col, 'yyyy-MM-dd HH:mm:ss'), Add
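Slides 33 to 38 combine into one rule: hour 24 of a day becomes 00:00:00 of the next day, and the date and hour are then merged into a single `yyyy-MM-dd HH:mm:ss` string. The same logic can be sketched outside Dataprep as follows. This is an illustration rather than the recipe itself; the function name is hypothetical, and it assumes the raw hours run from 1 to 24 so that a key of 0 can only come from a replaced 24.

```python
from datetime import datetime, timedelta


def to_date_time(date_text, hour_key):
    """Combine a YYYYMMDD date and an hour (1-24) into 'YYYY-MM-DD HH:MM:SS'."""
    day = datetime.strptime(date_text, "%Y%m%d")
    hour = int(hour_key)
    if hour == 24:
        # Same effect as the recipe: replace 24 with 0, then use nextday.
        day += timedelta(days=1)
        hour = 0
    return day.replace(hour=hour).strftime("%Y-%m-%d %H:%M:%S")


print(to_date_time("20171101", "1"))   # 2017-11-01 01:00:00
print(to_date_time("20171101", "24"))  # 2017-11-02 00:00:00
```

The two printed values match the first and last rows of the example on slide 28.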
  39. Columns: nextday, column3, column4, key. Action: Drop
  40. rename, Add: column to date_time, column1 to kyoku, column2 to koumoku, column5 to tani, value to value
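The drop and rename steps on slides 39 and 40 amount to keeping five columns under new names. As a small sketch, using the column names from the slides and a hypothetical helper function:

```python
# Old name -> new name, as listed on slide 40.
RENAMES = {
    "column": "date_time",
    "column1": "kyoku",
    "column2": "koumoku",
    "column5": "tani",
    "value": "value",
}


def select_and_rename(row, renames=RENAMES):
    """Keep only the renamed columns; everything else is dropped.

    This removes nextday, column3, column4 and key, matching slide 39.
    """
    return {new: row[old] for old, new in renames.items()}
```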
  41. 1. 2.
  42. edit
  43. GCS, -dataprep
  44. Create new file
  45. kankyo-out, Save Settings
  46. Run job
  47. GCS
  48. Google Data Studio
  49. 1. / 2. a. BigQuery/Cloud SQL b. Google c. Google 3. 4. https://cloud.google.com/data-studio/?hl=ja
  50. 1. 1.1. 1.2. 2. 2.1. 2.2. 2.3. 2.4. 2.5. 2.6. / 3. 3.1. 3.2.
  51. https://datastudio.google.com/?hl=ja
  58. []-dataprep/kankyo-out.csv
  59. YYYYMMDDHH
  61. YYYYMMDD
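Slides 59 and 61 name two date formats, YYYYMMDDHH and YYYYMMDD. The surrounding slide text is lost, so it is an assumption that these are the type choices for the `date_time` field in Data Studio. As an illustration of what the two formats look like for a value written by the Dataprep recipe (the function name is hypothetical):

```python
from datetime import datetime


def to_data_studio_formats(date_time_text):
    """Return (YYYYMMDDHH, YYYYMMDD) for a 'YYYY-MM-DD HH:MM:SS' string."""
    parsed = datetime.strptime(date_time_text, "%Y-%m-%d %H:%M:%S")
    return parsed.strftime("%Y%m%d%H"), parsed.strftime("%Y%m%d")


print(to_data_studio_formats("2017-11-02 00:00:00"))  # ('2017110200', '20171102')
```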
  64. kankyodata: date_time, date_time, tani, value, kyoku, koumoku