DevoxxUK 2015 "The Seven Deadly Sins of Microservices (Full Version)"
Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki
-
Upload
javier-ramirez -
Category
Software
-
view
1.117 -
download
2
description
Transcript of Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki
![Page 1: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.fdocuments.net/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/1.jpg)
@supercoco9#devoxxBigQuery
Big Data with Google BigQuery
Javier Ramirez@supercoco9https://teowaki.com
![Page 2: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.fdocuments.net/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/2.jpg)
@supercoco9#DevoxxBigquery
Managing Big Data with BigQuery
Javier Ramirez
•Writing software since 1996
•Web dev. since 1999 (C++, JAVA, PHP, Ruby, JS...)
•Founder of https://teowaki.com
•Google Developer Expert on the Cloud Platform
![Page 3: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.fdocuments.net/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/3.jpg)
@YourTwitterHandle@supercoco9#DevoxxBigquery
BIG
BIG
DAT
A
DAT
A
![Page 4: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.fdocuments.net/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/4.jpg)
@YourTwitterHandle@supercoco9#DevoxxBigquery
BIG
BIG
SERVER
S
SERVER
S
![Page 5: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.fdocuments.net/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/5.jpg)
@YourTwitterHandle@supercoco9#DevoxxBigquery
BIG
BIG
DEV
OPS
DEV
OPS
![Page 6: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.fdocuments.net/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/6.jpg)
@YourTwitterHandle@supercoco9#DevoxxBigquery
BIG
BIG
MONEY
MONEY
![Page 7: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.fdocuments.net/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/7.jpg)
bigdata is cool but...
hard to set up and monitor
expensive cluster
not interactive enough
![Page 8: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.fdocuments.net/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/8.jpg)
@supercoco9#DevoxxBigquery
bigdata is doing a fullscan to 330MM rows, matching them against a regexp, and getting the result (223MM rows) in just 5 seconds
![Page 9: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.fdocuments.net/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/9.jpg)
Google BigQuery
Data analysis as a service
http://developers.google.com/bigquery
![Page 10: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.fdocuments.net/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/10.jpg)
Based on “Dremel”
Specifically designed for interactive queries over
petabytes of real-time data
![Page 11: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.fdocuments.net/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/11.jpg)
@supercoco9#DevoxxBigquery
Your only worries
•Load data
•Query the dataset
![Page 12: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.fdocuments.net/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/12.jpg)
loading data.
You just send the data in
text (or JSON) format
![Page 13: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.fdocuments.net/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/13.jpg)
up to 100K inserts per second
in stream mode
![Page 14: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.fdocuments.net/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/14.jpg)
It's just SQL
select name from USERS order by date;
select count(*) from users;
select max(date) from USERS;
select sum(total) from ORDERS group by user;
![Page 15: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.fdocuments.net/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/15.jpg)
@supercoco9#DevoxxBigquery
Subselect and joins out of the box
SELECT Year, Actor1Name, Actor2Name, Count FROM (SELECT Actor1Name, Actor2Name, Year, COUNT(*) Count, RANK() OVER(PARTITION BY YEAR ORDER BY Count DESC) rankFROM
(SELECT Actor1Name, Actor2Name, Year FROM [gdelt-bq:full.events] WHERE Actor1Name < Actor2Name and Actor1CountryCode != '' and Actor2CountryCode != '' and Actor1CountryCode!=Actor2CountryCode), (SELECT Actor2Name Actor1Name, Actor1Name Actor2Name, Year FROM [gdelt-bq:full.events] WHERE Actor1Name > Actor2Name and Actor1CountryCode != '' and Actor2CountryCode != '' and Actor1CountryCode!=Actor2CountryCode),
WHERE Actor1Name IS NOT nullAND Actor2Name IS NOT nullGROUP EACH BY 1, 2, 3HAVING Count > 100)
WHERE rank=1ORDER BY Year
http://gdeltproject.org/data.html#googlebigquery
![Page 16: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.fdocuments.net/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/16.jpg)
@supercoco9#DevoxxBigquery
specific extensions for analytics
withinflattennest
stddev
topfirstlastnth
variance
var_popvar_samp
covar_popcovar_samp
quantiles
correlations
![Page 17: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.fdocuments.net/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/17.jpg)
Things you always wanted to try but were too scared to
select count(*) from publicdata:samples.wikipedia where REGEXP_MATCH(title, "[0-9]*") AND wp_namespace = 0;
223,163,387 Query complete (5.6s elapsed, 9.13 GB processed, Cost: 32¢)
![Page 18: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.fdocuments.net/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/18.jpg)
columnar storage
https://cookbook.experiencesaphana.com/crm/what-is-crm-on-hana/technology-innovation/row-vs-column-based/
![Page 19: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.fdocuments.net/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/19.jpg)
highly distributed execution using a tree
![Page 20: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.fdocuments.net/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/20.jpg)
web console screenshot
![Page 21: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.fdocuments.net/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/21.jpg)
@supercoco9#DevoxxBigquery
country segmented traffic
![Page 22: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.fdocuments.net/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/22.jpg)
@supercoco9#DevoxxBigqueryjavier ramirez @supercoco9 https://teowaki.com
window functions
![Page 23: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.fdocuments.net/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/23.jpg)
@supercoco9#DevoxxBigquery
our most active user
![Page 24: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.fdocuments.net/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/24.jpg)
@supercoco9#DevoxxBigquery
Worldwide events in the last 36 years
SELECT Year, Actor1Name, Actor2Name, Count FROM (SELECT Actor1Name, Actor2Name, Year, COUNT(*) Count, RANK() OVER(PARTITION BY YEAR ORDER BY Count DESC) rankFROM
(SELECT Actor1Name, Actor2Name, Year FROM [gdelt-bq:full.events] WHERE Actor1Name < Actor2Name and Actor1CountryCode != '' and Actor2CountryCode != '' and Actor1CountryCode!=Actor2CountryCode), (SELECT Actor2Name Actor1Name, Actor1Name Actor2Name, Year FROM [gdelt-bq:full.events] WHERE Actor1Name > Actor2Name and Actor1CountryCode != '' and Actor2CountryCode != '' and Actor1CountryCode!=Actor2CountryCode),
WHERE Actor1Name IS NOT nullAND Actor2Name IS NOT nullGROUP EACH BY 1, 2, 3HAVING Count > 100)
WHERE rank=1ORDER BY Year
http://gdeltproject.org/data.html#googlebigquery
![Page 25: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.fdocuments.net/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/25.jpg)
![Page 26: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.fdocuments.net/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/26.jpg)
![Page 27: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.fdocuments.net/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/27.jpg)
SELECT repository_name, repository_language, repository_description, COUNT(repository_name) as cnt,repository_urlFROM github.timelineWHERE type="WatchEvent"AND PARSE_UTC_USEC(created_at) >= PARSE_UTC_USEC("#{yesterday} 20:00:00")AND repository_url IN (
SELECT repository_urlFROM github.timelineWHERE type="CreateEvent"AND PARSE_UTC_USEC(repository_created_at) >= PARSE_UTC_USEC('#{yesterday}
20:00:00')AND repository_fork = "false"AND payload_ref_type = "repository"GROUP BY repository_url
)GROUP BY repository_name, repository_language, repository_description, repository_urlHAVING cnt >= 5ORDER BY cnt DESCLIMIT 25
![Page 28: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.fdocuments.net/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/28.jpg)
@supercoco9#DevoxxBigquery
![Page 29: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.fdocuments.net/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/29.jpg)
![Page 30: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.fdocuments.net/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/30.jpg)
![Page 31: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.fdocuments.net/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/31.jpg)
@supercoco9#DevoxxBigquery
Automation with Apps Script
●Read from BigQuery
●Create a spreadsheet on Drive
●E-mail it everyday as a PDF
https://developers.google.com/apps-script/
![Page 32: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.fdocuments.net/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/32.jpg)
![Page 33: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.fdocuments.net/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/33.jpg)
![Page 34: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.fdocuments.net/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/34.jpg)
![Page 35: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.fdocuments.net/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/35.jpg)
@supercoco9#DevoxxBigquery
bigquery pricing
$26 per stored TB1000000 rows => $0.00416 / month
£0.00243 / month
$5 per processed TB1 full scan = 160 MB
1 count = 0 MB1 full scan over 1 column = 5.4 MB100 GB => $0.05 / month £0.03
AppsScripts is for free
![Page 36: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.fdocuments.net/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/36.jpg)
@supercoco9#DevoxxBigquery
£0.054307 / month*
per 1MM rows
*the 1st 1TB every month is free of charge**assumming your rows have web server logs-like info
price per month
![Page 37: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.fdocuments.net/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/37.jpg)
@supercoco9#DevoxxBigquery
ig
![Page 38: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.fdocuments.net/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/38.jpg)
@YourTwitterHandle#DVXFR14{session hashtag} @supercoco9#devoxxBigquery
THAN
KS!
Javier Ramirez@supercoco9https://teowaki.com
Related links at:
https://teowaki.com/teams/javier-community/link-categories/bigquery-talk
![Page 39: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.fdocuments.net/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/39.jpg)
@supercoco9#DevoxxBigquery
Thanks / Creative Commons
•Presentation Template — Guillaume LaForge
•The Queen — A prestigious heritage with some inspiration from The Sex Pistols and funny Devoxxians
•Girl with a Balloon — Banksy
•Tube — Michael Keen