Apache PIG introduction
![Page 1: Apache PIG introduction](https://reader034.fdocuments.net/reader034/viewer/2022052218/556a7a35d8b42a7c758b4ac2/html5/thumbnails/1.jpg)
Jackson Oliveira, @cyber_jso, Software Engineer
APACHE PIG
![Page 2: Apache PIG introduction](https://reader034.fdocuments.net/reader034/viewer/2022052218/556a7a35d8b42a7c758b4ac2/html5/thumbnails/2.jpg)
A High Level Analysis Platform
![Page 3: Apache PIG introduction](https://reader034.fdocuments.net/reader034/viewer/2022052218/556a7a35d8b42a7c758b4ac2/html5/thumbnails/3.jpg)
Which can be plugged into Hadoop
![Page 4: Apache PIG introduction](https://reader034.fdocuments.net/reader034/viewer/2022052218/556a7a35d8b42a7c758b4ac2/html5/thumbnails/4.jpg)
![Page 5: Apache PIG introduction](https://reader034.fdocuments.net/reader034/viewer/2022052218/556a7a35d8b42a7c758b4ac2/html5/thumbnails/5.jpg)
How does it work?
![Page 6: Apache PIG introduction](https://reader034.fdocuments.net/reader034/viewer/2022052218/556a7a35d8b42a7c758b4ac2/html5/thumbnails/6.jpg)
How does it work?
![Page 7: Apache PIG introduction](https://reader034.fdocuments.net/reader034/viewer/2022052218/556a7a35d8b42a7c758b4ac2/html5/thumbnails/7.jpg)
What is the point of using Pig?
![Page 8: Apache PIG introduction](https://reader034.fdocuments.net/reader034/viewer/2022052218/556a7a35d8b42a7c758b4ac2/html5/thumbnails/8.jpg)
MapReduce (MR) is not difficult in theory...
![Page 9: Apache PIG introduction](https://reader034.fdocuments.net/reader034/viewer/2022052218/556a7a35d8b42a7c758b4ac2/html5/thumbnails/9.jpg)
But the reality can be different...
![Page 10: Apache PIG introduction](https://reader034.fdocuments.net/reader034/viewer/2022052218/556a7a35d8b42a7c758b4ac2/html5/thumbnails/10.jpg)
We want it to be easy to understand
-- Load the tab-separated users and keep those aged 18 to 25
Users = LOAD 'users' USING PigStorage('\t') AS (name, age);
Filtered = FILTER Users BY age >= 18 AND age <= 25;
-- Load the page visits and join them to the filtered users
Pages = LOAD 'pages' AS (user, url);
Joined = JOIN Filtered BY name, Pages BY user;
-- Count clicks per URL and sort the most visited first
Grouped = GROUP Joined BY url;
Summed = FOREACH Grouped GENERATE group, COUNT(Joined) AS clicks;
Sorted = ORDER Summed BY clicks DESC;
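To make the data flow of the script above concrete, here is a minimal plain-Python sketch of what each statement computes. It is only an illustration: the sample rows and the function name `top_urls` are made up for this example, and Pig itself would run these steps as distributed jobs rather than in-memory loops.

```python
from collections import Counter

# Hypothetical sample rows standing in for the 'users' and 'pages' inputs.
users = [("alice", 22), ("bob", 31), ("carol", 19), ("dave", 17)]
pages = [
    ("alice", "a.com"),
    ("carol", "a.com"),
    ("alice", "b.com"),
    ("bob", "a.com"),
    ("dave", "b.com"),
]


def top_urls(users, pages):
    """Mirror the Pig statements one by one on in-memory lists."""
    # Filtered = FILTER Users BY age >= 18 AND age <= 25;
    filtered = [(name, age) for name, age in users if 18 <= age <= 25]
    # Joined = JOIN Filtered BY name, Pages BY user;
    joined = [
        (name, age, user, url)
        for name, age in filtered
        for user, url in pages
        if name == user
    ]
    # Grouped = GROUP Joined BY url;  Summed = ... COUNT(Joined) AS clicks;
    clicks = Counter(url for _, _, _, url in joined)
    # Sorted = ORDER Summed BY clicks DESC;
    return sorted(clicks.items(), key=lambda pair: pair[1], reverse=True)


print(top_urls(users, pages))  # [('a.com', 2), ('b.com', 1)]
```

Only alice and carol pass the age filter, so bob's and dave's visits are dropped before the clicks are counted.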
![Page 11: Apache PIG introduction](https://reader034.fdocuments.net/reader034/viewer/2022052218/556a7a35d8b42a7c758b4ac2/html5/thumbnails/11.jpg)
Also easy to extend (UDFs)...
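As a rough idea of what a user-defined function can look like, here is a small sketch of a Python UDF. The function `url_domain` and the file name `udfs.py` are hypothetical. The sketch assumes that Pig makes the `outputSchema` decorator available when the file is registered with Jython, and it defines a stand-in so the same file can be run and tested outside Pig.

```python
# udfs.py (hypothetical file name)
try:
    outputSchema  # assumed to be provided by Pig when registered with Jython
except NameError:
    def outputSchema(schema):
        """Stand-in decorator so the file also works outside Pig."""
        def decorate(func):
            func.output_schema = schema
            return func
        return decorate


@outputSchema("domain:chararray")
def url_domain(url):
    """Return the lower-cased host part of a URL, or None if url is None."""
    if url is None:
        return None
    without_scheme = url.split("://", 1)[-1]
    return without_scheme.split("/", 1)[0].lower()
```

In a Pig script, a file like this would typically be registered with `REGISTER 'udfs.py' USING jython AS udfs;` and then called as `udfs.url_domain(url)` inside a `FOREACH ... GENERATE` statement.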
![Page 12: Apache PIG introduction](https://reader034.fdocuments.net/reader034/viewer/2022052218/556a7a35d8b42a7c758b4ac2/html5/thumbnails/12.jpg)
It takes care of the execution plan for you
![Page 13: Apache PIG introduction](https://reader034.fdocuments.net/reader034/viewer/2022052218/556a7a35d8b42a7c758b4ac2/html5/thumbnails/13.jpg)
When to use Apache Pig?
![Page 14: Apache PIG introduction](https://reader034.fdocuments.net/reader034/viewer/2022052218/556a7a35d8b42a7c758b4ac2/html5/thumbnails/14.jpg)
If you want things done faster
![Page 15: Apache PIG introduction](https://reader034.fdocuments.net/reader034/viewer/2022052218/556a7a35d8b42a7c758b4ac2/html5/thumbnails/15.jpg)
An active community
![Page 16: Apache PIG introduction](https://reader034.fdocuments.net/reader034/viewer/2022052218/556a7a35d8b42a7c758b4ac2/html5/thumbnails/16.jpg)
You might need to rethink complicated things