Big Data Analytics using Spark CSE255 / DSE230

What is “Big Data”?

• 1 GB?
• 1 TB?
• 1 PB?
• …

• We need a definition that does not change over time.
  • More data than can fit on a single workstation.
  • Communication dominates computation.

“Data Science” vs. “Computer Science”

• Computer science focuses on the algorithm.
  • Requirements specify the input-to-output relationship (e.g., find the shortest path).
  • The algorithm should be correct and efficient.
  • The input (data) can be anything that conforms to the input format.

• Data science focuses on the data.
  • The goal is to understand/model/control the physical process generating the data.
  • Algorithms are used by the data scientist to identify patterns in the data.
  • The data is assumed to conform to a statistical model.

What is a data scientist?

[Figure from Doing Data Science: Straight Talk from the Frontline by Rachel Schutt & Cathy O’Neil, annotated “& Communication skills”.]

There are many good jobs in data science

• Data Scientist: one of the top ten jobs in 2016 according to Forbes and Glassdoor.
• There are currently 8,446 data science openings in the US (LinkedIn).
• 7,000 openings in India (naukri.com).
• The median base salary is around $116,000 per year (Glassdoor).

Halicioglu graduated with a bachelor’s degree in computer science in 1996.

Nick Woodman, Founder of GoPro

Woodman graduated from UCSD in June 1997 with a B.A. in visual arts and a minor in creative writing.

The output of a single GoPro

• GoPro HERO5 Black: $400.
• 120 FPS at 1080p (1920 × 1080).
• That is ≈ 250 Mpixel/sec; at 3 × 8 bits per pixel, ≈ 6 Gbit/sec.
• Max compressed output bit rate: 60 Mbit/sec.
• Compression by a factor of 100.
• About 2:14 minutes of video = 1 GB compressed.
• Image processing requires uncompressed data.
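The arithmetic above can be reproduced in a few lines of Python. This is a back-of-envelope sketch using the slide's figures; it assumes RGB at 8 bits per channel and takes 1 GB as 10^9 bytes (decimal), a convention the slide does not state.

```python
# Back-of-envelope data rates for the GoPro example.
# Assumptions: RGB, 8 bits per channel; 1 GB = 10**9 bytes (decimal convention).

WIDTH, HEIGHT = 1920, 1080       # 1080p frame size in pixels
FPS = 120                        # frames per second
BITS_PER_PIXEL = 3 * 8           # 3 color channels x 8 bits
COMPRESSED_BPS = 60e6            # max compressed output: 60 Mbit/s
GB = 1e9                         # bytes per gigabyte (assumed decimal)

pixels_per_sec = WIDTH * HEIGHT * FPS                  # ~250 Mpixel/s
raw_bps = pixels_per_sec * BITS_PER_PIXEL              # ~6 Gbit/s uncompressed
compression_factor = raw_bps / COMPRESSED_BPS          # ~100x
seconds_per_compressed_gb = GB * 8 / COMPRESSED_BPS    # time to fill 1 GB
raw_gb_per_minute = raw_bps * 60 / 8 / GB              # uncompressed GB per minute

print(f"pixels/s:            {pixels_per_sec / 1e6:.1f} M")
print(f"raw bit rate:        {raw_bps / 1e9:.2f} Gbit/s")
print(f"compression factor:  {compression_factor:.0f}x")
print(f"1 GB compressed:     {seconds_per_compressed_gb:.0f} s")
print(f"uncompressed:        {raw_gb_per_minute:.1f} GB/min")
```

With these inputs it reports about 248.8 M pixels/s, about 5.97 Gbit/s raw, a compression factor of about 100, about 133 s (roughly 2¼ minutes) per compressed gigabyte, and about 44.8 GB of uncompressed data per minute, consistent with the "> 40 GB per minute" figure used when discussing processing at the source.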

Processing at the source

• Suppose you wanted to use a GoPro to monitor your front door.
• The GoPro uses sophisticated lossy compression to reduce the data by a factor of 100.
• However, to perform analysis, your PC would have to decompress the data and then process > 40 GB per minute.
• You would need a beefy computer.
• But most of the time there is very little change from frame to frame, so if a change detector runs on the camera itself, most of the time there is nothing to communicate.
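The idea of processing at the source can be illustrated with a simple frame-differencing change detector. This is a minimal sketch, not actual camera firmware: frames are modeled as flat lists of 8-bit grayscale pixels, and `mean_abs_diff`, `frames_to_send`, and the default threshold of 10 are hypothetical choices made for illustration.

```python
# Minimal frame-differencing change detector (illustrative sketch).
# Frames are modeled as flat lists of 8-bit grayscale values; a real camera
# would run something similar on full-resolution images on-device.

from typing import Iterable, List, Sequence


def mean_abs_diff(prev: Sequence[int], curr: Sequence[int]) -> float:
    """Average absolute per-pixel difference between two equal-size frames."""
    if len(prev) != len(curr):
        raise ValueError("frames must have the same number of pixels")
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)


def frames_to_send(frames: Iterable[Sequence[int]], threshold: float = 10.0) -> List[int]:
    """Return the indices of frames worth transmitting.

    The first frame is always sent as a reference; after that, a frame is sent
    only if it differs from the last *sent* frame by more than `threshold`.
    """
    sent: List[int] = []
    reference = None
    for i, frame in enumerate(frames):
        if reference is None or mean_abs_diff(reference, frame) > threshold:
            sent.append(i)
            reference = frame
    return sent


if __name__ == "__main__":
    empty_porch = [50] * 16              # tiny 4x4 "frame" of a static scene
    visitor = [50] * 8 + [200] * 8       # half the pixels change brightly
    stream = [empty_porch] * 5 + [visitor] * 3 + [empty_porch] * 2
    print(frames_to_send(stream))
```

On the toy stream (5 empty frames, 3 with a visitor, 2 empty again) it prints `[0, 5, 8]`: only 3 of 10 frames would be communicated. Comparing against the last sent frame rather than the immediately preceding one keeps slow drift from accumulating unnoticed.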

Scaling up: Sensor networks & Smart cities

MatchPoint: https://datascience.sdsc.edu/matchpoint

CSE255 / DSE230

• A fun course.
• Not an easy course.
• Weekly HW, from Friday to Friday; expect to spend ~10 hours on each HW.
• You are expected to figure things out on your own.
  • Consult the documentation of Python, Spark, etc.
  • Brush up on your linear algebra: eigenvectors, eigenvalues, eigendecomposition.
    • See the linear algebra material on the website.
    • Wikipedia

• You are expected to participate in class and on Piazza.

What will you learn?

[Figure from Doing Data Science: Straight Talk from the Frontline by Rachel Schutt & Cathy O’Neil, annotated with course topics: Python, Spark; Linear Algebra, PCA, Regression, Classification; Jupyter Notebooks, Visualization, Interpretation, Breakdown Problems; & Communication skills.]

Jupyter Notebooks

• Pull them from the GitHub repository.
• They are your main resource:
  • Class slides are derived from the notebooks
  • Code
  • Explanations
  • Pointers to additional resources
  • Exercises

Grading

• HW: 50%
  • There will be 9 HW assignments; the one with the lowest grade will be dropped from the average.

• Quiz: 10%
  • Each Thursday. The lowest grade is dropped from the average.

• Breakdown Problems: 10%
  • Explained on the class web page.

• Final: 30%
  • Yet to decide whether in-class or take-home.
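The grading weights can be turned into a small calculation. This sketch assumes (the slide does not say) that all scores are on a 0-100 scale, that items within a category count equally once the lowest one is dropped, and that Breakdown Problems enter as a single score; the function names and sample scores are hypothetical.

```python
# Illustrative course-grade calculation using the grading weights.
# Assumptions (not stated in the course materials): all scores are 0-100,
# items in a category count equally after the lowest is dropped, and
# Breakdown Problems enter as one score.

from typing import Dict, List

WEIGHTS: Dict[str, float] = {"hw": 0.50, "quiz": 0.10, "breakdown": 0.10, "final": 0.30}


def mean_drop_lowest(scores: List[float]) -> float:
    """Average of the scores after removing the single lowest one."""
    if len(scores) < 2:
        raise ValueError("need at least two scores to drop the lowest")
    return (sum(scores) - min(scores)) / (len(scores) - 1)


def course_grade(hw: List[float], quizzes: List[float], breakdown: float, final: float) -> float:
    """Weighted course grade on a 0-100 scale."""
    parts = {
        "hw": mean_drop_lowest(hw),
        "quiz": mean_drop_lowest(quizzes),
        "breakdown": breakdown,
        "final": final,
    }
    return sum(WEIGHTS[k] * parts[k] for k in WEIGHTS)


hw_scores = [90, 85, 100, 70, 95, 88, 92, 40, 100]   # 9 hypothetical HWs; the 40 is dropped
quiz_scores = [80, 90, 100, 60]                       # hypothetical quizzes; the 60 is dropped
print(round(course_grade(hw_scores, quiz_scores, breakdown=85, final=78), 2))
```

With the sample scores, the HW and quiz averages are both 90 after dropping the lowest score, and the printed grade is 85.9 (0.5·90 + 0.1·90 + 0.1·85 + 0.3·78).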

More details on the website

• Go to https://mas-dse.github.io/DSE230/