Interactive Scientific Image Analysis using Spark
Page 1: Interactive Scientific Image Analysis using Spark

SUMMIT EAST

Interactive Scientific Image Analysis and Analytics using Spark

Kevin Mader

Spark East, NYC, 19 March 2015

Page 2: Interactive Scientific Image Analysis using Spark

Outline

Background: Our Technique (why we have big data)

X-Ray Tomographic Microscopy

Imaging in 2015

The Problem(s)

The Tools

Spark Imaging Layer

3D Imaging

Hyperspectral Imaging

Interactive Analysis / Streaming

The Science

Genome Scale Studies

Large Datasets

Outlook / Developments

Page 3: Interactive Scientific Image Analysis using Spark

Synchrotron-based X-Ray Tomographic Microscopy

The only technique which can do all of the following:

peer deep into large samples

achieve < 1 μm isotropic spatial resolution

with 1.8 mm field of view

achieve > 10 Hz temporal resolution

8 GB/s of images

[1] Mokso et al., J. Phys. D, 46(49), 2013

Courtesy of M. Pistone at U. Bristol

Page 4: Interactive Scientific Image Analysis using Spark

Image Science in 2015: More and faster

X-Ray

Swiss Light Source (SRXTM) images at (>1000 fps) 8 GB/s, diffraction patterns (cSAXS) at 30 GB/s

Nanoscopium (Soleil), 10 TB/day, 10-500 GB file sizes, very heterogeneous data

Optical

Light-sheet microscopy (see talk of Jeremy Freeman) produces images at 500 MB/s

High-speed confocal images at (>200 fps) 78 Mb/s

Geospatial

New satellite projects (Skybox, etc.) will measure hundreds of terabytes to petabytes of images a year

Personal

GoPro 4 Black - 60 MB/s (3840 x 2160 x 30 fps) for $600

fps1000 - 400 MB/s (640 x 480 x 840 fps) for $400

Page 5: Interactive Scientific Image Analysis using Spark

How much is a TB, really?

If you looked at one 1000 x 1000 sized image every second

It would take you 139 hours to browse through a terabyte of data.
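The 139-hour figure can be checked with a short calculation. The sketch below assumes 16-bit (2-byte) pixels and a decimal terabyte (10^12 bytes); neither is stated on the slide, and the object and value names are illustrative only.

```scala
object TerabyteBrowsing {
  // Assumptions (not stated on the slide): decimal terabyte, 16-bit pixels.
  val bytesPerTB: Long = 1000L * 1000L * 1000L * 1000L
  val bytesPerPixel: Long = 2L
  val bytesPerImage: Long = 1000L * 1000L * bytesPerPixel

  // Number of 1000 x 1000 images that fit in one terabyte.
  val imagesPerTB: Long = bytesPerTB / bytesPerImage

  // Viewing one image per second, expressed in hours.
  val hoursToBrowse: Double = imagesPerTB / 3600.0
}
```

Under these assumptions a terabyte holds 500,000 images, which at one image per second is just under 139 hours.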

Year    Time to 1 TB    Manpower to keep up    Salary Costs / Month

2000    4096 min        2 people               25 kCHF

2008    1092 min        8 people               95 kCHF

2014    32 min          260 people             3255 kCHF

2016    2 min           3906 people            48828 kCHF

Page 6: Interactive Scientific Image Analysis using Spark

Computing has changed: Parallel

Moore's Law

$\text{Transistors} \propto 2^{T/(18\,\text{months})}$

Based on data from https://gist.github.com/humberto-ortiz/de4b3a621602b78bf90d

There are now many more transistors inside a single computer but the processing speed hasn't increased. How can this be?

Multiple Core

Many machines have multiple cores for each processor which can perform tasks independently

Multiple CPUs

More than one chip is commonly present

New modalities

GPUs provide many cores which operate at slow speed

Parallel Code is important

Page 7: Interactive Scientific Image Analysis using Spark

Cloud Computing Costs

The figure shows the range of cloud costs (determined by peak usage) compared to a local workstation, with utilization shown as the average number of hours the computer is used each week.

The figure shows the cost of a cloud-based solution as a percentage of the cost of buying a single machine. The values below 1 show the percentage as a number. The panels distinguish the average time to replacement for the machines in months.

Page 8: Interactive Scientific Image Analysis using Spark

The Problem

There is a flood of new data

What took an entire PhD 3-4 years ago can now be measured in a weekend, or even several seconds. Analysis tools have not kept up, are difficult to customize, and are usually highly specific.

Optimized data structures do not fit

Data structures that were fast and efficient for computers with 640 kB of memory do not make sense anymore.

Single-core computing is too slow

CPUs are not getting that much faster, but there are a lot more of them. Iterating through a huge array takes almost as long on 2014 hardware as on 2006 hardware.

Page 9: Interactive Scientific Image Analysis using Spark

Exploratory Image Processing Priorities

Correctness

The most important job for any piece of analysis is to be correct.

A powerful testing framework is essential

Avoid repetition of code, which leads to inconsistencies

Use compilers to find mistakes rather than users

Easily understood, changed, and used

Almost all image processing tasks require a number of people to evaluate and implement them and are almost always moving targets

Flexible, modular structure that enables replacing specific pieces

Fast

The last of the major priorities is speed, which covers scalability, raw performance, and development time.

Long waits for processing discourage exploration

Manual access to data on separate disks is a huge speed barrier

Real-time image processing requires millisecond latencies

Implementing new ideas can be done quickly

Page 10: Interactive Scientific Image Analysis using Spark

The Framework First

Rather than building an analysis as quickly as possible and then trying to hack it to scale up to large datasets

choose the framework first

then start making the necessary tools.

Google, Amazon, Yahoo, and many other companies have made huge inroads into these problems

The real need is a fast, flexible framework for robustly, scalably performing complicated analyses, a sort of Excel for big imaging data.

Apache Spark and Hadoop 2

The two frameworks provide a free out-of-the-box solution for

scaling to >10000 computers

storing and processing exabytes of data

fault tolerance

2/3rds of computers can crash and a request still accurately finishes

hardware and software platform independence (Mac, Windows, Linux)

Page 11: Interactive Scientific Image Analysis using Spark

Spark -> Microscopy?

These frameworks are really cool and Spark has a big vocabulary, but flatMap, filter, aggregate, join, groupBy, and fold still do not sound like anything I want to do to an image.

I want to

filter out noise, segment, choose regions of interest

contour, component label

measure, count, and analyze

Spark Image Layer

Developed at 4Quant, ETH Zurich, and Paul Scherrer Institut

The Spark Image Layer is a Domain Specific Language for Microscopy for Spark.

It converts common imaging tasks into coarse-grained Spark operations

Page 12: Interactive Scientific Image Analysis using Spark

Spark Image Layer

We have developed a number of commands for SIL handling standard image processing tasks

Fully extensible with

Page 13: Interactive Scientific Image Analysis using Spark

Use case: Hyperspectral Imaging

Hyperspectral imaging is a rapidly growing area with the potential for massive datasets and a severe deficit of usable tools.

The scale of the data is large and standard image processing tools are ill-suited for handling it, although the ideas used in image processing are equally applicable to hyperspectral data (filtering, thresholding, segmentation, …) and distributed, parallel approaches make even more sense on such massive datasets.

Page 14: Interactive Scientific Image Analysis using Spark

Flexibility through Types

Developing in Scala brings additional flexibility through types [1]. With microscopy, the standard formats are 2-, 3- and even 4- or more dimensional arrays or matrices which can be iterated through quickly using CPU and GPU code. While still possible in Scala, there is a great deal more flexibility for data types, allowing anything to be stored as an image and then processed as long as basic functions make sense.

[1] Fighting Bit Rot with Types (Experience Report: Scala Collections), M. Odersky, FSTTCS 2009, December 2009

What is an image?

A collection of positions and values, maybe more (not an array of double). Arrays are efficient for storing in computer memory, but often a poor way of expressing scientific ideas and analyses.

Filter Noise?

combine information from nearby pixels

Find objects

determine groups of pixels which are very similar to desired result

Page 15: Interactive Scientific Image Analysis using Spark

Making Coding Simpler with Types

trait BasicMathSupport[T] extends Serializable {
  def plus(a: T, b: T): T
  def times(a: T, b: T): T
  def scale(a: T, b: Double): T
  def negate(a: T): T = scale(a, -1)
  def invert(a: T): T
  def abs(a: T): T
  def minus(a: T, b: T): T = plus(a, negate(b))
  def divide(a: T, b: T): T = times(a, invert(b))
  def compare(a: T, b: T): Int
}

Page 16: Interactive Scientific Image Analysis using Spark

Continuing with Types

Simple filter implementation

def SimpleFilter[T](inImage: Image[T])(implicit wst: BasicMathSupport[T]) = {
  val width: Double = 1
  // Gaussian weight based on the distance of the position from the origin
  val kernel = (pos: D3int, value: T) => wst.scale(value, math.exp(-math.pow(pos.mag / width, 2)))
  // average two neighbouring contributions
  val kernelReduce = (ptA: T, ptB: T) => wst.scale(wst.plus(ptA, ptB), 0.5)
  runFilter(inImage, kernel, kernelReduce)
}

Spectra as well-supported types

implicit val SpectraBMS = new BasicMathSupport[Array[Double]] {
  def plus(a: Array[Double], b: Array[Double]) = a.zip(b).map { case (x, y) => x + y }
  ...
  def scale(a: Array[Double], b: Double) = a.map(_ * b)
}
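To make the type-class idea concrete, here is a minimal self-contained sketch. It uses a reduced version of the trait (only `plus` and `scale`), and the `mean` helper and instance names are hypothetical, introduced only for illustration; they are not part of the Spark Image Layer.

```scala
object TypedMath {
  // Reduced version of the trait from the slide: only the operations used here.
  trait BasicMathSupport[T] extends Serializable {
    def plus(a: T, b: T): T
    def scale(a: T, b: Double): T
  }

  // Instance for plain scalar pixels.
  implicit val doubleBMS: BasicMathSupport[Double] = new BasicMathSupport[Double] {
    def plus(a: Double, b: Double): Double = a + b
    def scale(a: Double, b: Double): Double = a * b
  }

  // Instance for spectra: element-wise operations on equal-length arrays.
  implicit val spectraBMS: BasicMathSupport[Array[Double]] =
    new BasicMathSupport[Array[Double]] {
      def plus(a: Array[Double], b: Array[Double]): Array[Double] =
        a.zip(b).map { case (x, y) => x + y }
      def scale(a: Array[Double], b: Double): Array[Double] = a.map(_ * b)
    }

  // Hypothetical helper: mean of a non-empty sequence of any supported type.
  def mean[T](values: Seq[T])(implicit ms: BasicMathSupport[T]): T =
    ms.scale(values.reduce((a, b) => ms.plus(a, b)), 1.0 / values.length)
}
```

The same `mean` function then works on scalar intensities and on whole spectra without any change, which is the flexibility the slide is pointing at.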

Page 17: Interactive Scientific Image Analysis using Spark

Interactive Analysis

Combining many different components together inside of the Spark Shell, IPython, or Zeppelin makes it easier to assemble workflows

Page 18: Interactive Scientific Image Analysis using Spark

Scientific Cases: Genome-scale Imaging

We want to understand the relationship between genetic background and bone structure

With existing tools, analysis is possible and a number of publications have been made, even ones that show differences between strains of mice

But

n < 12

time-consuming (years between measurement and publication)

not flexible or reproducible

not cloud-based

Page 19: Interactive Scientific Image Analysis using Spark

Genome-Scale Imaging

Genetic studies require hundreds to thousands of samples; in this case the difference between 717 and 1200 samples is the difference between finding the links and finding nothing.

2008 approach - 120 years

Hand identification -> 30 s/object

30-40k objects per sample

One sample in 6.25 weeks

2014 approach - 1.5 years

ImageJ macro for segmentation (2-4 hours/sample)

Python script for shape analysis (3 hours/sample)

Paraview macro for network and connectivity (2 hours/sample)

Python script to pool results (3-4 hours)

MySQL database storing results (5 minutes/query)

Page 20: Interactive Scientific Image Analysis using Spark

Genetic Studies using Spark Image Layer

Analysis could be completed in several months instead of 120 years (and could now be completed in days in the cloud)

Data can be freely explored and analyzed

val bones = sc.loadImages("work/f2_bones/*/bone.tif")

Segment hard and soft tissues

val hardTissue = bones.threshold(OTSU)
val softTissue = hardTissue.invert

Label cells

val cells = hardTissue.componentLabel.
  filter(c => c.size > 100 && c.size < 1000)

Export results

cells.shapeAnalysis.WriteOutput("lacuna.csv")

Page 21: Interactive Scientific Image Analysis using Spark

Parallel Tools for Image and Quantitative Analysis

val cells = sqlContext.csvFile("work/f2_bones/*/cells.csv")
val avgVol = sqlContext.sql("select SAMPLE,AVG(VOLUME) FROM cells GROUP BY SAMPLE")

Collaborators / competitors can verify results and extend on analyses

Combine Images with Results

avgVol.filter(_._2 > 1000).map(sampleToPath).joinByKey(bones)

See immediately in datasets of terabytes which image had the largest cells

New hypotheses and analyses can be done in seconds / minutes
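For readers without a cluster at hand, the grouping and filtering steps can be mimicked on ordinary Scala collections. This is only a local sketch: the `Cell` row type and both function names are hypothetical stand-ins, and the real analysis runs through Spark SQL as on the slide.

```scala
object CellStats {
  // Hypothetical row type standing in for one line of cells.csv.
  final case class Cell(sample: String, volume: Double)

  // Local equivalent of: SELECT SAMPLE, AVG(VOLUME) FROM cells GROUP BY SAMPLE
  def avgVolumeBySample(cells: Seq[Cell]): Map[String, Double] =
    cells.groupBy(_.sample).map { case (sample, rows) =>
      (sample, rows.map(_.volume).sum / rows.size)
    }

  // Local equivalent of avgVol.filter(_._2 > 1000): samples whose mean volume exceeds the limit.
  def largeCellSamples(cells: Seq[Cell], minAvgVolume: Double): Set[String] =
    avgVolumeBySample(cells).filter { case (_, avg) => avg > minAvgVolume }.keys.toSet
}
```

The sample names that survive the filter are what would then be mapped to file paths and joined back to the images.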

Task                     Single Core Time    Spark Time (40 cores)

Load and Preprocess      360 minutes         10 minutes

Single Column Average    4.6 s               400 ms

1 K-means Iteration      2 minutes           1 s

Page 22: Interactive Scientific Image Analysis using Spark

Science Problems: Full Brain Imaging

Collaboration with A. Astolfo and A. Patera

Measure a full mouse brain (1 cm³) with cellular resolution (1 μm)

10 x 10 x 10 scans at 2560 x 2560 x 2160 → 14 TVoxels

0.000004% of the entire dataset

14 TVoxels = 56 TB

Each scan needs to be registered and aligned together

There are no computers with 56 TB of memory

Even multithreaded approaches are not feasible and require many logistics

Analysis of the stitched data is also of interest (segmentation, vessel analysis, distribution and network connectivity)
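The voxel count and the 56 TB figure follow directly from the scan dimensions. The sketch below assumes 4 bytes per voxel (for example a 32-bit float), which is inferred from the ratio of the two numbers on the slide rather than stated there.

```scala
object BrainScanSize {
  // Values from the slide: a 10 x 10 x 10 grid of scans, each 2560 x 2560 x 2160 voxels.
  val scans: Long = 10L * 10L * 10L
  val voxelsPerScan: Long = 2560L * 2560L * 2160L
  val totalVoxels: Long = scans * voxelsPerScan

  // Assumption: 4 bytes per voxel, inferred from 14 TVoxels = 56 TB.
  val bytesPerVoxel: Long = 4L
  val totalTB: Double = totalVoxels * bytesPerVoxel / 1e12
}
```

This gives roughly 14.2 × 10^12 voxels and about 56.6 TB, in line with the rounded values quoted.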

Page 23: Interactive Scientific Image Analysis using Spark

Science Problems: Big Stitching

Images: RDD[((x, y, z), Img[Double])] = [(x, Img), …]

val dispField = Images.
  cartesian(Images).map {
    case ((xA, ImA), (xB, ImB)) => xcorr(ImA, ImB, in = xB - xA)
  }
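The slide does not show what `xcorr` does internally. As a rough illustration of the idea only, the sketch below estimates the offset between two one-dimensional signals by brute-force, un-normalized cross-correlation; the function names are hypothetical, and a real stitching pipeline would work on 3D images and most likely use FFT-based correlation.

```scala
object ShiftEstimate {
  // Un-normalized cross-correlation of two signals at a given integer shift.
  def correlationAt(a: Array[Double], b: Array[Double], shift: Int): Double =
    a.indices
      .filter(i => i + shift >= 0 && i + shift < b.length)
      .map(i => a(i) * b(i + shift))
      .sum

  // Shift of b relative to a (within +/- maxShift) with the highest correlation.
  def bestShift(a: Array[Double], b: Array[Double], maxShift: Int): Int =
    (-maxShift to maxShift).maxBy(s => correlationAt(a, b, s))
}
```

Applied to every pair of neighbouring scans, such an estimate yields the displacement field that the `cartesian` + `map` step computes in parallel.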

Page 24: Interactive Scientific Image Analysis using Spark

From Matching to Stitching

Scan positions are updated from the information provided by the cross-correlations, applying appropriate smoothing criteria if necessary.

The stitching itself, rather than rewriting the original data, can be done in a lazy fashion as certain regions of the image are read.

This also ensures the original data is left unaltered and all analysis is reversible.

def getView(tPos, tSize) =
  stImgs.
    filter { case (x, img) => abs(x - tPos) < img.size }.
    map { case (x, img) =>
      val oImg = new Image(tSize)
      oImg.copy(img, x, tPos)
    }.addImages(AVG)

Page 25: Interactive Scientific Image Analysis using Spark

Viewing Regions

getView(Pos(26.5, 13), Size(2, 2))

Page 26: Interactive Scientific Image Analysis using Spark

Real-time with Spark Streaming: Webcam

In the biological imaging community, the open source tools of ImageJ2 and Fiji are widely accepted and have a large number of readily available plugins and tools.

We can integrate the functionality directly into Spark and perform operations on much larger datasets than a single machine could have in memory. Additionally, these analyses can be performed on streaming data.

Page 27: Interactive Scientific Image Analysis using Spark

Streaming Analysis: Real-time Webcam Processing

val wr = new WebcamReceiver()
val ssc = sc.toStreaming(strTime)
val imgList = ssc.receiverStream(wr)

Filter images

val filtImgs = allImgs.mapValues(_.run("Median...", "radius=3"))

Create a background image

val totImgs = inImages.count()
val bgImage = inImages.reduce(_ add _).multiply(1.0 / totImgs)

Page 28: Interactive Scientific Image Analysis using Spark

Identify Outliers in Streams

Remove the background image and find the mean value

val eventImages = filtImgs.
  transform { inImages =>
    val corImage = inImages.map { case (inTime, inImage) =>
      val corImage = inImage.subtract(bgImage)
      (corImage.getImageStatistics().mean, (inTime, corImage))
    }
    corImage
  }

Show the outliers

eventImages.filter(iv => Math.abs(iv._1) > 20).
  foreachRDD(showResultsStr("outlier", _))
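The background-subtraction logic is independent of Spark Streaming and can be tried on plain arrays. The sketch below is a local stand-in under stated assumptions: frames are flattened grayscale arrays of equal length, and the `Frame` alias and function names are hypothetical; only the threshold of 20 comes from the slide.

```scala
object BackgroundOutliers {
  type Frame = Array[Double] // hypothetical: one grayscale frame flattened to pixel values

  // Background estimate: pixel-wise mean over a non-empty sequence of equal-sized frames.
  def background(frames: Seq[Frame]): Frame = {
    val sum = frames.reduce((a, b) => a.zip(b).map { case (x, y) => x + y })
    sum.map(_ / frames.length)
  }

  // Mean intensity of a frame after subtracting the background.
  def correctedMean(frame: Frame, bg: Frame): Double =
    frame.zip(bg).map { case (x, y) => x - y }.sum / frame.length

  // Indices of frames whose background-corrected mean exceeds the threshold in magnitude.
  def outliers(frames: Seq[Frame], bg: Frame, threshold: Double = 20.0): Seq[Int] =
    frames.zipWithIndex.collect {
      case (f, i) if math.abs(correctedMean(f, bg)) > threshold => i
    }
}
```

In the streaming version each of these steps maps onto one stage: `reduce` builds the background, `transform` computes the corrected mean per frame, and `filter` keeps the outliers.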

Page 29: Interactive Scientific Image Analysis using Spark

Streaming Demo with Webcam

Page 30: Interactive Scientific Image Analysis using Spark

As a scientist (not a data-scientist)

Apache Spark is a brilliant platform, and utilizing GraphX, MLLib, and other packages there are unlimited possibilities

Scala can be a beautiful but not easy language

Python is an easier language

Both suffer from

Non-obvious workflows

Scripts depending on scripts depending on scripts (can be very fragile)

Although all analyses can be expressed as a workflow, this is often difficult to see from the code

Non-technical persons have little ability to understand or make minor adjustments to analysis

Parameters require recompiling to change

or GUIs need to be placed on top

Page 31: Interactive Scientific Image Analysis using Spark

A basic image filtering operation

Thanks to Spark, it is cached, in memory, approximate, cloud-ready

Thanks to Map-Reduce, it is fault-tolerant, parallel, distributed

Thanks to Java, it is hardware agnostic

But it is also not really so readable

def spread_voxels(pvec: ((Int, Int), Double), windSize: Int = 1) = {
  val wind = (-windSize to windSize)
  val pos = pvec._1
  val scalevalue = pvec._2 / (wind.length * wind.length)
  for (x <- wind; y <- wind) yield ((pos._1 + x, pos._2 + y), scalevalue)
}

val filtImg = roiImg.
  flatMap(cvec => spread_voxels(cvec)).
  filter(roiFun).reduceByKey(_ + _)
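The same flatMap / filter / reduce-by-key pattern can be run on ordinary Scala collections, which makes it easier to see what the filter computes. In this sketch `groupBy` followed by a sum stands in for Spark's `reduceByKey`; the `Voxel` alias and the `filterImage` and `inRoi` names are illustrative, not part of the original code.

```scala
object BoxFilter {
  type Voxel = ((Int, Int), Double)

  // Spread one voxel's value evenly over its (2 * windSize + 1)^2 neighbourhood.
  def spreadVoxels(pvec: Voxel, windSize: Int = 1): Seq[Voxel] = {
    val wind = -windSize to windSize
    val ((px, py), value) = pvec
    val scaled = value / (wind.length * wind.length)
    for (x <- wind; y <- wind) yield ((px + x, py + y), scaled)
  }

  // Local stand-in for flatMap + filter + reduceByKey on an RDD.
  def filterImage(img: Seq[Voxel], inRoi: ((Int, Int)) => Boolean): Map[(Int, Int), Double] =
    img.flatMap(v => spreadVoxels(v))
      .filter { case (pos, _) => inRoi(pos) }
      .groupBy(_._1)
      .map { case (pos, vs) => (pos, vs.map(_._2).sum) }
}
```

Each input voxel contributes one ninth of its value to every position in its 3 x 3 window, and summing the contributions per position gives a box (mean) filter.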

Page 32: Interactive Scientific Image Analysis using Spark

Little blocks for big data

Here we use a KNIME-based workflow and our Spark Imaging Layer extensions to create a workflow without any Scala or programming knowledge and with an easily visible flow from one block to the next, without any performance overhead of using other tools.

Page 33: Interactive Scientific Image Analysis using Spark

Reality Check

Spark is not performant

dedicated, optimized CPU and GPU codes will perform slightly to much, much better when evaluated by pixels per second per processing power unit

these codes will be wildly outperformed by dedicated hardware / FPGA solutions

Serialization overhead and network congestion are not negligible for large datasets

→ But

Scala / Python in Spark is substantially easier to write and test

Highly optimized codes are very inflexible

Human time is 400x more expensive than AWS time

Mistakes due to poor testing can be fatal

Spark scales smoothly to enormous datasets

GPUs rarely have more than a few gigabytes

Writing code that pages to disk is painful

Spark is hardware agnostic (no drivers or vendor lock-in)

Page 34: Interactive Scientific Image Analysis using Spark

We have a cool tool, but what does this mean for me?

A spinoff - 4Quant: From images to insight

Cloud Image Processing

Use our distributed version of ImageJ in the cloud to analyze thousands of remote datasets using your own, ours, or community-provided processing routines

Custom Analysis Solutions

Custom-tailored software to solve your problems

One Stop Shop

Measurement, analysis, and statistical analysis

Education / Training

Consulting

Advice on imaging techniques, analysis possibilities

Development of new analysis tools and workflows

Education

Workshops on Image Analysis

Courses / Training

Quantitative Big Imaging

Page 35: Interactive Scientific Image Analysis using Spark

Acknowledgements

AIT at PSI and Scientific Computer at ETH

TOMCAT Group

We are interested in partnerships and collaborations

Learn more at

4Quant: From Images to Statistics - http://www.4quant.com

X-Ray Imaging Group at ETH Zurich - http://bit.ly/1gD8wKb

Quantitative Big Imaging Course at ETH Zurich

Page 36: Interactive Scientific Image Analysis using Spark

Feature Vectors

A pairing between spatial information (position) and some other kind of information (value).

$\vec{x} \rightarrow \vec{f}$

We are used to seeing images in a grid format where the position indicates the row and column in the grid and the intensity (absorption, reflection, tip deflection, etc.) is shown as a different color

The alternative form for this image is as a list of positions and a corresponding value

x y Intensity

1 1 12

2 1 68

3 1 81

4 1 89

5 1 87

1 2 40

This representation can be called the feature vector, and in this case it only has Intensity

$I = (\vec{x}, \vec{f})$
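Converting from the grid form to the list form is a small transformation. The sketch below assumes a row-major grid and 1-based coordinates as in the table above; the `Pixel` case class and `toFeatureVectors` function are illustrative names.

```scala
object FeatureVectors {
  // One feature vector: position (x, y) plus a single value, the intensity.
  final case class Pixel(x: Int, y: Int, intensity: Double)

  // Convert a row-major grid (rows indexed by y, columns by x) into a list of
  // position/value pairs, using 1-based coordinates.
  def toFeatureVectors(grid: Array[Array[Double]]): Seq[Pixel] =
    for {
      (row, yIdx) <- grid.toSeq.zipWithIndex
      (value, xIdx) <- row.toSeq.zipWithIndex
    } yield Pixel(xIdx + 1, yIdx + 1, value)
}
```

Once the image is in this form, further features can be appended to each entry without changing how the data is stored or traversed.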

Page 37: Interactive Scientific Image Analysis using Spark

Why Feature Vectors

If we use feature vectors to describe our image, we no longer have to worry about how the images will be displayed, and can focus on the segmentation / thresholding problem from a classification rather than an image-processing standpoint.

Example

So we have an image of a cell and we want to identify the membrane (the ring) from the nucleus (the point in the middle).

A simple threshold doesn't work because we identify the point in the middle as well. We could try to use morphological tricks to get rid of the point in the middle, or we could better tune our segmentation to the ring structure.

Page 38: Interactive Scientific Image Analysis using Spark

Adding a new feature

In this case we add a very simple feature to the image, the distance from the center of the image (distance).

x y Intensity Distance

-10 -10 0.9350683 14.14214

-10 -9 0.7957197 13.45362

-10 -8 0.6045178 12.80625

-10 -7 0.3876575 12.20656

-10 -6 0.1692429 11.66190

We now have a more complicated image, which we can't as easily visualize, but we can incorporate these two pieces of information together.

Page 39: Interactive Scientific Image Analysis using Spark

Applying two criteria

Now instead of trying to find the intensity for the ring, we can combine density and distance to identify it

iff (5 < Distance < 10 & 0.5 < Intensity < 1.0)
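Expressed on feature vectors, the combined criterion becomes a simple predicate. This sketch assumes the image center is at (0, 0), as the Distance values in the earlier table suggest, and that the upper bound on intensity is meant as Intensity < 1.0; the names are illustrative.

```scala
object RingSegmentation {
  final case class Pixel(x: Int, y: Int, intensity: Double)

  // Added feature: Euclidean distance from the image center, taken here to be (0, 0).
  def distance(p: Pixel): Double = math.hypot(p.x.toDouble, p.y.toDouble)

  // A pixel belongs to the ring iff both the distance and the intensity criteria hold.
  def isRing(p: Pixel): Boolean = {
    val d = distance(p)
    d > 5 && d < 10 && p.intensity > 0.5 && p.intensity < 1.0
  }
}
```

A bright pixel at the center (the nucleus) fails the distance test, and a dark pixel at the right radius fails the intensity test, so only the membrane ring is kept.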

Page 40: Interactive Scientific Image Analysis using Spark

Common Features

The distance, while illustrative, is not a commonly used feature; more common are various filters applied to the image

Gaussian Filter (information on the values of the surrounding pixels)

Sobel / Canny Edge Detection (information on edges in the vicinity)

Entropy (information on variability in the vicinity)

x y Intensity Sobel Gaussian

1 1 0.94 0.32 0.53

1 10 0.48 0.50 0.45

1 11 0.50 0.50 0.46

1 12 0.48 0.64 0.46

1 13 0.43 0.78 0.45

1 14 0.33 0.94 0.42

Page 41: Interactive Scientific Image Analysis using Spark

Analyzing the feature vector

The distributions of the features appear very different and can thus likely be used for identifying different parts of the images.

Combine this with our a priori information (called supervised analysis)

Page 42: Interactive Scientific Image Analysis using Spark

Using Machine Learning

Now that the images are stored as feature vectors, they can be easily analyzed with standard Machine Learning tools. It is also much easier to combine with training information.

x y Absorb Scatter Training

700 4 0.3706262 0.9683849 0.0100140

704 4 0.3694059 0.9648784 0.0100140

692 8 0.3706371 0.9047878 0.0183156

696 8 0.3712537 0.9341989 0.0334994

700 8 0.3666887 0.9826912 0.0453049

704 8 0.3686623 0.8728824 0.0453049

Want to predict Training from x, y, Absorb, and Scatter

MLLib: Logistic Regression, Random Forest, K-Nearest Neighbors, …

Page 43: Interactive Scientific Image Analysis using Spark

Beyond Image Processing

For many datasets, processing, segmentation, and morphological analysis yield all the information that needs to be extracted. For many systems like bone tissue, cellular tissues, cellular materials and many others, the structure is just the beginning, and the most interesting results come from the application of physical, chemical, or biological rules inside of these structures.

$\sum_j \vec{F}_{ij} = m \ddot{\vec{x}}_i$

Such systems can be easily represented by a graph, and analyzed using GraphX in a distributed, fault-tolerant manner.

Page 44: Interactive Scientific Image Analysis using Spark

Hadoop Filesystem (HDFS not HDF5)

Bottleneck is the filesystem connection: many nodes (10+) reading in parallel brings even a GPFS-based InfiniBand system to a crawl

One of the central tenets of MapReduce™ is data-centric computation: instead of moving data to the computation, move the computation to the data.

Use fast local storage for storing everything redundantly: less transfer and fault tolerance

Largest file size: 512 yottabytes; Yahoo has a 14 petabyte filesystem in use