Apache Spark: The Analytics Operating System by Anjul Bhambhri
![Page 1: Apache Spark: The Analytics Operating System by Anjul Bhambhri](https://reader035.fdocuments.net/reader035/viewer/2022062523/587080c01a28ab57368b6529/html5/thumbnails/1.jpg)
Apache Spark: The Analytics Operating System
Anjul Bhambhri
IBM Vice President, Big Data
![Page 2: Apache Spark: The Analytics Operating System by Anjul Bhambhri](https://reader035.fdocuments.net/reader035/viewer/2022062523/587080c01a28ab57368b6529/html5/thumbnails/2.jpg)
IBM Invests in Reinventing Computing
Linux, 1999: 13,000,000 lines of code. 500+ Server Solutions. Ushered in Computer Science.
System 360, 1964: 10,000,000 lines of code. 54 Peripheral Solutions. Ushered in Information Science.
Apache Spark, 2015: 400,000 lines of code. 15+ Data & Analytics Solutions. Ushered in Data Science.
![Page 3: Apache Spark: The Analytics Operating System by Anjul Bhambhri](https://reader035.fdocuments.net/reader035/viewer/2022062523/587080c01a28ab57368b6529/html5/thumbnails/3.jpg)
The Analytics Operating System
1 platform
Apache Spark
![Page 4: Apache Spark: The Analytics Operating System by Anjul Bhambhri](https://reader035.fdocuments.net/reader035/viewer/2022062523/587080c01a28ab57368b6529/html5/thumbnails/4.jpg)
IBM | Spark
Why Spark?
• Expressiveness
• Speed
• Any data: on disk, or on the wire
• (Almost) any application
• Unified model -> high productivity
• Unparalleled performance
![Page 5: Apache Spark: The Analytics Operating System by Anjul Bhambhri](https://reader035.fdocuments.net/reader035/viewer/2022062523/587080c01a28ab57368b6529/html5/thumbnails/5.jpg)
At IBM, We Love Spark!
• Enhance it! Spark Technology Center @ SF
• Offer it! Shipping with BigInsights / Spark as a Service
• Leverage it! Inside our products
![Page 6: Apache Spark: The Analytics Operating System by Anjul Bhambhri](https://reader035.fdocuments.net/reader035/viewer/2022062523/587080c01a28ab57368b6529/html5/thumbnails/6.jpg)
IBM is Building on Apache Spark
• IBM Analytics
• IBM Commerce
• IBM Watson
• IBM Research
• IBM Cloud
![Page 7: Apache Spark: The Analytics Operating System by Anjul Bhambhri](https://reader035.fdocuments.net/reader035/viewer/2022062523/587080c01a28ab57368b6529/html5/thumbnails/7.jpg)
Spark for scalable financial reporting
Financial data lakes are growing
• Regulatory requirements => data retention
• 30+ years of historical data (petabytes)
• 100s of business analysts
• 1000s of disparate reports requested
Overnight and real-time transactions also large
• Complex ledger “posting” processes
Tight timelines (2-3 hours before banks open)
Scalable “scan-sharing” engine to the rescue:
• SQL-inspired “financial” DSL built on Spark
• Runs common portions of queries simultaneously
• Dramatically lowers cost of producing the “next” analyst request that comes along
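The scan-sharing idea on this slide (running the common portions of several queries at once) can be illustrated with a minimal sketch. This is plain Python, not Spark and not IBM's financial DSL; the ledger rows, the report names, and the `shared_scan` helper are all hypothetical and exist only to show several reports being answered from one pass over the data instead of one pass per report.

```python
from collections import defaultdict

# Hypothetical ledger rows: (account, region, amount). Not taken from the talk.
LEDGER = [
    ("acct-1", "EMEA", 120.0),
    ("acct-2", "APAC", 75.5),
    ("acct-1", "EMEA", -20.0),
    ("acct-3", "EMEA", 300.0),
]


def shared_scan(rows, queries):
    """Answer several group-by-sum reports in a single pass over `rows`.

    `queries` maps a report name to a (key_fn, predicate) pair:
    rows passing `predicate` have their amount summed under `key_fn(row)`.
    """
    results = {name: defaultdict(float) for name in queries}
    for row in rows:  # the one scan shared by every report
        for name, (key_fn, predicate) in queries.items():
            if predicate(row):
                results[name][key_fn(row)] += row[2]
    return {name: dict(totals) for name, totals in results.items()}


# Three hypothetical analyst reports that all read the same ledger.
REPORTS = {
    "total_by_account": (lambda r: r[0], lambda r: True),
    "emea_by_account": (lambda r: r[0], lambda r: r[1] == "EMEA"),
    "total_by_region": (lambda r: r[1], lambda r: True),
}

if __name__ == "__main__":
    for name, totals in shared_scan(LEDGER, REPORTS).items():
        print(name, totals)
```

Under this assumption, adding the "next" report costs one more entry in `REPORTS` rather than another full read of the ledger, which is the cost argument the slide makes.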
![Page 8: Apache Spark: The Analytics Operating System by Anjul Bhambhri](https://reader035.fdocuments.net/reader035/viewer/2022062523/587080c01a28ab57368b6529/html5/thumbnails/8.jpg)
Spark maps Customer Experience “journey”
• Multiple channels of customer interaction.
• Very large data volumes that need fast processing.
• Correlating events across channels to interactions.
• Continuous classification of interactions and mapping of the customer's journey across channels.
• Sequence mining algorithm on Spark processes terabytes of interactions in minutes
• MLlib models detect frustration in customers by length and frequency of interaction across channels
• Spark SQL and Parquet support multiple concurrent queries
[Diagram: Voice & Text Data flows in over PUB / SUB (MQTT / WebSockets / Flume / Kafka) to produce Interaction & Journey Data, which feeds Journey Dashboards]
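As a rough illustration of the journey mapping described on this slide, the sketch below groups interactions by customer, orders them in time across channels, and flags possible frustration from the length and frequency of contact. It is plain Python, not Spark; the `Interaction` record, the sample events, and both thresholds are hypothetical, and the fixed threshold rule is a stand-in for the MLlib models the slide mentions, not a description of them.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    customer: str
    channel: str   # e.g. "voice", "chat", "web"
    start: int     # seconds since an arbitrary epoch
    duration: int  # seconds


def build_journeys(interactions):
    """Correlate events across channels into one time-ordered journey per customer."""
    journeys = defaultdict(list)
    for item in interactions:
        journeys[item.customer].append(item)
    return {c: sorted(items, key=lambda i: i.start) for c, items in journeys.items()}


def is_frustrated(journey, max_contacts=3, max_total_seconds=900):
    """Hypothetical rule: too many contacts or too much total time spent."""
    total = sum(i.duration for i in journey)
    return len(journey) > max_contacts or total > max_total_seconds


# Hypothetical events, deliberately out of time order.
EVENTS = [
    Interaction("alice", "voice", 500, 700),
    Interaction("bob", "web", 50, 120),
    Interaction("alice", "web", 0, 60),
    Interaction("alice", "chat", 100, 300),
]

if __name__ == "__main__":
    for customer, journey in build_journeys(EVENTS).items():
        path = " -> ".join(i.channel for i in journey)
        print(customer, path, "frustrated" if is_frustrated(journey) else "ok")
```

A learned classifier would replace `is_frustrated` in practice; the grouping and ordering step is the part that corresponds to mapping the journey across channels.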
![Page 9: Apache Spark: The Analytics Operating System by Anjul Bhambhri](https://reader035.fdocuments.net/reader035/viewer/2022062523/587080c01a28ab57368b6529/html5/thumbnails/9.jpg)
IBM has made a significant investment in Spark
• IBM Spark Technology Center, San Francisco
• Growing pool of contributors
• 300+ inventors
• Contributed SystemML
• Founding member of AMPLab
• Partnerships in the ecosystem
Visit www.spark.tc for more information
![Page 10: Apache Spark: The Analytics Operating System by Anjul Bhambhri](https://reader035.fdocuments.net/reader035/viewer/2022062523/587080c01a28ab57368b6529/html5/thumbnails/10.jpg)
Power of data. Simplicity of design. Speed of innovation.
IBM Apache Spark
For Apache Spark news and innovation from IBM’s Spark Technology Center: