Architecting Big Data Solutions · 2018-07-27
![Page 1: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/1.jpg)
Architecting Big Data Solutions
Michael Carducci @MichaelCarducci
![Page 2: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/2.jpg)
Limitations
![Page 3: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/3.jpg)
No Prescription
![Page 4: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/4.jpg)
Focused on “mainstream” tools
![Page 5: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/5.jpg)
This talk is very high-level
![Page 6: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/6.jpg)
Ecosystem is Young
![Page 7: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/7.jpg)
Data Science is Out of Scope
![Page 8: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/8.jpg)
Agenda
• What is Big Data
• Big Data Design Considerations
• Big Data Solution Architecture
• Big Data Ecosystem
• Architecture Template
• Data Acquisition
• Persistence
• Transformation/Analytics
• Presentation
• Sample Architectures
• Closing Thoughts
![Page 9: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/9.jpg)
What is Big Data
![Page 10: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/10.jpg)
“90% of the data in the world today has been created in the last two years alone, at 2.5 quintillion bytes of data a day!”
–IBM Marketing Cloud, 2017
![Page 11: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/11.jpg)
What is Big Data
• Large quantity of data
• Many sources/formats
• Unstructured
• Large Processing Load (needs to be distributed)
• Streaming/real-time processing
• Big Deployment footprint
![Page 12: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/12.jpg)
Key Trends
• Device Explosion: 23.14 billion connected devices
• Ubiquitous Connection: 3.3 ZB by 2021, 278 EB/month
• Social Networks: 2.77 billion social media users by 2019
• Sensor Networks: millions of new sensors go online every hour
• Cheap Storage: $0.019 average cost of 1 GB in 2016, 15.5 million times cheaper
• Inexpensive Computing: $0.03 cost/GFLOPS in 2018
![Page 13: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/13.jpg)
Big Data Big Opportunities
![Page 14: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/14.jpg)
A New Way of Thinking: What, not Why
![Page 15: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/15.jpg)
Big Data - Challenges
Volume, Variety, Velocity
![Page 16: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/16.jpg)
Designing Big Data Solutions
![Page 17: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/17.jpg)
Considerations
![Page 18: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/18.jpg)
Availability
![Page 19: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/19.jpg)
Consistency
![Page 20: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/20.jpg)
Access Patterns
![Page 21: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/21.jpg)
Real-time vs Historical
![Page 22: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/22.jpg)
How Big?
![Page 23: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/23.jpg)
Data Security
![Page 24: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/24.jpg)
Big Data Solution Longevity
• We are still on the bleeding-edge
• Tools are still young/evolving
• Don’t “predict” the future
![Page 25: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/25.jpg)
Future Proofing Your Education and Effort
• Don’t follow “trends”; follow value
• Look for high-level support
• Look for product/developer support
• Look for cloud options
• Look for adoption by leading companies
• Look for open APIs and Data Formats
![Page 26: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/26.jpg)
Design Approach:
![Page 27: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/27.jpg)
Start with the end in mind
![Page 28: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/28.jpg)
Leverage Everything Available
![Page 29: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/29.jpg)
Hadoop Ecosystem
![Page 46: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/46.jpg)
Architecture Template
Data Acquisition
Persistence
Transformation/Analytics
Presentation
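The four template stages can be sketched as a chain of small functions. This is only a toy illustration under stated assumptions: the comma-separated input format, the in-memory list standing in for a persistence layer, and all function names are hypothetical and not part of the talk.

```python
from typing import Iterable

Record = dict


def acquire(raw_lines: Iterable[str]) -> list[Record]:
    """Data Acquisition: read from a source, convert, and filter."""
    records = []
    for line in raw_lines:
        line = line.strip()
        if not line:  # filtering: skip empty input lines
            continue
        device, value = line.split(",")
        records.append({"device": device, "value": float(value)})
    return records


def persist(store: list[Record], records: list[Record]) -> None:
    """Persistence: an in-memory list stands in for a real data store."""
    store.extend(records)


def transform(store: list[Record]) -> dict[str, float]:
    """Transformation/Analytics: average value per device."""
    grouped: dict[str, list[float]] = {}
    for rec in store:
        grouped.setdefault(rec["device"], []).append(rec["value"])
    return {dev: sum(vals) / len(vals) for dev, vals in grouped.items()}


def present(summary: dict[str, float]) -> str:
    """Presentation: render the summary as plain text."""
    return "\n".join(f"{dev}: {avg:.1f}" for dev, avg in sorted(summary.items()))


store: list[Record] = []
persist(store, acquire(["a,1.0", "", "b,4.0", "a,3.0"]))
report = present(transform(store))
print(report)
```

Keeping each stage behind its own function boundary is what lets a real system swap the technology used in one stage without touching the others.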
![Page 47: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/47.jpg)
Data Acquisition
![Page 48: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/48.jpg)
Data Acquisition Responsibilities
• Connect to data sources
• Data Conversion
• Filtering
• Protocol Management (handshake)
• Track Data as it moves
• Save Data to Persistence Module
• Stream Data to Processing Module
![Page 49: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/49.jpg)
Historical Data Acquisition (Store & Forward)
• Receive from a source “at rest”
• Move data in units
• Track Completion
• Retransmit if required
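The store-and-forward steps above (move data in units, track completion, retransmit if required) might look roughly like the following. This is a minimal sketch: `FlakyReceiver` is a hypothetical destination that drops the first delivery of selected units so the retransmission path is exercised, and the unit size and retry limit are arbitrary.

```python
import hashlib


def split_into_units(data: bytes, unit_size: int) -> list[bytes]:
    """Cut data at rest into fixed-size transfer units."""
    return [data[i:i + unit_size] for i in range(0, len(data), unit_size)]


class FlakyReceiver:
    """Hypothetical destination that loses the first delivery of some units."""

    def __init__(self, drop_first_attempt_for: set[int]):
        self.pending_drops = set(drop_first_attempt_for)
        self.units: dict[int, bytes] = {}

    def receive(self, index: int, payload: bytes, checksum: str) -> bool:
        if index in self.pending_drops:
            self.pending_drops.discard(index)
            return False  # simulate a lost unit: no acknowledgement
        if hashlib.sha256(payload).hexdigest() != checksum:
            return False  # reject a corrupted unit
        self.units[index] = payload
        return True


def store_and_forward(data: bytes, receiver: FlakyReceiver,
                      unit_size: int = 4, max_attempts: int = 3) -> dict[int, int]:
    """Send every unit, retransmitting until acknowledged; return attempts per unit."""
    attempts: dict[int, int] = {}
    for index, unit in enumerate(split_into_units(data, unit_size)):
        checksum = hashlib.sha256(unit).hexdigest()
        for attempt in range(1, max_attempts + 1):
            attempts[index] = attempt
            if receiver.receive(index, unit, checksum):
                break
        else:
            raise RuntimeError(f"unit {index} was never acknowledged")
    return attempts


receiver = FlakyReceiver(drop_first_attempt_for={1})
attempts = store_and_forward(b"historical-data", receiver)
rebuilt = b"".join(receiver.units[i] for i in sorted(receiver.units))
```

Here `attempts` records that unit 1 needed a second transmission, while `rebuilt` equals the original payload.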
![Page 50: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/50.jpg)
Real-Time Data Acquisition (Streaming)
• Continuously moving data stream
• Throttle at Source
• Throttle for Persistence
• Inflight Storage
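One way to picture throttling and in-flight storage together is a bounded buffer between the source and the persistence layer: when the buffer is full, the source is told to back off. The sketch below is a deterministic, single-threaded simulation; the class name, capacity, and per-tick rates are all assumptions chosen for illustration.

```python
from collections import deque


class InflightBuffer:
    """Bounded in-flight storage between a source and a slower sink."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items: deque = deque()

    def offer(self, event) -> bool:
        """Accept an event, or return False to signal the source to throttle."""
        if len(self.items) >= self.capacity:
            return False
        self.items.append(event)
        return True

    def drain(self, max_items: int) -> list:
        """Hand at most max_items to the persistence layer."""
        batch = []
        while self.items and len(batch) < max_items:
            batch.append(self.items.popleft())
        return batch


def run(events, buffer: InflightBuffer, produce_per_tick: int, drain_per_tick: int):
    persisted, throttled = [], 0
    pending = deque(events)
    while pending or buffer.items:
        for _ in range(produce_per_tick):
            if not pending:
                break
            if buffer.offer(pending[0]):
                pending.popleft()
            else:
                throttled += 1  # buffer full: the source must wait for the next tick
                break
        persisted.extend(buffer.drain(drain_per_tick))
    return persisted, throttled


persisted, throttled = run(range(10), InflightBuffer(capacity=4),
                           produce_per_tick=3, drain_per_tick=2)
```

No event is dropped and order is preserved; the cost of a slow sink shows up as throttling at the source rather than as data loss.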
![Page 51: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/51.jpg)
Data Acquisition - What to Architect
• Identification of new data
• Re-Acquisition and Retransmission
• Data Loss Tolerance
• Source Buffering
• Security in Transit
• Privacy
• Monitoring
• Throttling
• Redundancy
• Scalability
![Page 52: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/52.jpg)
Data Acquisition - Best Practices
• Involve Source Owners
• Reliable Means for Identification of New Data
• Reliable Means for Identification of Missing Data
• APIs and Formats should be standardized as early as possible
• Consider Separate Channels for Real-Time vs Historical Data
• Pay Attention to Security and Privacy
• Use Reliable Inflight Storage
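As one possible "reliable means" for identifying new and missing data, the source owner could publish a manifest of file names and checksums that the acquisition layer reconciles against what actually landed. The manifest format and all names below are hypothetical.

```python
import hashlib


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def reconcile(manifest: dict[str, str], landed: dict[str, bytes]) -> dict[str, list[str]]:
    """Compare the source's manifest (name -> checksum) with landed files."""
    return {
        # announced by the source but never received
        "missing": sorted(name for name in manifest if name not in landed),
        # received, but the content does not match the announced checksum
        "corrupt": sorted(
            name for name, data in landed.items()
            if name in manifest and checksum(data) != manifest[name]
        ),
        # received without being announced
        "unexpected": sorted(name for name in landed if name not in manifest),
    }


manifest = {
    "day1.csv": checksum(b"a,1\n"),
    "day2.csv": checksum(b"b,2\n"),
    "day3.csv": checksum(b"c,3\n"),
}
landed = {
    "day1.csv": b"a,1\n",
    "day2.csv": b"b,2",       # truncated in transit
    "notes.txt": b"scratch",  # not in the manifest
}
result = reconcile(manifest, landed)
```

Agreeing on such a manifest early is also a concrete way to involve source owners and standardize formats.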
![Page 53: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/53.jpg)
Data Acquisition - Popular Technologies
• SQOOP
• Flume
• Kafka/Storm
• Spark
• NiFi
![Page 54: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/54.jpg)
SQOOP
• Command-line tool for transferring data between relational data sources and Hadoop
• Transfers an entire database or the results of a SQL query
• Supports incremental data exchange
• Supports common file formats (Avro, SequenceFile, Parquet, CSV, etc.)
• Hive and HBase support
• Supports parallelism
• BLOB support
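To make the incremental and parallel options concrete, the sketch below only assembles a Sqoop 1 `import` command line and prints it; it does not run Sqoop. The flags shown (`--connect`, `--table`, `--target-dir`, `--incremental append`, `--check-column`, `--last-value`, `--num-mappers`) are standard Sqoop import options, while the JDBC URL, table, column, and directory are hypothetical placeholders.

```python
import shlex


def sqoop_incremental_import(jdbc_url: str, table: str, check_column: str,
                             last_value: int, target_dir: str,
                             mappers: int = 4) -> list[str]:
    """Build (but do not execute) an incremental Sqoop import command."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        "--incremental", "append",       # only fetch rows added since last run
        "--check-column", check_column,  # column used to detect new rows
        "--last-value", str(last_value), # highest value already imported
        "--num-mappers", str(mappers),   # degree of parallelism
    ]


argv = sqoop_incremental_import(
    jdbc_url="jdbc:mysql://db.example.com/sales",  # hypothetical database
    table="orders",
    check_column="order_id",
    last_value=1000,
    target_dir="/data/raw/orders",
)
print(shlex.join(argv))
```

In practice the last imported value has to be stored somewhere between runs; how that is done is left out of this sketch.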
![Page 56: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/56.jpg)
Flume
• A distributed service for collecting, aggregating, and moving large amounts of log/file data
• Robust and fault tolerant (throttling, failover, recovery, etc.)
• Sources can span a large number of servers
• Supports multiple sources (files, strings, HTTP POSTs, etc.)
• Supports multiple sinks
• Custom sources and sinks can be added through code
• Supports in-flight data processing through interceptors
![Page 57: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/57.jpg)
Flume - Benefits and Challenges
• Pros
• Configuration Driven
• Massively Scalable
• Customizable through code
• Challenges
• No Ordering Guarantees
• Duplication possible
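When a channel offers no ordering guarantee and may deliver duplicates, the downstream consumer usually has to compensate. A minimal sketch, assuming each event carries a unique `id` and a source timestamp `ts` (both field names are hypothetical, and this is not Flume API code):

```python
def deduplicate_and_order(events: list[dict]) -> list[dict]:
    """Drop repeated deliveries by id, then restore source order by timestamp."""
    seen_ids = set()
    unique = []
    for event in events:
        if event["id"] in seen_ids:
            continue  # duplicate delivery: ignore
        seen_ids.add(event["id"])
        unique.append(event)
    return sorted(unique, key=lambda e: e["ts"])


delivered = [
    {"id": "e2", "ts": 2, "msg": "login"},
    {"id": "e1", "ts": 1, "msg": "boot"},
    {"id": "e2", "ts": 2, "msg": "login"},  # delivered twice
    {"id": "e3", "ts": 3, "msg": "logout"},
]
cleaned = deduplicate_and_order(delivered)
```

On an unbounded stream the set of seen ids would need to be bounded, for example by only remembering ids within a time window.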
![Page 59: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/59.jpg)
Kafka
• An open source message broker platform for real-time data feeds
• Pub/sub architecture
• Developed at LinkedIn, written in Scala
• Messages are published to topics; each topic can have multiple subscribers
• Ordering guarantees (within a partition)
• Coding required for publishers and subscribers to interface with Kafka (not config driven)
• Replication and high availability
• Often paired with Storm for stream processing
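The pub/sub and ordering ideas can be illustrated with a toy in-memory model. This is not the Kafka client API: `ToyTopic` is a hypothetical class, and the CRC32-based partition choice is only a stand-in for a real partitioner. It shows two things: messages with the same key land in the same partition and keep their order, and each subscriber tracks its own read position.

```python
import zlib
from collections import defaultdict


class ToyTopic:
    """In-memory stand-in for a partitioned topic with per-subscriber offsets."""

    def __init__(self, partition_count: int = 2):
        self.partitions = [[] for _ in range(partition_count)]
        # each subscriber keeps one read offset per partition
        self.offsets = defaultdict(lambda: [0] * partition_count)

    def publish(self, key: str, value) -> int:
        """Append to the partition chosen by hashing the key."""
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[index].append((key, value))
        return index

    def poll(self, subscriber: str) -> list:
        """Return everything this subscriber has not read yet."""
        positions = self.offsets[subscriber]
        messages = []
        for index, log in enumerate(self.partitions):
            messages.extend(log[positions[index]:])
            positions[index] = len(log)
        return messages


topic = ToyTopic()
for key, value in [("sensor-1", 1), ("sensor-2", 10), ("sensor-1", 2), ("sensor-1", 3)]:
    topic.publish(key, value)

dashboard = topic.poll("dashboard")
archiver = topic.poll("archiver")
sensor_1_values = [value for key, value in dashboard if key == "sensor-1"]
```

Both subscribers receive all four messages independently, and the values for `sensor-1` arrive in publication order; order across different keys is not guaranteed in this model.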
![Page 60: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/60.jpg)
Kafka - Benefits and Challenges
• Pros
• Highly scalable real-time messaging system
• Multiple subscribers
• Ordering guarantees
• Challenges
• Coding is required for publishers and subscribers
![Page 62: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/62.jpg)
Spark
“A fast and general engine for large-scale data processing”
![Page 63: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/63.jpg)
Spark Components
• Spark Streaming
• Spark SQL
• MLlib
• GraphX
![Page 64: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/64.jpg)
Spark - Benefits and Challenges
• Pros
• Rich Ecosystem
• Very Fast
• Supports Streaming
• Supports real-time Machine Learning model updates
• Challenges
• In-memory processing can be expensive
• Not “true” real-time
• No file-management system
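One common reading of "not true real-time" is that Spark Streaming's classic model groups incoming records into small time-based batches (micro-batches) and processes each batch as a unit, so latency is bounded below by the batch interval. The sketch below imitates that grouping in plain Python; it does not use Spark, and the function name and event format are assumptions.

```python
def micro_batches(events, batch_interval: float):
    """Group (timestamp, value) events into fixed-width time windows."""
    batches: dict[int, list] = {}
    for timestamp, value in events:
        window = int(timestamp // batch_interval)
        batches.setdefault(window, []).append(value)
    # each batch is labelled with the start time of its window
    return [(window * batch_interval, batches[window]) for window in sorted(batches)]


events = [(0.2, "a"), (0.9, "b"), (1.1, "c"), (2.5, "d")]
batches = micro_batches(events, batch_interval=1.0)
print(batches)
```

Event `"a"` arrives at 0.2 s but is only processed together with `"b"` once the first one-second window closes, which is the latency trade-off the bullet refers to.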
![Page 66: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/66.jpg)
NiFi
“A real-time integrated data logistics and simple event processing platform. Apache NiFi automates the movement of data between disparate data sources and systems, making data ingestion fast, easy and secure.”
![Page 67: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/67.jpg)
Data Acquisition - Common Sources
• RDBMS
• Logs
• Files
• Social Media
• HTTP/REST
• Data Streams
![Page 69: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/69.jpg)
RDBMS - Benefits and Challenges
• Pros
• Mature/Popular
• Extensive Support
• Incremental Fetches and Filtering
• Simple transformations at Source
• Challenges
• Proprietary data platforms
• Cross-organizational boundary access limited by security
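Incremental fetches and filtering at the source can be sketched with a watermark: remember the highest key already acquired and ask the database only for rows beyond it. The example uses Python's built-in `sqlite3` with an in-memory database; the `orders` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO orders (id, amount) VALUES (?, ?)",
    [(1, 9.5), (2, 20.0), (3, 5.25)],
)


def fetch_incremental(conn, last_id: int, min_amount: float = 0.0):
    """Fetch rows newer than the watermark, filtered at the source."""
    rows = conn.execute(
        "SELECT id, amount FROM orders WHERE id > ? AND amount >= ? ORDER BY id",
        (last_id, min_amount),
    ).fetchall()
    # advance the watermark only as far as the rows actually returned
    new_watermark = rows[-1][0] if rows else last_id
    return rows, new_watermark


# First run: rows 2 and 3 are new relative to watermark 1.
rows, watermark = fetch_incremental(conn, last_id=1)

# A new row arrives; the next run picks up only that row.
conn.execute("INSERT INTO orders (id, amount) VALUES (4, 42.0)")
rows2, watermark2 = fetch_incremental(conn, watermark, min_amount=10.0)
```

This only detects inserts with increasing keys; catching updates or deletes would need something else, such as a last-modified column.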
![Page 71: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/71.jpg)
Typically integrated with Apache Sqoop
![Page 73: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/73.jpg)
Files - Benefits and Challenges
• Pros
• Easy way to integrate with legacy apps
• Files easily cross organizational boundaries
• Many tools and utilities available
• Challenges
• Slow
• Data Exposure
• Potential Data/Privacy Leaks
![Page 75: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/75.jpg)
Typically integrated with Flume
![Page 79: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/79.jpg)
REST - Benefits and Challenges
• Pros
• De facto standard for data exchange over the internet
• Excellent security and scalability
• Easy to integrate
• Challenges
• Not designed for real-time data
• Stateless APIs can be “chatty”
• Often data sources impose rate limiting
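The "chatty" and rate-limiting challenges usually surface as a paging loop with backoff. The sketch below runs against `FakePagedApi`, a hypothetical stand-in that rejects every third call; no real HTTP library or service is involved, and the retry policy (exponential backoff from the server's suggested delay) is just one reasonable choice.

```python
import time


class RateLimited(Exception):
    def __init__(self, retry_after: float):
        super().__init__(f"retry after {retry_after}s")
        self.retry_after = retry_after


class FakePagedApi:
    """Hypothetical paged endpoint that rate-limits every third call."""

    def __init__(self, items, page_size: int):
        self.items, self.page_size, self.calls = list(items), page_size, 0

    def get_page(self, page: int) -> list:
        self.calls += 1
        if self.calls % 3 == 0:
            raise RateLimited(retry_after=0.01)
        start = page * self.page_size
        return self.items[start:start + self.page_size]


def fetch_all(api, sleep=time.sleep, max_retries: int = 5) -> list:
    """Page through the API, backing off when rate-limited."""
    results, page = [], 0
    while True:
        for attempt in range(max_retries):
            try:
                batch = api.get_page(page)
                break
            except RateLimited as limit:
                sleep(limit.retry_after * 2 ** attempt)  # exponential backoff
        else:
            raise RuntimeError("rate-limit retries exhausted")
        if not batch:  # an empty page marks the end of the data
            return results
        results.extend(batch)
        page += 1


waits = []
api = FakePagedApi(range(5), page_size=2)
records = fetch_all(api, sleep=waits.append)  # record waits instead of sleeping
```

Fetching five records takes five calls here (three data pages, one rejected call, and one empty page), which is the kind of overhead "chatty" refers to.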
![Page 81: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/81.jpg)
Typically integrated with Apache NiFi
![Page 82: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/82.jpg)
Data Acquisition - Common Sources
• RDBMS
• Logs
• Files
• Social Media
• HTTP/REST
• Data Streams
![Page 83: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/83.jpg)
![Page 84: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/84.jpg)
Streaming - Benefits and Challenges
• Pros
• Real-time, instantaneous data transfer
• Data streams are typically deltas
• Supported by major cloud APIs
![Page 85: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/85.jpg)
Streaming - Benefits and Challenges
• Challenges
• Data is typically lost if a connection is broken
• Might need to be supplemented by historical data
• Some providers might implement rate-limiting
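One way to reason about data lost on a broken connection: if each stream message carries a monotonically increasing sequence number (an assumption, since not every provider offers one), the consumer can detect the gap after reconnecting and request the missing range from a historical source. `find_gaps` is a hypothetical helper written for this illustration.

```python
from typing import Iterable, List, Optional, Tuple


def find_gaps(sequence_numbers: Iterable[int]) -> List[Tuple[int, int]]:
    """Return inclusive (first_missing, last_missing) ranges.

    Assumes the sequence numbers arrive in increasing order.
    """
    gaps: List[Tuple[int, int]] = []
    previous: Optional[int] = None
    for current in sequence_numbers:
        if previous is not None and current > previous + 1:
            gaps.append((previous + 1, current - 1))
        previous = current
    return gaps


# Messages 4 to 6 were dropped while the connection was down.
received = [1, 2, 3, 7, 8]
missing = find_gaps(received)  # [(4, 6)]
```

Each returned range is what would need to be backfilled from historical data.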
![Page 86: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/86.jpg)
Typically integrated with Kafka/Storm or Spark Streaming
![Page 87: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/87.jpg)
Persistence
![Page 88: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/88.jpg)
Persistence Responsibilities
• Reliable Data Storage
• Data Access
• Scalability
• Redundancy
![Page 89: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/89.jpg)
Persistence - What to Architect
• Polyglot Storage & flexible schema
• Consistency
• Reliability
• Read-intensive vs write-intensive
• Mutable vs immutable data
• Cataloging
• Latency
![Page 90: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/90.jpg)
Persistence - Best Practices
• Choose platform that best supports your use case
• Keep design/schema flexible
• Keep data at lowest granularity
• Summarize only where needed
• Consider real-time application needs
• Redundancy > Backups
![Page 91: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/91.jpg)
Persistence
• HDFS
• HBase
• Cassandra
• S3
• Azure
• RDBMS
![Page 92: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/92.jpg)
HDFS
![Page 93: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/93.jpg)
HDFS
![Page 94: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/94.jpg)
HDFS
![Page 95: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/95.jpg)
HBase
![Page 96: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/96.jpg)
HBase
![Page 97: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/97.jpg)
HBase
![Page 98: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/98.jpg)
Cassandra
![Page 99: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/99.jpg)
S3
![Page 100: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/100.jpg)
Azure
![Page 101: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/101.jpg)
RDBMS
![Page 102: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/102.jpg)
Transform/Analytics
![Page 103: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/103.jpg)
Transformation Responsibilities
• Build/update Machine Learning Models
• Historical processing
• Analytical querying
• Transform data into intermediate representations
• Transform data into a representation suited to presentation
![Page 104: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/104.jpg)
Historical Data Processing
• Pull data from persistence layer
• Aggregate, summarize, and analyze data
• Save to Persistence or Presentation Layer
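The three steps above can be sketched in plain Python. This is a toy illustration only: the in-memory list and dict stand in for the persistence and presentation layers, and the record fields (`device`, `reading`) are invented for the example.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List

# Stand-in for the persistence layer: raw, lowest-granularity records.
persistence: List[dict] = [
    {"device": "a", "reading": 10.0},
    {"device": "a", "reading": 14.0},
    {"device": "b", "reading": 3.0},
]


def pull(store: List[dict]) -> List[dict]:
    """Step 1: pull data from the persistence layer."""
    return list(store)


def summarize(records: List[dict]) -> Dict[str, dict]:
    """Step 2: aggregate and summarize readings per device."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for record in records:
        grouped[record["device"]].append(record["reading"])
    return {
        device: {"count": len(values), "mean": mean(values)}
        for device, values in grouped.items()
    }


# Step 3: save the result where the presentation layer can read it.
presentation: Dict[str, dict] = {}
presentation.update(summarize(pull(persistence)))
```

In a real deployment the same shape would typically be expressed as a batch job in a tool such as Spark or Hive rather than in-process Python.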
![Page 105: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/105.jpg)
Real-Time Transformation/Analytics (Streaming)
• Update ML Models in real time
• Update stream analytics
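Updating stream analytics usually means folding each new event into a running result rather than recomputing from history. Below is a minimal sketch using Welford's online algorithm for mean and variance. The `RunningStats` class is a hypothetical name; a production system would run this kind of logic inside a stream processor.

```python
class RunningStats:
    """Incrementally maintained count, mean, and variance (Welford's algorithm)."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        """Fold one new observation into the statistics in constant time."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self) -> float:
        """Population variance of the values seen so far (0.0 if none)."""
        return self._m2 / self.count if self.count else 0.0


stats = RunningStats()
for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(reading)
```

Because only three numbers are kept, the state stays small no matter how many events have been seen.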
![Page 106: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/106.jpg)
Transformation - What to Architect
• Data Access latency
• Support for input volume
• Architect for future consumers of analytical data
• Architect for anticipated access patterns
• Real-time vs batch processing
![Page 107: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/107.jpg)
Transformation - Best Practices
• “What” is a question for the data scientists
• Use the best tools for the job/TIMTOWTDI
![Page 108: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/108.jpg)
Transformation - Common Tools
• MapReduce
• Mahout
• Spark
• Pig
• Hive/Impala
![Page 109: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/109.jpg)
MapReduce
![Page 110: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/110.jpg)
Mahout
![Page 111: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/111.jpg)
Spark
![Page 112: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/112.jpg)
Introducing Apache Spark
“A fast and general engine for large-scale data processing”
![Page 113: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/113.jpg)
Spark Components
• Spark Streaming
• Spark SQL
• MLlib
• GraphX
![Page 114: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/114.jpg)
Pig
![Page 115: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/115.jpg)
Pig
• Abstracts Map/Reduce Complexity
• Uses the Pig Latin scripting language
• Highly extensible with UDFs
• Can perform faster than MapReduce
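To make concrete what Pig abstracts away, here is a single-process word count written as explicit map, shuffle, and reduce phases. It is a teaching sketch in plain Python, not Hadoop code, and the function names are invented for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


counts = reduce_phase(shuffle_phase(map_phase(["big data", "Big ideas"])))
```

A Pig Latin script expresses the same job declaratively in a few statements and leaves the phase wiring to the framework.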
![Page 116: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/116.jpg)
Oozie
![Page 117: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/117.jpg)
Presentation
![Page 118: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/118.jpg)
Presentation Responsibilities
• Provide analytical results in a consumable/digestible form
• Act as a reporting data source
• Bridge OLAP/OLTP Environments
• Low-latency repository
![Page 119: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/119.jpg)
Presentation - What to Architect
• Performance
• Data Access
• Security
• Privacy
• API
• Data Refresh Frequency
• Data Lifecycle
• Availability
• Consistency
• Load
![Page 120: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/120.jpg)
Presentation - Best Practices
• Involve Consumers
• Optimize Refresh Rate for Requirements
• Choose persistence mechanism based on access patterns and needs
• Real-time vs Digest/Summary
![Page 121: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/121.jpg)
Presentation
• HBase
• RDBMS
• MongoDB
• Cassandra
• Redis
![Page 122: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/122.jpg)
HBase
• NoSQL DB built on HDFS
• Based on BigTable
• No query language; accessed through an API
• Auto Sharding
![Page 123: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/123.jpg)
HBase Data Model
• Fast access to any given row
• A row is referenced by a unique key
• A row has a small number of column families
• A column family may contain arbitrary columns
• You can have a very large number of columns in a column family
• Each Cell can have many versions with given timestamps
• Sparse data is ok. Missing columns in a row consume no storage
![Page 124: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/124.jpg)
Example: Web Table
• Row key: com.cnn.www
• Contents column family
• contents: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" (three timestamped versions)
• Anchor column family
• anchor:cnnsi.com: “CNN”
• anchor:my.look.ca: “CNN.com”
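The data model can be pictured as nested maps: row key, then column family, then column qualifier, then timestamp. The sketch below models the web table example with plain Python dictionaries. It is a conceptual illustration, not the HBase client API; the timestamps, the placeholder page contents, and the helper `latest` are made up for the example.

```python
from typing import Dict

# row key -> column family -> column qualifier -> timestamp -> value
Table = Dict[str, Dict[str, Dict[str, Dict[int, str]]]]

web_table: Table = {
    "com.cnn.www": {
        "contents": {
            # One cell holding three timestamped versions of the page.
            "": {3: "page version 1", 5: "page version 2", 6: "page version 3"},
        },
        "anchor": {
            # Qualifiers are arbitrary per row; absent columns are simply not stored.
            "cnnsi.com": {9: "CNN"},
            "my.look.ca": {8: "CNN.com"},
        },
    }
}


def latest(table: Table, row: str, family: str, qualifier: str) -> str:
    """Return the most recent version of a cell (highest timestamp)."""
    versions = table[row][family][qualifier]
    return versions[max(versions)]
```

Looking up a row by its key and then walking into a family and qualifier mirrors the "fast access to any given row" property of the model.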
![Page 125: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/125.jpg)
Accessing HBase
• HBase shell
• API
• Wrappers in many languages
• Spark, Hive, Pig
• REST Service
• Thrift Service
• Avro Service
![Page 126: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/126.jpg)
RDBMS
• Standard/mature database technology
• Low Latency
• Good support for additional querying/filtering/transformation
• Easy to integrate
![Page 127: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/127.jpg)
MongoDB
• NoSQL document database
• Very Fast
• Very Scalable
• Flexible Schema
• Supports Multiple Indexes
![Page 128: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/128.jpg)
Cassandra
• NoSQL
• High Availability
• Write Scalability
• CQL Query Language
![Page 129: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/129.jpg)
MongoDB vs Cassandra
![Page 130: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/130.jpg)
Redis
• Not just for caching
• Very Fast
• Distributed
• Persistent
![Page 131: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/131.jpg)
Sample Architectures
![Page 132: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/132.jpg)
Closing Thoughts
![Page 133: Architecting Big Data Solutions · 2018-07-27 · Key Trends Device Explosion Ubiquitous Connection Social Networks Sensor Networks Cheap Storage Inexpensive Computing 23.14 Billion](https://reader033.fdocuments.net/reader033/viewer/2022041923/5e6cbf6372d7b531690b14c1/html5/thumbnails/133.jpg)
Thank you
Michael Carducci @MichaelCarducci