Motivation
Make finding a parking spot smooth and easy by using real-time parking sensor data.
SF Parking Data
● 952 parking spots in total (15 garages and 937 street-parking spots); data is ingested every 2 seconds.
● Data throughput: ~15 GB/day.
● Can be extended to handle much larger loads across multiple cities.
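A quick back-of-the-envelope check of the figures above (the spot count, 2-second interval, and 15 GB/day are from the slides; the per-message size is derived from them, not measured):

```python
# Sanity-check the ingestion numbers quoted above.
SPOTS = 952          # parking spots reporting
INTERVAL_S = 2       # each spot reports every 2 seconds
GB_PER_DAY = 15      # stated daily throughput

msgs_per_sec = SPOTS / INTERVAL_S           # 476 messages/sec
msgs_per_day = msgs_per_sec * 86_400        # ~41.1 million messages/day
bytes_per_msg = GB_PER_DAY * 1e9 / msgs_per_day

print(f"{msgs_per_sec:.0f} msgs/s, ~{bytes_per_msg:.0f} bytes per message")
```

So the stated throughput works out to roughly 476 messages per second at a few hundred bytes per message.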
Cluster Setup: EC2 m4.large, 4 instances
● Hadoop - 1 Namenode and 3 Worker nodes
● Spark - 1 Master and 3 Slaves
● Kafka - 4 brokers
● Cassandra - 4 Nodes
● Elasticsearch - 4 Nodes
● Zookeeper - 4 Nodes
Pipeline
● SF Park / Firebase (data source)
● Real-time ingestion
● Storage for batch
● Batch processing
● Stream processing
● Time-series aggregates
● Geo-spatial query (user GPS)
● Analytics dashboard
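The stream-processing and time-series-aggregate stages can be sketched in plain Python (the real pipeline uses Spark Streaming feeding Cassandra; the record fields and the 60-second window here are illustrative assumptions, not the production schema):

```python
from collections import defaultdict

# Illustrative sensor readings: (epoch seconds, spot id, 1 = occupied / 0 = free).
readings = [
    (1000, "spot-1", 1), (1001, "spot-2", 0),
    (1062, "spot-1", 0), (1063, "spot-2", 0),
]

WINDOW_S = 60  # aggregate occupancy per 60-second window (assumed size)

def occupancy_by_window(rows):
    """Average occupancy ratio per time window -- a stand-in for the
    Spark Streaming window + reduce step that feeds the time-series store."""
    buckets = defaultdict(list)
    for ts, spot, occupied in rows:
        buckets[ts // WINDOW_S * WINDOW_S].append(occupied)
    return {w: sum(v) / len(v) for w, v in sorted(buckets.items())}

print(occupancy_by_window(readings))  # {960: 0.5, 1020: 0.0}
```

Each aggregated row (window start, occupancy ratio) is what would be written out as a time-series record for the dashboard.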
Challenges
● Partitioning Spark RDDs for distributed computing.
● Writing data from Kafka to HDFS: Camus vs. a custom script.
● Elasticsearch partial document updates.
● Writing from Spark to Cassandra with the PySpark-Cassandra driver.
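One of the challenges above, Elasticsearch partial document updates, is handled by the `_update` endpoint, which merges only the changed fields into the stored document instead of reindexing the whole thing. A sketch of the request body (the index name, document id, and field names are hypothetical):

```python
import json

# Partial update: POST /parking/_update/spot-42
# Only the fields inside "doc" are merged into the existing document;
# everything else (location, capacity, ...) is left untouched.
update_body = {
    "doc": {
        "available": 3,                       # hypothetical field: free spots now
        "updated_at": "2016-01-01T12:00:00Z", # hypothetical timestamp field
    }
}

print(json.dumps(update_body))
```

This keeps the per-update payload small, which matters when hundreds of spot documents change every few seconds.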
About me
Suhas - CS Grad @UIUC
www.thinkjs.io
Background: Full Stack Web Development.
Passionate about learning big-data technologies.
Future plan: contribute to open source.
Hobbies: long drives.