IoT Traffic Simulation: Data Pipeline

Anchal Agrawal and Professor Robert J. Brunner

Project Overview

System Architecture

Web Dashboard

References

Laboratory for Cosmological Data Mining, University of Illinois at Urbana-Champaign

aagrawa4@illinois.edu | bigdog@illinois.edu

IoT - Internet of Things

Internet of Things [1]

IoT is a network of physical devices that collect and exchange data about their surroundings. Applications include home automation (e.g., light sensors) and traffic management (e.g., automatic toll collection).

•  With increasing road traffic, it is crucial for vehicles to adapt to traffic and weather patterns so they can be routed efficiently and accidents can be prevented.

•  Pairing vehicles with Internet of Things (IoT) technology presents a large opportunity: sensors on vehicles and roads can emit vehicle coordinates and weather data.

•  This data can help us understand traffic and weather patterns so that vehicles can be routed efficiently and alerted when needed.

The system consists of five components – the message log (Kafka), the stream processing and machine learning framework (Spark), the batch processing framework (Hadoop), the datastore (Cassandra) and a web dashboard for viewing vehicle activity.

[1] Intel – A Fast, Flexible, and Scalable Path to Commercial IoT Solutions. https://software.intel.com/en-us/articles/a-fast-flexible-and-scalable-path-to-commercial-iot-solutions
[2] Apache Kafka – https://kafka.apache.org
[3] Apache Spark – https://spark.apache.org
[4] Apache Cassandra – https://cassandra.apache.org
[5] eBay – Cassandra Data Modeling Best Practices, Part 1. http://ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1
[6] Apache Hadoop – https://hadoop.apache.org

Kafka

•  Sensor data is simulated with Python scripts and pushed to Kafka.

•  There are two Kafka topics (queues) – one for car data and another for road data.

•  Car data includes latitude, longitude and timestamp. Road data includes max/min temperatures, expected precipitation and timestamp.
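The simulation step described above might look roughly like the following. This is a minimal sketch, not the project's actual scripts: the topic names, field names, coordinate ranges and units (degrees Celsius, millimetres) are all assumptions, and an in-memory `InMemoryBroker` class stands in for a real Kafka producer so the example runs without a cluster.

```python
import json
import random
import time
from collections import defaultdict

# Hypothetical topic names; the poster only states that there is one topic
# for car data and one for road data.
CAR_TOPIC = "car-data"
ROAD_TOPIC = "road-data"


def make_car_reading(rng, now):
    """Simulated car reading: latitude, longitude and timestamp."""
    return {
        "latitude": round(rng.uniform(40.05, 40.15), 6),    # assumed range
        "longitude": round(rng.uniform(-88.30, -88.20), 6),  # assumed range
        "timestamp": now,
    }


def make_road_reading(rng, now):
    """Simulated road reading: min/max temperature, expected precipitation, timestamp."""
    low = round(rng.uniform(-5.0, 20.0), 1)  # assumed unit: degrees Celsius
    return {
        "min_temp": low,
        "max_temp": round(low + rng.uniform(1.0, 12.0), 1),
        "expected_precipitation": round(rng.uniform(0.0, 25.0), 1),  # assumed unit: mm
        "timestamp": now,
    }


class InMemoryBroker:
    """Stand-in for a Kafka producer: appends encoded messages to per-topic lists."""

    def __init__(self):
        self.topics = defaultdict(list)

    def send(self, topic, value):
        # Messages are serialized to JSON bytes, as they would be on the wire.
        self.topics[topic].append(json.dumps(value).encode("utf-8"))


def simulate(broker, n_messages, seed=0):
    """Push n_messages car readings and n_messages road readings to the broker."""
    rng = random.Random(seed)
    now = int(time.time())
    for i in range(n_messages):
        broker.send(CAR_TOPIC, make_car_reading(rng, now + i))
        broker.send(ROAD_TOPIC, make_road_reading(rng, now + i))


broker = InMemoryBroker()
simulate(broker, n_messages=5)
```

Client libraries such as kafka-python expose a `KafkaProducer.send(topic, value)` call of a similar shape, although the poster does not say which client the scripts use.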

Spark

•  Spark consumes data from Kafka and analyzes it to make predictions based on traffic and weather patterns. Spark provides MLlib, a machine learning library.

•  Spark Streaming is used to process live data as it comes in from the simulated car and road sensors.
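Spark Streaming handles a live stream as a sequence of small batches. The plain-Python sketch below illustrates only that micro-batch idea on car readings; it does not use the Spark API and makes no predictions. The batch interval, the grid-cell rounding and the sample readings are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict

BATCH_SECONDS = 10  # hypothetical batch interval


def grid_cell(reading):
    """Snap a latitude/longitude pair to a coarse grid cell (two decimal places)."""
    return (round(reading["latitude"], 2), round(reading["longitude"], 2))


def micro_batches(readings):
    """Group readings into consecutive fixed-length batches by timestamp."""
    batches = defaultdict(list)
    for reading in readings:
        batches[reading["timestamp"] // BATCH_SECONDS].append(reading)
    return [batches[key] for key in sorted(batches)]


def cars_per_cell(batch):
    """Count how many readings in one batch fall into each grid cell."""
    return Counter(grid_cell(reading) for reading in batch)


# Made-up sample readings; timestamps are seconds.
readings = [
    {"latitude": 40.101, "longitude": -88.221, "timestamp": 100},
    {"latitude": 40.102, "longitude": -88.222, "timestamp": 103},
    {"latitude": 40.151, "longitude": -88.251, "timestamp": 104},
    {"latitude": 40.101, "longitude": -88.221, "timestamp": 112},
]
counts = [cars_per_cell(batch) for batch in micro_batches(readings)]
```

Here the first three readings fall into one batch and the fourth into the next, so `counts` holds one per-cell tally for each batch. A per-cell count like this is one possible input to a congestion model, though the poster does not specify which features its predictions use.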

Cassandra

•  Cassandra is a distributed, partitioned row store (a wide-column store) in which each row is located by its key. A Cassandra cluster consists of a ring of servers.

•  Cassandra is suitable for time-series data because rows within a partition can be stored sorted by timestamp, so queries over a time range are efficient.

•  Sensor data is persisted to Cassandra for future processing.
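The time-series argument above can be illustrated with a toy model in plain Python. It is a sketch of the idea only, not Cassandra itself and not the project's schema: the `TimeSeriesTable` class, the `(sensor id, day)` partition key and the sample rows are hypothetical. In Cassandra the same roles are played by the partition key and a timestamp clustering column in the table's primary key.

```python
import bisect
from collections import defaultdict


class TimeSeriesTable:
    """Toy table: rows grouped by partition key and kept sorted by timestamp."""

    def __init__(self):
        self._times = defaultdict(list)  # partition key -> sorted timestamps
        self._rows = defaultdict(list)   # partition key -> rows in the same order

    def insert(self, partition_key, timestamp, row):
        """Insert a row at the position that keeps the partition sorted."""
        times = self._times[partition_key]
        index = bisect.bisect_right(times, timestamp)
        times.insert(index, timestamp)
        self._rows[partition_key].insert(index, row)

    def range_query(self, partition_key, start, end):
        """Return rows with start <= timestamp < end, in timestamp order."""
        times = self._times.get(partition_key, [])
        lo = bisect.bisect_left(times, start)
        hi = bisect.bisect_left(times, end)
        # Sorted storage makes the result one contiguous slice.
        return self._rows.get(partition_key, [])[lo:hi]


# Hypothetical partition key and made-up rows, inserted out of order.
key = ("car-1", "day-1")
table = TimeSeriesTable()
table.insert(key, 30, {"latitude": 40.13, "longitude": -88.23})
table.insert(key, 10, {"latitude": 40.11, "longitude": -88.21})
table.insert(key, 20, {"latitude": 40.12, "longitude": -88.22})
recent = table.range_query(key, 10, 30)
```

Because each partition is kept in timestamp order, a time-range query becomes two binary searches and a slice rather than a scan of every row.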

Cassandra table architecture [4, 5]

Spark architecture [3]

Kafka architecture [2]

Acknowledgments

We acknowledge support from the National Science Foundation Grant No. AST-1313415, the National Center for Supercomputing Applications, the University of Illinois and Microsoft Azure.

A view of the dashboard, built on top of Google Maps

Hadoop

Hadoop YARN architecture [6]

•  Hadoop is a batch computation engine based on the MapReduce framework.

•  A Hadoop cluster has a master-slave architecture and uses HDFS as the distributed file system.

•  Hadoop is used for generating monthly traffic reports.
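A monthly report fits the MapReduce pattern naturally: map each record to a month key, group by key, then reduce each group. The sketch below runs that pattern in plain Python rather than on Hadoop. The report contents are an assumption (a count of car readings per month, since the poster does not say what the reports contain), and the timestamps are made-up epoch seconds interpreted as UTC.

```python
from collections import defaultdict
from datetime import datetime, timezone


def map_reading(reading):
    """Map step: emit a (month, 1) pair for one car reading."""
    moment = datetime.fromtimestamp(reading["timestamp"], tz=timezone.utc)
    yield moment.strftime("%Y-%m"), 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(month, values):
    """Reduce step: total the readings for one month."""
    return month, sum(values)


def monthly_report(readings):
    """Run map, shuffle and reduce over all readings."""
    pairs = (pair for reading in readings for pair in map_reading(reading))
    grouped = shuffle(pairs)
    return dict(reduce_counts(month, values) for month, values in sorted(grouped.items()))


# Made-up readings: two in January 2016 and one in February 2016 (UTC).
readings = [
    {"latitude": 40.11, "longitude": -88.21, "timestamp": 1451606400},
    {"latitude": 40.12, "longitude": -88.22, "timestamp": 1451692800},
    {"latitude": 40.13, "longitude": -88.23, "timestamp": 1454284800},
]
report = monthly_report(readings)
```

On a Hadoop cluster the map and reduce functions would run in parallel over data stored in HDFS; the single-process version here shows only the data flow.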