TransformingTransport Big Data Value in Mobility and Logistics
“Real-time prediction of flight arrival times using surveillance
information” - SACBD@ECSA2018 Presentation
Madrid, September 25th 2018
David Scarlatti, Boeing R&T - Europe
TT Project Context
• EC H2020 funded Lighthouse project. +40 partners across Europe
• Demonstrator style: Focus on deployment of technology on business.
• Delivering “Pilots” in different Transport Industry Domains…
SACBD@ECSA2018 2
TT Project Context
• Smart Highways– AUSOL Load Balancing Pilot
– Load Balancing Pilot – Norte Litoral
• Sustainable Connected Vehicles– Sustainable Connected Cars Pilot
– Sustainable Connected Trucks Pilot
• Proactive Rail Infrastructures– Predictive Rail Asset Management Pilot
– Predictive High Speed Network
• Ports as Intelligent Logistics Hubs– Valencia Sea Port Pilot
– Duisport Inland Port Pilot
• Smart Airport Turnaround– Smart Passenger Flow Pilot
– Smart Turnaround
• Integrated Urban Mobility– Integrated Urban Mobility: Tampere pilot
– Valladolid Integrated Urban Mobility
• Dynamic Supply Networks
SACBD@ECSA2018 3
TT Project in Aviation
• Smart Airport Turnaround. Two Pilots:
– Smart Passenger Flow Pilot – @Athens Airport
– Smart Turnaround – @Milan Airport
SACBD@ECSA2018 4
SACBD@ECSA2018 5
Turnaround overview.Why Arrival Time matters?
Landing Taxi-in Turnaround Taxi-out Takeoff
Aircraft Process
Objective: Increase predictability of the process and airport situational awareness. As a result we expect an improvement and optimization of the turnaround.
Arrival estimation Taxi-in predictions
Boarding time predictions
Taxi-out predictions
While the Aircraft is on ground, the airline is not making money, and the airport can’t serve another aircraft.
Estimated Time of ArrivalModel/Rule Base .vs. Data Driven
• Traditional approach: You model Aircraft behaviour and context (i.e. Winds, airspace use, etc...). Then you predict, knowing where is the aircraft, what will happen till it lands. This give you the ETA.
• Data Driven: Following current approach in many fields, let the data talk... ML can discover patterns, not apparent to humans, on what influence the arrival time.
SACBD@ECSA2018 6
But, which “Data“?
• We follow an iterative approach, starting with one source, and adding in each increment (Agile approach) a new source of data.
1. First, just surveillance data (4D location) is used
2. Weather conditions to be added
3. Flight Plan data to be added
4. Other sources???
SACBD@ECSA2018 7
High Level Architecture
• Architecture match a simple lambda architecture.
SACBD@ECSA2018 8
Data Collection:
(surveillance, weather,
etc…)Historical data
wrangling and ML Training.
Real Time Predictions / Visualization
Data Collection
• Surveillance stream. We use FR24 live feed data, based mainly in ADS-B:
– Sample records
– Latitude, Longitude, Altitude, Timestamp….
SACBD@ECSA2018 9
{"1de0d34c": ["4787B2", 51.20212, 12.81526, 354, 38000, 404, "7364", "5000", "B738", "LN-NGE", 1537045185, "MLA", "OSL", "DY1851", 0, -64, "NAX51J", 1537050813], "1de1108d": ["4B1903", 47.53462, 8.49345, 333, 2600, 178, "3002", "5000", "A343", "HB-JMH", 1537045156, "ZRH", "JNB", "LX288", 0, 1152, "SWR288", 0,], "1de1108c": ["4B17FD", 47.45358, 8.5565, 129, 0, 0, "2000", "200", "BCS3", "HB-JCF", 1537045174, "ZRH", "", "", 1, 0, "SWR97Q", 0], "1de0e21b": ["4CA175", 40.33225, 22.82972, 112, 6300, 232, "4456", "200", "B738", "EI-FEF", 1537045188, "POZ", "SKG", "RR6505", 0, -704, "RYS6505", 1537045663, "1de0e4f2": ["4BBC53", 36.99613, 35.28329, 56, 0, 51, "5101", "200", "A320", "TC-OBS", 1537045190, "IST", "ADA", "8Q16", 1, 0, "OHY016", 1537045157,]…
Data Collection
• We use Apache Flume for the collection. Current Flume Flow:
SACBD@ECSA2018 10
FR24 live feed
Sourcetype = exec
selector.type = replicating
In Memory Channel
In Memory Channel
Sinktype = HDFS
path = …/date=%Y-%m-%d
Sinktype = Kafka
Batch Layer
• We use Hadoop and Hive for preparing the Training datasets:
SACBD@ECSA2018
11
End User GUI (Web)
Data Lake
SQL Alike Access Layer
Raw DataMXP
airport Data
Processed Data
ETL
Interactive Data Science
NOAAData
Training Dataset
ML model for Real-Time prediction
Speed Layer
• We use Kafka for the Data distribution and python/H2O framework for the real-time prediction.
SACBD@ECSA2018 12
Data Distribution Data Processing / Machine Learning
FR24Surveillance
Web GUI
VR
ML model for Real-Time prediction
Visualization
SACBD@ECSA2018 13
• We adapted Virtual Radar Server to show the live ETA:
Andrew Whewell. Virtual radar server open source project, 2010. http://www.virtualradarserver.co.uk/
Next iteration
Architecture-Full picture
SACBD@ECSA2018 14
Data Distribution Data Processing / Machine Learning
End User GUI (Web)
FR24Surveillance
Data Lake
SQL Alike Access Layer
Raw DataMXP
airport Data
Processed Data
NON SQLDocument DB
ECTLB2B
Flight Data
ETL
Interactive Data Science
Web GUI
NOAAData
VR
Machine Learning details
• Current version only uses surveillance derived features: latitude, longitude, altitude, speeds, timestamp...
• Dependent variable: Seconds to land
• Regression problem. Algorithm: Gradient Boost Machine (GBM)
• Training dataset: Ground truth for selected airlines. January 2016 to August 2017. +10 million observations. We take a 80 % split of this dataset for training, so that approximately 2.3 million observations are employed in the testing of the model.
SACBD@ECSA2018 15
ETA Results presentation.Box plots along time.
SACBD@ECSA2018 16
This aircraft is about 85 minutes to land:The ETA error is between -856 and +895 seconds ,i.e. if ETA is 8:30 it may land between 8:16 and 8:45
This aircraft is about 35 minutes to land:The ETA error is between -507 and 389 seconds ,i.e. if ETA is 8:30 it may land between 8:21 and 8:36
60 minutes
ML Results evaluation
• In Figure 2 we compare the errors of the predictions issued by three services 1 hour before landing for 736 flights bound for Malpensa-Milan airport on the period ranging from September 2017 to December 2017 (new data never seen by TT algorithm).
SACBD@ECSA2018 17
Future Work
• Adding new data sources:
– METAR reports as weather data
• New Flume Flow, new Hive tables for adding new features in training models, new Kafka topic with the METAR data
– Flight Plans from Eurocontrol
• Adding Mongo to Batch layer (HDFS no good for millions of small .xml files)
• Try other algorithms:
– Xtreme Gradient Boost
SACBD@ECSA2018 18
Thank You!
SACBD@ECSA2018 19
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement no. 731932