Connected hubs: an analysis of the Lufthansa network in Europe
Connected Hubs
1 February 2017
CHAN Sau Yee & WANG Xi
Plan
● Objective
● Lufthansa Open API
● Methodology
● Data analysis
● Data visualisation
2
Objective
● To produce a map that shows the location of airports in Europe and the direct flights between them
○ What we need...
■ list of European airports
■ direct flights between any two airports
● To analyse the importance of airports based on the number of
connections needed
○ What we need...
■ list of European airports
■ number of direct flights for each airport
■ rank of airports by number of destinations
3
Definitions
● Case study: Europe, defined as the EU28 (including Britain) plus Switzerland and Norway
● Connection: the smallest number of transfers needed to travel between
two airports
○ Connection = 0: direct flight (without transfer)
○ Connection = 1: with 1 transfer
○ Connection > 1: with more than 1 transfer
4
[Diagram: example airports A, B, C and D illustrating connection counts]
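Under the definition above, the connection count between two airports is the length of the shortest route minus one, so breadth-first search over the direct-flight graph gives it directly. A minimal sketch, assuming routes are directed (origin, destination) pairs; the function name `connections` and the A→B→C→D routes are hypothetical and do not reproduce the slide's diagram.

```python
from collections import deque


def connections(routes, origin, destination):
    """Smallest number of transfers from origin to destination, or None if unreachable.

    routes: iterable of (from_airport, to_airport) pairs, one per direct flight.
    """
    if origin == destination:
        raise ValueError("origin and destination must differ")
    # Adjacency list of direct flights
    graph = {}
    for a, b in routes:
        graph.setdefault(a, set()).add(b)
    # BFS visits airports in order of flights taken, so the first time the
    # destination is reached uses the fewest flights.
    queue = deque([(origin, 0)])  # (airport, transfers made so far)
    seen = {origin}
    while queue:
        airport, transfers = queue.popleft()
        for nxt in graph.get(airport, ()):
            if nxt == destination:
                return transfers  # taking this flight adds no further transfer
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, transfers + 1))  # changing planes at nxt
    return None


routes = [("A", "B"), ("B", "C"), ("C", "D")]  # hypothetical direct flights
print(connections(routes, "A", "B"))  # 0 (direct flight)
print(connections(routes, "A", "C"))  # 1 (one transfer, at B)
print(connections(routes, "A", "D"))  # 2
print(connections(routes, "D", "A"))  # None (no route in this direction)
```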
Lufthansa Open API
● Reference data: Countries, Cities, Airports
● Operations: Flight Schedules
In principle, the data are not limited to Lufthansa flights
5
Structure of data in the API
Example: Berlin-Tegel airport (TXL) in XML
6
“Airport”, “RailwayStation” or “BusStation”
Data model
● countries: countryCode, zoneCode, countryName
● cities: cityCode, countryCode, cityName, lat, lon
● airports: airportCode, airportName, cityCode, countryCode, lat, lon
● flights: origin, des, httpcode, date
7
[Diagram legend: primary key, foreign key]
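The data model above translates naturally into SQLite tables. A minimal sketch; which columns are primary or foreign keys is an assumption, since the diagram's key markings did not survive, and the composite key on `flights` is likewise assumed.

```python
import sqlite3

# Assumed keys: the slide's key markings are lost, so PRIMARY KEY / REFERENCES
# choices below are illustrative, not the authors' schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS countries (
    countryCode TEXT PRIMARY KEY,
    zoneCode    TEXT,
    countryName TEXT
);
CREATE TABLE IF NOT EXISTS cities (
    cityCode    TEXT PRIMARY KEY,
    countryCode TEXT REFERENCES countries(countryCode),
    cityName    TEXT,
    lat         REAL,
    lon         REAL
);
CREATE TABLE IF NOT EXISTS airports (
    airportCode TEXT PRIMARY KEY,
    airportName TEXT,
    cityCode    TEXT REFERENCES cities(cityCode),
    countryCode TEXT REFERENCES countries(countryCode),
    lat         REAL,
    lon         REAL
);
CREATE TABLE IF NOT EXISTS flights (
    origin   TEXT REFERENCES airports(airportCode),
    des      TEXT REFERENCES airports(airportCode),
    httpcode INTEGER,
    date     TEXT,
    PRIMARY KEY (origin, des, date)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)  # runs all four CREATE TABLE statements
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['airports', 'cities', 'countries', 'flights']
```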
Methodology
3 MOOCs on Coursera (University of Michigan)
- “Python Data Structures”
- “Using Python to Access Web Data”
- “Using Databases with Python”
2 books on Python:
- “Think Python” - Allen B. Downey
- “Python for Everybody” - Charles Severance
8
Methodology
Charles Severance, “Python for Everybody”, Chap. 16
9
Methodology
Charles Severance, “Python for Everybody”, Chap. 16
10
How to GET data
Step 1: Acquire all reference data on Countries, Cities and Airports...
Problem: 1,261 airports in total, but the number of records returned per request is at most 100!
→ get all records in several loops by changing the value of offset
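The offset loop above can be sketched as follows. The `fetch_page` callable is hypothetical and stands in for the real HTTP request; the parameter name `limit` for the page size is an assumption, only `offset` comes from the slide. With 1,261 records and 100 per page, the loop needs 13 requests.

```python
def fetch_all(fetch_page, page_size=100):
    """Collect every record from a paginated endpoint by stepping the offset."""
    records = []
    offset = 0
    while True:
        page = fetch_page(offset=offset, limit=page_size)
        records.extend(page)
        if len(page) < page_size:
            break  # a short (or empty) page means there is nothing left
        offset += page_size
    return records


# Stand-in for the API: 1,261 fake airport records served 100 at a time
FAKE_AIRPORTS = [f"AP{i:04d}" for i in range(1261)]
calls = []


def fake_fetch_page(offset, limit):
    calls.append(offset)  # record each simulated request
    return FAKE_AIRPORTS[offset:offset + limit]


all_airports = fetch_all(fake_fetch_page)
print(len(all_airports), len(calls))  # 1261 13
```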
11
How to GET data (2)
Step 2: Information on all flights between European airports over a week (2017/01/20-2017/01/26)
Obtain a list of European airports by SQL
→ 2 loops to create all possible pairs
220 x 220 = 48,400 pairs = 3 h of execution per day!
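The request count can be checked with a short sketch. `build_requests` and the placeholder airport codes are hypothetical; the timing line assumes requests are sent at the 5 requests/second limit listed on the rate-limit slide, which gives about 2.7 h per day, in line with the "3 h" above.

```python
from datetime import date, timedelta
from itertools import product


def build_requests(airports, start, days):
    """One (origin, destination, date) triple per schedule request."""
    dates = [start + timedelta(days=i) for i in range(days)]
    # product(airports, repeat=2) plays the role of the two nested loops;
    # like 220 x 220 on the slide, it includes origin == destination pairs
    return [(o, d, day.isoformat())
            for day in dates
            for o, d in product(airports, repeat=2)]


airports = [f"A{i:03d}" for i in range(220)]  # placeholders for the 220 airport codes
schedule_requests = build_requests(airports, date(2017, 1, 20), 7)
per_day = len(schedule_requests) // 7
print(per_day)                       # 48400
print(round(per_day / 5 / 3600, 1))  # 2.7 (hours per day at 5 requests/second)
```

Skipping pairs where origin equals destination (for example with `itertools.permutations(airports, 2)`) would reduce this to 220 × 219 = 48,180 requests per day.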
12
Origin, destination and date must always be included in the request
Authorisation: OAuth 2
Token acquisition before requests can be sent
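A sketch of the token step, assuming the OAuth 2 client-credentials grant with the client ID and secret sent in the form body (RFC 6749 permits this), and a bearer token on later requests (RFC 6750). `TOKEN_URL` and the helper names are hypothetical; the real endpoint and parameters should be taken from the provider's documentation. Only request objects are built here, nothing is sent.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint: replace with the provider's documented token URL
TOKEN_URL = "https://api.example.invalid/oauth/token"


def token_request(client_id, client_secret):
    """Build an OAuth 2 client-credentials token request."""
    body = urllib.parse.urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "grant_type": "client_credentials",
    }).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL, data=body, method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def fetch_token(client_id, client_secret):
    """Send the token request and return the access token from the JSON reply."""
    with urllib.request.urlopen(token_request(client_id, client_secret)) as resp:
        return json.load(resp)["access_token"]


def api_request(url, access_token):
    """Attach the bearer token to a data request."""
    return urllib.request.Request(url, headers={
        "Authorization": f"Bearer {access_token}",
        "Accept": "application/json",
    })


req = token_request("my-id", "my-secret")
print(req.get_method(), req.data)
```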
13
Rate Limit
● 5 requests / second
● 1,000 → 10,000 requests / hour
● Decorator “RateLimited”
An error is thrown when the limits are exceeded
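One common way to write a rate-limiting decorator like the “RateLimited” mentioned above is to space the start of successive calls at least 1/rate seconds apart. This is a minimal sketch, not necessarily the authors' decorator; `rate_limited` and `get_schedule` are hypothetical names, and it neither enforces the hourly quota nor handles multiple threads.

```python
import functools
import time


def rate_limited(max_per_second):
    """Decorator factory: start calls at least 1/max_per_second seconds apart."""
    min_interval = 1.0 / max_per_second

    def decorate(func):
        last_start = None  # monotonic time at which the previous call started

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            nonlocal last_start
            if last_start is not None:
                wait = min_interval - (time.monotonic() - last_start)
                if wait > 0:
                    time.sleep(wait)  # too soon: pause before the next request
            last_start = time.monotonic()
            return func(*args, **kwargs)

        return wrapper

    return decorate


@rate_limited(5)  # 5 requests / second, as on the slide
def get_schedule(origin, destination, day):
    # Hypothetical placeholder: one schedule request would be sent here
    return (origin, destination, day)
```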
14
Inserting data into our database
import sqlite3
→CREATE TABLE if not exists,
INSERT INTO ____ VALUES...
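The `sqlite3` steps above can be sketched with parameterised inserts. The rows are hypothetical, with approximate coordinates for illustration only; `INSERT OR IGNORE` is one way (an assumption, not necessarily the authors' choice) to let a download loop be re-run without failing on primary-key duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE IF NOT EXISTS airports (
    airportCode TEXT PRIMARY KEY, airportName TEXT,
    cityCode TEXT, countryCode TEXT, lat REAL, lon REAL)""")

# Hypothetical parsed records (coordinates approximate, for illustration)
rows = [
    ("FRA", "Frankfurt", "FRA", "DE", 50.03, 8.57),
    ("MUC", "Munich", "MUC", "DE", 48.35, 11.79),
    ("TXL", "Berlin-Tegel", "BER", "DE", 52.56, 13.29),
]
insert = "INSERT OR IGNORE INTO airports VALUES (?, ?, ?, ?, ?, ?)"
conn.executemany(insert, rows)  # "?" placeholders: sqlite3 handles the quoting
conn.executemany(insert, rows)  # a second run skips codes already stored
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM airports").fetchone()[0]
print(count)  # 3
```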
15
Analysis and visualisations
16
The most important airports around the world, according to Lufthansa
17
European hubs of LH
18
LH flights in Europe
19
Data analysis in SQL
- 5 airports as origin with the greatest number of direct connections
- Data over a week
- Net flights per day
- Frequency by week
- 5 hubs based on Lufthansa DB
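A ranking like this can be written as a single GROUP BY query. A minimal sketch on hypothetical sample rows; it assumes each `flights` row records one (origin, des, date) request and that `httpcode = 200` means direct flights were found, neither of which the slides confirm.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (origin TEXT, des TEXT, httpcode INTEGER, date TEXT)")
# Hypothetical sample rows; httpcode 200 is assumed to mean "direct flights found"
conn.executemany("INSERT INTO flights VALUES (?, ?, ?, ?)", [
    ("FRA", "MUC", 200, "2017-01-20"),
    ("FRA", "MUC", 200, "2017-01-21"),  # same route, another day: counted once
    ("FRA", "VIE", 200, "2017-01-20"),
    ("FRA", "ZRH", 404, "2017-01-20"),  # no direct flight found
    ("MUC", "FRA", 200, "2017-01-20"),
    ("VIE", "FRA", 200, "2017-01-20"),
    ("VIE", "MUC", 200, "2017-01-20"),
])

TOP_ORIGINS = """
SELECT origin, COUNT(DISTINCT des) AS direct_destinations
FROM flights
WHERE httpcode = 200
GROUP BY origin
ORDER BY direct_destinations DESC, origin
LIMIT 5
"""
top = conn.execute(TOP_ORIGINS).fetchall()
print(top)  # [('FRA', 2), ('VIE', 2), ('MUC', 1)]
```

Swapping `origin` and `des` ranks airports as destinations, and `ORDER BY direct_destinations ASC` lists those with the fewest direct connections.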
20
Data analysis in SQL (2)
- 5 airports as destination with the greatest number of direct connections
21
Data analysis in SQL (3)
- Airports in France as origin in Lufthansa DB
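Restricting the analysis to one country needs a join between `flights` and `airports` on the airport code. A minimal sketch on hypothetical rows, again assuming `httpcode = 200` marks a found direct flight and that France is stored under the country code 'FR'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE airports (airportCode TEXT PRIMARY KEY, countryCode TEXT);
CREATE TABLE flights (origin TEXT, des TEXT, httpcode INTEGER, date TEXT);
INSERT INTO airports VALUES ('CDG', 'FR'), ('NCE', 'FR'), ('FRA', 'DE'), ('MUC', 'DE');
INSERT INTO flights VALUES
    ('CDG', 'FRA', 200, '2017-01-20'),
    ('CDG', 'MUC', 200, '2017-01-20'),
    ('NCE', 'FRA', 200, '2017-01-20'),
    ('FRA', 'CDG', 200, '2017-01-20');
""")

# Join each flight to its origin airport, then keep French origins only
FRENCH_ORIGINS = """
SELECT a.airportCode, COUNT(DISTINCT f.des) AS direct_destinations
FROM flights AS f
JOIN airports AS a ON a.airportCode = f.origin
WHERE a.countryCode = 'FR' AND f.httpcode = 200
GROUP BY a.airportCode
ORDER BY direct_destinations DESC
"""
french = conn.execute(FRENCH_ORIGINS).fetchall()
print(french)  # [('CDG', 2), ('NCE', 1)]
```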
22
Data analysis in SQL (4)
- 5 airports as origin with least direct connections
23
Data analysis in SQL (5)
- Connections of 5 hubs as origin in Lufthansa DB
24
Airport          Connection = 0   Connection = 1   Connection > 1
Frankfurt (FRA)  91               105              24
Munich (MUC)     92               103              25
Vienna (VIE)     58               132              30
Zurich (ZRH)     51               132              37
Brussels (BRU)   48               117              55
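Counts of this kind can be computed in SQL with a self-join on `flights`: a destination has Connection = 1 if it is reached from a direct destination of the hub but not directly. A minimal sketch on hypothetical rows, assuming `httpcode = 200` marks a found direct flight; in this version the last column also includes airports that cannot be reached at all, and the authors' exact query is not shown in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE airports (airportCode TEXT PRIMARY KEY);
CREATE TABLE flights (origin TEXT, des TEXT, httpcode INTEGER, date TEXT);
INSERT INTO airports VALUES ('FRA'), ('MUC'), ('VIE'), ('OSL'), ('TRD');
INSERT INTO flights VALUES
    ('FRA', 'MUC', 200, '2017-01-20'),
    ('FRA', 'VIE', 200, '2017-01-20'),
    ('MUC', 'OSL', 200, '2017-01-20'),
    ('OSL', 'TRD', 200, '2017-01-20');
""")

CONNECTION_COUNTS = """
WITH direct AS (        -- Connection = 0
    SELECT DISTINCT des FROM flights WHERE origin = :hub AND httpcode = 200
),
one_stop AS (           -- Connection = 1: one intermediate airport, not direct
    SELECT DISTINCT f2.des
    FROM flights AS f1
    JOIN flights AS f2 ON f2.origin = f1.des
    WHERE f1.origin = :hub AND f1.httpcode = 200 AND f2.httpcode = 200
      AND f2.des <> :hub AND f2.des NOT IN (SELECT des FROM direct)
)
SELECT
    (SELECT COUNT(*) FROM direct),
    (SELECT COUNT(*) FROM one_stop),
    (SELECT COUNT(*) FROM airports      -- everything else: > 1 or unreachable
     WHERE airportCode <> :hub
       AND airportCode NOT IN (SELECT des FROM direct)
       AND airportCode NOT IN (SELECT des FROM one_stop))
"""
counts = conn.execute(CONNECTION_COUNTS, {"hub": "FRA"}).fetchone()
print(counts)  # (2, 1, 1)
```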
Data analysis in SQL (6)
- Airports with Connection = 1 from Frankfurt, sorted by country
25
Visualisation: Force-directed graph in D3.js
● Physical model: forces of attraction and repulsion
● Algorithm defined in D3.js (JavaScript), a popular package for data visualisation
Drawings obtained with force-directed algorithms
Source: https://cs.brown.edu/~rt/gdhandbook/chapters/force-directed.pdf
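The physical model can be illustrated with a toy layout in Python. This is a Fruchterman-Reingold-style sketch (springs pull linked nodes together, every pair of nodes repels, step size cools over time), not D3's own algorithm: d3-force integrates velocities and approximates many-body forces differently. `force_layout`, its parameters and the hub edges are hypothetical.

```python
import math
import random


def force_layout(nodes, edges, iterations=200, k=1.0, seed=0):
    """Toy force-directed layout returning {node: (x, y)}.

    Repulsion k^2/d acts between every pair; attraction d^2/k acts along edges,
    so two linked nodes settle at distance about k.
    """
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    step = 0.1  # maximum move per iteration ("temperature")
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[a][0] += dx / d * f
                disp[a][1] += dy / d * f
                disp[b][0] -= dx / d * f
                disp[b][1] -= dy / d * f
        # Attraction along each edge
        for a, b in edges:
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[a][0] -= dx / d * f
            disp[a][1] -= dy / d * f
            disp[b][0] += dx / d * f
            disp[b][1] += dy / d * f
        # Move each node along its net force, capped by the current step
        for n in nodes:
            dx, dy = disp[n]
            length = math.hypot(dx, dy) or 1e-9
            move = min(length, step)
            pos[n][0] += dx / length * move
            pos[n][1] += dy / length * move
        step *= 0.98  # cool down so the layout settles
    return {n: (p[0], p[1]) for n, p in pos.items()}


hub_edges = [("FRA", "MUC"), ("FRA", "VIE"), ("FRA", "ZRH"), ("FRA", "BRU")]
layout = force_layout(["FRA", "MUC", "VIE", "ZRH", "BRU"], hub_edges)
for airport, (x, y) in layout.items():
    print(f"{airport}: ({x:.2f}, {y:.2f})")
```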
26
Force-directed graph (1)
27
Force-directed graph (2)
Central node: Frankfurt
28
Tree
29
Limitations and perspectives
- Limitations
  - Quality of data
  - Exclusivity of data
- Perspectives
  - A map that shows the frequency of service between airports
  - Country profile: domestic vs local flights
  - Airlines: legacy vs budget
30
Thank you!
31