Data visualization in Python · Eurostat

Post on 18-May-2018


Eurostat

THE CONTRACTOR IS ACTING UNDER A FRAMEWORK CONTRACT CONCLUDED WITH THE COMMISSION

Data visualization in Python

Martijn Tennekes

Eurostat

Outline

• Overview of data visualization in Python

• ggplot2

• tmap

• tabplot


Which packages/functions

• Standard charts (e.g. line chart, bar chart, scatter plot):

• Matplotlib

• Pandas

• Seaborn

• ggplot

• Altair

• Thematic maps

• Folium

• Other visualisations

• Bokeh (interactive plots)


ggplot

• Based on ggplot2, one of the most popular R packages for academic publications

• Based on the Grammar of Graphics (Wilkinson, 2005)

• Charts are built up according to this grammar:

• data

• mapping / aesthetics

• geoms

• stats

• scales

• coord

• facets

• Pandas DataFrames are used natively in ggplot.
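The grammar treats a chart as default data and aesthetics plus components that are stacked with +. The following is a minimal sketch of that composition idea in plain Python. The classes Plot and Component are hypothetical stand-ins written for this illustration and are not part of the ggplot package.

```python
class Plot:
    """Toy plot specification: default data and aesthetics plus added components."""

    def __init__(self, data, aes):
        self.data = data          # list of dict rows, standing in for a DataFrame
        self.aes = aes            # mapping from aesthetic name to column name
        self.components = []      # geoms, scales, facets, ... in the order added

    def __add__(self, component):
        # Return a new Plot so the original specification is left untouched.
        new = Plot(self.data, self.aes)
        new.components = self.components + [component]
        return new


class Component:
    """Toy stand-in for geom_point(), scale_x_continuous(), facet_wrap(), ..."""

    def __init__(self, kind, **params):
        self.kind = kind
        self.params = params


rows = [{"displ": 1.8, "cty": 18}, {"displ": 2.0, "cty": 21}]
base = Plot(rows, {"x": "displ", "y": "cty"})
plot = base + Component("geom_point") + Component("geom_line")

print([c.kind for c in plot.components])  # ['geom_point', 'geom_line']
print(len(base.components))               # 0
```

Because + returns a new object, a base plot can be reused with different layers.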


ggplot and qplot

ggplot(mpg, aes(x='displ', y='cty')) + geom_point()

• Data: DataFrame

• Aesthetics: x, y, color, fill, shape

• Geometry: points

• Stacking of layers and transformations with +

Shortcut function: qplot (quick plot):

qplot(diamonds.carat, diamonds.price)

Aesthetics

Mapping of data to visual attributes of geometric objects:

– Position: x, y

– Color: color

– Shape: shape

ggplot(aes(x='carat', y='price', color='clarity'), diamonds) + geom_point()

ggplot(aes(x='carat', y='price', shape="cut"), diamonds) + geom_point()

Geom

ggplot(mpg, aes(x='displ', y='cty')) + geom_point() + geom_line()

• Geometric objects:

• Points, lines, polygons, …

• Functions start with “geom_”

• Also geoms for margins of error:

• geom_errorbar(), geom_pointrange(), geom_linerange().

• Note: they require the aesthetics ymin and ymax.
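The ymin and ymax aesthetics have to be present as columns in the data before a range geom can draw them. The following is a minimal sketch of one way to compute them. It assumes the margin is the mean plus or minus one standard error, which is a choice made for this example. The helper error_bar_limits and the sample values are hypothetical.

```python
import math
import statistics


def error_bar_limits(values):
    """Return (y, ymin, ymax) as the mean and the mean +/- one standard error."""
    mean = statistics.mean(values)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - sem, mean + sem


# Made-up measurements for two groups.
groups = {"a": [2.0, 4.0, 6.0], "b": [10.0, 10.0, 13.0]}
limits = {name: error_bar_limits(vals) for name, vals in groups.items()}

for name, (y, ymin, ymax) in limits.items():
    print(name, round(y, 2), round(ymin, 2), round(ymax, 2))
```

The three resulting values per group would then be mapped to y, ymin and ymax in aes().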


Stat

• stat_smooth() and stat_density() enable statistical transformations

• Most geoms have a default stat (and vice versa)

• geom and stat form a layer

• One or more layers form a plot
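A histogram is a typical example of a geom with a default stat: the bars are the geom, and the binning is the stat. The following is a minimal sketch of such a binning transformation in plain Python. The function stat_bin here is a hypothetical helper written for this illustration (it is not the library's implementation), and it assumes the values are not all equal.

```python
def stat_bin(values, bins):
    """Count values in `bins` equal-width intervals over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value falls into the last bin rather than a new one.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    # One (left edge, count) pair per bar for the geom to draw.
    return [(lo + i * width, counts[i]) for i in range(bins)]


bars = stat_bin([1, 2, 2, 3, 5, 8, 9, 9, 10], bins=3)
print(bars)  # [(1.0, 4), (4.0, 1), (7.0, 4)]
```

The geom only sees the transformed data (bar positions and heights), not the raw values.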


Scales (and axes)

• A scale defines how the values of a variable are mapped to an aesthetic

• Therefore, a scale belongs to one aesthetic (x, y, color, fill, etc.)

• The axis is an essential part of a scale

• With scale_XXX, the scales and axes can be adjusted (XXX stands for a combination of aesthetic and type of scale, e.g. scale_fill_gradient)
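What a continuous scale does can be sketched as a function from data values to aesthetic values. The following is a minimal sketch in plain Python, assuming linear interpolation. The helpers make_linear_scale and make_gradient_scale are hypothetical names for this illustration and do not reproduce the library's scale classes.

```python
def make_linear_scale(domain, out_range):
    """Return a function mapping data values in `domain` linearly onto `out_range`."""
    d0, d1 = domain
    r0, r1 = out_range

    def scale(value):
        t = (value - d0) / (d1 - d0)   # position within the data domain, 0..1
        return r0 + t * (r1 - r0)

    return scale


def make_gradient_scale(domain, low_rgb, high_rgb):
    """Continuous colour scale: interpolate each RGB channel linearly."""
    channels = [make_linear_scale(domain, (lo, hi)) for lo, hi in zip(low_rgb, high_rgb)]
    return lambda value: tuple(round(ch(value)) for ch in channels)


x_scale = make_linear_scale(domain=(0, 50), out_range=(0, 400))           # data -> pixels
fill_scale = make_gradient_scale((0, 50), (255, 255, 255), (0, 0, 255))   # white -> blue

print(x_scale(25))      # 200.0
print(fill_scale(50))   # (0, 0, 255)
```

The axis or legend is the inverse view of the same mapping, which is why it belongs to the scale.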


Coord

• A chart is drawn in a coordinate system. This can be transformed.

• A pie chart has a polar coordinate system.

import numpy as np
import pandas as pd
from ggplot import *

df = pd.DataFrame({"x": np.arange(100)})
df['y'] = df.x * 10

# polar coords
p = ggplot(df, aes(x='x', y='y')) + geom_point() + coord_polar()
print(p)
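The effect of a polar coordinate system on data such as y = x * 10 can be sketched without any plotting library: x becomes an angle and y becomes a radius. The following is a minimal sketch in plain Python. The angle convention (clockwise from 12 o'clock, full circle at x_max) is an assumption made for this illustration and is not necessarily what coord_polar() uses. The helper to_polar_xy is hypothetical.

```python
import math


def to_polar_xy(x, y, x_max):
    """Map x to an angle (full circle at x_max) and y to a radius; return screen x, y."""
    theta = 2 * math.pi * x / x_max
    # Angle measured clockwise from 12 o'clock (assumed convention).
    return y * math.sin(theta), y * math.cos(theta)


points = [(x, x * 10) for x in range(100)]     # y = x * 10, a straight line
spiral = [to_polar_xy(x, y, x_max=100) for x, y in points]

print(spiral[0])    # (0.0, 0.0)
```

Under this convention the straight line turns into a spiral, since the radius grows steadily with the angle.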


Facets

• With facets, small multiples are created.

• Each facet shows a subset of the data.

ggplot(diamonds, aes(x='price')) + \
    geom_histogram() + \
    facet_grid("cut")
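Conceptually, faceting splits the data into one subset per value of the facet variable and draws the same chart for each subset. The following is a minimal sketch of that split in plain Python. The function facet_split and the three sample rows are hypothetical and only illustrate the idea.

```python
from collections import defaultdict


def facet_split(rows, by):
    """Group rows into one subset per distinct value of column `by`."""
    panels = defaultdict(list)
    for row in rows:
        panels[row[by]].append(row)
    return dict(panels)


# Made-up rows in the shape of the diamonds data.
rows = [
    {"cut": "Ideal", "price": 326},
    {"cut": "Premium", "price": 327},
    {"cut": "Ideal", "price": 340},
]
panels = facet_split(rows, by="cut")

for value, subset in sorted(panels.items()):
    print(value, [r["price"] for r in subset])
```

Each entry of panels corresponds to one small multiple.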

Facets example

ggplot(chopsticks, aes(x='chopstick_length', y='food_pinching_effeciency')) + \
    geom_point() + \
    geom_line() + \
    scale_x_continuous(breaks=[150, 250, 350]) + \
    facet_wrap("individual")


Facets example 2

ggplot(diamonds, aes(x="carat", y="price", color="color", shape="cut")) + geom_point() + facet_wrap("clarity")


ggplot tips

• You can annotate plots:

ggplot(mtcars, aes(x='mpg')) + geom_histogram() + \
    xlab("Miles per Gallon") + ylab("# of Cars")

• Assign a plot to a variable, for instance g:

g = ggplot(mpg, aes(x='displ', y='cty')) + geom_point()

• The function save saves the plot to the desired format:

g.save("myimage.png")

Folium: Thematic maps

• A thematic map is a visualization where statistical information with a spatial component is shown.

• Other libraries are: Basemap, Cartopy, Iris

• Folium builds on the data wrangling strengths of the Python ecosystem and the mapping strengths of the Leaflet.js library.

• Manipulate your data in Python, then visualize it on a Leaflet map via Folium.


Folium features

• Built-in tilesets from OpenStreetMap, MapQuest Open, MapQuest Open Aerial, Mapbox, and Stamen

• Supports custom tilesets with Mapbox or Cloudmade API keys.

• Supports GeoJSON and TopoJSON overlays, as well as the binding of data to those overlays to create choropleth maps with ColorBrewer color schemes.


Basic Maps

folium.Map(location=[50.89, 5.99], zoom_start=14)

Basic maps

folium.Map(location=[50.89, 5.99], zoom_start=14, tiles='Stamen Toner')

GeoJSON/TopoJSON Overlays

ice_map = folium.Map(location=[-59, -11], tiles='Mapbox Bright', zoom_start=2)
ice_map.geo_json(geo_path=geo_path)
ice_map.geo_json(geo_path=topo_path, topojson='objects.antarctic_ice_shelf')
ice_map.create_map(path='ice_map.html')

Choropleth maps

map = folium.Map(location=[48, -102], zoom_start=3)
map.choropleth(geo_path=state_geo, data=state_data,
               columns=['State', 'Unemployment'],
               key_on='feature.id',
               fill_color='YlGn', fill_opacity=0.7, line_opacity=0.2,
               legend_name='Unemployment Rate (%)')
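Binding data to a GeoJSON overlay comes down to a join on a key followed by a classification into colour classes. The following is a minimal sketch of that idea in plain Python, without Folium. The function bind_colors, the tiny feature collection, the unemployment values and the class breaks are all hypothetical. The palette is the 3-class ColorBrewer YlGn scheme. Folium's own class breaks and handling of missing data may differ.

```python
import bisect

YLGN_3 = ["#f7fcb9", "#addd8e", "#31a354"]   # 3-class ColorBrewer YlGn

# Minimal GeoJSON-like structure; geometries are left out for brevity.
geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "AL", "properties": {}, "geometry": None},
        {"type": "Feature", "id": "AK", "properties": {}, "geometry": None},
        {"type": "Feature", "id": "AZ", "properties": {}, "geometry": None},
    ],
}


def bind_colors(geojson, values_by_key, breaks, palette):
    """Return {feature id: colour}; features without data get None."""
    colors = {}
    for feature in geojson["features"]:
        key = feature["id"]                  # the field that key_on='feature.id' refers to
        value = values_by_key.get(key)
        if value is None:
            colors[key] = None
        else:
            # `breaks` are the inner class boundaries: len(palette) == len(breaks) + 1.
            colors[key] = palette[bisect.bisect_right(breaks, value)]
    return colors


unemployment = {"AL": 7.1, "AK": 6.8}        # made-up values
colors = bind_colors(geojson, unemployment, breaks=[5.0, 7.0], palette=YLGN_3)
print(colors)
```

In the Folium call, columns names the key and value columns of the data, and key_on names the matching field in each feature.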

The Grammar of Graphics

ggplot2: Layered Grammar of Graphics

• Defaults: Data, Aesthetics

• Coordinates

• Scales

• Layers: Data, Aesthetics, Geometry, Statistics, Position

• Facets

tmap: Layered Grammar of Thematic Maps

• Group (1 or more), each consisting of a shape and 1 or more layers

• Shape: Data, Map projection, Bounding box; holds coordinates and topology. Spatial types: polygons, points, lines, raster

• Layers: Aesthetics, Statistics, Scale

• Facets

Creating a choropleth

tm_shape(NLD_muni, projection="rd") +
  tm_fill()

Creating a choropleth (2)

tm_shape(NLD_muni, projection="rd") +
  tm_fill("blue")

Creating a choropleth (3)

tm_shape(NLD_muni, projection="rd") +
  tm_fill("population")

Creating a choropleth (4)

tm_shape(NLD_muni, projection="rd") +
  tm_fill("population", convert2density=TRUE, style="kmeans", title="Population per km2")

Creating a choropleth (5)

tm_shape(NLD_muni, projection="rd") +
  tm_fill("population", convert2density=TRUE, style="kmeans", title="Population per km2") +
  tm_borders(alpha = .5)

Creating a choropleth (6)

tm_shape(NLD_muni, projection="rd") +
  tm_fill("population", convert2density=TRUE, style="kmeans", title="Population per km2") +
  tm_borders(alpha = .5) +
tm_shape(NLD_prov) +
  tm_borders(lwd = 2)

Creating a choropleth (7)

tm_shape(NLD_muni, projection="rd") +
  tm_fill("population", convert2density=TRUE, style="kmeans", title="Population per km2") +
  tm_borders(alpha = .5) +
tm_shape(NLD_prov) +
  tm_borders(lwd = 2) +
  tm_text("name", size = .8, shadow = TRUE, bg.color = "white", bg.alpha = .25)
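Two arguments in the choropleth above do real work: convert2density turns counts into counts per area, and style="kmeans" chooses class breaks by clustering the values. The following is a minimal sketch of both steps in plain Python, assuming a simple one-dimensional Lloyd's algorithm with a deterministic start. It is not the algorithm tmap uses internally. The function kmeans_breaks and the sample figures are hypothetical.

```python
def kmeans_breaks(values, k, iterations=100):
    """1-D k-means (Lloyd's algorithm); return the k-1 boundaries between classes."""
    data = sorted(values)
    # Deterministic start: spread the initial centres evenly over the sorted data.
    centers = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centre.
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    centers.sort()
    # A class boundary lies halfway between two neighbouring centres.
    return [(a + b) / 2 for a, b in zip(centers, centers[1:])]


# Made-up municipalities: population counts and areas in km2.
population = [1000, 1200, 50000, 52000, 800000, 820000]
area_km2 = [10, 10, 10, 10, 10, 10]

density = [p / a for p, a in zip(population, area_km2)]   # people per km2
breaks = kmeans_breaks(density, k=3)
print(breaks)  # [2605.0, 43050.0]
```

Mapping densities rather than raw counts avoids large municipalities dominating the map just because of their size.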


Creating a choropleth with qtm

qtm(NLD_muni)

qtm(NLD_muni, fill="population", convert2density=TRUE)

qtm(NLD_muni, fill="population", convert2density=TRUE, fill.style="kmeans", fill.title="Population per km2") +

qtm(NLD_prov, fill=NULL, text="name", text.size=.7, borders.lwd=2, text.bg.color="white", text.bg.alpha=.25, shadow=TRUE)

Example: choropleth

tm_shape(World) +
  tm_fill("income_grp", palette="-Blues", title="Income classification") +
  tm_borders() +
  tm_text("iso_a3", size="AREA") +
tm_format_World()

Example: bubble map

tm_shape(World) +
  tm_fill("grey70") +
tm_shape(metro) +
  tm_bubbles("X2010", col = "growth", border.col = "black", border.alpha = .5,
             style="fixed", breaks=c(-Inf, 0, 2, 4, 6, Inf),
             palette="-RdYlBu", title.size="Metro population",
             title.col="Growth rate (%)") +
tm_format_World()

Example: choropleth + bubble map

tm_shape(World) +
  tm_fill("income_grp", palette="-Blues", contrast = .5, title="Income class") +
  tm_borders() +
  tm_text("iso_a3", size="AREA") +
tm_shape(metro) +
  tm_bubbles("X2010", col = "growth", border.col = "black", border.alpha = .5,
             style="fixed", breaks=c(-Inf, 0, 2, 4, 6, Inf),
             palette="-RdYlBu", title.size="Metro population",
             title.col="Growth rate (%)") +
tm_format_World(bg.color = "gray80")

Example: raster map

pal8 <- c("#33A02C", "#B2DF8A", "#FDBF6F", "#1F78B4", "#999999", "#E31A1C", "#E6E6E6", "#A6CEE3")

tm_shape(land, ylim = c(-88,88)) +
  tm_raster("cover_cls", palette = pal8, title="Global Land Cover") +
tm_shape(World) +
  tm_borders() +
tm_format_World(legend.bg.color = "white", legend.bg.alpha=.2,
                legend.frame="gray50", legend.width=.2)

Example: raster map (with dotmap)

pal8 <- c("#33A02C", "#B2DF8A", "#FDBF6F", "#1F78B4", "#999999", "#E31A1C", "#E6E6E6", "#A6CEE3")

tm_shape(land, ylim = c(-88,88)) +
  tm_raster("cover_cls", palette = pal8, title="Global Land Cover") +
tm_shape(World) +
  tm_borders() +
qtm(metro, dot.color="#E31A1C") +
tm_format_World(legend.bg.color = "white", legend.bg.alpha=.2,
                legend.frame="gray50", legend.width=.2)

Example: classic map

... + style_classic()

Small multiples

tm_shape(NLD_muni) +
  tm_polygons("population", style="kmeans", convert2density = TRUE) +
  tm_facets(by="province", free.coords=TRUE, drop.shapes=TRUE) +
tm_layout(legend.show = FALSE, outer.margins=0)

OpenStreetMap layer

osm_NLD <- read_osm(NLD_muni)

qtm(osm_NLD) +
tm_shape(NLD_muni) +
  tm_polygons("population", convert2density=TRUE, style="kmeans", alpha=.7, palette="Reds")

Interactive maps

• All maps can be made interactive.

• tmap has two modes:

• plot: static maps, shown in the graphics device window; can be exported to png, jpg, pdf, etc.

• view: interactive maps, shown in the viewing window or in the browser; can be exported to standalone HTML files

# switch to plot mode:
tmap_mode("plot")

# switch to view mode:
tmap_mode("view")

# toggle between modes:
ttm()

Some convenient functions

Read ESRI shape file:

NLD_muni <- read_shape("NLD_2014_muni.shp")

Append data:

NLD_muni <- append_data(NLD_muni, NLD_data, key.shp="code", key.data="muni_code")

Set map projection:

NLD_muni <- set_projection(NLD_muni, "longlat")

Crop shapes:

NLD_twitter <- crop(twitter, NLD_muni)

Save to image:

tm_twitter <- tm_shape(NLD_muni) + tm_polygons() + tm_shape(NLD_twitter) + tm_dots()

save_tmap(tm_twitter, filename = "twitter.png", width = 600, height = 800)

Create animation:

animation_tmap(...)

Animation

Day Time Population per municipality based on mobile phone network data

tm_dtp <- tm_shape(dtp) +
  tm_polygons(paste0("dtp", 0:23), ...) +
tm_shape(NLD_prov) +
  tm_borders(lwd = 2) +
tm_credits("...") +
tm_facets(ncol=1, nrow=1)

animation_tmap(tm_dtp, filename = "dtp.gif", width = 600, height = 800, delay = 40)

Tableplot

library(tabplot)

# load data
library(ggplot2)
data(diamonds)

tableplot(diamonds)

• Tableplots can be created with the package tabplot

• It works well with very large tabular data (dozens of variables, millions of rows). Internally, it makes use of the ff and ffbase packages, which store data on disk rather than in memory.

• Speed is ensured by preprocessing the data.

• Standard deviations can be shown for numeric variables.
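The preprocessing behind a tableplot can be sketched as follows: sort the rows by one column, cut them into row bins, and summarise every column per bin (the mean for numeric columns, category counts for categorical ones). The following is a minimal sketch in plain Python, not the tabplot implementation. The function tableplot_bins and the four sample rows are hypothetical.

```python
from collections import Counter


def tableplot_bins(rows, sort_by, n_bins):
    """Sort rows by `sort_by` (descending), cut into n_bins row bins, summarise each."""
    ordered = sorted(rows, key=lambda r: r[sort_by], reverse=True)
    size = -(-len(ordered) // n_bins)            # ceiling division
    summary = []
    for start in range(0, len(ordered), size):
        chunk = ordered[start:start + size]
        bin_summary = {}
        for column in chunk[0]:
            values = [r[column] for r in chunk]
            if isinstance(values[0], (int, float)):
                bin_summary[column] = sum(values) / len(values)   # mean -> bar length
            else:
                bin_summary[column] = dict(Counter(values))       # counts -> stacked bar
        summary.append(bin_summary)
    return summary


# Made-up rows in the shape of the diamonds data.
rows = [
    {"carat": 0.2, "cut": "Ideal"},
    {"carat": 1.0, "cut": "Good"},
    {"carat": 0.4, "cut": "Ideal"},
    {"carat": 2.0, "cut": "Good"},
]
bins = tableplot_bins(rows, sort_by="carat", n_bins=2)
for b in bins:
    print(b)
```

Only the per-bin summaries are drawn, which is why the approach scales to millions of rows.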

Tableplot: Dutch virtual census test data from 2008


Summary

• R is very suitable for data visualization

• The ggplot2 package is the standard for non-spatial charts

• The tmap package is a package in the same style for spatial data visualization.

• The tabplot package can be used to visualize large tabular data.


References

• http://yhat.github.io/ggplot/

• https://folium.readthedocs.io/en/latest/
