Visualising and Linking Open Data from Multiple Sources

Margriet Groenendijk, PhD Developer Advocate for IBM Cloud Data Services Connecting and Visualising Open Data from Multiple Sources Data Driven Innovation Open Summit Rome - 20 May 2016 @MargrietGr

Transcript of Visualising and Linking Open Data from Multiple Sources

Page 1: Visualising and Linking Open Data from Multiple Sources

Margriet Groenendijk, PhD, Developer Advocate for IBM Cloud Data Services

Connecting and Visualising Open Data from Multiple Sources

Data Driven Innovation Open Summit, Rome - 20 May 2016

@MargrietGr

Page 2: Visualising and Linking Open Data from Multiple Sources

Please Note

▪ IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion.

▪ Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision.

▪ The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract.

▪ The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.

▪ Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.

Page 3: Visualising and Linking Open Data from Multiple Sources

About me

• Developer Advocate at IBM Cloud Data Services, UK
• Data scientist
• Python, R, Cloudant, dashDB
• Research Fellow at University of Exeter, UK
• Worked with very large observational datasets and the output of global scale climate models
• PhD at Vrije Universiteit Amsterdam, the Netherlands
• Explored large observational datasets of carbon uptake by forests

Page 4: Visualising and Linking Open Data from Multiple Sources

Outline

Page 5: Visualising and Linking Open Data from Multiple Sources

Connect and Visualise Data

But the first step - getting the data in, in a way you can use it - takes up most of the time

I have spent most of my time doing just this for the last 10 years

In March I joined IBM and I started exploring better and easier ways of data use and analysis

Page 6: Visualising and Linking Open Data from Multiple Sources

http://geoawesomeness.com/wp-content/uploads/2015/10/GoogeMaps-vs-OSM-Geoawesomeness.jpg

• Freely available
• Constantly updated by local volunteers
• Data format needs some processing

Page 7: Visualising and Linking Open Data from Multiple Sources

Weather and Climate Data

There is a lot of it and the files are large

Binary data format of grids in different shapes and sizes

Clear understanding of where the data comes from is important. Most of it is generated by models or through interpolation of observations

Page 8: Visualising and Linking Open Data from Multiple Sources

Census Data

Demographic, economic and statistical data by country

For the US, also by state and city

Accessible through APIs

Page 9: Visualising and Linking Open Data from Multiple Sources

OpenStreetMap Data

Page 10: Visualising and Linking Open Data from Multiple Sources

OpenStreetMap is built by a community of mappers that contribute and maintain data about roads, trails, cafés, railway stations, and much more, all over the world

Updated weekly

But… large files that can do with some processing to make the data easily accessible

https://www.openstreetmap.org

https://www.cloudant.com

IBM Cloudant: use anywhere

Page 11: Visualising and Linking Open Data from Multiple Sources

Several data sources - world, continent, country, city or a user defined box

Several data formats (pbf, osm, json, shp), for which free-to-use conversion tools exist

Example for the Netherlands:

wget -c http://download.geofabrik.de/europe/netherlands-latest.osm.pbf

Page 12: Visualising and Linking Open Data from Multiple Sources

Extract the POIs with osmosis

osmosis --read-pbf netherlands-latest.osm.pbf \
  --tf accept-nodes \
    aerialway=station \
    aeroway=aerodrome,helipad,heliport \
    amenity=* craft=* emergency=* \
    highway=bus_stop,rest_area,services \
    historic=* leisure=* office=* \
    public_transport=stop_position,stop_area \
    shop=* tourism=* \
  --tf reject-ways --tf reject-relations \
  --write-xml netherlands.nodes.osm

(easy to install with brew on Mac)

Page 13: Visualising and Linking Open Data from Multiple Sources

Some cleaning up with osmconvert

Convert from osm to json format with ogr2ogr

osmconvert netherlands.nodes.osm --drop-ways --drop-author --drop-relations --drop-versions > netherlands.poi.osm

ogr2ogr -f GeoJSON netherlands.poi.json netherlands.poi.osm points

Page 14: Visualising and Linking Open Data from Multiple Sources

Create an account on www.cloudant.com (free trial available)

Upload to Cloudant with couchimport

export COUCH_URL="https://username:password@username.cloudant.com"

cat netherlands.poi.json | couchimport --db poi-netherlands --type json --jsonpath "features.*"

https://github.com/glynnbird/couchimport
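As a rough illustration of what an import like this involves, the sketch below reads a GeoJSON file, takes the items under "features" (the same items the --jsonpath option selects) and groups them into batches for the CouchDB/Cloudant _bulk_docs endpoint. This is not how couchimport itself is implemented; the function names and the batch size are assumptions made for this example, and the upload step needs the requests package.

```python
import json
from itertools import islice


def load_features(geojson_text):
    """Return the list stored under "features" in a GeoJSON document."""
    return json.loads(geojson_text)["features"]


def batches(items, size=500):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def bulk_payloads(features, size=500):
    """Build one JSON body per batch in the {"docs": [...]} shape used by _bulk_docs."""
    return [json.dumps({"docs": chunk}) for chunk in batches(features, size)]


def upload(couch_url, db, payloads):
    """POST every payload to <couch_url>/<db>/_bulk_docs."""
    import requests  # imported here so the helpers above work without it

    for body in payloads:
        response = requests.post(
            "{}/{}/_bulk_docs".format(couch_url.rstrip("/"), db),
            data=body,
            headers={"Content-Type": "application/json"},
        )
        response.raise_for_status()


# Tiny made-up example instead of the real netherlands.poi.json
sample = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"name": "example {}".format(i)},
         "geometry": {"type": "Point", "coordinates": [5.0 + i, 52.0]}}
        for i in range(5)
    ],
})
payloads = bulk_payloads(load_features(sample), size=2)
print(len(payloads))
```

With the real file, the upload would be called with the value of COUCH_URL and the database name, for example upload(os.environ["COUCH_URL"], "poi-netherlands", payloads).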

Page 15: Visualising and Linking Open Data from Multiple Sources

▪ Cloudant screen shot…

Page 16: Visualising and Linking Open Data from Multiple Sources

▪ Cloudant screen shot…

Page 17: Visualising and Linking Open Data from Multiple Sources

▪ Cloudant screen shot…

Page 18: Visualising and Linking Open Data from Multiple Sources

Examples from https://docs.cloudant.com/geo.html

Easily accessible in a Python notebook with the requests package
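A minimal sketch of such a request, assuming a geospatial index has already been created on the POI database. The design document name (geodd), the index name (geoidx), the account URL and the helper names are placeholders invented for this example; the query parameters (lat, lon, radius in metres, include_docs, limit) follow the Cloudant Geo documentation linked above.

```python
def geo_radius_query(account_url, db, design_doc, index, lat, lon, radius_m, limit=20):
    """Return (url, params) for a Cloudant Geo search within radius_m metres of a point."""
    url = "{}/{}/_design/{}/_geo/{}".format(
        account_url.rstrip("/"), db, design_doc, index)
    params = {
        "lat": lat,
        "lon": lon,
        "radius": radius_m,
        "include_docs": "true",  # return the full POI documents, not only ids
        "limit": limit,
    }
    return url, params


def fetch_pois(url, params, auth=None):
    """Run the query and return the result rows (needs the requests package)."""
    import requests  # imported here so building the query works without it

    response = requests.get(url, params=params, auth=auth)
    response.raise_for_status()
    return response.json().get("rows", [])


# POIs within 500 m of central Amsterdam; nothing is sent until fetch_pois is called
url, params = geo_radius_query(
    "https://username.cloudant.com", "poi-netherlands",
    "geodd", "geoidx", lat=52.37, lon=4.89, radius_m=500)
print(url)
```

Calling fetch_pois(url, params, auth=("username", "password")) in a notebook would then return the matching rows, which can be passed straight to pandas for further analysis.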

Page 19: Visualising and Linking Open Data from Multiple Sources

Weekly updates

Adapt the code and automate it to run weekly

Up to date database
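One way to automate the steps is a small script that chains the same commands and is scheduled to run once a week (for example with cron). The sketch below only assembles the command lines from the earlier slides; the function names are made up for this example, wget -O is used instead of -c so that last week's download is overwritten, and an rm -f step is added as a precaution in case ogr2ogr does not overwrite an existing GeoJSON file.

```python
import subprocess


def build_steps(country="netherlands", region="europe"):
    """Return the pipeline as (argv, stdin_path, stdout_path) tuples."""
    pbf = "{}-latest.osm.pbf".format(country)
    nodes = "{}.nodes.osm".format(country)
    poi_osm = "{}.poi.osm".format(country)
    poi_json = "{}.poi.json".format(country)
    url = "http://download.geofabrik.de/{}/{}".format(region, pbf)
    return [
        (["wget", "-O", pbf, url], None, None),
        (["osmosis", "--read-pbf", pbf,
          "--tf", "accept-nodes",
          "aerialway=station", "aeroway=aerodrome,helipad,heliport",
          "amenity=*", "craft=*", "emergency=*",
          "highway=bus_stop,rest_area,services",
          "historic=*", "leisure=*", "office=*",
          "public_transport=stop_position,stop_area",
          "shop=*", "tourism=*",
          "--tf", "reject-ways", "--tf", "reject-relations",
          "--write-xml", nodes], None, None),
        # osmconvert writes to standard output, so it is redirected to a file
        (["osmconvert", nodes, "--drop-ways", "--drop-author",
          "--drop-relations", "--drop-versions"], None, poi_osm),
        (["rm", "-f", poi_json], None, None),
        (["ogr2ogr", "-f", "GeoJSON", poi_json, poi_osm, "points"], None, None),
        # couchimport reads the JSON from standard input and needs COUCH_URL set
        (["couchimport", "--db", "poi-" + country, "--type", "json",
          "--jsonpath", "features.*"], poi_json, None),
    ]


def run(steps):
    """Run each step in order and stop at the first failure."""
    for argv, stdin_path, stdout_path in steps:
        stdin = open(stdin_path, "rb") if stdin_path else None
        stdout = open(stdout_path, "wb") if stdout_path else None
        try:
            subprocess.check_call(argv, stdin=stdin, stdout=stdout)
        finally:
            for handle in (stdin, stdout):
                if handle:
                    handle.close()


steps = build_steps()
for argv, _, _ in steps:
    print(" ".join(argv))
# run(steps)  # uncomment to execute; requires the tools from the earlier slides
```

Because the arguments are passed as a list and no shell is involved, patterns such as amenity=* reach osmosis unchanged. Re-importing into the same database every week would add the POIs again, so a real job would probably recreate the database first or import into a fresh one.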

Page 20: Visualising and Linking Open Data from Multiple Sources

Weather and Climate Data

Page 21: Visualising and Linking Open Data from Multiple Sources

Weather and Climate Data

There is a lot of it and the files are large

Binary data format of grids in different shapes and sizes

http://www.cru.uea.ac.uk/data/

https://modelingguru.nasa.gov/docs/DOC-2312

Page 22: Visualising and Linking Open Data from Multiple Sources

https://developer.ibm.com/clouddataservices/2016/04/18/predict-temperatures-using-dashdb-python-and-r/

Weather and Climate Data

The blog post linked on this slide explains how to process some example data and load it into a relational database (dashDB). This data is now easily accessible.

Page 23: Visualising and Linking Open Data from Multiple Sources

Load data into Python directly from dashDB (credentials are easily found in dashDB)

from ibmdbpy import IdaDataBase, IdaDataFrame

jdbc = "jdbc:db2://dashdb-entry-yp-dal09-09.services.dal.bluemix.net:50000/BLUDB:user=" + username + ";password=" + password

idadb = IdaDataBase(jdbc)
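Once the connection is open, a table can be pulled into a pandas DataFrame. The sketch below is one possible way to wrap this, assuming ibmdbpy and a DB2 JDBC driver are installed; the helper names and the table name TEMPERATURE are placeholders for this example, not names taken from the slides.

```python
def jdbc_url(host, user, password, port=50000, database="BLUDB"):
    """Assemble a connection string in the same format as the one above."""
    return "jdbc:db2://{}:{}/{}:user={};password={}".format(
        host, port, database, user, password)


def load_table(jdbc, table):
    """Fetch a whole dashDB table as a pandas DataFrame."""
    from ibmdbpy import IdaDataBase, IdaDataFrame  # imported only when needed

    idadb = IdaDataBase(jdbc)
    try:
        # IdaDataFrame is a reference to the remote table;
        # as_dataframe() downloads its contents
        return IdaDataFrame(idadb, table).as_dataframe()
    finally:
        idadb.close()


jdbc = jdbc_url("dashdb-entry-yp-dal09-09.services.dal.bluemix.net",
                "username", "password")
print(jdbc)
# temp = load_table(jdbc, "TEMPERATURE")  # hypothetical table name
```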

Page 24: Visualising and Linking Open Data from Multiple Sources

Average global temperature

import pandas as pd

temp = pd.read_csv("temperature.csv")

temp[0:5]

Page 25: Visualising and Linking Open Data from Multiple Sources

From 2D to 3D matrix

import numpy as np

# Determine the size of the 3D matrix
lats = np.unique(temp.latitude)
lons = np.unique(temp.longitude)
nt = 12
ni = len(lats)
nj = len(lons)

Page 26: Visualising and Linking Open Data from Multiple Sources

From 2D to 3D matrix

# Create and fill matrix by looping over the 3 dimensions
temperature = np.zeros(nt*ni*nj)
temperature.shape = [nt, ni, nj]
mo = -1
for mon in range(1,13):
    mo = mo+1
    la = -1
    for lat in lats:
        la = la+1
        lo = -1
        for lon in lons:
            lo = lo+1
            t = temp["temperature"][(temp["month"]==mon) & (temp["latitude"]==lat) & (temp["longitude"]==lon)]
            temperature[mo, la, lo] = np.array(t)
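The triple loop filters the whole table once per grid cell, which can be slow for a global grid. A possible alternative, shown here as a sketch and not as the method used in the talk, is to let np.unique return the position of every row along each axis and fill the matrix in one assignment. The small arrays below are made-up stand-ins for the table columns; with the real data they would be temp.month.values, temp.latitude.values, temp.longitude.values and temp.temperature.values.

```python
import numpy as np

# Made-up stand-in for the table: 2 months x 2 latitudes x 3 longitudes
month = np.array([1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2])
latitude = np.array([-10., -10., -10., 10., 10., 10.] * 2)
longitude = np.array([0., 120., 240.] * 4)
values = np.arange(12, dtype=float)


def to_grid(month, latitude, longitude, values):
    """Arrange a long table as a [month, latitude, longitude] matrix."""
    # return_inverse gives, for every row, its index along the sorted unique axis
    months, mo = np.unique(month, return_inverse=True)
    lats, la = np.unique(latitude, return_inverse=True)
    lons, lo = np.unique(longitude, return_inverse=True)
    # cells without a matching row stay NaN
    grid = np.full((len(months), len(lats), len(lons)), np.nan)
    grid[mo, la, lo] = values
    return grid, lats, lons


grid, lats, lons = to_grid(month, latitude, longitude, values)
print(grid.shape)
```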

Page 27: Visualising and Linking Open Data from Multiple Sources

import scipy
import matplotlib
from pylab import *
from mpl_toolkits.basemap import Basemap, addcyclic, shiftgrid, maskoceans

Page 28: Visualising and Linking Open Data from Multiple Sources

# define the area to plot and projection to use
m = Basemap(llcrnrlon=-180, llcrnrlat=-60, urcrnrlon=180, urcrnrlat=80, projection='mill')

Page 29: Visualising and Linking Open Data from Multiple Sources

Global temperature map

# convert the latitude, longitude and temperatures to raster coordinates to be plotted
t1 = temperature[0,:,:]
t1, lon = addcyclic(t1, lons)
january, longitude = shiftgrid(180., t1, lon, start=False)
x, y = np.meshgrid(longitude, lats)
px, py = m(x, y)

Page 30: Visualising and Linking Open Data from Multiple Sources

rcParams['font.size']=12
rcParams['figure.figsize']=[8.0, 6.0]
figure()

Page 31: Visualising and Linking Open Data from Multiple Sources

palette=cm.RdYlBu_r
rmin=-30.; rmax=30.
ncont=20
dc=(rmax-rmin)/ncont
vc=arange(rmin,rmax+dc,dc)
pal_norm=matplotlib.colors.Normalize(vmin = rmin, vmax = rmax, clip = False)

Page 32: Visualising and Linking Open Data from Multiple Sources

Global temperature map

m.drawcoastlines(linewidth=0.5)
m.drawmapboundary(fill_color=(1.0,1.0,1.0))
cf=m.pcolormesh(px, py, january, cmap = palette)
cbar=colorbar(cf,orientation='horizontal', shrink=0.95)
cbar.set_label('Mean Temperature in January')

tight_layout()

show()

Page 33: Visualising and Linking Open Data from Multiple Sources

Page 34: Visualising and Linking Open Data from Multiple Sources

UN Census data

https://console.ng.bluemix.net/data/exchange

Page 35: Visualising and Linking Open Data from Multiple Sources

Census Data

Demographic, economic and statistical data by country

For the US, also by state and city

Accessible through APIs

Page 36: Visualising and Linking Open Data from Multiple Sources

Page 37: Visualising and Linking Open Data from Multiple Sources

Page 38: Visualising and Linking Open Data from Multiple Sources

Page 39: Visualising and Linking Open Data from Multiple Sources

Page 40: Visualising and Linking Open Data from Multiple Sources

Page 41: Visualising and Linking Open Data from Multiple Sources

Page 42: Visualising and Linking Open Data from Multiple Sources

import urllib.request  # Python 3; on Python 2 use urllib.urlopen instead

filelink = urllib.request.urlopen("https://console.ng.bluemix.net/data/exchange-api/v1/entries/889ca053a19986a4445839358a91963e/data?accessKey=xxxxxx")

popdf = pd.read_csv(filelink)

list(popdf)

['Country or Area', 'Year', 'Value', 'Value Footnotes']
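With these four columns, a typical next step is to keep only the most recent value for each country. The sketch below shows one way to do that with pandas; the inline rows are made-up placeholder values in the same column layout, not real census figures.

```python
import io

import pandas as pd

# Placeholder rows with the same columns as popdf above (values are invented)
csv_text = """Country or Area,Year,Value,Value Footnotes
Country A,2014,100,
Country A,2015,110,
Country B,2013,50,
Country B,2015,55,
"""
popdf = pd.read_csv(io.StringIO(csv_text))

# Sort by year, then keep the last (most recent) row of every country
latest = (popdf.sort_values("Year")
               .groupby("Country or Area")
               .tail(1)
               .set_index("Country or Area")["Value"])
print(latest)
```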

Page 43: Visualising and Linking Open Data from Multiple Sources

popdf[0:10]

Page 44: Visualising and Linking Open Data from Multiple Sources

Combine and visualise

Page 45: Visualising and Linking Open Data from Multiple Sources

Combine and Visualise

▪ POI data in Cloudant
▪ Weather data in dashDB
▪ Census data
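The slides do not show how the sources are joined. One possible way to link the POIs to the gridded temperature data, given here purely as an assumption-laden sketch, is to look up the nearest grid cell for each POI coordinate. The function names are invented for this example, the toy grid is made up, and the grid longitudes are assumed to run from 0 to 360 degrees while POI longitudes run from -180 to 180.

```python
import numpy as np


def nearest_index(axis_values, points, period=None):
    """Index of the closest entry in axis_values for every point."""
    axis_values = np.asarray(axis_values, dtype=float)
    points = np.asarray(points, dtype=float)
    distance = np.abs(points[:, None] - axis_values[None, :])
    if period is not None:
        # on a circular axis the short way round may cross the 0/360 boundary
        distance = np.minimum(distance, period - distance)
    return distance.argmin(axis=1)


def sample_grid(grid, lats, lons, poi_lat, poi_lon):
    """Value of a 2D [latitude, longitude] grid at the cell nearest to each POI."""
    i = nearest_index(lats, poi_lat)
    j = nearest_index(lons, np.mod(poi_lon, 360.0), period=360.0)
    return grid[i, j]


# Toy grid: 3 latitudes x 4 longitudes, cell values 0..11
lats = np.array([-45.0, 0.0, 45.0])
lons = np.array([0.0, 90.0, 180.0, 270.0])
grid = np.arange(12).reshape(3, 4)

# Two example points given as (latitude, longitude)
poi_lat = np.array([52.37, -33.9])
poi_lon = np.array([4.89, -70.0])
result = sample_grid(grid, lats, lons, poi_lat, poi_lon)
print(result)
```

For the monthly temperature matrix built earlier, one month would be selected first, for example temperature[0, :, :] for January.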

Page 46: Visualising and Linking Open Data from Multiple Sources

In the cloud: Data & Analytics on IBM Bluemix

Page 47: Visualising and Linking Open Data from Multiple Sources

https://www.datascientistworkbench.com

Page 48: Visualising and Linking Open Data from Multiple Sources

Key points

▪ There is lots of data freely available
▪ A lot of analysis tools are free, with examples in blogs and on GitHub
▪ There is still lots of preparation needed before doing any analysis or visualisation
▪ But this is getting easier and easier
  ▪ API access of data
  ▪ Data storage, analysis and visualisation in the cloud

Page 49: Visualising and Linking Open Data from Multiple Sources

https://github.com/MargrietGroenendijk/notebooks

Thank you!

@MargrietGr

Margriet Groenendijk, Developer Advocate for IBM Cloud Data Services