Open Source Geospatial

Transcript of Open Source Geospatial

Page 1: Open Source Geospatial

Open Source Geospatial

Programs, Data, and Protocols - and why they matter

Barry Rowlingson

School of Health and Medicine,

Lancaster University

Before we can get started with maps and statistics, I just want to talk about the philosophy behind the software.

Page 2: Open Source Geospatial

The Academic Tradition

* c. 387BC

* The First Academy

* This Greek guy

I work at a university much like SFU: concrete, buses, grass, trees, hills in the distance. But we can trace this all back to around 387BC and this guy. Anyone recognise him? Plato. He started the first academy just outside Athens as a place for advancing learning, in contrast to other famous ancient Greek guys you might have heard of such as Pythagoras, who was all into secrecy and obfuscation.

And that contrast between Pythagorean secrecy and Platonic openness continues today, as we will see.

Page 3: Open Source Geospatial

I Think

* Cogito ergo sum

* I think, therefore I do math?

* This French guy

Fast forward 2000 years or so, this guy crops up and says "Cogito ergo sum". Which is often mistranslated, in my opinion, as "I think therefore I am" but is really better translated as "I think therefore I do sums". Remember if you are doing a PhD then the Ph stands for Philosophy. Science was known as Natural Philosophy for a long time, and so I think all PhD candidates should have a basic wider knowledge of philosophy, even if that's just knowing all the words to the Monty Python Philosopher's Drinking Song.

Descartes here will also be very important in our sessions, as the inventor of Cartesian coordinates.

Page 4: Open Source Geospatial

Moral Philosophy

* All about doing good things

* This English guy

Another branch of philosophy is Moral Philosophy, or Ethics. This is all about doing good things. This is Jeremy Bentham, one of the founders of the Utilitarian movement, who thought we should try and do the greatest good to the greatest number. And being someone who works at a university, I don't have to worry about making a profit for shareholders, and the bottom line of my job description could well be 'make the world a better place'. One of the ways I work towards that is by making any useful programs I write available for anyone to use free and with only one restriction: that they can never remove the freedom of use from the software. And I also only use software with those same freedoms.

Page 5: Open Source Geospatial

Open and Free

* Freely available

* Visible source code

* No usage restrictions

* Bug reports welcome

* Bug fixes even more welcome

We're going to be using R for our statistics, and there's a whole bunch of reasons why. The philosophical reasons and the practical reasons, and they are hard to disentangle.

R is open source, and freely available. There are no usage restrictions, and bug reports and fixes are welcome. This all leads to high-quality, high-functionality software.

Page 6: Open Source Geospatial

Availability

* GPL License

* You can sell it...

* ... but you can't take the GPL license away.

* So it gets given away.

* Download from the internet

R is released under the GNU GPL license. I have here on this CD a copy of R, and if anyone would like to pay me 100 dollars they can have it. There's nothing in the license that says I can't do that. However, the license does stop me stopping you from then selling it for 10 dollars. Or giving it away.

So in practice GPL software like R ends up being given away, and that's due to the freedom and viral nature of the GPL license.

Page 7: Open Source Geospatial

Source Code

By Open Source I mean that the underlying code that R is written in, a mixture of C, Fortran and S, is visible. This is a piece of the C code that computes the gamma distribution.

If you want to see the source code for SPSS that computes the gamma distribution IBM will let you, but then they have to kill you.
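For a flavour of what that sort of code does, here is a minimal sketch in Python of a gamma density function. This is not R's actual C source. The function name `gamma_pdf` and the shape/scale parameterisation are assumptions made for illustration, and it computes the density only, working on the log scale to avoid overflow.

```python
import math


def gamma_pdf(x, shape, scale=1.0):
    """Density of a gamma distribution with the given shape and scale.

    Illustrative sketch only: computed on the log scale as
    (shape - 1) * log(x) - x / scale - lgamma(shape) - shape * log(scale).
    """
    if shape <= 0 or scale <= 0:
        raise ValueError("shape and scale must be positive")
    if x < 0:
        return 0.0  # the gamma distribution has no mass below zero
    if x == 0:
        # Limit at zero depends on the shape parameter.
        if shape < 1:
            return math.inf
        return 1.0 / scale if shape == 1 else 0.0
    log_density = (
        (shape - 1.0) * math.log(x)
        - x / scale
        - math.lgamma(shape)
        - shape * math.log(scale)
    )
    return math.exp(log_density)


# With shape 1 and scale 1 the gamma density reduces to exp(-x).
print(gamma_pdf(1.0, shape=1.0))
```

The point of having the source visible is that anyone can check a calculation like this line by line.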

Page 8: Open Source Geospatial

Usage Restrictions

* There are none.

* Commercial, Academic, Military.

* No warranty!

The GPL is not a license on usage, it's a license on distribution and copying. R has no usage restrictions so whatever application you have in mind, from saving sea otters to weapons of mass destruction, it's all yours.

Not only can you run it on any application, you are also free to take parts of the code out, learn from it, incorporate it into your own applications, as long as when you release it you don't impose any restrictions on the people who get it. That's the viral nature of the GPL.

But as a consequence, there's no warranty.

Page 9: Open Source Geospatial

Support

* Bug reports by email

* Bug fixes welcome by the team

* Free support via the email lists

* Paid support if you need it from commercial companies or individuals.

That makes people worry about support in open source code. But you don't have to.

There is a large community of R users, and this applies for most open source software that is any good: judge a project by its community. Bug reports are taken seriously and followed up, and if you can fix it (because you can see the source code you might be able to work it out) they will love you.

There's a lot of help on the R mailing lists with user problems, as well as helpful web sites.

But if you need something done you can pay a consulting individual or company. And because the software is open, there's a big market of support and not just the guys who make the program. Win.

Page 10: Open Source Geospatial

Here's an example slide we like to use to show off how open source programs get fixed. This is a transcript of the IRC chat log of the Quantum GIS package. At the top, at 12:13, Jason asks if there's a problem with map layers moving around. One of the developers is on, and asks for some confirmation. At 12:18, Jason gives a precise statement of the problem. At 12:32 the developer has checked in a fix to the source code repository. At 12:55 Jason has got the latest version and confirms the bug is fixed. It took about 40 minutes.

In contrast, a commercial GIS company boasts of its 3-month upgrade cycle for bug fixes.

Page 11: Open Source Geospatial

R > S

* S language developed at Bell Labs

* Licensed to StatSci - produced Splus

* Re-implemented as Open Source R

* Caught on in a big way

* Thousands of add-on packages

The S language was developed by researchers at Bell Labs in the US, and after a few years it was licensed to a company called StatSci, and they added on some bells and whistles and released it as Splus. This was commercial closed-source software which was quite expensive and controlled by a license manager server which would often crash and stop anyone working.

This annoyed some guys in New Zealand enough that they decided to rewrite it themselves from scratch.

And they released their work as open source, lots of other people got involved, and now we have thousands of add-on packages and hundreds of developers, and possibly hundreds of thousands of users.

Page 12: Open Source Geospatial

I can't read my Splus Data!

* We stopped paying for Splus

* I can't read my Splus data files!

* Uh oh.

* This can't happen with Open Source

So eventually we stopped paying for Splus, and we stopped using it and now we just use R.

But then I try and go back and read my old Splus data files. Uh oh. I can't read them. I don't know how the data is stored in there. I'm stuck, at least until I can find someone with Splus to read them. And it better be the right version of Splus too.

This can't happen with open source software because I can always get the program I used to create the data. Even old versions.

And if I can't make the old versions run, well, I can always pay someone to work it out.

But in general, having a public specification of a data file structure is a good thing.

Page 13: Open Source Geospatial

Open Data

* So we need open specifications for data.

* For R, the program is the specification!

* There exist ISO specifications for data file types.

Now every so often on the R mailing list someone will ask where the specification of the R .RData file structure is. And the usual answer is that the spec is the code. Nobody has really bothered to document it, partly because you can always read the data with R anyway.

But in many other cases there are public open specifications of data file types, some are recognised by the ISO, and some have the backing of other standards bodies.

Page 14: Open Source Geospatial

OGC

The relevant body for geographic data is the OGC, the Open Geospatial Consortium. They publish standards documents for spatial data which are open and free for all to read and re-use, and the standards are produced through an open process so you can all see the revisions.

Page 15: Open Source Geospatial

And it's not some tiny organisation. Big companies are putting real money into the OGC, from major software houses, defence groups, and even US Homeland Security.

So the OGC is handling a lot of the formal business of open geospatial programs and data. There's also a separate organisation set up to promote the development of programs and libraries that use these standards, and that's the OSGeo...

Page 16: Open Source Geospatial

OSGeo

The Open Source Geospatial Foundation provides a central service for high-quality geospatial software, including desktop mapping, spatial databases, web mapping systems, low-level library code etc.

Page 17: Open Source Geospatial

OSGeo Projects

Web Mapping

* deegree * geomajas * GeoServer * Mapbender * MapBuilder * MapFish * MapGuide Open Source * MapServer * OpenLayers

Desktop Applications

* GRASS GIS * OSSIM * Quantum GIS * gvSIG

Geospatial Libraries

* FDO * GDAL/OGR * GEOS * GeoTools * MetaCRS * PostGIS

Metadata Catalog

* GeoNetwork

Other Projects

* Public Geospatial Data * Education and Curriculum

Here's the current list of OSGeo projects. The ones in red are things I'm going to talk about and some of them we will be using during the course.

I think the important thing these days is having a good open foundation, because that means you can build systems that interact via standards without much human intervention. And now that much data is available on the web and people are pulling data from many sources, it's important to be able to get data together from many servers. To do this we need open protocol standards.

Page 18: Open Source Geospatial

So there are some OGC standards for getting various types of map data from servers. This means you can build applications that take their data from many sources and integrate them in one place.

These applications could be fancy web pages with multiple map layers, desktop applications that integrate local data with regular updated information, or even invisible server-based applications that do some analysis and produce regular emailed reports. All good stuff.
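One of those standards is the OGC Web Map Service (WMS), where a client asks a server for a map image with a plain HTTP request. As a minimal sketch, here is how a WMS 1.3.0 GetMap URL could be put together in Python. The server address `https://example.org/wms`, the layer names and the bounding box are all made up for illustration, and `wms_getmap_url` is a hypothetical helper, not part of any library.

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url, layers, bbox, width, height,
                   crs="CRS:84", image_format="image/png"):
    """Build a WMS 1.3.0 GetMap request URL from the standard parameters."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),   # comma-separated layer names
        "STYLES": "",                 # empty means the server's default styles
        "CRS": crs,                   # CRS:84 is longitude/latitude order
        "BBOX": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return base_url + "?" + urlencode(params)


# Hypothetical server and layers; the bounding box roughly covers southern England.
url = wms_getmap_url(
    "https://example.org/wms",
    layers=["counties", "cases"],
    bbox=(-2.0, 50.5, 0.5, 52.0),
    width=800,
    height=480,
)
print(url)
```

Because every WMS server understands the same parameters, the same function works against any of them, and that is what lets one application pull layers from many sources.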

Page 19: Open Source Geospatial

Workflow

* Takes more than one tool to fix a Land Rover

So now I want to go through a little mini-application of how all these open-source tools work together, and in one of the workshop sessions we'll do something similar.

When you fix a car, or at least an old car like mine that doesn't just require the mechanic to plug a computer into it, you need a bunch of tools, and you need to know how to use them all.

Page 20: Open Source Geospatial

First look - GIS

So if someone gives you some data points, the first thing you might do is load them up into your favourite mapping package. Here's some cases of food poisoning in southern England. I load them in, I add some background data for context, voila. I might also do a bit of simple styling to see what's going on: here I've coloured males and females separately.

But if I want to do some analysis, well, I better load the data into R...

Page 21: Open Source Geospatial

And I can do that because QGIS and R both support the OGC standards for spatial data. Simple. Here I've reproduced the map from the GIS in R, with the same colouring, but you can see I've also done a histogram of the data, something hard to do in a GIS.

So I can then run my complex statistical analysis on the data...

Page 22: Open Source Geospatial

And my output is a map of the probability of the disease rate going above a certain ratio of the usual rate. You can see there's a hotspot in the middle and a few other high areas. The rest of the region is just noisy low values.
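To make the idea of an exceedance probability concrete, here is a minimal sketch in Python. It is not the analysis behind the map, which isn't spelled out in this talk. It assumes you already have simulated values of the relative risk for each map cell, and the cell names, sample values and threshold are all invented for illustration.

```python
def exceedance_probability(samples, threshold):
    """Fraction of simulated relative-risk values that exceed the threshold."""
    if not samples:
        raise ValueError("need at least one sample")
    return sum(1 for s in samples if s > threshold) / len(samples)


# Hypothetical simulated relative risks (ratio to the usual rate) for two cells.
cell_samples = {
    "hotspot": [1.8, 2.4, 2.1, 1.6, 2.9],
    "background": [0.9, 1.1, 0.8, 1.6, 1.0],
}

# Probability, per cell, that the rate is more than 1.5 times the usual rate.
risk_map = {
    cell: exceedance_probability(samples, threshold=1.5)
    for cell, samples in cell_samples.items()
}
print(risk_map)
```

A cell where nearly every simulated value is above the threshold shows up as a hotspot, while a noisy cell with the odd high value stays low.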

That's great, but maybe we want to get this back into our GIS.

And because everything is supporting OGC standards, that's easy. Export, Import...
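As a small illustration of why a shared standard makes export and import painless, here is a sketch of a round trip through Well-Known Text (WKT), the text encoding from the OGC Simple Features standard. The talk doesn't say which format was actually used for the exchange, so WKT is chosen here only because it is the simplest to show. The coordinates are invented, and `point_to_wkt` and `wkt_to_point` are hypothetical helpers.

```python
import re

# Matches a 2D WKT point such as "POINT (-1.25 51.75)".
_POINT_RE = re.compile(
    r"^\s*POINT\s*\(\s*([^\s()]+)\s+([^\s()]+)\s*\)\s*$", re.IGNORECASE
)


def point_to_wkt(x, y):
    """Export: write an (x, y) pair as a WKT point."""
    return f"POINT ({x} {y})"


def wkt_to_point(text):
    """Import: read a WKT point back into an (x, y) pair of floats."""
    match = _POINT_RE.match(text)
    if match is None:
        raise ValueError(f"not a 2D WKT point: {text!r}")
    return float(match.group(1)), float(match.group(2))


# Hypothetical case locations as (longitude, latitude).
cases = [(-1.25, 51.75), (-0.5, 51.5)]

exported = [point_to_wkt(x, y) for x, y in cases]      # what one program writes
imported = [wkt_to_point(text) for text in exported]   # what another reads back
print(exported)
```

Neither side needs to know anything about the other program, only about the published format.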

Page 23: Open Source Geospatial

Results - in GIS

Here's the output map overlaid on my county data in QGIS, and I can do all the fancy mapping stuff on it here with scales and north arrows and produce pretty pictures for my publication.

But what if I want to disseminate the results publicly and more widely? In this case I want to put it all on the web.

And for that I'm going to use OpenLayers, a web mapping system.

Page 24: Open Source Geospatial

Results - on the web

That lets me create a web page like this, with the cases and the risk map embedded in it. Now here this is the polished, finished system with a calendar for historical data and so on, but a simple interactive map like this can be done in OpenLayers very easily, and we'll do that in the workshop session.

Page 25: Open Source Geospatial

Interactions

And these maps are interactive, with multiple layers and zoom and pan and everything you'd expect from a web map.

Page 26: Open Source Geospatial

Zoom

And there's popup info boxes and permalinks and scale bars. And it's all FREE and Open Source.

There's a lot of 'first-generation' web based maps on the web, and if you ever find a web page with a clunky, slow, poorly designed map on it, you can be fairly sure it's not using OpenLayers.

Now there's another step you could do, and that is to save the data on the web so that it can be read with Google Earth:

Page 27: Open Source Geospatial

Results on Google Earth

http://benwyss.files.wordpress.com/2010/07/screen-shot-3.png

Now Google Earth is lovely, but although it is free in terms of cost it isn't free in terms of freedom, so there's a license agreement and Google could start charging for it. In fact they already do charge for the Enterprise version. So one day you might not have your lovely spinning earth data.

Note this is someone else's data. I haven't quite got round to getting my data into Google Earth format, but we may get round to that later...
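The format Google Earth reads is KML, which is itself a published OGC standard, so writing it doesn't need any Google software. As a minimal sketch, this is roughly what turning a few points into KML could look like in Python using only the standard library. The point names and coordinates are invented, and `points_to_kml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"  # namespace of the OGC KML 2.2 standard


def points_to_kml(points):
    """Write (name, longitude, latitude) tuples as a KML document string."""
    ET.register_namespace("", KML_NS)  # emit KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    document = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lon, lat in points:
        placemark = ET.SubElement(document, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML coordinates are written as longitude,latitude.
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


# Hypothetical case locations.
kml_text = points_to_kml([("case 1", -1.25, 51.75), ("case 2", -0.5, 51.5)])
print(kml_text)
```

Save that string to a file ending in `.kml` and any program that follows the standard should be able to open it.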

Page 28: Open Source Geospatial

Break time