Under the Hood: How Geonames Aggregates Over 35 Sources into One Data Set
![Page 1: Under the Hood: How Geonames Aggregates Over 35 Sources into One Data Set](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b94ae94a7959d3078b45d0/html5/thumbnails/1.jpg)
GeoNames is ... an aggregator of free geo data
I am ... Marc Wick, self-employed software engineer, Switzerland
GeoNames: "Under the Hood: How GeoNames Aggregates many Sources into One Data Set"
![Page 2: Under the Hood: How Geonames Aggregates Over 35 Sources into One Data Set](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b94ae94a7959d3078b45d0/html5/thumbnails/2.jpg)
GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 2
GeoNames Feature Density Map
![Page 3: Under the Hood: How Geonames Aggregates Over 35 Sources into One Data Set](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b94ae94a7959d3078b45d0/html5/thumbnails/3.jpg)
GeoNames - Gazetteer
Pragmatic, useful, easy to use
Over 6.5 million features
CC-BY licence
9 feature classes
![Page 4: Under the Hood: How Geonames Aggregates Over 35 Sources into One Data Set](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b94ae94a7959d3078b45d0/html5/thumbnails/4.jpg)
Screen shot Berlin
![Page 5: Under the Hood: How Geonames Aggregates Over 35 Sources into One Data Set](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b94ae94a7959d3078b45d0/html5/thumbnails/5.jpg)
Origins and Goal
Proprietary application
Team up together
Contribute modifications to a central database
Applications switch to GeoNames from proprietary aggregation
![Page 6: Under the Hood: How Geonames Aggregates Over 35 Sources into One Data Set](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b94ae94a7959d3078b45d0/html5/thumbnails/6.jpg)
Challenge
A lot of data IS available
Many providers
Languages
Scripts
![Page 7: Under the Hood: How Geonames Aggregates Over 35 Sources into One Data Set](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b94ae94a7959d3078b45d0/html5/thumbnails/7.jpg)
GeoNames Ambassadors
GeoNames contact
Speak local language
Know local situation
![Page 8: Under the Hood: How Geonames Aggregates Over 35 Sources into One Data Set](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b94ae94a7959d3078b45d0/html5/thumbnails/8.jpg)
Data Sources
National Mapping Agencies
Statistical Offices
Postal codes
National Geospatial-Intelligence Agency (NGA)
Applications using GeoNames
− Data files
− Manual modifications
![Page 9: Under the Hood: How Geonames Aggregates Over 35 Sources into One Data Set](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b94ae94a7959d3078b45d0/html5/thumbnails/9.jpg)
US vs Europe
US data is freely available
European data is not available
Rest of the World?
Consequences
![Page 10: Under the Hood: How Geonames Aggregates Over 35 Sources into One Data Set](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b94ae94a7959d3078b45d0/html5/thumbnails/10.jpg)
![Page 11: Under the Hood: How Geonames Aggregates Over 35 Sources into One Data Set](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b94ae94a7959d3078b45d0/html5/thumbnails/11.jpg)
Future of geodata availability
We believe basic geodata will be free in most countries
Why:
− Economy
− Traffic Policy and Road Safety (road signs)
![Page 12: Under the Hood: How Geonames Aggregates Over 35 Sources into One Data Set](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b94ae94a7959d3078b45d0/html5/thumbnails/12.jpg)
![Page 13: Under the Hood: How Geonames Aggregates Over 35 Sources into One Data Set](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b94ae94a7959d3078b45d0/html5/thumbnails/13.jpg)
Free Availability is only a First Step
![Page 14: Under the Hood: How Geonames Aggregates Over 35 Sources into One Data Set](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b94ae94a7959d3078b45d0/html5/thumbnails/14.jpg)
Who aggregates data
GeoNames
Supranational mapping agencies
Supranational organisations
INSPIRE
![Page 15: Under the Hood: How Geonames Aggregates Over 35 Sources into One Data Set](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b94ae94a7959d3078b45d0/html5/thumbnails/15.jpg)
Problems and Solutions I
Shape / GML
Datum reprojection

FWTools / GDAL/OGR
PostGIS / EPSG / native tools / custom implementation
![Page 16: Under the Hood: How Geonames Aggregates Over 35 Sources into One Data Set](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b94ae94a7959d3078b45d0/html5/thumbnails/16.jpg)
Problems and Solutions II
FeatureCodes not 1:1
Non-ASCII
Country codes
Admin1 codes

Pattern matching
Transliteration
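
As a rough illustration of the transliteration step, the sketch below folds accented Latin names to plain ASCII using only the Python standard library. The function name `ascii_fold` is hypothetical, and this is not necessarily how GeoNames does it: the approach only strips combining marks, so letters without a decomposition (such as ß or ø) and non-Latin scripts would need a proper transliteration table.

```python
import unicodedata


def ascii_fold(name: str) -> str:
    """Strip diacritics by decomposing characters and dropping combining marks.

    Illustrative helper (hypothetical name). Characters that have no
    decomposition, or that belong to non-Latin scripts, pass through unchanged.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(ascii_fold("Zürich"))  # Zurich
    print(ascii_fold("Genève"))  # Geneve
```

A folded name like this can then serve as a matching key next to the original spelling, which stays untouched.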
![Page 17: Under the Hood: How Geonames Aggregates Over 35 Sources into One Data Set](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b94ae94a7959d3078b45d0/html5/thumbnails/17.jpg)
Place name matching
Geocoding
Distance
Feature type and feature code
Reverse geocoding, compare name similarity
− Levenshtein distance
− Letter pair similarity
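
The matching criteria above can be sketched in a few lines of Python. This is a minimal illustration, not the GeoNames implementation: the record layout, the function `same_place`, and both thresholds (5 km, 0.6 similarity) are assumptions chosen for the example. Letter pair similarity is implemented here as the Dice coefficient over adjacent character pairs.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def letter_pairs(text: str) -> list:
    """Adjacent character pairs of each word, upper-cased."""
    pairs = []
    for word in text.upper().split():
        pairs.extend(word[i:i + 2] for i in range(len(word) - 1))
    return pairs


def letter_pair_similarity(a: str, b: str) -> float:
    """Dice coefficient over letter pairs: 2 * shared / total, in [0, 1]."""
    pairs_a, pairs_b = letter_pairs(a), letter_pairs(b)
    total = len(pairs_a) + len(pairs_b)
    if total == 0:
        return 0.0
    shared = 0
    for pair in pairs_a:
        if pair in pairs_b:
            pairs_b.remove(pair)  # each pair may only be matched once
            shared += 1
    return 2.0 * shared / total


def haversine_km(lat1, lon1, lat2, lon2) -> float:
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    h = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def same_place(a: dict, b: dict, max_km: float = 5.0, min_similarity: float = 0.6) -> bool:
    """Hypothetical rule: same feature class, close by, and similar name."""
    return (a["feature_class"] == b["feature_class"]
            and haversine_km(a["lat"], a["lon"], b["lat"], b["lon"]) <= max_km
            and letter_pair_similarity(a["name"], b["name"]) >= min_similarity)


if __name__ == "__main__":
    print(levenshtein("Berlin", "Berlino"))            # 1
    print(letter_pair_similarity("France", "French"))  # 0.4
```

Edit distance penalises every differing character, while letter pair similarity tolerates reordered words, so the two measures tend to complement each other when comparing place names from different providers.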
![Page 18: Under the Hood: How Geonames Aggregates Over 35 Sources into One Data Set](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b94ae94a7959d3078b45d0/html5/thumbnails/18.jpg)
![Page 19: Under the Hood: How Geonames Aggregates Over 35 Sources into One Data Set](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b94ae94a7959d3078b45d0/html5/thumbnails/19.jpg)
Wikipedia GeoTemplates
Proliferation of GeoFormats
No consensus, anarchy
Examples
− <geo>48 46 36 N 121 48 51 W</geo>
− {{coor d|48.7767|N|121.8142|W|}}
− Berlin : |lat_deg = 52|lat_min = 31
− ... (Any template you could possibly think of is used somewhere)
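
To make the format problem concrete, here is a small sketch that normalises the first two example notations to signed decimal degrees. The regular expressions and the function `parse_coordinates` are assumptions written for exactly these two samples. The infobox style (`lat_deg`, `lat_min`, ...) is left out because the slide only shows part of it, and a real extractor would need many more patterns.

```python
import re

# <geo>48 46 36 N 121 48 51 W</geo>: degrees, minutes, seconds, hemisphere
GEO_TAG = re.compile(
    r"<geo>\s*(\d+)\s+(\d+)\s+(\d+)\s+([NS])\s+(\d+)\s+(\d+)\s+(\d+)\s+([EW])\s*</geo>"
)
# {{coor d|48.7767|N|121.8142|W|}}: decimal degrees plus hemisphere
COOR_D = re.compile(r"\{\{coor d\|([\d.]+)\|([NS])\|([\d.]+)\|([EW])\|?[^}]*\}\}")


def signed(value: float, hemisphere: str) -> float:
    """South and west become negative, north and east stay positive."""
    return -value if hemisphere in ("S", "W") else value


def parse_coordinates(wikitext: str):
    """Return (lat, lon) in decimal degrees, or None if no known pattern matches."""
    match = GEO_TAG.search(wikitext)
    if match:
        d1, m1, s1, h1, d2, m2, s2, h2 = match.groups()
        lat = int(d1) + int(m1) / 60 + int(s1) / 3600
        lon = int(d2) + int(m2) / 60 + int(s2) / 3600
        return signed(lat, h1), signed(lon, h2)
    match = COOR_D.search(wikitext)
    if match:
        lat, h1, lon, h2 = match.groups()
        return signed(float(lat), h1), signed(float(lon), h2)
    return None


if __name__ == "__main__":
    print(parse_coordinates("<geo>48 46 36 N 121 48 51 W</geo>"))
    print(parse_coordinates("{{coor d|48.7767|N|121.8142|W|}}"))
```

Both sample inputs describe the same point, so the two results should agree to about four decimal places.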
![Page 20: Under the Hood: How Geonames Aggregates Over 35 Sources into One Data Set](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b94ae94a7959d3078b45d0/html5/thumbnails/20.jpg)
Alternate Names
...
Italian: Berlino
English: Berlin
Arabic: برلين
Korean: 베를린
Thai: เบอร์ลิน
Russian: Берлин
Chinese: 柏林
Marathi: बर्लिन
... (ca. 100 names)
![Page 21: Under the Hood: How Geonames Aggregates Over 35 Sources into One Data Set](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b94ae94a7959d3078b45d0/html5/thumbnails/21.jpg)
Postal codes
Geocode – postal code numeric distance
Accuracy, completeness
ScribbleMaps by Robert Kosara
![Page 22: Under the Hood: How Geonames Aggregates Over 35 Sources into One Data Set](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b94ae94a7959d3078b45d0/html5/thumbnails/22.jpg)
![Page 23: Under the Hood: How Geonames Aggregates Over 35 Sources into One Data Set](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b94ae94a7959d3078b45d0/html5/thumbnails/23.jpg)
![Page 24: Under the Hood: How Geonames Aggregates Over 35 Sources into One Data Set](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b94ae94a7959d3078b45d0/html5/thumbnails/24.jpg)
Data Dump
Flat CSV files
Simple format
Ease of use
Full daily dump
Daily modifications
RDF
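
Reading such a flat file takes only a few lines. The sketch below assumes tab-separated columns and a reduced, hypothetical column order (id, name, latitude, longitude, feature class, feature code, country code); the real dump has more columns, so check the readme shipped with the files before relying on any position. The two sample rows are made up for illustration.

```python
import csv
import io
from typing import Iterable, Iterator, NamedTuple


class Feature(NamedTuple):
    geoname_id: int
    name: str
    latitude: float
    longitude: float
    feature_class: str
    feature_code: str
    country_code: str


def read_features(lines: Iterable[str]) -> Iterator[Feature]:
    """Yield one Feature per row; column order is an assumption of this sketch."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 7:
            continue  # skip blank or truncated rows
        yield Feature(int(row[0]), row[1], float(row[2]), float(row[3]),
                      row[4], row[5], row[6])


# Illustrative sample data, not copied from a real dump.
SAMPLE = (
    "1\tBerlin\t52.52\t13.41\tP\tPPLC\tDE\n"
    "2\tSpree\t52.53\t13.20\tH\tSTM\tDE\n"
)

if __name__ == "__main__":
    for feature in read_features(io.StringIO(SAMPLE)):
        print(feature.name, feature.feature_class, feature.feature_code)
```

Because the function takes any iterable of lines, the same code works on an open file handle, which keeps memory use flat even for a dump with millions of rows.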
![Page 25: Under the Hood: How Geonames Aggregates Over 35 Sources into One Data Set](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b94ae94a7959d3078b45d0/html5/thumbnails/25.jpg)
Web Services
Search
− Ranking
  TF-IDF
  Relevancy
− I18n
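
For readers unfamiliar with TF-IDF, the following sketch scores a handful of place names against a query. It is a textbook variant (term frequency times a smoothed inverse document frequency) with whitespace tokenisation, offered as an assumption-laden illustration. It is not the formula used by Lucene or by the GeoNames search service, and the sample names are arbitrary.

```python
import math
from collections import Counter


def tf_idf_scores(query: str, documents: list) -> list:
    """Return one relevance score per document for the given query."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term_freq[term]:
                idf = math.log(n_docs / doc_freq[term]) + 1.0
                score += (term_freq[term] / len(tokens)) * idf
        scores.append(score)
    return scores


if __name__ == "__main__":
    names = ["Berlin", "Berlin Township", "New Berlin", "Paris"]
    ranked = sorted(zip(tf_idf_scores("berlin", names), names), reverse=True)
    for score, name in ranked:
        print(f"{score:.3f}  {name}")
```

With this weighting the exact name "Berlin" outranks longer names that merely contain the term, because the term makes up a larger share of the shorter document.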
![Page 26: Under the Hood: How Geonames Aggregates Over 35 Sources into One Data Set](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b94ae94a7959d3078b45d0/html5/thumbnails/26.jpg)
![Page 27: Under the Hood: How Geonames Aggregates Over 35 Sources into One Data Set](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b94ae94a7959d3078b45d0/html5/thumbnails/27.jpg)
Hierarchy Web Services
Hierarchy
Child
Neighbour
Sibling
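
The four relations can be pictured with a toy in-memory model. The sketch below is only a guess at the semantics the service names suggest (ancestors, direct children, adjacent features, features sharing a parent). The `PARENT` and `BORDERS` tables and all function names are hypothetical and do not call the actual web services.

```python
# Toy data: each feature points at its parent; None marks the root.
PARENT = {
    "Earth": None,
    "Europe": "Earth",
    "Germany": "Europe",
    "Switzerland": "Europe",
    "Berlin": "Germany",
}

# Toy adjacency table for the neighbour relation.
BORDERS = {
    "Germany": {"Switzerland"},
    "Switzerland": {"Germany"},
}


def hierarchy(name: str) -> list:
    """All ancestors from the root down to the feature itself."""
    chain = []
    current = name
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain[::-1]


def children(name: str) -> list:
    """Features whose direct parent is the given feature."""
    return sorted(child for child, parent in PARENT.items() if parent == name)


def siblings(name: str) -> list:
    """Other features that share the same direct parent."""
    parent = PARENT[name]
    return sorted(other for other, p in PARENT.items()
                  if p == parent and other != name)


def neighbours(name: str) -> list:
    """Features adjacent to the given feature, per the toy border table."""
    return sorted(BORDERS.get(name, set()))


if __name__ == "__main__":
    print(hierarchy("Berlin"))    # ['Earth', 'Europe', 'Germany', 'Berlin']
    print(children("Europe"))     # ['Germany', 'Switzerland']
    print(siblings("Germany"))    # ['Switzerland']
    print(neighbours("Germany"))  # ['Switzerland']
```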
![Page 28: Under the Hood: How Geonames Aggregates Over 35 Sources into One Data Set](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b94ae94a7959d3078b45d0/html5/thumbnails/28.jpg)
GTOPO30
SRTM3
JDBC
Database: Postgres (PostGIS)
Lucene: full-text index, TF-IDF
Tomcat (Java)
Apache, mod_rewrite
JSON, jdom.org (XML), ROME (RSS)
JMS: ActiveMQ
![Page 29: Under the Hood: How Geonames Aggregates Over 35 Sources into One Data Set](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b94ae94a7959d3078b45d0/html5/thumbnails/29.jpg)
Libraries
Java
Drupal
Ruby
PHP
Perl
Python
Lisp
![Page 30: Under the Hood: How Geonames Aggregates Over 35 Sources into One Data Set](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b94ae94a7959d3078b45d0/html5/thumbnails/30.jpg)
Synchronization
Daily dump
Daily modifications
JMS
RDF dump, periodically
![Page 31: Under the Hood: How Geonames Aggregates Over 35 Sources into One Data Set](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b94ae94a7959d3078b45d0/html5/thumbnails/31.jpg)
Linked Data
![Page 32: Under the Hood: How Geonames Aggregates Over 35 Sources into One Data Set](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b94ae94a7959d3078b45d0/html5/thumbnails/32.jpg)
Applications using GeoNames
Thousands of applications
Search
Site navigation
Geocoding
![Page 33: Under the Hood: How Geonames Aggregates Over 35 Sources into One Data Set](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b94ae94a7959d3078b45d0/html5/thumbnails/33.jpg)
![Page 34: Under the Hood: How Geonames Aggregates Over 35 Sources into One Data Set](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b94ae94a7959d3078b45d0/html5/thumbnails/34.jpg)
![Page 35: Under the Hood: How Geonames Aggregates Over 35 Sources into One Data Set](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b94ae94a7959d3078b45d0/html5/thumbnails/35.jpg)
![Page 36: Under the Hood: How Geonames Aggregates Over 35 Sources into One Data Set](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b94ae94a7959d3078b45d0/html5/thumbnails/36.jpg)
Thank you for your attention.