Geographic Information Retrieval From Disparate Data Sources
![Page 1: Geographic Information Retrieval From Disparate Data Sources](https://reader036.fdocuments.net/reader036/viewer/2022062405/554e8c22b4c90526358b4afc/html5/thumbnails/1.jpg)
Geographic Information Retrieval from Disparate Data Sources
Ian Turton, Anuj Jaiswal, Mark Gahegan
GeoVISTA Center, School of Geography, Pennsylvania State University
ijt1,arj135,[email protected]
![Page 2: Geographic Information Retrieval From Disparate Data Sources](https://reader036.fdocuments.net/reader036/viewer/2022062405/554e8c22b4c90526358b4afc/html5/thumbnails/2.jpg)
Summary
- Information Retrieval?
- Geographic?
- Disparate Data Sources?
- Does it work?
- Semantics and Ontologies, do they help?
- Further work?
- Conclusions
![Page 3: Geographic Information Retrieval From Disparate Data Sources](https://reader036.fdocuments.net/reader036/viewer/2022062405/554e8c22b4c90526358b4afc/html5/thumbnails/3.jpg)
Information Retrieval
Information retrieval (IR) is the science of searching for information in documents, searching for documents themselves, searching for metadata which describe documents, or searching within databases, whether relational stand-alone databases or hypertextually-networked databases such as the World Wide Web.
Wikipedia
![Page 4: Geographic Information Retrieval From Disparate Data Sources](https://reader036.fdocuments.net/reader036/viewer/2022062405/554e8c22b4c90526358b4afc/html5/thumbnails/4.jpg)
Or, more simply:
Is there some way I can avoid reading all 19,000 of those articles about measles and still sound like I know what I’m talking about at the next conference?
![Page 5: Geographic Information Retrieval From Disparate Data Sources](https://reader036.fdocuments.net/reader036/viewer/2022062405/554e8c22b4c90526358b4afc/html5/thumbnails/5.jpg)
Geography
Well, we all know that geography is important. Depending on who you ask, more than 80% of all information contains a geographic element.
Explicit: Has a map coordinate
Implicit: Has a place name
![Page 6: Geographic Information Retrieval From Disparate Data Sources](https://reader036.fdocuments.net/reader036/viewer/2022062405/554e8c22b4c90526358b4afc/html5/thumbnails/6.jpg)
Disparate Data Sources
Large collections of text containing implicit geographic references about Avian Flu and Measles:
- PubMed abstracts
- News Feeds (RSS)
- WHO incident reports
![Page 7: Geographic Information Retrieval From Disparate Data Sources](https://reader036.fdocuments.net/reader036/viewer/2022062405/554e8c22b4c90526358b4afc/html5/thumbnails/7.jpg)
Building the System
- Acquire data
- Extract geographic information
- Extract semantic and ontological information
- Present in a form that allows easy exploration by users.
![Page 8: Geographic Information Retrieval From Disparate Data Sources](https://reader036.fdocuments.net/reader036/viewer/2022062405/554e8c22b4c90526358b4afc/html5/thumbnails/8.jpg)
Acquire Data
- First extract abstracts from PubMed using http://eutils.ncbi.nlm.nih.gov/entrez/eutils/
- Query: ((avian OR bird) AND (influenza OR flu)) OR H5N1
- Returns a structured XML file with citation data and abstract for selected papers.
- Process the XML into a PostGIS database.
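The acquisition step can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes the standard E-utilities `esearch.fcgi` and `efetch.fcgi` endpoints under the base URL given on the slide, and the helper names (`esearch_url`, `efetch_url`, `parse_articles`) and the sample XML are hypothetical. No network call is made; the parser runs on a placeholder document shaped like a PubMed `PubmedArticleSet`.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Base URL and query string are taken from the slide.
EUTILS_BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/"
QUERY = "((avian OR bird) AND (influenza OR flu)) OR H5N1"


def esearch_url(term, retmax=100):
    """Build a URL asking PubMed for the IDs of articles matching the query."""
    params = {"db": "pubmed", "term": term, "retmax": retmax}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


def efetch_url(pmids):
    """Build a URL returning citation data and abstracts as XML for the IDs."""
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"}
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)


def parse_articles(xml_text):
    """Extract (pmid, title, abstract) tuples from a PubmedArticleSet document."""
    rows = []
    for art in ET.fromstring(xml_text).iter("PubmedArticle"):
        pmid = art.findtext("MedlineCitation/PMID")
        title = art.findtext("MedlineCitation/Article/ArticleTitle")
        parts = art.findall("MedlineCitation/Article/Abstract/AbstractText")
        abstract = " ".join("".join(p.itertext()).strip() for p in parts)
        rows.append((pmid, title, abstract))
    return rows


# Placeholder record (not a real citation), used only to exercise the parser.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>0000001</PMID>
      <Article>
        <ArticleTitle>Placeholder title</ArticleTitle>
        <Abstract>
          <AbstractText>Placeholder abstract mentioning Vietnam.</AbstractText>
        </Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""

if __name__ == "__main__":
    print(esearch_url(QUERY))
    print(parse_articles(SAMPLE_XML))
```

The rows returned by `parse_articles` are what would then be loaded into the database, keyed by article id.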
![Page 9: Geographic Information Retrieval From Disparate Data Sources](https://reader036.fdocuments.net/reader036/viewer/2022062405/554e8c22b4c90526358b4afc/html5/thumbnails/9.jpg)
Extract Geographic Entities
Use FactXtractor (http://julian.mine.nu/snedemo.html)
Uses GATE to detect and extract Named Entities and Entity Relationships
Usually finds People, Places and Organizations
Returned as an OWL-encoded ontology
In this case we just make use of places
![Page 10: Geographic Information Retrieval From Disparate Data Sources](https://reader036.fdocuments.net/reader036/viewer/2022062405/554e8c22b4c90526358b4afc/html5/thumbnails/10.jpg)
<rdf:RDF xml:base="http://ist.psu.edu/sna/ontology#">
  <owl:Class rdf:ID="Location"/>
  <owl:Class rdf:ID="Organization"/>
  <owl:Class rdf:ID="Person"/>
  <owl:DatatypeProperty rdf:ID="counts"/>
  <Location rdf:ID="Africa">
    <counts>1</counts>
    <mentioned_in>
      <_Article rdf:ID="InputString0">
      </_Article>
    </mentioned_in>
  </Location>
  <Location rdf:ID="Asia">
    <counts>1</counts>
    <mentioned_in rdf:resource="#InputString0"/>
  </Location>
  <Location rdf:ID="Vietnam"/>
  <Location rdf:ID="South_East"/>
  <Location rdf:ID="Europe">
    <counts>1</counts>
    <mentioned_in rdf:resource="#InputString0"/>
  </Location>
</rdf:RDF>
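Pulling just the places out of output like this might look as follows. This is a sketch under assumptions: the slide omits the namespace declarations, so the sample below adds the standard RDF and OWL namespaces and assumes the default namespace equals the `xml:base` value shown. The function name `extract_locations` is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL_NS = "http://www.w3.org/2002/07/owl#"
# Assumption: the default namespace matches the xml:base on the slide.
SNA_NS = "http://ist.psu.edu/sna/ontology#"

# Shortened version of the slide's output, with namespace declarations added.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:owl="{OWL_NS}" xmlns="{SNA_NS}"
         xml:base="{SNA_NS}">
  <owl:Class rdf:ID="Location"/>
  <Location rdf:ID="Africa"><counts>1</counts></Location>
  <Location rdf:ID="Asia"><counts>1</counts></Location>
  <Location rdf:ID="Vietnam"/>
</rdf:RDF>"""


def extract_locations(rdf_text):
    """Map each Location's rdf:ID to its count (None when no count is given)."""
    root = ET.fromstring(rdf_text)
    places = {}
    for loc in root.iter(f"{{{SNA_NS}}}Location"):
        name = loc.get(f"{{{RDF_NS}}}ID")
        counts = loc.findtext(f"{{{SNA_NS}}}counts")
        places[name] = int(counts) if counts is not None else None
    return places


if __name__ == "__main__":
    print(extract_locations(SAMPLE))
```

The `owl:Class` declaration named "Location" is skipped because it lives in the OWL namespace; only the `Location` individuals are returned.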
![Page 11: Geographic Information Retrieval From Disparate Data Sources](https://reader036.fdocuments.net/reader036/viewer/2022062405/554e8c22b4c90526358b4afc/html5/thumbnails/11.jpg)
GeoLocation
Converting a place name into a location
- State College, PA -> (40.7934, -77.86)
- Call the GeoNames web service to carry out a gazetteer lookup on the name.
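A gazetteer lookup of this kind could be sketched as below. It assumes the public GeoNames `searchJSON` endpoint with its `q`, `maxRows` and `username` parameters; the slides do not say which endpoint or parameters the authors actually used. The helper names are hypothetical, and the sample response is hand-written around the coordinates on the slide, so no network call is made.

```python
import json
from urllib.parse import urlencode


def geonames_search_url(place, username, max_rows=5):
    """Build a GeoNames search URL for a place name (account name required)."""
    params = {"q": place, "maxRows": max_rows, "username": username}
    return "http://api.geonames.org/searchJSON?" + urlencode(params)


def first_location(response_text):
    """Return (lat, lng) of the top hit, or None if nothing matched."""
    hits = json.loads(response_text).get("geonames", [])
    if not hits:
        return None
    top = hits[0]
    return (float(top["lat"]), float(top["lng"]))


# Hand-written response in the shape GeoNames returns, using the slide's values.
SAMPLE_RESPONSE = json.dumps(
    {"geonames": [{"name": "State College", "lat": "40.7934", "lng": "-77.86"}]}
)

if __name__ == "__main__":
    print(geonames_search_url("State College, PA", "demo"))
    print(first_location(SAMPLE_RESPONSE))
```

Taking the first hit is the simplest possible policy; a name with several candidate hits needs the disambiguation step.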
![Page 12: Geographic Information Retrieval From Disparate Data Sources](https://reader036.fdocuments.net/reader036/viewer/2022062405/554e8c22b4c90526358b4afc/html5/thumbnails/12.jpg)
Disambiguation
Which London did you mean?
![Page 13: Geographic Information Retrieval From Disparate Data Sources](https://reader036.fdocuments.net/reader036/viewer/2022062405/554e8c22b4c90526358b4afc/html5/thumbnails/13.jpg)
Types of Ambiguity
Geo/Geo
- London, UK vs London, Ontario
- South Wales, UK vs New South Wales, Au
- Paris, France vs Paris, Texas

Geo/Non Geo
- Washington, DC vs George Washington
- Van, Turkey vs delivery van
- West Nile, Egypt vs West Nile Virus

Sort of Ambiguous
- avian A/Mallard/Pennsylvania/10218/84 (H5N2) influenza virus strains
![Page 14: Geographic Information Retrieval From Disparate Data Sources](https://reader036.fdocuments.net/reader036/viewer/2022062405/554e8c22b4c90526358b4afc/html5/thumbnails/14.jpg)
Disambiguating Multiple Places
1. Choose A if A is a Political Entity and B is not,
2. Choose B if B is a Political Entity and A is not,
3. Choose A if A is a Region and B is not,
4. Choose B if B is a Region and A is not,
5. Choose A if A is an Ocean and B is not,
6. Choose B if B is an Ocean and A is not,
7. Choose A if A is a Populated Place and B is not,
8. Choose B if B is a Populated Place and A is not,
9. Choose A if A's population is greater than B's,
10. Choose B if B's population is greater than A's,
11. Choose A if A is an Administrative Area and B is not,
12. Choose B if B is an Administrative Area and A is not,
13. Choose A if A is a Water Feature and B is not,
14. Choose B if B is a Water Feature and A is not,
15. Choose A.
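The rule cascade above can be written as a short pairwise comparison. This is a sketch, not the authors' implementation: the `Candidate` class, the kind labels ("political", "region" and so on) and the population figures in the usage example are all hypothetical stand-ins for whatever feature types and values the gazetteer returns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    kinds: frozenset = frozenset()  # hypothetical labels, e.g. {"populated"}
    population: int = 0


# Feature-type rules checked before and after the population comparison,
# in the order given on the slide.
KINDS_BEFORE_POPULATION = ("political", "region", "ocean", "populated")
KINDS_AFTER_POPULATION = ("admin", "water")


def _prefer_kind(a, b, kind):
    """Return whichever candidate alone has this kind, else None."""
    in_a, in_b = kind in a.kinds, kind in b.kinds
    if in_a and not in_b:
        return a
    if in_b and not in_a:
        return b
    return None


def choose(a, b):
    """Pick between two candidate places using the slide's rule order."""
    for kind in KINDS_BEFORE_POPULATION:
        winner = _prefer_kind(a, b, kind)
        if winner is not None:
            return winner
    if a.population != b.population:
        return a if a.population > b.population else b
    for kind in KINDS_AFTER_POPULATION:
        winner = _prefer_kind(a, b, kind)
        if winner is not None:
            return winner
    return a  # final rule: default to A


if __name__ == "__main__":
    # Illustrative populations only, not real figures.
    london_uk = Candidate("London, UK", frozenset({"populated"}), 2000)
    london_on = Candidate("London, Ontario", frozenset({"populated"}), 1000)
    print(choose(london_on, london_uk).name)
```

With more than two candidates, the same function can be folded over the list, for example with `functools.reduce(choose, candidates)`.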
![Page 15: Geographic Information Retrieval From Disparate Data Sources](https://reader036.fdocuments.net/reader036/viewer/2022062405/554e8c22b4c90526358b4afc/html5/thumbnails/15.jpg)
Solving Geo/Non Geo Ambiguity
Stop word lists – hand crafted by experience
Province, valley, way, hill, Children, Children's, new, cross, red, clinic, general, côte, ii, iii, bas, pays, chem, northern region, eastern region, central region, southern region, region, off, square, census, islands, city, district, park, USA, State, Virology, Microbiology, Immunology, Medical, Science, Employee, Surveillance, Disease, Biochemistry, Prevention, for, and, mail, natl, dept, dev, agr, Rural, inst, mil, med, coll, Internal, Publ, Bur, Hosp, Jude, Childrens, Chai, yan, Virol, Dis, Div, Enter, Cent, lab, Univ, res, ist, prevent, roc, prod, Roche, vet, castle, peak, stat, garden, Atl, Anim, mar, queen, central, Director, LAT, AC-EIA, register, north, east, south, west, northern, southern, eastern, western
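Applying such a list is straightforward; a minimal sketch follows. The set below is only an excerpt of the slide's list, and matching case-insensitively on the whole candidate name is an assumption, since the slides do not say how the comparison was done. The function name `drop_stop_words` is hypothetical.

```python
# Excerpt of the hand-crafted list from the slide, lower-cased for matching.
STOP_WORDS = {
    w.lower()
    for w in [
        "Province", "valley", "way", "hill", "Children", "Children's",
        "new", "cross", "red", "clinic", "general", "northern region",
        "Virology", "Hosp", "Univ", "lab", "north", "east", "south", "west",
    ]
}


def drop_stop_words(candidate_names):
    """Keep only candidate place names that are not on the stop word list."""
    return [
        name for name in candidate_names
        if name.strip().lower() not in STOP_WORDS
    ]


if __name__ == "__main__":
    print(drop_stop_words(["Vietnam", "Children", "Hosp", "London", "NORTH"]))
```

Filtering before the gazetteer lookup also saves a web service call for every rejected candidate.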
![Page 16: Geographic Information Retrieval From Disparate Data Sources](https://reader036.fdocuments.net/reader036/viewer/2022062405/554e8c22b4c90526358b4afc/html5/thumbnails/16.jpg)
Concept Extraction
Automatically extract keywords or tags from article abstracts by:
- Selecting keywords which exceed a preset frequency.
- Passing text through the Yahoo! tagging service, which returns key phrases using latent semantic indexing.
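The frequency-based option can be sketched in a few lines. The tokenizer, the small list of common words to ignore and the threshold value are all assumptions made for illustration; the slides do not give the preset frequency or say whether it was applied per abstract or across the collection.

```python
import re
from collections import Counter

# Assumption: a few very common words are ignored before counting.
COMMON_WORDS = {"the", "of", "and", "in", "a", "to", "is", "was", "for", "with"}


def frequent_keywords(text, threshold=1):
    """Return words whose count in the text exceeds the preset threshold."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(w for w in words if w not in COMMON_WORDS)
    return {word: n for word, n in counts.items() if n > threshold}


if __name__ == "__main__":
    abstract = (
        "Avian influenza in poultry. Influenza outbreaks in poultry "
        "and wild birds. Influenza surveillance."
    )
    print(frequent_keywords(abstract))
```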
![Page 17: Geographic Information Retrieval From Disparate Data Sources](https://reader036.fdocuments.net/reader036/viewer/2022062405/554e8c22b4c90526358b4afc/html5/thumbnails/17.jpg)
Store everything in a big database
Open up PostGIS and stuff in all the data, keyed by article id.
- Article: citation data – authors, title, abstract, journal, volume, issue, etc.
- Places: name, country, latitude, longitude, etc.
- Concepts: key phrase or word
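The three-table layout might look something like the sketch below. It uses Python's built-in `sqlite3` purely as a stand-in so the example runs anywhere; the real system used PostGIS, where the coordinates would more naturally be a geometry column. Table names, column names and the `articles_for` helper are hypothetical.

```python
import sqlite3

# Stand-in schema: one row per article, with places and concepts keyed by
# article id as described on the slide.
SCHEMA = """
CREATE TABLE articles (
    article_id INTEGER PRIMARY KEY,
    authors TEXT, title TEXT, abstract TEXT,
    journal TEXT, volume TEXT, issue TEXT
);
CREATE TABLE places (
    article_id INTEGER NOT NULL REFERENCES articles(article_id),
    name TEXT, country TEXT, latitude REAL, longitude REAL
);
CREATE TABLE concepts (
    article_id INTEGER NOT NULL REFERENCES articles(article_id),
    phrase TEXT
);
"""


def build_db():
    """Create an empty in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def articles_for(conn, place, phrase):
    """Titles of articles that mention the place and carry the key phrase."""
    sql = """
        SELECT DISTINCT a.title
        FROM articles a
        JOIN places p ON p.article_id = a.article_id
        JOIN concepts c ON c.article_id = a.article_id
        WHERE p.name = ? AND c.phrase = ?
        ORDER BY a.title
    """
    return [row[0] for row in conn.execute(sql, (place, phrase))]


if __name__ == "__main__":
    conn = build_db()
    conn.execute("INSERT INTO articles (article_id, title) VALUES (1, 'Placeholder A')")
    conn.execute("INSERT INTO places (article_id, name) VALUES (1, 'Vietnam')")
    conn.execute("INSERT INTO concepts (article_id, phrase) VALUES (1, 'influenza')")
    print(articles_for(conn, "Vietnam", "influenza"))
```

Keying both side tables by article id is what lets a tag selection be limited by place with a single join.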
![Page 18: Geographic Information Retrieval From Disparate Data Sources](https://reader036.fdocuments.net/reader036/viewer/2022062405/554e8c22b4c90526358b4afc/html5/thumbnails/18.jpg)
Provide Intuitive Front End for Users
Tag Cloud
Popularized on many web 2.0 sites such as Flickr, del.icio.us, citeUlike.org etc.
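A tag cloud draws frequent tags larger than rare ones. One common way to do that is sketched below; the linear scaling and the pixel range are assumptions, since the slides do not describe how the clouds were sized, and `tag_sizes` is a hypothetical helper.

```python
def tag_sizes(counts, min_px=10, max_px=32):
    """Map each tag's count to a font size, scaled linearly between bounds."""
    if not counts:
        return {}
    lowest, highest = min(counts.values()), max(counts.values())
    span = highest - lowest
    sizes = {}
    for tag, n in counts.items():
        # When every tag has the same count, draw them all at the minimum size.
        scale = (n - lowest) / span if span else 0.0
        sizes[tag] = round(min_px + scale * (max_px - min_px))
    return sizes


if __name__ == "__main__":
    print(tag_sizes({"influenza": 30, "poultry": 10, "vaccine": 20}))
```

The same function would serve the place and author clouds, given counts of articles per place or per author.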
![Page 19: Geographic Information Retrieval From Disparate Data Sources](https://reader036.fdocuments.net/reader036/viewer/2022062405/554e8c22b4c90526358b4afc/html5/thumbnails/19.jpg)
Place Cloud
![Page 20: Geographic Information Retrieval From Disparate Data Sources](https://reader036.fdocuments.net/reader036/viewer/2022062405/554e8c22b4c90526358b4afc/html5/thumbnails/20.jpg)
Author Cloud
![Page 21: Geographic Information Retrieval From Disparate Data Sources](https://reader036.fdocuments.net/reader036/viewer/2022062405/554e8c22b4c90526358b4afc/html5/thumbnails/21.jpg)
Choose a tag
![Page 22: Geographic Information Retrieval From Disparate Data Sources](https://reader036.fdocuments.net/reader036/viewer/2022062405/554e8c22b4c90526358b4afc/html5/thumbnails/22.jpg)
Choose a place
![Page 23: Geographic Information Retrieval From Disparate Data Sources](https://reader036.fdocuments.net/reader036/viewer/2022062405/554e8c22b4c90526358b4afc/html5/thumbnails/23.jpg)
Select a child of the place
![Page 24: Geographic Information Retrieval From Disparate Data Sources](https://reader036.fdocuments.net/reader036/viewer/2022062405/554e8c22b4c90526358b4afc/html5/thumbnails/24.jpg)
Tag limited by place
![Page 25: Geographic Information Retrieval From Disparate Data Sources](https://reader036.fdocuments.net/reader036/viewer/2022062405/554e8c22b4c90526358b4afc/html5/thumbnails/25.jpg)
Implementation
Initially implemented as a Java servlet using a JDBC link to PostGIS.
Reimplemented in Ruby on Rails in the last week, using ActiveRecord to talk to PostGIS.
In-page mapping uses an OpenLayers WMS map client talking to GeoServer over PostGIS.
![Page 26: Geographic Information Retrieval From Disparate Data Sources](https://reader036.fdocuments.net/reader036/viewer/2022062405/554e8c22b4c90526358b4afc/html5/thumbnails/26.jpg)
Semantics and Ontologies
The geographic ontology is provided by the GeoNames semantic web service.
A query allows the lookup of parent, child and nearby features for most features.
Results are cached in the PostGIS database to save processing time and load on the server.
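The caching idea can be sketched as a thin wrapper around the web service call. This is an illustration under assumptions: `sqlite3` stands in for the PostGIS cache table, the GeoNames `childrenJSON` endpoint is used as the example query, and `CachedLookup`, `children_url` and the injected `fetch` callable are hypothetical names. The fetch function is passed in so the sketch can be run without network access.

```python
import json
import sqlite3
from urllib.parse import urlencode


def children_url(geoname_id, username):
    """Build a GeoNames URL asking for the children of a feature."""
    params = {"geonameId": geoname_id, "username": username}
    return "http://api.geonames.org/childrenJSON?" + urlencode(params)


class CachedLookup:
    """Fetch a URL once, then answer repeat requests from a local table."""

    def __init__(self, fetch):
        self.fetch = fetch  # callable: url -> response text
        self.db = sqlite3.connect(":memory:")  # stand-in for PostGIS
        self.db.execute("CREATE TABLE cache (url TEXT PRIMARY KEY, body TEXT)")

    def get(self, url):
        row = self.db.execute(
            "SELECT body FROM cache WHERE url = ?", (url,)
        ).fetchone()
        if row is None:
            # Cache miss: call the service and remember the answer.
            body = self.fetch(url)
            self.db.execute(
                "INSERT INTO cache (url, body) VALUES (?, ?)", (url, body)
            )
        else:
            body = row[0]
        return json.loads(body)


if __name__ == "__main__":
    calls = []

    def fake_fetch(url):
        calls.append(url)
        return '{"geonames": []}'

    lookup = CachedLookup(fake_fetch)
    url = children_url(1, "demo")
    lookup.get(url)
    lookup.get(url)
    print(len(calls))  # the service is only called for the first request
```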
![Page 27: Geographic Information Retrieval From Disparate Data Sources](https://reader036.fdocuments.net/reader036/viewer/2022062405/554e8c22b4c90526358b4afc/html5/thumbnails/27.jpg)
WordNet Ontology
![Page 28: Geographic Information Retrieval From Disparate Data Sources](https://reader036.fdocuments.net/reader036/viewer/2022062405/554e8c22b4c90526358b4afc/html5/thumbnails/28.jpg)
Conclusions
It is possible to construct a useful system to ingest arbitrary text and extract place names.
A sufficiently good automated location disambiguation system can be built for a specific domain to process 80-90% of places correctly.
Semantic expansion and narrowing of searches appear useful in early experiments.
Providing users with a familiar, highly linked interface seems to aid exploration of the document space.