Geographic reference analysis for geographic document querying
F. Bilhaut, T. Charnois, P. Enjalbert & Y. Mathet
{bilhaut, charnois, enjalbert, mathet}@info.unicaen.fr
GREYC, CNRS UMR 6072
University of Caen
The "GéoSem" project
• Passage extraction from geographical documents
• From a query to a ranked set of passages
• Queries are concerned with:
- time
- phenomenon
- space
Excerpt from "Hérin" corpus
From 1965 to 1985, the number of high-school students has increased by
70%, but at different rhythms and intensities depending on academies and
departments. Lower in South-West and Massif Central, moderate in
Brittany and Paris, the rise has been considerable in Mid-West and Alsace.
[…] Also occurs the schooling duration increase which was more important
in departments where, in the middle of the 60's, study continuation after
primary school was far from being systematic.
Highlighted dimensions: Time, Phenomenon, Space
Queries
• Which passages address educational difficulties in the west of France in the 50's?
• Which passages address variations of the number of pupils in rural areas?
• Which passages address variations of the number of pupils in the Paris area?
• Which passages address the Calvados district?
Some Significant Spatial Expressions
Paris
in the north of France
south of the Loire
Some seaboard towns
The quarter of / Fifteen / All the districts in the north of France
Some seaboard towns of Normandy
The most rural districts situated south of the Loire
The "zone" type: a georeferenced area anchored in a named place
Paris
in the north of France
Normandy
From Normandy to Alsace
south of the Loire
The ‘LocGeo’ type
Components: Quant | Type (qualification, administrative) | Zone (position, named geo. entity)
Examples:
The quarter of / Fifteen / All districts in the north of France
Some seaboard towns of Normandy
The most rural districts situated south of the Loire
Some seaboard towns
The canonical form:
[quantification]+[type]+[zone]
Semantic Representation: « Paris »
zone:
  loc: internal
  egn:
    ty_zone: town
    nom: Paris
    coord:
      Long: 5.733333
      Lat: 45.633333
Semantic Representation: « Some seaboard towns in the north of Normandy »
locgeo:
  quant:
    type: relative
  type:
    ty_zone: town
    geo: seaboard
  zone:
    loc: internal
    position: north
    egn:
      ty_zone: region
      nom: Normandy
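The nested representation above can be mirrored by a small typed data structure. The following is a minimal Python sketch, not the project's own encoding: the class names (`NamedEntity`, `Zone`, `LocGeo`) are hypothetical, and only the feature names (`quant`, `type`, `zone`, `loc`, `position`, `egn`, `ty_zone`, `nom`, `coord`) come from the slides.

```python
from dataclasses import dataclass, asdict
from typing import Optional, Tuple


@dataclass
class NamedEntity:
    """Named geographic entity ("egn" on the slides)."""
    ty_zone: str                                  # e.g. "town", "region"
    nom: str                                      # name found in the gazetteer
    coord: Optional[Tuple[float, float]] = None   # (long, lat) when known


@dataclass
class Zone:
    """Georeferenced area anchored in a named place."""
    loc: str                                      # e.g. "internal"
    egn: NamedEntity
    position: Optional[str] = None                # e.g. "north"


@dataclass
class LocGeo:
    """Canonical form: [quantification] + [type] + [zone]."""
    zone: Zone
    quant: Optional[dict] = None                  # e.g. {"type": "relative"}
    type: Optional[dict] = None                   # e.g. {"ty_zone": "town", "geo": "seaboard"}


# « Some seaboard towns in the north of Normandy »
example = LocGeo(
    quant={"type": "relative"},
    type={"ty_zone": "town", "geo": "seaboard"},
    zone=Zone(
        loc="internal",
        position="north",
        egn=NamedEntity(ty_zone="region", nom="Normandy"),
    ),
)

# asdict() turns the structure back into nested dictionaries.
as_features = asdict(example)
```

Making `quant` and `type` optional reflects that a bare place name such as « Paris » only fills the zone part.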
Implementation and (first) Results
• Tokenisation and morphological analysis
• A DCG (definite clause grammar) performing syntactic and semantic analysis together:
- the grammar contains 160 rules
- an internal lexical base of 200 entries
- a gazetteer of 100,000 named places (France)
• 900 expressions recognised and analysed from a geographical corpus (200 text pages)
• Good results, but a precise quantitative evaluation remains to be done
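To make the pipeline concrete (tokenisation, lexical lookup, gazetteer lookup, semantic construction), here is a toy Python sketch of the canonical form [quantification]+[type]+[zone]. It is not the authors' DCG: the English mini-lexicon, the three-entry gazetteer and the function names are all hypothetical, and the grammar is reduced to an optional determiner, an optional type noun, an optional preposition and a place name.

```python
import re

# Hypothetical mini-lexicon and gazetteer (illustrative entries only).
DETERMINERS = {"some": {"type": "relative"}, "all": {"type": "exhaustive"}}
TYPE_NOUNS = {"towns": "town", "districts": "district", "regions": "region"}
PREPOSITIONS = {"of", "in"}
GAZETTEER = {
    "normandy": {"ty_zone": "region", "nom": "Normandy"},
    "paris": {"ty_zone": "town", "nom": "Paris"},
    "calvados": {"ty_zone": "district", "nom": "Calvados"},
}


def tokenise(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def parse_locgeo(text):
    """Return a nested dict for [quant] + [type] + zone, or None on failure."""
    tokens = tokenise(text)
    sem = {}
    i = 0
    if i < len(tokens) and tokens[i] in DETERMINERS:      # optional quantification
        sem["quant"] = DETERMINERS[tokens[i]]
        i += 1
    if i < len(tokens) and tokens[i] in TYPE_NOUNS:       # optional type
        sem["type"] = {"ty_zone": TYPE_NOUNS[tokens[i]]}
        i += 1
    if i < len(tokens) and tokens[i] in PREPOSITIONS:     # optional preposition
        i += 1
    # The zone is mandatory here and must be the last token.
    if i == len(tokens) - 1 and tokens[i] in GAZETTEER:
        sem["zone"] = {"egn": GAZETTEER[tokens[i]]}
        return {"locgeo": sem}
    return None


parsed = parse_locgeo("Some towns of Normandy")
```

As in the grammar described above, syntax and semantics are built in one pass: each recognised token immediately contributes its feature to the result.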
Semantic matching: Why?
Query: "Which passages address Paris?"
Candidate passages from the corpora (Text A, Text B):
• […] the northern half of France […]
• […] the south of a Bordeaux-Genève line […]
• […] In Paris and Toulouse […]
• […] In Ile de France region […]
[Figure: rank labels 1, 2 and 3 are attached to passages.]
Semantic matching: How?
• Spatial compatibility: is the zone denoted by the passage spatially compatible with the one of the query? (Is there, at least, an intersection?)
• Relevance degree: if this zone is compatible, how relevant is it w.r.t. the query?
- probability
- granularity
Compatibility computation
• Q1) Which passages address Paris?
• P1) […] the capital city […]: YES (gazetteer)
• P2) […] big cities in France: YES (gazetteer + computation)
• P3) […] the northern half of France […]: YES (GIS + computation)
• P4) […] South of a Bordeaux-Genève line: NO (GIS + computation)
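The two computed cases can be illustrated with a little geometry. The sketch below is an assumption-laden simplification, not the project's GIS computation: the query zone (Paris) is reduced to a single point, so "intersection" becomes point containment; city coordinates are rounded (longitude, latitude) values; and `FRANCE_MID_LAT` is an assumed rough latitude splitting metropolitan France in two.

```python
# Rounded (longitude, latitude) pairs, for illustration only.
PARIS = (2.35, 48.86)
BORDEAUX = (-0.58, 44.84)
GENEVE = (6.14, 46.20)

# Assumption: rough latitude separating the northern and southern halves.
FRANCE_MID_LAT = 46.2


def in_northern_half(point):
    """True if the point lies in the (assumed) northern half of France."""
    return point[1] >= FRANCE_MID_LAT


def south_of_line(point, west, east):
    """True if the point lies south of the straight line through west and east."""
    x, y = point
    (wx, wy), (ex, ey) = west, east
    # Latitude of the line at the point's longitude (linear interpolation).
    line_lat = wy + (ey - wy) * (x - wx) / (ex - wx)
    return y < line_lat


compatible_p3 = in_northern_half(PARIS)                 # "the northern half of France"
compatible_p4 = south_of_line(PARIS, BORDEAUX, GENEVE)  # "south of a Bordeaux-Genève line"
```

With these values the sketch reproduces the YES for the northern half of France and the NO for the zone south of the Bordeaux-Genève line. A real implementation would intersect polygons rather than test a single point.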
Relevance degree (1): Quantification
Query = "Calvados" (French district)
P1 = "The quarter of districts in the north of France": r = 25%
P2 = "All districts in the north of France": r = 100%
P3 = "Some districts in the north of France": r = i/n = 5/52 = 9.6% (GIS)
P4 = "Fifteen districts in the north of France": r = i/n = 15/52 = 29% (GIS)
[Figure: rank labels 1, 2, 3 and 4 are attached to the passages.]
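The arithmetic behind these relevance values can be checked in a few lines. This Python sketch takes n = 52 and the values i = 5 and i = 15 directly from the slide; sorting the passages by decreasing r is an assumption about how the ranking is obtained, and the variable names are hypothetical.

```python
N_DISTRICTS = 52  # n: number of districts in the zone, as given on the slide

# r for each passage: share of the n districts that the expression denotes.
relevance = {
    "P1": 0.25,               # "The quarter of districts ..."
    "P2": 1.0,                # "All districts ..."
    "P3": 5 / N_DISTRICTS,    # "Some districts ...": i = 5 on the slide
    "P4": 15 / N_DISTRICTS,   # "Fifteen districts ..."
}

# Assumption: passages are ranked by decreasing relevance.
ranking = sorted(relevance, key=relevance.get, reverse=True)

for rank, passage in enumerate(ranking, start=1):
    print(f"{rank}. {passage}: r = {relevance[passage]:.1%}")
```

Under that assumption, r can be read as the chance that the queried district (Calvados) is among the districts the passage talks about.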
Relevance degree (2): Granularity
[Figure: granularity scale country / region / district / city, with the expressions "the northern half of France", "Basse Normandie", "Calvados" and "Caen" placed on it, plus the label "zone".]
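The slide gives the granularity scale but not the formula that turns it into a relevance score. One possible reading, shown here purely as an assumption, is to measure how many levels separate the query from the passage on that scale; the function name and the level assignments below are illustrative.

```python
# Granularity scale from the slide, from coarsest to finest.
LEVELS = ["country", "region", "district", "city"]


def granularity_gap(query_level, passage_level):
    """Number of scale steps between the query and the passage (0 = same level)."""
    return abs(LEVELS.index(query_level) - LEVELS.index(passage_level))


# Query "Calvados" is a district; candidate passages mention these places.
candidates = {"Basse Normandie": "region", "Calvados": "district", "Caen": "city"}
gaps = {name: granularity_gap("district", level) for name, level in candidates.items()}
```

A smaller gap would then mean a passage whose granularity is closer to that of the query. How such a gap should be combined with the quantification-based degree is left open here.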
% DCG fragment (Prolog). The ".." and "#" operators are assumed to be defined elsewhere in the system.
locgeo(locgeo:(det:Det..type:Type..Zone)) --> #prep, det(Det), type(Type), zone(Zone).
det(Sem) --> [X], {lexique(X,[X|_],det,Sem)}.
type(X) --> typeQualif(X).
type(ty_zone:N) --> nomtype(N).
typeQualif(ty_zone:N..Q) --> option, nomtype(N), #prep, qualif(Q).
nomtype(Sem) --> [X], {lexique(X,[X|_],nom,Sem)}.
zone(X) --> egn(X).
% Named geographic entity: external lexicon lookup first, then the internal lexicon.
egn(egn:(ty_zone:T..nom:Y..coord:C)) --> ls_lexiconExtDCG(np, type_sem:egn..type_zone:T..nom:Y..coord:C).
egn(egn:(ty_zone:T..nom:Y)) --> [X], {lexique(X,[X|_],np,type_sem:egn..type_zone:T..nom:Y)}.
% Internal lexicon entries: lexique(Key, Tokens, Category, Semantics).
lexique(quelque,[quelque],det,type_sem:relatif..type:relatif_qualifie..nb:'qualitatif:faible').
lexique(tout,[tout,le],det,type_sem:exhaustif).
lexique(région,[région],nom,type_sem:zone(administrative)..nom_zone:région).
lexique(ville,[ville],nom,type_sem:zone(administrative)..nom_zone:ville).
lexique('Bretagne',['Bretagne'],np,type_sem:egn..type_zone:région..nom:'Bretagne').