People, places and events in charters: exploring the language of charters within ChartEx

Robin Sutherland-Harris, University of Toronto
Lynne Cahill, NLTG, University of Brighton (now at University of Sussex)
4 July 2013



Grant from Walter de Soke to Richard de Forde (Somerset Heritage Centre, DD\WHb/244)


Sample document summary

Final concord made in the Court of Common Pleas at Westminster before James Dyer, Thomas Meade, Francis Wyndam, William Peryam, justices, in dispute between Richard Broke and George Armond, plaintiffs, and Robert Jenner, defendant, over lands (specified) in Fordham, Essex

Summary provided by The National Archives, Ward2_55A_188_30


Kanga Methodology

Fig. 1 The phases of the Kanga Methodology [36]. The white boxes indicate the phases performed by domain experts. The formal structuring is done by domain experts using Rabbit, while the translation to OWL is ...

Ronald Denaux, Catherine Dolbear, Glen Hart, Vania Dimitrova, Anthony G. Cohn. Supporting domain experts to construct conceptual ontologies: A holistic approach. Web Semantics: Science, Services and Agents on the World Wide Web, Volume 9, Issue 2 (2011), 113-127. http://dx.doi.org/10.1016/j.websem.2011.02.001


BRAT rapid annotation tool

Pontus Stenetorp, Sampo Pyysalo, Goran Topić, Tomoko Ohta, Sophia Ananiadou and Jun'ichi Tsujii (2012). brat: a Web-based Tool for NLP-Assisted Text Annotation. In Proceedings of the Demonstrations Session at EACL 2012. (http://brat.nlplab.org/)


ChartEx architecture

[Diagram: Charter documents → Language processing → Analysed individual documents → Data mining → Analysed integrated documents → Researcher’s workbench]
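The following is a minimal Python sketch of how the stages could pass data to one another. Everything in it is an assumption for illustration. The function names (`analyse_document`, `integrate`, `documents_mentioning`) are hypothetical. The regular-expression "language processing" and the exact-name "data mining" are naive stand-ins, not ChartEx's own methods. The second charter is invented.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for language processing: pull out "Forename de Surname"
# strings with a regular expression. The real NLP stage is far richer.
NAME = re.compile(r"\b[A-Z][a-z]+ de [A-Z][a-z]+\b")


def analyse_document(doc_id, text):
    """Language processing: one charter in, one analysed document out."""
    return {"doc_id": doc_id, "persons": NAME.findall(text)}


def integrate(analysed_docs):
    """Data-mining stand-in: link documents that share an identical name string."""
    index = defaultdict(set)
    for doc in analysed_docs:
        for person in doc["persons"]:
            index[person].add(doc["doc_id"])
    return index


def documents_mentioning(index, person):
    """Workbench-style query over the integrated collection."""
    return sorted(index.get(person, set()))


charters = {
    "doc-a": "Grant from Walter de Soke to Richard de Forde of land.",
    "doc-b": "Quitclaim by Richard de Forde to John de Wells of the same land.",  # invented
}
analysed = [analyse_document(d, t) for d, t in charters.items()]
index = integrate(analysed)
print(documents_mentioning(index, "Richard de Forde"))  # ['doc-a', 'doc-b']
```

Real record linkage cannot rely on exact string matching, because spellings vary and different people can share a name. That is the kind of problem a separate data-mining stage between the individually analysed documents and the workbench has to address.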


NLP in ChartEx


NLP Training Methodology

[Diagram: 50 documents; Training documents; Unmarked documents; NLP; Newly marked documents; Compare]
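The training loop can be read in code as four steps:

1. Reproducibly split the annotated documents into a training set and a held-out set.
2. Run a "training" step on the training set.
3. Automatically mark the held-out documents with their markup removed.
4. Compare the result against the historians' annotation.

This is a sketch under stated assumptions. The memorising tagger is a deliberately trivial stand-in for the real NLP, and the four-document corpus is invented. `split_documents`, `memorising_trainer` and `run_round` are hypothetical names.

```python
import random


def split_documents(doc_ids, n_train, seed=0):
    """Reproducibly shuffle document IDs, then split into training / held-out."""
    ids = sorted(doc_ids)
    random.Random(seed).shuffle(ids)
    return ids[:n_train], ids[n_train:]


def memorising_trainer(train_manual):
    """Toy stand-in for NLP training: remember every annotated string and its type."""
    seen = {text: label for anns in train_manual.values() for label, text in anns}

    def tag(text):
        # Mark a remembered string wherever it reappears in an unmarked document.
        return {(label, s) for s, label in seen.items() if s in text}

    return tag


def run_round(manual, texts, n_train, seed=0):
    """Train on marked documents, mark the held-out ones, compare with manual markup."""
    train_ids, test_ids = split_documents(manual, n_train, seed)
    tag = memorising_trainer({d: manual[d] for d in train_ids})
    report = {}
    for d in test_ids:
        predicted = tag(texts[d])  # the "newly marked" document
        gold = manual[d]           # the historians' own annotation
        report[d] = {"missed": gold - predicted, "spurious": predicted - gold}
    return train_ids, test_ids, report


# Invented miniature corpus; annotations are (type, string) pairs.
texts = {
    "c1": "Grant by William son of Richard of land in Petergate.",
    "c2": "Grant by Richard de Forde of land in Fordham.",
    "c3": "Quitclaim by William of land in Petergate.",
    "c4": "Grant by Walter de Soke of land in Fordham.",
}
manual = {
    "c1": {("Person", "William"), ("Person", "Richard"), ("Place", "Petergate")},
    "c2": {("Person", "Richard de Forde"), ("Place", "Fordham")},
    "c3": {("Person", "William"), ("Place", "Petergate")},
    "c4": {("Person", "Walter de Soke"), ("Place", "Fordham")},
}
train_ids, test_ids, report = run_round(manual, texts, n_train=2)
for d in test_ids:
    print(d, report[d])
```

With real data the split would presumably cover the 50 annotated documents in the diagram. The "missed" and "spurious" sets are the raw counts behind recall and precision.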


What is NLP?

• Natural Language Processing – getting computers to process human language

• Natural Language Understanding (NLU)
  – aka Information Extraction
  – getting meaning from text
  – applications in a wide variety of text types in modern languages

• Ontologies used to help process texts and as target “meanings”


NLP in ChartEx

(Compared to much NLU)
• Short documents
• Standardised (formulaic) overall structure
• Complex referring expressions
• Phrases (especially noun phrases) often more important than clauses
• Usually key information within a single sentence:
  – Grant/quitclaim/enfeoffment etc.
  – by [person_A]
  – of [land_description]
  – to [person_B]
  – paying [payment]
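Because the key information usually sits in one sentence, a first approximation is a single pattern with one slot per bracketed element. The sketch below assumes the idealised slot order listed above. The first example sentence is invented, and the second is a simplified, reordered form of the Vicars Choral 387 sentence. The pattern and slot names are hypothetical, not ChartEx's grammar. The second call shows why complex referring expressions matter: the first "of" after "by" cuts "William son of Richard" in half.

```python
import re

# Hypothetical pattern for the idealised single-sentence template:
#   <transaction> by <person_A> of <land_description> to <person_B> [paying <payment>]
# Real charters reorder and nest these phrases, so this is only a sketch.
TEMPLATE = re.compile(
    r"^(?P<transaction>grant|quitclaim|enfeoffment)\s+"
    r"by\s+(?P<person_a>.+?)\s+"
    r"of\s+(?P<land_description>.+?)\s+"
    r"to\s+(?P<person_b>.+?)"
    r"(?:\s+paying\s+(?P<payment>.+?))?"
    r"\.?$",
    re.IGNORECASE,
)


def extract(sentence):
    """Return the template slots as a dict, or None if the sentence does not fit."""
    match = TEMPLATE.match(sentence.strip())
    return match.groupdict() if match else None


print(extract("Grant by Walter de Soke of land in Petergate "
              "to Richard de Forde paying twelve pence."))

# Lazy slot boundaries split a complex referring expression at its first "of":
print(extract("Grant by William son of Richard of land to the prebend of Masham.")["person_a"])
# prints: William son
```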


Training data

• Three principal types of data provided by historians:
  1. Lists of key terms extracted from charters
  2. Raw (unannotated) charter documents
  3. Annotated charter documents
• Type 2 used to supplement type 1
• Type 1 feeds directly into NLP lexicons
• Type 3 used for more complex analysis, both to train the system and to test it against
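Key-term lists (type 1) can be turned into a lexicon and matched directly against raw text. This minimal sketch makes three assumptions: word-character tokenisation, case-insensitive matching, and greedy longest-match lookup. The term lists and function names are hypothetical examples, not the historians' actual lists.

```python
import re

# Hypothetical key-term lists of the kind historians might supply (data type 1).
TERM_LISTS = {
    "Transaction": ["grant", "quitclaim", "enfeoffment"],
    "Occupation": ["canon of York", "vicar"],
    "Place": ["Petergate", "Masham"],
}


def build_lexicon(term_lists):
    """Turn category -> [terms] lists into a lookup keyed by lower-cased token tuples."""
    lexicon = {}
    for category, terms in term_lists.items():
        for term in terms:
            lexicon[tuple(term.lower().split())] = category
    return lexicon


def tag(text, lexicon):
    """Greedy longest-match lookup of (possibly multi-word) lexicon terms."""
    tokens = re.findall(r"\w+", text)
    max_len = max(len(key) for key in lexicon)
    found, i = [], 0
    while i < len(tokens):
        # Try the longest candidate first so multi-word terms win over shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in lexicon:
                found.append((" ".join(tokens[i:i + n]), lexicon[key]))
                i += n
                break
        else:
            i += 1  # no lexicon term starts at this token
    return found


lexicon = build_lexicon(TERM_LISTS)
sentence = ("Grant by William son of Richard, canon of York, to the prebend of Masham "
            "of land in Petergate with buildings.")
print(tag(sentence, lexicon))
```

The lookup labels Masham as a place even though here it belongs to the institution "prebend of Masham". Resolving cases like this needs the richer, context-sensitive analysis that the annotated documents (type 3) are used to train.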


BRAT markup

• Historians work with the “pretty” interface
• For NLP training we use the plain-text output:

    T1 Document 0 17 vicars-choral-387
    T2 Transaction 18 23 Grant
    T3 Person 27 34 William
    T4 Institution 73 90 prebend of Masham
    T5 Person 42 49 Richard
    T6 Occupation 51 64 canon of York
    …
    R3 is_son_of Arg1:T3 Arg2:T5
    R4 is_grantor_in Arg1:T3 Arg2:T2
    R5 occupation_is Arg1:T3 Arg2:T6
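The plain-text output is brat's standoff format. Each `T` line gives an ID, an entity type with start and end character offsets, and the covered text. Each `R` line names a relation and its two argument IDs.

A minimal parser can read these lines and check that every offset pair selects the string it records. The sketch rests on two assumptions. First, it expects the tab-separated layout brat writes to `.ann` files; the slide shows the tabs as spaces. Second, it uses a document text reconstructed from the offsets. `parse_ann` and `bad_offsets` are hypothetical names.

```python
from dataclasses import dataclass

# Document text reconstructed from the offsets above, assuming a single newline
# separates the document ID from the first sentence.
TEXT = ("vicars-choral-387\n"
        "Grant by William son of Richard, canon of York, to the prebend of Masham")

# The slide's annotations with brat's tab separators restored (elided lines omitted).
ANN = "\n".join([
    "T1\tDocument 0 17\tvicars-choral-387",
    "T2\tTransaction 18 23\tGrant",
    "T3\tPerson 27 34\tWilliam",
    "T4\tInstitution 73 90\tprebend of Masham",
    "T5\tPerson 42 49\tRichard",
    "T6\tOccupation 51 64\tcanon of York",
    "R3\tis_son_of Arg1:T3 Arg2:T5",
    "R4\tis_grantor_in Arg1:T3 Arg2:T2",
    "R5\toccupation_is Arg1:T3 Arg2:T6",
])


@dataclass
class Entity:
    ann_id: str
    label: str
    start: int
    end: int
    text: str


@dataclass
class Relation:
    ann_id: str
    label: str
    arg1: str
    arg2: str


def parse_ann(ann):
    """Parse text-bound (T) and relation (R) lines; other line types are skipped.

    Discontinuous spans such as "0 5;8 12" are not handled in this sketch.
    """
    entities, relations = {}, []
    for line in ann.splitlines():
        fields = line.split("\t")
        if line.startswith("T"):
            label, start, end = fields[1].split(" ")
            entities[fields[0]] = Entity(fields[0], label, int(start), int(end), fields[2])
        elif line.startswith("R"):
            label, arg1, arg2 = fields[1].split(" ")
            relations.append(Relation(fields[0], label,
                                      arg1.split(":", 1)[1], arg2.split(":", 1)[1]))
    return entities, relations


def bad_offsets(entities, text):
    """IDs of entities whose offsets do not select their recorded string."""
    return [e.ann_id for e in entities.values() if text[e.start:e.end] != e.text]


entities, relations = parse_ann(ANN)
print(bad_offsets(entities, TEXT))  # []
for r in relations:
    print(entities[r.arg1].text, r.label, entities[r.arg2].text)
# William is_son_of Richard
# William is_grantor_in Grant
# William occupation_is canon of York
```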

 


Differences

• Most NLP systems identify all (or at least many) possible interpretations, then rank them in order of likelihood

• Humans are rarely aware of the ambiguities a computer will spot:
  – “I met her yesterday” is apparently unambiguous, but
  – “All our yesterdays” shows the system must allow yesterday to be a noun
  – So must met have a person as its object? Not in “I met my deadlines”
  – Resolving this needs general and contextual knowledge
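The "generate all interpretations, then rank" strategy can be shown on the first example. This toy sketch enumerates every combination of part-of-speech tags and ranks the combinations by the product of per-word probabilities. The tag set and the probabilities are made up for illustration. Scoring each word independently is exactly the lack of context that the last point above warns about.

```python
from itertools import product
from math import prod

# Toy lexicon: each word's possible part-of-speech tags with made-up
# probabilities (illustrative only, not estimated from any corpus).
LEXICON = {
    "i": {"PRON": 1.0},
    "met": {"VERB": 1.0},
    "her": {"PRON": 0.9, "DET": 0.1},        # object pronoun vs. possessive
    "yesterday": {"ADV": 0.8, "NOUN": 0.2},  # "met ... yesterday" vs. "her yesterday"
}


def interpretations(sentence):
    """Enumerate every tag sequence, score it, and rank by descending score.

    Scores multiply per-word probabilities, so they ignore context entirely;
    real systems condition on neighbouring words and wider knowledge.
    """
    words = sentence.lower().split()
    choices = [LEXICON[w].items() for w in words]
    ranked = [
        (tuple(tag for tag, _ in combo), prod(p for _, p in combo))
        for combo in product(*choices)
    ]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


for tags, score in interpretations("I met her yesterday"):
    print(f"{score:.2f}", " ".join(tags))
```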


A ChartEx example

Grant by William son of Richard, canon of York, to the prebend of Masham, for the salvation of the soul of the late king Richard and his own soul and the soul of his ancestors and parents of land in Petergate with buildings.

(Vicars Choral 387, first sentence)


Evaluating NLP output

• Checking against manual annotation:
  – Does the NLP identify all the entities and relationships the manual annotation includes? (RECALL)
  – No, not yet, especially for relationships
  – Are all the entities and relationships identified by the NLP in the manual annotation? (PRECISION)
  – No, but not all of those absent from the manual annotation are incorrect

• Data mining can help distinguish important from irrelevant output
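Recall and precision as defined above can be computed by treating each annotation as an element of a set. This sketch assumes exact matching on (type, start, end). The gold spans come from the BRAT sample, minus the document ID; the NLP output is invented. Exact matching counts every extra item as wrong. The slide notes that some NLP output absent from the manual annotation is not actually incorrect, so precision measured this way may understate real quality.

```python
def precision_recall_f1(manual, nlp):
    """Exact-match comparison of NLP output against the manual annotation.

    Each item is a hashable tuple, e.g. (type, start, end) for an entity or
    (relation, arg1_span, arg2_span) for a relationship.
    """
    manual, nlp = set(manual), set(nlp)
    correct = manual & nlp
    precision = len(correct) / len(nlp) if nlp else 0.0     # NLP output found in manual set
    recall = len(correct) / len(manual) if manual else 0.0  # manual items the NLP found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Gold entities from the BRAT sample (document ID omitted).
manual = {("Transaction", 18, 23), ("Person", 27, 34), ("Person", 42, 49),
          ("Occupation", 51, 64), ("Institution", 73, 90)}
# Invented NLP output: one spurious Place span ("Masham"), two entities missed.
nlp = {("Transaction", 18, 23), ("Person", 27, 34), ("Person", 42, 49),
       ("Place", 84, 90)}
p, r, f = precision_recall_f1(manual, nlp)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")  # precision=0.75 recall=0.60 f1=0.67
```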


Thank you!