AUTOMATIC ACCESS TO LEGAL TERMINOLOGY APPLYING TWO ... · 4. Validation process and results...
Transcript of AUTOMATIC ACCESS TO LEGAL TERMINOLOGY APPLYING TWO ... · 4. Validation process and results...
![Page 1: AUTOMATIC ACCESS TO LEGAL TERMINOLOGY APPLYING TWO ... · 4. Validation process and results 4.1.Validation process-Precision: Percentage of true terms out of candidate termsextracted.-Recall:](https://reader035.fdocuments.net/reader035/viewer/2022071423/611de7561b08636f6e439079/html5/thumbnails/1.jpg)
AUTOMATICACCESSTOLEGALTERMINOLOGYAPPLYINGTWODIFFERENTATRMETHODS
MARÍAJOSÉMARÍNCAMINOREA
![Page 2: AUTOMATIC ACCESS TO LEGAL TERMINOLOGY APPLYING TWO ... · 4. Validation process and results 4.1.Validation process-Precision: Percentage of true terms out of candidate termsextracted.-Recall:](https://reader035.fdocuments.net/reader035/viewer/2022071423/611de7561b08636f6e439079/html5/thumbnails/2.jpg)
1.INTRODUCTION-Importance ofidentifying the terms inaspecialisedcorpus.
- Terms are“textualrealisations ofspecialisedconcepts”(Spasic etal.2005).
-They areemployed to communicate amongst specialists (Rea,2008).
- They aremono-referential andhave aunivocal character(Cabré,1993):form content.
-Potential applications ofautomatic term recognition (ATR):building dictionaries andglossaries;machinetranslation;ontology building,etc.
![Page 3: AUTOMATIC ACCESS TO LEGAL TERMINOLOGY APPLYING TWO ... · 4. Validation process and results 4.1.Validation process-Precision: Percentage of true terms out of candidate termsextracted.-Recall:](https://reader035.fdocuments.net/reader035/viewer/2022071423/611de7561b08636f6e439079/html5/thumbnails/3.jpg)
1.INTRODUCTION- This paper evaluates the efficiency oftwo ATRmethods:Keywords (2008)andChung’s (2003)on a2.6mword legalcorpus(UKSCC).
- Both will be validated interms ofprecisionandrecall.
![Page 4: AUTOMATIC ACCESS TO LEGAL TERMINOLOGY APPLYING TWO ... · 4. Validation process and results 4.1.Validation process-Precision: Percentage of true terms out of candidate termsextracted.-Recall:](https://reader035.fdocuments.net/reader035/viewer/2022071423/611de7561b08636f6e439079/html5/thumbnails/4.jpg)
2.UKSCCand LACELL:the study andreference corpora- UKSCC(United Kingdom Supreme Court Corpus):
- Legalcorpusof192judicialdecisions issued by theSupremeCourt ofthe United Kingdom (2008-2010).
- Compiled adhocaccording to CLstandards (Sánchez,1995; Wynne,2005;Pearson,1998;Rea,2010).
- Monololingual andsynchronic.
- The Supreme Court chosen assource oftexts due to itsimportance asajudicialinstitution:touches upon allbranches oflaw andgreater geographical scope.
- Judicialdecisions appear asthe main source oflaw incommon law countries.
![Page 5: AUTOMATIC ACCESS TO LEGAL TERMINOLOGY APPLYING TWO ... · 4. Validation process and results 4.1.Validation process-Precision: Percentage of true terms out of candidate termsextracted.-Recall:](https://reader035.fdocuments.net/reader035/viewer/2022071423/611de7561b08636f6e439079/html5/thumbnails/5.jpg)
2.UKSCCand LACELL:the study andreference corpora- LACELL(LingüísticaAplicadaComputacional,EnseñanzadeLenguasyLexicografía):
- Balanced generalEnglish corpusof20mwords:written(newspapers,books,magazines,brochures,letters,etc.)andorallanguage samples.
- Compiled by the LACELLresearchgroup atMurciaUniversity (EnglishDept.).
![Page 6: AUTOMATIC ACCESS TO LEGAL TERMINOLOGY APPLYING TWO ... · 4. Validation process and results 4.1.Validation process-Precision: Percentage of true terms out of candidate termsextracted.-Recall:](https://reader035.fdocuments.net/reader035/viewer/2022071423/611de7561b08636f6e439079/html5/thumbnails/6.jpg)
3.ATRmethod descriptionAutomatic term recognition (ATR) methods date back to the1980s: they allow handling large amounts of data automaticallyspotting the most relevant terms in a specialised corpus. Theyhave been profusely reviewed (Maynard and Ananiadou, 2000;Cabré et al., 2001; Drouin, 2003; Lemay et al., 2005; Vivaldi et al.,2012, etc.)
- Keywords (Scott,2008):
- Not an ATR method proper, however, it has proved toidentify legal terms more efficiently than other methodsdesigned to that purpose.
- Automatically implemented using Wordsmith 5.0.Settings adjusted tu use Dunning’s (1993) log-likelihoodalgorithm.
![Page 7: AUTOMATIC ACCESS TO LEGAL TERMINOLOGY APPLYING TWO ... · 4. Validation process and results 4.1.Validation process-Precision: Percentage of true terms out of candidate termsextracted.-Recall:](https://reader035.fdocuments.net/reader035/viewer/2022071423/611de7561b08636f6e439079/html5/thumbnails/7.jpg)
3.ATRmethod description
- Chung’s method (2003)
- Singled out due to high rate of success recorded by theauthor (86% precisionon average).
- Chung compares a qualitative term recognition method:the rating scale approach with her own quantitative one.She concludes that terms displaying a > 50 ratio ofoccurence are terms.
- How to calculate it:
Wr= SF(w)/RF(w) (freq. counts must be normalised)
![Page 8: AUTOMATIC ACCESS TO LEGAL TERMINOLOGY APPLYING TWO ... · 4. Validation process and results 4.1.Validation process-Precision: Percentage of true terms out of candidate termsextracted.-Recall:](https://reader035.fdocuments.net/reader035/viewer/2022071423/611de7561b08636f6e439079/html5/thumbnails/8.jpg)
4.Validation process andresults
4.1. Validation process-Precision: Percentage of true terms out of candidate termsextracted.
-Recall: Percentage of true terms out of total amount of termsin the whole corpus.
-We resorted to automatic validation: 10,000 entry legalelectronic glossary compiled by authors used as gold standard(human validation poses problemsdue to subjectivity) .
-Terms confirmedas true if found in glossary.
-Keywords implemented automatically; Chung’s methodapplied using spreadsheet (data obtained withWordsmith too)
![Page 9: AUTOMATIC ACCESS TO LEGAL TERMINOLOGY APPLYING TWO ... · 4. Validation process and results 4.1.Validation process-Precision: Percentage of true terms out of candidate termsextracted.-Recall:](https://reader035.fdocuments.net/reader035/viewer/2022071423/611de7561b08636f6e439079/html5/thumbnails/9.jpg)
4.Validation process andresults4.2. Results
Fig.1Overallprecisionandrecallachievedbyeachmethod
62,00%
42,25%
31,00%
11,75%
0,00%
10,00%
20,00%
30,00%
40,00%
50,00%
60,00%
70,00%
Keywords Chung
Precision
Recall
![Page 10: AUTOMATIC ACCESS TO LEGAL TERMINOLOGY APPLYING TWO ... · 4. Validation process and results 4.1.Validation process-Precision: Percentage of true terms out of candidate termsextracted.-Recall:](https://reader035.fdocuments.net/reader035/viewer/2022071423/611de7561b08636f6e439079/html5/thumbnails/10.jpg)
4.Validation process andresults4.2. Results
Fig.2Cumulative precisionattainedontop2000candidates
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
Keywords
Chung
![Page 11: AUTOMATIC ACCESS TO LEGAL TERMINOLOGY APPLYING TWO ... · 4. Validation process and results 4.1.Validation process-Precision: Percentage of true terms out of candidate termsextracted.-Recall:](https://reader035.fdocuments.net/reader035/viewer/2022071423/611de7561b08636f6e439079/html5/thumbnails/11.jpg)
4.Validation process andresults4.2. Results
-Keywords excels Chung’s method both in terms ofprecisionand recall.
-Keywords decreases its efficiency smoothly andconstantly whereas Chung’s method performance is muchmore irregular.
-Chung’s method only performs better within candidates1600-1800dropping sharply to 3.5% precisionafterwards.
-Chung’s bad results may be due to the automatic inclusionof words not in the reference corpus in the terms group,especially proper names so typical of judicial decisions.
![Page 12: AUTOMATIC ACCESS TO LEGAL TERMINOLOGY APPLYING TWO ... · 4. Validation process and results 4.1.Validation process-Precision: Percentage of true terms out of candidate termsextracted.-Recall:](https://reader035.fdocuments.net/reader035/viewer/2022071423/611de7561b08636f6e439079/html5/thumbnails/12.jpg)
4.Validation process andresults4.2. Results
Chung’smethod(2003) Ratio Keywords(2008) KeynessEHRR ∞ COURT 28955.793EWCA ∞ SECTION 27627.5586UKHL ∞ PARA(paragraph) 25311.1152MANCE ∞ LORD 25155.4434SIAC ∞ V(versus) 22486.0918ECHR ∞ APPEAL 21236.8652EWHC ∞ ARTICLE 19301.6328BAILII ∞ ACT 18577.8652GESTINGTHORPE ∞ CASE 18328.9512FOSCOTE ∞ LAW 10458.0918EARLSFERRY ∞ JUDGMENT 9297.75JFS ∞ APPELLANT 8048.33496ECTHR ∞ PROCEEDINGS 7787.61963STOJEVIC ∞ CONVENTION 7764.64355TURPI ∞ WHETHER 7716.16992LJ ∞ LJ 7707.0918DALLAH ∞ RIGHTS 7023.53613SUMPTION ∞ DECISION 6950.50488SEISED ∞ ORDER 6632.18164BANKOVIC ∞ JURISDICTION 6374.33105
Table1.Top25candidatetermsextractedbyeachmethod
![Page 13: AUTOMATIC ACCESS TO LEGAL TERMINOLOGY APPLYING TWO ... · 4. Validation process and results 4.1.Validation process-Precision: Percentage of true terms out of candidate termsextracted.-Recall:](https://reader035.fdocuments.net/reader035/viewer/2022071423/611de7561b08636f6e439079/html5/thumbnails/13.jpg)
5.Conclusion
- Evaluating the efficiency ofATRmethods is highlyrecommendable to select the ones that suit our corpusbest,especially due to the fact that some ofthem aredomain-dependent like Chung’s .
-Nevertheless,asputforwardbyLemay(2005:245),“muchstillremainsontheterminologist’sabilitytodifferentiatearelevantunitfromanon-relevantone.Listsmustbescannedtoremoveirrelevantunits”.Actually,“fine-grainedsemanticdistinctionsstillrely...onterminologists”.
![Page 14: AUTOMATIC ACCESS TO LEGAL TERMINOLOGY APPLYING TWO ... · 4. Validation process and results 4.1.Validation process-Precision: Percentage of true terms out of candidate termsextracted.-Recall:](https://reader035.fdocuments.net/reader035/viewer/2022071423/611de7561b08636f6e439079/html5/thumbnails/14.jpg)
ReferencesAlcarazVaró,E.(2000)Elinglés profesional yacadémico.Madrid:AlianzaEditorial.Cabré,M.T.(1993).Laterminología.Teoría,metodología,aplicaciones.Barcelona:Antártida/Empúries.Cabré,M.T.,Estopà,R.,Vivaldi,J.(2001).Automatictermdetection:areviewofcurrentsystems.Bourigault,D.,Jacquemin,C.,L’Homme,M.C.(Eds.).RecentAdvancesinComputationalTerminology2,53-87.Amsterdam:JohnBenjamins,NaturalLanguageProcessing.Chung,T.M.(2003).Acorpuscomparisonapproachforterminologyextraction.Terminology9(2),221-246.Heatley,A.,Nation,I.S.P.1996.Range(computersoftware).Wellington:VictoriaUniversityofWellington.Kit,C.andLiu,X.(2008)“Measuringmono-wordtermhoodbyrankdifferenceviacorpuscomparison”.Terminology,14(2):204-229.Lemay,C.,LHomme,M.C.,Drouin,P.(2005).Twomethodsforextracting"specific"single-wordtermsfromspecializedcorpora:experimentationandevaluation.InternationalJournalofCorpusLinguistics,10(2),227-255.Marín,M.J.,Rea,C.(2011)."DesignandcompilationofalegalEnglishcorpusbasedonUKlawreports:theprocessofmakingdecisions".Carrió Pastor,M.L.,Candel Mora,M.A.(Eds.).LasTecnologías delaInformación ylas Comunicaciones:Presente yFuturo enelAnálisis deCórpora.Actas delIIICongreso Internacional deLingüística deCorpus(101-110).Valencia:UniversitatPolitècnica deValència.Maynard,D.andAnaniadou,S.2000.TRUCKS:Amodelforautomaticmulti-wordtermrecognition.JournalofNaturalLanguageProcessing8(1),101–125.Pearson,J.(1998).TermsinContext.Amsterdam:JohnBenjamins PublishingCompany.Rea,C.(2008).ElInglésdelasTelecomunicaciones:EstudioLéxicoBasadoenunCorpusEspecífico.Tesisdoctoral.Murcia:UniversidaddeMurcia.Rondeau,G.(1983).Introduction àlaterminologie.Québec:Gaëtan Morin Editeur.Sánchez,A.,Cantos,P.,SarmientoR.,Simón,J.1995Cumbre.Corpuslingüísticodelespañolcontemporáneo.Fundamentos,metodologíayanálisis.Madrid:SGEL.Scott,M.2008.WordSmith Toolsversion 5.Liverpool:LexicalAnalysisSoftware.Spasic,I.,Ananiadou,S.,McNaught,J.&Kumar,A.2005.‘Textminingandontologies inbiomedicine:Makingsenseofrawtext’.BriefBioinform,6(3),239-251.Vivaldi,J.,Cabrera-Diego,L.A.,Sierra,G.,Pozzi,M.(2012).‘UsingWikipedia to Validate the Terminology Found inaCorpusofBasicTextbooks’.InProceedings ofthe Eight InternationalConferenceon LanguageResources andEvaluation (LREC'12).Instambul,May 2012.Wynne,M.(Eds.)2005DevelopingLinguisticCorpora:aGuidetoGoodPractice. ASDSLiterature,LanguagesandLinguistics.Oxford.