Copyright © Doctrine Meetup NLP #2 season 4 Structuring legal documents with Deep Learning Pauline Chavallard 2019/11/27


Page 1


Meetup NLP #2 season 4

Structuring legal documents

with Deep Learning

Pauline Chavallard 2019/11/27

Page 2

Plan

● About Doctrine and legal research

● Motivations

● Modeling

● Results

● Further work

Page 3

Google for law

Doctrine was created in 2016

Challenges

- volume of data

- heterogeneity

- domain specificity

Page 4

Legal content is heavily interlinked

Page 5

Challenges in data science at Doctrine

Low/weak supervision:

● No labeled data (esp. in French)

High specificity/heterogeneity:

● Language is different between decisions, legislations and commentaries

● Among decisions, depending on courts, structures are different

● Content comes in various formats (papers, images, PDFs, texts)

Page 6

An example of a French court decision

Page 7

Plan

● About Doctrine and legal research

● Motivations

● Modeling

● Results

● Further work

Page 8

Motivation

● Four million court decisions delivered each year in France

● Critical information for lawyers

Problem:

● Long and complex documents

● Readers may be interested in only one very specific part

Page 9

French court decisions

A French court decision is generally structured into the following sections:

● Metadata (« En-tête » in French): court, number, date, etc., of the trial.

● Parties (« Parties » in French): information about the claimants and defendants

● Composition of the court (« Composition de la cour » in French)

● Facts (« Faits » in French): what happened?

● Pleas in law and main arguments (« Moyens » in French): arguments presented by the claimant and defendant.

● Grounds (« Motifs » in French): reasons and arguments used by the court

● Operative part of the judgment (« Dispositif » in French): final decision

Page 10

French court decisions - Example

Cour d'appel de Metz, 28 January 2015

Page 11

French court decisions

Unfortunately, there is no mandatory guideline on how to release a court decision.

Courts may use:

● different writing styles

● different ways of organising the documents

● all the sections from the previous slide, or only a subset

Page 12

French Courts of Appeal usually follow a very uniform writing style: ~55% of their decisions have explicit titles for their sections

French Court of Appeal

Extracted from https://www.doctrine.fr/d/CA/Orleans/2007/SKDD824CCFE8D8D9D93128.

Page 13

French Courts of Appeal usually follow a very uniform writing style: ~55% of their decisions have explicit titles for their sections

French Court of Appeal

Extracted from https://www.doctrine.fr/d/CA/Orleans/2007/SKDD824CCFE8D8D9D93128.

Facts

Page 14

For the remaining 45%, it is harder...

French Court of Appeal

Extracted from https://www.doctrine.fr/d/CA/Metz/2015/RAC1261A1563690C06B77

Page 15

How could an algorithm automatically generate a table of contents?

Page 16

Plan

● About Doctrine and legal research

● Motivations

● Modeling

● Results

● Further work

Page 17

Information needed

To complete this task, a human would rely on:

1. The vocabulary used

2. The order of the paragraphs

Page 18

Information needed

1. The vocabulary used

Not always so obvious: legislation references appear in both...

-> standard encodings (BoW, TF-IDF) performed poorly
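A toy illustration of why bag-of-words encodings struggle here: two hypothetical paragraphs from different sections (pleas vs. grounds) that cite the same article end up very close under TF-IDF, while a paragraph from the same decision with different wording does not. The sentences are invented, and the weighting (raw term frequency times a smoothed idf) is one common variant, not necessarily the one tried at Doctrine.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (raw term frequency x smoothed idf)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed idf: log((1 + n) / (1 + df)) + 1, so no weight is ever zero.
    idf = {term: math.log((1 + n_docs) / (1 + d)) + 1 for term, d in df.items()}
    return [{term: count * idf[term] for term, count in Counter(tokens).items()} for tokens in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical paragraphs: two different sections citing the same article, plus a "facts" paragraph.
pleas = "the appellant argues that article 1134 of the civil code was breached"
grounds = "the court holds that article 1134 of the civil code was not breached"
facts = "the parties signed a lease in 2010"

v_pleas, v_grounds, v_facts = tfidf_vectors([pleas, grounds, facts])
print(round(cosine(v_pleas, v_grounds), 2), round(cosine(v_pleas, v_facts), 2))
```

The shared legal vocabulary dominates the representation, so the section label (which depends on who is speaking and in what role) is largely invisible to such an encoding.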

Page 19

Information needed

1. The vocabulary used

2. The order of the paragraphs

● Metadata

● Parties

● Composition of the court

● Facts

● Pleas in law and main arguments

● Grounds

● Operative part of the judgment

-> sequential information is important

Page 20

Modeling

Split decisions into paragraphs (X)

Pre-process: replace rare words with <UNK> with probability p = 0.5
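A minimal sketch of this pre-processing step, assuming paragraphs are already tokenized into word lists. Replacing rare words only some of the time lets the model learn a useful <UNK> embedding while still seeing the real rare words, so unseen words at inference time map to something it was trained on. The frequency threshold (`min_count=2`) and the helper names are hypothetical; the slide only gives p = 0.5.

```python
import random
from collections import Counter

UNK = "<UNK>"


def find_rare_words(paragraphs, min_count=2):
    """Words seen fewer than `min_count` times in the tokenized training paragraphs."""
    counts = Counter(word for paragraph in paragraphs for word in paragraph)
    return {word for word, count in counts.items() if count < min_count}


def replace_rare(paragraph, rare_words, p=0.5, rng=random):
    """Replace each rare word by <UNK> with probability p; frequent words are left untouched."""
    return [UNK if word in rare_words and rng.random() < p else word for word in paragraph]


corpus = [["the", "court", "upholds"], ["the", "court", "reverses"]]
rare = find_rare_words(corpus)  # {"upholds", "reverses"}
rng = random.Random(0)  # fixed seed so the sampling is reproducible
print(replace_rare(corpus[0], rare, rng=rng))
```

Calling `replace_rare` again at each epoch (an assumption, not stated on the slide) resamples which occurrences are masked.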

Page 21

Dataset creation

● Find labeled data from structured decisions with titles

● Remove titles

● Assign each paragraph to its corresponding label (y)

● y ∈ {0, ..., 6}: one label per section, i.e. seven classes

-> Supervised classification
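The dataset creation steps above can be sketched as follows, for a decision that already has explicit titles. The title strings in `TITLE_TO_LABEL` are hypothetical examples of French headings, not Doctrine's actual matching rules, and paragraphs before the first title are assumed to belong to the metadata header (y = 0).

```python
SECTIONS = ["Metadata", "Parties", "Composition", "Facts", "Pleas", "Grounds", "Operative"]

# Hypothetical mapping from explicit French titles to labels y in {0, ..., 6}.
TITLE_TO_LABEL = {
    "ENTRE": 1,
    "COMPOSITION DE LA COUR": 2,
    "FAITS ET PROCÉDURE": 3,
    "MOYENS DES PARTIES": 4,
    "MOTIFS DE LA DÉCISION": 5,
    "PAR CES MOTIFS": 6,
}


def label_paragraphs(paragraphs):
    """Turn a decision with explicit titles into (paragraph, y) training pairs, titles removed."""
    examples = []
    current = 0  # assumption: paragraphs before the first title belong to the metadata header
    for paragraph in paragraphs:
        title = paragraph.strip().rstrip(":").strip().upper()
        if title in TITLE_TO_LABEL:
            current = TITLE_TO_LABEL[title]  # a title opens a new section and is not kept as input
            continue
        examples.append((paragraph, current))
    return examples


decision = ["<court, date, number>", "ENTRE", "<party>", "Faits et procédure :", "<facts>",
            "PAR CES MOTIFS", "<final decision>"]
print([SECTIONS[y] for _, y in label_paragraphs(decision)])
```

Removing the titles matters: otherwise the classifier would learn to read the heading instead of the content, and would fail on the ~45% of decisions that have no headings.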

Page 22

This looks like Named Entity Recognition [1]... at the paragraph level.

With LSTM / CRF, we capture information from

● paragraph inherent properties

● paragraph context (the neighborhood gives insights on the label)

[1] Neural Architectures for Named Entity Recognition. Lample, Ballesteros, Subramanian, Kawakami, Dyer. NAACL 2016.

Modeling
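In the NER-style setup, a BiLSTM over the sequence of paragraphs produces one score per label for each paragraph (the paragraph's own properties plus its neighbourhood), and a CRF adds a score for each pair of consecutive labels, so the whole sequence is decoded jointly. The sketch below shows only the Viterbi decoding step of a linear-chain CRF; the emission and transition numbers are hypothetical and the BiLSTM itself is not shown.

```python
def viterbi(emissions, transitions):
    """Highest-scoring label sequence under a linear-chain CRF.

    emissions[t][k]: score of label k for paragraph t (e.g. output of a BiLSTM layer)
    transitions[i][j]: score of label i being followed by label j on the next paragraph
    """
    n_labels = len(emissions[0])
    scores = list(emissions[0])  # best score of any path ending in each label so far
    backpointers = []
    for emission in emissions[1:]:
        new_scores, pointers = [], []
        for j in range(n_labels):
            best_i = max(range(n_labels), key=lambda i: scores[i] + transitions[i][j])
            new_scores.append(scores[best_i] + transitions[best_i][j] + emission[j])
            pointers.append(best_i)
        scores = new_scores
        backpointers.append(pointers)
    # Walk the back-pointers from the best final label.
    last = max(range(n_labels), key=lambda k: scores[k])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1], scores[last]


# Toy example with 3 hypothetical sections; going back to an earlier section is heavily penalized.
transitions = [[0.0 if j >= i else -10.0 for j in range(3)] for i in range(3)]
emissions = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.5, 0.0, 0.0], [0.0, 0.0, 2.0]]
greedy = [max(range(3), key=lambda k: row[k]) for row in emissions]
print(greedy, viterbi(emissions, transitions))
```

Taking the per-paragraph argmax gives an out-of-order sequence (section 1, then back to 0), whereas joint decoding with the transition scores returns a consistent ordering.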

Page 23

Modeling: paragraph embedding

Page 24

Modeling: paragraph embedding

Source: A Structured Self-Attentive Sentence Embedding. Lin, Feng, dos Santos, Yu, Xiang, Zhou, Bengio. ICLR 2017.
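A NumPy sketch of the self-attentive pooling from Lin et al. (2017), applied here to turn the word-level BiLSTM states of one paragraph into a paragraph embedding: A = softmax(W_s2 tanh(W_s1 Hᵀ)) and M = A H. The slide does not say how many attention hops (r) Doctrine used or whether the paper's redundancy penalty was kept, so both are shown as options; all sizes and random weights are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def self_attentive_embedding(H, W_s1, W_s2):
    """Structured self-attention pooling as described by Lin et al. (2017).

    H: (n, 2u) BiLSTM hidden states, one row per word of the paragraph
    W_s1: (d_a, 2u) and W_s2: (r, d_a) learned weights; r is the number of attention hops
    Returns M = A H of shape (r, 2u) and the attention matrix A of shape (r, n).
    """
    A = softmax(W_s2 @ np.tanh(W_s1 @ H.T), axis=1)  # each row is a distribution over the n words
    return A @ H, A


def redundancy_penalty(A):
    """||A A^T - I||_F^2, the paper's penalty pushing the r hops to attend to different words."""
    return float(np.sum((A @ A.T - np.eye(A.shape[0])) ** 2))


rng = np.random.default_rng(0)
n_words, two_u, d_a, r = 12, 8, 5, 3  # hypothetical sizes
H = rng.normal(size=(n_words, two_u))
W_s1 = rng.normal(size=(d_a, two_u))
W_s2 = rng.normal(size=(r, d_a))
M, A = self_attentive_embedding(H, W_s1, W_s2)
paragraph_vector = M.reshape(-1)  # flattened (r * 2u,) embedding passed to the paragraph-level model
print(M.shape, A.shape, paragraph_vector.shape)
```

The attention matrix A is also what makes the word-level attention heatmaps shown later in the talk possible: each row says which words a hop focused on.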

Page 25

Modeling: all in one

Page 26

Modeling: all in one

Page 27

Modeling: all in one

end-to-end training

Page 28

Plan

● About Doctrine and legal research

● Motivations

● Modeling

● Results

● Further work

Page 29

Modeling: results

● Trained on 20,000 decisions

Page 30

Modeling: results

A BiLSTM paragraph encoder outperforms mean pooling but is more computationally expensive

Page 31

Modeling: results

A CRF output layer outperforms a softmax layer at the same computational cost
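One way to see why the CRF adds so little cost: both heads work on the same per-paragraph emission scores. A softmax normalizes each paragraph independently, while a linear-chain CRF normalizes over whole label sequences, computing log Z with the forward algorithm in O(n·K²); with K = 7 sections that is 49 extra terms per paragraph. A minimal pure-Python sketch with hypothetical scores (not Doctrine's code); with all transition scores at zero, the CRF loss reduces to the softmax loss.

```python
import math


def logsumexp(values):
    """log(sum(exp(v))) computed stably."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def path_score(emissions, transitions, labels):
    """Unnormalized score of one label sequence: emissions plus transitions between neighbours."""
    score = emissions[0][labels[0]]
    for t in range(1, len(labels)):
        score += transitions[labels[t - 1]][labels[t]] + emissions[t][labels[t]]
    return score


def log_partition(emissions, transitions):
    """log Z over all K^n label sequences, via the forward algorithm in O(n * K^2)."""
    k = len(emissions[0])
    alpha = list(emissions[0])
    for emission in emissions[1:]:
        alpha = [logsumexp([alpha[i] + transitions[i][j] for i in range(k)]) + emission[j]
                 for j in range(k)]
    return logsumexp(alpha)


def crf_nll(emissions, transitions, labels):
    """Negative log-likelihood of the gold sequence under a linear-chain CRF."""
    return log_partition(emissions, transitions) - path_score(emissions, transitions, labels)


def softmax_nll(emissions, labels):
    """Sum of per-paragraph cross-entropies: each paragraph is classified independently."""
    return sum(logsumexp(row) - row[y] for row, y in zip(emissions, labels))


emissions = [[1.0, 0.2, -0.5], [0.3, 0.8, 0.1], [-0.2, 0.4, 1.1]]  # hypothetical scores, 3 labels
labels = [0, 1, 2]
zero = [[0.0] * 3 for _ in range(3)]
# The two losses coincide (up to rounding) when all transition scores are 0.
print(crf_nll(emissions, zero, labels), softmax_nll(emissions, labels))
```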

Page 32

Modeling: results

Attention adds a few points of performance at a low computational cost

Page 33

Modeling: results

The CRF makes it possible to inspect the transition probabilities:

● Each class is most likely followed by itself

● Metadata -> Parties

● Metadata -> Composition

● Low triangle part: green

● High triangle part: red
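The slide reads the transition matrix learned by the CRF. The sketch below does not reproduce those learned parameters; as an assumption-labeled analogue, it estimates smoothed transition log-frequencies between consecutive paragraph labels from hypothetical labeled decisions (rows: previous section, columns: next section, in document order). This kind of table is a quick sanity check for the same patterns, such as sections continuing over several paragraphs.

```python
import math
from collections import Counter

SECTIONS = ["Metadata", "Parties", "Composition", "Facts", "Pleas", "Grounds", "Operative"]


def transition_log_probs(label_sequences, n_labels, smoothing=1.0):
    """Estimate log P(next label | previous label) from labeled decisions, with add-`smoothing` counts."""
    counts = Counter()
    for labels in label_sequences:
        counts.update(zip(labels, labels[1:]))  # consecutive paragraph pairs
    matrix = []
    for i in range(n_labels):
        row = [counts[(i, j)] + smoothing for j in range(n_labels)]
        total = sum(row)
        matrix.append([math.log(c / total) for c in row])
    return matrix


# Hypothetical label sequences (one list of section ids per decision).
decisions = [[0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 5, 6, 6], [0, 0, 1, 1, 3, 3, 5, 5, 6, 6]]
matrix = transition_log_probs(decisions, len(SECTIONS))
for name, row in zip(SECTIONS, matrix):
    print(f"{name:>12}", " ".join(f"{v:6.2f}" for v in row))
```

Transitions never observed in the data (for instance going back from Operative to Metadata) only receive the smoothing count, which plays a role similar to the strongly negative scores a CRF learns for implausible moves.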

Page 34

Modeling: attention

Page 35

Modeling: attention

Page 36

Product outcome

For the 45% of Court of Appeal decisions whose table of contents was incomplete, this approach now produces a complete one in 90% of cases

Page 37

Plan

● About Doctrine and legal research

● Motivations

● Modeling

● Results

● Further work

Page 38


Errors of the model

Page 39


Further work

- better paragraph / sentence splitting

- one of the tags is very rare and does not perform well

- play with optimizers, dropout, …

- try different architectures?

Page 40

Blog post is available

Paragraph classification: article by Doctrine

Page 41

Thank you for your attention!

Any questions?