Lecture2-Evaluationact.buaa.edu.cn/hsun/IR2016/slides/Lecture2-eval.pdf · Beihang = Þ ó 1 ; X >...


Beihang

1.  “

2. 

3. 

4.  Big Data “

5.  Google Baidu

6. 

1 0 2

2016-09-23

Beihang

•  ”

• 

• 

Beihang

( )

•  Evaluation– – 

Beihang

“ (1)

• – 

• • • • 

– • 

Beihang

“ (2)

–  Time vs. space
•  Indexing structures
•  Interaction with OS
•  Communication delays
•  Other overheads

–  Retrieval performance evaluation
•  Central notion in IR: Relevance

Beihang

Relevance
•  A document can be relevant to a query in several ways:
–  Answer precise question precisely.
–  Partially answer question.
–  Suggest a source for more information.
–  Give background information.
–  Remind the user of other knowledge.

Beihang

•  Relevance judgments
–  Binary: 0, 1
–  Graded: 0, 1, 2, 3, 4
–  1994, Stefano Mizzaro: relevance described along 4 dimensions

•  http://www.psy.gla.ac.uk/~steve/stefano.html

Beihang

•  Batch mode
•  Interactive retrieval

Beihang

•  ”

• 

• 

Beihang

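•  Recall (recall rate)
–  the fraction of the relevant documents that has been retrieved
•  Precision
–  the fraction of the retrieved documents that is relevant

Let C be the collection, R the set of relevant documents, A the answer set returned by the system, and Ra = R ∩ A the relevant documents in the answer set. Then

$$\text{Recall} = \frac{|R_a|}{|R|}, \qquad \text{Precision} = \frac{|R_a|}{|A|}$$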

Beihang

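Precision-recall curve
•  Example: query q
–  Relevant documents: Rq = {d3, d5, d9, d25, d44, d56, d71, d89, d123, d23}
–  Ranked answer returned by the system:

{ d123, d84, d56, d6, d8, d9, d511, d129, d187, d25, d38, d48, d250, d113, d3 }

•  Precision is reported at the 11 standard recall levels: 0%, 10%, 20%, ..., 90%, 100%

[Figure: precision versus recall for query q]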
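The precision and recall values behind this example can be recomputed from the two lists on the slide. The following is a minimal sketch; the function name `precision_recall_points` is an illustrative choice, not something from the lecture.

```python
def precision_recall_points(ranking, relevant):
    """Return (recall, precision) measured at each relevant document in the ranking."""
    relevant = set(relevant)
    points = []
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            # recall: share of all relevant docs seen so far
            # precision: share of the docs seen so far that are relevant
            points.append((hits / len(relevant), hits / rank))
    return points


# Data copied from the slide.
relevant_q = ["d3", "d5", "d9", "d25", "d44", "d56", "d71", "d89", "d123", "d23"]
ranking_q = ["d123", "d84", "d56", "d6", "d8", "d9", "d511", "d129",
             "d187", "d25", "d38", "d48", "d250", "d113", "d3"]

for recall, precision in precision_recall_points(ranking_q, relevant_q):
    print(f"recall={recall:.0%}  precision={precision:.1%}")
```

Only five of the ten relevant documents appear in the ranking, so the observed recall stops at 50%.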

Beihang

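A problem

•  The recall levels observed for a query may not coincide with the 11 standard recall levels
–  Rq = {d3, d56, d129}
–  Ranked answer returned by the system:

{ d123, d84, d56, d6, d8, d9, d511, d129, d187, d25, d38, d48, d250, d113, d3 }

Recall: 33.3%, 66.7%, 100%
Precision: 33.3%, 25%, 20%

[Figure: precision versus recall for this query]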

Beihang

Interpolation

•  Let r_j, j = 0, 1, …, 10, denote the j-th standard recall level. The interpolated precision at r_j is

$$P(r_j) = \max_{r_j \le r \le r_{j+1}} P(r)$$

[Figures: precision versus recall plots for the two example queries]
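A minimal sketch of interpolation at the 11 standard recall levels follows. It assumes the widely used convention of taking the maximum precision at any observed recall greater than or equal to r_j, which is broader than the interval in the formula above and also covers levels whose interval contains no observed point. The function name and the default of 0.0 for levels beyond the last observed recall are illustrative assumptions.

```python
def interpolated_precision(points, levels=11):
    """points: observed (recall, precision) pairs.
    Returns the interpolated precision at recall 0.0, 0.1, ..., 1.0."""
    result = []
    for j in range(levels):
        r_j = j / (levels - 1)
        # Assumed convention: best precision at any observed recall >= r_j.
        candidates = [p for r, p in points if r >= r_j - 1e-9]
        result.append(max(candidates, default=0.0))
    return result


# Observed points for Rq = {d3, d56, d129} from the "A problem" slide.
observed = [(1 / 3, 1 / 3), (2 / 3, 0.25), (1.0, 0.20)]

for j, p in enumerate(interpolated_precision(observed)):
    print(f"recall={j * 10}%  interpolated precision={p:.1%}")
```

Under this convention the query gets 33.3% at recall levels 0% to 30%, 25% at 40% to 60%, and 20% at 70% to 100%.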

Beihang

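•  Average Precision
–  To evaluate over a set of queries, average the precision at each recall level:

$$\bar{P}(r) = \sum_{i=1}^{N_q} \frac{P_i(r)}{N_q}$$

where N_q is the number of queries and P_i(r) is the precision at recall level r for the i-th query.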

Beihang

ROC/AUC

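•  TPR: True Positive Rate (Recall/Sensitivity)
•  FPR: False Positive Rate (Fall-out)
•  FNR: False Negative Rate (Miss rate)
•  TNR: True Negative Rate (Specificity)

•  Precision vs. Accuracy
•  ROC: Receiver Operating Characteristic curve
•  AUC: Area Under the Curve

[Figure: example with the values 63, 37 and 72, 28]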
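The four rates, and the difference between precision and accuracy, follow directly from the cells of a confusion matrix. The sketch below uses hypothetical counts chosen for illustration, not the values from the slide's figure, and the function name `rates` is an assumption.

```python
def rates(tp, fn, fp, tn):
    """Rates derived from confusion-matrix counts.
    tp/fn: relevant items that were / were not retrieved.
    fp/tn: non-relevant items that were / were not retrieved."""
    return {
        "TPR": tp / (tp + fn),        # recall, sensitivity
        "FNR": fn / (tp + fn),        # miss rate = 1 - TPR
        "FPR": fp / (fp + tn),        # fall-out
        "TNR": tn / (fp + tn),        # specificity = 1 - FPR
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# Hypothetical counts, for illustration only.
for name, value in rates(tp=40, fn=10, fp=20, tn=30).items():
    print(f"{name}: {value:.3f}")
```

An ROC curve plots TPR against FPR as the decision threshold varies, and AUC is the area under that curve.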

Beihang

• 

•  “

•  “

Beihang

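•  P@5/P@10/P@N
–  Precision computed over the top 5/10/N results

•  R-precision
–  Precision after R documents have been retrieved, where R is the number of relevant documents for the query

$$\text{R-Precision} = \frac{\text{number of relevant documents among the top } R \text{ results}}{R}$$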
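Both measures look only at a prefix of the ranking. Here is a minimal sketch, reusing the ranking and relevant set of the earlier precision-recall example; the function names are illustrative.

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k results that are relevant."""
    relevant = set(relevant)
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def r_precision(ranking, relevant):
    """Precision at cutoff R, where R is the number of relevant documents."""
    return precision_at_k(ranking, relevant, len(set(relevant)))


# Data from the earlier precision-recall example.
relevant_q = ["d3", "d5", "d9", "d25", "d44", "d56", "d71", "d89", "d123", "d23"]
ranking_q = ["d123", "d84", "d56", "d6", "d8", "d9", "d511", "d129",
             "d187", "d25", "d38", "d48", "d250", "d113", "d3"]

print(precision_at_k(ranking_q, relevant_q, 5))   # 2 relevant in the top 5  -> 0.4
print(precision_at_k(ranking_q, relevant_q, 10))  # 4 relevant in the top 10 -> 0.4
print(r_precision(ranking_q, relevant_q))         # R = 10                   -> 0.4
```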

Beihang

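•  MAP (Mean Average Precision)

–  AP: for one query, the average of the precision values measured at the rank r_i of each relevant document
–  MAP: the mean of AP over all queries

•  A(q1): d1,d2,d3,d4,d5

•  A(q2): d1,d3,d4,d2,d5

$$\mathrm{AP}(q) = \frac{1}{|R_q|} \sum_{i=1}^{|R_q|} P(r_i), \qquad \mathrm{MAP} = \frac{1}{|Q|} \sum_{q \in Q} \mathrm{AP}(q)$$

where R_q is the set of relevant documents for query q, P(r_i) is the precision at the rank of the i-th relevant document (0 if that document is not retrieved), and Q is the set of queries.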
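AP and MAP can be sketched for the two rankings A(q1) and A(q2). The relevant sets did not survive in this transcript, so the set `{d1, d2}` used for both queries below is a hypothetical choice for illustration, as are the function names.

```python
def average_precision(ranking, relevant):
    """Mean of the precision values at the rank of each relevant document.
    Relevant documents that are never retrieved contribute 0."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs):
    """runs: list of (ranking, relevant) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


answer_q1 = ["d1", "d2", "d3", "d4", "d5"]
answer_q2 = ["d1", "d3", "d4", "d2", "d5"]
assumed_relevant = {"d1", "d2"}  # hypothetical: not given in the transcript

print(average_precision(answer_q1, assumed_relevant))  # (1/1 + 2/2) / 2 = 1.0
print(average_precision(answer_q2, assumed_relevant))  # (1/1 + 2/4) / 2 = 0.75
print(mean_average_precision([(answer_q1, assumed_relevant),
                              (answer_q2, assumed_relevant)]))  # 0.875
```

The two rankings contain the same documents, but moving d2 from rank 2 to rank 4 lowers AP, which is the order sensitivity that MAP is meant to capture.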

Beihang

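•  F measure
–  the harmonic mean of precision p and recall r

$$F = \frac{2}{\frac{1}{r} + \frac{1}{p}} = \frac{2pr}{p + r}$$

•  E measure

$$E = 1 - \frac{1 + b^2}{\frac{b^2}{r} + \frac{1}{p}}$$

–  b = 1: E = 1 − F, so E is the complement of F
–  b > 1: recall r carries more weight than precision p (E → 1 − r as b grows)
–  b < 1: precision p carries more weight than recall r (E → 1 − p as b → 0)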
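A small numeric check makes the role of b concrete. The sketch below implements F as 2pr/(p + r) and E as 1 − (1 + b²)/(b²/r + 1/p), assumes p and r are both strictly positive, and uses hypothetical values p = 0.5 and r = 0.25.

```python
def f_measure(p, r):
    """Harmonic mean of precision p and recall r (both assumed > 0)."""
    return 2 * p * r / (p + r)


def e_measure(p, r, b=1.0):
    """E = 1 - (1 + b^2) / (b^2 / r + 1 / p), with p and r assumed > 0."""
    return 1 - (1 + b * b) / (b * b / r + 1 / p)


p, r = 0.5, 0.25  # hypothetical values for illustration

print(f_measure(p, r))         # 1/3
print(e_measure(p, r, b=1.0))  # 2/3, equal to 1 - F
print(e_measure(p, r, b=2.0))  # 1 - 5/18, closer to 1 - r = 0.75
print(e_measure(p, r, b=0.5))  # 1 - 1.25/3, closer to 1 - p = 0.5
```

Because recall is the weaker of the two values here, raising b makes E larger (worse) and lowering b makes it smaller.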

Beihang

•  Discounted Cumulated Gain
–  CG (Cumulated Gain)
–  DCG (Discounted Cumulated Gain)
–  NDCG (Normalized Discounted Cumulated Gain)

•  BPREF
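The CG, DCG and NDCG formulas did not survive in this transcript, so the sketch below uses one commonly seen variant as an assumption: the gain at rank i is divided by log2(i + 1), and NDCG divides DCG by the DCG of the ideal (descending) ordering. Other variants use a different discount or an exponential gain. The graded judgments are hypothetical.

```python
import math


def cg(gains):
    """Cumulated gain: plain sum of the graded relevance values."""
    return sum(gains)


def dcg(gains):
    """Discounted cumulated gain with the assumed log2(rank + 1) discount."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))


def ndcg(gains):
    """DCG normalised by the DCG of the ideal ordering of the same gains."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


# Hypothetical graded judgments (0-3) for one ranked list.
gains = [3, 2, 3, 0, 1, 2]
print(cg(gains), round(dcg(gains), 3), round(ndcg(gains), 3))
```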

Beihang

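•  Coverage
–  C = |Rk| / |U|

•  Novelty
–  novelty = |Ru| / (|Rk| + |Ru|)

•  Relative recall
–  the ratio of the number of relevant documents found to the number of relevant documents the user expected to find

•  Recall effort
–  the ratio of the number of relevant documents the user expected to find to the number of documents examined in order to find them

Here U is the set of relevant documents known to the user, Rk the relevant documents in the answer set A that the user already knew, Ru the relevant documents in A that were previously unknown to the user, and R the set of all relevant documents.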
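Coverage and novelty reduce to set operations once U, Rk and Ru are identified. This is a minimal sketch with hypothetical document sets; the function name and the convention of returning 0.0 for an empty denominator are assumptions.

```python
def coverage_and_novelty(answer, relevant, known):
    """answer: documents returned by the system.
    relevant: all relevant documents.
    known: documents the user already knew about."""
    answer, relevant, known = set(answer), set(relevant), set(known)
    u = relevant & known               # U: relevant documents known to the user
    rk = answer & u                    # Rk: retrieved, relevant, already known
    ru = (answer & relevant) - known   # Ru: retrieved, relevant, new to the user
    coverage = len(rk) / len(u) if u else 0.0
    found = len(rk) + len(ru)
    novelty = len(ru) / found if found else 0.0
    return coverage, novelty


# Hypothetical sets, for illustration only.
relevant = {"d1", "d2", "d3", "d4", "d5", "d6"}
known = {"d1", "d2", "d3", "d4"}
answer = ["d1", "d2", "d5", "d9"]

print(coverage_and_novelty(answer, relevant, known))  # coverage 2/4, novelty 1/3
```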

Beihang

•  ”

• 

• 

Beihang

TREC
•  TREC
–  Text REtrieval Conference

•  Sponsors
–  NIST (National Institute of Standards and Technology)
–  U.S. Department of Defense

•  History
–  1992~2012: 21 editions

Beihang

TREC

•  ““

–  “

– 

  “

  “ “

Beihang

•  Track –  TREC

•  Topic –  “

–  topic → query
–  Question (QA)

•  Document – 

•  Relevance Judgments – 

Beihang

TREC

•  TREC ()

•  TREC–  : NIST–  :–  :

NIST –  : NIST

– 

Beihang

TREC

• –  GB– –  SGML (Standard Generalized Markup Language)

•  Topic – –  SGML

• – 

Beihang

Topic

•  Title

•  Description TitleTitle

•  Narrative

Beihang

Topic

<topic number="2" type="diagnosis">
<description>
A 62 yo male presents with four days of non-productive cough and one day of fever. He is on immunosuppressive medications, including prednisone. He is admitted to the hospital, and his work-up includes bronchoscopy with bronchoalveolar lavage (BAL). BAL fluid examination reveals owl's eye inclusion bodies in the nuclei of infection cells.
</description>
<summary>
A 62-year-old immunosuppressed male with fever, cough and intranuclear inclusion bodies in bronchoalveolar lavage
</summary>
</topic>
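A topic in this format can be turned into a query string with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` on the topic shown above. Using the `<summary>` field as the query is an illustrative choice, not something the slide prescribes.

```python
import xml.etree.ElementTree as ET

topic_xml = """<topic number="2" type="diagnosis">
<description>
A 62 yo male presents with four days of non-productive cough and one day of
fever. He is on immunosuppressive medications, including prednisone. He is
admitted to the hospital, and his work-up includes bronchoscopy with
bronchoalveolar lavage (BAL). BAL fluid examination reveals owl's eye
inclusion bodies in the nuclei of infection cells.
</description>
<summary>
A 62-year-old immunosuppressed male with fever, cough and intranuclear
inclusion bodies in bronchoalveolar lavage
</summary>
</topic>"""

topic = ET.fromstring(topic_xml)

# Attributes of the <topic> element.
number = topic.get("number")
topic_type = topic.get("type")

# Collapse line breaks and extra whitespace in the field text.
query = " ".join(topic.findtext("summary").split())

print(number, topic_type)
print(query)
```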

Beihang

Topic

•  Topic

• 

Beihang

• –  Set Precision/Set Recall

• – P@n/Average Precision/Reciprocal Rank

• – Filtering Utility

Beihang

(1)

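•  For each topic, NIST takes the top 100 documents from each submitted result list

•  Pooling
–  the top n results of every run are merged into one pool, and only the pooled documents are judged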
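Pooling can be sketched as a union over the top-n prefix of every submitted run. The run names and document ids below are hypothetical, and treating documents outside the pool as unjudged (and therefore non-relevant at scoring time) is the usual convention rather than something stated on the slide.

```python
def build_pool(runs, depth):
    """Merge the top-`depth` documents of every run into one judging pool."""
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:depth])
    return pool


# Hypothetical runs for a single topic.
runs = {
    "run_a": ["d1", "d2", "d3", "d7"],
    "run_b": ["d2", "d4", "d5", "d1"],
    "run_c": ["d6", "d2", "d1", "d8"],
}

pool = build_pool(runs, depth=2)
print(sorted(pool))  # ['d1', 'd2', 'd4', 'd6']
```

Assessors then judge only the documents in `pool`, which is why the pool depth n bounds the judging effort per topic.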

Beihang

(2)

•  NIST evaluates the submitted results with trec_eval (precision, recall)

•  track

Beihang

TREC

•  Ad hoc – 

•  Information Routing – 

Beihang

TREC 2016 Tracks
•  Clinical Decision Support Track
•  Contextual Suggestion Track
•  Dynamic Domain Track
•  Live QA Track
•  OpenSearch Track
•  Real-Time Summarization Track
•  Tasks Track
•  Total Recall Track

Beihang

NTCIR

•  NII Test Collection for IR Systems –  NII (National Institute of Informatics)

–  1998–  “

– • •  “

• •  “

Beihang

CLEF

•  Cross-Language Evaluation Forum –  2000– 

– •  “

•  “

•  “

• •  “

•  “

•  “

•  “

Beihang

User-Based Evaluation

•  Human experimentation in the lab
•  Side-by-side panels
•  A/B testing
•  Crowdsourcing
•  Using clickthrough data

Beihang

Q&A