Construction and Evaluation of a Corpus Tagged with Location Reference Expressions

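Koji Matsuda*, Akira Sasaki*, Naoaki Okazaki†*, Kentaro Inui*
*) Tohoku University, †) Japan Science and Technology Agency (JST), PRESTO
220th IPSJ SIG Natural Language Processing (NL) meeting
Kyushu University School of Medicine, Centennial Hall, 2015/01/20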

Transcript of Construction and Evaluation of a Corpus Tagged with Location Reference Expressions

  • ()()

    SNS(Twitter)

    NLP J[,

    2013] L

    Fig. DISAANA (NICT)

    Fig. HotLocation Finder

  • a) [] b) [()] c) [()]

    [] [] Web()

    ()

  • /

    a) b) c) / () /

    a) //

    Web

  • 38.00 140.62

    Web

    38.26 140.87

    38.00 140.62

    38.00 140.62

    SNS
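    The number pairs on this slide look like coordinates attached to location expressions. As a rough sketch, assuming each pair is (latitude, longitude) in decimal degrees, the distance between two such points can be estimated with the haversine formula. The function name `haversine_m` and the spherical-Earth radius are illustrative choices, not something taken from the slides.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# The two coordinate pairs shown on the slide, read as (latitude, longitude).
point_a = (38.26, 140.87)
point_b = (38.00, 140.62)
distance = haversine_m(*point_a, *point_b)
print(f"{distance / 1000:.1f} km")
```

    A distance of this kind is one way to quantify how far a resolved coordinate lies from a reference coordinate; whether the authors used exactly this measure is not recoverable from the transcript.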

  • /

    Web

  • /

  • [, 2004], [, 2008]

    ()

    a)

    Web 15 Yahoo!(+CGM) 500

  • 1. Mention Detection: a) 3 b)

    2. Entity Resolution :() a) /3 b) /JR c) /

    Mention Detection

    Entity Resolution

    Entity Linking(TAC KBP)
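    The two annotation steps named on this slide, Mention Detection followed by Entity Resolution, can be pictured as one record per mention. The sketch below is only an illustration: the field names, the `resolve` helper, and the entry ID `loc:0001` are hypothetical, and only the category labels come from the slides.

```python
from dataclasses import dataclass
from typing import Optional

# Category labels as they appear on the slides.
CATEGORIES = {"LOC", "FAC", "ORG", "RAIL", "ROAD"}


@dataclass(frozen=True)
class Mention:
    """Step 1 (Mention Detection): a character span plus a category."""
    start: int                       # inclusive character offset
    end: int                         # exclusive character offset
    category: str                    # one of CATEGORIES
    entity_id: Optional[str] = None  # Step 2 (Entity Resolution); None if unresolved

    def __post_init__(self):
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")
        if not 0 <= self.start < self.end:
            raise ValueError("span must satisfy 0 <= start < end")

    def surface(self, text: str) -> str:
        """Return the text covered by this mention."""
        return text[self.start:self.end]


def resolve(mention: Mention, entity_id: str) -> Mention:
    """Return a copy of the mention linked to a (hypothetical) database entry ID."""
    return Mention(mention.start, mention.end, mention.category, entity_id)


text = "Heavy snow in Sendai today"
m = Mention(start=14, end=20, category="LOC")  # step 1: detect the span
linked = resolve(m, "loc:0001")                # step 2: made-up entry ID
print(linked.surface(text), linked.category, linked.entity_id)
```

    Keeping `entity_id` optional lets a mention be recorded even when no matching database entry exists.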

  • NER ()

    ()

    ( / )

    DB

    DB

    UI UI

  • a) b)

    / a) D /

    () a) b)

  • Twitter(1000)

    Bot

    / () /

    1000 105

  • /

  • (1)

    (1000)

    (1000)

    En En

    LOC    977   901 (92%)   36   29 (80%)

    FAC    356   286 (80%)   88   32 (36%)

    RAIL    61   2

    ROAD     7   1

    J 2 L

  • (1/2)

    [Figure: YES/NO proportions for Mention Detection and Entity Resolution; axis from 0% to 100%]

    7000 389

  • (2/2)

    1 () a) b)

    2 a) b)

    90%

    10%

  • F-measure by category:

    LOC 93.3

    FAC 77.9

    ORG 83.3

    RAIL 72.0

    ROAD 50.0

    (NER)
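    The per-category F scores above are reported without their precision and recall. As a reminder of how such a score is typically derived, here is a minimal sketch that assumes exact-match scoring over (start, end, category) spans. The scoring scheme and the toy spans are assumptions for illustration; the transcript does not say how the authors matched predictions to gold annotations.

```python
def precision_recall_f1(gold, predicted):
    """Exact-match precision, recall, and F1 over sets of (start, end, category)."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data: the second span has the right boundaries but the wrong category,
# so it counts as both a false positive and a false negative.
gold = {(0, 6, "LOC"), (10, 14, "FAC"), (20, 25, "LOC")}
pred = {(0, 6, "LOC"), (10, 14, "LOC"), (20, 25, "LOC")}
p, r, f = precision_recall_f1(gold, pred)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")
```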

    Cohen's κ = 0.892
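    Cohen's kappa compares the observed agreement between two annotators with the agreement expected by chance. The sketch below shows the standard two-annotator computation on toy labels; which annotation decisions the authors actually compared is not recoverable from the transcript, so the label lists are purely illustrative.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the annotators' marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


annotator_1 = ["LOC", "LOC", "FAC", "FAC"]
annotator_2 = ["LOC", "LOC", "FAC", "LOC"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```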

    (m)

    1648 0 72101

    J80% ()515

    (2)

    2002

  • a) b)

    a)

    b)

  • /

  • 200250 (/)

    20 (11%)    13 (18%)    33 (15%)
    126 (71%)   0 (0%)      126 (50%)
    20 (11%)    7 (10%)     27 (10%)
    0 (0%)      21 (29%)    21 (8%)
    3 (2%)      21 (29%)    24 (10%)
    10 (6%)     17 (23%)    27 (11%)
    1 (0.5%)    4 (6%)      5 (2%)

  • 78

    (: ) 3 3

    (1)

    20 (11%)    13 (18%)    33 (15%)
    126 (71%)   0 (0%)      126 (50%)
    20 (11%)    7 (10%)     27 (10%)
    0 (0%)      21 (29%)    21 (8%)
    3 (2%)      21 (29%)    24 (10%)
    10 (6%)     17 (23%)    27 (11%)
    1 (0.5%)    4 (6%)      5 (2%)

  • a) / /

    / b)

    ()

    a)

  • : SNS(Twitter) : Web 2

    1.

    2.

    1. ()

    2.

  • 8000

  • 38.00 140.62

    Web

    38.26 140.87

    38.00 140.62

    38.00 140.62

    SNS

  • TR-CoNLL() [Leidner2007] CoNLL-2003()

    : :

    CWar()[Crane, 2000] 340

    :(19)

  • TR-CoNLL / CWar / LGL (columns: #doc, #token, #toponym, domain)

    [EN] TR-CoNLL (Leidner 2008): #doc 1000, #token 200000, #toponym 6000, domain Reuters International News

    CoNLL2003 (NER)

    [EN] CWar (Crane 2000) (Speriosu 2013): #doc 341, #token 58mil, #toponym 232000, domain OCRed Books (About US Civil War)

    OCR+NER+

    [EN] LGL (Lieberman 2010): #doc 588, #token 213000, #toponym 5088, domain News Articles (Localized news sources)

    [JA]

    1000 ()

    3 36()/ 88()

    SNS(Twitter)

  • DB

    NE

  • a) ()

    b)

    c) ()

    d) TDL

  • SNS

    Twitter

    Twitter
