HYP Progress Update
By Zhao Jin

Outline
• Background
• Progress update
Background
• Query (text-based)
– The set of keywords entered into the system to retrieve the desired information or resources
– Main categories
  • Traditional IR
  • Web (e.g. Google)
  • OPAC (e.g. LINC)
  • Video (e.g. TRECVID)
Background
• Query analysis
– To analyze the patterns and hidden information in the queries
– To classify and support such queries efficiently
Progress update
• Mid-May to early June
– Background reading
– Around 30 to 40 papers on various topics
– Summarizing the key points of each paper
Progress update
• Mid-June to late June
– Log analysis
  • BBC video queries
  • NUS OPAC queries
– Background reading on OPAC and TRECVID
Progress update
• July to now
– Follow-up on two main topics
  • Classifying queries and dividing them into content-based and feature-based keywords (OPAC)
  • Identifying ASR-oriented keywords in a video query (TRECVID)
– Background reading on MARC, WordNet and LOC subject headings
Progress update
• Plan for the near future
– Refine and experiment with the current ideas
– Log analysis
– Background reading (textbook and related papers)
– Preparation for implementation
Two types of keywords
• Content-Based Keyword (CBK)
– A keyword that concerns what the item is about
– E.g. title, subject heading
• Feature-Based Keyword (FBK)
– A keyword that concerns a feature of the item
– E.g. author, publisher, genre, medium
Possible implementation
• Possible implementation:
– Term co-occurrence for concept division
– List of special words and machine learning for FBK and CBK division
– WordNet for classification among CBKs
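
The first two ideas above can be sketched roughly as follows. This is a minimal illustration only, not the planned implementation: the cue-word list `FBK_CUES` and the function names are hypothetical, a real list would come from the log analysis, and the machine learning step is left out entirely.

```python
from collections import Counter
from itertools import combinations

# Hypothetical cue words that hint at a feature-based keyword (FBK).
# This list is an assumption made for illustration only.
FBK_CUES = {"by", "author", "publisher", "edition", "dvd", "cd", "video", "journal"}


def split_keywords(query):
    """Label each token FBK if it is a cue word, otherwise CBK."""
    return [
        (token, "FBK" if token in FBK_CUES else "CBK")
        for token in query.lower().split()
    ]


def cooccurrence(queries):
    """Count how often two distinct terms appear in the same query."""
    counts = Counter()
    for query in queries:
        terms = sorted(set(query.lower().split()))
        counts.update(combinations(terms, 2))
    return counts


if __name__ == "__main__":
    print(split_keywords("machine learning dvd"))
    log = ["machine learning", "machine learning dvd"]
    print(cooccurrence(log).most_common(1))
```

Terms that co-occur often across the query log (here "learning" and "machine") are candidates for being grouped into a single concept before the FBK/CBK split is applied.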
Possible implementation
• Possible implementation:
– CL and IL search algorithms for actual searching with CBKs
– List of special words and machine learning for classification among FBKs
– MARC record search algorithms for actual searching with FBKs
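
Searching with an FBK once it has been classified could look something like the sketch below. It assumes each record is a plain dictionary keyed by MARC tag (100 for the personal-name main entry, 260 for publication information) with the field flattened to one string. The records, the mapping `FBK_TO_MARC_TAG` and the function name are all made up for illustration; real MARC fields have subfields that this ignores.

```python
# Assumed mapping from FBK type to MARC tag (a simplification).
FBK_TO_MARC_TAG = {
    "author": "100",     # main entry, personal name
    "publisher": "260",  # publication, distribution, etc.
}


def search_by_fbk(records, fbk_type, value):
    """Return records whose field for the given FBK type contains value."""
    tag = FBK_TO_MARC_TAG[fbk_type]
    needle = value.lower()
    return [r for r in records if needle in r.get(tag, "").lower()]


if __name__ == "__main__":
    # Hypothetical records, not taken from any real catalogue.
    records = [
        {"100": "Doe, Jane", "245": "An example title", "260": "Example Press"},
        {"100": "Roe, Richard", "245": "Another title", "260": "Sample House"},
    ]
    print(search_by_fbk(records, "author", "doe"))
```

Restricting the match to one field is the point of the FBK classification: a query term tagged as an author is only compared against author data rather than against the whole record.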
Means to retrieve shots
• Example:
– To find shots of “Bill Clinton”
  • Face recognition
  • Closed captions
  • Automatic Speech Recognition (ASR)
Metrics
• Common vs. special (in reality)
– How common in reality the concept represented by the keyword is
• Generic vs. specific
– How generic the concept represented by the keyword is
Metrics
• Concrete vs. abstract
– Whether the concept represented by the keyword is concrete or abstract
• Topic frequency (low vs. high)
– How often the keyword becomes, or is closely related to, a topic
Metrics
• Formal vs. informal
– Whether the keyword is in formal or informal language
• Written vs. spoken
– Whether the keyword is in written or spoken language