The growing pains of a controlled vocabulary
Karen Loasby, 7 March 2005
Introduction
• Karen Loasby
• Information architect
• Worked for the BBC for 4 years on search, navigation, metadata and content management projects
• Previously worked for 2 years at the Guardian newspaper, archiving the paper and arranging content on the website
• MSc in Information Science from City University, London
Agenda
• Background
• The problem
• Formal classification vs. Folk tags
• Our middle ground
• What happened
• Learning points
• Questions
Background
• Content management project
• Regional websites
• Need for metadata
• Authors around the UK
Problem
• Faceted classification system
• Authors to tag
• Central control
• But …
• Journalists are the specialists – know the domain and the vocabulary.
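A faceted scheme like this can be pictured as each content object carrying terms grouped by facet, with every term checked against a centrally controlled list. The sketch below is a hypothetical illustration (Python 3.9 or later), not the BBC system: the facet names follow the presentation's charts, but `CONTROLLED_VOCABULARY`, `ContentObject`, `validate` and all sample terms are assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical central vocabulary: facet -> approved terms.
# Facet names follow the presentation; the terms are illustrative only.
CONTROLLED_VOCABULARY: dict[str, set[str]] = {
    "Name": {"Leicester City FC"},
    "Location": {"Leicester", "Cumbria"},
    "Subject": {"Pregnancy", "Flooding"},
    "BBC Brand": {"BBC Leicester"},
    "Time Period": {"1990s"},
}


@dataclass
class ContentObject:
    """A piece of content tagged by its author, with terms grouped by facet."""
    title: str
    tags: dict[str, list[str]] = field(default_factory=dict)


def validate(obj: ContentObject) -> list[tuple[str, str]]:
    """Return the (facet, term) pairs that the central vocabulary does not contain."""
    problems = []
    for facet, terms in obj.tags.items():
        allowed = CONTROLLED_VOCABULARY.get(facet, set())
        problems.extend((facet, term) for term in terms if term not in allowed)
    return problems


story = ContentObject("Antenatal classes", {"Subject": ["Pregnancy", "Babies"], "Location": ["Leicester"]})
print(validate(story))  # [('Subject', 'Babies')]
```

Authors do the tagging, but only the central list decides which terms are valid; an unknown term such as "Babies" is flagged rather than silently accepted.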
Formal classification
• Pre-determined terms
• Centralised control
• Rich relationships
Folk tags
• What is it, then?
• Folksonomy, ethnoclassification, social classification, social categorisation and so on
Comparing approaches
• Formal
– High maintenance
– Consistent/predictable
– Rich relationships
– Can be artificial
• Folk
– Low maintenance
– Quirky/surprising
– Less added value
– Real user language
A role for both
• Where we are using folk tagging
• And where we won’t:
– Trust & Authority
– High value to business
– Missing motivation from users
– Broad domain/user base
– To avoid the tyranny of the minority
An experimental middle ground
• Centralised control of terms
• But encouraging absorption of user language
• Higher maintenance than folk tags
• Cheaper than professional cataloguing
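One way to keep central control while absorbing user language is to hold a single preferred term per concept and record users' wording as variant terms that resolve to it. The minimal sketch below assumes a simple in-memory, case-insensitive mapping; the `Vocabulary` class and its methods are hypothetical.

```python
from typing import Optional


class Vocabulary:
    """Hypothetical controlled vocabulary: centrally chosen preferred terms,
    plus variant terms that capture users' own wording."""

    def __init__(self) -> None:
        self.preferred: set[str] = set()
        self._lookup: dict[str, str] = {}  # case-folded term -> preferred term

    def add_preferred(self, term: str) -> None:
        self.preferred.add(term)
        self._lookup[term.casefold()] = term

    def add_variant(self, variant: str, preferred: str) -> None:
        # Variants may only point at an existing preferred term, so the
        # central team keeps control of which terms are authoritative.
        if preferred not in self.preferred:
            raise KeyError(f"unknown preferred term: {preferred!r}")
        self._lookup[variant.casefold()] = preferred

    def resolve(self, term: str) -> Optional[str]:
        """Map a user's term to its preferred form, or None if it is unknown."""
        return self._lookup.get(term.casefold())


cv = Vocabulary()
cv.add_preferred("Pregnancy")
cv.add_variant("expecting a baby", "Pregnancy")
print(cv.resolve("Expecting a baby"))  # Pregnancy
print(cv.resolve("Flooding"))          # None
```

Users can search with their own words, while content is still indexed under one consistent term.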
BBC Experience
Workflow (flowchart):
• Semi-automatic classification: terms are suggested from the CVs (controlled vocabularies)
– If the terms are OK, they are accepted
– If the suggested terms do not describe the content, search or browse for terms
• Search or browse for terms
– If the terms found are OK, they are accepted
– If not, send a suggestion to the CV team
• The CV team evaluates the suggestion and either:
– Says no to the term and changes the classification on the content object, or
– Adds it to the CV as a variant term or preferred term
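The decision points in this workflow can be sketched as a small function. This is an illustrative reconstruction of the flowchart, not BBC code: the `Outcome` enum, the `classify` function, its boolean inputs and the decision labels are hypothetical, and the CV team's judgement is passed in rather than modelled.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    SUGGESTED_TERMS_ACCEPTED = "suggested terms are OK"
    FOUND_BY_SEARCH = "terms found by search or browse are OK"
    REJECTED_RECLASSIFIED = "CV team says no; content object reclassified"
    ADDED_AS_VARIANT = "added to CV as a variant term"
    ADDED_AS_PREFERRED = "added to CV as a preferred term"


def classify(suggested_ok: bool, search_ok: bool,
             cv_team_decision: Optional[str] = None) -> Outcome:
    """Walk the flowchart: automatic suggestions, then search/browse, then the CV team.

    cv_team_decision is one of "reject", "variant" or "preferred" (hypothetical
    labels) and is only consulted once both earlier steps have failed.
    """
    if suggested_ok:
        return Outcome.SUGGESTED_TERMS_ACCEPTED
    if search_ok:
        return Outcome.FOUND_BY_SEARCH
    # Neither route produced suitable terms, so a suggestion goes to the CV team.
    decisions = {
        "reject": Outcome.REJECTED_RECLASSIFIED,
        "variant": Outcome.ADDED_AS_VARIANT,
        "preferred": Outcome.ADDED_AS_PREFERRED,
    }
    if cv_team_decision not in decisions:
        raise ValueError("a suggestion needs a CV team decision: reject, variant or preferred")
    return decisions[cv_team_decision]


print(classify(False, False, "variant").value)  # added to CV as a variant term
```

Only the last branch generates work for the CV team, which is where the request volumes in the following slides come from.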
Operational system
• 8,000 requests in 10 months
• From 160 journalists
– Average of 50 terms per user
– However, this varied wildly: our top user suggested 476 terms
[Graph showing variation between teams (y-axis 0 to 800). Teams shown: Cumbria, Tyne, Cambridgeshire, Guernsey, Leicester, South Yorkshire, Wiltshire, Suffolk, Liverpool, Manchester, Berkshire, Bristol, Kent, Coventry, Tees, Jersey, Stoke & Staffs, Nottingham, Derby, Humber, Somerset, Northamptonshire, Norfolk, Beds, Bucks & Herts, Leeds, Hereford & Worcs, Birmingham]
Growth in the CVs
• Up 15,000 terms in 10 months
• Most growth in person/proper names
– People, venues and organisations
– Up by 50% to 35,000
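The claim that most growth came from names can be sanity-checked with the slide's figures. This assumes that the 35,000 figure counts only the name terms (people, venues and organisations) and that the 50% rise is measured against their size at the start of the 10 months; the variable names are illustrative.

```python
total_growth = 15_000      # terms added across all CVs in 10 months
names_now = 35_000         # name terms after the growth (assumed to be names only)
names_growth_rate = 0.50   # "up by 50%", assumed relative to the starting size

names_before = names_now / (1 + names_growth_rate)
names_added = names_now - names_before
share_of_growth = names_added / total_growth

print(f"names before: about {names_before:,.0f}")              # names before: about 23,333
print(f"names added: about {names_added:,.0f}")                # names added: about 11,667
print(f"share of total growth: about {share_of_growth:.0%}")   # share of total growth: about 78%
```

Under these assumptions, names account for roughly three-quarters of all new terms, which agrees with the slide.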
Growth of facets
[Graph: CV requests by month, broken down by facet (Name, Location, Subject, BBC Brand, Time Period); x-axis: month; y-axis: quantity, 0 to 7,000]
Types of terms
• Mostly good
– Only 200 terms actually rejected
• Synonyms vs. entirely new terms
– Names: mostly new terms (only 2% synonyms)
– Subject: more synonyms (15% synonyms)
– Location: needed colloquial terms
Resourcing
• Handling the requests from journalists
• First 3 months – one IA
• Subsequently 2 to 3 junior IAs
• Too much – how to reduce?
Lessons learnt
• Success with the journalists
– They suggested terms!
– Understood the faceted classification
– Began to suggest terms in “our” format
– Some did engage at a detailed level
Lessons learnt
• Difficulties for journalists
– The system looks as if it is totally automatic, as part of a content management system
– “Journalists are people too”
– Users struggled with a content-object tagging system rather than a page-based one
Example
Subject: Pregnancy
Lessons learnt
• Difficulties for journalists, cont.
– They find it boring
– Makes the aim of “finding and re-use” harder to achieve
– Needed to do more pre-emptive work for them
Lessons learnt
• Number of terms suggested depends on:
– Type of facet
– Dynamism of content
– Scope of the content
– Enthusiasm of users
Next?
• High-value facets still need control
– Make use of the metadata(!)
– Sell the message
– Federated management
– Earlier in production
• And for folk tagging?
Thanks to the IA team for their analysis work:
– Jon Carey
– Adil Hussein
– Christine Rimmer