Content Findability in a Portable Content World

May 28, 2022 1 Content Findability in a Portable Content World Lise Kreps, M.S.L.S. Relevant Information Services lise@relevantinfoservices.com

description

What makes information worth finding? A discussion of the joys and perils of subject access, by Lise Kreps, taxonomist and librarian. Presented at the March 2008 Content Convergence and Integration Conference in Vancouver, Canada. For more information, see my website, www.relevantinfoservices.com.

Transcript of Content Findability in a Portable Content World

Page 1: Content Findability in a Portable Content World

April 12, 2023 1

Content Findability in a Portable Content World

Lise Kreps, M.S.L.S., Relevant Information Services

[email protected]

Page 2: Content Findability in a Portable Content World

April 12, 2023 © 2008 Lise Kreps [email protected]

Prologue: Who am I?

Master of Library Science, 1987
– Academic & public librarian
– Taught at University of Washington’s iSchool

20 years in technical documentation, usability and e-commerce
– Software manual & online Help indexing
– Cataloging books, images, and audio
– Amazon, Microsoft, Corbis, National Public Radio

Page 3: Content Findability in a Portable Content World

Content Findability in a Portable Content World

Act I: What makes information worth finding?
Act II: What’s it all about?
Act III: Too much of a good thing?
Act IV: Can’t machines do this?
Act V: What’s findability worth to you?

Page 4: Content Findability in a Portable Content World

Act I: What makes information worth finding?

1. It satisfies my need well enough

2. Not more trouble than it’s worth to get it

Sounds simple, eh?

I know what I want, so why doesn’t it just magically appear?

Let’s look more closely…

Page 5: Content Findability in a Portable Content World

The information satisfies my need well enough

The info I need, not what someone else needs

…for my specific purpose

…at this particular time

…in my particular context

Page 6: Content Findability in a Portable Content World

The information satisfies my need well enough

Contains enough useful info to be worth my while
From a source I trust
In language or style appropriate for my need
In a format I can use for this need
No legal or financial barriers to my using it

Page 7: Content Findability in a Portable Content World

The information satisfies my need well enough

I didn’t miss anything too important

In the Library & Information Science world, this is called Recall:

Number of relevant items retrieved
[divided by the]
Total number of relevant items available

“Do I think I got enough of the good stuff that’s probably out there?”

Recall Example: Got it 60%, Missed it 40%

Page 8: Content Findability in a Portable Content World

Not more trouble than it’s worth to get the information

Understands my question
– Search interpreter (human or computer) speaks my language at my level
– Doesn’t make me guess or learn its terminology
– Doesn’t give me results that seem unrelated to my question

Page 9: Content Findability in a Portable Content World

Not more trouble than it’s worth

Helps me make good choices
– I can tell what each menu item means and how it differs from its neighbours
– Asks me to clarify my intention (“did you mean… or …”)
– Shows other useful search terms within good items I find
– Offers useful ways to change and narrow my search
– Let’s shop for shoes on LandsEnd…

And narrow our choices by
– Women’s
– Sandals
– Leather
– etc.

Page 10: Content Findability in a Portable Content World

Not more trouble than it’s worth

Most of all: I didn’t have to wade through too much JUNK

In the InfoSci world, this is called Precision:

Number of relevant items retrieved
[divided by the]
Total number of items retrieved

“Out of all the stuff I got, how much of it was what I really wanted?”

“Info-Noise”: the biggest barrier to findability

Precision Example: Good 30%, Junk 70%
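
The Recall and Precision ratios defined on these slides can be computed directly once you know which items are relevant. The following is a minimal Python sketch, not something from the talk: the item IDs and the collection are made up, chosen so that the results match the 60% and 30% figures in the two examples.

```python
def recall(retrieved: set, relevant: set) -> float:
    """Relevant items retrieved / total relevant items available."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)


def precision(retrieved: set, relevant: set) -> float:
    """Relevant items retrieved / total items retrieved."""
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)


# Hypothetical collection: 10 relevant items exist (IDs 1..10).
relevant_items = set(range(1, 11))
# Hypothetical search result: 6 relevant items plus 14 junk items.
retrieved_items = set(range(5, 11)) | set(range(101, 115))

print(f"Recall:    {recall(retrieved_items, relevant_items):.0%}")
print(f"Precision: {precision(retrieved_items, relevant_items):.0%}")
```

In practice the set of relevant items is rarely known in full, which is why Recall is usually estimated rather than measured.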

Page 11: Content Findability in a Portable Content World

Retrieval Effectiveness

High Recall + High Precision
= High Information Retrieval Effectiveness
= I found all of and only the info worth finding

But here are the Gotchas…

Page 12: Content Findability in a Portable Content World

Gotchas…

# 1: You can’t have it both ways.
– Recall and Precision tend to be inversely related:
– Better Recall = worse Precision, and vice versa

# 2: You don’t know what you’re missing.
– In the real world, Recall is hard to assess; you may never know what relevant information you didn’t find.

Page 13: Content Findability in a Portable Content World

Gotchas…

# 3: Size matters.
– The way you seek information changes depending on how much info you think you’re dealing with

# 4: You don’t know what you want...
– Until you know what your choices are.
– Finding out what’s available redefines your information need.
– Good searching is iterative

Page 14: Content Findability in a Portable Content World

What’s a searcher to do?

In the InfoGlut world, we care most about Precision

Make educated guesses about
– what is in the collection
– whether we’re missing something important

Refine our search strategies, until we get “enough” good results for this information need…and then we quit

Page 15: Content Findability in a Portable Content World

What’s a content producer to do?

Make your content very smart about how it presents itself

So it is findable in the contexts where it is most useful, and

It doesn’t become more of the InfoNoise

Page 16: Content Findability in a Portable Content World

Act II: What’s it all about?

Page 17: Content Findability in a Portable Content World

Tags

– Smart content uses metadata tags, or database fields
– Tags contain information such as the content’s
    Creator
    Date of creation
    Format
    Location or destination
    Title
    Subject area
– Title might not be descriptive, e.g.
    Metaphoric or idiomatic: “Your new bundle of joy”
    Generated from filename: 2008030601.jpg

Page 18: Content Findability in a Portable Content World

Subject area categorization

“Aboutness”
Sometimes called “keywords”
Often the most useful “findability” access point
Subjective: aboutness is in the eye of the beholder
Each item is usually “about” multiple subjects
What do you think this image is about?

Page 19: Content Findability in a Portable Content World

What do the Corbis catalogers think it’s about?

About 20 keywords, for:

“Foreground” subjects
– Fishing boat
– Harbor
– Ocean

Implied subjects
– Marine scenes
– Industry
– Travel

Geographic location
– Goose Cove
– Newfoundland
– Canada

Image composition attributes
– Nobody
– Reflection

“Emotional” attributes
– Serenity
– Simplicity

Page 20: Content Findability in a Portable Content World

Weighting

Are all these keywords equally important?
Which are most important?
Best practice: “weight” each “aboutness” tag
Items with “high aboutness” get ranked higher in search results for that keyword
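
One way to picture weighted tags is to store a number with each keyword and sort search hits by it. This is only an illustrative Python sketch: the item IDs, the 0.0 to 1.0 weight scale, and the function name `search_by_weight` are assumptions, not part of any cataloging system described in the talk.

```python
# Hypothetical catalogue: each item maps keyword -> "aboutness" weight (0.0 to 1.0).
catalogue = {
    "img-001": {"fishing boat": 0.9, "harbor": 0.7, "serenity": 0.3},
    "img-002": {"harbor": 0.9, "fishing boat": 0.2},
    "img-003": {"ocean": 0.8, "serenity": 0.6},
}


def search_by_weight(keyword: str, items: dict) -> list:
    """Return (item_id, weight) pairs tagged with keyword, highest weight first."""
    hits = [
        (item_id, tags[keyword])
        for item_id, tags in items.items()
        if keyword in tags
    ]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


print(search_by_weight("fishing boat", catalogue))
```

Here the image where the fishing boat is the main subject ranks above the one where it is incidental, even though both carry the same keyword.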

Page 21: Content Findability in a Portable Content World

Size (still) matters

If you had just one bookcase, you could organize it by colour like this

But imagine if a public library was like this (Unshelved comic)

As collection size increases, you need an increasingly complex system of subject categorization

Page 22: Content Findability in a Portable Content World

Size (still) matters

The Problem of the World’s Biggest Bookstore (Amazon.com)

– Subject headings converge from multiple sources
    Publishers of all sizes
    Library of Congress
    Users’ tags
– “Education” is okay for some small publishers, but
– Not usually specific enough for the Amazon Universe

Page 23: Content Findability in a Portable Content World

What words should I use for “aboutness” keywords?

Prominent or frequent location
– Featured in title, headings, description or summary
– Appears frequently in the content
– “Foreground” or main subject of image

High semantic value
– Nouns (“snow”)
– Gerund verbs for activities (“skiing”)
– Short modifier phrases (“cross-country skis”)

Page 24: Content Findability in a Portable Content World

What words should I use for “aboutness” keywords?

Differentiates this content from other content

Users want this keyword
– Appears frequently in user search logs
– Often suggested by users

Similar content uses this keyword
– Competitors’ websites
– Published thesauri

Page 25: Content Findability in a Portable Content World

Which keywords have I already used?

You need to be able to
– Browse all keywords as alphabetical list
– Use this list when tagging new content
– Edit the keywords, both in the list and in the content

Tagging consistently increases Precision

Page 26: Content Findability in a Portable Content World

Big 3 Problems Inherent In Language

OK now I have a list. Am I done yet?
– No. Uncontrolled lists like this, and folksonomies also, do not handle the…

Big 3 Problems Inherent In Language:
1. Equivalent relationships
2. Homonyms (look the same but aren’t)
3. Hierarchical relationships and other related concepts

Page 27: Content Findability in a Portable Content World

Equivalent relationships

In a book index, these are “See” references
Decide on a preferred form of the keyword, and lead the other forms to that
Word variations and synonyms are the most common

Page 28: Content Findability in a Portable Content World

Equivalent relationships

Word variations
– Spelling variations
    color = colour
    Chanuka = Hanukkah
– Word ending variations (word stemming)
    Canad* = Canada, Canadian
    Immigra* = immigrant, immigrate, immigration
– Plural and tense variations
    goose = geese
    run = ran, running

Page 29: Content Findability in a Portable Content World

Equivalent relationships

Synonyms
– baby = infant
– purchasing = buying
– pupil = student, but not if it’s Pupil (Eye)

Equivalency control increases Recall
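
Equivalency control can be pictured as a lookup table of “See” references that leads every variant to its preferred form before the search runs. The Python sketch below is only an illustration under assumptions: the table contents reuse the slide examples, while the document IDs and the names `SEE`, `preferred` and `search_equivalent` are hypothetical.

```python
# Hypothetical "See" references: each variant or synonym leads to one preferred term.
SEE = {
    "colour": "color",
    "chanuka": "hanukkah",
    "geese": "goose",
    "infant": "baby",
    "buying": "purchasing",
}


def preferred(term: str) -> str:
    """Normalize a term to its preferred form (unknown terms pass through)."""
    key = term.strip().lower()
    return SEE.get(key, key)


# Hypothetical items, tagged with preferred terms only.
tagged = {"doc-1": {"baby", "color"}, "doc-2": {"goose"}, "doc-3": {"baby"}}


def search_equivalent(term: str) -> list:
    """Find items by any variant; the lookup goes through the preferred term."""
    wanted = preferred(term)
    return sorted(doc for doc, tags in tagged.items() if wanted in tags)


print(search_equivalent("Infant"))  # same results as searching "baby"
print(search_equivalent("colour"))
```

Because “infant” and “baby” now retrieve the same items, fewer relevant items are missed, which is the Recall gain the slide describes.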

Page 30: Content Findability in a Portable Content World

Homonyms

Words that are spelled alike but have different meanings

Disambiguate by appending clarifier terms
– Turkey (Bird), Turkey (Meat), or Turkey (Country)
– Play (Dramatic work), Play (Imaginative activity), or Play (Sports activity)

Can also ask searcher to choose one (“Did you mean… or …”)

Homonym control increases Precision

Now you have a “Controlled Vocabulary”

But you’re still not done yet…
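
The “Did you mean…” behaviour can be sketched as a table from each ambiguous word to its qualified terms. This Python fragment is a hypothetical illustration: the qualified terms come from the slide, but the table name `QUALIFIED`, the function `interpret` and the shape of its return value are assumptions.

```python
# Hypothetical controlled vocabulary: ambiguous words map to qualified terms.
QUALIFIED = {
    "turkey": ["Turkey (Bird)", "Turkey (Country)", "Turkey (Meat)"],
    "play": [
        "Play (Dramatic work)",
        "Play (Imaginative activity)",
        "Play (Sports activity)",
    ],
}


def interpret(query: str) -> dict:
    """Return the query as a term, or candidates for a 'Did you mean...' prompt."""
    cleaned = query.strip()
    choices = QUALIFIED.get(cleaned.lower())
    if choices is None:
        # Not a known homonym: search on the term as given.
        return {"term": cleaned, "did_you_mean": []}
    # Known homonym: do not guess; let the searcher pick a meaning.
    return {"term": None, "did_you_mean": choices}


print(interpret("Turkey"))
print(interpret("harbor"))
```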

Page 31: Content Findability in a Portable Content World

Hierarchies

Grouping related categories together, e.g.
– Restaurant menus
– Yellow Pages
– File folders in hanging files in filing cabinets
– Command menus in software applications
– “Browse” trees on e-commerce websites

Especially handy for browsing if you’re not sure how to describe (or spell) what you want

Page 32: Content Findability in a Portable Content World

Hierarchies

Broader/narrower (parent/child) concepts
    Instrumental music > Piano sonatas
    Canada > British Columbia > Metro Vancouver > Burnaby > Capitol Hill
– Broader term should retrieve all its child terms
    Birds = Sparrows + Penguins + Ostriches etc.
– Or, if too many results, narrow search by selecting one or more child terms
– Child term may have multiple parents (polyhierarchy)
– In a book index, these are “main entries” and “subentries”

Controlled Vocabulary + Hierarchy = Taxonomy
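
The rule that a broader term should retrieve all its child terms, including children with more than one parent, can be sketched with a simple child-to-parents table. The Python below is an assumption-laden illustration: “Antarctic animals” and “Animals” are invented terms added to show a polyhierarchy, and the photo IDs and function names are hypothetical.

```python
# Hypothetical taxonomy: child term -> set of parent terms.
# A child with more than one parent is a polyhierarchy.
PARENTS = {
    "Sparrows": {"Birds"},
    "Penguins": {"Birds", "Antarctic animals"},
    "Ostriches": {"Birds"},
    "Birds": {"Animals"},
    "Antarctic animals": {"Animals"},
}


def narrower_terms(term: str) -> set:
    """Return the term plus every descendant, following parent links downward."""
    found = {term}
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for child, parents in PARENTS.items():
            if current in parents and child not in found:
                found.add(child)
                frontier.append(child)
    return found


# Hypothetical items, each tagged at the most specific level.
tagged = {"photo-1": {"Penguins"}, "photo-2": {"Sparrows"}, "photo-3": {"Harbor"}}


def search_hierarchy(term: str) -> list:
    """A broader term retrieves items tagged with it or with any narrower term."""
    wanted = narrower_terms(term)
    return sorted(item for item, tags in tagged.items() if tags & wanted)


print(search_hierarchy("Birds"))
print(search_hierarchy("Antarctic animals"))
```

The penguin photo is found under both of its parents, while selecting a single child term such as “Sparrows” narrows the result set.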

Page 33: Content Findability in a Portable Content World

Hierarchies

Related concepts (“cousins”)
– Not (usually) broader/narrower
– In a book index, these are “See also”s
    Parenting See also Child development
    School supplies See also Office supplies
– Searching one keyword should suggest the other keyword but not automatically retrieve its results

Taxonomy + Related concepts = Thesaurus
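
The difference between a related concept and a child term shows up in search behaviour: a “See also” is offered as a suggestion, not folded into the results. A minimal Python sketch, assuming hypothetical article IDs and an invented function name `search_with_suggestions`:

```python
# Hypothetical "See also" links, stored in both directions.
SEE_ALSO = {
    "Parenting": ["Child development"],
    "Child development": ["Parenting"],
    "School supplies": ["Office supplies"],
    "Office supplies": ["School supplies"],
}

# Hypothetical tagged items.
tagged = {
    "article-1": {"Parenting"},
    "article-2": {"Child development"},
}


def search_with_suggestions(term: str) -> dict:
    """Retrieve only items tagged with term; offer related terms as suggestions."""
    results = sorted(item for item, tags in tagged.items() if term in tags)
    return {"results": results, "see_also": SEE_ALSO.get(term, [])}


print(search_with_suggestions("Parenting"))
```

The article tagged “Child development” is not retrieved by a search for “Parenting”; the searcher is only told that the related keyword exists.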

Page 34: Content Findability in a Portable Content World

Hierarchies

Scope note
– Tells taggers and searchers where this concept stops and other concepts begin
– Often differentiates related concepts
    Medieval. Use for European history of the 5th through 15th centuries. For earlier periods, consider Classical antiquity. For later periods, consider Renaissance.
    Office supplies. Use only for business contexts. For educational or home contexts, use School supplies.
– Thesaurus + Scope notes + further rules for how to apply your keywords = Ontology

Page 35: Content Findability in a Portable Content World

Act III: Too much of a good thing?

Page 36: Content Findability in a Portable Content World

The Perils of Polylingual Hierarchy, Or, Will It Play in Paris?

“Grandchild” terms may not always fit the “grandparent” categories

Hierarchies are different in other languages

Two real-life examples
– Part I: “Not the Comfy Chair!”
– Part II: The Event of Dessert

Page 37: Content Findability in a Portable Content World

“Not the Comfy Chair!”

Furniture
– Chairs
    Dining chairs
    Armchairs

But what about
    Dentist chairs?
    Electric chairs?
    Thrones?
They’re chairs but not domestic Furniture

Page 38: Content Findability in a Portable Content World

“Not the Comfy Chair!”

In some European languages, there is no general “Chair”
– Only approximately “Comfy chair” and “Uncomfy chair”
– “Comfy” may have arms, upholstery, be in living room
– “Uncomfy” may be armless, unupholstered, in kitchen
– But not always…

So is this chair comfy?

Page 39: Content Findability in a Portable Content World

The Event of Dessert

Wedding cake is not really a Dessert
Brownies may be Snacks
Ice cream cones are definitely Snacks
Civilised countries that have cake at Tea time prefer cheese or fruit for dessert

Page 40: Content Findability in a Portable Content World

The Event of Dessert

– Foods
    Dessert
    – Cake
        Wedding cake?
    – Ice cream
        Ice cream cones?
    – Brownies?
    – Cheese??
        Quark???

Page 41: Content Findability in a Portable Content World

The Event of Dessert

Quark is a soft cheese

In Germany they eat Quark with Fruit for Dessert…

…but they also eat Quark on Toast for Breakfast.

Page 42: Content Findability in a Portable Content World

The Event of Dessert

Dessert moved from a Food to an Event
– Marked by tableware (plate, fork)

The sweet Dessert foods moved to Foods > Sweets

Here we have Brownie as Snack…
…and here we have Brownie as Dessert

Dessert foods without tableware are Snacks
– Snacks and Breakfast are also Events

Page 43: Content Findability in a Portable Content World

How deep should I get?

How far down the hierarchy should I go on making narrower terms?

In the InfoSci World, this is called Specificity

Keywords should be specific enough to
– Make useful distinctions between items
– Accurately describe each item
But not so specific that they split hairs and make useless distinctions

Specificity increases Precision

Page 44: Content Findability in a Portable Content World

How deep should I get?

The level of specificity you need depends on
– Subject domain (type of information)
    More technical domains -> more specificity
– Collection size (size matters again!)
    As your collection grows -> more specific keywords
    Otherwise, too many items share a particular keyword
    – Precision plummets
    Often a problem with folksonomies, which tend to use a lot of broad keywords

Page 45: Content Findability in a Portable Content World

The more the merrier?

If a few keywords are good for findability, then a lot of keywords must be better, right?

In the InfoSci world, this is called Exhaustivity:
– How many different keywords each item gets

Tip: when you’re stretching for keywords that aren’t much “about” this item, it’s time to stop plastering keywords

– Otherwise, your Precision tanks

Again, often a problem with folksonomies

Page 46: Content Findability in a Portable Content World

Will my keywords play nicely with others?

Good “aboutness” keywords from a consistent taxonomy can integrate your content very well

This is the biggest reason to invest in a taxonomy

You can use your “aboutness” tags to automatically:
– Retrieve relevant images to accompany text
– Suppress inappropriate images
– Suggest appropriate products to accompany an article
– Offer highly-relevant related topics
– Sensitively combine professional- with user-generated content
– And that’s not all…

Page 47: Content Findability in a Portable Content World

Will my keywords play nicely with others?

You can use your “aboutness” tags to automatically:
– Generate your website’s browse menus
– Categorize search results into narrower categories
– Increase users’ “personalization” experience, by offering them tags to
    Identify their personal attributes in their website profiles
    Subscribe to feeds of new content that will interest them
    Suggest topics most relevant to them
– Make your “tag cloud” retrieve more relevant results
– Improve findability!

Page 48: Content Findability in a Portable Content World

What happens when worlds collide?

In the Portable Content world, collections of different content often merge

    Acquiring new collections
    Selling your collection to others
    Continuously incorporating new content from suppliers or users

Page 49: Content Findability in a Portable Content World

What happens when worlds collide?

Keywords from different content collections often don’t merge well, because the collections differ in:

– Uncontrolled vs. controlled vocabularies
    Preferred term to use
    Synonym control
    Word variation control
– Hierarchy construction
    Decisions about what belongs with what

Page 50: Content Findability in a Portable Content World

What happens when worlds collide?

Keywords from different content collections often don’t merge well, because the collections differ in:

– Specificity levels
    Collection size
    Subject area domain
– Exhaustivity
    Tagging quality standards

And when your collection grows, your own keywords will need to get more specific

Page 51: Content Findability in a Portable Content World

What’s a content producer to do?

Accept others’ keywords in special metadata tags
– Review these and “map” them to your keywords

Understand that your keywords will need to change

Publish your keywords to your suppliers

Expose your keywords to your users as suggestions for their own tagging
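
Accepting a supplier’s keywords in their own metadata field and mapping the known ones to your vocabulary might look like the Python sketch below. Everything in it is hypothetical: the mapping table, the field names `supplier_keywords`, `our_keywords` and `needs_review`, and the function `ingest` are invented for illustration.

```python
# Hypothetical mapping table, built by a human reviewer, from a supplier's
# keywords to our own preferred keywords.
SUPPLIER_TO_OURS = {
    "kids": "Children",
    "infants": "Baby",
    "toddlers": "Toddler",
}


def ingest(supplier_keywords: list) -> dict:
    """Keep the supplier's keywords, map the known ones, queue the rest for review."""
    mapped, needs_review = [], []
    for keyword in supplier_keywords:
        ours = SUPPLIER_TO_OURS.get(keyword.strip().lower())
        if ours is None:
            needs_review.append(keyword)  # a person decides how to map this one
        elif ours not in mapped:
            mapped.append(ours)
    return {
        "supplier_keywords": list(supplier_keywords),  # kept in its own field
        "our_keywords": mapped,
        "needs_review": needs_review,
    }


print(ingest(["Kids", "infants", "homeschooling"]))
```

Keeping the original keywords untouched means the mapping can be redone later, when your own keywords change.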

Page 52: Content Findability in a Portable Content World

What’s a content producer to do?

Standardize on a publicly-available thesaurus, e.g.
– Library of Congress subject headings
– National Library of Medicine’s subject headings (MeSH)
– Getty Art & Architecture Thesaurus
– If you can find one that matches your content and meets your users’ search needs

Your competitors won’t sell you theirs

Page 53: Content Findability in a Portable Content World

Act IV: Can’t machines do this?

Page 54: Content Findability in a Portable Content World

If I can search the full text, why bother with keywords?

Full-text search generally can’t cope with
– Synonyms
– Homonyms
– Trivial occurrences (low “aboutness”)
– Inferences (high “aboutness” not explicit in the text), e.g.
    Intended audience
    Prerequisite knowledge
– African rainforest animals (Which countries? Which animals?)
– Non-textual content (images, audio, video, etc.)

Online Help indexes vs. full-text search
– Indexes are selective
    Like travel guide rather than phone book
– In my usability tests, users got to the best answers faster via the index

Page 55: Content Findability in a Portable Content World

Can computers automatically generate keywords and tag content?

Automatic tagging uses linguistic analysis and mathematical rules to churn through text and try to
– Comprehend all the “aboutnesses” and
– Categorize the content

A huge, complex and difficult field of Information Science
– Analysis rules are different for each subject area domain
– There are no “magic” one-size-fits-all, out-of-the-box solutions

How do they do it? Here are a couple of methods

Page 56: Content Findability in a Portable Content World

Term Frequency-Inverse Document Frequency (TF-IDF)

Term occurs frequently within a document = more “aboutness”, right?

But if that term occurs in lots of your documents, it’s not a good discriminator for finding the most relevant documents; e.g.
– “Software” may be rare on a travel website, but on nearly every page of a technical publisher’s website

TF-IDF is a statistical weighting method

Determines a term’s relative importance within a document and within the collection of documents
– If term is frequent in doc AND rare in collection, then
– High TF-IDF (high “aboutness” and good discriminator)

For each term, calculates numerical values (vector space)

Analyses compare these values to other documents’ values to retrieve similar documents
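
The weighting idea can be made concrete with a few lines of Python. This is a minimal sketch of one common textbook variant, weight = (term count / document length) × log(N / document frequency); real systems differ in the exact formula. The three-document collection is made up, and splitting on whitespace is a simplifying assumption.

```python
import math
from collections import Counter

# Hypothetical mini-collection for a technical publisher's website.
docs = {
    "d1": "software manual for indexing software",
    "d2": "software help for skiing trips",
    "d3": "software review",
}


def tf_idf(collection: dict) -> dict:
    """Return {doc_id: {term: weight}} with weight = tf * log(N / df)."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in collection.items()}
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    weights = {}
    for doc_id, tokens in tokenized.items():
        counts = Counter(tokens)
        weights[doc_id] = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return weights


scores = tf_idf(docs)
for term, weight in sorted(scores["d1"].items(), key=lambda kv: kv[1], reverse=True):
    print(f"{term:10s} {weight:.3f}")
```

In this toy collection “software” is the most frequent word in d1 yet gets a weight of zero, because it appears in every document, while “manual” and “indexing” score highest.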

Page 57: Content Findability in a Portable Content World

Term Frequency-Inverse Document Frequency (TF-IDF)

– Some drawbacks of TF-IDF
    Needs lots of sophisticated programming to be smart about your particular content collection and subject area domain
    – Dictionaries of stop words, word variants, phrases, synonyms, etc.
    Can miss a low-occurring term that is crucial to this document, e.g.
    – Article on baby-proofing your home
    – Mentions “danger,” “injuries,” and “stairs” only once, but they are the main point of the article
    – Does not mention “safety” or “falling” (important concepts to capture)
    – Does not mention “toddlers” or “parenting” (the implied subject and audience of the article)
    – A human indexer would immediately catch these

Page 58: Content Findability in a Portable Content World

Inductive Learning Algorithms

Humans teach computer what “aboutness” looks like

For each term, professional experts hand-tag a set of “good example” docs

Use mathematical linguistic analysis to compare new docs to “good example” docs for similarity and categorization

More expensive than straight TF-IDF, but
    Better results
    Good at capturing concrete concepts (“auto repair”)
    Poor at capturing implied concepts (intended audience)
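
The “compare new docs to good example docs” step can be illustrated with a deliberately simple stand-in: word-count vectors and cosine similarity against hand-tagged examples. This Python sketch is not a specific algorithm from the talk; the example texts, the keyword labels and the function names are all hypothetical, and production systems use far richer features.

```python
import math
from collections import Counter

# Hypothetical hand-tagged "good example" docs, a few short texts per keyword.
examples = {
    "Auto repair": [
        "replace brake pads and check the engine oil",
        "engine repair and brake service for your car",
    ],
    "Skiing": [
        "cross-country skis and snow trails",
        "snow conditions for downhill skiing",
    ],
}


def vector(text: str) -> Counter:
    """Bag-of-words vector: word -> count."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def suggest_keyword(new_doc: str):
    """Return the keyword whose examples are most similar, or None if nothing matches."""
    new_vec = vector(new_doc)
    scored = [
        (max(cosine(new_vec, vector(ex)) for ex in texts), keyword)
        for keyword, texts in examples.items()
    ]
    best_score, best_keyword = max(scored)
    return best_keyword if best_score > 0 else None


print(suggest_keyword("how to change engine oil in your car"))
print(suggest_keyword("quarterly tax filing"))
```

Note that nothing in this approach would infer an intended audience that the text never mentions, which matches the weakness listed on the slide.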

Page 59: Content Findability in a Portable Content World

Does clickstream = “aboutness”?

Are pages “about” the same thing if users:
– Click through them in the same sequence?
    “Customers who viewed this also viewed…”
– Buy the same items on them?
    “Customers who bought this also bought…”
– Write reviews of them?
– Add them to a “recommended” list?
– Tag them with the same tag?

Let’s look at Amazon again…
– How relevant is each of this item’s “related” items?
– How relevant are the professional and user tags?
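
A “customers who viewed this also viewed” list is, at its simplest, a count of which pages turn up in the same sessions. The Python sketch below assumes made-up browsing sessions and a hypothetical function name `also_viewed`; it says nothing about how any real retailer computes its recommendations.

```python
from collections import Counter
from itertools import combinations

# Hypothetical browsing sessions: each is the set of pages one user viewed.
sessions = [
    {"tent", "sleeping bag", "camp stove"},
    {"tent", "sleeping bag"},
    {"tent", "novel"},
    {"sleeping bag", "camp stove"},
]

# Count how often each pair of pages appears in the same session.
pair_counts = Counter()
for viewed in sessions:
    for a, b in combinations(sorted(viewed), 2):
        pair_counts[(a, b)] += 1


def also_viewed(page: str, top_n: int = 3) -> list:
    """Pages most often viewed in the same session as the given page."""
    related = Counter()
    for (a, b), count in pair_counts.items():
        if a == page:
            related[b] += count
        elif b == page:
            related[a] += count
    return related.most_common(top_n)


print(also_viewed("tent"))
```

The novel appears in the tent’s list because one user happened to view both, which is a connection but says nothing about the novel’s subject.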

Page 60: Content Findability in a Portable Content World

Does clickstream = “aboutness”?

Info that users consciously connect is more likely to be related than passive clickstream trails

– The more effort they put into categorization, the better the categorization is likely to be

But “Related to” ≠ “about” the same subject

Connections or tags very useful to one person ≠ tags useful to everyone; e.g. users’ personal tags like
– “Me” or “Home” or “Cynthia” (recipient of this gift)

Page 61: Content Findability in a Portable Content World

Act V: What’s findability worth to you?

Page 62: Content Findability in a Portable Content World

Human effort is expensive

Effort “up front”: expense to producer
– Creating and maintaining thesaurus
– Tagging by trained staff
– And/or designing (and redesigning) smart automatic categorization and retrieval systems
– Better findability = happier customers

Page 63: Content Findability in a Portable Content World

Human effort is expensive

How much money are you willing to put into your content’s findability?
– Not all content is equally worth the money
    Professionally- vs. user-generated content
    “Personalized” content vs. general content
    New content vs. old
    “Push” content you most want to sell
    – Be careful of “pushing” irrelevant content and
    – Losing your customers’ confidence

Page 64: Content Findability in a Portable Content World

Human effort is expensive

Effort “at the end”: expense to customers
– Users do the tagging: e.g. folksonomies
– Users slog through junk (low Precision)
– Only works in contexts where users are motivated enough to volunteer their time and effort

Find your balance

Page 65: Content Findability in a Portable Content World

Epilogue: Q & A

Questions? Comments? Further questions or comments?

Want a copy of this presentation? Email [email protected]

Thank you!