Quality Taxonomies

Quality Taxonomies. Jim Nisbet, Senior Vice President of Technology, Semio Corporation. Knowledge Technologies 2001, March 5th, 2001


Transcript of Quality Taxonomies

Page 1: Quality Taxonomies

Quality Taxonomies

Jim Nisbet, Senior Vice President of Technology

Semio Corporation

Knowledge Technologies 2001, March 5th, 2001

Page 2: Quality Taxonomies

Ontology / Taxonomy

Root Ontology

Taxonomy Generation

Static Discovery

Dynamic Discovery

Page 3: Quality Taxonomies

What is Quality ?

“Best value for the money”

According to this definition, you are entitled to high performance from a costly product; likewise, a low-cost product or service is expected to deliver less. For example, a loose demo delivery is both predictable and acceptable, since its quality is low conformance at low cost.

Page 4: Quality Taxonomies

What is Quality ?

“Good Quality is Nominal Conformance”

Taxonomy Quality is defined as Taxonomy Conformance to:
• Valid requirements;
• Explicitly documented development standards; and
• Implicit characteristics that are expected of all professionally developed taxonomies, such as the desire for good maintainability.

Page 5: Quality Taxonomies

Standards

ISO 2788-1986
• International Organization for Standardization. Documentation - Guidelines for the Establishment and Development of Monolingual Thesauri. 2nd ed. n.p.: ISO, 1986. (ISO 2788-1986(E)). (Available in the U.S. from American National Standards Institute)

ISO 5964-1985
• International Organization for Standardization. Documentation - Guidelines for the Establishment and Development of Multilingual Thesauri. n.p.: ISO, 1985. (ISO 5964-1985(E)). (Available in the U.S. from American National Standards Institute)

ANSI/NISO Z39.19-1993
• National Information Standards Organization. Guidelines for the Construction, Format, and Management of Monolingual Thesauri. Bethesda, MD: NISO Press, 1994. 69p. (ANSI/NISO Z39.19-1993)

SEMIO Quality Plan v1 2000

ISO/IEC 13250 Topic Maps

RDF
• Please refer to RDF at http://www.w3.org/RDF and XML at http://www.w3.org/XML

Page 6: Quality Taxonomies

Project Plan

1. Kick-off
2. Requirements Review
3. Lexicon Review
4. Taxonomy Review
5. Tags Review
6. Final Review

Page 7: Quality Taxonomies

1. Kick-off

Objectives
• Purpose
• Scope
• Scale
• Users
• Conditions of receipt

Roles
• Supplier
• Customer
  – Admin
  – KE
  – Experts
  – Users

Planning Training and Transfer

Page 8: Quality Taxonomies

2. Requirements Review

Sources
Lexicon
Ontology
Install

Page 9: Quality Taxonomies

Sources

Dispersion (Multiplicity, Size, Homogeneity)
Refresh
Access

Features                    Internet, News, E-Mail    Reports, Patents    E-Trade, Logs
Informative content                   -                       +                 +
Number of topics covered              +                       +                 -
Structured information                -                       +                 +
Size of records                       -                       +                 -
Number of records                     +                       -                 +

Page 10: Quality Taxonomies

Typical Patterns

Disparity
• Adjust sources
• Adjust crawl strategy
• Isolate communities / taxonomies

Page 11: Quality Taxonomies

Lexicon

Vocabularies, etc.
Substitutions: Acronyms, Synonyms, etc.
Preferred Keywords: Brand Names, etc.
Banned Keywords
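A minimal sketch of how substitutions and banned keywords from the slide above could be applied to extracted terms. The function name `apply_lexicon` and the sample entries are hypothetical, chosen only for illustration; they do not come from the Semio product.

```python
def apply_lexicon(terms, substitutions, banned):
    """Normalize extracted terms with lexicon rules (illustrative only).

    substitutions: maps an acronym or synonym to its preferred form
    banned: terms that must never become keywords
    """
    cleaned = []
    for term in terms:
        key = term.strip().lower()
        key = substitutions.get(key, key)      # acronym / synonym -> preferred form
        if key in banned or key in cleaned:    # drop banned terms and duplicates
            continue
        cleaned.append(key)
    return cleaned


# Hypothetical sample data
substitutions = {"irs": "internal revenue service"}
banned = {"click here"}
result = apply_lexicon(
    ["IRS", "Internal Revenue Service", "click here", "Loan"],
    substitutions,
    banned,
)
```

Here `result` holds one entry for the Internal Revenue Service (the acronym and the full name collapse together) plus `loan`; the banned phrase is dropped.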

Page 12: Quality Taxonomies

Typical Patterns

Lack of requirements
• Use Librarian Resources

Page 13: Quality Taxonomies

Ontology

Thesaurus?
Is the information domain analysis complete, consistent, and accurate?
Is the partitioning of the problem complete?

Page 14: Quality Taxonomies

Typical Patterns

Directory versus Taxonomy
• Isolate “directory” branches

Thesaurus versus Taxonomy
• Put an ontology on top of the thesaurus
• Check ASAP match of thesaurus generics with extracted lexicon

Very high level design for top categories requirements
• Plan to work bottom-up

See also Taxonomy (functions, combinations, etc.)

Page 15: Quality Taxonomies

Install

Implementation / Integration:
• Are external and internal interfaces properly defined?
• Are all requirements traceable to the system level?
• Has prototyping been conducted for the user/customer?
• Is performance achievable within the constraints imposed by other system elements?
• Are requirements consistent with schedule, resources, and budget?

Page 16: Quality Taxonomies

Typical Patterns

Scale
Security
Missing Documents

Page 17: Quality Taxonomies

3. Lexicon Review

Coverage
• Extracted words / Words
• (Extracted Index / Index)

Sources benchmarking
• Coverage
• Extraction quality
• Topic distribution

Structure
• Most Frequent Phrases
• Most Productive Generics

Substitutions
Exceptions
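The coverage ratio "Extracted words / Words" can be sketched as follows. This assumes coverage is measured over distinct words, which the slide does not state; the function name and sample vocabulary are hypothetical.

```python
def lexicon_coverage(extracted_words, all_words):
    """Share of the distinct words in the sources that were extracted."""
    extracted, reference = set(extracted_words), set(all_words)
    if not reference:
        return 0.0
    return len(extracted & reference) / len(reference)


# Hypothetical sample: 3 of the 5 distinct source words were extracted
words = ["tax", "loan", "liability", "asset", "audit"]
extracted = ["tax", "loan", "liability"]
ratio = lexicon_coverage(extracted, words)
```

The same function applies to the "(Extracted Index / Index)" ratio if index entries are passed instead of words.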

Page 18: Quality Taxonomies

Typical Patterns

Low level of frequency / quality for the most meaningful content
• Increase size of value corpus
• Filter and re-import lexicon

Page 19: Quality Taxonomies

4. Taxonomy Review

Taxonomy Operation
• Correctness
• Reliability
• Usability
• Integrity
• Efficiency

Taxonomy Revision
• Maintainability
• Flexibility
• Testability

Taxonomy Transition
• Portability
• Reusability
• Interoperability

Page 20: Quality Taxonomies

Folk Taxonomies Design

The Berlin and Kay model: Taxonomy = Nomenclature + Terminology

[Figure: tree diagram of taxonomic ranks, Level 0 to Level 4. UB = unique beginner, lf = life-form, g = generic, s = specific, v = varietal]

Example:
Unique Beginner    Tax
Life Form          Liability
Generic            Loan
Specific           Term loan
Varietal           Short-term loan

Page 21: Quality Taxonomies

Correctness

Accuracy
Completeness
Consistency

Page 22: Quality Taxonomies

Accuracy

Precision
Recall
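Precision and recall can be computed per category by comparing the documents the taxonomy assigned with the documents a reviewer judged relevant. This is a minimal sketch using the standard definitions; the document identifiers are hypothetical.

```python
def precision_recall(assigned, relevant):
    """Return (precision, recall) for one category.

    assigned: documents the taxonomy placed in the category
    relevant: documents a reviewer judged to belong there
    """
    assigned, relevant = set(assigned), set(relevant)
    hits = len(assigned & relevant)
    precision = hits / len(assigned) if assigned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical sample: 4 documents assigned, 3 judged relevant, 2 in common
precision, recall = precision_recall(
    {"d1", "d2", "d3", "d4"},
    {"d2", "d3", "d5"},
)
```

With this sample, half of the assigned documents are correct (precision) and two of the three relevant documents were found (recall).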

Page 23: Quality Taxonomies

Completeness

Taxonomy
Maps
Lexicon
Collection

Page 24: Quality Taxonomies

Concentration Works Against Quality

[Figure: Lexicon, Document Collection, Maps, Taxonomy, Tagging]

Tagging Coverage
Ontology Coverage
Hook Coverage
Map Coverage
Lexical Coverage
Collection Coverage

Page 25: Quality Taxonomies

Consistency: Typical Patterns

Objectivization
Hyperonymy
Speciation
Necessity

Page 26: Quality Taxonomies

Objectivization

Employment
• Firing
• Hiring
• Salaries

Avoid functional categories
Don’t mix functions / objects
Exhaust scripts
Match idiomatic phrases

Page 27: Quality Taxonomies

Genericity

Parts
• Air Conditioning
• Belts and Hoses
• Body
• Brake System
• Chassis
• Engine
• Exhaust System
• Fuel System
• Glass
• Ignition

Avoid meronymy
Don’t mix meronymy / hyperonymy
Exhaust prototypes

Page 28: Quality Taxonomies

Speciation

Person
  Unwelcome person
    Unpleasant person
      Selfish person
        Opportunist
          Backscratcher
(WordNet)

Avoid “strings” of categories
Avoid (non-idioms) properties for categories

Page 29: Quality Taxonomies

Necessity

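[Figure: two example trees. First tree: Tax divides into Individuals and Corporations, each with Assets and Liability beneath it. Second tree labels: Tax; Individuals, Corporations; Assets, Liability; Individuals, Corporations. Node letters B to K are not recoverable.]

Avoid non-productive categories
Avoid combinations of categories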

Page 30: Quality Taxonomies

Nomenclature (Design Structure) Quality Index

Depth
Width
Balance

[Figure: tree diagram of taxonomic ranks, Level 0 to Level 4]

UB = unique beginner, lf = life-form, g = generic, s = specific, v = varietal
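A sketch of how depth, width, and balance might be measured on a category tree. The slide names the three measures without defining them, so the definitions here are assumptions: depth is the deepest level, width is the largest number of categories on any one level, and balance is the ratio of the shallowest leaf level to the deepest. The sample tree reuses category names from the slides in a hypothetical arrangement.

```python
from collections import Counter


def shape_metrics(tree, root):
    """Return (depth, width, balance) for a tree given as {parent: [children]}."""
    level_sizes = Counter()   # number of categories on each level
    leaf_levels = []          # level of every category without children
    stack = [(root, 0)]
    while stack:
        node, level = stack.pop()
        level_sizes[level] += 1
        children = tree.get(node, [])
        if not children:
            leaf_levels.append(level)
        stack.extend((child, level + 1) for child in children)
    depth = max(level_sizes)             # deepest level reached
    width = max(level_sizes.values())    # most crowded level
    balance = min(leaf_levels) / depth if depth else 1.0  # assumed definition
    return depth, width, balance


# Hypothetical sample tree
tree = {
    "Tax": ["Liability", "Assets"],
    "Liability": ["Loan"],
    "Loan": ["Term loan"],
    "Term loan": ["Short-term loan"],
}
depth, width, balance = shape_metrics(tree, "Tax")
```

A balance near 1 means all branches end at similar levels; the sample tree is lopsided because one branch stops at level 1 while another runs to level 4.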

Page 31: Quality Taxonomies

Complexity Index

Cyclomatic complexity increases with the number of cross-references within the taxonomy, giving an indication of its complexity and of the difficulty of testing it.

Taxonomy Complexity Index combines:
• autonomy
• closure
• similarity
• typicality
• commonality
• redundancy
• stability
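The effect of cross-references can be illustrated with McCabe's cyclomatic number, V(G) = E - N + 2P, treating categories as nodes and both parent-child links and cross-references as edges. This is a sketch that assumes the taxonomy forms one connected graph (P = 1); it is not the Taxonomy Complexity Index itself, whose formula the slide does not give. The sample links are hypothetical.

```python
def cyclomatic_complexity(parent_child_links, cross_references):
    """Cyclomatic number E - N + 2 of a connected taxonomy graph."""
    edges = list(parent_child_links) + list(cross_references)
    nodes = set()
    for source, target in edges:
        nodes.update((source, target))
    return len(edges) - len(nodes) + 2


# Hypothetical sample: a pure tree, then the same tree with one cross-reference
links = [("Tax", "Individuals"), ("Tax", "Corporations"), ("Individuals", "Assets")]
tree_only = cyclomatic_complexity(links, [])
with_cross_ref = cyclomatic_complexity(links, [("Corporations", "Assets")])
```

A pure tree always scores 1, and each added cross-reference raises the value by one.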

Page 32: Quality Taxonomies

Maturity index

IEEE Std 982.1-1988 describes a software maturity index; applied to an ontology / taxonomy, it provides an indication of the stability of the taxonomy.

Maturity Index combines:
• number of modules in current ontology / taxonomy.
• number of modules in current ontology / taxonomy that have been changed.
• number of modules added to current ontology / taxonomy.
• number of modules deleted from the previous version of the ontology / taxonomy.
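The four counts above are usually combined as in the software maturity index, SMI = (M_T - (F_a + F_c + F_d)) / M_T. The sketch below applies that formula, assuming a "module" corresponds to a taxonomy category or branch, which the slide does not specify. The sample counts are hypothetical.

```python
def maturity_index(total, changed, added, deleted):
    """Maturity index: share of modules left untouched by the current release.

    total:   modules in the current ontology / taxonomy
    changed: current modules that have been changed
    added:   modules added to the current version
    deleted: modules deleted from the previous version
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return (total - (added + changed + deleted)) / total


# Hypothetical sample release: 120 modules, 6 changed, 4 added, 2 deleted
index = maturity_index(total=120, changed=6, added=4, deleted=2)
```

The index approaches 1.0 as the taxonomy stabilizes between versions.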

Page 33: Quality Taxonomies

5. Tags Review

Document coverage
Concepts coverage

<tagset>
  <document>
    <docurl>http://www.TaxSource.com</docurl>
    <tag>
      <tagname>Liability</tagname>
      <weight>1.289</weight>
    </tag>
    <tag>
      <tagname>Federal Funds</tagname>
      <weight>0.746</weight>
    </tag>
  </document>
</tagset>
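A sketch of how the tagset above could be read to estimate the two coverage figures. The definitions are assumptions: document coverage is the share of collection documents that received at least one tag, and concepts coverage is the share of taxonomy categories used as a tag at least once. The second URL and the category list are hypothetical.

```python
import xml.etree.ElementTree as ET

TAGSET = """<tagset>
  <document>
    <docurl>http://www.TaxSource.com</docurl>
    <tag><tagname>Liability</tagname><weight>1.289</weight></tag>
    <tag><tagname>Federal Funds</tagname><weight>0.746</weight></tag>
  </document>
</tagset>"""


def read_tags(xml_text):
    """Return {document URL: {tag name: weight}} from a tagset document."""
    root = ET.fromstring(xml_text)
    tags_by_doc = {}
    for doc in root.findall("document"):
        url = doc.findtext("docurl")
        tags_by_doc[url] = {
            tag.findtext("tagname"): float(tag.findtext("weight"))
            for tag in doc.findall("tag")
        }
    return tags_by_doc


tags = read_tags(TAGSET)

# Hypothetical collection and category list, for illustration only
collection = ["http://www.TaxSource.com", "http://example.com/untagged"]
categories = ["Liability", "Federal Funds", "Loan", "Assets"]

document_coverage = sum(1 for url in collection if tags.get(url)) / len(collection)
used = {name for doc_tags in tags.values() for name in doc_tags}
concept_coverage = len(used & set(categories)) / len(categories)
```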

Page 34: Quality Taxonomies

6. Final Review

Receipt
Maintenance

Page 35: Quality Taxonomies

Quality Taxonomies

Jim Nisbet

Knowledge Technologies 2001