INFO624 -- Week 8 Subject Indexing & Knowledge Representation Dr. Xia Lin Assistant Professor...
![Page 1: INFO624 -- Week 8 Subject Indexing & Knowledge Representation Dr. Xia Lin Assistant Professor College of Information Science and Technology Drexel University.](https://reader035.fdocuments.net/reader035/viewer/2022062409/56649f275503460f94c3fcda/html5/thumbnails/1.jpg)

INFO624 -- Week 8
Subject Indexing & Knowledge Representation
Dr. Xia Lin
Assistant Professor
College of Information Science and Technology
Drexel University

Effective Information Retrieval
• Data Structures
• Knowledge Representation
  • From document representation to knowledge representation
• User Interface and User Interaction

Document Representation
• Vocabulary
• Semantics
• Implementation

Vocabulary
• Controlled Vocabulary
  • A list of terms selected for indexing purposes.
  • The terms are processed to reduce inconsistency and ambiguity.
  • Established selection rules and indexing rules
• Uncontrolled vocabulary
  • Subject keywords
  • Metadata
Example: ACM record

Metadata
• Data about data
• Descriptive Data
  • External to the meaning of the document
  • Dublin Core Metadata Element Set
  • Author, title, publisher, etc.
• Semantic Metadata
  • Subject keywords
• Challenge: automatic generation of metadata for documents
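
As a rough illustration of the descriptive/semantic split, the sketch below stores one record as a Python dictionary keyed by Dublin Core element names (`title`, `creator`, `publisher`, `date`, `subject`). The record values, the two element groupings, and the helper `split_metadata` are assumptions made for this example, not part of the slides.

```python
# Dublin Core elements treated here as descriptive (external to the meaning).
DESCRIPTIVE = {"title", "creator", "publisher", "date", "identifier"}
# Dublin Core elements treated here as semantic (about the content itself).
SEMANTIC = {"subject", "description"}

# Hypothetical record, for illustration only.
record = {
    "title": "An Example Report",
    "creator": "A. Author",
    "publisher": "Example Press",
    "date": "2004",
    "subject": ["information retrieval", "subject indexing"],
}


def split_metadata(rec):
    """Return (descriptive, semantic) views of a metadata record."""
    descriptive = {k: v for k, v in rec.items() if k in DESCRIPTIVE}
    semantic = {k: v for k, v in rec.items() if k in SEMANTIC}
    return descriptive, semantic


descriptive, semantic = split_metadata(record)
print(sorted(descriptive))
print(semantic)
```

The descriptive part can usually be copied from the document itself; the `subject` field is the part that is hard to generate automatically.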

Semantics
• Semantics is the study of meaning
• Relational semantics
  • Synonymy, hierarchical, etc.
• Referential semantics
  • Homonyms, techniques used to limit the meanings or referents of terms
• Category semantics
  • Facets or other participations

Example:
Mercury?
• Mercury (car)
• Mercury (planet)
• Mercury (metal)
• Mercury (Roman god)
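
One way to picture how parenthetical qualifiers limit the referents of a homonym is the small sketch below. It uses three of the qualified terms from the example; the function names `build_sense_index` and `lookup` are hypothetical.

```python
from collections import defaultdict

# Qualified descriptors in the "Term (qualifier)" form used in the example.
QUALIFIED_TERMS = ["Mercury (car)", "Mercury (planet)", "Mercury (metal)"]


def build_sense_index(qualified_terms):
    """Map each bare term (lowercased) to its qualified descriptors."""
    index = defaultdict(list)
    for entry in qualified_terms:
        bare, _, _qualifier = entry.partition(" (")
        index[bare.lower()].append(entry)
    return index


def lookup(index, query):
    """Return all senses of a term; more than one means it is ambiguous."""
    return index.get(query.lower(), [])


senses = lookup(build_sense_index(QUALIFIED_TERMS), "mercury")
print(senses)
```

A search interface could show the returned list and ask the user to pick one sense before searching.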

Implementation
• Standards
  • AACR2
  • ISO Standard for Indexing (ISO 5963)
  • ISO Standard for Thesaurus Construction (ISO 2788)
• Rules
  • Classification rules
  • Evaluation rules

Subject Indexing
• A human analytic process for identifying, selecting, and representing document concepts
• Create indexing languages
  • Using standardized, limited vocabularies for indexing purposes.
• Assign indexing terms to documents
  • Using only the terms in the indexing language selected.

Basic Processes of Subject Indexing
• Identifying concepts which represent the subject and purpose of a document.
• Deciding which of these concepts are important for retrieval of this document.
• Expressing concepts needed for retrieval in the indexing languages used.
• Using uncontrolled vocabulary for concepts that are not represented, or not represented specifically enough, in the indexing languages.
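
The last two steps can be sketched in a few lines of Python. The sketch assumes the indexer has already identified and selected the concepts (steps one and two remain human work); the function name `assign_index_terms` and the tiny vocabulary are hypothetical.

```python
def assign_index_terms(concepts, vocabulary):
    """Split concepts into controlled descriptors and uncontrolled keywords.

    `vocabulary` maps lowercased entry terms to preferred descriptors.
    """
    controlled, uncontrolled = [], []
    for concept in concepts:
        descriptor = vocabulary.get(concept.lower())
        if descriptor is None:
            # Not covered by the indexing language: keep as a free keyword.
            uncontrolled.append(concept)
        elif descriptor not in controlled:
            controlled.append(descriptor)
    return controlled, uncontrolled


# Hypothetical vocabulary: entry term -> preferred descriptor.
vocabulary = {
    "case studies": "Case studies",
    "case study": "Case studies",
    "counseling": "Counseling",
}

controlled, uncontrolled = assign_index_terms(
    ["case study", "Counseling", "telehealth"], vocabulary
)
print(controlled, uncontrolled)
```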

Controlled Vocabulary
Goals:
• To permit easy location of documents by topic.
• To define topic areas, and hence relate one document to another.
• To provide multiple access points to documents.
• To enforce uniformity throughout an information retrieval system.

Controlled Vocabulary
Formats:
• Hierarchical classified list
  • hierarchical subject descriptors
  • associative cross references
  • classification notation (codes)
• Alphabetical list
  • includes both descriptors and other lead-in terms

Main Components in a Controlled Vocabulary
Keyword/ Descriptor
Synonymous Term
Broader Term
Narrower Term
Related Term

Example
Cancer
• Synonyms: Malignancy, Malignant tumor, Cancer morphology
• Broader Terms: Diseases, Neoplasms
• Narrower Terms: Malignant neoplasm of skin, Breast Cancer, Primary malignant neoplasm of liver
• Related Terms: Abdominal Neoplasms, Hyperplasia, Seminoma
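
A minimal sketch of how such an entry might be stored is given below. The `Entry` and `Thesaurus` classes are hypothetical, and which terms count as broader, narrower, related, or synonymous is one reading of the flattened diagram above. The design point is that broader/narrower links are reciprocal, so adding one direction should record the other.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    descriptor: str
    synonyms: set = field(default_factory=set)
    broader: set = field(default_factory=set)
    narrower: set = field(default_factory=set)
    related: set = field(default_factory=set)


class Thesaurus:
    def __init__(self):
        self.entries = {}

    def entry(self, descriptor):
        """Return the entry for a descriptor, creating it if needed."""
        return self.entries.setdefault(descriptor, Entry(descriptor))

    def add_broader(self, term, broader_term):
        # BT and NT are inverse relations, so record both directions.
        self.entry(term).broader.add(broader_term)
        self.entry(broader_term).narrower.add(term)

    def add_related(self, a, b):
        # RT is symmetric.
        self.entry(a).related.add(b)
        self.entry(b).related.add(a)

    def add_synonym(self, descriptor, synonym):
        self.entry(descriptor).synonyms.add(synonym)


t = Thesaurus()
t.add_broader("Cancer", "Neoplasms")
t.add_broader("Breast Cancer", "Cancer")
t.add_related("Cancer", "Hyperplasia")
t.add_synonym("Cancer", "Malignancy")
print(t.entry("Cancer"))
```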

Example:
• MeSH – Medical Subject Headings
  • 22,568 descriptors
  • 139,000 headings (Supplementary Concept Records)
  • thousands of cross-references
    • e.g., Vitamin C see Ascorbic Acid.
  • Used for indexing MEDLINE
• MeSH Browser

MeSH Tree Structures - 2004
1. Anatomy [A]
2. Organisms [B]
3. Diseases [C]
4. Chemicals and Drugs [D]
5. Analytical, Diagnostic and Therapeutic Techniques and Equipment [E]
6. Psychiatry and Psychology [F]
7. Biological Sciences [G]
8. Physical Sciences [H]
9. Anthropology, Education, Sociology and Social Phenomena [I]
10. Technology and Food and Beverages [J]
11. Humanities [K]
12. Information Science [L]
13. Persons [M]
14. Health Care [N]
15. Geographic Locations [Z]
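
A hierarchical notation lets software recover broader classes from the code alone. The sketch below uses the category letters listed above and assumes dotted tree numbers of the form letter, digits, then dot-separated segments; the specific number `C04.588.180` is used only to illustrate the format, and the helper names are hypothetical.

```python
# Top-level categories and their letters, as listed for 2004.
MESH_CATEGORIES = {
    "A": "Anatomy",
    "B": "Organisms",
    "C": "Diseases",
    "D": "Chemicals and Drugs",
    "E": "Analytical, Diagnostic and Therapeutic Techniques and Equipment",
    "F": "Psychiatry and Psychology",
    "G": "Biological Sciences",
    "H": "Physical Sciences",
    "I": "Anthropology, Education, Sociology and Social Phenomena",
    "J": "Technology and Food and Beverages",
    "K": "Humanities",
    "L": "Information Science",
    "M": "Persons",
    "N": "Health Care",
    "Z": "Geographic Locations",
}


def top_category(tree_number):
    """Return the top-level category named by the leading letter."""
    return MESH_CATEGORIES[tree_number[0].upper()]


def ancestors(tree_number):
    """List the broader tree numbers implied by the notation, nearest first."""
    parts = tree_number.split(".")
    return [".".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]


print(top_category("C04.588.180"))
print(ancestors("C04.588.180"))
```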

ERIC Thesaurus
• more than 10,000 terms or subject headings used in indexing and searching ERIC records.
• A supplemental list of over 55,000 terms or subject headings including
  • proper names (e.g., geographic, personal, institutional, project, equipment, test, etc., names) or
  • concepts not yet represented by the controlled vocabulary of the ERIC Thesaurus.

Controlled Vocabulary
Examples:
• Case studies: Descriptor
  • SN: Detailed analyses, usually focusing on a particular problem of an individual, group, or organization (note: do not confuse with “medical case histories”)
  • NT:
    • Cross sectional studies
    • Longitudinal studies

Examples (Case Studies)
• BT
  • Evaluation methods
  • Research
• RT
  • Case records
  • Counseling
  • Qualitative research
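
The relationships in this entry are what make query expansion possible. The following is only a sketch: the dictionary layout and the function `expand_query` are assumptions for illustration, while the NT, BT, and RT terms are the ones listed for "Case studies".

```python
THESAURUS = {
    "Case studies": {
        "NT": ["Cross sectional studies", "Longitudinal studies"],
        "BT": ["Evaluation methods", "Research"],
        "RT": ["Case records", "Counseling", "Qualitative research"],
    }
}


def expand_query(term, relations=("NT",)):
    """Return the term plus the terms reached through the given relations."""
    entry = THESAURUS.get(term, {})
    expanded = [term]
    for relation in relations:
        for other in entry.get(relation, []):
            if other not in expanded:
                expanded.append(other)
    return expanded


print(expand_query("Case studies"))
print(expand_query("Case studies", relations=("NT", "RT")))
```

Expanding with narrower terms tends to raise recall without drifting off topic; adding related terms widens the search further at some cost in precision.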

Advantages of Subject Indexing
• Facilitates concept search
  • search by topics/subjects, not just by words
  • link related documents by subject terms
  • Make implicit information explicit
• Provides a standard terminology to index and search documents.
  • Use small indexing vocabulary
  • Help the searcher find related terms

Disadvantages of Subject Indexing
• Expensive manual operations
  • To construct the controlled vocabulary
  • To assign terms to documents
• Difficult to keep up to date
  • Terminology changes very fast
  • New terms are added daily.
• Inconsistent process of human indexing
  • The same documents are assigned different indexing terms by different indexers.
  • The user may not use the same terms to find documents as the indexer would use to index the documents.
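
Inconsistency between indexers can be quantified. One simple option, used here as an assumption rather than anything the slides prescribe, is the overlap of two indexers' term sets divided by their union (a Jaccard-style ratio). The term sets below are invented.

```python
def indexing_consistency(terms_a, terms_b):
    """Shared terms divided by all distinct terms used by either indexer."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 1.0  # two empty assignments agree trivially
    return len(a & b) / len(a | b)


# Hypothetical assignments for the same document by two indexers.
indexer_1 = {"Case studies", "Counseling", "Qualitative research"}
indexer_2 = {"Case studies", "Research"}

score = indexing_consistency(indexer_1, indexer_2)
print(score)
```

Here the two indexers share one of four distinct terms, so the score is 0.25.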

Document Representation
• Inverted Indexing
  • Represent a document as a list of terms that occur in the document
  • computer-based indexing
  • statistical-based indexing
• Subject Indexing
  • Represent a document as a list of subject terms that occur in a controlled vocabulary.
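
For contrast with subject indexing, a minimal inverted index can be built automatically from the words of each document. This is a sketch under simple assumptions (lowercasing, alphanumeric tokens, no stop words or stemming); the two documents are invented.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of document ids in which it occurs."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


docs = {
    1: "Subject indexing uses a controlled vocabulary",
    2: "Inverted indexing lists terms in documents",
}
index = build_inverted_index(docs)
print(sorted(index["indexing"]))
print(sorted(index["vocabulary"]))
```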

Considerations of Document Representation
• Any format of document representation needs to maintain a balance of its
  • Discriminating power
  • Descriptiveness
  • Similarity identification
  • Conciseness

Considerations of DR
• Discriminating power
  • to identify a document uniquely
  • to reduce ambiguity
  • Examples:
    • ISBN number for books
    • bar codes for products

Considerations of DR
• Descriptiveness
  • describe all the information as completely as possible
    • full text
    • abstracts
    • extracts
    • reviews
  • Completeness and correctness

Considerations of DR
• Similarity Identification
  • to group similar documents
    • keywords or subject indexing
    • book classification numbers
  • Difficulty for the computer to assign keywords, subject descriptors, or classification numbers to documents

Considerations of DR
• Conciseness
  • simple and clear
  • reduce processing time and storage space
  • Examples:
    • authors and titles

Relationships of four considerations
• Higher discriminating power may lower the capability of identifying similarities among documents.
• Good descriptiveness may defeat conciseness.
• What’s good for the computer may not always be good for the user.
• A good representation should seek a balance of the four, and take into consideration both the computer and the user.
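
The first trade-off can be made concrete with a toy example. The sketch below groups three invented documents by two different representations: a unique identifier (maximal discrimination, no grouping) and a class label (less discrimination, but similar documents fall together). The field names and values are hypothetical.

```python
from collections import defaultdict


def group_by(docs, key):
    """Group document titles by the value of one representation field."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[key]].append(doc["title"])
    return dict(groups)


# Invented documents with a unique identifier and a shared class label.
docs = [
    {"title": "Doc A", "identifier": "id-001", "class_label": "CLS-1"},
    {"title": "Doc B", "identifier": "id-002", "class_label": "CLS-1"},
    {"title": "Doc C", "identifier": "id-003", "class_label": "CLS-2"},
]

by_identifier = group_by(docs, "identifier")
by_class = group_by(docs, "class_label")
print(len(by_identifier))  # every document is alone in its group
print(by_class)            # similar documents share a group
```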

What’s missing in DR?
• Intelligent Reasoning!
• Knowledge-base
  • Ontology
  • Semantic Networks
• Uncertainty (impreciseness) handling

Knowledge Representation
• encoding human knowledge - in all its various forms - in such a way that the knowledge can be used.
• A successful representation of some knowledge must be in a form that is understandable by humans, and must cause the system using the knowledge to behave as if it knows it.

Knowledge Representation
• A knowledge representation (KR) is most fundamentally a surrogate, a substitute for the thing itself.
• It is a set of ontological commitments, i.e., an answer to the question: In what terms should I think about the world?
• It is a fragmentary theory of intelligent reasoning, expressed in terms of three components: (i) the representation's fundamental conception of intelligent reasoning; (ii) the set of inferences the representation sanctions; and (iii) the set of inferences it recommends.

Knowledge Representation
• It is a medium for pragmatically efficient computation, i.e., the computational environment in which thinking is accomplished. One contribution to this pragmatic efficiency is supplied by the guidance a representation provides for organizing information so as to facilitate making the recommended inferences.
• It is a medium of human expression, i.e., a language in which we say things about the world.
• From http://medg.lcs.mit.edu/ftp/psz/k-rep.html
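
To make "the set of inferences the representation sanctions" a little more tangible, here is a toy semantic network that sanctions exactly one kind of inference: following is-a links transitively. The facts reuse terms from the earlier Cancer example and are assumptions for illustration; the function `is_a` is hypothetical.

```python
# Each pair (child, parent) asserts "child is-a parent".
IS_A = {
    ("Breast Cancer", "Cancer"),
    ("Cancer", "Neoplasms"),
    ("Neoplasms", "Diseases"),
}


def is_a(child, ancestor, facts=IS_A):
    """True if `ancestor` is reachable from `child` by following is-a links."""
    frontier, seen = [child], set()
    while frontier:
        node = frontier.pop()
        if node in seen:
            continue
        seen.add(node)
        for c, parent in facts:
            if c == node:
                if parent == ancestor:
                    return True
                frontier.append(parent)
    return False


print(is_a("Breast Cancer", "Diseases"))  # inferred, never stated directly
print(is_a("Diseases", "Cancer"))         # links are not followed backwards
```

A document representation alone stores only the assigned terms; the network adds the ability to derive facts that were never written down.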

Intelligent Information Retrieval
• Information retrieval supported by knowledge representation, rather than document representation.
• Useful links
  • Stanford
  • Agent-based IR