Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature


Page 1: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Michael Shepherd
Web Information Filtering Lab
Faculty of Computer Science
Dalhousie University

Page 2: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Research Team

• Students
– Qiufen Qiu (MD and MCS)
– Zhixin Chen (MHI and BSc)

• Computer Science Faculty
– Michael Shepherd
– Qigang Gao
– Syed Sibte Raza Abidi

• Anaesthesia & Psychology
– G. Allen Finley

Page 3: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Overview

• Introduction

• Research Program

• Results to Date

• Summary

Page 4: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Pediatric Pain Discussion List

• Clinical discussion on pediatric pain

• Informal email-based discussion among professionals

• Initiated in 1993

• Over 700 subscribers world-wide

• More than 10,000 messages

Page 5: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Date: Wed, 04 Jan 1995 16:54:48 -0500 (EST)
From: poster
Subject: opioids and meningitis

X is a 13 month (9.8kg) old boy suffering from acute meningitis (pneumocoque) treated with IV cefotaxime; at day three, I have been called as pediatric pain consultant to assess X; I have discovered an extreme painfull state: one could not handle or touch him without producing screaming. The child was unable to move spontaneously he looked paralysed by pain and hypertonia ; he also presented a neurological complication : ptosis at the right side.The pain treatment was IV acetaminophen. The first day I have prescribed IV Nalbuphine (weak opioid u antagonist and agonist) 11mg/24h after a loading dose of 1.4 mg; Pain at rest has been succesfully relieved but not the mobilisation pain; the dose has been increased at 14 mg/day wihout relieving the pain associated with moving; he has moved spontaneously limbs 2 days later; nalbuphine has been stopped 4 days later. Neurological examination and CT scan have been still normal (except ptosis) during this period. No opioid's side effects have been observed.

What do you think of this case?
Have you any experience with opioids and acute meningitis?

Dr Poster, Pediatric pain unit, Poster Hospital

Page 6: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Date: Wed, 04 Jan 1995 17:27:25 -0500 (EST)
From: first reply
Subject: re: opioids and meningitis

Is there any periosteal involvement? If so an NSAID (ibuprofen or naproxen) may be much more effective than even opioid.

-------------------------

Page 7: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Date: Wed, 04 Jan 1995 19:06:32 -0400
From: second reply
Subject: Re: opioids and meningitis

Poster writes:
> X is a 13 month (9.8kg) old boy suffering from acute meningitis...
> extreme painfull state: one could not handle or touch him without
> producing screaming....
> The first day I have prescribed IV Nalbuphine ...
> succesfully relieved but not the mobilisation pain;...
> has moved spontaneously limbs 2 days later; nalbuphine has been stopped 4
> days later. Neurological examination and CT scan have been still normal...

I have used IV morphine for similar severe meningitis pain, with success. I wouldn't hesitate to use a pure opioid agonist (in conjunction with acetaminophen, NSAID, and/or tricyclics). However, it sounds like you have the situation under control.

Second Reply, Associate Professor, Dept and University

-------------------------

Page 8: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Date: Thu, 05 Jan 1995 18:58:32 -0800 (PST)
From: Third Reply
Subject: Re: opioids and meningitis

I wonder if the problem is not due to severe arachnoiditis that is secondary to the inflammation. I would suggest a trial of steroids in this patient, perhaps in combination with a benzodiazepine to reduce the spasm. Narcotics may reduce the pain but I would not like to keep X on them for too long.

Good luck

Third Reply

-------------------------  

Page 9: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Tacit and Explicit Knowledge

• Tacit knowledge is what the knower knows and is derived from experience

• Explicit knowledge is represented by some artifact such as a document or journal article

Page 10: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Knowledge Transformation Processes

[Diagram: Tacit Knowledge and Explicit Knowledge, connected by four processes: Socialization, Externalization, Combination, and Internalization]

Page 11: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Knowledge Transformation Processes

Socialization
– Tacit to tacit
– Face to face meetings
– Synchronous

Externalization
– Tacit to explicit
– Respond to question
– List-server discussion
– Asynchronous

Internalization
– Explicit to tacit
– Access to organized explicit knowledge
– Structure helps user internalize knowledge

Combination
– Explicit to explicit
– Organization into categories
– Reflects knowledge of domain

Page 12: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Research Questions

1. Externalization: How can we capture the tacit knowledge in such a discussion list and transform it into explicit knowledge?

2. Combination: How can we organize this explicit knowledge?

3. Internalization: How do we provide access to this explicit knowledge so that users can internalize this knowledge?

4. Linking Tacit Knowledge to Best Evidence: How do we map this transformed tacit knowledge to the appropriate best evidence literature?

Page 13: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Mapping Tacit Knowledge to Explicit Knowledge in Medical Literature

[Diagram components: PPML, Data Cleaning, Thread Creation, Thread Clusters, MeSH Terminology Map, PubMed Articles, Access. Stage labels: Externalization, Combination, Internalization, Linking.]

Page 14: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Data Cleaning

• Remove duplicate messages (subject & time stamp)

• Remove responses that were generated automatically by “vacation” mail programs

• Remove other “junk” e-mails

• Remove unnecessary content from the messages themselves. This included non-textual material, such as images, that would not be used in the clustering process, and included original messages that were more than ten lines long, as these would skew the clustering process.

• The initial stage of this cleaning was done manually until patterns were recognized and then programs were written to clean the data based on these patterns.
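The cleaning steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the message fields (`subject`, `date`, `body`), the vacation pattern, and the function names are hypothetical and are not taken from the programs written for the PPML archive.

```python
import re

# Hypothetical pattern for "vacation" auto-replies; the real patterns were
# found by manual inspection and are not given in the slides.
VACATION_PATTERN = re.compile(r"out of (the )?office|on vacation|auto[- ]?reply",
                              re.IGNORECASE)
MAX_QUOTED_LINES = 10  # included originals longer than this are dropped


def remove_duplicates(messages):
    """Keep the first message seen for each (subject, time stamp) pair."""
    seen, unique = set(), []
    for msg in messages:
        key = (msg["subject"].strip().lower(), msg["date"])
        if key not in seen:
            seen.add(key)
            unique.append(msg)
    return unique


def is_auto_reply(msg):
    """Flag messages that look like automatically generated vacation replies."""
    return bool(VACATION_PATTERN.search(msg["subject"])
                or VACATION_PATTERN.search(msg["body"]))


def strip_long_quotes(body):
    """Drop runs of quoted ('>') lines that are longer than MAX_QUOTED_LINES."""
    kept, run = [], []
    for line in body.splitlines() + [""]:  # the empty sentinel flushes a final run
        if line.lstrip().startswith(">"):
            run.append(line)
        else:
            if len(run) <= MAX_QUOTED_LINES:
                kept.extend(run)
            run = []
            kept.append(line)
    return "\n".join(kept[:-1])  # discard the sentinel line


def clean_messages(messages):
    """Apply duplicate removal, auto-reply removal, and quote stripping."""
    return [dict(msg, body=strip_long_quotes(msg["body"]))
            for msg in remove_duplicates(messages)
            if not is_auto_reply(msg)]
```

Removal of images and other "junk" e-mail is left out here because the slides do not say how those were recognized.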

Page 15: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Externalization: Creating Threads

• Messages were threaded based on time stamps and subject headings.

• Those messages that had a blank subject field were processed based on the included original messages to which they had replied.
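A minimal sketch of this threading step, assuming each message is a dictionary with hypothetical `subject`, `date`, and `body` fields. Stripping "Re:" prefixes and matching quoted lines against earlier message bodies are illustrative choices, not necessarily the rules used in the project.

```python
import re
from collections import defaultdict

# Leading "Re:" / "Fw:" / "Fwd:" prefixes, possibly repeated.
REPLY_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject):
    """Lower-case the subject and strip reply/forward prefixes."""
    return REPLY_PREFIX.sub("", subject).strip().lower()


def quoted_lines(body):
    """Return the non-empty text of lines quoted with '>' in a reply."""
    lines = set()
    for line in body.splitlines():
        if line.startswith(">"):
            text = line.lstrip("> ").strip()
            if text:
                lines.add(text)
    return lines


def build_threads(messages):
    """Group messages by normalized subject, in time-stamp order.

    A message with a blank subject joins the first existing thread that
    contains one of the lines it quotes; otherwise it stays under ''.
    """
    threads = defaultdict(list)
    for msg in sorted(messages, key=lambda m: m["date"]):
        key = normalize_subject(msg["subject"])
        if not key:
            quotes = quoted_lines(msg["body"])
            key = next((k for k, members in threads.items()
                        if any(q in m["body"] for m in members for q in quotes)),
                       "")
        threads[key].append(msg)
    return dict(threads)
```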

Page 16: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Thread Representation

• Each thread is treated as though it were a contiguous document

• The original messages that are embedded in the reply messages are removed.

• Stop words are removed

• Terms not on the stop list are matched against a synonym dictionary manually created by a pediatric pain specialist.

• The remaining terms are stemmed

• The stemmed terms are assigned tf.idf weights
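The representation steps above might look something like the sketch below. The stop list, the synonym entries, and the crude suffix-stripping stemmer are all hypothetical stand-ins (the slides do not name the stemmer or list the dictionary), and tf.idf is taken here as term frequency times log(N / document frequency), which is one common variant.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for the project's stop list and synonym dictionary.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "for", "with", "in", "to"}
SYNONYMS = {"paracetamol": "acetaminophen", "narcotics": "opioids"}


def crude_stem(term):
    """Very rough suffix stripping; a placeholder for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if term.endswith(suffix) and len(term) > len(suffix) + 2:
            return term[:-len(suffix)]
    return term


def thread_terms(message_bodies):
    """Treat a thread as one document and return its processed terms."""
    text = " ".join(
        "\n".join(line for line in body.splitlines() if not line.startswith(">"))
        for body in message_bodies)  # embedded original messages are removed
    terms = []
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in STOP_WORDS:
            continue
        terms.append(crude_stem(SYNONYMS.get(token, token)))
    return terms


def tfidf_vectors(threads):
    """Return the vocabulary and one tf.idf weight vector per thread."""
    term_lists = [thread_terms(bodies) for bodies in threads]
    n_threads = len(term_lists)
    doc_freq = Counter(term for terms in term_lists for term in set(terms))
    vocab = sorted(doc_freq)
    vectors = []
    for terms in term_lists:
        tf = Counter(terms)
        vectors.append([tf[t] * math.log(n_threads / doc_freq[t]) for t in vocab])
    return vocab, vectors
```

Each returned vector is one row of a thread-term matrix: one weight per vocabulary term, and a weight of zero for any term that occurs in every thread.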

Page 17: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Data Set

• An archived sample of 6939 messages from 1993-1999

• After cleaning 4033 messages

• After threading 1289 threads

• Each thread is represented by a vector of 4111 term weights

Page 18: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Thread-Term Matrix

             term1      term2      term3   . . .   term4111
thread1      w1,1       w1,2               . . .   w1,4111
thread2      w2,1       w2,2               . . .   w2,4111
.
.
.
thread1289   w1289,1    w1289,2            . . .   w1289,4111

Page 19: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Combination: Organizing the Threads

• Text clustering
– unsupervised learning process
– groups documents into clusters so that the documents within a cluster have high similarity with one another, but are very dissimilar to the documents in the other clusters

• Text classification or categorization
– supervised learning process
– assigns documents to pre-defined classes or categories

Page 20: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

k-means clustering with k=2

[Figure: scatter of seven points, labelled 1 to 7]

Page 21: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

k-means clustering with k=2

[Figure: scatter of seven points, labelled 1 to 7]

Page 22: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

k-means clustering with k=2

[Figure: scatter of seven points, labelled 1 to 7]
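For readers without the figures, a minimal k-means sketch with k=2 on seven points is given below. The coordinates are made up for illustration (the slide's point positions are not recoverable), and squared Euclidean distance is an assumption; the threads themselves would be compared as tf.idf vectors.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, k, seed=0, max_iter=100):
    """Plain k-means: assign each point to the nearest centroid, recompute
    the centroids, and repeat until the assignment stops changing."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # seed centroids are drawn from the data
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # leave a centroid unchanged if its cluster is empty
                centroids[c] = mean(members)
    return assignment, centroids


# Seven hypothetical points forming two well-separated groups.
POINTS = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.3), (8.1, 8.1)]
labels, centers = k_means(POINTS, k=2)
```

Because the result depends on the seed centroids, different seeds can produce different clusterings on less clearly separated data.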

Page 23: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature
Page 24: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature
Page 25: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Evaluation of Clustering

• Performed a study in which 100 randomly selected threads were presented to two experts for clustering and to our clustering algorithm

• Results of clustering between the experts measured

• Results of clustering between the experts and the system measured

Page 26: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Clusters and labels created by expert 1 – a psychologist

Cluster ID | Label | Number of Threads
1 | adverse effects of a treatment or medication, monitoring requirements | 6
2 | advice on medications or treatment technique for a particular CONDITION | 33
3 | announcement of a publication or event | 9
4 | assessment methods for a particular condition | 2
5 | availability and benefits of a nonstandard drug compound | 3
6 | availability and validation of a particular assessment tool | 11
7 | contact information or other information about a specific person | 3
8 | dosage or adverse effects or technique for a medication or other treatment | 9
9 | information on a condition: description, etiology, prognosis | 4
10 | job description, posting of job or fellowship | 3
11 | miscellaneous | 3
12 | other newsgroups and listservs | 2
13 | policies, guidelines, protocols, algorithms, quality assurance, supervision, competency | 12

Page 27: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Clusters and labels created by expert 2 – a medical doctor

Cluster ID | Label | Number of Threads
1 | Assessment | 10
2 | Musculoskeletal | 2
3 | sedation & procedures | 4
4 | oral drugs | 5
5 | miscellaneous / irrelevant / out-of-date | 21
6 | neuropathic pain | 9
7 | regional analgesia | 10
8 | postoperative pain | 5
9 | intravenous opioids | 11
10 | psychology | 5
11 | visceral pain, bowel function, etc | 2
12 | topical analgesia, EMLA | 2
13 | everyday pain | 1
14 | NMDA antagonists, ketamine | 1
15 | resources | 8
16 | administration | 3
17 | burns | 1

Page 28: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Inter-Rater Reliability

The Redundancy(X, Y) is the proportion of uncertainty about X that is removed by knowing Y

In this instance, X and Y represent the two sets of clusters generated by the experts.

The measure is asymmetrical and the calculated redundancy measures are:

R(Expert-1, Expert-2) = 0.51
R(Expert-2, Expert-1) = 0.44
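One way to compute such a measure is sketched below, assuming that "uncertainty" means Shannon entropy, so that R(X, Y) = (H(X) − H(X|Y)) / H(X). The slides do not give the formula, so this reading, the function names, and the toy labels are assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of cluster labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def conditional_entropy(x, y):
    """H(X | Y) for two labelings of the same items."""
    n = len(x)
    joint = Counter(zip(x, y))
    y_counts = Counter(y)
    return -sum((c / n) * math.log2(c / y_counts[y_label])
                for (_, y_label), c in joint.items())


def redundancy(x, y):
    """Proportion of the uncertainty about X that is removed by knowing Y."""
    h_x = entropy(x)
    if h_x == 0:
        return 0.0  # X has a single cluster, so there is no uncertainty to remove
    return (h_x - conditional_entropy(x, y)) / h_x


# Toy example: six threads clustered by two hypothetical raters.
rater_a = ["a", "a", "b", "b", "c", "c"]
rater_b = [1, 1, 1, 1, 2, 2]
r_ab = redundancy(rater_a, rater_b)  # knowing B only partly determines A
r_ba = redundancy(rater_b, rater_a)  # knowing A fully determines B
```

The toy example shows why the measure is asymmetrical: a fine-grained clustering can fully predict a coarse one, but not the reverse.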

Page 29: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Evaluation of the Automatically Generated Clustering

• Assume each manually created cluster is correct

• Compare the manually created cluster against an automatically created cluster

• Recall – the proportion of those items in the manually created cluster that appear together in the same automatically generated cluster

• Precision – the proportion of those items in an automatically created cluster that appear together in the same manually created cluster

• F–measure = 2PR / (P+R)
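As a small worked example of these definitions, the sketch below compares one manually created cluster with one automatically created cluster, each given as a set of thread identifiers. The function name and the identifiers are illustrative only.

```python
def precision_recall_f(manual_cluster, automatic_cluster):
    """Return (precision, recall, F-measure) for one pair of clusters."""
    manual, automatic = set(manual_cluster), set(automatic_cluster)
    overlap = len(manual & automatic)
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / len(automatic)  # share of the automatic cluster that is correct
    recall = overlap / len(manual)        # share of the manual cluster that was recovered
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Manual cluster {1, 2, 3, 4} against automatic cluster {3, 4, 5}:
# two shared threads give P = 2/3, R = 2/4, and F = 4/7.
p, r, f = precision_recall_f({1, 2, 3, 4}, {3, 4, 5})
```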

Page 30: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Hierarchy – k=2

[Figure: cluster hierarchy with nodes C-1,1; C-2,1 and C-2,2; C-3,1 to C-3,4; C-4,1 and C-4,2; shown alongside the labels E-1, E-2, ..., E-n]

Page 31: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

F-measure for a classification

The overall F-measure is used to reflect the quality of the whole hierarchy. It is the weighted average of the F-measures of all the clusters T in a manually generated clustering S, each weighted by its size |T|:

\[
\text{Overall F-measure} = \frac{\sum_{T \in S} |T| \cdot F(T)}{\sum_{T \in S} |T|}
\]
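The weighted average can be sketched as follows. One point is an assumption rather than something stated on the slide: F(T) is taken here as the best F-measure that manual cluster T achieves against any cluster in the automatically generated hierarchy, which is a common convention for evaluating hierarchical clusterings.

```python
def f_measure(manual_cluster, automatic_cluster):
    """F-measure between one manual and one automatic cluster (sets of ids)."""
    overlap = len(manual_cluster & automatic_cluster)
    if overlap == 0:
        return 0.0
    precision = overlap / len(automatic_cluster)
    recall = overlap / len(manual_cluster)
    return 2 * precision * recall / (precision + recall)


def overall_f_measure(manual_clusters, automatic_clusters):
    """Size-weighted average of F(T) over the manual clusters T.

    Assumption: F(T) is the maximum F-measure of T over all automatic
    clusters (all nodes of the hierarchy).
    """
    total_size = sum(len(t) for t in manual_clusters)
    weighted = sum(
        len(t) * max(f_measure(t, c) for c in automatic_clusters)
        for t in manual_clusters
    )
    return weighted / total_size


# Toy example with six thread ids.
manual = [{1, 2, 3, 4}, {5, 6}]
automatic = [{1, 2}, {3, 4, 5, 6}]
score = overall_f_measure(manual, automatic)
```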

Page 32: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Evaluation of Clustering

• Each expert’s set of clusters was compared to the automatically generated hierarchical clustering. The hierarchy was generated ten times using different seed centroids for each run.

• The results of the paired-samples t tests (p=0.05) show that there was no significant difference between the two sets of manually generated clusters when used to evaluate the automatically generated clustering (k = 6).

          E-1    E-2
k-means   0.47   0.48

Page 33: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Evaluation of k-means Clustering

• We now have 3 different clusterings with inter-rater reliability of < .50

• k-means generated a large number of term representatives for each cluster with no elegant way of mapping the terms into MeSH.

• Therefore, the k-means clustering algorithm was replaced with a SOM in the expectation that the clustering results would be better and that a smaller set of term representatives for each cluster might be identified.

Page 34: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

SOM – Self Organizing Maps

• Invented by Teuvo Kohonen

• Provide a way of representing multidimensional data in much lower dimensional spaces - usually one or two dimensions.

• Create a network that stores information in such a way that any topological relationships within the training set are maintained
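A minimal sketch of SOM training on a 2-D lattice is shown below, using a few made-up 3-dimensional colour vectors as input. The linear decay schedules, the Gaussian neighbourhood, and all parameter values are illustrative assumptions, not the settings used in this research.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the lattice position whose weight vector is closest to x."""
    return min(weights,
               key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)))


def train_som(data, rows, cols, steps=500, seed=0):
    """Train a rows x cols self-organizing map on a list of equal-length vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    lo = [min(v[d] for v in data) for d in range(dim)]
    hi = [max(v[d] for v in data) for d in range(dim)]
    # Each lattice node starts with a random weight vector inside the data range.
    weights = {(r, c): [rng.uniform(lo[d], hi[d]) for d in range(dim)]
               for r in range(rows) for c in range(cols)}
    initial_radius = max(rows, cols) / 2
    for step in range(steps):
        progress = step / steps
        radius = max(initial_radius * (1 - progress), 0.5)  # neighbourhood decreases
        rate = 0.5 * (1 - progress)                         # learning rate decreases
        x = rng.choice(data)
        bmu = best_matching_unit(weights, x)
        for node, w in weights.items():
            grid_dist_sq = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
            influence = math.exp(-grid_dist_sq / (2 * radius ** 2))
            for d in range(dim):
                # Pull the node towards the input, more strongly near the BMU.
                w[d] += rate * influence * (x[d] - w[d])
    return weights


# Hypothetical RGB training vectors.
COLOURS = [(240, 89, 48), (37, 202, 219), (255, 0, 0), (0, 255, 0), (0, 0, 255)]
som = train_som(COLOURS, rows=8, cols=6)
```

After training, `best_matching_unit(som, colour)` gives the lattice position at which a colour lands; similar inputs tend to land on nearby nodes, which is the topology-preserving property described above.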

Page 35: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Example 2-D Lattice of Nodes

Page 36: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

[Figure: each colour is represented as a 3-dimensional vector (R, G, B), for example (240, 89, 48) and (37, 202, 219)]

Page 37: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Mapping 3 Dimensional Colour Vectors Into 2 Dimensions

Notice that in addition to clustering the colours into distinct regions, regions of similar properties are usually found adjacent to each other.

Page 38: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

SOM Neighbourhood Decreases

Page 39: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Mapping 3 Dimensional Colour Vectors Into 2 Dimensions

Notice that in addition to clustering the colours into distinct regions, regions of similar properties are usually found adjacent to each other.

Page 40: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Thread-Term Matrix

             term1      term2      term3   . . .   term4111
thread1      w1,1       w1,2               . . .   w1,4111
thread2      w2,1       w2,2               . . .   w2,4111
.
.
.
thread1289   w1289,1    w1289,2            . . .   w1289,4111

Page 41: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Principal Component Analysis for Feature Length Reduction

[Figure: plot with axes labelled "PCA Vectors" and "Eigen Values"]

Page 42: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

SOM – Vector Length 150

Page 43: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Growing Hierarchical SOM

Page 44: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

SOM Results

Method | Features | Map Size | Number of Clusters | Best F-Measure (SOM), Expert 1 | Best F-Measure (SOM), Expert 2
SOM | 150 | 8*6 | 48 | 0.2968 | 0.4043
GHSOM | 500 | 5 layers, 2*2 | 53 | 0.4235 | 0.4466
SOM-k | 150 | 10*5 | 13 | 0.2783 | 0.3896

Average F-Measure (k-means): Expert 1 = 0.47, Expert 2 = 0.48

Page 45: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Problems

[Diagram components: PPML, Data Cleaning, Thread Creation, Thread Clusters, MeSH Terminology Map, PubMed Articles. Stage labels: Externalization, Combination, Internalization, Linking.]

Page 46: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Mapping Tacit Knowledge to Explicit Knowledge in Medical Literature

[Diagram components: PPML, Data Cleaning, Thread Creation, Threads, MeSH and UMLS, PubMed Articles, Access. Stage labels: Externalization, Combination, Internalization, Linking.]

Page 47: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Combination: Organizing the Threads

• Text clustering
– unsupervised learning process
– groups documents into clusters so that the documents within a cluster have high similarity with one another, but are very dissimilar to the documents in the other clusters

• Text classification or categorization
– supervised learning process
– assigns documents to pre-defined classes or categories

Page 48: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

MetaMap Transfer (MMTx)

• Discovers UMLS Metathesaurus concepts in text

• Text is parsed into components including sentences, paragraphs, phrases, lexical elements and tokens. Produces a shallow syntactic analysis with part-of-speech tagging.

• Variants are generated from the resulting phrases. Includes acronyms, abbreviations and synonyms.

• Candidate concepts from the UMLS Metathesaurus are retrieved and evaluated against the phrases.

• The best of the candidates are organized into a final mapping in such a way as to best cover the text.

Page 49: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Metathesaurus Candidates

The word "discharge" returns (from the UMLS Knowledge Server):

Semantic Group: Anatomy
  Discharge, Body Substance (C0012621) - [Body Substance]
  Discharge, Body Substance, Sample (C0600083) - [Body Substance]

Semantic Group: Procedures
  Patient Discharge (C0030685) - [Health Care Activity]

Page 50: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Metathesaurus Candidates

"He is to be discharged home..."

Phrase: "discharged"
Meta Candidates (3)
  966 C0030685:Discharge <1> (Patient Discharge) [Health Care Activity]
  966 C0600083:Discharge <3> (Discharge, Body Substance, Sample) [Body Substance]
  966 C0012621:Discharge, NOS (Discharge, Body Substance) [Body Substance]

Phrase: "home"
Meta Candidates (3)
  1000 C0442517:Home [Manufactured Object]
  928 C0237154:homeless <1> (Homelessness) [Finding]
  928 C0019863:homeless <2> (Homeless persons) [Population Group]

[Callouts on the slide label the parts of each candidate line: MMTx Scores, MeSH Concept Number, MeSH Concept Term, UMLS Semantic Type]

Page 51: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Using the MMTx Results

• The MMTx results were used in three different ways:
– Organize the PPML threads according to the UMLS Semantic Groups – 134 semantic types in 15 semantic groups
– Organize the PPML threads according to the MeSH Hierarchy – 15 MeSH trees
– Select terms that can be used as queries to PubMed

Page 52: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Organization by UMLS Semantic Group

Semantic Group | Code | Semantic Types | Terms Retained
Activities & Behaviors | ACTI | Activity, Behavior, Event, Machine Activity … | NO
Anatomy | ANAT | Anatomical structure, Body location … | YES
Chemicals & Drugs | CHEM | Amino Acid, Antibiotic, Chemical … | YES
Concepts & Ideas | CONC | Classification, Concept Entity … | NO
Devices | DEVI | Medical Device, Research Device … | NO
Disorders | DISO | Acquired Abnormality, Disease … | YES
Genes & Molecular Sequence | GENE | Amino Acid Sequence, Gene or Genome, Molecular Sequence … | NO
Geographic Areas | GEOG | Geographic Area | NO
Living Beings | LIVB | Age group, Alga, Animal … | NO (except age group)
Objects | OBJC | Entity, Food, Manufactured Object … | NO
Occupations | OCCU | Biomedical Occupation … | NO
Organization | ORGA | Organization, Professional Society … | NO
Phenomena | PHEN | Biologic Function, Test Result … | NO
Physiology | PHYS | Cell Function, Clinical Attribute … | NO
Procedures | PROC | Diagnostic procedure … | NO

Page 53: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature
Page 54: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Organization by MeSH Tree

• There are 15 MeSH trees

• Only two trees were kept:
– The C tree (Diseases), as the PPML largely deals with disorders and diseases
– The D tree (Chemicals and Drugs), as the PPML contains discussions on drugs, so it was deemed important to retain drug-related terminology

Page 55: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature
Page 56: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Filtering to Generate PubMed Queries

• Filtering approach operates at the semantic/conceptual level as opposed to the term level

• UMLS semantic types associated with each MeSH term are used as the basis for term filtering

• Working at the semantic level we can
– Establish a medical context for the thread which can assist in subsequent search for corresponding literature;
– Characterize the entirety of medical terms into a small number of medical concepts
– Design filtering rules that apply to broad semantic types as opposed to focused individual terms

Page 57: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Filtering UMLS Concepts Associated with MeSH Terms Found in Subject Line

If mapping score = 1000 then retain the MeSH term.

If semantic type = Age Group (T100) then retain the MeSH term.

If semantic group = CHEM | DISO | ANAT AND (mapping score > 800) then retain the MeSH term.

If semantic type = Diagnostic Procedure (T060) | Therapeutic or Preventive Procedure (T061) | Laboratory or Test Result (T034) AND (mapping score > 800) then retain the MeSH term.
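The four rules can be expressed as a single predicate, sketched below. The record layout (`name`, `score`, `group`, `type_id`) is hypothetical, and the sample concepts are illustrative values only. The semantic type identifiers follow the UMLS Semantic Network: T100 Age Group, T060 Diagnostic Procedure, T061 Therapeutic or Preventive Procedure, T034 Laboratory or Test Result.

```python
RETAINED_GROUPS = {"CHEM", "DISO", "ANAT"}
RETAINED_TYPES = {"T060", "T061", "T034"}
AGE_GROUP_TYPE = "T100"
SCORE_THRESHOLD = 800
PERFECT_SCORE = 1000


def retain_term(concept):
    """Apply the four filtering rules to one mapped concept."""
    score = concept["score"]
    if score == PERFECT_SCORE:                    # rule 1: exact mapping
        return True
    if concept["type_id"] == AGE_GROUP_TYPE:      # rule 2: age group, any score
        return True
    if concept["group"] in RETAINED_GROUPS and score > SCORE_THRESHOLD:    # rule 3
        return True
    if concept["type_id"] in RETAINED_TYPES and score > SCORE_THRESHOLD:   # rule 4
        return True
    return False


# Illustrative concepts in a hypothetical record layout.
CONCEPTS = [
    {"name": "Year", "score": 694, "group": "CONC", "type_id": "T079"},
    {"name": "Old", "score": 861, "group": "CONC", "type_id": "T079"},
    {"name": "Feline osteogenesis imperfecta", "score": 1000, "group": "DISO", "type_id": "T047"},
    {"name": "Adolescent", "score": 694, "group": "LIVB", "type_id": "T100"},
    {"name": "Osteoporosis", "score": 861, "group": "DISO", "type_id": "T047"},
]
retained = [c["name"] for c in CONCEPTS if retain_term(c)]
```

Because the rules test semantic groups and types rather than individual terms, new terminology is handled without changing the filter.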

Page 58: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Generating a PubMed Query

Concept Name | Score | Semantic Group | Semantic Type | Retain
Year | 694 | CONC | Temporal Concept | No
Old | 861 | CONC | Temporal Concept | No
Feline osteogenesis imperfecta | 1000 | DISO | Disease or Syndrome | Yes
Adolescent | 694 | LIVB | Age Group | Yes
Osteoporosis | 861 | DISO | Disease or Syndrome | Yes

Query Terms: Feline osteogenesis imperfecta, Adolescent, Osteoporosis
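How the retained terms are combined into a query is not stated on the slide. The sketch below assumes they are joined with AND and sent to the NCBI E-utilities `esearch` endpoint for PubMed; the function names and the choice of operator are illustrative assumptions, and the code only builds the URL without contacting the service.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint; db, term, and retmax are its query parameters.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(rows, operator="AND"):
    """Join the retained concept names into one query string.

    rows is a list of (concept_name, retained) pairs. Multi-word names are
    quoted so that they are searched as phrases (an assumption).
    """
    terms = [f'"{name}"' if " " in name else name
             for name, retained in rows if retained]
    return f" {operator} ".join(terms)


def esearch_url(query, max_results=20):
    """Build an esearch URL for the query against the PubMed database."""
    params = {"db": "pubmed", "term": query, "retmax": max_results}
    return ESEARCH_URL + "?" + urlencode(params)


ROWS = [
    ("Year", False),
    ("Old", False),
    ("Feline osteogenesis imperfecta", True),
    ("Adolescent", True),
    ("Osteoporosis", True),
]
query = build_pubmed_query(ROWS)
url = esearch_url(query)
```

Joining with AND gives a narrow search; passing `operator="OR"` would broaden it, and which is more useful for a given thread is an open design choice.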

Page 59: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature
Page 60: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature
Page 61: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Summary

• We have various hierarchical organizations of the PPML threads that can be browsed by the user

• We have linked the PPML to the best-evidence literature via PubMed

Page 62: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Knowledge Transformation Processes

Socialization
– Tacit to tacit
– Face to face meetings
– Synchronous

Externalization
– Tacit to explicit
– Respond to question
– List-server discussion
– Asynchronous

Internalization
– Explicit to tacit
– Access to organized explicit knowledge
– Structure helps user internalize knowledge

Combination
– Explicit to explicit
– Organization into categories
– Reflects knowledge of domain

Page 63: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Future Research

• Improve the filters

• Link from medical literature to PPML

• Evaluate the overall system with respect to the users:
– Is it useful?
– Is it helpful?
– Does it improve outcomes?

Page 64: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Thank You

Web Information Filtering Lab

http://www.cs.dal.ca/wifl/

Page 65: Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

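Closeness of Document Vectors

        information   science
Doc0    0             0
Doc1    0             1
Doc2    1             0
Doc3    1             1

[Figure: the document vectors plotted on axes "information" and "science", from the origin (0,0) to Doc1, Doc2, and Doc3 at (1,1); closeness is measured by cos θ, where θ is the angle between two vectors]

cos 0° = 1
cos 90° = 0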
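The cosine measure on this slide can be checked with a few lines of code. The sketch below assumes the first coordinate is "information" and the second is "science"; the function name is illustrative.

```python
import math


def cosine(a, b):
    """Cosine of the angle between two vectors, or None if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else None


DOCS = {
    "Doc0": (0, 0),  # contains neither term
    "Doc1": (0, 1),  # "science" only
    "Doc2": (1, 0),  # "information" only
    "Doc3": (1, 1),  # both terms
}

sim_1_2 = cosine(DOCS["Doc1"], DOCS["Doc2"])  # perpendicular vectors: cos 90° = 0
sim_1_3 = cosine(DOCS["Doc1"], DOCS["Doc3"])  # 45° apart
sim_0_3 = cosine(DOCS["Doc0"], DOCS["Doc3"])  # undefined for the zero vector
```

Doc0 has no direction, so its cosine with any other document is undefined; the sketch returns `None` in that case rather than guessing a value.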