FROM TINNITUS DATA TO CLASSIFIERS CONSTRUCTION: Building Decision Support System for Diagnosis and...

FROM TINNITUS DATA TO CLASSIFIERS CONSTRUCTION:

Building Decision Support System for Diagnosis and Treatment of Tinnitus

Zbigniew W. Ras (1) & Paul Jastreboff (2) & Pamela Thompson (1)

1) University of North Carolina at Charlotte, College of Computing and Informatics
2) Tinnitus and Hyperacusis Center, Emory University School of Medicine

In collaboration with Jan Rauch, Department of Computer Science, University of Economics, Prague, Czech Republic

Research partially supported by the Project ME913 of the Ministry of Education, Youth, and Sports of the Czech Republic

Methodology
◦ Domain Knowledge
◦ Data Collection
◦ Data Preparation

New Feature Construction
Tolerance Relation Based Clustering & New Temporal Features
Classifiers Construction [for Total Score or Difference in Total Score]
Action Rules Discovery [hints on how to treat tinnitus]
Future Research
From Music to Emotions and Tinnitus Treatment

Introduction

Neil Young, Barbra Streisand, Pete Townshend, William Shatner, David Letterman, Paul Schaffer, Steve Martin, Ronald Reagan, Neve Campbell, Jeff Beck, Burt Reynolds, Sting, Eric Clapton, Thomas Edison, Peter Jennings, Dwight D. Eisenhower, Cher, Phil Collins, Vincent Van Gogh, Ludwig Van Beethoven, Charles Darwin, . . .


Methodology: Domain Knowledge

TRT (Tinnitus Retraining Therapy) includes

DIAGNOSIS
◦ Preliminary medical examination
◦ Completion of initial interview questionnaire
◦ Audiological testing

TREATMENT
◦ Counseling
◦ Sound Habituation Therapy
◦ Exposure to a different stimulus to reduce emotional reaction
◦ Visit questionnaire (THI)
◦ Secondary questionnaire (TFI) in new dataset
◦ Instrument tracking (instruments can be table top or in ear, different manufacturers)
◦ Continued audiological tests

Original Dataset
◦ 555 patients
◦ Relational
◦ 11 tables

New Dataset
◦ 758 patients
◦ Relational
◦ Secondary questionnaire: Tinnitus Functional Index (TFI)

Methodology: Database Features

Initial Interview form provides basis for initial patient classification.

Category - 0 to 4 (stored in Questionnaires tables)

0 – low tinnitus only: counseling

1 – high tinnitus: sound generators set at mixing point

2 – high tinnitus w/hearing loss (subjective): hearing aid

3 – Hyperacusis: sound generators set above threshold of hearing

4 – persistent hyperacusis: sound generators set at the threshold; very slow increase of sound level


Methodology: Database Features

Tinnitus Functional Index
◦ New cognitive and emotional questions
◦ Scale of 0 to 10 and some %
◦ Includes questions related to
  ◦ Anxious/worried
  ◦ Bothered/upset
  ◦ Depressed

This new set of features is mapped to the “arousal-valence emotion plane” used for construction of emotion-based classifiers in the music information retrieval domain (personalization aspects are considered as well).

Arousal-valence emotion plane - used in Automatic Indexing of Music by emotions

New Feature Construction: TFI and Emotions

New Features Based on the TFI and emotions

Table 2: Tinnitus Functional Index (scale of 0 to 10)

Question  Item                        Category of Question     E-V Scale
Q1        % of time aware             Awareness
Q2        loud                        HEARING
Q3        in control                  E11                      E1
Q4        % of time annoyed           Annoyance
Q5        cope                        E11                      E1
Q6        ignore                      E21                      E2
Q7        concentrate                 THINKING CONCENTRATION
Q8        think clearly               THINKING CONCENTRATION
Q9        focus attention             THINKING CONCENTRATION
Q10       fall/stay asleep            E33                      E3
Q11       as much sleep               E33                      E3
Q12       sleeping deeply             E33                      E3
Q13       hear clearly                HEARING
Q14       understand people           HEARING
Q15       follow conversation         HEARING
Q16       quiet, resting activities   E41                      E4
Q17       relax                       E43                      E4
Q18       peace and quiet             E42                      E4
Q19       social activities           SOCIAL
Q20       enjoyment of life           E11                      E1
Q21       relationships               SOCIAL
Q22       work on other tasks         SOCIAL
Q23       anxious, worried            E23                      E2
Q24       bothered, upset             E22                      E2
Q25       depressed                   E31                      E3

The sum of the values within each E-V group gives the score for E1 (Energetic Positive), E2 (Energetic Negative), E3 (Calm Negative), and E4 (Calm Positive).
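A minimal sketch of how the four E-V sums might be computed from one completed TFI form. The question-to-group mapping is read off Table 2; the names `EV_GROUPS` and `ev_scores` and the sample answers are hypothetical, chosen only for illustration.

```python
# Question numbers per E-V group, as read off Table 2 (an interpretation of the slide).
EV_GROUPS = {
    "E1": (3, 5, 20),        # Energetic Positive
    "E2": (6, 23, 24),       # Energetic Negative
    "E3": (10, 11, 12, 25),  # Calm Negative
    "E4": (16, 17, 18),      # Calm Positive
}


def ev_scores(answers):
    """Sum the 0-10 answers of the questions belonging to each E-V group.

    answers: dict mapping question number (1..25) to its value.
    """
    return {group: sum(answers[q] for q in questions)
            for group, questions in EV_GROUPS.items()}


# Made-up answers: every question 0 except a few.
example = {q: 0 for q in range(1, 26)}
example.update({3: 4, 5: 6, 20: 2, 23: 7, 25: 5})
scores = ev_scores(example)
print(scores)
```

Keeping the mapping in one dictionary makes it easy to adjust if a question is assigned to a different group.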

Methodology: Database Features

Tinnitus Handicap Inventory
◦ Questionnaire, forms Neumann-Q Table
◦ Function, Emotion, Catastrophic Scores
◦ Total Score (sum)
◦ THI
  0 to 16: slight severity
  18 to 36: mild
  38 to 56: moderate
  58 to 76: severe
  78 to 100: catastrophic
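The severity bands above could be looked up with a small helper like the following sketch. The function name `thi_severity` is hypothetical. The check for even scores is an assumption based on the bands skipping odd values (17, 37, ...), which fits a questionnaire whose items are scored in steps of 2.

```python
# Upper bound of each THI Total Score band, taken from the list above.
THI_BANDS = (
    (16, "slight"),
    (36, "mild"),
    (56, "moderate"),
    (76, "severe"),
    (100, "catastrophic"),
)


def thi_severity(total_score):
    """Return the severity label for a THI Total Score (even integer, 0..100)."""
    # Assumption: valid totals are even numbers between 0 and 100.
    if not 0 <= total_score <= 100 or total_score % 2:
        raise ValueError(f"unexpected THI Total Score: {total_score}")
    for upper, label in THI_BANDS:
        if total_score <= upper:
            return label


print(thi_severity(44))
```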


New Feature Construction: Decision Feature

Total Score Difference Discretization and Description
(score a represents the highest T Score in all cases)

TSa  a = {s: s > 0}, b = {0}, c = {s: s < 0}
TSb  a = {s: s > 30}, b = {s: 10 < s ≤ 30}, c = {s: -10 < s ≤ 10}, d = {s: -40 < s ≤ -10}, e = remaining scores
TSc  a = {s: s > 28}, b = {s: 0 < s ≤ 28}, c = {s: -1 < s ≤ 0}, d = {s: -15 < s ≤ -1}, e = remaining scores
TSd  a = {s: s > 40}, b = {s: 10 < s ≤ 40}, c = {s: -10 < s ≤ 10}, d = {s: -40 < s ≤ -10}, e = remaining scores
TSe  a = {s: s > 50}, b = {s: 0 < s ≤ 50}, c = {s: -50 < s ≤ 0}, d = remaining scores
TSf  a = {s: s > 80}, b = {s: 60 < s ≤ 80}, c = {s: 40 < s ≤ 60}, d = {s: 20 < s ≤ 40}, e = {s: 0 < s ≤ 20}, f = {s: -20 < s ≤ 0}, g = {s: -40 < s ≤ -20}, h = {s: -60 < s ≤ -40}, i = remaining scores
TSg  a = {s: s > 28}, b = {s: 0 < s ≤ 28}, c = {s: -12 < s ≤ 0}, d = remaining scores
TSh  a = {s: s > 10}, b = {s: -10 ≤ s ≤ 10}, c = remaining scores

Eight new decision attributes based on different discretizations of the difference in Total Score (between first and last visit)
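A short sketch of how such a discretization might be applied to one patient's difference in Total Score. It assumes every interval is open at its lower bound and closed at its upper bound (for example, TSb class b is 10 < s ≤ 30). The helper `discretize` and the constants `TSB` and `TSF` are hypothetical names. The slides do not state the sign convention of the difference, so none is assumed here.

```python
def discretize(s, lower_bounds):
    """Map a Total Score difference s to a class letter.

    lower_bounds: exclusive lower bounds of the classes a, b, c, ...,
    listed from the highest class down. Scores below all bounds fall
    into the final "remaining scores" class.
    """
    labels = "abcdefghi"
    for label, bound in zip(labels, lower_bounds):
        if s > bound:
            return label
    return labels[len(lower_bounds)]


TSB = (30, 10, -10, -40)                    # classes a..d, e = remaining
TSF = (80, 60, 40, 20, 0, -20, -40, -60)    # classes a..h, i = remaining

print(discretize(24, TSB), discretize(-70, TSF))
```

Schemes with a single-value or doubly closed class (TSa with b = {0}, TSh with its closed interval) would need their own small functions rather than this shared helper.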

Data Transformation: ORIGINAL DATABASE
◦ Flattened File (by Patient): from the original database, one tuple per patient with addition of features
  ◦ Discovered from Text Data
  ◦ Statistical (standard deviations, averages, ..)
  ◦ Temporal (sound level centroid, sound level spread, recovery rate)
  ◦ Decision Feature: discretized Difference in Total Score from THI

Data Transformation: NEW DATABASE
◦ Clustered patient-driven datasets (by similar visit patterns) with addition of features
  ◦ Coefficients, angles

New Feature Construction: Text Features

Text Mining
◦ Text fields
  ◦ Demographic, Miscellaneous, Medication tables
  ◦ Categories may show cause of tinnitus for patient: Stress, Noise, Medical

New Boolean Features Stress, Noise, and Medical Based on Text Mining of Terms

Stress   stress, depression, emotion, work, marriage, wedding
Noise    accident, noise, concert, loud, music, shooting, blast
Medical  surgery, infection, medicine, depression, hospital
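One way the three Boolean features could be derived from a free-text field is simple keyword matching, sketched below. The term lists are the ones given above. The exact whole-word matching, the function name `text_features`, and the sample note are assumptions for illustration; the slides do not say how terms were matched (stemming, phrases, and so on).

```python
import re

# Term lists as given on the slide.
TERMS = {
    "Stress": {"stress", "depression", "emotion", "work", "marriage", "wedding"},
    "Noise": {"accident", "noise", "concert", "loud", "music", "shooting", "blast"},
    "Medical": {"surgery", "infection", "medicine", "depression", "hospital"},
}


def text_features(text):
    """Return one Boolean per category: True if any of its terms occurs in text."""
    # Assumption: case-insensitive whole-word matching, no stemming.
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {category: bool(words & terms) for category, terms in TERMS.items()}


# Made-up note, not taken from the dataset.
note = "Onset after a loud concert; on medicine for an ear infection"
features = text_features(note)
print(features)
```

Note that "depression" appears in both the Stress and the Medical lists, so a text containing it sets both features.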

New Feature Construction: Temporal Features

New Temporal Features
◦ Sound Level Centroid

T = total number of visits per patient (here T = 3). V is some sound level feature (e.g. LDL measurement) measured at each visit: V(1), V(2), V(3).

$$C = \frac{\frac{1}{3}V(1) + \frac{2}{3}V(2) + \frac{3}{3}V(3)}{V(1) + V(2) + V(3)}$$


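New Temporal Features
◦ Sound Level Spread

With C denoting the sound level centroid:

$$S = \sqrt{\frac{V(1)\left(\frac{1}{3}-C\right)^2 + V(2)\left(\frac{2}{3}-C\right)^2 + V(3)\left(\frac{3}{3}-C\right)^2}{V(1) + V(2) + V(3)}}$$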
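The centroid and spread can be sketched for any number of visits T by weighting visit i with i/T, which reduces to the three-visit case shown on the slides. The function names are hypothetical, and the sketch assumes the measurements are non-negative with a non-zero sum.

```python
import math


def sound_level_centroid(values):
    """Centroid of visit positions i/T, weighted by the measurement at each visit."""
    t = len(values)
    weighted = sum((i / t) * v for i, v in enumerate(values, start=1))
    return weighted / sum(values)


def sound_level_spread(values):
    """Square root of the weighted mean squared distance of i/T from the centroid."""
    t = len(values)
    c = sound_level_centroid(values)
    weighted = sum(v * (i / t - c) ** 2 for i, v in enumerate(values, start=1))
    return math.sqrt(weighted / sum(values))


# Three visits with equal measurements: the centroid sits at the middle visit (2/3).
c = sound_level_centroid([1, 1, 1])
s = sound_level_spread([1, 1, 1])
print(c, s)
```

A centroid close to 1 means the larger measurements occur at the later visits; the spread indicates how concentrated the measurements are around that position.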


New Temporal Features
◦ Recovery Rate

$$R = \frac{V_0 - V_k}{T_k - T_0}, \qquad V_k = \min\{V_i : i \in [0, N]\}$$

◦ V = Total Score from THI
◦ V_0 = first score, recorded at date T_0 (should be greater than V_k)
◦ V_k is the best (minimum) score in the vector
◦ T_k is the date of the best score
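A minimal sketch of the recovery rate, read as the drop from the first score to the best (minimum) score divided by the time taken to reach it. Measuring time in days, returning 0.0 when the first visit already has the best score, and the name `recovery_rate` are all assumptions; the visit data is made up.

```python
from datetime import date


def recovery_rate(visits):
    """visits: list of (visit date, THI Total Score) in chronological order."""
    t0, v0 = visits[0]
    # min() returns the earliest visit among those sharing the minimum score.
    tk, vk = min(visits, key=lambda visit: visit[1])
    if tk == t0:
        return 0.0  # assumption: no improvement after the first visit
    return (v0 - vk) / (tk - t0).days  # score points per day (assumed unit)


visits = [
    (date(2008, 1, 10), 60),
    (date(2008, 2, 9), 44),
    (date(2008, 3, 10), 30),  # best score, 60 days after the first visit
    (date(2008, 4, 9), 36),
]
rate = recovery_rate(visits)
print(rate)
```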

Data Mining: Unclustered Data

In Search for Optimal Classifiers describing Total Score or changes in Total Score [new decision attributes]

◦ WEKA
  ◦ J48 (C4.5 Decision Tree Learner)
  ◦ Random Forest
  ◦ Multilayer Perceptron


Experiments and Results

1) Original data with Standard Deviations and Averages from Audiological features
2) Original data with Standard Deviations, Averages, Sound level centroid and sound level spread (Sound) only
3) Original data with Standard Deviations, Averages, and Text
4) Original Data with Standard Deviations, Averages, Text and Sound
5) Original Data with Text
6) Original Data with Sound
7) Original Data with Sound, Text, and Recovery Rate
8) Original Data with Sound and Recovery Rate /the winner/
9) ...


Top Classification Results for all 8 decision variables
Original Data with Sound Level Centroid, Sound Level Spread, Recovery Rate

Data Mining: Clustered Data

Continuing the Search for Optimal Classifiers

◦ Transformation to Visit Structure
◦ Creating Tolerance-Relation based Datasets
◦ Adding New Features

Two groups of databases were constructed: three-visit and four-visit centered sets.

Clustering Techniques for Temporal Feature Extraction

Coefficients and Angles Feature Construction for Dp where p is a patient with 4 visits:

Clustering Techniques

Quadratic Equation Based New Features
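The transcript does not preserve how the quadratic-equation features were defined. As one possible reading, the sketch below passes a quadratic y = a·x² + b·x + c exactly through a measurement taken at three consecutive visits and uses (a, b, c) as features. Placing the visits at x = 1, 2, 3, the function name, and the sample scores are all assumptions and should not be taken as the authors' construction.

```python
def quadratic_coefficients(v1, v2, v3):
    """Coefficients (a, b, c) of y = a*x**2 + b*x + c through (1, v1), (2, v2), (3, v3).

    Assumption: visits are equally spaced and indexed 1, 2, 3.
    """
    a = (v3 - 2 * v2 + v1) / 2   # half of the second difference
    b = (v2 - v1) - 3 * a        # from y(2) - y(1) = 3a + b
    c = v1 - a - b               # from y(1) = a + b + c
    return a, b, c


# Made-up scores at three visits.
coeffs = quadratic_coefficients(60, 44, 36)
print(coeffs)
```

Under this reading, a positive a would indicate that the change between visits is slowing down for a decreasing score, which is the kind of shape information a classifier could use.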



Data Mining: Clustered Data

Classifiers Construction [learning differences in total score] for clustered data:

J48, Random Forest, and Multilayer Perceptron (Neural Network) have been tested on the cluster-based original datasets with:

1) standard deviations and averages
2) coefficients and text
3) coefficients and angles
4) coefficients only
5) angles only
6) angles and text
7) angles, coefficients and text /the winner/

Summary of Data Mining: Clustered Data

Results are quite encouraging

◦ Top precision is .884
◦ This represents an improvement over the classification precision of .751 obtained with J48 on the original dataset with the features Sound Level Centroid, Sound Level Spread and Recovery Rate

Action Rules

Action rule is defined as a term

[(ω) ∧ (α → β)] → (ϕ → ψ)

◦ ω: conjunction of fixed condition features shared by both groups
◦ (α → β): proposed changes in values of flexible features
◦ (ϕ → ψ): desired effect of the action

Information System

A    B    D
a1   b2   d1
a2   b2
a2   b2   d2
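To make the three parts of the term concrete, here is a hedged sketch that stores an action rule as a small data structure and checks which objects of the information system it could be applied to. The example rule [b2 ∧ (A: a1 → a2)] → (D: d1 → d2) is a hypothetical reading of the small table (B treated as fixed, A as flexible, D as the decision); the slides do not state which features are flexible. The class and method names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class ActionRule:
    fixed: dict     # ω: feature -> value shared by both groups
    changes: dict   # α → β: flexible feature -> (current value, proposed value)
    effect: tuple   # ϕ → ψ: (decision feature, current value, desired value)

    def applies_to(self, obj):
        """True if obj matches ω, the current values in α, and the current decision ϕ."""
        decision, current, _ = self.effect
        return (
            all(obj.get(f) == v for f, v in self.fixed.items())
            and all(obj.get(f) == old for f, (old, _) in self.changes.items())
            and obj.get(decision) == current
        )


# Hypothetical rule read off the table: [b2 ∧ (A: a1 → a2)] → (D: d1 → d2)
rule = ActionRule(fixed={"B": "b2"},
                  changes={"A": ("a1", "a2")},
                  effect=("D", "d1", "d2"))

system = [
    {"A": "a1", "B": "b2", "D": "d1"},
    {"A": "a2", "B": "b2"},             # decision value missing in the table
    {"A": "a2", "B": "b2", "D": "d2"},
]
candidates = [obj for obj in system if rule.applies_to(obj)]
print(candidates)
```

Only the first object qualifies: it shares b2, currently has a1, and its decision is d1, so changing A to a2 is the suggested action.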


New Feature Construction: Decision Features showing change over time

New Decision Feature

◦ Boolean features + or – related to a feature such as Total Score improving or getting worse
  ◦ Calculated from score on next visit
  ◦ Stored as + or – on visit related tuple
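The + / – feature could be derived per visit as in this sketch. It assumes that a lower Total Score on the next visit counts as improvement, that an unchanged score gets no label, and that the last visit (which has no next visit) also gets none. The function name and sample scores are hypothetical.

```python
def change_labels(scores):
    """scores: Total Scores per visit, in chronological order.

    Returns one label per visit: "+" if the next visit's score is lower
    (improving), "-" if it is higher (getting worse), None otherwise.
    """
    labels = []
    for current, following in zip(scores, scores[1:]):
        if following < current:
            labels.append("+")
        elif following > current:
            labels.append("-")
        else:
            labels.append(None)  # assumption: unchanged score is left unlabeled
    labels.append(None)          # last visit has no next visit to compare with
    return labels


labels = change_labels([60, 44, 44, 50])
print(labels)
```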

Rules using LISpMiner

ACTION RULES: EXPERIMENT AND RESULTS

Analysis:

◦ Before confidence: 9/(9+0)
◦ After confidence: 9/(9+20)
◦ Low confidence but shows promise
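The two confidence values can be worked out directly. The sketch reads each fraction as the number of matching tuples divided by matching plus non-matching tuples; that reading and the parameter names are assumptions, while the counts 9, 0 and 20 come from the slide.

```python
from fractions import Fraction


def confidence(matching, non_matching):
    """Share of tuples that match, out of all tuples considered."""
    return Fraction(matching, matching + non_matching)


before = confidence(9, 0)    # 9 / (9 + 0)
after = confidence(9, 20)    # 9 / (9 + 20)
print(before, after, round(float(after), 2))
```

So the "before" side holds in every covered case, while the "after" side holds in 9 of 29 cases, roughly 0.31, which is the low confidence the slide refers to.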


Summary

Future Research

◦ Continue Action Rule Study
◦ Develop GUI for patient data entry
◦ Use knowledge gained from rules to develop decision support system for treatment support for tinnitus sufferers
◦ Continue research with music, emotions, and tinnitus treatment
