Handling Class Overlap and Imbalance to Detect Prompt Situations in Smart Homes

Post on 25-May-2015

This paper was selected at the ICDM Workshop on Data Mining in Biomedical Informatics and Healthcare (DMBIH), 2013.

Barnan Das
School of Electrical Engineering and Computer Science
Washington State University

***Self-portraits by William Utermohlen, an American artist living in London, after he was diagnosed with Alzheimer’s disease in 1995. Utermohlen died from the consequences of Alzheimer’s disease in March 2007.

Handling Class Overlap and Imbalance to Detect Prompt Situations in Smart Homes

Barnan Das, Narayanan C. Krishnan, Diane J. Cook

Worldwide Dementia Population

• Worldwide dementia population: 36 million
• Payment for care in 2012: $200 billion
• Unpaid caregivers: 15 million

Actual and expected number of Americans aged 65 and older with Alzheimer’s
• 2010: 5.1 million
• 2030: 7.7 million
• 2050: 13.2 million

Source: World Health Organization and Alzheimer’s Association.

Automated Prompting

Help with Activities of Daily Living (ADLs)

Existing Work
• Rule-based (temporal or contextual)
• Activity initiation
• RFID and video-input based prompts for activity steps

Our Contribution
• Learning-based
• Sub-activity level prompts
• No audio/video input

Architectural Overview

Data

Raw Data
• 8 daily activities: Sweeping, Cooking, Medication, Watering Plants, etc.
• 300 elderly participants
• Prompts issued when errors were committed

Clean Data
• 1 activity step = 1 data point
• 17 engineered features: length of activity step, location in apartment, number of sensors involved, distribution of sensor events, etc.
• Binary class {no-prompt, prompt}, encoded as 0/1

Class Distribution
• Minority class: 149 data points
• Majority class: 3831 data points
• Total number of data points: 3980
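The counts on this slide can be turned into an imbalance ratio with a few lines of arithmetic. The sketch below is only an illustration: the variable names are made up here, and labelling 149 as the minority count simply follows from it being the smaller number. The last figure shows why plain accuracy is a weak yardstick for this data, since a classifier that always predicts the majority class would already score about 96%.

```python
# Class counts taken from the slide; the variable names are illustrative.
minority_count = 149
majority_count = 3831

total = minority_count + majority_count
minority_share = minority_count / total            # fraction of rare-class points
imbalance_ratio = majority_count / minority_count  # majority points per minority point
baseline_accuracy = majority_count / total         # accuracy of "always predict majority"

print(f"total data points: {total}")
print(f"minority share: {minority_share:.2%}")
print(f"majority-to-minority ratio: {imbalance_ratio:.1f}:1")
print(f"always-majority accuracy: {baseline_accuracy:.2%}")
```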

Overlapping Classes

Overlapping Classes in Prompting Data

[Figure: 3D PCA plot of prompting data]

Existing Approaches

• Discard data of the overlapping region

• Treat overlapping region as a separate class

• Polynomial combination of existing features

• Using kernel methods

Tomek Links
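A Tomek link is usually defined as a pair of points from opposite classes that are each other's nearest neighbor. The following is a minimal brute-force sketch of that definition, not the authors' code. The toy points, the Euclidean distance, and the function names are assumptions made for illustration.

```python
import math

def nearest_neighbor(i, points):
    """Index of the point closest to points[i], excluding i itself."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: math.dist(points[i], points[j]),
    )

def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, that are mutual nearest
    neighbors and carry different class labels."""
    nn = [nearest_neighbor(i, points) for i in range(len(points))]
    return [
        (i, j)
        for i, j in enumerate(nn)
        if i < j and nn[j] == i and labels[i] != labels[j]
    ]

# Toy 2-D data, made up for illustration: 0 = no-prompt, 1 = prompt.
points = [(0.0, 0.0), (0.2, 0.0), (1.0, 1.0), (1.1, 1.0), (5.0, 5.0), (5.0, 5.3)]
labels = [0, 0, 0, 1, 1, 1]

links = tomek_links(points, labels)
print(links)
```

Only the pair at indices 2 and 3 sits on the class boundary in this toy data, so it is the only link reported. Points in such pairs are typical candidates for removal when cleaning an overlapping region.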

Cluster-Based Under-Sampling (ClusBUS)

• Form clusters
• Under-sample candidate clusters

Two Critical Components

Choice of Clustering Algorithm: DBSCAN
• Density-based
• Non-spherical clusters
• No need to predetermine number of clusters

Determining Candidate Clusters: Empirically Determined
• Based on minority class dominance (r) in clusters
• Threshold determined by q-quantile values of r
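To make the clustering component concrete, here is a small pure-Python sketch of DBSCAN in its textbook form. It is not the implementation used in the experiments. The parameter names `eps` and `min_pts`, the toy points, and the use of -1 for noise are assumptions chosen for illustration. Note that the number of clusters is never passed in; it falls out of the density parameters.

```python
import math

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE          # may later be claimed as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # core point: keep growing the cluster
    return labels

# Toy data, made up for illustration: two dense groups and one outlier.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
          (50.0, 50.0)]
cluster_labels = dbscan(points, eps=1.5, min_pts=3)
print(cluster_labels)
```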

Empirically Determined Threshold
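One way to read the threshold step is sketched below, under stated assumptions. Cluster ids are taken as given. For each cluster, r is the fraction of its points that belong to the minority class. The threshold is a q-quantile of the non-zero r values, clusters at or above it become candidates, and majority-class points are dropped from candidate clusters. The restriction to non-zero r values, the linear-interpolation quantile, the treatment of noise points (left untouched), and all function names are assumptions, not details confirmed by the slides.

```python
from collections import Counter

def minority_dominance(cluster_ids, labels, minority=1, noise=-1):
    """Return {cluster: r}, where r is the fraction of the cluster's points
    that belong to the minority class. Noise points are skipped."""
    sizes = Counter()
    minority_counts = Counter()
    for c, y in zip(cluster_ids, labels):
        if c == noise:
            continue
        sizes[c] += 1
        if y == minority:
            minority_counts[c] += 1
    return {c: minority_counts[c] / sizes[c] for c in sizes}

def quantile(values, q):
    """q-quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

def clusbus_undersample(cluster_ids, labels, q=0.5, minority=1, noise=-1):
    """Return (indices to keep, threshold, candidate cluster ids)."""
    r = minority_dominance(cluster_ids, labels, minority, noise)
    nonzero_r = [value for value in r.values() if value > 0]
    if not nonzero_r:
        return list(range(len(labels))), None, set()
    threshold = quantile(nonzero_r, q)
    candidates = {c for c, value in r.items() if value >= threshold}
    # Drop only majority-class points that fall inside candidate clusters.
    keep = [
        i for i, (c, y) in enumerate(zip(cluster_ids, labels))
        if c not in candidates or y == minority
    ]
    return keep, threshold, candidates

# Toy data, made up for illustration: four clusters plus one noise point.
cluster_ids = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, -1]
labels      = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0]
keep, threshold, candidates = clusbus_undersample(cluster_ids, labels, q=0.5)
print(threshold, sorted(candidates), keep)
```

In this toy run the non-zero r values are 0.25, 0.75, and 1.0, so the median threshold is 0.75. Clusters 2 and 3 become candidates, and the single majority point in cluster 2 (index 8) is removed.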

Experimental Setup

Alternative Sampling Method
• SMOTE

Classifiers
• C4.5 Decision Tree
• Naïve Bayes
• k-Nearest Neighbor
• SVM

Performance Metrics
• TP Rate
• G-mean
• AUC
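Two of the metrics listed above are easy to state in code. TP rate is the fraction of positive (prompt) points that are predicted positive, and G-mean is the geometric mean of the TP rate and the TN rate. The sketch below uses made-up predictions and hypothetical function names. It also shows why these metrics suit imbalanced data: a classifier that never predicts the positive class gets a G-mean of zero, whatever its accuracy.

```python
import math

def rates(y_true, y_pred, positive=1):
    """Return (TP rate, TN rate) for a binary problem."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    n_pos = sum(1 for t in y_true if t == positive)
    n_neg = len(y_true) - n_pos
    return tp / n_pos, tn / n_neg

def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of the TP rate and the TN rate."""
    tpr, tnr = rates(y_true, y_pred, positive)
    return math.sqrt(tpr * tnr)

# Made-up labels: 4 positives (prompt) followed by 10 negatives (no-prompt).
y_true = [1] * 4 + [0] * 10
y_pred = [1, 1, 1, 0] + [0] * 9 + [1]   # one positive missed, one false alarm
all_negative = [0] * 14                  # classifier that never prompts

tpr, tnr = rates(y_true, y_pred)
print(f"TP rate: {tpr:.2f}, TN rate: {tnr:.2f}, G-mean: {g_mean(y_true, y_pred):.3f}")
print(f"never-prompt G-mean: {g_mean(y_true, all_negative):.3f}")
```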

Results (1)

[Figure: bar charts of TP Rate and G-mean (y-axis, 0 to 1) for the C4.5, Naïve Bayes, IBk, and SMO classifiers, comparing Original, SMOTE, and ClusBUS]

Results (2)

[Figure: bar chart of AUC, the area under the ROC curve (y-axis, 0 to 1), for the C4.5, Naïve Bayes, IBk, and SMO classifiers, comparing Original, SMOTE, and ClusBUS]
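AUC can be computed without drawing the ROC curve: it equals the probability that a randomly chosen positive point receives a higher score than a randomly chosen negative point, with ties counted as half. The brute-force sketch below uses that pairwise formulation. The scores and the function name are made up for illustration, and the quadratic loop is only suitable for small examples.

```python
def auc(y_true, scores, positive=1):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    # Each positive/negative pair scores 1 if ranked correctly, 0.5 on a tie.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up classifier scores: higher means "more likely to need a prompt".
y_true = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.1]

value = auc(y_true, scores)
print(f"AUC: {value:.3f}")
```

Here five of the six positive/negative pairs are ranked correctly, so the AUC is 5/6.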

Conclusion

• Automated prompting as a classification problem

• Proposed ClusBUS: under-sampling-based preprocessing

• A solution to class overlap also helps address imbalanced classes

Contact Us

Barnan Das: barnandas@wsu.edu
Dr. Diane Cook: cook@eecs.wsu.edu
http://casas.wsu.edu

Backup Slides

Activities

Sweeping

Cooking

Taking Medication

Watering Plants

Watching DVD

Selecting Outfit

Taking Phone Call

Writing Birthday Card

Feature Generation

#    Feature Name    Description
1    stepLength      Length of the step in time (seconds)
2    numSensors      Number of unique sensors involved with the step
3    numEvents       Number of sensor events associated with the step
4    prevStep        Previous step
5    nextStep        Next step
6    timeActBegin    Time (seconds) elapsed since the beginning of the activity
7    timePrevStep    Time (seconds) difference between the last event of the previous step and the first event of the current step
8    stepsActBegin   Number of steps visited since the start of the activity
9    activityID      Activity ID
10   stepID          Step ID
11   location        Set of features representing sensor frequencies in kitchen, dining room, living room, etc. when the activity was performed
12   Class           Binary class. 1 = "Prompt", 0 = "No-Prompt"
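A few of the features listed above can be derived directly from the sensor events of a step. The sketch below assumes a hypothetical event format, a list of (timestamp in seconds, sensor id) tuples sorted by time, and made-up sensor ids. The real data format is not shown in the slides. It also assumes that timeActBegin is measured at the first event of the step.

```python
def step_features(step_events, prev_step_events=None, activity_start=None):
    """Compute a subset of the listed features for one activity step.

    step_events: list of (timestamp_seconds, sensor_id), sorted by time
                 (hypothetical format, assumed for this sketch).
    """
    times = [t for t, _ in step_events]
    features = {
        "stepLength": times[-1] - times[0],                # duration of the step
        "numSensors": len({s for _, s in step_events}),    # unique sensors
        "numEvents": len(step_events),                     # sensor events in the step
    }
    if activity_start is not None:
        # Assumption: elapsed time is measured at the step's first event.
        features["timeActBegin"] = times[0] - activity_start
    if prev_step_events:
        # Gap between the previous step's last event and this step's first event.
        features["timePrevStep"] = times[0] - prev_step_events[-1][0]
    return features

# Made-up events for two consecutive steps of one activity.
previous_step = [(0.0, "M01"), (4.0, "M02")]
current_step = [(6.0, "M02"), (9.0, "I05"), (15.0, "M02")]

features = step_features(current_step, previous_step, activity_start=0.0)
print(features)
```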