Synthesis of Streaming Data from Multiple Sensors via Embedded Data
Extraction
April 15th, 2004 Project Report
Magdiel Galán
CSE591: Data Mining, Dr. Huan Liu, Spring 2004
http://www.public.asu.edu/~mgalan/StreamProjApr15.ppt
Project Description
Synthesis of Streaming Data from Multiple Sensors (~100’s) via Embedded Data Extraction for mission-critical applications.
Work in conjunction with Motorola’s Human Interface Lab (on-going project)
Simulation Environment
Project Description
Goal: Develop a driver assistance system that provides feedback, but not control, during unsafe instances.
  From distractions caused by cellphones, PDAs, eMail
Why: Targeting a government initiative to create a safer car environment in the information age explosion
How: Develop an intelligent system by mining Streaming Data from multiple automotive sensors
Development work is being done on a driving simulator with projection screens and up to 400 parameters/sensors, including video links for eye-gaze and foot-pedal movement
Sample Cases
Case Scenario #1: Passing slow traffic
  which slowed down due to an accident
  which you are also rubber-necking
  while fidgeting with your radio
Case Scenario #2: Making a left turn
  while hearing directions from MapTracker
  while checking the time because you are late
  while reaching for the cellphone with an incoming call
Driving Experience
[Figure: Driver and data sources: Gas, Engine Temp, Batt, Oil, PDA, Gear Shift, CD, Cell Phone, A/C, Air Bag, Acceleration, Lateral Acc., Sonar Proximity Sensor, Wheel Rotation, Brake Pressure, RPMs, GPS, Internet]
Motivation
Primary Interest: Robotics
  Merging of Sensors/Sensor Fusion
    optical proximity (IR, sonar, radar)
    location (GPS, visual maps)
    movement (actuators, rotations)
    system (battery, temperature, bump switches)
  Problem: decide agent’s next best action vs. a goal
  Not too dissimilar from an Automobile environment
Other Applications: Manufacturing Environment
  Increase Yields/Productivity/Reduce Defects using quality control daily monitor data (100’s Parameters 1K’s)
  Pentium Ex.: Oxide Thickness, Poly Width, Boron Implant Density, Plasma Etch eV’s, Litho PM, Diffuser RPMs, etc.
Stream Data Properties
Numerical/Continuous
  Speed
  Steering/Heading
  Acceleration (Forward/Lateral)
  Distance (Lane Edge, Vehicle in Front)
Categorical
  Lane Position
  Gear: P/R/D/OD/L1/L2
  Headlights On/Off
  Radio/CD On
  Incoming Call
Sampling Rate: 60 Hz
Critical/Special Conditions
Left/Right Turn
Passing/Changing Lanes
U-Turn
Reverse
Tailgating
Not On Road
Some Warning Signs
  Lane Drifting
  Erratic Behavior
    droopy eyes
    eyes not facing the road
    foot/pedal movements do not correspond with road conditions
  Incoming Call while performing a Critical Maneuver
Goal
Identify instances outside normal patterns as an indication of an Abnormal Situation
  Hence: need to draw the Driver’s Attention to the Impending Situation
Ultimate Goal: Develop a bootstrapping mechanism that combines driving situation classifiers (e.g. LeftTurn/Passing) with instance selection methods in active learning
  Bootstrapping: selecting high-utility data for re-training
Instance Selection Properties
Instances are representative
Instance selection reduces rows
Ideal outcome of instance selection: choose a data subset that achieves the same result as the whole data with little or no deterioration in performance P
Should be model independent: $\Delta P(M_i) \doteq \Delta P(M_j)$ for models $M_i$ and $M_j$ [LM01]
Problem#1: Sampling
Initial step towards instance selection: select a representative subset…
Divide the population into a collection of elements which must cover the whole population without overlapping [GHL01]
  These are called sampling units
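The partitioning described above can be sketched in a few lines. This is only an illustration, assuming fixed-size consecutive units over a recorded signal; the function names, the unit size, and the sample speeds are hypothetical and do not come from the project.

```python
from typing import List, Sequence


def sampling_units(records: Sequence[float], unit_size: int) -> List[Sequence[float]]:
    """Split records into consecutive units that cover every record exactly once."""
    if unit_size <= 0:
        raise ValueError("unit_size must be positive")
    return [records[i:i + unit_size] for i in range(0, len(records), unit_size)]


def one_per_unit(units: List[Sequence[float]], pick: int = 0) -> List[float]:
    """Take one representative per unit (index `pick`, clamped for a short last unit)."""
    return [u[min(pick, len(u) - 1)] for u in units]


# Hypothetical speed readings (km/h); not data from the simulator.
speeds = [50, 51, 53, 52, 60, 61, 59, 58, 40, 41]
units = sampling_units(speeds, unit_size=4)
subset = one_per_unit(units)
```

With these inputs the units are [50, 51, 53, 52], [60, 61, 59, 58] and [40, 41]: no reading is dropped and none appears twice. Picking the first element of each unit is just one possible selection rule.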
Problem#2: Smoothing
Reduce/Filter out noise and outliers.
Smoothing Techniques used:
  Bin Median/Rolling Average [LM01]/[D03]
    Median preferred over Mean since it is less sensitive to outliers
  Thresholding/Bin Boundaries [LM01]/[HK01]
    10% offset threshold
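A minimal sketch of smoothing by bin medians follows. The slides do not define the 10% offset threshold precisely, so the code assumes one possible reading: a sample is flagged when it deviates from its bin median by more than 10% of that median. The bin size, function names, and readings are hypothetical.

```python
from statistics import mean, median


def bin_medians(values, bin_size):
    """Replace every value by the median of its (non-overlapping) bin."""
    smoothed = []
    for start in range(0, len(values), bin_size):
        chunk = values[start:start + bin_size]
        smoothed.extend([median(chunk)] * len(chunk))
    return smoothed


def outside_offset(values, smoothed, offset=0.10):
    """Indices whose raw value differs from the bin median by more than `offset` (assumed rule)."""
    return [i for i, (v, m) in enumerate(zip(values, smoothed))
            if abs(v - m) > offset * abs(m)]


# Hypothetical speed readings (km/h) with one spike at index 3.
speed = [60, 61, 59, 120, 60, 62, 61, 63, 62, 61]
smoothed = bin_medians(speed, bin_size=5)
flagged = outside_offset(speed, smoothed)
first_bin_mean = mean(speed[:5])
```

For the first bin the median is 60 while the mean is 72, which shows why the median is less sensitive to the spike; under the assumed rule only index 3 is flagged.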
PreSmoothing - RAW Data
[Figure: x-axis: driving time elapsed in minutes; y-axis: speed (km/h), steering (degrees), heading (degrees)]
Smoothing Results - Median
[Figure: x-axis: driving time elapsed in minutes; y-axis: speed (km/h), steering (degrees), heading (degrees)]
Dr. Liu’s Incremental Instance Selection Algorithm
Given: Data streams with instances I
Output: indicative instances
For each data stream
  Do the following incrementally
    Create a profile P for I
    Check new instance i against P
    if i is an outlier of P
      Return i
    else
      Update P with i
  End do
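The loop above might look as follows in Python for a single numerical stream. The slides do not say what the profile P contains, so this sketch assumes a running mean and standard deviation (Welford's update) with a k-sigma outlier test and a short warm-up; the class name, k, warmup, and the sample stream are all hypothetical.

```python
import math


class RunningProfile:
    """Assumed profile P: running mean/variance of the instances seen so far."""

    def __init__(self, k=3.0, warmup=5):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0          # sum of squared deviations from the mean
        self.k = k             # outlier cut-off in standard deviations
        self.warmup = warmup   # instances absorbed before any outlier test

    def is_outlier(self, x):
        if self.n < self.warmup:
            return False
        std = math.sqrt(self.m2 / (self.n - 1))
        return abs(x - self.mean) > self.k * std

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)


def indicative_instances(stream, profile):
    """Yield outliers of the profile; fold every other instance into it."""
    for x in stream:
        if profile.is_outlier(x):
            yield x
        else:
            profile.update(x)


# Hypothetical speed stream (km/h) with one abnormal reading.
stream = [60, 61, 59, 60, 62, 61, 95, 60, 59]
selected = list(indicative_instances(stream, RunningProfile()))
```

Here only the reading 95 is returned as indicative. Because outliers are not folded into the profile, a single spike does not widen the band used to judge later instances, which matches the if/else structure of the algorithm.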
Problem#3: Clustering
Why?
  Data is Unclassified
  Previous results used Numerical Data on the most significant key parameters
  Develop clusters exemplifying ALL attributes
  Select instances that do not belong to a cluster as the triggering mechanism
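The triggering idea in the last bullet can be illustrated for purely numerical attributes. This sketch assumes clusters are summarized by centres with a common radius, which is a simplification chosen for illustration and not the project's method; the centres, radius, and samples are hypothetical.

```python
import math


def nearest_cluster(point, centers):
    """Return (index, distance) of the closest cluster centre."""
    distances = [math.dist(point, c) for c in centers]
    best = min(range(len(centers)), key=distances.__getitem__)
    return best, distances[best]


def is_trigger(point, centers, radius):
    """An instance triggers when it lies outside every cluster's radius."""
    _, distance = nearest_cluster(point, centers)
    return distance > radius


# Hypothetical (speed km/h, steering degrees) cluster centres and samples.
centers = [(100.0, 0.0), (30.0, 45.0)]
samples = [(98.0, 1.0), (32.0, 40.0), (90.0, 60.0)]
triggers = [s for s in samples if is_trigger(s, centers, radius=10.0)]
```

Only (90.0, 60.0), a sharp steering angle at high speed, falls outside both clusters and would raise a trigger in this toy setting.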
Stream Clustering Challenges
Large “Unclassified” Data Base
Fast On-Line Resolution within a small window
  0.5 to 2 or 3 seconds
One Pass Only restriction (need fast I/O)
Mix of Numerical and Categorical Data
  Traditional algorithms do not work well for categorical attributes (remember P/R/D/OD/L1/L2, or CD On)
    Centroid approach cannot be used
    Hard to reflect the properties of the neighborhood of the points
Memory Constraints
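To make the mixed-attribute difficulty concrete, here is a sketch of one common workaround: a Gower-style dissimilarity that range-normalizes numerical attributes and uses simple mismatch for categorical ones. This technique is brought in for illustration and is not named in the slides; the attribute names and the speed range are hypothetical.

```python
def mixed_dissimilarity(a, b, numeric_ranges):
    """Average per-attribute dissimilarity in [0, 1] over numerical and categorical fields."""
    total = 0.0
    for key in a:
        if key in numeric_ranges:
            lo, hi = numeric_ranges[key]
            total += abs(a[key] - b[key]) / (hi - lo)   # range-normalized difference
        else:
            total += 0.0 if a[key] == b[key] else 1.0   # simple categorical mismatch
    return total / len(a)


# Hypothetical instances mixing speed (km/h), gear, and CD state.
ranges = {"speed": (0.0, 200.0)}
x = {"speed": 60.0, "gear": "D", "cd_on": True}
y = {"speed": 80.0, "gear": "D", "cd_on": False}
z = {"speed": 60.0, "gear": "R", "cd_on": False}

d_xy = mixed_dissimilarity(x, y, ranges)
d_xz = mixed_dissimilarity(x, z, ranges)
```

A measure like this avoids averaging gear positions, which has no meaning, but it still needs representative instances rather than centroids to describe a cluster. For scale, at the 60 Hz sampling rate a 0.5 second window holds 30 samples and a 3 second window holds 180.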
Clustering Techniques vs. Streaming Data
SVM
  Good at handling multidimensional data
  Not good: needs classified data, lots of I/O, data in memory
BIRCH
  Good at handling multidimensional data, large databases; single scan, linear I/O time
  Not good: predominantly for “numerical” attributes; order dependent
Clustering Techniques vs. Streaming Data (2)
CURE (Clustering Using REpresentatives) [D03]
  Good at handling outliers; hierarchical
  Not good: random sampling (won’t fit streaming)
ROCK (RObust Clustering using linKs) [D03]
  Good at hierarchical clustering for categorical attributes
  Not good: random sampling for scale-up
Current Status/Plans
This is an ON-GOING project
Cluster Technique Development
  Evolve from known methods?
Generalization of the technique
  Not just Automobile Streaming Data
References
[LM01] H. Liu, H. Motoda. “Data Reduction via Instance Selection”. Instance Selection and Construction for Data Mining. 2001. KAP. ASU Library
[GHL01] B. Gu, F. Hu, H. Liu. “Sampling: Knowing Whole From its Part”. Instance Selection and Construction for Data Mining. 2001. KAP. ASU Library
[HK01] J. Han, M. Kamber. Data Mining: Concepts and Techniques. Chps. 3, 8: Data Cleaning, Clustering. Morgan Kaufmann. ASU Library
[D03] M. Dunham. Data Mining: Introductory and Advanced Topics. Prentice Hall. Chps. 3-5: Mining Techniques, Classification, Clustering. ASU Library