Creating Dynamic Social Network Models from Sensor Data
Tanzeem Choudhury, Intel Research / Affiliate Faculty CSE
Dieter Fox, CSE
Henry Kautz, CSE
James Kitts, Sociology
What are we doing?
Why are we doing it?
How are we doing it?
Social Network Analysis
Work across the social & physical sciences is increasingly studying the structure of human interaction
o 1967 – Stanley Milgram – 6 degrees of separation
o 1973 – Mark Granovetter – strength of weak ties
o 1977 – International Network for Social Network Analysis
o 1992 – Ronald Burt – structural holes: the social structure of competition
o 1998 – Watts & Strogatz – small world graphs
Social Networks
Social networks are naturally represented and analyzed as graphs
Example Network Properties
Degree of a node
Eigenvector centrality
o global importance of a node
Average clustering coefficient
o degree to which graph decomposes into cliques
Structural holes
o opportunities for gain by bridging disconnected subgraphs
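As a rough illustration of the first three properties, the sketch below computes them on a small undirected graph stored as a dictionary of neighbor sets. The toy network and all function names are hypothetical and are not taken from the study. Eigenvector centrality is approximated by power iteration on the adjacency matrix plus the identity, which is one common way to make the iteration converge. Structural holes are not covered here.

```python
import math

def degree(graph, node):
    """Number of neighbors of a node."""
    return len(graph[node])

def clustering_coefficient(graph, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    neighbors = list(graph[node])
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(
        1
        for a in range(k)
        for b in range(a + 1, k)
        if neighbors[b] in graph[neighbors[a]]
    )
    return 2.0 * links / (k * (k - 1))

def average_clustering(graph):
    """Mean of the per-node clustering coefficients."""
    return sum(clustering_coefficient(graph, n) for n in graph) / len(graph)

def eigenvector_centrality(graph, iterations=100):
    """Power iteration on (adjacency + identity), normalized to unit length."""
    score = {n: 1.0 for n in graph}
    for _ in range(iterations):
        new = {n: score[n] + sum(score[m] for m in graph[n]) for n in graph}
        norm = math.sqrt(sum(v * v for v in new.values())) or 1.0
        score = {n: v / norm for n, v in new.items()}
    return score

# Hypothetical toy network: a triangle A-B-C with D attached to C.
toy = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
centrality = eigenvector_centrality(toy)
print(degree(toy, "C"), average_clustering(toy), max(centrality, key=centrality.get))
```

In this toy network, C has the highest degree and the highest centrality, while D, which hangs off the triangle, has a clustering coefficient of zero.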
Applications
Many practical applications
o Business – discovering organizational bottlenecks
o Health – modeling spread of communicable diseases
o Architecture & urban planning – designing spaces that support human interaction
o Education – understanding impact of peer group on educational advancement
Much recent theory on finding random graph models that fit empirical data
The Data Problem
Traditionally, data comes from manual surveys of people’s recollections
o Very hard to gather
o Questionable accuracy
o Few published data sets
o Almost no longitudinal (dynamic) data
1990’s – social network studies based on electronic communication
Social Network Analysis of Email
Science, 6 Jan 2006
Limits of E-Data
Email data is cheap and accurate, but misses
o Face-to-face speech – the vast majority of human interaction, especially complex communication
o The physical context of communication – so email is useless for studying the relationship between environment and interaction
[Figure: proportion of contacts (axis 0 to 80) for high complexity information, face-to-face vs. telephone, by distance: within a floor, within a building, within a site, between sites]
• Can we gather data on face-to-face communication automatically?
Research Goal
Demonstrate that we can…
Model social network dynamics by gathering large amounts of rich face-to-face interaction data automatically
o using wearable sensors
o combined with statistical machine learning techniques
Find simple and robust measures derived from sensor data
o that are indicative of people’s roles and relationships
o that capture the connections between physical environment and network dynamics
Questions we want to investigate:
Changes in social networks over time:
o How do interaction patterns dynamically relate to structural position in the network?
o Why do people sharing relationships tend to be similar?
o Can one predict formation or break-up of communities?
Effect of location on social networks
o What are the spatio-temporal distributions of interactions?
o How do locations serve as hubs and bridges?
o Can we predict the popularity of a particular location?
Other Applications of such Data
Research on emotional content of speech
o Need for “natural” data
Medical applications
o Speaking rate is an indicator of mental activity
o Overly rapid speech is a symptom of mania
o Asperger’s syndrome: abnormal conversational dynamics
Meeting understanding
o Interruptions indicate status & dominance
Support
Human and Social Dynamics – one of five new priority areas for NSF
o $800K award to UW / Intel / Georgia Tech team
o Intel at no cost
Intel Research donating hardware and internships
Leveraging work on sensors & localization from other NSF & DARPA projects
Procedure
Test group
o 32 first-year incoming CSE graduate students
o Units worn 5 working days each month
o Collect data over one year
Units record
o Wi-Fi signal strength, to determine location
o Audio features adequate to determine when conversation is occurring
Subjects answer short monthly survey
o Selective ground truth on # of interactions
o Research interests
All data stored securely
o Indexed by code number assigned to each subject
Privacy
UW Human Subjects Division approved procedures after 6 months of review and revisions
Major concern was privacy, addressed by
o Procedure for recording audio features without recording conversational content
o Procedures for handling data afterwards
Data Collection
[Diagram: Intel Multi-Modal Sensor Board with real-time audio feature extraction; audio features and WiFi strength, tagged with a code identifier, go to the coded database]
Recording Units
Data Collection
Multi-sensor board sends sensor data stream to iPAQ
iPAQ computes audio features and WiFi node identifiers and signal strength
iPAQ writes audio and WiFi features to SD card
Each day, the subject uploads data to the coded database using his or her code number
Speech Detection
From the audio signal, we want to extract features that can be used to determine
o Speech segments
o Number of different participants (but not identity of participants)
o Turn-taking style
o Rate of conversation (fast versus slow speech)
But the features must not allow the audio to be reconstructed!
Speech Production
The source-filter model [diagram label: vocal tract filter]
Fundamental frequency (F0/pitch) and formant frequencies (F1, F2, …) are the most important components for speech synthesis
Speech Production
Voiced sounds: Fundamental frequency (i.e. harmonic structure) and energy in lower frequency component
Un-voiced sounds: No fundamental frequency and energy focused in higher frequencies
Our approach: Detect speech by reliably detecting voiced regions
We do not extract or store any formant information. At least three formants are required to produce intelligible speech*
* 1. Donovan, R. (1996). Trainable Speech Synthesis. PhD Thesis, Cambridge University. 2. O’Shaughnessy, D. (1987). Speech Communication – Human and Machine. Addison-Wesley.
Goal: Reliably Detect Voiced Chunks in Audio Stream
Speech Features Computed
1. Spectral entropy
2. Relative spectral entropy
3. Total energy
4. Energy below 2 kHz (low frequencies)
5. Autocorrelation peak values and number of peaks
6. High order MEL frequency cepstral coefficients
Features used: Autocorrelation
[Figure: autocorrelation of (a) un-voiced frame and (b) voiced frame]
Voiced chunks have a higher non-initial autocorrelation peak and fewer peaks
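The two autocorrelation features can be sketched as follows. This is a minimal illustration, not the study's feature extractor: the frame length, the synthetic test signals, and the function names are all assumptions. A pure sinusoid stands in for a voiced frame and seeded Gaussian noise stands in for an un-voiced frame.

```python
import math
import random

def normalized_autocorrelation(frame):
    """Autocorrelation of a mean-removed frame, scaled so that lag 0 equals 1."""
    n = len(frame)
    mean = sum(frame) / n
    x = [v - mean for v in frame]
    energy = sum(v * v for v in x)
    if energy == 0:
        return [0.0] * n
    return [
        sum(x[t] * x[t + lag] for t in range(n - lag)) / energy
        for lag in range(n)
    ]

def autocorrelation_peaks(frame):
    """Return (largest non-initial peak value, number of local peaks)."""
    r = normalized_autocorrelation(frame)
    peaks = [
        r[k]
        for k in range(1, len(r) - 1)
        if r[k] > r[k - 1] and r[k] >= r[k + 1]
    ]
    return (max(peaks) if peaks else 0.0), len(peaks)

# Synthetic stand-ins (assumptions): a periodic frame and a noise frame.
voiced_like = [math.sin(2 * math.pi * t / 32) for t in range(256)]
rng = random.Random(0)
noise_like = [rng.gauss(0.0, 1.0) for _ in range(256)]

voiced_peak, voiced_count = autocorrelation_peaks(voiced_like)
noise_peak, noise_count = autocorrelation_peaks(noise_like)
print(voiced_peak, voiced_count, noise_peak, noise_count)
```

The periodic frame produces one strong peak per pitch period, while the noise frame produces many small peaks, which matches the contrast described on the slide.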
Features used: Spectral Entropy
[Figure panel labels: Spectral entropy: 3.74; Spectral entropy: 4.21]
FFT magnitude of (a) un-voiced frame and (b) voiced frame.
Voiced chunks have lower entropy than un-voiced chunks, because voiced chunks have more structure
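A minimal sketch of spectral entropy, assuming it is the Shannon entropy of the magnitude spectrum after normalizing the magnitudes to sum to one. The slides do not state the exact normalization or logarithm base, so the values here (natural log) are not comparable to the figures quoted above. The naive DFT, the frame length, and the synthetic signals are illustrative assumptions.

```python
import cmath
import math
import random

def magnitude_spectrum(frame):
    """Magnitudes of the non-negative-frequency DFT bins (naive O(n^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]

def spectral_entropy(frame):
    """Entropy (in nats) of the magnitude spectrum treated as a distribution."""
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    if total == 0:
        return 0.0
    probs = [m / total for m in mags]
    return -sum(p * math.log(p) for p in probs if p > 0)

# Synthetic stand-ins (assumptions): three harmonics vs. seeded white noise.
n = 128
harmonic = [
    math.sin(2 * math.pi * 4 * t / n)
    + 0.5 * math.sin(2 * math.pi * 8 * t / n)
    + 0.25 * math.sin(2 * math.pi * 12 * t / n)
    for t in range(n)
]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]

print(spectral_entropy(harmonic), spectral_entropy(noise))
```

The harmonic frame concentrates its spectrum in a few bins and so has much lower entropy than the noise frame, whose spectrum is spread across all bins.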
Features used: Energy
Energy in voiced chunks is concentrated in the lower frequencies
Higher order MEL cepstral coefficients contain pitch (F0) information. The lower order coefficients are NOT stored
Segmenting Speech Regions
Multi-Person Conversation Model
Group State $G_t$
Who is holding the floor (main speaker)
1-N: instrumented subjects
N+1: silence
N+2: any unmiked speaker
Multi-Person Conversation Model
Individual State $M^i_t$
True if subject i is speaking
P(M|G) set so as to disfavor people talking simultaneously
U true if unmiked subject speaking
Multi-Person Conversation Model
Voicing States $V^i_t$
True if sound from mike i is a human voice
$P(V^i_t \mid M^i_t) = 1$
$P(V^i_t \mid \neg M^i_t) = 0.5$
$AV_t$ is the logical OR of the voicing nodes
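To see what these two conditional probabilities imply, the sketch below applies Bayes' rule to a single subject in isolation. This is only a one-node calculation under an assumed prior, not the inference actually run over the full DBN; the function and constant names are hypothetical.

```python
# Conditional probabilities as given on the slide.
P_VOICED_GIVEN_SPEAKING = 1.0
P_VOICED_GIVEN_NOT_SPEAKING = 0.5

def posterior_speaking(prior_speaking, voiced):
    """P(subject i is speaking | voicing state of mike i), by Bayes' rule.

    prior_speaking is an assumed prior; in the full model it would come
    from the group state rather than being supplied by hand.
    """
    if voiced:
        like_speaking = P_VOICED_GIVEN_SPEAKING
        like_silent = P_VOICED_GIVEN_NOT_SPEAKING
    else:
        like_speaking = 1.0 - P_VOICED_GIVEN_SPEAKING
        like_silent = 1.0 - P_VOICED_GIVEN_NOT_SPEAKING
    numerator = like_speaking * prior_speaking
    evidence = numerator + like_silent * (1.0 - prior_speaking)
    return numerator / evidence if evidence > 0 else 0.0

def any_voiced(voicing_states):
    """Logical OR over the per-mike voicing nodes."""
    return any(voicing_states)

print(posterior_speaking(0.5, True), posterior_speaking(0.5, False))
```

With an even prior, a voiced mike raises the probability that its wearer is speaking to two thirds, while an unvoiced mike rules it out, because a speaking subject's own mike is always assumed to pick up voice.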
Multi-Person Conversation Model
Observations $O^i_t$
Acoustic features from mike i that are useful for detecting speech
P(O|V) is a 3D Gaussian with covariance matrix, learned from speaker-independent data
Multi-Person Conversation Model
Energy $E^{i,j}_t$
2D variable containing log energies of mikes i and j
Associates voiced regions with speaker
If i talks at t, then the energy of mike i should be higher than that of mike j
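The intuition behind the energy variable can be sketched with a simple comparison of per-mike log energies. This is a deliberate simplification: the model treats the pairwise log energies as observations inside the DBN, whereas the hypothetical helper below just picks the loudest mike for one frame. The sample values are made up.

```python
import math

def log_energy(frame):
    """Log of the summed squared samples; a small floor avoids log(0)."""
    return math.log(sum(s * s for s in frame) + 1e-12)

def loudest_mike(frames_by_mike):
    """Return the mike id whose frame has the highest log energy."""
    return max(frames_by_mike, key=lambda mike: log_energy(frames_by_mike[mike]))

# Made-up frames: subject 1 is talking, so mike 1 records a stronger signal.
frames = {
    1: [0.50, -0.40, 0.45, -0.35],
    2: [0.05, -0.04, 0.06, -0.03],
}
print(loudest_mike(frames))
```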
Determining Miked Speaker
Multi-Person Conversation Model
Entropy $H^e_t$
Entropy of the log energy distribution across all N microphones
When an unmiked subject speaks, entropy across microphones will be low
Determining Unmiked Speaker
Results
Results (continued)
Analyzing Results of DBN Inference
Compute # of conversations between subjects
Create weighted graph
Visualize with multi-dimensional scaling
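The first two steps can be sketched as a pairwise count over detected conversations. The input format (one collection of subject codes per conversation) and the function name are assumptions, and the subject codes are made up. The multi-dimensional scaling step is not shown.

```python
from collections import Counter
from itertools import combinations

def conversation_graph(conversations):
    """Count, for every pair of subjects, the conversations they shared.

    Returns a Counter keyed by (subject_a, subject_b) with subject_a < subject_b;
    each count is the weight of the corresponding undirected edge.
    """
    weights = Counter()
    for participants in conversations:
        for a, b in combinations(sorted(set(participants)), 2):
            weights[(a, b)] += 1
    return weights

# Made-up output of the conversation detector.
detected = [
    ("s01", "s02"),
    ("s02", "s01", "s03"),
    ("s03", "s04"),
]
graph = conversation_graph(detected)
for pair, weight in sorted(graph.items()):
    print(pair, weight)
```

Sorting the participants before pairing keeps each undirected edge under a single key, so a conversation recorded as (s02, s01) adds to the same edge as (s01, s02).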
Modeling Influence
Goal: model influence of subject j on subject i’s conversational style
Formally:
o $P(S_{i,t} \mid S_{i,t-1})$ = self-transition probability (probability of continuing to speak or remain silent)
o Question: for a particular conversation, how much of $P(S_{i,t} \mid S_{i,t-1}, S_{j,t-1})$ is explained by $P(S_{j,t} \mid S_{j,t-1})$?
o Create mixed-memory Markov chain model, infer parameters
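One common form of a mixed-memory Markov model writes the joint transition as a weighted mixture of a self-transition table and a cross-transition table, with the mixture weight read as the influence of j on i. The slides do not give the exact parameterization, so the sketch below is an assumption, and the transition tables are made-up numbers. States are 0 (silent) and 1 (speaking).

```python
def mixed_memory_prob(next_state, own_prev, other_prev,
                      self_trans, cross_trans, influence):
    """Approximate P(S_i,t | S_i,t-1, S_j,t-1) as a two-component mixture.

    self_trans[a][b]  : P(S_i,t = b | S_i,t-1 = a)
    cross_trans[a][b] : P(S_i,t = b | S_j,t-1 = a)
    influence         : mixture weight in [0, 1] given to subject j
    """
    return ((1.0 - influence) * self_trans[own_prev][next_state]
            + influence * cross_trans[other_prev][next_state])

# Made-up tables: i tends to keep its state; i tends to be silent after j speaks.
SELF_TRANS = [[0.9, 0.1], [0.2, 0.8]]
CROSS_TRANS = [[0.6, 0.4], [0.7, 0.3]]

# Probability that i keeps speaking when both i and j spoke in the last step.
p = mixed_memory_prob(1, 1, 1, SELF_TRANS, CROSS_TRANS, influence=0.25)
print(p)
```

Because each component table is a proper conditional distribution, the mixture also sums to one over the next state, and an influence of zero reduces the model to the plain self-transition chain.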
Influence
GISTS
Inferring what a conversation is about (“gist”)
Apply speech recognition
Use OpenMind commonsense knowledge database to associate words with classes of events (“buying lunch”)
Use simple Naïve Bayes “bag of words” to infer gist and select key words
Improve by conditioning on location
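The Naive Bayes step can be sketched on its own, assuming the recognized words and the word-to-gist associations are already available. The training examples below are made up and merely stand in for associations that would come from the commonsense database; the add-one smoothing and the function names are also assumptions. Conditioning on location is not shown.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (gist_label, list_of_words)."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for label, words in examples:
        label_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return label_counts, word_counts, vocab

def classify(words, model):
    """Return the gist with the highest log posterior (add-one smoothing)."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in words:
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up word/gist associations (hypothetical, for illustration only).
EXAMPLES = [
    ("buying lunch", ["sandwich", "pay", "menu", "lunch"]),
    ("buying lunch", ["order", "lunch", "cashier"]),
    ("meeting", ["agenda", "slides", "project", "deadline"]),
    ("meeting", ["schedule", "project", "room"]),
]
model = train(EXAMPLES)
print(classify(["order", "sandwich", "lunch"], model))
```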
Example
Next Step: Locations
Wi-Fi signal strength can be used to determine the approximate location of each speech event
o 5 meter accuracy
o Location computation done off-line
Raw locations are converted to nodes in a coarse topological map before further analysis
Topological Location Map
Nodes in map are identified by area types
o Hallway
o Breakout area
o Meeting room
o Faculty office
o Student office
Detected conversations are associated with their area type
Goal: Social Network Model
Goal: Dynamic Social Network Model
o People, Places, Conversations, Time
o Nodes
  o Subjects (wearing sensors, have given consent)
  o Places (e.g., particular break out area)
  o Instances of conversations
o Edges
  o Between subjects and conversations
  o Between places and conversations
o Replicate over data collection sessions (as in a DBN)
o Compute influences between sessions: e.g., if A-B and B-C are strong at t, then A-C is likely to be strong at t+1
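The between-session rule in the last bullet can be illustrated with a hard threshold on tie strength. This is only a sketch of the rule as stated: an actual dynamic model would presumably learn such influences from data rather than apply a fixed cutoff. The threshold, the weights, and the function name are hypothetical.

```python
from itertools import combinations

def predict_new_strong_ties(weights, threshold):
    """Pairs not yet strong at session t that share a strong common neighbor.

    weights: dict mapping (person_a, person_b) to tie strength at session t.
    Returns a set of (a, c) pairs predicted to become strong at session t+1.
    """
    strong = {frozenset(pair) for pair, w in weights.items() if w >= threshold}
    people = sorted({person for pair in strong for person in pair})
    predictions = set()
    for a, c in combinations(people, 2):
        if frozenset((a, c)) in strong:
            continue  # already strong, nothing to predict
        for b in people:
            if b in (a, c):
                continue
            if frozenset((a, b)) in strong and frozenset((b, c)) in strong:
                predictions.add((a, c))
                break
    return predictions

# Made-up tie strengths for one session.
session_t = {("A", "B"): 5, ("B", "C"): 4, ("A", "C"): 1, ("C", "D"): 1}
print(predict_new_strong_ties(session_t, threshold=3))
```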