Jie Liu - Fapesp · Jie Liu Suman Nath ... Börje Felipe Fernandes Karlsson Pontifícia...
Transcript of Jie Liu - Fapesp · Jie Liu Suman Nath ... Börje Felipe Fernandes Karlsson Pontifícia...
Jie Liu
Senior Researcher
Microsoft Research
Redmond, WA 98052
Microsoft-FAPESP Environmental Science Workshop, Nov. 11th, 2010
Networked Embedded Computing Group @
MSR
Jie Liu
Suman Nath
Bodhi Priyantha
Michel Goraczko
Aman Kansal
Dimitrios Lymberopoulos
Special Thanks to our LATAM
Interns Andre Santanche
Universidade Estadual de Campinas
Börje Felipe Fernandes KarlssonPontifícia Universidade Católica do Rio de Janeiro
Leonardo Barbosa OliveiraUniversidade Estadual de Campinas
Heitor Soares Ramos Filho Universidade Federal de Minas Gerais
Miguel PalomeraUniversidad Nacional Autónoma de México
Sensing Made Easy -- As It Appears
Sensing, computing, and networking become increasing accessible Low cost sensors, radios, and devices
Low barrier of entry for accessing cloud services
Ubiquitous cell and WiFi coverage
Ride the consumer device wave Sales of sensor-rich smartphones: 270 million in 2010. (IDC)
Potential of smart grid: 70 million single family houses in US.
A round of academic investments around the world US: Cyber-physical systems
EU: CONET
China: Internet of things
Sensing as a Foundation for
Science
CollaborationCollection Extraction
At Microsoft Research, we tackle platform and systems issues on
sensing the physical world at scale.
Data Center Genome
A data center can have hundreds of thousandservers and other equipment and consumes tens of mega watts.
The complex interactions among utilization, computing, and cooling make improve their efficiency challenging.
MeshID: Low cost scalable asset location and
tracking
Low cost RFIDs on each server
In-rack RFID reader arrays
Transit units to convert RFIDs to wireless sensors
Mesh ad hoc network among transit units.
WiFlock: Ultra-Low Duty Cycle
Discovery
Automatic discovery and group formation when nodes come to communication range.
Lower duty cycle means more energy efficiency, but longer discovery latency.
The faster a node can wake up the lower duty cycle it can achieve. We can do 80 s with CC2500, or equivalently 0.2% duty cycle.
Expected life time: 5~7 years.
Beacon x
Listen x
Beacon y
Listen y
x
y
Genomotes Each server have a dozen fans.
Together with AC fans, the temperature distribution is
complex.
Wireless sensors are ideal for non-intrusive continuous
monitoring.
RACNet: Reliable ACquisition
Network(MSR & JHU)
Technology highlights Local data caches
Multi-channel, multi-hop collection tree
Token passing data retrieval over collection trees
End-to-end reliability.
Database
Virtual and Soft Sensors
Not all variables of interest are measurable (e.g. per virtual machine power consumption.)
Virtual and soft sensors compute the values indirectly.
0 50 100 150 200 250 3000
10
20
30
40
50
60
Time(s)
Watt
s
Energy Model Error
Measured
Estimate
Error
0 50 100 150 200 250 3000
2
4
6
8
10
12
14
16
18
20Component Dynamic Energy
Time(s)
Watt
s
CPU
Memory
Disk
0 50 100 150 200 250 3000
5
10
15
20
25Application Dynamic Energy
Time(s)
Watt
s
Total
App 1
App 2
Total App 1 App 20
500
1000
1500
2000
2500Component Dynamic Energy (Cumulative)
Joule
s
CPU
Memory
Disk
JouleMeterperf
counterspower
Example:
Software and Virtual SensorsSensor net as service providers
– Services encapsulate data and computation
– Input/output are label with clear semantics
– Services can be discovered, composited, and executed on demand
PlannerQueryService
graph
Service
Embedding
Tasking MLService
Scheduler
Service
scheduleExecution
Run-time
resource infoRun time
Schedule time
Service
info
User
Report
Service Discovery/
Self-monitoring
Tasking
Progress
Scalable Data Management
Unprecedented data collection capability
creates massive amount of data
Microsoft data centers collects ~1TB/day
Data management challenges
Information extraction
Archiving
Analysis
Cypress Approach
Compress data to reduce
storage and I/O cost
Query processing on
compressed data
— Semantic-based compression to reduce storage and I/O cost
— Achieve 100X compression with bounded maximum error (L∞ )
— Answer useful queries directly from compressed data
Column-oriented compression:
Precisionreduction
[selection]
Trickles:
Single-streamspectrum analysis
[correlation, histogram]
GAMPS:
Cross-stream compression
[selection, similarity]
0 50 100 150
-0.2
-0.1
0
0.1
0.2
Compress components into ―trickles‖
0 500 1000 1500 2000 2500 30000
0.2
0.4
0.6
0.8
1
0 500 1000 1500 2000 2500 3000-0.5
0
0.5
0 500 1000 1500 2000 2500 3000-0.5
0
0.5
LoF
Residual
Spikes
0 10 20 30 40 50 600
0.2
0.4
0.6
0.8
1
Down
sampleLoF trickle 144
2880
2880
2880
LosslessCompression
Spike trickle 56
Sketch with
Random
Projection
Sketch trickle 200
0 500 1000 1500 2000 2500 30000
0.2
0.4
0.6
0.8
1
Trickles + GAMPS
LoF & Spike
TricklesGAMPS
Me
sse
ng
er
Se
rve
r
Counte
r D
ata
Trickles GAMPS
Discard
noise
13%
8x
5%
20x
8%
12x
1%
100x
100%
HiF Trickle
SenseWeb: Wikipedia of
Sensors
SenseWeb
USGS
• Sensor registration
• Metadata mgmt
• Store and query data
• Share data with others • One stop portal
• Discover data source
• Spatio-temporal viz.
http://atom.research.microsoft.com/sensewebv3/sensormap/
...
SwissExSeamonster
Thoughts Every system is different
Power options & power consumption
Infrastructure vs infrastructure-less
Stationary vs mobile
Engineered vs opportunistic vs participatory
Extracting information from data is the challenge Modeling, validation, learning, hypothesis testing
Archiving vs streaming
Network effects
Crowd sourcing/citizen scientists
Trust
Can we design sensing systems at scale? Computer scientists and real scientists need to collaborate
Fast prototype and feedback