Challenges in Ubiquitous Data Management Michael Franklin UC Berkeley.


© 2000 Michael J. Franklin

Ubiquitous Computing

“In ten years, billions of people will be using the Web, but a trillion "gizmos" will also be connected to the Web.” Asilomar Rep. on DB Research, Dec. 1998

You’ve heard it before…

Wireless Internet-enabled devices projected to soon outnumber wired Internet devices.

Many computing devices per person: Smartphones, PDAs, Smartcards, badges, wearables, lightswitches, toasters, …

Ubiquitous Connectivity

Tremendous improvements in Internet backbone bandwidth and reductions in diameter.

Broadband connectivity to the home and office (i.e. the “last mile”) is being solved.

Wireless technologies are enabling anytime-anywhere connectivity.

Ubiquitous Data Access

But, ubiquitous computing and connectivity aren’t worth much without ubiquitous data access.

“Fundamentally, the ability to access all information from anywhere and have ONE unified and synchronized information repository is critical to making appliances useful.” Hambrecht and Quist, iWord, 3/99

Ubiquitous data access will put existing data management techniques to the test, in all aspects – searching, location, reliability, consistency, …

Ubiquitous Data – State of the Art

Everyone uses a database system and/or search engine every day.

Although they may not realize it! (the true test of “ubiquity”).

The Internet and WWW have become a ubiquitous means of global data dissemination and exchange.

Databases play a crucial but largely invisible role here.

XML and related standards are enabling increasingly sophisticated interoperation.

Wireless access provides anytime-anywhere access and enables location-centric applications.

Scenarios and Requirements

Real “killer apps” have not yet emerged.

Many in industry have begun to refer to a “user experience” rather than a particular app.

Many of these scenarios are quite irritating, e.g. “buy milk now!!!!”

Typical scenarios require three types of functionality:

Support for mobility – of users and data

Context awareness – what is the user trying to do?

Support for collaboration – varied and dynamic groups of people; real-time or asynchronous, …

Demands on Data Management

A key requirement that emerges from all three of these categories is adaptivity:

movement/availability of data and people

continually changing contexts

dynamic groups and interactions

A problem and solution: “user-in-the-loop”:

people can deal with ambiguity and conflict resolution.

requires a collaborative and responsive approach to information systems: provide fast interactive performance; quickly respond to user direction.

Mobility

Limited device capabilities:

storage & CPU, battery power, bandwidth, display, …

requires adjustment of data delivery to these

Varying and intermittent connectivity:

requires proxies and smart data staging/pre-staging

requires global access to data

Location-centric applications:

“find open drugstores within two miles of my current location.”

must be able to deal with locations and distances

servers must track huge numbers of moving objects
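The drugstore query above can be sketched in a few lines. This is only an illustration of what "dealing with locations and distances" involves: the store records, field names, and coordinates are hypothetical, and a real server tracking many moving objects would use a spatial index rather than the linear scan shown here.

```python
import math

EARTH_RADIUS_MILES = 3958.8

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def open_stores_within(stores, here, radius_miles):
    """Return names of open stores within radius_miles of here (linear scan)."""
    lat, lon = here
    return [
        s["name"]
        for s in stores
        if s["open"] and haversine_miles(lat, lon, s["lat"], s["lon"]) <= radius_miles
    ]

# Hypothetical data: one nearby open store, one distant store, one nearby closed store.
STORES = [
    {"name": "A", "lat": 37.8720, "lon": -122.2700, "open": True},
    {"name": "B", "lat": 37.8044, "lon": -122.2712, "open": True},
    {"name": "C", "lat": 37.8710, "lon": -122.2730, "open": False},
]
nearby = open_stores_within(STORES, here=(37.8716, -122.2727), radius_miles=2.0)
```

The user's current location is itself a changing input, so the same query gives different answers as the device moves.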

Context Awareness

System must maintain an internal representation of the users’ needs, tasks, roles, preferences, etc.

requires “user profiles” and models

some information can be leveraged from PIM apps

In some scenarios, e.g. “smart spaces”, system must continually monitor and react to changes in the environment:

requires processing streams of data from sensors, logs, etc.

All require inferencing and learning techniques over dirty and incomplete data.

Collaboration

Synchronization and consistency support

collaboration revolves around a set of shared data

requirements range from unmoderated chat rooms to complete ACID transactions

Also need maintenance of history

to support asynchronous collaborations

to support changes in group membership

must be durable and highly-available.

Two On-going Projects

Two projects currently underway to address some of these issues (both part of “Endeavour”).

Data Centers/Dissemination-Based Info Sys

Profile-based data management

includes “data recharging”

collaboration with Stan Zdonik at Brown and Mitch Cherniack at Brandeis

Telegraph

adaptive query processing over data streams

with Joe Hellerstein at UC Berkeley

Data Centers Framework

An architecture that combines data delivery techniques for responsive client access.

3 types of nodes: Data sources; Clients; Information brokers (can add value)

Any data delivery mode can be used.

Network transparency

Dynamic

Delivery Options

Pull, Aperiodic: request/response (Unicast); request/response w/ snoop (1-to-n)

Pull, Periodic: polling (Unicast); polling w/ snoop (1-to-n)

Push, Aperiodic: Email lists (Unicast); publish/subscribe (1-to-n)

Push, Periodic: Email list digests (Unicast); Broadcast disks, publish/subscribe (1-to-n)

Network Transparency

Clients, Brokers, Sources

The type of a link matters only to nodes on each end

DBIS Example

[Figure: an example with a server DB, three proxy caches, one 1-to-n push link, and three unicast pull links. The delivery modes can vary dynamically.]

“Data Recharging” for Weakly Connected Devices

Mobile devices require 2 resources: power and data

It is impractical to be continuously connected to fixed sources of these.

Devices cope with disconnection using caching:

Power cached in rechargeable batteries

Data cached in hot-synched memory

Recharging the power is easy…

Anywhere, Anytime, “Hands-off” operation, Flexible connection duration

Data Recharging – Elevator Pitch

Make recharging data as simple as recharging power:

Anywhere – no need to connect to your home machine,

Anytime – no prior arrangements necessary,

“Hands-off” operation – system knows what you need

Flexible connection duration – the longer you stay connected, the better your device-resident data gets.

Some Questions

How to know where the user will be?

and do we care? (for context – yes; for staging – ??)

How to know what the user wants?

How to prioritize data delivery?

The answer is User Profiles

“Data Recharging” Profiles

Three main components:

1) Content-based specifications of user interests (read “queries”)

2) Specifications of user priorities/requirements: priority ordering, resolution, freshness, dependencies

3) User Context information – where, when, who, what

This info is available in the user’s PIM data!

Profiles must be both specified explicitly and learned automatically.

First cut at Profile Model

Tasks, sub-tasks, and jobs

Dependencies and alternatives expressed in a tree

“Values” assigned and manipulated

Two optimization problems:

Bounded (known) sync time

Unknown sync time

Bounded case is an instance of the “precedence-constrained knapsack problem”
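To make the bounded case concrete, here is a minimal sketch of a precedence-constrained knapsack over a small dependency tree. The item names, transfer costs, and values are hypothetical, and the brute-force enumeration is chosen only for clarity; it is not presented as the project's actual optimizer and would not scale to real profiles.

```python
from itertools import combinations

# Hypothetical profile items: name -> (transfer cost in seconds, value, parent or None).
# An item may only be delivered if its parent in the tree is delivered too.
ITEMS = {
    "calendar":      (2, 5, None),
    "meeting_notes": (3, 6, "calendar"),
    "slides":        (6, 9, "meeting_notes"),
    "news":          (4, 4, None),
}

def best_recharge(items, sync_time):
    """Return (value, chosen set) maximizing value within a known sync time."""
    names = list(items)
    best_value, best_set = 0, frozenset()
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            chosen = set(subset)
            # Precedence constraint: every chosen item's parent must also be chosen.
            if any(items[n][2] is not None and items[n][2] not in chosen for n in chosen):
                continue
            cost = sum(items[n][0] for n in chosen)
            value = sum(items[n][1] for n in chosen)
            if cost <= sync_time and value > best_value:
                best_value, best_set = value, frozenset(chosen)
    return best_value, best_set

short_sync = best_recharge(ITEMS, sync_time=9)
long_sync = best_recharge(ITEMS, sync_time=11)
```

With these made-up numbers, a longer connection changes which items are worth sending, which is the "flexible connection duration" property in miniature.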

The XFilter system allows us to process millions of standing queries over XML documents

XFilter – An XML-Based SDI System

The challenge is to efficiently and quickly match incoming XML documents against the potentially huge set of user profiles.

[Figure: Data Sources feed XML Conversion, which produces XML Documents for the Filter Engine. Users supply User Profiles to the Filter Engine and receive Filtered Data from it.]

Important XPath Features

Parent/Child (‘/’) and Ancestor/Descendant (‘//’):

/catalog/product//msrp

Wildcards (match any single element):

/catalog/*/msrp

Element Node Filters to further refine the nodes:

Filters can contain nested path expressions

//product[price/msrp < 300]/name

Filter applied to product element node
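The three query shapes above can be tried against a small document. The sketch below uses Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath: paths are written relative to the root element, and the numeric filter `price/msrp < 300` is evaluated in Python because ElementTree predicates do not support `<`. The catalog contents are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog used only for illustration.
CATALOG = """
<catalog>
  <product><name>Radio</name><price><msrp>120</msrp></price></product>
  <product><name>Camera</name><price><msrp>450</msrp></price></product>
  <bundle><msrp>80</msrp></bundle>
</catalog>
"""
root = ET.fromstring(CATALOG)  # root is the <catalog> element

# /catalog/product//msrp : msrp at any depth below a product child of catalog
descendant = [e.text for e in root.findall("./product//msrp")]

# /catalog/*/msrp : msrp exactly one element below catalog's children
wildcard = [e.text for e in root.findall("./*/msrp")]

# //product[price/msrp < 300]/name : filter applied to each product element node
filtered = [
    p.findtext("name")
    for p in root.iter("product")
    if float(p.findtext("price/msrp")) < 300
]
```

Note that the wildcard query matches only the bundle's `msrp`, because the products nest theirs one level deeper under `price`.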

Architecture

[Figure: User Profiles (XPath Queries) enter the XPath Parser, which produces Path Nodes and Profile Info for the Filter Engine's Query Index and Profile Base. XML Documents enter the SAX-based XML Parser, which sends Element Events to the Filter Engine. The Filter Engine outputs Successful Queries and Successful Profiles & Filtered Data. Sample profile queries from the figure follow.]

/a//b/c//b/d/*/e/c/*/d//e

/a/b[c/d]/e//d/*/*/e/b/e

XML Parsing and Filtering

Event-based XML Parsing using SAX API

XML documents are converted to a linear sequence of events that drive the execution of the filter

Callback functions are implemented to deal with the different events

Start Element, Element Data, End Element
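The three callbacks can be seen in action with Python's standard `xml.sax` module. This is a minimal sketch: the `EventLogger` class and the tiny input document are illustrative, and a real filter would drive its state machines from these callbacks instead of recording them.

```python
import xml.sax

class EventLogger(xml.sax.ContentHandler):
    """Record the linear sequence of SAX events produced by a document."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):   # "Start Element" callback
        self.events.append(("start", name))

    def characters(self, content):         # "Element Data" callback
        if content.strip():
            self.events.append(("data", content.strip()))

    def endElement(self, name):            # "End Element" callback
        self.events.append(("end", name))

handler = EventLogger()
xml.sax.parseString(b"<a><b>hi</b></a>", handler)
events = handler.events
```

The nested document becomes a flat event list, so the filter never needs the whole tree in memory.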

Filter Engine

Tricky aspects of the XPath language:

Checking the order of elements in the queries

Handling wildcards and descendant operators

Evaluating filters that are applied to element nodes (nested path expressions)

Solution:

Convert each XPath query into a Finite State Machine (FSM)

A profile is considered to be satisfied when its final state is reached

Index the states of FSMs for efficient evaluation
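As a rough illustration of the FSM idea, the sketch below turns one linear path query into a state machine driven by start/end element events. It is a generic simplification, not XFilter's Path Node implementation: it handles only `/`, `//`, and `*`, ignores element node filters, and keeps a stack of active state sets (an assumption of this sketch) so that states are restored when an element closes.

```python
import re

class PathFSM:
    """Streaming matcher for one linear path query such as /a/b//c."""

    def __init__(self, query):
        # Each step is (axis, name test); axis is "/" (child) or "//" (descendant).
        self.steps = re.findall(r"(//?)([^/]+)", query)
        # One set of active states per open element; state i means i steps matched.
        self.stack = [{0}]
        self.matched = False

    def start_element(self, name):
        final = len(self.steps)
        active = set()
        for state in self.stack[-1]:
            if state == final:
                continue
            axis, test = self.steps[state]
            if test in ("*", name):
                active.add(state + 1)   # this element satisfies the next step
            if axis == "//":
                active.add(state)       # descendant axis: keep trying deeper
        if final in active:
            self.matched = True         # final state reached: profile satisfied
        self.stack.append(active)

    def end_element(self, name):
        self.stack.pop()

def matches(query, events):
    """Feed a list of ("start" | "end", name) events to a fresh FSM."""
    fsm = PathFSM(query)
    for kind, name in events:
        if kind == "start":
            fsm.start_element(name)
        else:
            fsm.end_element(name)
    return fsm.matched

# Events for the document <a><b><x><c/></x></b></a>.
DOC = [("start", "a"), ("start", "b"), ("start", "x"), ("start", "c"),
       ("end", "c"), ("end", "x"), ("end", "b"), ("end", "a")]
```

For example, `matches("/a/b//c", DOC)` succeeds because `c` is a descendant of `/a/b`, while `matches("/a/c", DOC)` fails because `c` is not a direct child of `a`.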

FSM Representation

Each element node is a state

A state is represented using a Path Node structure:

Contains information to process current state:

Compare the level of the element name in the input document with the level value of the path node

Evaluate the element node filter if there is any

Locate next path nodes for the state change in the FSM representation

Calculate the level values of next states using relative distance values (in terms of levels) stored in the path nodes

Handling Multiple Queries

Hash table based on the element names in the queries

Each node contains two lists of path nodes:

Candidate List: Stores the path nodes that represent current state of each query

Wait List: Stores the path nodes that represent the future states

State transition is represented by promoting a path node from the Wait List to the Candidate List

Initial distribution of path nodes has a significant impact on performance

Key insight for scalable Profile Matching: Index the queries instead of the data

Examples

Q1 = /a/b//c
Q2 = //b/*/c/d
Q3 = /*/a/c//d
Q4 = b/d/e
Q5 = /a/*/*/c//e

Path Node | Query Id | Position | Rel Dist | Level
Q1-1 | Q1 | 1 | NA | 1
Q1-2 | Q1 | 2 | 1 | ?
Q1-3 | Q1 | 3 | NA | -1
Q2-1 | Q2 | 1 | NA | -1
Q2-2 | Q2 | 2 | 2 | ?
Q2-3 | Q2 | 3 | 1 | ?
Q3-1 | Q3 | 1 | NA | 2
Q3-2 | Q3 | 2 | 1 | ?
Q3-3 | Q3 | 3 | NA | -1
Q4-1 | Q4 | 1 | NA | -1
Q4-2 | Q4 | 2 | 1 | ?
Q4-3 | Q4 | 3 | 1 | ?
Q5-1 | Q5 | 1 | NA | 1
Q5-2 | Q5 | 2 | 3 | ?
Q5-3 | Q5 | 3 | NA | -1

Query Index Construction

[Figure: Element Hash Table with one entry per element name (a, b, c, d, e). Each entry has a Candidate List (CL) and a Wait List (WL) holding the path nodes Q1-1 through Q5-3.]
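The construction of such an index can be sketched from queries Q1 to Q5. The rules used here are assumptions inferred from the example values on the Examples slide, not a confirmed description of XFilter: wildcards produce no path node of their own, a node reached through `//` gets level -1, a relative query such as Q4 is treated as if it began with `//`, and only each query's first path node starts in the Candidate List.

```python
import re
from collections import defaultdict

QUERIES = {
    "Q1": "/a/b//c",
    "Q2": "//b/*/c/d",
    "Q3": "/*/a/c//d",
    "Q4": "b/d/e",
    "Q5": "/a/*/*/c//e",
}

def path_nodes(query_id, query):
    """Yield (element, node id, position, rel dist, level); None stands for NA or '?'."""
    if not query.startswith("/"):
        query = "//" + query  # assumption: a relative query may match at any depth
    position, wildcards, descendant = 0, 0, False
    for axis, name in re.findall(r"(//?)([^/]+)", query):
        descendant = descendant or axis == "//"
        if name == "*":
            wildcards += 1    # wildcards only widen the distance to the next node
            continue
        position += 1
        if descendant:
            rel_dist, level = None, -1
        elif position == 1:
            rel_dist, level = None, 1 + wildcards
        else:
            rel_dist, level = 1 + wildcards, None
        yield name, f"{query_id}-{position}", position, rel_dist, level
        wildcards, descendant = 0, False

def build_index(queries):
    """Hash table keyed by element name; each entry has a CL and a WL."""
    index = defaultdict(lambda: {"CL": [], "WL": []})
    for query_id, query in queries.items():
        for name, node_id, position, _rel, _level in path_nodes(query_id, query):
            index[name]["CL" if position == 1 else "WL"].append(node_id)
    return index

index = build_index(QUERIES)
```

Under these assumptions the entry for `a` holds Q1-1, Q3-1, and Q5-1 in its Candidate List, and the entry for `c` holds only Wait List nodes. An incoming element name then touches just one hash entry, which is the sense in which the queries, rather than the data, are indexed.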

Data Centers - Research Agenda

Profile Definition and Maintenance

Update Storage and Preparation

Efficient integration of "recharge" updates with existing cached data.

Recharge, Trickle Charge, Jump Start...

Consistency Guarantees

Global Data Staging

Approaches will be driven by (mostly PIM) applications.

Telegraph: An Adaptive Dataflow Engine

Dataflow because that’s what data does…

data streaming from sensors

real-time processing of streams: update stream, click-stream, swipe-stream, …

siphon data from the “deep web”

“continuous queries” for dissemination-based apps

Adaptivity due to volatility…

sensor nets

wide area internet

dynamic caching, replication, and staging

user-in-the-loop interfaces

mobile users and devices

Joint work at UC Berkeley with Joe Hellerstein

Wide-area + Wrapped sources: Unpredictability

Sources may be unreachable or slow to respond.

Data delivery may be: slower than expected, bursty, interrupted

Data statistics/cost estimates may be unavailable or unreliable due to poor interfaces or crossing administrative domains.

User-in-the-loop: Unpredictability

Batch processing is inappropriate for many apps.

especially when searching the Internet

Must provide feedback to the user as quickly as possible.

Data access becomes a cooperative, iterative approach:

User may correct/redirect query.

User may refine/change the query.

Mobility & Data Streams: Unpredictability

Mobility

Location-centric queries

Moving endpoints change data staging needs

Data Streams/Sensors

Varying data arrival rates

Adapting resolutions

Push vs. Pull

Some Solutions

Adaptive Query Processing

Query Scrambling - “Reactive Query Execution”

XJoin – non-blocking, reactive query operator.

Eddies – Continuous Query Optimization

Risk-Aware Query Planning

Producing robust plans or partial plans.

Exploiting Alternative Sources

Mirrors or “not exactly”.

Relaxing Query Semantics

Partial, Fuzzy, or Alternative answers

Query Scrambling Example

[Figure: a query plan over sources A, B, C, D, and E, shown as an Initial Plan, a Reschedule step, and a step that introduces New Operators.]

XJoin

Traditional Hash Joins block when one input stalls.

[Figure: Hash Join with Build and Probe phases over Source A and Source B using Hash Table A; a second panel shows Hash Table A and Hash Table B.]

Symmetric Hash Join (SHJ) blocks only if both stall.

XJoin partitions data -> small footprint -> full pipelining & bushy plans -> higher adaptability.
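The non-blocking behaviour of a symmetric hash join can be sketched as follows. This shows only the in-memory symmetric hash join that XJoin builds on; XJoin's partitioning and spilling of data are not modeled, and the arrival format and sample rows are hypothetical.

```python
from collections import defaultdict

def symmetric_hash_join(arrivals, key):
    """Join rows from sources "A" and "B" as they arrive, in any interleaving.

    Each arriving row is inserted into its own source's hash table and then
    probes the other source's table, so output is produced whenever either
    input delivers data.
    """
    tables = {"A": defaultdict(list), "B": defaultdict(list)}
    for source, row in arrivals:
        other = "B" if source == "A" else "A"
        k = key(row)
        tables[source][k].append(row)             # build into own table
        for match in tables[other].get(k, ()):    # probe the other table
            yield (row, match) if source == "A" else (match, row)

# Hypothetical interleaved arrivals: (source, (join key, payload)).
ARRIVALS = [
    ("A", (1, "a1")),
    ("B", (1, "b1")),
    ("B", (2, "b2")),
    ("A", (2, "a2")),
    ("A", (1, "a3")),
]
results = list(symmetric_hash_join(ARRIVALS, key=lambda row: row[0]))
```

Because neither input has to finish before probing starts, a stall on one source does not stop results that the other source can still produce.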

Eddy – Continuous Optimization

Flow-based (“Rivers”)

Tuples are routed via a ticket-based scheme and back-pressure.

Hellerstein and Avnur 99

[Figure: an Eddy routing tuples from inputs R, S, and T through the operators Join RS and Join ST.]
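A ticket-based routing scheme can be sketched with simple selection operators standing in for joins. This is a loose simplification under stated assumptions: an operator gains a ticket when it is handed a tuple and gives it back when it returns the tuple, so operators that drop more tuples hold more tickets and win the routing lottery more often. Back-pressure is not modeled, and the function and operator names are hypothetical.

```python
import random

def run_eddy(tuples, operators, seed=0):
    """Route each tuple through all operators in a lottery-chosen order."""
    rng = random.Random(seed)
    tickets = {name: 1 for name in operators}
    output = []
    for item in tuples:
        pending = list(operators)          # operators this tuple still has to visit
        while pending:
            weights = [tickets[name] for name in pending]
            name = rng.choices(pending, weights=weights)[0]
            tickets[name] += 1             # credited when handed a tuple
            if not operators[name](item):
                break                      # tuple dropped: operator keeps the ticket
            tickets[name] -= 1             # debited when the tuple comes back
            pending.remove(name)
        else:
            output.append(item)            # tuple passed every operator
    return output, tickets

OPERATORS = {
    "is_even": lambda x: x % 2 == 0,
    "is_small": lambda x: x < 10,
}
output, tickets = run_eddy(range(20), OPERATORS)
```

The routing order varies per tuple, but the result set does not: every surviving tuple has passed all operators.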

Adaptive Approaches

Increased uncertainty argues for increased adaptivity.

Wide-area nets and admin domains introduce uncertainty.

Pesky users introduce uncertainty.

Mobility and streams introduce uncertainty.

Implications for data-intensive Internet services.

[Figure: a spectrum of adaptivity with the labels static plans, late binding, reopt., continuous opt., and anarchy. Items placed along it: current DBMS; Dynamic, Parametric, Competitive, …; Query Scrambling; Eddy; XJoin; ???.]

Conclusions

We need to build more intelligent systems to protect humans from the data flood, but good old systems performance issues still matter too.

No killer app for Ubiquitous Data Access yet; may be the killer “user experience”

Scenarios give us a common (and challenging!) set of requirements for data management: Adaptivity, context-awareness, global-scale, …

The Data Centers and Telegraph projects are addressing key data management technologies for supporting ubiquitous access to data.