Page 1: Andrea D’Orazio 28 April 2010 © 2010 - andreadorazio@gmail.com - .

Data Synchronization

Andrea D’Orazio

28 April 2010

© 2010 - andreadorazio@gmail.com - www.doraz.net

Page 2:

Cloud computing?

Page 3:

Overview

Synchronization models

• what, when, how

• versioning

• sync process

• sync architectures

• type-specific data

• algorithms & tools

Sync in middleware

Case studies

• SyncML

• Funambol

Page 4:

Synchronization models

data has to be modified?

• NO: just a copy

• YES: propagation of updated data to all devices

• what if data is modified on two devices?

always ensure that no data is lost!

Page 5:

Synchronization models

WHEN to sync?

• manually

• automatically

keep in mind network / connection availability!

synchronization ~ replication

• store data in a number of locations

• fault tolerance, but...

• most copies unavailable most of the time

Page 6:

Synchronization models

HOW to sync?

• permit only one modifiable copy of data
  - lock on data
  - hub-and-spoke model

• multiple copies can be independently modified
  - more flexible
  - more complex to implement the sync process

Page 7:

Synchronization models

Versioning

each state of data is assigned a version number that increases as modifications are made

2 basic rules (α, β: locations):

• if Aα2 originates from modification(s) of Aα1, Aα2 has a version number greater than Aα1

• if Aβ4 and Aα6 both originate from Aα1, their versions are not comparable (Aβ4 and Aα6 have been modified in different locations)
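
The two rules can be sketched in a few lines of Python. This is only an illustration: the `Version` class, its `parent` link and the `compare` function are hypothetical names, and tracking ancestry through a parent pointer is an assumption, not something the slides prescribe.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    location: str                       # where this state was produced (α, β, ...)
    number: int                         # grows with every modification
    parent: Optional["Version"] = None  # state this one was derived from


def ancestors(v):
    """Yield every state that v was derived from, nearest first."""
    while v.parent is not None:
        v = v.parent
        yield v


def compare(a, b):
    """Describe a relative to b: equal, newer, older or not comparable."""
    if a == b:
        return "equal"
    if b in ancestors(a):
        return "newer"           # rule 1: a originates from b
    if a in ancestors(b):
        return "older"
    return "not comparable"      # rule 2: modified independently


a1 = Version("α", 1)
a2 = Version("α", 2, parent=a1)
b4 = Version("β", 4, parent=a1)
a6 = Version("α", 6, parent=a1)

print(compare(a2, a1))
print(compare(b4, a6))
```

Note that the version numbers alone (4 and 6) would suggest an order between the last two states; it is the shared origin and the separate locations that make them not comparable.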

Page 8:

Synchronization models

Versioning – example 1

[Diagram, locations α and β, m = modified flag: A1 is propagated so that both locations hold A1 (m=0); one copy is modified and becomes A5 (m=1) while the other stays A1 (m=0); after SYNC both locations hold A5 (m=0).]

Page 9:

Synchronization models

Versioning – example 2

[Diagram, locations α and β, m = modified flag: A1 is propagated so that both locations hold A1 (m=0); both copies are then modified independently and become A5 (m=1) and A3 (m=1); at SYNC the versions are not comparable.]

how to reconcile?
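
The two versioning examples can be replayed with a small sketch. `Replica` and `sync` are hypothetical names, and the rule "propagate when one flag is set, report a conflict when both are set" is a simplifying assumption about what the diagrams show.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    version: int
    modified: bool = False   # the "m" flag from the diagrams

    def modify(self, new_version):
        self.version = new_version
        self.modified = True


def sync(a, b):
    """Propagate the modified copy to the other side, or report a conflict."""
    if a.modified and b.modified:
        return "conflict"            # example 2: versions not comparable
    if a.modified:
        b.version = a.version
    elif b.modified:
        a.version = b.version
    a.modified = b.modified = False  # both sides are clean again
    return "in sync"


# Example 1: only one copy is modified before SYNC.
alpha, beta = Replica(1), Replica(1)
alpha.modify(5)
outcome_1 = sync(alpha, beta)

# Example 2: both copies are modified before SYNC.
alpha2, beta2 = Replica(1), Replica(1)
alpha2.modify(5)
beta2.modify(3)
outcome_2 = sync(alpha2, beta2)

print(outcome_1, alpha, beta)
print(outcome_2, alpha2, beta2)
```

In the conflict case the sketch deliberately leaves both copies untouched, in line with the rule that no data may be lost; deciding what to do next is the job of reconciliation.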

Page 10:

Synchronization models

Sync process

update detection

• recognition that data has been modified

update propagation

• transmission of changes among all the data replicas

reconciliation

• combination of all the updated data to build a synchronized version

Page 11:

Synchronization models

Sync process

update detection

• recognition that data has been modified

• triggers the start of sync process

• clean / dirty status (modification flag)

• modification timestamp

• hash of the content

Page 12:

Synchronization models

Sync process

update detection

• modification timestamp
  - store only the timestamp of last modification? (e.g. in file system)
  - comparison of all the timestamps: time / resource consuming!
  - monitoring file system for changes
  - use the directory timestamp if it equals that of the most recently modified file it contains

• semantics of modification time
  - when is a file timestamp actually modified?
  - content modifying, file renaming, file moving?

Page 13:

Synchronization models

Sync process

update detection

• modification / creation / access timestamps (e.g. with NTFS)

• file size

• improved detection

• timestamp(s) + hash
  - effective and safe way to compare data
  - only the hash is transmitted for comparison: bandwidth saving
  - can be slow if CPU / storage performance is poor
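
A minimal sketch of the "timestamp(s) + hash" idea, using only the Python standard library. The function names, the shape of the `known` record and the choice of SHA-256 are assumptions made for illustration. The cheap check (size and modification time) runs first, and the file is hashed only when that check reports a difference.

```python
import hashlib
import os
import tempfile


def fingerprint(path):
    """Cheap metadata check: file size and modification time."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)


def content_hash(path, block_size=65536):
    """Hash of the content, read block by block to limit memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()


def has_changed(path, known):
    """known holds the 'fingerprint' and 'hash' recorded at the last sync."""
    if fingerprint(path) == known["fingerprint"]:
        return False                      # metadata untouched: assume clean
    return content_hash(path) != known["hash"]


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "note.txt")
    with open(path, "wb") as f:
        f.write(b"hello")
    known = {"fingerprint": fingerprint(path), "hash": content_hash(path)}

    results = [has_changed(path, known)]      # nothing happened
    os.utime(path, ns=(1, 1))                 # timestamp changes, content does not
    results.append(has_changed(path, known))
    with open(path, "wb") as f:
        f.write(b"hello world")               # content changes
    results.append(has_changed(path, known))

print(results)
```

The trade-off from the slide shows up directly: a change that keeps both size and timestamp would slip past the cheap check, while hashing every file on every run would be slow on weak CPUs or storage.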

Page 14:

Synchronization models

Sync process

reconciliation

• combination of all the updated data to build a synchronized version

opaque data (e.g. binary, pictures)
  - ask the user which version to use

structured data (e.g. XML)
  - edit logs
  - state comparison
  - both (can) use the latest common ancestor as comparison aid

Page 15:

Synchronization models

Sync process

reconciliation

• type of modifications (inside a single file):
  - insertion
  - deletion
  - moving
  - changing

moving is often handled as deletion + insertion

use of a unique ID per piece of structured content
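
State comparison against the latest common ancestor can be sketched as a three-way merge over pieces of content keyed by unique ID. The `reconcile` function and the dictionary representation are hypothetical, and reporting a conflict without resolving it is an assumption of this sketch.

```python
def reconcile(ancestor, left, right):
    """Three-way merge of {id: content} dictionaries.

    Returns (merged, conflicts); a missing id means the piece was deleted
    (or never existed) on that side.
    """
    merged, conflicts = {}, {}
    for key in set(ancestor) | set(left) | set(right):
        base, l, r = ancestor.get(key), left.get(key), right.get(key)
        if l == r:
            value = l                  # both sides agree
        elif l == base:
            value = r                  # only the right side touched it
        elif r == base:
            value = l                  # only the left side touched it
        else:
            conflicts[key] = (l, r)    # both touched it differently
            continue
        if value is not None:
            merged[key] = value
    return merged, conflicts


ancestor = {1: "a", 2: "b", 3: "c"}
left = {1: "a", 2: "B", 3: "c", 4: "d"}   # changed 2, inserted 4
right = {1: "a", 2: "b"}                  # deleted 3

merged, conflicts = reconcile(ancestor, left, right)
print(merged, conflicts)

clash = reconcile({1: "a"}, {1: "x"}, {1: "y"})
print(clash)
```

Without the common ancestor the merge could not tell "deleted on the right" from "inserted on the left", which is why both approaches on the previous slide lean on it.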

Page 16:

Synchronization models

Sync architectures

concepts

• single user owns multi-located data

• device where data is located: component

• connections between (two) devices: synchronization connections

Page 17:

Synchronization models

Sync architectures

centralized model

• each device synchronizes with the master device

• bi-directional communication!

• not all links might be established at the same time!

[Diagram: master device A with devices B, C, D and E]

Page 18:

Synchronization models

Sync architectures

tree model

• each piece of data is branched from another one

• each device synchronizes only with its parents

• child embeds parent’s version

[Diagram: tree with master A and nodes labelled B, C, D, E, D]

Page 19:

Synchronization models

Sync architectures

general model

• it’s a mess!

• composite version number, each part for one device

• how to locate common ancestor?

• how many older versions stored?

[Diagram: devices A, B, C, D, E and F]
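
One common way to build a composite version number with one part per device is a version vector. Whether the slides have exactly this scheme in mind is an assumption; the sketch below, with hypothetical function names, only shows how such a number can be compared and merged.

```python
def compare(v1, v2):
    """Compare two version vectors given as {device: counter} dictionaries."""
    devices = set(v1) | set(v2)
    older_or_equal = all(v1.get(d, 0) <= v2.get(d, 0) for d in devices)
    newer_or_equal = all(v1.get(d, 0) >= v2.get(d, 0) for d in devices)
    if older_or_equal and newer_or_equal:
        return "equal"
    if older_or_equal:
        return "older"
    if newer_or_equal:
        return "newer"
    return "concurrent"        # modified independently: reconcile


def merge(v1, v2):
    """Version of the reconciled state: per-device maximum."""
    return {d: max(v1.get(d, 0), v2.get(d, 0)) for d in set(v1) | set(v2)}


on_a = {"A": 2, "B": 1}
on_b = {"A": 1, "B": 2}

print(compare(on_a, {"A": 1, "B": 1}))
print(compare(on_a, on_b))
print(merge(on_a, on_b))
```

The vector grows with the number of devices, which is one concrete face of the "it's a mess!" remark.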

Page 20:

Synchronization models

Type-specific data sync

[Figure: reconciliation capabilities versus development effort for data that is opaque, structure-known, with unique identifier, semantics-known, application specific]

Page 21:

Synchronization models

Type-specific - example

opaque

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.

change

Page 22:

Synchronization models

Type-specific - example

structured

<par>Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.</par>

<par>Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. </par>

<par>Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. </par>

<par>Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.</par>

<par>Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. </par>

<par>Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. </par>

change

change

Page 23:

Synchronization models

Type-specific - example

structured + id

<par id=1>Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.</par>

<par id=2>Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. </par>

<par id=3>Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. </par>

<par id=1>Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.</par>

<par id=3>Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. </par>

<par id=2>Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. </par>

move
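
The difference between the structured and the structured + id examples can be sketched as follows. Both functions are hypothetical and the paragraphs are shortened; the sketch reports every paragraph whose position differs as moved, whereas a real tool would likely reduce the swap to a single move.

```python
def positional_changes(old, new):
    """Without ids: compare paragraph by paragraph, by position only."""
    return [pos for pos, (a, b) in enumerate(zip(old, new)) if a[1] != b[1]]


def classify(old, new):
    """With ids: old and new are ordered lists of (id, text) pairs."""
    before = {pid: (pos, text) for pos, (pid, text) in enumerate(old)}
    changed, moved = [], []
    for pos, (pid, text) in enumerate(new):
        if pid not in before:
            continue                      # insertions are outside this sketch
        old_pos, old_text = before[pid]
        if text != old_text:
            changed.append(pid)           # same id, different content
        elif pos != old_pos:
            moved.append(pid)             # same id and content, new position
    return {"changed": changed, "moved": moved}


old = [(1, "Lorem ipsum"), (2, "Ut enim"), (3, "Duis aute")]
new = [(1, "Lorem ipsum"), (3, "Duis aute"), (2, "Ut enim")]

by_position = positional_changes(old, new)
by_id = classify(old, new)
print(by_position)
print(by_id)
```

By position, the second and third paragraphs both look changed; with ids, no content is reported as changed at all, only reordered.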

Page 24:

Synchronization models

data sync algorithms & tools

rsync

Remote Differential Compression (RDC)

• Microsoft Windows Server 2003 R2 / 2008

• allows data to be synchronized between two or more computers, using compression techniques to minimize the amount of data sent across the network.

Page 25:

Synchronization models

data sync algorithms & tools

Remote Differential Compression (RDC)

• no need of storing previous versions

• no assumptions regarding similarity or common origin of data

• independent from transfer protocol (HTTP, ...)

Page 26:

Synchronization models

data sync algorithms & tools

Remote Differential Compression (RDC)

• files (to be synced) divided into chunks of data
  - chunk bounded using an incremental fingerprint function

• MD4 hash calculated for each chunk

• comparison of MD4 lists (signature, one per file)

• transfer only of missing / different chunks

• can be applied recursively!
  - original file size: 9 GB
  - signature: 81 MB
  - signature of signature: 6 MB
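
The chunk-and-compare steps can be illustrated with a toy sketch. It is not the RDC algorithm itself: the fingerprint here is a plain sum over a sliding window, the cut rule (fingerprint divisible by `DIVISOR`) is a simplification chosen for brevity, and MD5 stands in for MD4, which is not guaranteed to be available in Python's `hashlib`. All names and constants are assumptions.

```python
import hashlib
import random

WINDOW = 16    # bytes covered by the sliding fingerprint (assumed value)
DIVISOR = 64   # cut a chunk whenever fingerprint % DIVISOR == 0 (assumed value)


def chunks(data):
    """Split data at boundaries that depend only on nearby content."""
    out, start, fp = [], 0, 0
    for i, byte in enumerate(data):
        fp += byte                      # incremental update: add the new byte,
        if i >= WINDOW:
            fp -= data[i - WINDOW]      # drop the byte leaving the window
        if fp % DIVISOR == 0:
            out.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        out.append(data[start:])
    return out


def signature(data):
    """One hash per chunk; this list is what gets compared."""
    return [hashlib.md5(c).hexdigest() for c in chunks(data)]


def missing_chunks(old, new):
    """Chunks of new that the holder of old does not already have."""
    have = set(signature(old))
    return [c for c in chunks(new) if hashlib.md5(c).hexdigest() not in have]


def rebuild(old, new_signature, received):
    """Reassemble new from local chunks plus the transferred ones."""
    store = {hashlib.md5(c).hexdigest(): c for c in chunks(old) + received}
    return b"".join(store[h] for h in new_signature)


rng = random.Random(0)
old = bytes(rng.randrange(256) for _ in range(4096))
new = old[:2000] + b"EDIT" + old[2000:]     # small insertion in the middle

to_send = missing_chunks(old, new)
rebuilt = rebuild(old, signature(new), to_send)
print(len(new), sum(len(c) for c in to_send))
```

Because the boundaries follow the content rather than fixed offsets, an insertion only disturbs the chunks around it; chunks further away keep their hashes and need not be sent. Applying the same procedure to the signature list itself gives the recursive step mentioned on the slide.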

Page 27:

Sync in middleware

most of the sync (propagation) procedure can be generic

• put it in a middleware platform!

• modules / hooks are application dependent

update propagation

• typically relies on the network

• control channel
  - messaging system

• data channel
  - customized (over HTTP?)

Page 28:

Sync in middleware

publish / subscribe

• for update propagation - data channel
  - each edit gets immediately published as an event
  - continuous reconciliation
  - multiple users

• for update detection
  - an update is advertised to all subscribers
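
A minimal in-process sketch of publish / subscribe used for update propagation. `Broker` and `Device` are hypothetical classes, and a real middleware platform would deliver the events over the network rather than through direct function calls.

```python
from collections import defaultdict


class Broker:
    """Keeps a list of callbacks per topic and fans events out to them."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, event):
        for callback in list(self._subscribers[topic]):
            callback(event)


class Device:
    def __init__(self, name, broker):
        self.name = name
        self.data = {}
        self.broker = broker
        broker.subscribe("edits", self.on_edit)

    def edit(self, key, value):
        # Each edit is published immediately as an event.
        self.broker.publish("edits", {"source": self.name, "key": key, "value": value})

    def on_edit(self, event):
        # Every subscriber, the source included, applies the edit.
        self.data[event["key"]] = event["value"]


broker = Broker()
phone = Device("phone", broker)
laptop = Device("laptop", broker)

phone.edit("title", "Data Synchronization")
laptop.edit("date", "28 April 2010")
print(phone.data == laptop.data, laptop.data)
```

The sketch assumes every device is connected when an event is published; devices that are offline at that moment are exactly the case the earlier slides on versioning and reconciliation deal with.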

Page 29:

Case studies

SyncML

Synchronization Markup Language

now integrated in: Open Mobile Alliance (OMA) Data Synchronization and Device Management

interoperable protocol to sync data

Page 30:

Case studies

SyncML

interoperable protocol to sync data

• update propagation
  - transfer of updates among devices

• client-server architecture

• based on “edit log” model
  - addition, delete, replace of objects
  - unique ID

• data type & data store independent

• notification (only!) of conflicts
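
The "edit log" model can be sketched independently of the SyncML wire format, which is not reproduced here. The tuple layout, the `apply_log` function and the conflict rules are assumptions for illustration; only the three operations and the use of unique IDs come from the slide.

```python
def apply_log(store, log):
    """Apply (operation, id, value) entries to store; return conflicting ids.

    Conflicts are only reported, never resolved.
    """
    conflicts = []
    for op, item_id, value in log:
        if op == "add":
            if item_id in store:
                conflicts.append(item_id)      # id already taken
            else:
                store[item_id] = value
        elif op == "replace":
            if item_id not in store:
                conflicts.append(item_id)      # nothing to replace
            else:
                store[item_id] = value
        elif op == "delete":
            if item_id not in store:
                conflicts.append(item_id)      # already gone
            else:
                del store[item_id]
        else:
            raise ValueError(f"unknown operation: {op}")
    return conflicts


contacts = {"c1": "Alice"}
log = [
    ("add", "c2", "Bob"),
    ("replace", "c1", "Alice B."),
    ("delete", "c3", None),
    ("add", "c2", "Robert"),
]

reported = apply_log(contacts, log)
print(contacts, reported)
```

The store is a plain dictionary here, which mirrors the "data type & data store independent" point: the log only needs IDs and opaque values.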

Page 31:

Case studies

Funambol

• “Many people want quick and easy access to their email, contacts, calendars, tasks and notes, regardless of where that information is stored.”

• started in 2001

• open source Java implementation of the SyncML (OMA DS) standard

• sync data with billions of phones, thousands of applications and online services (Gmail, Yahoo!, AOL, Hotmail, Outlook, ...)

Page 32:

Case studies

Funambol

• “Funambol is the leading provider of mobile cloud sync.

• Its mobile open source platform can be used for many types of mobile applications, including push email, PIM data synchronization and device management.

• It provides C++ and Java client APIs and server side Java APIs.

• It facilitates the development, deployment and management of a wide range of mobile projects.”

Page 33:

Recap

Synchronization models

• what: data modified?

• when: auto / manually

• how: one / multiple copies modified

• versioning

• sync process: update detection / propagation, reconciliation

Page 34:

Recap

Synchronization models

• sync architectures: centralized, tree, general model

• type-specific data: opaque, structured, unique ID, semantic, app. specific

• algorithms & tools: rsync, Remote Differential Compression

Page 35:

Recap

Sync in middleware

update propagation

Case studies

• SyncML: protocol for transfer of updates among devices

• Funambol: “cloud implementation” of SyncML

Page 36:

References

• Sasu Tarkoma, 2009. Mobile Middleware, chap. 8. Wiley.

• Remote Differential Compression: http://msdn.microsoft.com/en-us/library/aa373254(v=VS.85).aspx

• SyncML: http://www.openmobilealliance.org/Technical/release_program/ds_v1_2_2.aspx

• Funambol: https://www.forge.funambol.org/learn