Versioning system in Graspeo
Nikita Grishin, EPFL, 17.04.2014
Supervised by Andrii Vozniuk
Motivation
Every document management system provides versioning.
Resources coming from various locations need to be traceable.
A system with collaborators implies the need for a recovery system.
State of the art
• Mercurial, SVN/CVS
  • On document change, only the modifications are saved
  • Easy to track modifications
  • Difficult to recover old versions
• Git
  • On file change, a new version of the file is created
  • Pointer to the latest version
  • Easy to recover old versions
  • Difficult to track modifications
Designed for text documents.
Works very poorly with media content.
• Snapshots: a new version of the document is created for each modification
• Recovering an older version creates a copy of the requested version on top of the latest version
• Some old versions may be removed by Google when space runs low
• Only users who can edit the document can see its modification history
Apple Time Machine
• Hard drive snapshots.
• Keeps:
  • Hourly backups for the last 24 hours
  • Daily backups for the past month
  • Weekly backups until the backup drive is full
• When the backup drive is full, Time Machine removes the oldest backups to free space.
Graspeo versioning system solution
• Snapshots: a new version of an item is created for each modification
• Any modification in a Space generates a new version of that Space and of all its parents
• Each file modification creates a new version of that file
• Removing a File or a Space affects only the containing Space. The file itself stays in the database
• Recovering an old version amounts to cloning it and putting the clone on top of the versions tree
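The recovery-by-cloning rule can be illustrated with a minimal sketch. It assumes that the history of one item is a plain list ordered from oldest to newest; the `restore_version` helper and the dictionary layout are hypothetical and are not taken from the Graspeo code base.

```python
from copy import deepcopy


def restore_version(history, version):
    """Clone the requested version and append the clone on top of the history.

    `history` is a hypothetical list of snapshots, oldest first.
    Nothing is deleted or rewritten: older snapshots stay untouched.
    """
    source = next(item for item in history if item["version"] == version)
    clone = deepcopy(source)
    # The clone becomes the newest version of the item.
    clone["version"] = history[-1]["version"] + 1
    history.append(clone)
    return clone


history = [
    {"version": 1, "content": "draft"},
    {"version": 2, "content": "draft with typo"},
]
restored = restore_version(history, 1)
```

After the call, `history` holds three snapshots and the newest one carries the content of version 1, so a recovery is itself recorded as a modification.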
How does it work?
• Each item has two new fields: originId and version
• originId: the id of the first version of the item. It is used to track the item's modifications over time and is the same for all versions of the item
• version: a value indicating the version number of the item
• The path value now represents the logical path: it is built from originId values (not id)
• As a result, a query on path returns all children of all versions of the item
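The effect of building the path from originId values can be sketched as follows. This is an illustration only: the `Item` class, the helper names, and the `/`-separated path format are assumptions, not the actual Graspeo schema.

```python
from dataclasses import dataclass, replace
from itertools import count

_next_id = count(1)  # hypothetical id generator


@dataclass(frozen=True)
class Item:
    id: int          # unique for every version
    origin_id: int   # id of the first version, shared by all versions
    version: int
    name: str
    path: str        # logical path of the ancestors, built from originId values


def new_item(name, parent=None):
    item_id = next(_next_id)
    path = f"{parent.path}/{parent.origin_id}" if parent else ""
    return Item(id=item_id, origin_id=item_id, version=1, name=name, path=path)


def new_version(item, **changes):
    # Same data as before, except for a fresh id and an incremented version.
    return replace(item, id=next(_next_id), version=item.version + 1, **changes)


def children_of(items, parent):
    # The prefix uses origin_id, so it is identical for every version of `parent`.
    prefix = f"{parent.path}/{parent.origin_id}"
    return [item for item in items if item.path == prefix]


space = new_item("space")
doc = new_item("notes.txt", parent=space)
space_v2 = new_version(space)
doc_v2 = new_version(doc, name="notes-final.txt")
items = [space, doc, space_v2, doc_v2]
```

Because the path never contains a per-version id, `children_of(items, space)` and `children_of(items, space_v2)` return the same list, which contains both versions of the document.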
How does it work?
• A new version of an item contains the same data as the previous version, except for the id and version fields:
  • the id field is completely new and unique
  • the version field is incremented by 1
• Parents are updated recursively:
  • A new version of the parent is created
  • The subitems array of the parent’s new version document is updated to remove the id of the child’s previous version
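The recursive parent update can be sketched with an in-memory stand-in for the database. The `db` and `latest` dictionaries, the `parentOrigin` field, and the choice to put the new child id in place of the removed one are assumptions made for this illustration; the slides only state that the previous child id is removed.

```python
from itertools import count

_next_id = count(1)
db = {}      # id -> document; every version is kept
latest = {}  # originId -> id of the newest version (hypothetical index)


def insert(name, parent_origin=None):
    doc_id = next(_next_id)
    db[doc_id] = {"id": doc_id, "originId": doc_id, "version": 1, "name": name,
                  "parentOrigin": parent_origin, "subitems": []}
    latest[doc_id] = doc_id
    if parent_origin is not None:
        db[latest[parent_origin]]["subitems"].append(doc_id)
    return db[doc_id]


def create_version(doc_id, **changes):
    old = db[doc_id]
    new = {**old, **changes,
           "id": next(_next_id),            # completely new, unique id
           "version": old["version"] + 1,   # incremented by 1
           "subitems": list(changes.get("subitems", old["subitems"]))}
    db[new["id"]] = new
    latest[new["originId"]] = new["id"]
    if old["parentOrigin"] is not None:
        parent = db[latest[old["parentOrigin"]]]
        # Assumption: the old child id is swapped for the new child id.
        subitems = [new["id"] if s == old["id"] else s for s in parent["subitems"]]
        create_version(parent["id"], subitems=subitems)  # recurse up to the root
    return new


root = insert("root")
space = insert("space", parent_origin=root["originId"])
file_v1 = insert("a.txt", parent_origin=space["originId"])
file_v2 = create_version(file_v1["id"], name="b.txt")
```

Renaming the file creates three new documents here: one for the file, one for its Space, and one for the root, while the three original documents stay unchanged.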
How does it work?
• GridFS is used to store all versions of files
• Every version of each file is stored in GridFS with its own unique id
• The resource document has a field gfsId that references the file in GridFS
• On the file system, only the latest version of a file is present
• When a file’s version is updated, the old version is removed from file system storage and replaced by the new version from GridFS
• Motivation: BitTorrent Sync, performance
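The split between the version store and the file system can be sketched as follows. The `blob_store` dictionary is only a stand-in for GridFS and does not use the GridFS API; `save_version` and the resource layout are hypothetical.

```python
import tempfile
from itertools import count
from pathlib import Path

_next_gfs_id = count(1)
blob_store = {}  # stand-in for GridFS: gfsId -> file content


def save_version(resource, data, workdir):
    """Store a new file version and refresh the single copy on the file system."""
    gfs_id = next(_next_gfs_id)
    blob_store[gfs_id] = data  # every version stays in the store
    new_resource = {**resource, "gfsId": gfs_id,
                    "version": resource["version"] + 1}
    # Overwriting the file keeps only the latest version on the file system.
    target = Path(workdir) / resource["name"]
    target.write_bytes(blob_store[gfs_id])
    return new_resource


workdir = tempfile.mkdtemp()
empty = {"name": "report.txt", "version": 0, "gfsId": None}
v1 = save_version(empty, b"first draft", workdir)
v2 = save_version(v1, b"second draft", workdir)
```

Only `report.txt` with the second draft exists in `workdir`, while the first draft remains retrievable from the store through `v1["gfsId"]`.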
Future Plans
Semester project deadline: 06.06.2014
• Implement versioning for item removal (end of this week)
• Create an API for version recovery (end of April)
• Create a front-end for versioning
• Implement versioning when using BTSync
Thank you for your attention
Questions