SpreadSheetSpace seminar at ICSI

Spreadsheet Composition for Collaborative Data Analysis. Michele Stecca (ICSI & CIPI), Berkeley, January 15th, 2015.

Transcript of SpreadSheetSpace seminar at ICSI

Page 1: SpreadSheetSpace seminar at ICSI

Spreadsheet Composition for Collaborative Data Analysis

16/01/2015 1

Michele Stecca (ICSI & CIPI)

Berkeley, January 15th, 2015

Page 2: SpreadSheetSpace seminar at ICSI

Service & Smart Objects Composition at CIPI (1/6)

Research activities on “Composition” at CIPI, Research Center on Software Platform Engineering (University of Genoa & Padua, Italy):

• Service Composition (a.k.a. Mashups)

• Smart Objects Composition (IoT)

• Spreadsheet Composition

Page 3: SpreadSheetSpace seminar at ICSI

The Origin: Service Composition

• The Base Services are provided by software components that expose a software interface (usually a Web API) over the Internet.

• A Composite Service is a distributed application specified and executed through the coordinated action of a set of Base Services.

• A Composite Service is usually specified in a graphical form through a Service Creation Platform. The Composite Service specification is deployed in a Service Execution Platform (SEP) for execution.
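A minimal sketch of the idea above, assuming a Composite Service can be modelled as an ordered sequence of Base Service calls that share a context. The services `geocode` and `weather` and their return values are hypothetical stand-ins for remote Web APIs; a real SEP would invoke them over the network.

```python
from typing import Callable, Dict, List

# A Base Service is modelled as a function from a context dict to new fields.
BaseService = Callable[[Dict[str, object]], Dict[str, object]]


def geocode(ctx: Dict[str, object]) -> Dict[str, object]:
    # Hypothetical Base Service: a real one would call a remote Web API.
    return {"lat": 37.87, "lon": -122.27}


def weather(ctx: Dict[str, object]) -> Dict[str, object]:
    # Hypothetical Base Service that depends on the output of geocode().
    return {"forecast": f"sunny at ({ctx['lat']}, {ctx['lon']})"}


def run_composite(steps: List[BaseService],
                  request: Dict[str, object]) -> Dict[str, object]:
    """Execute the Base Services in order, merging each result into the context."""
    ctx = dict(request)
    for step in steps:
        ctx.update(step(ctx))
    return ctx


result = run_composite([geocode, weather], {"city": "Berkeley"})
print(result["forecast"])
```

The list passed to `run_composite` plays the role of the Composite Service specification; the function itself plays the role of a very small execution platform.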

Service & Smart Objects Composition at CIPI (2/6)

Page 4: SpreadSheetSpace seminar at ICSI

Service & Smart Objects Composition at CIPI (3/6)

SEP technologies: BPEL (Business Process Execution Language), Web Services, JMS (Java Message Service)

Page 5: SpreadSheetSpace seminar at ICSI

Evolution #1: Smart Objects Composition (IoT)

• The Base Services are provided by software components and smart objects that expose an interface.

• A Composite Service is a distributed application specified and executed through the coordinated action of a set of Base Services.

• A Composite Service is usually specified in a graphical form through a Service Creation Platform. The Composite Service specification is deployed in a Service Execution Platform (SEP) for execution.

The combination of physical objects poses new challenges because physical objects can change their position, may disappear (e.g., they run out of battery), etc.
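The following sketch illustrates one way a composition might tolerate smart objects that disappear. The `ParkingSensor` class, its fields, and the rule "empty battery means no reading" are assumptions made for illustration only, not part of the platform described in the talk.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ParkingSensor:
    """Hypothetical smart object: reports whether a parking spot is free."""
    spot: str
    free: bool
    battery: float  # remaining charge in the range 0.0-1.0

    def read(self) -> Optional[bool]:
        # A sensor with an empty battery has "disappeared": no reading.
        return self.free if self.battery > 0.0 else None


def free_spots(sensors: List[ParkingSensor]) -> List[str]:
    """Return the spots reported free, ignoring sensors that do not answer."""
    return [s.spot for s in sensors if s.read() is True]


sensors = [
    ParkingSensor("A1", free=True, battery=0.8),
    ParkingSensor("A2", free=True, battery=0.0),   # out of battery
    ParkingSensor("A3", free=False, battery=0.5),
]
print(free_spots(sensors))
```

The composite logic never assumes that every Base Service answers; a missing reading is treated as "unknown" rather than as an error.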

Service & Smart Objects Composition at CIPI (4/6)

Page 6: SpreadSheetSpace seminar at ICSI

Service & Smart Objects Composition at CIPI (5/6)

Example: Smart Parking Composite Service (in collaboration with FIAT Research Center)

SEP technologies: Asynchronous I/O, MQTT (MQ Telemetry Transport)

Page 7: SpreadSheetSpace seminar at ICSI

Evolution #2: Spreadsheet Composition

Questions

• What is a Base Service?

• What is a Composite Service?

• How can we «execute» a Composite Service?

Service & Smart Objects Composition at CIPI (6/6)

Page 8: SpreadSheetSpace seminar at ICSI

Outline

1. Introduction

2. Spreadsheet Composition

3. The SpreadSheet Space Software Platform

4. Information System Access from the SpreadSheet Space

5. Discussion

6. Summary

Page 9: SpreadSheetSpace seminar at ICSI

Introduction

Why Spreadsheet Composition?

• Data manipulation/visualization/sharing is becoming more and more important (Open Data, Big Data, Data Scientists, etc.)

• Collaborative work is becoming more and more important

• Microsoft Excel is a widely used tool for data analysis (1.1B users according to Microsoft)

• Microsoft Excel is not the best tool to use in a collaborative environment

New paradigms and tools are needed to face new challenges like real-time reporting, easy data publishing, collaborative data analysis, etc.

Page 11: SpreadSheetSpace seminar at ICSI

Spreadsheet Composition (1/7)

Base Services: Excel spreadsheets

We need to define the «basic functionalities» provided by a spreadsheet:

• Information Publication. A Source User publishes a data range, called a View, to a set of Target Users. (Watch Video: https://www.youtube.com/watch?v=hM5bsdgF4Mc)

• Information Collection. A User provides a data range, called a Form, to one or more Users to have them fill it out, update it and submit it. (Watch Video: https://www.youtube.com/watch?v=41IV2pNJqy0)

• In both cases, at every update of a Worksheet/Cell-Range/Table in the Source User Spreadsheet, the Target User Spreadsheets are automatically synchronized and the personalized analyses and presentations change accordingly. This is very important for real-time data analysis in Excel.

Page 12: SpreadSheetSpace seminar at ICSI

Base Services: Excel spreadsheets

Information Publication steps:

• A Source User Exposes a View on a Spreadsheet to a set of Target Users, i.e., grants the Target Users read access rights on that View;

• The Target Users Link their Spreadsheets to that View.

Information Collection steps:

• A Source User Prepares a Form on a Spreadsheet to be filled out by a set of Target Users;

• Each Target User Installs the Form in their Spreadsheet;

• The Source User receives the updates from the Target Users.
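The steps above can be sketched as a small in-memory model. The class and method names (`View`, `Form`, `link`, `update`, `submit`) are hypothetical and chosen only to mirror the Expose/Link and Prepare/Install/Submit vocabulary; they are not the SpreadSheetSpace API.

```python
from typing import Dict, List, Set

Cells = Dict[str, object]  # cell address -> value, e.g. {"A1": 10}


class View:
    """A data range exposed read-only by a Source User."""

    def __init__(self, owner: str, cells: Cells, readers: Set[str]) -> None:
        self.owner = owner
        self.cells = dict(cells)
        self.readers = set(readers)          # Target Users granted read access
        self.linked: Dict[str, Cells] = {}   # Target User -> local image

    def link(self, user: str) -> Cells:
        # Only users the View was exposed to may link to it.
        if user not in self.readers:
            raise PermissionError(f"{user} has no read access")
        self.linked[user] = dict(self.cells)
        return self.linked[user]

    def update(self, changes: Cells) -> None:
        # Every update by the Source User is propagated to all linked images.
        self.cells.update(changes)
        for image in self.linked.values():
            image.update(changes)


class Form:
    """A data range that Target Users fill out and submit back."""

    def __init__(self, owner: str, fields: List[str]) -> None:
        self.owner = owner
        self.fields = fields
        self.submissions: Dict[str, Cells] = {}

    def submit(self, user: str, values: Cells) -> None:
        # Keep only the prepared fields; a resubmission overwrites the old one.
        self.submissions[user] = {f: values.get(f) for f in self.fields}


view = View("alice", {"A1": 10}, readers={"bob"})
bob_image = view.link("bob")
view.update({"A1": 42})

form = Form("alice", ["B1"])
form.submit("bob", {"B1": "done", "Z9": "ignored"})
```

After `view.update`, `bob_image` already holds the new value without any action by the Target User, which is the synchronization behaviour the slides describe.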

Spreadsheet Composition (2/7)

Page 13: SpreadSheetSpace seminar at ICSI

A New Paradigm: the SpreadSheet Space

[Figure: Hypertext + Internet = World Wide Web; Spreadsheet + Internet = SpreadSheet Space]

Spreadsheet Composition (3/7)

Page 14: SpreadSheetSpace seminar at ICSI

A New Paradigm: the SpreadSheet Space

Spreadsheet Composition (4/7)

Page 15: SpreadSheetSpace seminar at ICSI

Composite Services: the Distributed Spreadsheet

• Definition
  • Associated with a single Virtual Spreadsheet consisting of elements belonging to different users;
  • Evolving over time;
  • Based on cross-Spreadsheet links.

• 2 types of Spreadsheet Compositions:
  • Spontaneous compositions: the result of peer-to-peer interactions among Excel users;
  • Graphical compositions: a Spreadsheet Composition creator (e.g., an Excel consultant) defines the set of relationships among spreadsheets through a graphical tool.
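One way to picture a Distributed Spreadsheet is as a directed graph whose nodes are spreadsheets and whose edges are cross-spreadsheet links. The sketch below, with made-up file names, computes which spreadsheets are affected when one source changes; it is an illustration of the definition, not the platform's algorithm.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

# Each link is (source spreadsheet, target spreadsheet); names are made up.
Link = Tuple[str, str]


def downstream(links: List[Link], start: str) -> Set[str]:
    """Return every spreadsheet reachable from `start` by following links."""
    targets: Dict[str, List[str]] = defaultdict(list)
    for src, dst in links:
        targets[src].append(dst)

    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in targets[node]:
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                queue.append(nxt)
    return seen


links = [("sales.xlsx", "region.xlsx"), ("region.xlsx", "board.xlsx"),
         ("hr.xlsx", "board.xlsx")]
print(sorted(downstream(links, "sales.xlsx")))
```

Because the composition evolves over time, appending a new pair to `links` is enough to change the result of the next call.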

Spreadsheet Composition (5/7)

Page 16: SpreadSheetSpace seminar at ICSI

Spontaneous Composition: the composition is a result of peer-to-peer interactions

Spreadsheet Composition (6/7)

Page 17: SpreadSheetSpace seminar at ICSI

The Graphical Tool used by the Composite Service Creator

Spreadsheet Composition (7/7)

- The composition is defined BEFORE the interaction among users

- The relationships/links are created automatically by the platform

Page 19: SpreadSheetSpace seminar at ICSI

The SpreadSheet Space Software Platform (1/6)

• How can the links between two Spreadsheets be implemented if it cannot be guaranteed that the Source Spreadsheet and the Target Spreadsheet are simultaneously open?

• Persistence must be provided by a Server

-> The SpreadSheet Space requires a Software Platform
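A minimal sketch of why a server solves the problem above: the Source publishes to a store that outlives both Excel sessions, and the Target catches up whenever it opens. The `PersistenceServer` class, its methods, and the integer versioning scheme are assumptions for illustration.

```python
from typing import Dict, Optional, Tuple


class PersistenceServer:
    """Keeps the latest published content of each link, with a version number."""

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[int, dict]] = {}

    def publish(self, link_id: str, cells: dict) -> int:
        version = self._store.get(link_id, (0, {}))[0] + 1
        self._store[link_id] = (version, dict(cells))
        return version

    def fetch(self, link_id: str,
              known_version: int = 0) -> Optional[Tuple[int, dict]]:
        # Return (version, cells) only if newer than what the caller has.
        entry = self._store.get(link_id)
        if entry is None or entry[0] <= known_version:
            return None
        return entry


server = PersistenceServer()
# The Source Spreadsheet publishes twice while the Target is closed.
server.publish("view-1", {"A1": 1})
server.publish("view-1", {"A1": 2})
# Later the Target Spreadsheet opens and catches up with one fetch.
latest = server.fetch("view-1")
```

The two spreadsheets never talk to each other directly, so neither needs to be open when the other one is.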

[Diagram labels: Spreadsheet Aligner; Persistency Service]

Page 20: SpreadSheetSpace seminar at ICSI

High Level Architecture

The SpreadSheet Space Software Platform (2/6)

[Architecture diagram. Labels: Composition Plane; Graphical Tool; Spontaneous Interactions; Composite Spreadsheet Repository; Control Plane; Link Controller; Form/Link Repository; Data Plane; Range/Table/Sheet Synchronizer]

Page 21: SpreadSheetSpace seminar at ICSI

Data Plane, i.e., how to synchronize the Spreadsheets

• Persistence is provided by a Cloud Platform

• The Add-in communicates with the Server through Web APIs

• There is a publish/subscribe mechanism based on HTTP long polling for automatic updates
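The long-polling idea can be sketched in-process, with a blocking `poll` call standing in for an HTTP request that the server holds open until new data is available. The `LongPollChannel` class is hypothetical, and the real Add-in/Server exchange over Web APIs is not shown.

```python
import threading
from typing import Optional, Tuple


class LongPollChannel:
    """One published view; subscribers block until a newer version exists."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._version = 0
        self._data: Optional[dict] = None

    def publish(self, data: dict) -> None:
        with self._cond:
            self._version += 1
            self._data = dict(data)
            self._cond.notify_all()          # wake every waiting subscriber

    def poll(self, known_version: int,
             timeout: float) -> Optional[Tuple[int, dict]]:
        # Stands in for an HTTP request that the server holds open until
        # there is news or the timeout expires (then the client re-polls).
        with self._cond:
            fresh = self._cond.wait_for(
                lambda: self._version > known_version, timeout=timeout)
            return (self._version, dict(self._data)) if fresh else None


channel = LongPollChannel()
received = []


def subscriber() -> None:
    received.append(channel.poll(known_version=0, timeout=5.0))


t = threading.Thread(target=subscriber)
t.start()
channel.publish({"A1": 42})
t.join()
```

The subscriber receives the update as soon as it is published instead of asking repeatedly; a `None` result simply means "nothing new yet, poll again".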

The SpreadSheet Space Software Platform (3/6)

Page 22: SpreadSheetSpace seminar at ICSI

Control Plane, i.e., how to establish Links among Spreadsheets

(A) Spontaneous Compositions (a.k.a. User Explicit Configuration)

• Information Publication. The Source User publishes Spreadsheet ranges/tables/sheets to a set of Target Users.

• Information Collection. A User provides a form to one or more Users to have them fill it out, update it and submit it.

The SpreadSheet Space Software Platform (4/6)

Page 23: SpreadSheetSpace seminar at ICSI

Control Plane, i.e., how to establish Links among Spreadsheets

(B) Through a Graphical Composite Spreadsheet Creation Environment

ROLES:

• Composite Service Creator
  • Creates a Composite Spreadsheet by specifying
    • The Users (A, B, …) involved
    • The Views/Forms exposed

• Users
  • Enter the Composite Spreadsheet Environment
  • Create the appropriate Views and Forms

The SpreadSheet Space Software Platform (5/6)

Page 24: SpreadSheetSpace seminar at ICSI

Implementation insights

Microsoft Excel Add-in

• Developed as an Office Add-in Framework component (C#)

• Downloaded and installed on the user's terminal

• Fully integrated with Excel

SpreadSheetSpace Server

• REST Web Services technology

• Apache Tomcat

• Deployed in-the-cloud or on-premises

• Scalable and Elastic architecture

The SpreadSheet Space Software Platform (6/6)

Page 26: SpreadSheetSpace seminar at ICSI

Information Systems: a special type of Base Service

Information System access through Spreadsheets

• Companies and organizations
  • Expose Views of company data in the form of Worksheets

• Spreadsheet users
  • Link spreadsheets to exposed views

Page 27: SpreadSheetSpace seminar at ICSI

Information System Integration at the Desktop

Users mash up data exposed by different sources and keep the combined analyses/presentations synchronized with the corporate data.

[Diagram label: SpreadSheet Space]

Page 28: SpreadSheetSpace seminar at ICSI

The SpreadSheet Space Platform for Information System Access

[Architecture diagram. Labels: SpreadSheetSpace Server; Firewall/Proxy Aware Network; SSS Add-in (two instances); Public APIs; Adaptor (two instances); Information System 1; Information System 2]

Page 30: SpreadSheetSpace seminar at ICSI

Google Sheets vs. SpreadSheetSpace

• Google Sheets is about sharing

• It is not Excel! (Limited functionalities and compatibility problems)

• Symmetry - The users that share a Spreadsheet have the same access rights. They can read it and write it freely.

• Spreadsheet level granularity – Sharing applies to whole Spreadsheets and not to parts of them. Either a Spreadsheet is shared or it is not.

• SpreadSheetSpace is about linking

• It’s Excel!

• Asymmetry – The user roles are complementary. By exposing a View a source user grants the target users read access rights on it. By linking to a View the target users create an image of it in their Spreadsheets.

• Cell level granularity – Users are allowed to expose worksheets, cell ranges and tables while maintaining the rest of the Spreadsheet private.
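Cell level granularity can be illustrated with a small helper that copies only the cells inside an exposed range, leaving everything else private. The function names and the simplified A1-style parsing (single-letter columns only) are assumptions made for this sketch.

```python
from typing import Dict, Tuple

Sheet = Dict[str, object]  # cell address such as "B2" -> value


def split_address(addr: str) -> Tuple[str, int]:
    """Split "B12" into ("B", 12); only single-letter columns are handled."""
    return addr[0], int(addr[1:])


def expose_range(sheet: Sheet, cell_range: str) -> Sheet:
    """Return a copy of the cells inside `cell_range` (e.g. "A1:B2") only."""
    first, last = cell_range.split(":")
    (c1, r1), (c2, r2) = split_address(first), split_address(last)
    return {
        addr: value
        for addr, value in sheet.items()
        if c1 <= split_address(addr)[0] <= c2
        and r1 <= split_address(addr)[1] <= r2
    }


sheet = {"A1": "Region", "B1": "Sales", "A2": "West", "B2": 120,
         "D7": "private note"}
view = expose_range(sheet, "A1:B2")
print(view)
```

The Target Users only ever see `view`; the cell `D7` stays in the Source User's Spreadsheet.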

Page 31: SpreadSheetSpace seminar at ICSI

Dynamic Data

• SpreadSheet Space focuses on Dynamic Data, i.e. on data that evolve over time.

• Open Data is one specific case of Dynamic Data.

• With Dynamic Data, the «Export to Excel» functionality offered by most Information Systems is meaningless: saving a view provided by an Information System in Excel format only takes a snapshot of the Information System at the time of saving.

• The Link functionality offered in the SpreadSheet Space enriches the Excel analysis tools by guaranteeing synchronization between the Excel views and the actual state of the Information System.

Page 32: SpreadSheetSpace seminar at ICSI

Scalable Information System Access (1/2)

• Excel can access external Information Systems through built-in query functionalities.

• With native Excel queries, the evolution of dynamic data can be captured only through polling, which places a heavy load on the Information Systems.

• SpreadSheet Space provides a Publish/Subscribe service which eliminates polling of the Information Systems.

• The load needed to support the interaction is transferred from the Information Systems to the SpreadSheet Space Platform.
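A back-of-the-envelope comparison makes the load argument concrete. The figures below (500 clients, a 10-second polling interval, 12 data changes per hour) are purely illustrative assumptions, as is the simplification that publish/subscribe costs the Information System one read per change.

```python
def polling_queries(clients: int, interval_s: float, duration_s: float) -> int:
    """Queries hitting the Information System when every client polls it."""
    return int(clients * (duration_s // interval_s))


def pubsub_queries(changes: int) -> int:
    """With publish/subscribe, assume one read of the source per data change;
    fan-out to the clients is then handled by the platform."""
    return changes


hour = 3600
print(polling_queries(clients=500, interval_s=10, duration_s=hour))  # 180000
print(pubsub_queries(changes=12))                                    # 12
```

Under these assumptions the load on the Information System no longer grows with the number of clients; the fan-out cost moves to the SpreadSheet Space Platform.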

Page 33: SpreadSheetSpace seminar at ICSI

Scalable Information System Access (2/2)

[Figure comparing two setups. Native Excel Functionality: three clients, each labelled SELECT * FROM Table1. With SpreadSheetSpace: one SELECT * FROM Table1 and three clients labelled View 1.]

Page 34: SpreadSheetSpace seminar at ICSI

Spreadsheet Ecosystems

Combining Information System Access and direct Excel links

Some Users may expose personalized views of corporate data to other end users.

[Diagram labels: SpreadSheet Space; SYNC + DSS]

Page 35: SpreadSheetSpace seminar at ICSI

Manual vs. Automatic Spreadsheet Update

• Manual Update
  • The Target Users
    • are requested to confirm acceptance of View updates, and
    • can scan the update history.

• Automatic Update
  • All the target users are “in sync” with the exposed Views
  • Data Integrity (no different data versions) is guaranteed.
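The two update policies can be sketched on the Target side as follows. The `LinkedView` class and its `pending`/`history` lists are hypothetical names used only to illustrate the difference between confirming updates and applying them immediately.

```python
from typing import Dict, List


class LinkedView:
    """Target-side image of a View with a manual or automatic update policy."""

    def __init__(self, automatic: bool) -> None:
        self.automatic = automatic
        self.cells: Dict[str, object] = {}
        self.pending: List[Dict[str, object]] = []   # awaiting confirmation
        self.history: List[Dict[str, object]] = []   # every update received

    def receive(self, update: Dict[str, object]) -> None:
        self.history.append(dict(update))
        if self.automatic:
            self.cells.update(update)      # always in sync with the source
        else:
            self.pending.append(dict(update))

    def accept_pending(self) -> None:
        # Manual mode: the Target User confirms the queued updates in order.
        for update in self.pending:
            self.cells.update(update)
        self.pending.clear()


auto, manual = LinkedView(automatic=True), LinkedView(automatic=False)
for target in (auto, manual):
    target.receive({"A1": 1})
    target.receive({"A1": 2})
```

At this point `auto.cells` already reflects the latest value, while `manual.cells` stays unchanged until `accept_pending()` is called; both keep the full update history.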

Page 36: SpreadSheetSpace seminar at ICSI

Easy Publication of Tabular Contents and of Graphical Presentations

• Very important for Open Data

• Although Excel already offers functionality to publish data, using it requires a certain degree of publishing experience.

• The SpreadSheet Space Platform turns out to be an easy-to-use Content Management System for tabular data and graphical presentations.

• A TabularData/Presentation repository enables the development and the diffusion of Data Marketplaces in the SpreadSheet Space.

Page 38: SpreadSheetSpace seminar at ICSI

Summary

• The SpreadSheet Space is a space in which Excel files connected to each other and/or to external Information Systems can live.

• Spreadsheet Composition is a special case of Service Composition.

• Spreadsheet Composition was developed in two directions, namely Excel to Excel interconnection and Excel to Information System interconnection.

• Special features: Composite Spreadsheets, Linking vs Sharing, Dynamic Data, Ease of Publication.

Page 39: SpreadSheetSpace seminar at ICSI

Email: [email protected]

Twitter: @steccami

We are looking for early adopters

www.spreadsheetspace.net
