Research BYTE - Data Virtualization



IT Research BYTE 

https://comm.sp.ford.com/sites/researchcomm/site/ ResearchBriefs 

Originator: ET Research/TBayne
Record Series: 02.02
Ford Motor Company Proprietary
Retention Period: S, O
Record Type: Official
Printed Copies are Uncontrolled

Data Virtualization: Don’t Integrate Data Without It June 30, 2011

 _____________________________________________________ 

“While it's true that traditional approaches such as enterprise data warehousing (EDW) and data quality continue to evolve as requirements change, there is also a growing recognition among today's best DW thinkers that the EDW is insufficient as the sole focal point for all data integration and data quality. More IT groups are deploying data virtualization to complement their EDW investments. That's because data virtualization delivers the flexibility and agility that the traditional approaches were never designed to do.”
Robert Eve, Composite Software

 _____________________________________________________ 

Introduction

Data is the lifeblood of Information Technology (IT), and we are regularly challenged with how to access relevant data. Data access scenarios frequently involve the need to obtain data from more than one source. As a result, a data access challenge typically involves a data integration challenge as well.

Data integration involves combining data residing in different sources in order to provide users, applications, and other consumers with a unified view of this data. Data integration can be accomplished by different means and with different types of tools. To help structure analysis and comparison of data integration approaches, Gartner has defined four critical data delivery styles:

Bulk data movement
The use of Extract, Transform and Load (ETL) to physically consolidate data from master/source databases and formats into repositories such as data marts or data warehouses.

Federated views (a.k.a. data virtualization)
Query execution against multiple data sources to provide virtual, integrated views of data in memory (rather than physically moving the data and persisting the integrated view in a data store). The results can be provided in multiple forms (e.g., relational row set, XML or Web services interface).

Message-oriented movement (a.k.a. enterprise application integration)
Data is packaged as messages that various applications can consume so as to exchange data in real time.

Data replication and synchronization
Data is synchronized between two or more database management systems (DBMSs) and schemas.
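To make the federated-view style concrete, the following is a minimal sketch rather than a depiction of any vendor product. It assumes two SQLite databases stand in for separate source systems; the table names, columns, and part data are hypothetical. The view joins the two sources at query time, so nothing is copied into a new data store.

```python
import sqlite3


def build_federated_view() -> sqlite3.Connection:
    """Join two separate databases through a view, without copying any rows."""
    # "main" plays the role of one source system (hypothetical part master).
    conn = sqlite3.connect(":memory:")
    # A second, independent in-memory database plays the role of another source.
    conn.execute("ATTACH DATABASE ':memory:' AS costs")

    conn.execute("CREATE TABLE main.parts (part_no TEXT PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE costs.part_costs (part_no TEXT, unit_cost REAL)")
    conn.executemany(
        "INSERT INTO main.parts VALUES (?, ?)",
        [("P-100", "Bracket"), ("P-200", "Hinge")],
    )
    conn.executemany(
        "INSERT INTO costs.part_costs VALUES (?, ?)",
        [("P-100", 1.25), ("P-200", 3.5)],
    )

    # The view stores only the query definition; rows are fetched from both
    # sources each time the view is read. SQLite requires a TEMP view here
    # because the definition spans more than one attached database.
    conn.execute(
        """
        CREATE TEMP VIEW part_with_cost AS
        SELECT p.part_no, p.name, c.unit_cost
        FROM main.parts AS p
        JOIN costs.part_costs AS c ON c.part_no = p.part_no
        """
    )
    return conn


conn = build_federated_view()
for row in conn.execute("SELECT * FROM part_with_cost ORDER BY part_no"):
    print(row)
```

Because the view is evaluated on demand, a change made in either source is visible the next time the view is queried, which is the main contrast with bulk data movement.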

While certain styles are more prevalent than others, each style has its place and contribution in an efficient and effective data integration toolbox. This paper focuses on “federated views,” a data integration delivery style that Ford does not currently have in its toolbox. Robert Eve of Composite Software, a vendor focused on data virtualization, provides the following summary of the what and why of this approach: “Data virtualization integrates data from multiple, disparate sources in an abstracted, logically consolidated manner for

Topics

Information Layer
Interoperability
Adapters & Toolkits
Application Tech Layer
Business Info Services

By Todd Bayne

"Data virtualization is an agile data integration delivery style whose time has come." 

Glossary

Data integration – the process of combining data from a heterogeneous set of data stores to create one unified view of that data. (Rick van der Lans)

Data federation* – a form of data virtualization where the data stored in a heterogeneous set of autonomous data stores is made accessible to data consumers as one integrated data store by using on-demand data integration. (Rick van der Lans)

Data virtualization* – the process of offering data consumers a data access interface that hides the technical aspects of stored data, such as location, storage structure, API, access language, and storage technology. (Rick van der Lans)

* Some view the terms data virtualization, data federation, and enterprise information integration (EII) as basic equivalents, whereas others, such as Rick van der Lans, attempt to draw distinctions.


consumption by nearly any front-end business solution, including business intelligence and analytics. By accessing the data from original or already consolidated data warehouse sources, data virtualization avoids the need for additional physical consolidation and replicated storage, making it faster to build integration approaches.”

Analysis

At its most basic level, data virtualization comprises:

1. Execution of distributed queries against multiple data sources

2. Federation of query results into virtual, integrated views

3. Consumption of those views by applications, users with query and reporting tools, or other infrastructure components.
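The three steps above can be sketched in a few lines. This is an illustrative outline only, assuming one relational source (an in-memory SQLite table) and one XML source; the function names, schema, and sample data are all hypothetical and do not reflect any particular product.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML source describing part release status.
RELEASES_XML = """
<releases>
  <release part_no="P-100" status="released"/>
  <release part_no="P-200" status="pending"/>
</releases>
"""


def make_relational_source() -> sqlite3.Connection:
    """Build a small relational source (hypothetical part master)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE parts (part_no TEXT PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO parts VALUES (?, ?)",
        [("P-100", "Bracket"), ("P-200", "Hinge"), ("P-300", "Latch")],
    )
    return conn


def query_relational(conn: sqlite3.Connection) -> list:
    """Step 1a: run the part of the query that the relational source answers."""
    return conn.execute("SELECT part_no, name FROM parts ORDER BY part_no").fetchall()


def query_xml(xml_text: str) -> dict:
    """Step 1b: run the part of the query that the XML source answers."""
    root = ET.fromstring(xml_text.strip())
    return {r.get("part_no"): r.get("status") for r in root.iter("release")}


def federate(parts: list, statuses: dict) -> list:
    """Step 2: combine both result sets into one integrated view, in memory."""
    return [
        {"part_no": part_no, "name": name, "status": statuses.get(part_no, "unknown")}
        for part_no, name in parts
    ]


# Step 3: a consumer (here, a simple report loop) reads the integrated view.
view = federate(query_relational(make_relational_source()), query_xml(RELEASES_XML))
for row in view:
    print(row)
```

A commercial data virtualization engine would add a query optimizer and push work down to the sources; the sketch only shows the shape of the flow.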

Because virtualization introduces a layer of abstraction above the physical implementation of data, it provides a variety of benefits. Among them is shielding consumers from the complexities of, and changes at, the data source layer.

Data federation has been around in less robust forms for a couple of decades (e.g., distributed queries, materialized views). More recently, advances in the design, maintenance functions and performance of data federation have led Philip Russom of TDWI to declare that “data federation is finally ensconced as a DI [data integration] technique” (TDWI Best Practices Report: Next Generation Data Integration. Second Quarter 2011).

As data federation has matured, it has been able to assemble a collection of compelling capabilities including:

Access to and transformation of non-relational data sources, such as XML documents, sequential files, multidimensional OLAP cubes, SOAP-based services, and Java components.

Seamless integration of non-relational data with relational data.

A powerful abstraction layer to shield changes at the data source layer.

A distributed query optimizer and execution engine which calculates an efficient way to join remote data sets and perform the needed transformations.

Results which can be provided as relational data, XML, or as web services.

An ability to support updates and manage the logic to coordinate changes (including rollback if necessary) across source systems.
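The last capability, coordinating updates with rollback across source systems, can be illustrated with a simplified sketch. It assumes two independent SQLite databases as the source systems, and the `coordinated_update` helper, schemas, and data are hypothetical. It is an all-or-nothing pattern for illustration, not a full two-phase commit protocol as a production engine might use.

```python
import sqlite3


def open_source(setup_sql: list) -> sqlite3.Connection:
    """Create an in-memory database standing in for one source system."""
    conn = sqlite3.connect(":memory:")
    for statement in setup_sql:
        conn.execute(statement)
    conn.commit()  # persist the starting data before any coordinated update
    return conn


def coordinated_update(work: list) -> bool:
    """Run every (connection, sql, params) update; commit all or roll back all."""
    touched = []
    try:
        for conn, sql, params in work:
            touched.append(conn)
            conn.execute(sql, params)
    except sqlite3.Error:
        # One source rejected its change, so undo the pending change everywhere.
        for conn in touched:
            conn.rollback()
        return False
    for conn in touched:
        conn.commit()
    return True


parts = open_source([
    "CREATE TABLE parts (part_no TEXT PRIMARY KEY, name TEXT)",
    "INSERT INTO parts VALUES ('P-100', 'Bracket')",
])
costs = open_source([
    "CREATE TABLE part_costs (part_no TEXT PRIMARY KEY, "
    "unit_cost REAL CHECK (unit_cost >= 0))",
    "INSERT INTO part_costs VALUES ('P-100', 1.25)",
])

# Both sources accept their change, so both are committed.
ok = coordinated_update([
    (parts, "UPDATE parts SET name = ? WHERE part_no = ?", ("Bracket v2", "P-100")),
    (costs, "UPDATE part_costs SET unit_cost = ? WHERE part_no = ?", (2.0, "P-100")),
])

# The negative cost violates the CHECK constraint, so the rename is undone too.
failed = coordinated_update([
    (parts, "UPDATE parts SET name = ? WHERE part_no = ?", ("Bracket v3", "P-100")),
    (costs, "UPDATE part_costs SET unit_cost = ? WHERE part_no = ?", (-1.0, "P-100")),
])
print(ok, failed)
```

The sketch still has a window in which one commit could succeed and a later one fail; closing that window is what a true distributed transaction protocol is for.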

Check out the 2-minute Data Virtualization “explainer” from Composite Software


While data virtualization has become more powerful, it is not a replacement for existing data integration approaches. Bulk data movement (i.e., ETL) is the predominant style of data integration across a spectrum of use cases, and it will likely remain so for the foreseeable future. Data virtualization should be viewed as a complementary data integration approach with its own sweet spot and strengths.

Data virtualization can be applied to a variety of use cases, such as:

Applications/Services
o Data Services layer in SOA
o 360-Degree Views (customers, products, suppliers)
o Operational reporting (real-time data)
o Data discovery and modeling

Data Warehousing (DW)
o Augment DW with additional data
o Prototype before persisting DW changes
o Federate DWs and/or data marts
o Augment ETL

Business Intelligence (BI)
o Provide analytical sandboxes
o Prototype BI solutions
o Accelerate deployment of limited lifetime solutions

In 2010, Enterprise Technology-Research partnered with the Bill of Materials Foundation (BOMF) team to perform a data virtualization proof-of-concept (PoC) that concentrated on providing a BOM data services layer. In our increasingly interconnected and interdependent data world, BOMF is a classic example of reliance upon data from systems of record outside of one’s direct control. Similar to numerous other Ford applications/programs, a production BOMF implementation would replicate data in the database tier to ensure availability and accessibility of data needed from such “external” sources. The BOMF PoC

The natural sweet spot for data virtualization occurs under the following conditions:

Consistently available source systems

Capacity in source systems to handle additional queries

Robust network, not prone to outages or delays

Moderate data volumes returned by the queries

Modest transformation & cleansing required for integrating the data

Desire for current data (i.e., real time, near real time), in a relational or XML format

Caching is the primary example of a mechanism used to compensate for a situation that lies outside data virtualization’s natural sweet spot.
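As a rough illustration of how caching compensates for a slow or intermittently available source, the sketch below wraps a source query in a time-to-live (TTL) cache. The `CachedView` class, the 60-second TTL, and the sample query are hypothetical choices, not features of a specific product. The trade-off is that consumers may see data up to one TTL old.

```python
import time
from typing import Any, Callable


class CachedView:
    """Serve a virtual view from memory until its time-to-live expires."""

    def __init__(self, fetch: Callable[[], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # function that queries the source systems
        self._ttl = ttl_seconds      # how long a cached result stays valid
        self._clock = clock          # injectable clock, useful for testing
        self._value: Any = None
        self._loaded_at: float = 0.0
        self._has_value = False

    def get(self) -> Any:
        now = self._clock()
        expired = self._has_value and (now - self._loaded_at) >= self._ttl
        if not self._has_value or expired:
            # Cache miss or stale entry: go back to the sources.
            self._value = self._fetch()
            self._loaded_at = now
            self._has_value = True
        return self._value


source_calls = []


def slow_source_query() -> list:
    """Stand-in for an expensive query against a remote source system."""
    source_calls.append(1)
    return [("P-100", 1.25)]


fake_now = [0.0]  # manually advanced clock so the example is deterministic
view = CachedView(slow_source_query, ttl_seconds=60, clock=lambda: fake_now[0])

view.get()           # first read queries the source
view.get()           # second read is served from the cache
fake_now[0] = 61.0   # move past the TTL
view.get()           # stale entry triggers a fresh source query
print(len(source_calls))
```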


took an alternative approach in which a commercial data virtualization product was used to:

Validate out-of-the-box capability for connectivity to and integration of numerous BOMF data sources (e.g., SOAP, XML over HTTP, Excel file, XML file, relational database containing XML CLOB)

Publish relational views and web services (using both relational and non-relational source systems)

Implement BOM Services retrieval and update use cases, as well as demonstrate Excel consumption of virtualized data

Ford’s experience during the BOMF PoC affirmed that a leading product focused on this data integration delivery style (data virtualization) offers a capable and attractive approach for accessing and integrating data from a variety of heterogeneous sources. Thus data virtualization shows promise as a technique to reduce or avoid the physical data replication that is prevalent today. Data virtualization also offers an easier, faster, and more flexible and reusable approach for integrating many types of data sources than writing application-specific “connectors,” another ugly reality of the approach often used by applications today.

Recommendations

Data virtualization is a powerful data integration delivery style whose time has come. Not only is it well suited to providing data services in a SOA context, but there are attractive use cases in the analytical domain. In particular, the role data virtualization could play in prototyping data warehouse changes and BI solutions, along with enabling self-service scenarios, warrants further consideration. The flexible, nimble nature of data virtualization, from both a development and deployment perspective, lines up nicely with Ford’s goal to incorporate Agile development principles and practices into the Application Development organization. In addition, the real-time nature of data virtualization aligns with the trend toward reduced data latency requirements and operational decision making scenarios.

The PoC with BOMF leveraged an integration toolset from a vendor exclusively focused on data virtualization. While the product appears to be a strong

CLOB – Character Large Object. A collection of character data, with a specified character encoding, in a database management system.

“We all live every day in virtual environments, defined by our ideas.”
Michael Crichton

[Figure: BOM Services via Data Virtualization. Labels recoverable from the diagram: Core BOM Database (BOMF); Part Master (MPNR); Part Releases (WERS); Part Costs (WIPS); Oracle; DB2; Mainframe; Team Works BPM; Other Consumers; etc. (tbd); Web Service; HTTP; UAPI; ODBC.]


contender, some form of tool evaluation (e.g., RFI, RFQ) would need to be conducted when Ford moves forward with data virtualization. Ford currently uses several of IBM’s InfoSphere tools, including DataStage for ETL (Gartner’s “bulk data movement” style of data delivery). Given Ford’s investment in the InfoSphere platform and the partnership with IBM, the InfoSphere Federation Server should be a data virtualization tool candidate in such an evaluation.

Enterprise Technology-Research has completed the E2 stage (Evaluate) of the E4 Innovation Process for data virtualization. At this point, a champion outside of the research organization is required in order to progress data virtualization within Ford. That champion might be a shared service, a program with broad data integration needs, or … it might be you! If you agree that data virtualization can provide significant value when added to your data integration toolbox, reach out and we can partner with you to take the next step in achieving value from data virtualization.

Where to Learn More

What is Data Virtualization? by Enterprise Technology – Research

TDWI: Business Intelligence & Data Warehousing Education & Research (Free registration is required to access some resources.)
o The Case for Data Virtualization (Revisited)
o Q&A: Why Data Virtualization is More Relevant than Ever
o How Data Federation Can Co-exist with an EDW
o Webcast. Data Federation: Expanding the Data Integration Toolbox
o 2011Q2 Best Practices Report: Next Generation Data Integration

“Critical Capabilities for Data Integration Tools: Common Data Delivery Styles.” 21 December 2010. ID G00208131. Gartner.

Clearly Defining Data Virtualization, Data Federation, and Data Integration. BeyeNetwork. (Free registration is required to access some resources.)

Bottom Line

Gartner has observed that enterprises increasingly acknowledge diverse data integration problems, and that these problems require equally diverse architectural styles for data delivery. There is a growing recognition that data virtualization should be a core capability/delivery style in an enterprise’s data integration toolbox. Ford’s experimentation with data virtualization supports this perspective, and the “agile” nature of data virtualization aligns nicely with broader IT objectives and business needs. Ford should add data virtualization to its data integration toolbox to increase agility and expand the breadth of integration scenarios that can be tackled effectively and efficiently.

Comments? Join the discussion at https://comm.sp.ford.com/sites/researchcomm/site/ ResearchBriefs

Reviewed By: Tony Bailey