End User Informatics

Informatics Ambareesh Kulkarni


Page 1: End User Informatics

Informatics

Ambareesh Kulkarni

Page 2: End User Informatics

Informatics defined

• Informatics is the application of technology to bring Data, People and Systems together

• Bioinformatics is a very complex representation of simple data

• Cheminformatics is a very simple representation of complex data

Page 3: End User Informatics

Current State

Page 4: End User Informatics

Problem Statement

“There's too much data and it's duplicated hundreds of times. The mistake companies make is that they start from the data they have. They need to ask what data do their users need and what are the questions they are asking. Understand the questions, how they can be answered and what kind of data is needed.”

Quote from the CIO of a major corporation

Page 5: End User Informatics

Integrated Solutions - Business Case: IDC White Paper

• Information Tasks
– Email – 14.5 hours a week
– Create documents – 13.3 hours a week
– Search – 9.5 hours a week
– Gather information for documents – 8.3 hours a week
– Find and organize documents – 6.8 hours a week

• Gartner: “Organizations spend an estimated $750 Billion annually seeking information necessary to do their job.”

Page 6: End User Informatics

Data Integration - Business Case: IDC White Paper

• Time Wasted (per year)
– Reformatting information – $57 million per 10,000 users
– Not finding information – $53 million per 10,000 users
– Recreating content – $45 million per 10,000 users
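To make the "per 10,000 users" figures easier to relate to, the short sketch below converts them to a cost per user per year. It uses only the three numbers quoted on the slide; adding the categories together assumes they do not overlap, which the slide does not state.

```python
# Annual cost figures quoted on the slide, in US dollars per 10,000 users.
COST_PER_10K_USERS = {
    "Reformatting information": 57_000_000,
    "Not finding information": 53_000_000,
    "Recreating content": 45_000_000,
}
USERS = 10_000


def cost_per_user(costs, users=USERS):
    """Return the annual cost per user for each category."""
    return {task: amount / users for task, amount in costs.items()}


per_user = cost_per_user(COST_PER_10K_USERS)

# Assumption: the three categories are independent, so they can be summed.
total_per_user = sum(per_user.values())

for task, amount in per_user.items():
    print(f"{task}: ${amount:,.0f} per user per year")
print(f"Total: ${total_per_user:,.0f} per user per year")
```

Under that assumption, the three categories together come to $15,500 per user per year.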

Page 7: End User Informatics

Data Integration - Business Case: General ROI Issues (IDC White Paper)

• Reduce development costs, cycle times
– Increase employee efficiency
– Less time looking, more time doing

• Enhance communication
– Capture and reuse knowledge
– Innovate better & faster

• Cost of not finding right information
– Business – lost money, opportunities

Page 8: End User Informatics

Key Takeaways

• Data Integration is not easy and represents ~80% of effort for a typical data integration project.

• Incompatible data are the largest, most expensive, and time-consuming portion of IT projects.

• Most data is in an unstructured format (Outlook, Word, PDF, images, etc.)

Page 9: End User Informatics

Evolution of data integration technologies

Page 10: End User Informatics

Evolution of Integration Architectures

Point to Point; Hub + Spoke; Hub + EII

Page 11: End User Informatics

Defining EII, EAI, ETL Data Integration

EII (Enterprise Information Integration) vs. EAI (Enterprise Application Integration)

• EII: Reports from multiple apps/data sources
– e.g. Real-time access to product silos for customers, employees

• EAI: Transactions to multiple apps
– e.g. Compound name change in one application propagated to other products

EII vs. ETL

• EII: Real-time
– Extract, Transform, Report in real-time
– e.g. report data from operational applications

• ETL: Batch
– Extract, Transform, Load; later report on data warehouse
– e.g. build duplicate reporting data mart and/or redesign data warehouse
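The EII idea of reporting in real time from several operational sources can be sketched as below. This is only an illustration, not how any particular product works: the two in-memory SQLite databases stand in for separate application silos, and the table names, column names, and sample rows are all hypothetical.

```python
import sqlite3


def make_source(schema, insert_sql, rows):
    """Create one in-memory database standing in for a separate application silo."""
    conn = sqlite3.connect(":memory:")
    conn.execute(schema)
    conn.executemany(insert_sql, rows)
    conn.commit()
    return conn


# Hypothetical silo 1: a compound registry.
registry = make_source(
    "CREATE TABLE compound (id INTEGER PRIMARY KEY, name TEXT)",
    "INSERT INTO compound VALUES (?, ?)",
    [(1, "Compound A"), (2, "Compound B")],
)

# Hypothetical silo 2: toxicology results kept by a different application.
toxicology = make_source(
    "CREATE TABLE tox_result (compound_id INTEGER, assay TEXT, outcome TEXT)",
    "INSERT INTO tox_result VALUES (?, ?, ?)",
    [(1, "Assay 1", "pass"), (2, "Assay 1", "pass"), (2, "Assay 2", "fail")],
)


def federated_report(registry_conn, tox_conn):
    """Query both sources at request time and join the rows in memory."""
    names = {
        cid: name
        for cid, name in registry_conn.execute("SELECT id, name FROM compound")
    }
    rows = tox_conn.execute(
        "SELECT compound_id, assay, outcome FROM tox_result "
        "ORDER BY compound_id, assay"
    )
    return [(names[cid], assay, outcome) for cid, assay, outcome in rows]


before = federated_report(registry, toxicology)

# A new result lands in the operational system; no batch load is needed.
toxicology.execute("INSERT INTO tox_result VALUES (1, 'Assay 2', 'pass')")
after = federated_report(registry, toxicology)

print(len(before), len(after))
```

Because the report reads the sources each time it runs, the second call already includes the new row. The trade-off is that every report puts load on the operational systems.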

Page 12: End User Informatics

Enterprise Application Requirements

Tools vs. Development Platform

Tools

Development Platform

Page 13: End User Informatics

What do end users really care about?

• The Internet has raised the bar for Informatics expectations

• Complex Query? Millions of Rows? Full table Scan?

• Users don’t really care. If they can view stock prices in real time, why not corporate data?

• In an ideal world, data analysis needs to be at the speed of thought.

• Bigger, better, faster, cheaper

Page 14: End User Informatics

Business users view

Data

Pipeline Pilot

Reports

Page 15: End User Informatics

IT perspective

Page 16: End User Informatics

Key Takeaways

Use Pipeline Pilot to:

• Provide an integrated view of data across multiple systems: flat files, data warehouses, data marts.

• Avoid “boiling the ocean”: jump-start data integration efforts with Pipeline Pilot (PP) to quickly meet an important user requirement, and then decide whether the data should be persisted in a data warehouse or data mart.

Page 17: End User Informatics

Action from Insight

Data is a New Form of Energy

Page 18: End User Informatics

Why is data integration so important?


• Data in any organization is distributed across various disconnected and disparate systems

• There is always a need to combine the most current data with historical values

• The success of the internet has created data sources outside the internal network

• Data has informational value only when combined with other, related data

Page 19: End User Informatics

WARNING SIGNS : Of Poor Data Integration

• Incomplete data foundation

• Inability to consolidate data from multiple sources

• No single version of the truth

• Poor audit trail and data lineage

• Historical values not retained in a data warehouse or data mart

• Lack of integrated 360-degree view

• High cost of maintaining “one-time” in-house code

• Inability to comply with regulatory requirements

Presentations or discussions that are prefaced with statements like “Most of our analysis would have been accurate, except for the missing data from….” or “Due to discovery of data not included in the last analysis, we are reversing our decision to……”

Page 20: End User Informatics

WARNING SIGNS : Of Poor Data Integration


As a result of an out-of-stock condition for a critical chemical, a scientist must expedite the order and pay a premium price. When the chemical arrives, the scientist (or worse, her boss) discovers that another division had an excess quantity of the same chemical and was looking to sell it at a discount.

Page 21: End User Informatics

WARNING SIGNS : Of Poor Data Integration


Scientists argue because their analysis results differ, even though the data came from the same operational data source.

Page 22: End User Informatics

WARNING SIGNS : Of Poor Data Integration


A technician alerts his management team of scientists to a potential problem discovered while running a query against a database. The technician cannot, however, answer the follow-up question, “How long has the problem existed?”

Page 23: End User Informatics

WARNING SIGNS : Of Poor Data Integration


A scientist runs a report every week against a LIMS; however, to see a period-to-period comparison, the scientist maintains a spreadsheet, adding a new column every week and entering the data manually.

Page 24: End User Informatics

WARNING SIGNS : Of Poor Data Integration


A customer calls tech support to enquire about a pending case. While the customer support engineer has access to the case details, the engineer has no information on whether the customer is current on maintenance, how many end users they are licensed for, or what options the customer has purchased.

Page 25: End User Informatics

WARNING SIGNS : Of Poor Data Integration


Minor change requests take weeks to implement; any modification has to be thoroughly tested for accuracy and integrity.

Page 26: End User Informatics

WARNING SIGNS : Of Poor Data Integration


The CEO and CFO are uncomfortable signing off on the quarterly numbers because there is no way to trace the numbers back to the source systems.

Page 27: End User Informatics

Case Study (closer to home): Services Order Report

• Poor data quality

• Redundant information

• Duplicate entries

• Hard to read

• Huge amount of time required to clean it up

Page 28: End User Informatics

Information-sensitivity

Information Quality is a Direct Function of……

• Data Availability and Accessibility

• Data Quality
– DQ = Completeness × Validity
– E.g. Measure of Completeness = # of null values in a column
– E.g. Measure of Validity = “We have 4 regions, but there are 18 distinct values in the region column”
– Pitfall: Don’t take accountability for DQ on the source system
– Push accountability where it belongs, in the source system(s)

• Timeliness of Data, relevant to the questions being asked by the user

• SQL and programming accuracy
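The DQ = Completeness × Validity idea can be sketched in a few lines. The slide does not say how each factor is scaled, so this sketch assumes both are fractions between 0 and 1: completeness is the share of non-null values, and validity is the share of non-null values found in an agreed list. The function names and the sample region column are hypothetical.

```python
def is_null(value):
    """Treat None and empty strings as null (an assumption for this sketch)."""
    return value is None or value == ""


def completeness(values):
    """Fraction of values in the column that are not null."""
    if not values:
        return 0.0
    return sum(1 for v in values if not is_null(v)) / len(values)


def validity(values, allowed):
    """Fraction of non-null values that appear in the allowed set."""
    present = [v for v in values if not is_null(v)]
    if not present:
        return 0.0
    return sum(1 for v in present if v in allowed) / len(present)


def data_quality(values, allowed):
    """DQ = Completeness x Validity, both expressed as fractions."""
    return completeness(values) * validity(values, allowed)


# Hypothetical region column: four agreed regions, but messy entries.
ALLOWED_REGIONS = {"North", "South", "East", "West"}
region_column = ["North", "South", "East", "West", "north", "N/A", None, "West"]

distinct_values = {v for v in region_column if not is_null(v)}
print(f"Allowed regions: {len(ALLOWED_REGIONS)}, distinct values found: {len(distinct_values)}")
print(f"Completeness: {completeness(region_column):.3f}")
print(f"Validity: {validity(region_column, ALLOWED_REGIONS):.3f}")
print(f"DQ: {data_quality(region_column, ALLOWED_REGIONS):.3f}")
```

Comparing the count of distinct values with the count of allowed values is the same check as the "4 regions, 18 distinct values" example on the slide. A gap between the two points to a problem that should be fixed in the source system.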

Page 29: End User Informatics

Case Study (closer to home): Internal Revenue Forecasting process

Orders QTD | Pipeline | Delivered Forecast

Run the Services Products and Orders report in RSVPP ……; export the results, filter for product services (Column AM), and sum the Total Sale Price USD column

Run the Services Opportunities report in SFDC; export out the result……

Assuming Access is up to date………..; export to Excel; filter by product services and sum USD Amount column

Assuming Access is up to date, run the Total Forecast report; export to Access; …………

Page 30: End User Informatics

Near real-time data access

Page 31: End User Informatics

Extract, Transformation & Load = Push big data

• Batch extract from transaction systems

• Bulk transformation

• Push load into data warehouse

Diagram labels: Extract, Transformation, Load, Data Warehouse, Real Time
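The three ETL steps listed above can be sketched as three small functions. This is a minimal illustration using in-memory SQLite databases for both the transaction system and the warehouse; the table names, column names, and sample rows are hypothetical.

```python
import sqlite3


def extract(source):
    """Batch extract: pull every order row from the transaction system."""
    return source.execute(
        "SELECT order_id, region, amount_usd FROM orders"
    ).fetchall()


def transform(rows):
    """Bulk transformation: tidy region names and drop rows with no amount."""
    return [
        (order_id, region.strip().title(), float(amount))
        for order_id, region, amount in rows
        if amount is not None
    ]


def load(warehouse, rows):
    """Push load: write the cleaned rows into the warehouse table."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount_usd REAL)"
    )
    warehouse.executemany(
        "INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?)", rows
    )
    warehouse.commit()


# Hypothetical transaction system with untidy data.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE orders (order_id INTEGER, region TEXT, amount_usd REAL)"
)
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, " north ", 100.0), (2, "SOUTH", None), (3, "north", 50.0)],
)

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(source)))

# Reporting happens later, against the warehouse rather than the source.
report = warehouse.execute(
    "SELECT region, SUM(amount_usd) FROM fact_orders "
    "GROUP BY region ORDER BY region"
).fetchall()
print(report)
```

Rows added to the source after the batch run do not appear in the warehouse until the next run, which is the main contrast with real-time access.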

Page 32: End User Informatics


Pipeline Pilot and Real time Data access

Data Access: Data Adapters

Data Transformation: Transform, Calculate, Security

Relational, Flat Files, ERP, Legacy, EJB, XML

Information Access: Web Services, ODBC, JDBC

• Flexible Data Access capabilities

• Single access point to data

• Consumer sees only the end result

• Shared platform service

• Available to all technologies

• Reusable building blocks

• Targeted to specific needs

• Reduces costs and time to market

• Supports incremental development

Page 33: End User Informatics

Case Study: PI Historian


• PI Historian, a product provided by OSI, captures data in real time from the research test rigs

• Data capture in PI is triggered by events

• PP allows scientists to read the data from PI Historian as it becomes available and also combine it with other information (e.g. associate real-time test data with historical characteristics of a catalyst)

Page 34: End User Informatics

Data provisioning pros and cons

Options compared: OLTP Replication; Data Marts; Enterprise Data Warehouse; Pipeline Pilot

Criteria: Data Quality; Ease of enquiry; System Performance; History; Scalability; Speed to information

Page 35: End User Informatics

Data Integration

Total Cost of Ownership Really Matters

Page 36: End User Informatics

Evolution Of an Informatics System

1. “Just give me a list of compounds from the database, sorted by compound name”

Page 37: End User Informatics

Evolution Of an Informatics System

2. “We also need to see the related toxicology information and for the list to be grouped by compound”

Page 38: End User Informatics

Evolution Of an Informatics System

3. “We’d like to get a list of some of the related compound information, too, grouped by the first letter of the compound’s name.”

Page 39: End User Informatics

Evolution Of an Informatics System

4. “Actually, we’d like to be able to produce a completely separate report for compound and related toxicology information.”

Page 40: End User Informatics

Evolution Of an Informatics System

5. “We don’t like running the reports manually. Can they be scheduled?”

Page 41: End User Informatics

Evolution Of an Informatics System

6. “We have quite a few users using this system now and there’s some fairly sensitive data in there.”

Page 42: End User Informatics

Evolution Of an Informatics System

7. “We need to be able to drill down into more detail”

Page 43: End User Informatics

Evolution Of an Informatics System

8. “We need to track which users have used what Protocols”

Page 44: End User Informatics

Evolution Of an Informatics System

9. “We need to be able to easily search the information we need.”

Page 45: End User Informatics

Evolution Of an Informatics System


“We need these reports linked to our business process”

“We need to be able to approve or reject the reports”

“We need a single version of the truth”

“We don’t want to be waiting around for the results”

“We don’t want to be re-typing information from these reports into our other application”

“We need to be able to see the underlying detail”

“We need to print the reports out to take into meetings”

“We need the output as Excel”

“We need charts”

“We need to know who’s looked at the reports”

“We need a simple way to see the entire contents of the report”

“We need a report that looks like an existing flow chart”

Page 46: End User Informatics

Hidden Costs

• Organizations that believe they can build a data integration solution at a fraction of the cost of a COTS solution…

• …discover that any savings in up-front costs are very quickly spent multiple times over during the lifetime of the solution

• Typical effort to build a custom data integration solution can be upwards of 5000-5500 man-days

• Some of the tasks that need to be undertaken to provide a functioning solution:
– Application Architecture
– Data cleansing & enrichment services
– Integration framework
– User Interface design
– Common field matching
– Security
– Batch processing capabilities
– Application Integration
– Audit & Logging capabilities

Page 47: End User Informatics

Build versus Buy Decision Criteria

Data Integration Considerations | Build your own | Buy
Initial start-up cost | Lower | Higher
Ongoing operating cost | Higher | Lower
Ongoing support & maintenance | In-house responsibility | Vendor
One-time “quick and dirty” task | Consider | Maybe overkill unless one-time task becomes ongoing request
IT staff requirements | Higher | Lower
IT productivity | Detracts from | Contributes to
Data sources/data targets | Single/single | Multiple/multiple, Multiple/single, Single/multiple
Complex transformations | Limited: IT must write complex code | Comprehensive
Integration | Usually overlooked | Industry standards

Page 48: End User Informatics

Industry Trends

End-user Informatics

Page 49: End User Informatics

Web 2.0

What’s Setting Expectations Today

Page 50: End User Informatics

Next-Generation Enabling Technologies & New User Demands Are Emerging

•Rich Internet Experience

•Web 2.0

•Portlet components

•XML and derivatives

•Dynamic, Ajax-based UI


SOA Infrastructure

Leverage existing systems and components

Standardization

Data-driven environment

Open APIs to customize apps


Personal Dashboards

Integrate data from multiple sources

Multi-account views

Cross-account planning


Page 51: End User Informatics

Web 2.0 features on our projects


Page 52: End User Informatics

Web 2.0 features on our projects


Page 53: End User Informatics

Advanced Reporting/Visualization Collection


Page 54: End User Informatics

Scientific Business Process Management and PP

• Fuse scientific and analytical data with process data

• Use Pipeline Pilot in automated process decisions

• Display reports and data at appropriate points in the process

• Use data to modify process execution

Page 55: End User Informatics

Consolidated Informatics Platform

Current: Many Databases; Many Tools (Spreadsheets, Analytics, Scorecards, Dashboards, Self-service Reports, Data Mining, Portals, Web Reports)

Future: Many Databases; Consolidated Informatics Platform

Page 56: End User Informatics

Key Takeaways

• Provide Accurate, Integrated & Seamless Informatics Solutions

• Reduce redundant and replicated databases

• Rationalize existing reporting tools and technologies

• Build agile, flexible and reusable solutions

• Empower the end users: “Shift Right”

Page 57: End User Informatics

Shift Right