Tamr | cdo-summit

Enterprise Data Unification in Practice
IHAB ILYAS
Professor, University of Waterloo
Co-founder, Tamr, Inc.
@ihabilyas

Top-Down Data Integration Limits Data Quality and Connectedness
<10%
Enterprise data is siloed ... expensive to connect & curate
[Chart: cost ($) against # of sources]
The Consequences:
• Limited data available
• Missed opportunity
• Ballooning costs

Hiring More Data Experts Is Not the Answer
Enterprise Reality vs. Goal
• Manual data collection and preparation
• Long lead-time to analyses
• Limited individual view on variety of data
• Extensive rework
• No cohesive view of data efforts
• Expertise across organization is underutilized

Data Curation: Many Definitions and One Goal
Extract Value from Data
“For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights” (The New York Times, August 2014)

Exploding Big Data Variety Will Make the Problem Worse
[Chart: Radical Increase in Data Variety, 2000 to 2011; labels 1.0 and 2.0. Source: IDC 2011 Digital Universe Study]
• Corporate databases
• Semi-structured data, JSON sources: increasingly valuable
Missing Capability: Connecting and curating in an automated way

The Core of Tamr: Machine Learning with Human Insight
Identify sources, understand relationships and curate the massive variety of siloed data
Diagram labels:
• Structured and Semi-structured Data Sources
• Collaborative Curation
• Data Experts (Source owners)
• Data Stewards and Curators
• Data Inventory
• APIs
• Systems
• Tools
• Data Scientists
• Advanced Algorithms & Machine Learning
• Expert Input
• Integrated Data & Metadata
• Expert Directory
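One way to picture "machine learning with human insight" is a triage step: a model scores candidate matches, confident scores are resolved automatically, and the uncertain middle band is queued for expert input. The sketch below is only an illustration of that idea. The `Candidate` type, the `triage` function, and both thresholds are hypothetical names and values chosen for this example, not Tamr's actual design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A proposed match between two items (hypothetical structure)."""
    left: str
    right: str
    score: float  # assumed model confidence in [0, 1] that the pair matches


def triage(candidates, accept_at=0.9, reject_at=0.3):
    """Split candidates into accepted, rejected, and expert-review lists.

    The thresholds are illustrative assumptions; in practice they would be
    tuned against labeled data.
    """
    accepted, rejected, review = [], [], []
    for cand in candidates:
        if cand.score >= accept_at:
            accepted.append(cand)      # confident match: no human needed
        elif cand.score <= reject_at:
            rejected.append(cand)      # confident non-match: no human needed
        else:
            review.append(cand)        # uncertain: ask a data expert
    return accepted, rejected, review
```

Expert answers on the review queue could then be fed back as training labels, which is the usual motivation for keeping the uncertain band explicit.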

Demo
Example Use Cases

Solution Overview: Sourcing & Supply Chain Spend Optimization
The Problem
• Part/supplier data in ERPs, life cycle management
systems, and catalogs across departments
• Inaccurate data / incongruent naming conventions
The Solution
• Create a unified schema that leverages all
relevant data sources, including parts,
procurement, logistical, and vendor data
Benefit
• Discover opportunities to optimize purchases
across different suppliers and lines of business
[Diagram labels: Hundreds of Potential Sources; Tamr Unified View]
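Once part and supplier records share one schema, the purchasing benefit above reduces to a simple comparison. The sketch below assumes unified rows with hypothetical fields `part_id`, `supplier`, and `unit_price`, and reports parts bought from more than one supplier together with the gap between the cheapest and most expensive price. The field names and the function are illustrative assumptions, not part of the original slides.

```python
from collections import defaultdict


def price_spread_by_part(rows):
    """Return (part, cheapest supplier, priciest supplier, spread) tuples.

    `rows` is an iterable of dicts in an assumed unified schema. Only parts
    with at least two suppliers are reported, largest spread first.
    """
    prices = defaultdict(dict)  # part_id -> {supplier: lowest price seen}
    for row in rows:
        part, supplier, price = row["part_id"], row["supplier"], row["unit_price"]
        known = prices[part].get(supplier)
        if known is None or price < known:
            prices[part][supplier] = price

    report = []
    for part, by_supplier in prices.items():
        if len(by_supplier) < 2:
            continue  # a single supplier offers nothing to compare
        cheapest = min(by_supplier, key=by_supplier.get)
        priciest = max(by_supplier, key=by_supplier.get)
        spread = by_supplier[priciest] - by_supplier[cheapest]
        report.append((part, cheapest, priciest, spread))
    return sorted(report, key=lambda item: item[3], reverse=True)
```

The comparison is only meaningful after unification: if the same part appears under different identifiers in different systems, it never lands in the same group.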

Solution Overview: Customer Data Integration
The Problem
• Customer data stored in CRMs, data warehouses,
back-office applications, and other enriching sources
• Complexity of unifying personal data / incongruent
naming conventions / data sparseness / manual entry
The Solution
• Create a holistic and adaptive customer view by
unifying disparate data sources across the enterprise
Benefits
• Apply a unified and enriched customer view across
multiple channels / lines of business
• Discover hidden opportunities to improve upsell /
cross-sell, reduce churn, and identify key opinion
leaders (KOL) via enhanced segmentation/targeting
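A minimal sketch of unifying customer records from several systems: records that share a normalized email or phone number are linked, and linked records are merged into clusters with a union-find structure. This is a deliberately simple rule-based stand-in for illustration. The field names, the `normalize` rule, and `cluster_customers` are assumptions, and a production system would typically use learned similarity rather than exact key matches.

```python
from collections import defaultdict


def normalize(value):
    """Lowercase and keep only letters and digits (a crude, assumed rule)."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def cluster_customers(records, keys=("email", "phone")):
    """Group record indices that share any normalized key value."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    first_seen = {}  # (key, normalized value) -> first record index
    for idx, record in enumerate(records):
        for key in keys:
            raw = record.get(key)
            if not raw:
                continue  # sparse data: skip missing or empty fields
            token = (key, normalize(raw))
            if token in first_seen:
                union(first_seen[token], idx)
            else:
                first_seen[token] = idx

    clusters = defaultdict(list)
    for idx in range(len(records)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())
```

Linking through shared keys is transitive, so a CRM record with only an email and a back-office record with only a phone number can still end up in one cluster if a third record carries both.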

Solution Overview: Clinical Trials
The Problem
• Clinical trial data is reported in a wide variety of
formats, ontologies and standards
• Underspecified attribute names, varying
quality of annotation, duplicate data, etc.
The Solution
• Unify attribute names to build a common clinical
trial data model
Benefits
• Ability to cluster clinical trials based on drug, target or investigator
• Easier way to aggregate and report ongoing trial data
• Simplified reporting for various agency ontologies
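Unifying attribute names can be sketched as mapping each source column name onto the closest name in a common data model. The example below uses plain string similarity from Python's standard `difflib` module as a stand-in for whatever matching a real system would learn. The target attribute names, the `cutoff` value, and both helper functions are assumptions made for illustration.

```python
import difflib
import re


def normalize_name(name):
    """Lowercase and collapse runs of non-alphanumerics to underscores."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def map_attributes(source_names, target_names, cutoff=0.6):
    """Map each source attribute to its best target attribute, or None.

    `cutoff` is the minimum similarity ratio (0 to 1) that counts as a
    match; unmatched names map to None so a curator can review them.
    """
    mapping = {}
    for name in source_names:
        best = difflib.get_close_matches(
            normalize_name(name), target_names, n=1, cutoff=cutoff
        )
        mapping[name] = best[0] if best else None
    return mapping
```

Names that fall below the cutoff are left unmapped rather than forced onto the nearest target, which keeps underspecified attributes visible for expert review.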

Solution Overview: Medical Instruments
The Problem
• Instruments perform experiments at thousands
of labs and hospitals across the world
• Data stored in inconsistent formats and
standards across various labs and hospitals
The Solution
• Build a unified view of instruments leveraging all
available internal/external data-sources
Benefit
• Ability to cluster analysis based on instrument,
location and other attributes

Tamr Architecture: a Data Curation Stack

Demo
Questions?
@Tamr_Inc
www.tamr.com