© 2012 IBM Corporation
Pushing the Frontiers of Analytics
Brenda Dietrich, IBM Fellow & VP CTO, Business Analytics
Global Technology Outlook Objectives
GTO identifies significant technology trends
early. It looks for high impact disruptive
technologies leading to game changing
products and services over a 3-10 year
horizon.
Technology thresholds identified in a GTO
demonstrate their influence on clients,
enterprises, & industries and have high
potential to create new businesses.
Global Technology Outlook 2012: Uncertain data and analytics are major themes
Managing Uncertain Data at Scale
Future of Analytics
The Future Watson
Systems of People
Outcome Based Business
Resilient Business and Services
Managing Uncertain Data at Scale
Trend: Most of the world’s analyzed data will be uncertain
By 2015, 80% of the world’s data will be uncertain
Uncertain data management requires new techniques
These techniques are necessary for real-world Big Data Analytics
Opportunity: Business leadership using Big Data Analytics
Robust, business-aware uncertain data management
Use analytics over uncertain web, sensor, and human-generated data
Enable good business decisions by understanding analysis confidence
Challenge: Taking Big Data Analytics into an uncertain world
Analysis of text is highly nuanced; sensor-based data is imprecise
Timely business decisions require efficient large-scale analytics
It is more difficult to obtain insight about an individual than a group, especially if the source data is uncertain
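The last point can be made concrete with a minimal sketch, assuming each data point carries independent noise of the same size. Under that assumption the uncertainty of a group average shrinks with the square root of the group size, while a single individual's reading keeps the full noise. The noise level and group size below are hypothetical values chosen for illustration.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean of n independent readings, each with noise sigma."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return sigma / math.sqrt(n)


# Hypothetical noise level: each reading is off by about 10 units.
individual = standard_error(10.0, 1)   # insight about one person
group = standard_error(10.0, 100)      # insight about a group of 100
print(individual, group)
```

The independence assumption is the weak point: correlated errors (for example, one miscalibrated sensor feeding every reading) do not average out this way.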
The fourth dimension of Big Data: Veracity – handling data in doubt
Volume: Data at Rest. Terabytes to exabytes of existing data to process
Velocity: Data in Motion. Streaming data, milliseconds to seconds to respond
Variety: Data in Many Forms. Structured, unstructured, text, multimedia
Veracity*: Data in Doubt. Uncertainty due to data inconsistency & incompleteness, ambiguities, latency, deception, model approximations
* Truthfulness, accuracy or precision, correctness
Uncertainty arises from many sources
Process Uncertainty: Processes contain “randomness”
– Uncertain travel times
– Semiconductor yield
Data Uncertainty: Data input is uncertain
– Text entry: intended spelling vs. actual spelling
– GPS uncertainty
– Ambiguity: {Paris Airport}
– Conflicting data: {John Smith, Dallas} vs. {John Smith, Kansas}
– Testimony
– Rumors
– Contaminated?
Model Uncertainty: All modeling is approximate
– Fitting a curve to data
– Forecasting a hurricane (www.noaa.gov)
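The text-entry example on this slide (intended vs. actual spelling) can be illustrated with a minimal sketch that scores a typed value against a reference list and reports the score as a confidence. It uses Python's standard `difflib.SequenceMatcher`; the `resolve` helper, the reference list, and the 0.6 cutoff are assumptions made for illustration, not something the deck specifies.

```python
from difflib import SequenceMatcher


def resolve(entry, candidates, cutoff=0.6):
    """Return (best_match, confidence); best_match is None below the cutoff."""
    if not candidates:
        return (None, 0.0)
    # Score every candidate by character-level similarity, ignoring case.
    scored = [
        (SequenceMatcher(None, entry.lower(), c.lower()).ratio(), c)
        for c in candidates
    ]
    score, best = max(scored)
    return (best, score) if score >= cutoff else (None, score)


cities = ["Dallas", "Kansas", "Paris"]  # hypothetical reference list
match, confidence = resolve("Dalas", cities)
print(match, round(confidence, 2))
```

Returning the score alongside the match keeps the uncertainty visible to downstream analytics instead of silently replacing the input.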
[Chart: Global Data Volume in Exabytes (0 to 9000) and Aggregate Uncertainty % (10 to 100), 2005 to 2015. Labeled series include Sensors (Internet of Things) and VoIP. Multiple sources: IDC, Cisco]
By 2015, 80% of all available data will be uncertain
Enterprise Data: Data quality solutions exist for enterprise data like customer, product, and address data, but this is only a fraction of the total enterprise data.
Sensors (Internet of Things): By 2015 the number of networked devices will be double the entire global population. All sensor data has uncertainty.
Social Media (video, audio and text): The total number of social media accounts exceeds the entire global population. This data is highly uncertain in both its expression and content.
Examples: Uncertainty management presents many opportunities
360˚ customer view: Creating profiles from many sources
– Many inconsistent data sources
– Intent hidden within social media
– Geospatial data is imprecise
– 35% more satisfied customers by analyzing agent notes
– 35% better churn prediction using customer SMS messages
– Reduced time to determine lending risk from weeks to minutes
Healthcare: More data from physician notes and tests
– Structured medical records are incomplete
– “Golden” text notes must be interpreted: drug names, relationship types (mtr, sibs, m, paunt)
– Uncertainty in images
– Able to identify: 40% more smokers found, 15% more disease history
– Mitral stenosis: 50% more diagnoses, 35% misdiagnoses
Supply chain: Process and forecast uncertainty
– Modeling uncertainties: demand, sales, production, shipment
– Shipping uncertainties: goods damaged, mistakes in shipped goods
– 80% lower price protection costs
– 30% less channel inventory
– 50% fewer returns
– Reductions obtained using an inventory replenishment model that accounts for uncertain price protection
Smarter Planet: System analytics predict maintenance
– Downtime costs $M in income loss
– Equipment maintenance needs are unpredictable
– Customer contracts impose penalties
– 5% more oil platform production
– 30% less maintenance cost
– Improvements obtained using statistical modeling that combines equipment sensor data with performance history to predict corrective maintenance activities
Industries shown: Telco, Auto, Healthcare, Energy, Research
Condensing data reduces uncertainty by constructing context
Required: tight integration to maximize context discovery
Required: common practices followed by multiple standards for representing uncertain data and uncertainty of all types, provenance, lineage, and other metadata
Required: common APIs to enable sharing across the uncertainty management pipeline
No such common practices, standards or APIs exist today
[Diagram: CONDENSE. Inputs about one customer (Michael, San Jose, CA; NY; credit and loyalty data; influencers; stated intent “Buying a DSLR today!”; in-store pricing and discounts, $999 or $560) are condensed through:
– Correlation (“Data finds Data”): Customer at Mall, Customer in Store #42
– Sense Making
– Fact Discovery: Son, Mother, Birthday, Date
– Spatial Reasoning & Temporal Reasoning
– Corroboration (Evidence Combination)
– Etc.
Result: Maximum Context for Minimum Uncertainty]
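Corroboration (evidence combination) can be sketched in a few lines. The version below is a noisy-OR combination, which is only one possible rule and is an assumption here, not the method the deck describes: it treats each source as independent and asks how likely it is that at least one of them is right. The confidence values are hypothetical.

```python
from math import prod


def corroborate(confidences):
    """Noisy-OR: probability that at least one independent source is correct."""
    if any(not 0.0 <= p <= 1.0 for p in confidences):
        raise ValueError("each confidence must lie in [0, 1]")
    return 1.0 - prod(1.0 - p for p in confidences)


# Hypothetical evidence that the customer is in Store #42:
# a loyalty-card swipe (0.6) and an imprecise geospatial fix (0.5).
combined = corroborate([0.6, 0.5])
print(round(combined, 2))
```

Two weak signals that agree yield a stronger conclusion than either alone, which is the sense in which added context reduces uncertainty. If the sources are not independent (for example, both derived from the same phone), this rule overstates the combined confidence.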
Systems of People
A shift in value from process optimization to people-centric processes
Organizations have extracted most of the efficiencies from traditional process automation
IT enablement opportunities are shifting to Line of Business
A new set of data is made possible by exploiting social business
Social business drives new efficiencies and value from people-centric processes
An opportunity to instrument people-processes
Provides the basis for addressing a diverse set of problems
A new IT market is emerging
Adaptive social platforms instrumented with knowledge capture, interconnected with enterprise data and processes, and made intelligent through differentiating analytics will transform business
People-centric processes are at the core of a broad range of issues
Differentiate for Growth: Create winning products, fast, by having the best and most productive knowledge workers
Drive Sales Productivity: Create superior sales force, drive sales enablement and seller/client alignment
Grow in Emerging Markets: Re-create organizational footprint in global markets
Transform Service Delivery: Further grow productivity and enable new delivery models
Optimizing people-centric processes is not the same as optimizing supply chains
Enterprise sources: CRM, Claims, Delivery, Records, Patents & Publications
Information they carry about people:
– Clients served, products sold, sales patterns, productivity
– Engagements worked, team info
– Work specs, tasks accomplished, productivity
– Innovation, products, technical leadership
Social sources (status updates):
– “In the last couple of weeks, I’ve talked to ABC bank, XYZ and at a security conference.”
– Status: Working; Expert: Security
– Status: At conference; Influencer
“Status updates alone on Facebook amount to more than ten times more words than on all blogs worldwide” - David Kirkpatrick, The Facebook Effect
Rich information (e.g. expertise, work patterns, response to incentives, digital reputation) is flowing through on-line collaboration and enterprise systems
Capturing this information enables analytics to be applied to people-centric processes
Strength of Sales Force Index is an example of what is possible with a rich representation of people
TODAY: Years selling, Job change, Salary band, PBC
FUTURE: True skills and expertise, Disciplines, Clients served, Products sold, Team experiences, Connections, Incentives and responses, Career path, …
SSFI mines sales force data to understand which attributes of a seller (e.g. skills, experiences), sales team (e.g. team composition, territories) or sales process (e.g. incentives, coverage model) are driving sales performance (quota attainment, win rates, productivity)
SSFI identifies:
– Reasons for performance disparities (at individual or group level), and the best set of actions to drive performance
“Why is our sales force in Region X not performing at par with other regions or competition?”
“What actions can we take to improve sales performance?”
“What are the incentives that truly drive performance?”
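A first pass at the kind of question SSFI asks can be sketched as ranking seller attributes by how strongly they move with a performance measure. The sketch below uses a plain Pearson correlation, which is an assumption for illustration and not a description of how SSFI works; the attribute names and the four seller records are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant attribute carries no signal
    return cov / math.sqrt(vx * vy)


def rank_drivers(sellers, attributes, target):
    """Sort attributes by the strength of their correlation with the target."""
    ys = [s[target] for s in sellers]
    scores = {a: pearson([s[a] for s in sellers], ys) for a in attributes}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical seller records.
sellers = [
    {"years_selling": 2, "clients_served": 5, "quota_attainment": 0.6},
    {"years_selling": 10, "clients_served": 8, "quota_attainment": 0.7},
    {"years_selling": 4, "clients_served": 12, "quota_attainment": 0.9},
    {"years_selling": 8, "clients_served": 15, "quota_attainment": 1.1},
]
for name, r in rank_drivers(sellers, ["years_selling", "clients_served"], "quota_attainment"):
    print(name, round(r, 2))
```

Correlation only flags candidates. Deciding which actions actually drive performance, as the questions above ask, needs controlled comparison or a causal model.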
Executing on SoP vision depends on three key capabilities
PEOPLE CONTENT: Develop capabilities to create a representation of a person’s skills, experiences, preferences, digital reputation… in a structured and organized way, so it can be used for the purpose of running a business
PEOPLE ANALYTICS: Implement capabilities for people-centric process optimization within an analytics platform for rapid, on-demand deployment
– matching, talent cloud, crowdsourcing, predictive markets
– simulation of workforce trends, performance analytics
– behavior modeling…
PEOPLE ENABLEMENT: Incorporate capabilities that adapt content for situations and needs, and enhance communication over many devices, across diverse pools of talent
– context-aware cognitive load management
– translation, transcription, text-to-speech, voice…
Future of Analytics
Explosion of unstructured data
Creates new analytics opportunities
Addresses new enterprise needs
Consistent, extensible, and consumable analytics platform
Reduces cost-to-value for enterprises
Increases analytics solution coverage with limited supply of skills
Optimizing across the stack to deploy analytics at scale
Analytics becomes a dominant IT workload and drives HW design
Opportunity to seamlessly scale from terascale to exascale
Analytics is broadly defined as the use of data and computation to make smart decisions
[Diagram: from data to decision]
Data: Historical, Simulated; Text, Video, Images, Audio
Analytic outputs: Data instances; Reports and queries on data aggregates; Predictive models; Answers and confidence; Feedback and learning
Decision point: Option 1, Option 2, Option 3, each leading to possible outcomes
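The decision point in this picture can be illustrated with a minimal sketch, assuming each option leads to a few possible outcomes with known probabilities and payoffs and that the decision maker simply maximizes expected payoff. The probabilities and payoffs below are hypothetical, and real decisions often weigh risk as well as the average.

```python
def expected_value(outcomes):
    """outcomes: list of (probability, payoff) pairs whose probabilities sum to 1."""
    total = sum(p for p, _ in outcomes)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * v for p, v in outcomes)


def best_option(options):
    """Pick the option name with the highest expected payoff."""
    return max(options, key=lambda name: expected_value(options[name]))


# Hypothetical options at one decision point.
options = {
    "Option 1": [(0.5, 100.0), (0.5, 0.0)],
    "Option 2": [(0.9, 40.0), (0.1, 20.0)],
    "Option 3": [(0.2, 300.0), (0.8, -50.0)],
}
print(best_option(options))
```

In this framing the predictive models supply the probabilities, and feedback on the realized outcome is what lets those probabilities be re-estimated over time.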
The value of analytics grows by incorporating new sources of data, composing a variety of analytic techniques, spanning organizational silos, and enabling iterative, user-driven interaction
[Chart: vertical axis “Sources and types of data” (from “Structured or standardized” to “New format or usage of data”); horizontal axis “Scope of decision” (Low to High). Plotted examples: Sales-based demand forecasting; Price-based demand forecasting (own & competitors); Segmentation-based market impact estimates; Intent-to-buy trends; Multi-modal demand forecasting]
Analytics toolkits will be expanded to support ingestion and interpretation of unstructured data, and enable adaptation and learning
Extended from: Competing on Analytics, Davenport and Harris, 2007
[Figure: analytics capability ladder. Stages: Collect and Ingest/Interpret; Report; Understand and Predict; Decide and Act; Learn. Side labels: Traditional; New Data; New Methods]
Traditional capabilities and how they are extended:
– Standard Reporting: Real time, visualizations, user interaction
– Ad hoc Reporting: Query by example, user defined reports
– Query/Drill Down: In memory data, fuzzy search, geo spatial
– Alerts: Rules/triggers, context sensitive, complex events
– Forecasting: Larger data sets, nonlinear regression
– Simulation: High fidelity, games, data farming
– Predictive Modeling: Causality, probabilistic, confidence levels
– Optimization: Decision complexity, solution speed
New capabilities:
– Optimization under Uncertainty: Quantifying or mitigating risk
– Adaptive Analysis: Responding to context
– Continual Analysis: Responding to local change/feedback
– Entity Resolution; Annotation and Tokenization; Relationship, Feature Extraction (descriptors shown: People, roles, locations, things; Rules, semantic inferencing, matching; Automated, crowd sourced)
Annotations: Decide what to count; enable accurate counting. In the context of the decision process.
Analytic solutions will apply multiple methods to multiple forms of data
Example: Utility Vegetation Management
Effective Right of Way vegetation management is critical to streamlined utility operations
Traditional Right of Way programs are mainly static-scenario driven on a six year cycle
– Static and rigid models lead to predominantly reactive operations, which are expensive
– Focus on narrow corridor widths fails to address severe weather impact
A multimodal analytics approach can overcome these shortcomings
– Structured data (e.g. transmission line maps) and unstructured data (e.g. LIDAR sensor)
– Advanced modeling to perform a dynamic scenario-driven analysis
[Diagram: Solution Framework. Inputs: SENSORS, UTILITY DATA, MAPS, WEATHER, each with its own Preprocessor. Components: 3-Dimensional Model; Right-of-Way Dynamic Forecasting Model; Schedule Generator; Recovery; Visualization. Sectors: ELECTRIC, TELECOMMUNICATIONS, RAIL, ROAD, OIL]
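The contrast between a fixed six year cycle and dynamic, scenario-driven scheduling can be sketched with a toy model. Everything below is a hypothetical illustration, not IBM's framework: the `Segment` fields stand in for preprocessed sensor, utility, map, and weather data, the forecast is a simple linear growth estimate, and the schedule generator just sorts by urgency.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    clearance_m: float        # gap between vegetation and line (e.g. from LIDAR)
    growth_m_per_year: float  # assumed vegetation growth rate for the corridor
    storm_factor: float       # weather multiplier, >= 1 in storm-prone areas


def years_until_contact(seg: Segment) -> float:
    """Toy forecasting model: clearance divided by weather-adjusted growth."""
    rate = seg.growth_m_per_year * seg.storm_factor
    if rate <= 0:
        raise ValueError("growth rate and storm factor must be positive")
    return seg.clearance_m / rate


def schedule(segments):
    """Toy schedule generator: most urgent segment first."""
    return sorted(segments, key=years_until_contact)


# Hypothetical corridor segments.
segments = [
    Segment("A", clearance_m=6.0, growth_m_per_year=1.0, storm_factor=1.0),
    Segment("B", clearance_m=3.0, growth_m_per_year=0.5, storm_factor=1.5),
    Segment("C", clearance_m=2.0, growth_m_per_year=1.0, storm_factor=2.0),
]
for seg in schedule(segments):
    print(seg.name, years_until_contact(seg))
```

Under a static cycle all three segments would be trimmed on the same calendar. Here segment C is flagged years ahead of segment A because the forecast combines sensor clearance with weather exposure.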
Analytics solution development requires several interacting design steps
Design steps: Data Acquisition; Filtering and Extraction; Validation; Data Evaluation and Fusion; Core Analytics; Algorithm Composition and Invention; Composition and Packaging; Testing and Execution Optimization; Deployment
Data types: Streaming data; Text data; Multi-dimensional; Time series; Geo spatial; Relational; Social network; Video & image
Analytic methods: Data mining & statistics; Optimization & simulation; Fuzzy matching; Network algorithms; Semantic analysis; Business Rules Engine; New algorithms
An Analytics solution platform will increase enterprise value by supporting both the CxO solution and the CIO infrastructure
The CIO can reduce cost and add value to the use of analytics by supporting collaboration and data/analysis sharing
Leverage Mandate: Streamline operations and increase organizational effectiveness
Expand Mandate: Refine business processes and enhance collaboration
Transform Mandate: Change the industry value chain through improved relationships
Pioneer Mandate: Radically innovate products, markets, business models
Easier consumption of Analytics solutions
– Have consistent look and feel
– Changes are easier to implement effectively
– Trustworthy solutions are produced
More efficient, less complex development
– Reduces growth of development costs
– Speeds delivery of new functionality
– Expands analytics solution developer population
Reduces client cost of operation
– Seamless integration eases deployment of solutions
– Establishes preferred development path for new solution
– Consistent and coherent infrastructure eases managing solutions
[Chart: Revenue versus lines of code, with platform and without platform]
Optimizing across the stack will enable the deployment of analytics at scale
[Diagram: a Future System built from repeated nodes, each combining Cores, SCM, Storage, and Network. Workloads shown: Predictive Analytics; Modeling, Simulation; Text Analytics; Hadoop Workloads; Optimization; Sensitivity Analysis]
Systems supporting future analytics will be more data centric, composable and scalable
Balanced, reliable, power efficient systems, with integrated software that scales seamlessly
Integrated analytics, modeling and simulation capabilities to address generation, management and analysis of Big Data for Business Advantage
Systems will support increasingly complex data sets and workflows.
Different elements within these complex workflows will require different capabilities within systems.
System elements: General Purpose; Integrated Network; Integrated Processing; Integrated Storage
The Future Watson
Extend Watson technology
Moves beyond “question-in & answer-out” to always “learning” evidence-based decision support
Addresses the enterprise need to convert growing volumes of information into actionable knowledge
Demonstrates business value in critical problem spaces, starting with Healthcare
Lead in new domains
Efficiently adapting and scaling Watson to new domains requires a novel blend of engineering and research
Enable efficient adaptation
Watson’s real value proposition: Efficient decision support over unstructured (and structured) content
Structured Data: Precise, explicit; Narrow, expensive
Unstructured Data: Broad, rich in context; Rapidly growing, current; Invaluable yet under utilized
[Chart comparing approaches by depth of understanding and breadth of coverage]
– SQL/XQuery, Existing BI, Inference/Rules: Deeper understanding but brittle; high precision at high cost; narrow, limited coverage
– Key Word Search, Relevance Ranking: Shallow understanding; low precision; broad coverage
– Open-Domain Question-Answering (Jeopardy! Challenge): Deeper understanding, higher precision and broader, timely coverage at lower costs
Taking Watson beyond Jeopardy!
Understanding: From specific questions to rich, incomplete problem scenarios (e.g. EHR)
– From Specific Questions (e.g. “The type of murmur associated with this condition is harsh, systolic, and increases in intensity with Valsalva”) to Rich Problem Scenarios (Entire Medical Record)
Interacting: Evidence analysis and look-ahead, drive interactive dialog to refine answers and evidence
– From Question-In/Answer-Out to Interactive Dialog (Input, Responses; Refined Answers, Follow-up Questions)
Explaining: Move from quality answers to quality answers and evidence
– From Precise Answers & Accurate Confidences to Comparative Evidence Profiles
Learning: Scale domain learning and adaptation rate and efficiency
– From Batch Training Process to Continuous Training & Learning Process (Teach Watson dialog: Answers, Corrections, Judgements; Responses, Learning Questions)