Everything has changed except us

Copyright Third Nature, Inc. Everything has changed except us February, 2015 Mark Madsen www.ThirdNature.net @markmadsen

Transcript of Everything has changed except us

Page 1: Everything has changed except us

Copyright Third Nature, Inc.

Everything has changed except us

February, 2015

Mark Madsen www.ThirdNature.net @markmadsen

Page 2: Everything has changed except us

The DW group as the crazy uncle of the organization

Madness: doing more of what you already did and expecting different results.

For a decade we’ve been struggling with shrinking load windows, performance problems and, most important, an inability to meet data needs quickly, yet we keep doing the same things to try to fix them.

Page 3: Everything has changed except us

I never said the “E” in EDW meant “everything”…

What do you mean, “Just tables?”

Page 4: Everything has changed except us

It’s going to get a lot worse

Not E

E

Conclusion: any methodology built on the premise that you must know and model all the data first is untenable 

Page 5: Everything has changed except us

The good news is: we solved the bigness problem

Source: Noumenal, Inc.

Page 6: Everything has changed except us

Now, analytics embiggens the data volume problem

Many of the processing problems are O(n²) or worse, so even small data can be a problem for DB‐based platforms
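
A rough illustration of why quadratic workloads hurt even at modest sizes: the sketch below counts all‐pairs comparisons, a common shape for similarity and matching problems. The function name and the row counts are illustrative assumptions, not taken from the talk.

```python
def pairwise_comparisons(n: int) -> int:
    """Number of unordered pairs among n records: n * (n - 1) / 2."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    # Hypothetical row counts chosen only to show the growth rate.
    for rows in (1_000, 100_000, 10_000_000):
        print(f"{rows:>12,} rows -> {pairwise_comparisons(rows):,} comparisons")
```

Ten million rows, small by warehouse standards, already implies roughly 5 × 10^13 comparisons.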

Page 7: Everything has changed except us

What makes data “big”?

Aside from very large amounts:

Hierarchical structures

Nested structures

Linked structures

Encoded values

Non‐standard (for a database) types

Deep structure

Human authored text

“big” is better off being defined as “complex” or “hard to manage”

Page 8: Everything has changed except us

Datasets today: Interconnection and Dependency

Dynamic models are missing from most data systems today. These drive new workloads, generate different data, need new techniques. 

Hierarchical Edge Bundles: Visualization of Adjacency Relations in Hierarchical Data, Danny Holten

Page 9: Everything has changed except us

It’s not the number of genes that determines complexity, it’s the interactions between them.

Source: M. Pertea and S. Salzberg/Genome Biology 2010

Page 11: Everything has changed except us

Categorizing the measurement data we collect

The convenient data is the transactional data.

▪ Goes in the DW and is used, even if it isn’t the right measurement.

The inconvenient data is observational data.

▪ It’s not neat, clean, or designed into most systems of operation.

The difficult and misleading data is declarative data.

▪ What people say and what they do can differ, so declarations require ground truth.

We need an architecture that supports all three categories.

Page 12: Everything has changed except us

Observations

Sensor data doesn’t fit well with current methods of collection and storage, or with the technology to process and analyze it.

Page 13: Everything has changed except us

Declarations

Page 14: Everything has changed except us

Unstructured is Not Really Unstructured

Unstructured data isn’t really unstructured: objects have structure, language has structure. Text can contain traditional structured data elements. The problem is that the content is unmodeled.

Our real problem is making implicit structure explicit.

Conclusion: the data warehouse must cope with more complex data structures, storage and processing.
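
As a small sketch of making implicit structure explicit, the code below pulls traditional structured elements (an order number, a date, an amount) out of a line of human‐authored text. The patterns, field names and sample sentence are hypothetical; real text needs far more robust parsing than three regular expressions.

```python
import re

# Hypothetical patterns for illustration only.
DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
AMOUNT = re.compile(r"\$(\d+(?:\.\d{2})?)")
ORDER_ID = re.compile(r"\border\s+#?(\d+)\b", re.IGNORECASE)


def extract_fields(text: str) -> dict:
    """Pull structured elements out of free text; missing ones map to None."""
    date = DATE.search(text)
    amount = AMOUNT.search(text)
    order = ORDER_ID.search(text)
    return {
        "order_id": int(order.group(1)) if order else None,
        "date": date.group(0) if date else None,
        "amount": float(amount.group(1)) if amount else None,
    }


note = "Customer called about order #4521 on 2015-02-03; refund of $19.99 was promised."
fields = extract_fields(note)
```

The text always contained the order, date and amount; the work is in modeling them so a query can reach them.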

Page 15: Everything has changed except us

The creation, flow and use of data are different for transactions and machine‐generated events

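Diagram: transaction flow with the steps Data entry, Extract, Cleanse, Load, Store, Use, plus MDM. This runs at human speed.

Diagram: machine‐generated event flow with the steps Program, Generate, Capture, Store, Cleanse, Use. This runs at machine speed, with slower feedback cycle.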

Page 16: Everything has changed except us

We’re moving BI from information to actuation

This means monitoring as data flows, detecting rather than querying, as well as feedback to the sources.
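
One way to picture detecting rather than querying: the check runs as each reading arrives instead of being asked of stored data later. This is a minimal sketch under assumed names and thresholds (`detect_spikes`, a three‐reading window, a factor of two); it is not a method described in the talk.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def detect_spikes(readings: Iterable[float], window: int = 3,
                  factor: float = 2.0) -> Iterator[Tuple[int, float]]:
    """Yield (position, value) when a reading exceeds `factor` times the mean
    of the preceding `window` readings. Each reading is checked on arrival."""
    recent = deque(maxlen=window)  # holds only the last `window` readings
    for position, value in enumerate(readings):
        if len(recent) == window:
            baseline = sum(recent) / window
            if value > factor * baseline:
                yield position, value
        recent.append(value)


# Hypothetical sensor stream; any iterable works, including an unbounded one.
alerts = list(detect_spikes([10, 10, 10, 50, 10, 10, 10, 11]))
```

Because the detector is a generator, a caller can act on each alert immediately, which is where feedback to the source would attach.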

Page 17: Everything has changed except us

The architecture we’ve been using.

The general concept of a separate architecture for BI has been around longer, but this paper by Devlin and Murphy is the first formal data warehouse architecture and definition published.

“An architecture for a business and information system”, B. A. Devlin, P. T. Murphy, IBM Systems Journal, Vol. 27, No. 1 (1988)

Page 18: Everything has changed except us

Origins: in 1988 there was only big hair.

▪ No real commercial email, public internet barely started

▪ Storage state of the art: 100MB, cost $10,000/GB

▪ Oracle Applications v1 GL released; SAP goes public, enters US market

▪ Unix is mostly run by long‐haired freaks

▪ Mobile was this

This is the context: scarcity of data, of system resources, of automated systems outside core financials, of money to pay for storage.

Page 19: Everything has changed except us

We think of BI as publishing, an old metaphor.

Publishing has value, but may not be actionable.

Page 20: Everything has changed except us

Data strategy means understanding the context of data use so we can build the right infrastructure

Collect new data

Monitor Analyze Exceptions

Analyze Causes Decide Act

Act on the process

Act within the process

We need to focus on what people do with information as the primary task, not on the data or the technology.

Page 21: Everything has changed except us

The usage models for conventional BI

Collect new data

Monitor Analyze Exceptions

Analyze Causes Decide Act

No problem No idea Do nothing

Act on the process: usually days/longer timeframe

Act within the process: usually real-time to daily

This is what we’ve been doing with BI so far: static reporting, dashboards, ad-hoc query, OLAP

Page 22: Everything has changed except us

The usage models for analytics and “big data” 

Collect new data

Monitor Analyze Exceptions

Analyze Causes Decide Act

No problem No idea Do nothing

Act on the process: usually days/longer timeframe

Act within the process: usually real-time to daily

Analytics and big data are focused on new use cases: deeper analysis, causes, prediction, optimizing decisions

This isn’t ad-hoc, reporting, or OLAP.

Page 23: Everything has changed except us

As practices evolve based on new capabilities…

A new level of complexity develops on top of the older, now better understood processes, leading to new data and analysis needs.

Page 24: Everything has changed except us

Growing complexity has changed our context

Internal 3rd party & custom applications, logs, event streams, hosted & external apps, 3rd party datasets… 

Page 25: Everything has changed except us

Enterprise architecture changes

External = no data layer access

SOA and REST = no data layer access

Streams and messages are becoming the norm

Observations and Transactions

Page 26: Everything has changed except us

Reality: continuous change in the DW

You can’t keep up with source changes

You can’t keep up with new data requests

You are already scale, performance, latency limited

But:

Many parts of the organization need current operational data

Page 27: Everything has changed except us

The emerging big data market has an answer…

Page 28: Everything has changed except us

Centralize: that solves all problems!

Creates bottlenecks

Causes scale problems

Enforces a single model

Page 29: Everything has changed except us

Data quality and definitions in a single schema are based on the strictest requirement, reducing flexibility

Page 30: Everything has changed except us

The data warehouse vs business agility

All the data

Common, typed, tabular data

The bottleneck is you

Page 31: Everything has changed except us

We have a design for stability. We need one for adaptability

Page 32: Everything has changed except us

Which is best, 3NF or dimensional?

The core assumption that there can be just one big schema model on one big platform is flawed.

Answer: neither.

We think we can model all the data before use, but that’s a bottleneck. Current techniques for modeling and managing data are too rigid and incapable of describing all the possible relationships.

Page 33: Everything has changed except us

A core problem with one big schema is change

Page 34: Everything has changed except us

Big data answer?

Schema‐on‐read!

There’s a price to pay with using “schema‐on‐read” for everything.

You won’t see the problems with this until you add a second application, and a third.

“One writer‐many readers” kills schema‐on‐read benefits.
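
A sketch of how the price can show up once a second application arrives. One writer emits records with no declared schema, and each reader carries its own interpretation. The record contents, field names and reader functions are all hypothetical.

```python
import json

# Hypothetical raw events, written once with no declared schema.
RAW_LINES = [
    '{"id": 1, "amount": "12.50", "ts": "2015-02-01"}',
    '{"id": 2, "amt": 7, "ts": "2015-02-02"}',  # the writer renamed a field
]


def billing_reader(line: str) -> float:
    """First application: assumes the field is always called 'amount'."""
    record = json.loads(line)
    return float(record.get("amount", 0.0))


def reporting_reader(line: str) -> float:
    """Second application: written later, knows about both field names."""
    record = json.loads(line)
    return float(record.get("amount", record.get("amt", 0.0)))


billing_total = sum(billing_reader(line) for line in RAW_LINES)
reporting_total = sum(reporting_reader(line) for line in RAW_LINES)
```

Both readers run without error on the same data and produce different totals, because the schema now lives in two places that have drifted apart.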

Page 35: Everything has changed except us

Why is the choice no schema or hard schema?

Simple key‐value files give you flexibility in some areas. Tables give you flexibility in other areas.

Which area do you need flexibility in and why?

Programs writing data?

Files Tables

Programs processing data?

Programs reading data?

Why not flexible schemas instead of either-or?
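
One possible reading of a flexible schema, sketched under assumptions: a small typed core that every reader can rely on, plus an open area the writer can extend without breaking anyone. The core field names (`event_id`, `event_time`) and the `FlexibleRecord` type are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

REQUIRED = {"event_id": int, "event_time": str}  # hypothetical core contract


@dataclass
class FlexibleRecord:
    event_id: int
    event_time: str
    extras: Dict[str, Any] = field(default_factory=dict)


def parse(raw: Dict[str, Any]) -> FlexibleRecord:
    """Enforce types on the core fields; carry everything else through untouched."""
    for name, expected in REQUIRED.items():
        if not isinstance(raw.get(name), expected):
            raise ValueError(f"core field {name!r} missing or not {expected.__name__}")
    extras = {k: v for k, v in raw.items() if k not in REQUIRED}
    return FlexibleRecord(raw["event_id"], raw["event_time"], extras)


record = parse({"event_id": 7, "event_time": "2015-02-01T10:00:00",
                "sensor": {"temp": 21.5}})
```

Programs writing data keep the freedom to add nested or new fields, while programs reading data get a stable, typed core.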

Page 36: Everything has changed except us

“We can't solve problems by using the same kind of thinking we used when we created them.”

Albert Einstein

Page 37: Everything has changed except us

With too much data the approach has to be inverted

The process we still use:

1. Model

2. Collect

3. Analyze

The new process is:

1. Collect

2. Analyze

3. Model

4. Promote

This is a shift from planned design to evolutionary design for the data warehouse
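
The inverted process can be sketched in a few lines: collect records as they come, analyze what is actually in them, and derive a candidate model from the evidence. The function names, the 90% coverage threshold and the sample records are assumptions made for illustration; promotion into a managed schema is left as a comment.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, Iterable


def profile(records: Iterable[Dict[str, Any]]) -> Dict[str, Counter]:
    """Analyze: count the value types observed for each field."""
    seen: Dict[str, Counter] = defaultdict(Counter)
    for record in records:
        for name, value in record.items():
            seen[name][type(value).__name__] += 1
    return seen


def propose_model(seen: Dict[str, Counter], total: int,
                  min_coverage: float = 0.9) -> Dict[str, str]:
    """Model: keep fields present in most records that have a single type."""
    return {
        name: next(iter(types))
        for name, types in seen.items()
        if len(types) == 1 and sum(types.values()) / total >= min_coverage
    }


# Collect: hypothetical records stored as they arrived, with no model up front.
collected = [
    {"id": 1, "temp": 20.5, "note": "ok"},
    {"id": 2, "temp": 21.0},
    {"id": 3, "temp": "n/a"},
]
candidate = propose_model(profile(collected), total=len(collected))
# Promote: fields in `candidate` are the ones worth moving into a managed schema.
```

Here only `id` qualifies; `temp` has mixed types and `note` is too sparse, so both stay in the collected layer until the data or the need justifies modeling them.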

Page 38: Everything has changed except us

The solution to our problems isn’t necessarily technology, it’s architecture.

Page 39: Everything has changed except us

Workloads

                 OLTP                 BI              Analytics
Access           Read‐Write           Read‐only       Read‐mostly
Predictability   Predictable          Unpredictable   Fixed path
Selectivity      High                 Low             Low
Retrieval        Low                  Low             High
Latency          Milliseconds         < seconds       msecs to days
Concurrency      Huge                 Moderate        1 to huge
Model            3NF, nested object   Dim, denorm     BWT
Task size        Small                Large           Small to huge

Page 40: Everything has changed except us

DATA ARCHITECTURE

We’re so focused on the light switch that we’re not talking about the light

Page 41: Everything has changed except us

Decoupled Data Architecture

The core of the data warehouse isn’t the database, it’s the data architecture that the database and tools implement.

We need a data architecture that is not limiting:

▪ Deals with data and schema change easily

▪ Does not always require up front modeling

▪ Does not limit the format or structure of data

▪ Assumes a full range of data latencies, from streaming to one‐time bulk loads, both in and out

Page 42: Everything has changed except us

Food supply chain: an analogy for data

Multiple contexts of use, differing quality levels

Page 43: Everything has changed except us

Integrate

Manage

Decouple data architecture layers

Use

This implies a new warehouse architecture and data modeling approaches

Collect

Transactions Observations Declarations

Page 44: Everything has changed except us

Break down the monolithic architecture

Page 45: Everything has changed except us

The technology architecture must change, based on work done with the data:

▪ Collection separate from

▪ Data management separate from

▪ Data delivery and use

Data may live in more than one place because it may have more than one model, for more than one use, using more than one engine

Page 46: Everything has changed except us

Reinforcing relationships keep architectures from changing, despite radical technology shifts

Note how only one third is tech

Architectural Regime: Methodology, Technology, Organization

Organization defines where the work is done and the roles.

Technology defines what work can be done in a given area.

Methodology defines how work is done and what that work is.

Page 47: Everything has changed except us

Agile architectures without agile methods fail

Page 48: Everything has changed except us

How can you move to a more agile architecture?

Start by deploying faster.

Things will break.

You will fix them.

You will get better.

So will your architecture.

Page 49: Everything has changed except us

The geography we have been using is out of date

The box we created:

• not any data, rigidly typed data

• not any form, tabular rows and columns of typed data

• not any latency, persist what the DB can keep up with

• not any process, only queries

The digital world was diminished to only what’s inside the box until we forgot the box was there.

Page 50: Everything has changed except us

Data infrastructure is a platform

▪ Any data – structures, forms

▪ Any latency – in motion, at rest

▪ Any process – query, algorithm, transform

▪ Any access – SQL, API, queue, file movement

Page 51: Everything has changed except us

Don’t follow the market

Some people can’t resist getting the next new thing because it’s new and new is always better.

Many IT organizations are like this, promoting a solution and hunting for the problem that matches it.

Better to ask “What is the problem for which this technology is the answer?”

Page 52: Everything has changed except us

Think like an architect, not like a consumer

No more “enterprise standard” ‐ now “what works”

The technology providers are selling you what they have, not what you need.

Follow the goals of the business.

Translate the goals into capabilities and match those to the architecture required.

Page 53: Everything has changed except us

“The future, according to some scientists, will be exactly like the past, only far more expensive.” ~ John Sladek

Page 54: Everything has changed except us

CC Image Attributions

Thanks to the people who supplied the creative commons licensed images used in this presentation:

round hole square peg ‐ https://www.flickr.com/photos/epublicist/3546059144
firemen not noticing fire.jpg ‐ http://flickr.com/photos/oldonliner/1485881035/
pyramid_camel_rider.jpg ‐ http://www.flickr.com/photos/khalid‐almasoud/1528054134/
House on fire ‐ http://flickr.com/photos/oldonliner/1485881035/
glass_buildings.jpg ‐ http://www.flickr.com/photos/erikvanhannen/547701721
Circos, Hierarchical Edge Bundles: Visualization of Adjacency Relations in Hierarchical Data, Danny Holten
text composition ‐ http://flickr.com/photos/candiedwomanire/60224567/
Building demolition ‐ https://www.flickr.com/photos/gregpc/4429888820
peek_fence_dog.jpg ‐ http://www.flickr.com/photos/webwalker/114998078/
donuts_4_views.jpg ‐ http://www.flickr.com/photos/le_hibou/76718773/
shady_puppy_sales.jpg ‐ http://www.flickr.com/photos/brizzlebornandbred/5001120150
subway dc metro ‐ http://flickr.com/photos/musaeum/509899161/

Page 55: Everything has changed except us

About the Presenter

Mark Madsen is president of Third Nature, a technology research and consulting firm focused on business intelligence, data integration and data management. Mark is an award‐winning author, architect and CTO whose work has been featured in numerous industry publications. Over the past ten years Mark received awards for his work from the American Productivity & Quality Center, TDWI, and the Smithsonian Institute. He is an international speaker, a contributor to Forbes Online and a member of the O’Reilly Strata program committee. For more information or to contact Mark, follow @markmadsen on Twitter or visit http://ThirdNature.net

Page 56: Everything has changed except us

About Third Nature

Third Nature is a research and consulting firm focused on new and emerging technology and practices in analytics, business intelligence, information strategy and data management. If your question is related to data, analytics, information strategy and technology infrastructure then you’re at the right place.

Our goal is to help organizations solve problems using data. We offer education, consulting and research services to support business and IT organizations as well as technology vendors.

We fill the gap between what the industry analyst firms cover and what IT needs. We specialize in product and technology analysis, so we look at emerging technologies and markets, evaluating technology and how it is applied rather than vendor market positions.