4 Common Mistakes to Avoid When Optimizing Your Data Warehouse
Work within the data warehouse’s inherent limitations—not against them.


In some of the beautiful, ancient cities of the world, there are two cities: the modern, bustling city above ground and the city of archaic, layered architecture underneath. That’s because when buildings were rebuilt in ancient times, workers tore down the old structures and used the original material to form the foundations of the new buildings. Often, old buildings were stripped of their roofs then filled with debris to make solid foundations for new buildings or even entire neighborhoods. As a result, for example, there are places in Rome where the layers and remnants of the original Roman civilization stretch as deep as 60 feet below ground.

Many companies today have a corporate data infrastructure that resembles one of these ancient cities, built layer upon layer. Thousands of users rely on often hard-to-access data stored in complex systems to get the information they need to run their company, their department, their research efforts, their market research, their financial closings, and on and on. Yet with each layer, getting to trusted information easily and effectively becomes more and more difficult.

At the heart of this layered maze of hardware, software, data repositories, and thousands of stored reports sits your legacy data warehouse. Internal and external data volumes have skyrocketed, and—like the layered architecture of yesteryear—the more things you add to your data architecture, the more complex it and your data warehouse can become. But what was state-of-the-art technology 20 years ago is now in trouble.

• The data warehouse team is besieged with new requests they can’t meet on a timely basis.

• Users frequently want disparate data added to their analyses.

• Many users—out of sheer desperation—load data extracts into Excel so they can try to figure out what’s going on using whatever means are at their disposal.

• Ongoing support costs are staggering in terms of staff, licensing, and maintenance.

• Rapidly changing user complexity and specific user needs simply could not have been predicted when data warehousing foundations were built.

A lot of IT teams would like to walk away from all of the complexity and start over with a new approach. But for those companies with data warehouse strategies too deeply embedded to abandon altogether, there is a way to squeeze more performance from existing data warehouses.

First, here are four common mistakes you definitely want to avoid.

Data Warehouses: A Complexity of Layers


Squeezing More Performance from a Data Warehouse—Don’t Make These Four Common Mistakes

MISTAKE #1
Assuming a data warehouse appliance is the answer.

MISTAKE #2
Believing the cloud’s low price and “elastically scaling data stores” will solve everything.

MISTAKE #3
Thinking open-source, Big Data technologies will get you where you need to be.

MISTAKE #4
Believing continual data model and Extract, Transform, Load (ETL) tuning will give you the performance gains you need.

Let’s discuss each mistake in more detail.


MISTAKE #1
Assuming a data warehouse appliance is the answer.

Many companies concerned about rapidly growing data stores and complex performance issues turn to a data warehouse appliance. These finely tuned and very expensive systems can be tempting, but the problem is this: appliances attempt to solve your problems with brute force—massive hardware resources.

An appliance might reduce the time required to run queries against billions of rows, but it does so only through extensive hardware and memory configurations—it doesn’t get to the heart of performance problems caused by the overall data warehouse architecture.

The costs for these systems, the needed data migrations, and the corresponding managed services are also staggering, even for vendors who have built sophisticated systems running on commodity hardware. Many IT leaders who ventured down this path still find their data warehouse performance and data access needs unfulfilled.


MISTAKE #2
Believing the cloud’s low price and “elastically scaling data stores” will solve everything.

In many cases, the cloud solves real problems, especially around data volumes and scaling out computational power. And no one can dispute that the cloud is where almost everything is headed.

But moving your struggling data warehouse to the cloud merely results in a cloud-based, struggling data warehouse.

Modernizing your enterprise data warehouse requires many components—including architecture, management processes, and user-vetted requirements—and the cloud addresses only the platform side of the equation. Plus, migrating to the cloud is no easy feat. It involves many steps for many components: ETL, the data itself, data pipelines and connections, metadata, users, and applications.

If, despite all this, you’re considering migrating your data warehouse to the cloud to reduce costs, know that many companies find cloud storage and processing costs add up quickly, resulting in bills far higher than anticipated.
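The cost surprise is easy to reproduce with back-of-envelope arithmetic. The sketch below is illustrative only: the storage rate, credit price, and credit-burn figures are assumptions invented for the example, not any vendor’s actual price list.

```python
# Back-of-envelope cloud data warehouse cost sketch.
# All rates below are illustrative assumptions, not real vendor pricing.
STORAGE_PER_TB_MONTH = 23.0   # assumed $/TB/month for warehouse storage
COMPUTE_PER_CREDIT = 3.0      # assumed $/compute-credit
CREDITS_PER_HOUR = 4          # assumed credits burned per warehouse-hour

def monthly_cost(storage_tb, compute_hours):
    """Estimate one month's bill: storage charge plus compute charge."""
    storage = storage_tb * STORAGE_PER_TB_MONTH
    compute = compute_hours * CREDITS_PER_HOUR * COMPUTE_PER_CREDIT
    return storage + compute

# A "small" pilot: 5 TB, warehouse running ~176 hours in a month.
pilot = monthly_cost(5, 176)
# The same workload after data volume and usage double.
grown = monthly_cost(10, 352)
print(f"pilot: ${pilot:,.0f}/month, grown: ${grown:,.0f}/month")
```

Note that compute, not storage, dominates the bill in this sketch, which is why “cheap storage” headlines can mislead: the charge that scales with query activity is the one that grows fastest.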


MISTAKE #3
Thinking open-source, Big Data technologies will get you where you need to be.

Most data warehouses fail altogether, or fail to deliver all of their promised benefits, when it comes to performance and analytics. Big Data project results are even worse—Gartner estimates 85 percent of Big Data projects fail to move past preliminary stages.1

For instance, Hadoop was never designed to be a data warehouse, yet eager data architects and scientists experiment with it on projects such as data lakes. They soon run into problems, because these architectures lack discipline in critical areas: integrating outside data sources, reducing reporting stress on production systems, data security, historical analysis, data governance, user-friendly data structures and schemas, and—ultimately—delivering a single version of the truth.

There’s also a misconception that a Big Data approach like Hadoop is less expensive. While some data operations in Hadoop can indeed cost less, building an entire data warehouse and analytics solution on it can ultimately be far more expensive due to the cost of writing complex queries and analyses. And remember, Hadoop:

• requires new headcount with specialized, scarce, and expensive skills;

• often requires new reporting tools that end users might not welcome;

• is not a database management system, so you will need a whole new set of tools to get it to do what you want;

• is very complex, requiring many external technologies to make it work; and

• performs poorly on complex queries, so improving performance will require many additional commercial and open-source systems.

The lure of a Big Data approach might be enticing, but beware the risky, expensive, and complex architecture you’d need to embrace, with no guarantee your data warehouse performance and analytics objectives will ever be achieved.


MISTAKE #4
Believing continual data model and ETL tuning will give you the performance gains you need.

Brilliant, dedicated data modelers have built and maintained many thousands of star and snowflake schemas, yet some of them describe those beautifully engineered models as “the curse of data analytics” because the models often limit analytics results: users can’t see what they want to see and can’t get the level of detail they need. And when those limitations become an insurmountable barrier to much-needed business insights, or new, disparate data needs to be loaded into the analytics environment, the data models must go back to the drawing board to be tuned once again.

Since analytics is a process, not an IT project, data modeling and database tuning needs just keep growing—costing companies millions of dollars, and anxious users millions of minutes spent awaiting results. It’s a lose-lose situation.

Then, of course, there’s ETL. ETL is a complex, ever-changing process that must be constantly tuned by specially trained ETL teams. But this constant tuning is merely required maintenance—it’s not a viable strategy for boosting data warehouse performance, because ETL tuning is typically a very manual process in which a lot of time and effort yields only incrementally small results.

Given these data model and ETL challenges, performance issues can quickly emerge when companies need to scale up their platforms to handle massively increasing data volumes, complexity, and disparate data. Often, improved modeling and ETL processes simply don’t scale. As a result, this type of tuning will consume all of your resources—both people and money.
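To make the kind of manual ETL tuning described above concrete, here is a minimal, hypothetical sketch of one classic optimization: an incremental extract that pulls only rows changed since a high-water mark instead of reloading everything. The record layout and field names are invented for the example.

```python
# Sketch of an incremental extract with a high-water mark.
# Rows are plain dicts; "updated_at" is an assumed change-tracking column.
def extract_incremental(rows, last_watermark):
    """Return rows modified after the watermark, plus the new watermark."""
    changed = [r for r in rows if r["updated_at"] > last_watermark]
    new_watermark = max(
        (r["updated_at"] for r in changed), default=last_watermark
    )
    return changed, new_watermark

orders = [
    {"id": 1, "updated_at": 100},
    {"id": 2, "updated_at": 205},
    {"id": 3, "updated_at": 310},
]
changed, wm = extract_incremental(orders, last_watermark=200)
print(len(changed), wm)  # 2 changed rows; watermark advances to 310
```

Each such tweak helps, but it is exactly the incremental, hand-maintained work the section above describes: necessary upkeep, not a strategy that compounds into large performance gains.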


A Powerful Alternative—Boost Performance Now by Pairing Your Data Warehouse with a Modern Analytics Platform

Ripping and replacing your data warehouse can be a scary undertaking if you’re not ready for it. Many organizations prefer to first find ways to deliver analytics projects more quickly using their existing data warehouse—a business-user-focused approach to boosting performance. But how do you do this while avoiding the four common mistakes outlined above?

Modern analytics platforms such as Incorta enable you to access and analyze data directly from source data models—including your existing data warehouse—which immensely speeds the development and updating of analytics projects. Because the platform mirrors data directly from the source applications, in its original data model, you no longer need to rework star or snowflake data models whenever new reports or new insights are needed. You can also easily extend beyond existing star schemas to add other data sources whenever needed.

Executing queries on the source data model this way lets analysts access all attributes of the data without any predetermined assumptions. They’re free to explore and uncover trends and details inaccessible within a traditional data modeling approach, while continuing to benefit from snapshots, Type 2 Slowly Changing Dimensions, and semi-additive or non-additive facts in a dimensionalized model. And all of this can be achieved in mere hours, rather than wasting scarce resources on countless design meetings and business requirements analyses.

This approach is much simpler than any of the four mistaken approaches discussed above: it removes unnecessary complexity from the development process while giving you a phenomenal boost in performance.

And when it’s time to re-implement your aged data warehouse in three to five years, you might choose to migrate off of it altogether, instead opting for a new, no-data-warehouse strategy using a modern analytics platform like Incorta.
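For readers unfamiliar with the Type 2 Slowly Changing Dimensions mentioned above, a minimal sketch shows the idea: each dimension row carries an effective date range, so a fact can be joined to the version of the record that was current when the fact occurred. The table and column names are hypothetical, invented for illustration.

```python
# Type 2 Slowly Changing Dimension lookup (illustrative).
# Each dimension row is valid over [valid_from, valid_to).
def scd2_lookup(dim_rows, business_key, as_of):
    """Return the dimension row valid for business_key at time as_of."""
    for r in dim_rows:
        if (r["key"] == business_key
                and r["valid_from"] <= as_of < r["valid_to"]):
            return r
    return None

customer_dim = [
    {"key": "C1", "region": "EMEA", "valid_from": 0, "valid_to": 150},
    {"key": "C1", "region": "APAC", "valid_from": 150,
     "valid_to": float("inf")},
]
# A sale on day 120 rolls up to EMEA; one on day 200 rolls up to APAC.
print(scd2_lookup(customer_dim, "C1", 120)["region"])
print(scd2_lookup(customer_dim, "C1", 200)["region"])
```

The point of the section above is that this kind of historical fidelity need not be lost when querying source data models directly; the date-range mechanics remain available alongside the richer, unmodeled attributes.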


Find out how a Fortune 10 company uses Incorta to bypass data modeling to deliver complex operational reports in only seconds—read the case study.

1 Designing for Analytics, “Failure Rates for Analytics, BI, and Big Data Projects = 75%—Yikes!” Feb. 21, 2018.

[Figure: How Incorta works in conjunction with an existing data warehouse. Sample data sources can be input through a data warehouse (flowing through ETL processes, the data warehouse, and BI tools) or input directly to Incorta.]


Copyright © 2018, Incorta Inc. All rights reserved.

For more information, visit www.incorta.com or email [email protected].