My Ideal Data Warehouse System


Description: A presentation on an ideal data warehouse server and IO subsystem, if money were no object! Of course, it is already obsolete and the prices quoted are too high!

Transcript of My Ideal Data Warehouse System

Page 1: My Ideal Data Warehouse System

My Data Warehouse Dream Machine: Building the Ideal Data Warehouse
Michael R. Ault, Oracle Guru, Texas Memory Systems, Inc.

Introduction

Before we can begin to discuss what is needed in a data warehouse (DWH) system, we need to pin down exactly what a data warehouse is and what it is not. Many companies accumulate a large amount of data, and when they reach a certain threshold of either size or complexity they feel they have a data warehouse. However, in the pure sense of the term, they have a large database, not a data warehouse. In fact, a large non-conforming database masquerading as a data warehouse will probably have more severe performance issues than a database designed from the start to be a data warehouse.

Most experts will agree that a data warehouse has to have a specific structure, though the exact structure may differ. For example, the star schema is a classic data warehouse structure. A star schema uses a central "fact" table surrounded by "dimension" tables. The central fact table contains the key values for each of the dimension tables that correspond to a particular set of dimensions. For example, the sales of pink sweaters in Pittsburgh, PA, on Memorial Day in 2007 are an intersection of the STORES, ITEMS, SUPPLIERS, and DATE dimensions and a central SALES table, as in Figure 1.

Figure 1: Example Star Schema

Page 2: My Ideal Data Warehouse System

Generally speaking, data warehouses will be either of the Star or Snowflake (essentially a collection of related Star schemas) design.
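To make the fact/dimension split concrete, here is a toy star join sketched in Python. The tables and values are hypothetical (invented for illustration); a real warehouse would express this as SQL against the schema in Figure 1.

```python
# Toy star schema: each dimension is keyed by a surrogate key; the
# central fact table holds only dimension keys plus the measure.
stores = {1: {"city": "Pittsburgh", "state": "PA"}}
items = {10: {"item": "pink sweater"}}
dates = {100: {"day": "2007-05-28", "holiday": "Memorial Day"}}

sales_fact = [
    {"store_id": 1, "item_id": 10, "date_id": 100, "qty": 42},
    {"store_id": 1, "item_id": 10, "date_id": 100, "qty": 8},
]

def star_join(fact_rows, **filters):
    """Return fact rows whose joined dimension attributes match the filters."""
    matches = []
    for row in fact_rows:
        # Resolve each dimension key against its dimension table
        attrs = {**stores[row["store_id"]],
                 **items[row["item_id"]],
                 **dates[row["date_id"]]}
        if all(attrs.get(col) == val for col, val in filters.items()):
            matches.append(row)
    return matches

# Sales of pink sweaters in Pittsburgh on Memorial Day:
total = sum(r["qty"] for r in star_join(sales_fact,
                                        city="Pittsburgh",
                                        item="pink sweater",
                                        holiday="Memorial Day"))
print(total)  # 50
```

The point of the structure is visible even in the toy: the fact table stays narrow (keys plus measures), and all descriptive attributes live in the dimensions.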

Of course we need to understand the IO and processing characteristics of the typical DWH system before we describe an ideal system. We must also understand that the ideal system will also depend on the projected size of the DWH. For this paper we will assume a size of 300 gigabytes, which is actually small compared with many companies’ terabyte or multi-terabyte data warehouses.

IO Characteristics of a Data Warehouse

The usual form of access for a data warehouse in Oracle will be through a bitmap join across the keys stored in the fact table, followed by a specific retrieval of both the related data from the dimension table and the data at the intersection of the keys in the fact table. This outside-to-inside (dimension to fact) access path is called a Star Join.

Many times the access to a data warehouse will be through index scans followed by table scans and involve large amounts of scanning, generating large IO profiles. Access will be to several large objects at once: the indexes, the dimensions, and the fact(s) in a DWH system. This access to many items at once will lead to large numbers of input and output operations per second (IOPS).

For example, in a 300 gigabyte TPC-H test (TPC-H is a DSS/DWH benchmark) the IO rate can exceed 200,000 IOPS. A typical disk drive (15K RPM, 32-148 gigabytes, Fibre Channel) will allow a peak of around 200 random IOPS. It is not unusual to see a ratio of provided storage capacity to actual database size of 30-40 to ensure that the required number of IOPS can be reached to satisfy the performance requirements of a large DWH. The IO profile for a 300GB system with temporary tablespace on solid state disks (SSD) is shown in Figure 2.

[Chart "HD IOPS": Perm IO, Temp IO, and Total IO plotted as IOPS (log scale, 1 to 100,000) against elapsed time in seconds (0 to 20,000).]

Figure 2: IOPS for a 300GB TPC-H on Hard Drives

Page 3: My Ideal Data Warehouse System

The same 300GB TPC-H with all tablespaces (data, index, and temporary) on SSD is shown in Figure 3.

[Chart "IOPS-SSD": D-DI, D-TI, and D-Total plotted as IOPS (log scale, 1 to 1,000,000) against elapsed time in seconds (0 to 4,500).]

Figure 3: IOPS from a 300GB TPC-H on SSD

Note that the average IOPS in Figure 2 hover around 1,000-2,000 IOPS with peak loads (mostly to the SSD temporary file) of close to 10,000 IOPS, while the SSD-based test hovers around 10,000 IOPS with peak loads nearing 100,000 or more IOPS. The numbers in the charts in Figures 2 and 3 were derived from the GV$FILESTAT and GV$TEMPSTAT views in a four-node RAC cluster. It was assumed that the raw IOPS as recorded by Oracle underwent a 16-fold reduction due to IO grouping by the HBA and IO interfaces. The hard drive arrays consisted of two 14-disk sets of 15K RPM, 144 GB drives in two RAID5 arrays. The SSD subsystem consisted of a single 1 terabyte RamSan-500, a single 128 GB RamSan-400, and a single 128 GB RamSan-320.
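The chart values come from differencing cumulative counters between samples. A minimal sketch of that calculation follows; the 16-fold grouping factor is the paper's assumption, and the sample counter values are made up for illustration.

```python
def host_iops(prev_count, curr_count, interval_s, grouping_factor=16):
    """Approximate hardware-level IOPS from two snapshots of Oracle's
    cumulative physical IO counters (e.g. PHYRDS + PHYWRTS summed from
    GV$FILESTAT across all files and instances), divided by the assumed
    IO-grouping factor at the HBA/IO-interface level."""
    return (curr_count - prev_count) / interval_s / grouping_factor

# Hypothetical counters sampled 10 seconds apart:
print(host_iops(1_000_000, 4_200_000, 10))  # 20000.0
```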

To get 100,000 IOPS, a modern disk-drive-based system may require up to 500 disk drives, not allowing for mirroring (RAID). To put this in perspective: to properly spread the IOPS in a 300 GB data warehouse (actually close to 600 GB when indexes are added) you will require 300*40 or 12,000 GB (12 terabytes) of storage to meet IO requirements. At 200 IOPS per disk, that maps to 500 disk drives for 100,000 IOPS if no caching or other acceleration technologies are utilized. In actual tests, EMC reached 100,000 IOPS with 495 disks (3 CX30 cabinets' worth) in a RAID1 configuration (http://blogs.vmware.com/performance/2008/05/100000-io-opera.html). The EMC results are shown in Figure 4. Assuming they get linear results by adding disks (and HBAs, cabinets, and controllers), they should be able to get close to 200,000 IOPS with 782 disks, which is pretty close to our 1,000-disk estimate. However, their latency will be close to 15 or more milliseconds if the trend shown in the graph in Figure 4 continues.
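The disk-count arithmetic above is easy to restate as a sketch; all constants are the paper's own figures (a 40x capacity-to-data ratio and 200 random IOPS per 15K drive):

```python
def disks_for_iops(target_iops, iops_per_disk=200):
    """Drives needed to sustain a random-IOPS target with no caching."""
    return -(-target_iops // iops_per_disk)  # ceiling division

db_size_gb = 300
capacity_ratio = 40  # provided-storage-to-database-size ratio

print(db_size_gb * capacity_ratio)   # 12000 GB (12 TB) of raw capacity
print(disks_for_iops(100_000))       # 500 drives
print(disks_for_iops(200_000))       # 1000 drives
```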

Page 4: My Ideal Data Warehouse System

Figure 4: EMC IOPS and Latency (From: http://blogs.vmware.com/performance/2008/05/100000-io-opera.html)

Since most database systems such as Oracle will use a standard IO profile, increasing the amount of data in the warehouse means we must increase the available IOPS and bandwidth as the database grows. Most companies project that their data warehouse will double in size within 3 years or less.

Of course the number of IOPS will determine the response time for your queries. If you can tolerate high response times then your IOPS can be lower; conversely, if you need low response times then your IOPS must be higher. In today's business environment, the sooner you can get answers to the typical DWH queries, the sooner you can make strategic business decisions. This leads to the corollary that the DWH should have the highest possible IOPS. Figure 5 shows a comparison of IO latency between various SAN systems.

Figure 5: IOPS Comparison for Various SANs (Source: www.storageperformance.org)

Page 5: My Ideal Data Warehouse System

As you can see, the IOPS and latency numbers from an SSD-based SAN are better than those for more expensive hard disk based SANs. Even with disk form-factor SSDs, an SSD system designed from the ground up for performance is still superior. As shown in Figure 6, the latency from the new EMC Flash based drives still cannot compete with SSDs built from the start to perform.

Figure 6: EMC SSD Response Time: 1-2 ms; EMC HDD Response Time: 4-8 ms (Source: "EMC Tech Talk: Enterprise Flash Drives", Barry A. Burke, Chief Strategy Officer, Symmetrix Product Group, EMC Storage Division, June 25, 2008)

Processing Characteristics of a Data Warehouse System

DWH systems usually provide summarizations of data: total sales, total uses, or the number of people doing X at a specific point in time and place. This use of aggregation in a DWH system generally leads to requirements for large amounts of sort memory and temporary tablespace area. Thus the capability to rapidly sort, summarize, and characterize data is a key component of a data warehouse system.

In all TPC-H tests we see the use of large numbers of CPUs, large core memories, and parallel query and partitioning options to allow DWH systems to process the large amounts of data. Most TPC-H tests are run using clustered systems of one type or another. For a 300 GB TPC-H we usually see a minimum of 32 CPUs spread evenly amongst several servers.

Technologies such as blade servers offer great flexibility but also tie us to a specific vendor and blade type. In addition, you will eventually be limited by the blade system

Page 6: My Ideal Data Warehouse System

enclosure's capacity for expansion, due to the underlying bus structures of the blade cabinet backplane.

What Have We Found So Far?

So far we have defined the following general characteristics for an ideal DWH system:

1. Large data storage capacity
2. Able to sustain large numbers of IOPS
3. Able to support high degrees of parallel processing (supports large numbers of CPUs)
4. Large core memory for each server/CPU
5. Easy to increase data size and IOPS capacity
6. Easy to increase processing capability

The above requirements call for the following in an Oracle environment:

1. Real Application Clusters
2. Partitioning
3. Parallel Query

Given the Oracle requirements, the system server requirements would be:

1. Servers with multiple high-speed CPUs
2. Multiple servers
3. A high-speed interconnect, such as InfiniBand, between the servers
4. Multiple high-bandwidth connections to the IO subsystem

Given the IO subsystem requirements, the IO subsystem should:

1. Be easily expandable
2. Provide high numbers of low-latency IOPS
3. Provide high bandwidth

Notice we haven’t talked about network requirements for a DWH system. Generally speaking DWH systems will have a small number of users in comparison to an online transaction processing system, so a single 1 Gigibit Ethernet type connection is generally sufficient for user access.

Software

Of course, since we are working with Oracle it is assumed that we will stay with Oracle. But for long-term planning, the idea that we might one day move away from Oracle should be entertained. Therefore our system should support multiple solutions, should the need arise. In the days when processors seemed to be increasing in speed on a daily basis and we were jumping from 8- to 16- to 32- to 64-bit processing, the idea of keeping a system

Page 7: My Ideal Data Warehouse System

much beyond three years was virtually unheard of, unless it was a large SMP machine or a mainframe.

While processors are still increasing in speed, we aren't seeing the huge leaps we used to. Now we are seeing the core wars: each manufacturer seems to be placing more and more cores in a single chip footprint. Note the dual- and quad-core chips already available. Of course, it seems that as the number of cores on a single chip increases, the speed of the individual cores actually decreases. For example, a single-core processor chip can run at 4 GHz while a dual-core chip may only run at 2 GHz per core. However, software will usually take advantage of the CPUs offered, so as far as the CPUs and their related servers are concerned, usually just by choosing the best high-speed processors on a supported platform we can run almost any software our operating system will support.

Of course disk-based systems used to be fairly generic and only required reformatting or erasure of existing files to be used for a new database system. Now we are seeing the introduction of database specific hardware such as the Exadata cell from Oracle that requires Oracle parallel query software at the cell level in order to operate. Needless to say, using technology that locks you into a specific software vendor may be good for the software vendor but it may not be best in the long run for a company that buys it.

Let’s Build the Ideal SystemLet’s start with the servers for the ideal system.

Data Warehouse Servers

We want the system to be flexible as far as upgrades are concerned. While blade systems have a lot to offer, they lock you into a specific blade cabinet and blades, so we will use individual rack mount servers instead. The use of individual rack mount servers gives us the flexibility to change our servers without having to re-purchase support hardware such as the blade cabinet and other support blades.

The server I suggest is the Dell PowerEdge R905. The R905 supports four quad-core Opteron 8393™, 3.1 GHz processors, arguably the fastest quad-core chips and all-around best processors available for the money. The complete specifications are shown in Appendix A for the suggested quad-socket, 16-core configuration, which includes a dual 1 Gb NIC and 2 dual-channel 4 Gb Fibre Channel connections. Also included is a 10 GbE NIC for the Real Application Clusters crossconnect. Since we will want the capability to parallelize our queries, we will also want more than 16 CPUs, so for our ideal configuration I suggest starting at 2 servers, giving us 32 cores at 3.1 GHz. To maximize our SGA sizes for the Oracle instances, I suggest the 32 gigabyte memory option with the fastest memory available. With currently available pricing, this 2-server configuration will cost just under $35K with 3 years of maintenance.

IO Subsystem

Call me a radical, but rather than go the safe route and talk about a mixed environment of disks and solid state storage, I am going to go out on a limb and propose that all active storage be on a mix of Flash and DDR memory devices. We will have disks, but they will

Page 8: My Ideal Data Warehouse System

be in the backup system. Figure 7 shows the speed/latency pyramid with Tier Zero at the peak.

Figure 7: The Speed/Latency Pyramid

First, let’s look at what needs to be the fastest, lowest latency storage, Tier Zero.

Tier Zero

As a Tier Zero device I propose a RAM-based solid state system such as the RamSan-440 to provide storage for temporary tablespaces, redo logs, and undo segments, as well as any indexes that will fit after the write-dependent objects have been placed. The prime index candidates for the Tier Zero area would be the bitmap indexes used to support the star or snowflake schema for fast star join processing. I propose a set of four RamSan-440s with 512 gigabytes of available storage each in a mirrored set to provide us with a 1 terabyte Tier Zero storage area. At current costs this would run $720K. The RamSan-440 provides up to 600,000 IOPS at 0.015 millisecond (15 microsecond) latency. Now let's move on to the next fastest storage, Tier 1.

Tier 1

Tier 1 storage will hold the data and index areas of the data warehouse. Tier 1 of this ideal system will be placed on Flash memory. A Flash system such as the RamSan-620 provides up to 5 terabytes of Flash with RAM-based cache in front of it to enhance write performance.

We would utilize two 2 TB RamSan-620s in a mirrored configuration. With our 300 gigabytes of data and around 250 gigabytes of indexes, this provides 2 terabytes of mirrored space, allowing for growth and reliability. At current costs this would be $202K (2 TB option with 3 years of maintenance and 1 extra dual-port FC card).

Page 9: My Ideal Data Warehouse System

The RamSan-620 provides 250,000 IOPS with a worst-case 0.25 millisecond read latency.

Assuming we could add enough HBAs, we can achieve 2.9 million low latency IOPS from our Tier 0 and Tier 1 systems using the above configuration.
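The aggregate IOPS claim is just the sum of the per-unit ratings quoted above; as a quick check:

```python
# Per-unit IOPS ratings quoted in the text: (unit, count, IOPS each)
tiers = [
    ("RamSan-440 (Tier Zero)", 4, 600_000),
    ("RamSan-620 (Tier 1)",    2, 250_000),
]

total_iops = sum(count * iops for _, count, iops in tiers)
print(f"{total_iops:,}")  # 2,900,000
```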

Tier 2

Tier 2 storage would be our backup area. I suggest using compression and de-duplication hardware/software to maximize backup capability while minimizing the amount of storage needed. The manufacturer that comes to the top of the pile for this type of backup system is DataDomain. The DD120 system would fulfill our current needs for backup on this project system. The list price for the DataDomain DD120 appliance is $12.5K.

All of this tier talk is fine, but how are we going to hook it all together?

Switches

As a final component we need to add in some Fibre Channel switches, probably four 16-port 4 Gb switches to give us redundancy in our pathways. A QLogic SANbox 5600Q provides sixteen 4 Gb ports. Four 5600Qs would give us the needed redundancy and provide the needed number of ports at a cost of around $3,275.00 each, for a total outlay on SAN switches of $13.1K. The cost of the XG700 10 gigabit Ethernet 16-port switch from Fujitsu is about $6.5K, so our total outlay for all switches is $19.6K.

Final Cost of the Dream MachineLet’s run up the total bill for the data warehouse dream machine:

Servers:     $36,484.00
RamSan-440: $720,000.00
RamSan-620: $202,000.00
DataDomain:  $12,500.00
Switches:    $19,600.00
Misc.:        $1,500.00 (cables, rack, etc.)
Total:      $992,084.00
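As a sanity check on the total, summing the line items:

```python
# Line items from the bill above
costs = {
    "Servers":    36_484,
    "RamSan-440": 720_000,
    "RamSan-620": 202_000,
    "DataDomain": 12_500,
    "Switches":   19_600,
    "Misc":       1_500,   # cables, rack, etc.
}
total = sum(costs.values())
print(f"${total:,}.00")  # $992,084.00
```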

So for $992K we could get a data warehouse system capable of over 2,000,000 IOPS with 32 3.1 GHz CPUs, a combined memory capacity of 64 gigabytes, and an online available storage capacity of 3 terabytes of low-latency storage that is database and (generally speaking) OS agnostic, expandable, and provides its own backup, de-duplication, and compression services. Not bad.

What about Oracle?

I am afraid Oracle licenses are a bit confusing, depending on what existing licenses you may have, the time of year, where you are located, and how good a negotiator your buyer is.

Page 10: My Ideal Data Warehouse System

The best I can do is an estimate based on sources such as TPC-H documents (www.tpc.org). A setup similar to what we have outlined with RAC and Parallel Query will cost about $450K as of June 3, 2009 for three years of maintenance and the initial purchase price. I took out the advanced compression option since we really don't need it.

So adding Oracle licenses into the cost brings our total for the system and software to slightly less than 1.5 million dollars ($1,442,084.00).

How does this compare to the Oracle Database Machine?

To match the ideal solution's IOPS would require 741 Exadata cells at $18 million hardware cost and $44 million software cost, plus support and switches. Since that would be an unfair comparison, we will use data volume instead of IOPS, even though it puts the ideal solution at a disadvantage.

The Exadata-based ODM has a quoted hardware price of $600K; for a full system with license costs it could run anywhere from 2.3 to over 5 million dollars. However, this is for a fully loaded 14-42 terabyte usable capacity machine. We could probably get by with 2 Exadata cells offering 1-2 terabytes of storage each on the storage side. This would provide 2 terabytes of high-speed (with the 1TB disks) mirrored Tier Zero and Tier 1 space. We would still need a second set of Exadata cells or some other solution for the backup Tier 2. Each cell with the high-speed disks is only capable of 2,700 IOPS, so our total IOPS will be 5,400.
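The cell counts above follow directly from the 2,700 IOPS-per-cell figure quoted in the text:

```python
import math

cell_iops = 2_700  # Exadata cell with high-speed disks, per the text

# Cells required to match the ideal solution's ~2,000,000 IOPS:
print(math.ceil(2_000_000 / cell_iops))  # 741

# IOPS delivered by the proposed two-cell storage configuration:
print(2 * cell_iops)  # 5400
```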

Essentially we would be using a little more than a quarter-size ODM, cutting back to four 8-CPU servers from the full-size ODM's total of eight 8-CPU servers, and using only 4 instead of 14 Exadata cells (2 for the system, 2 for backup).

The best price estimates I have seen come from the website http://www.dbms2.com/2008/09/30/oracle-database-machine-exadata-pricing-part-2/, which has been blessed by various ODM pundits such as Kevin Closson. Table 1 summarizes the spreadsheet found at that location. Note that I have added in new costing figures from TPC-H documents, which may be lower than posted prices for the per-disk license costs for the Exadata cells and the Oracle software. The actual price is somewhere between what I have here and essentially double the license cost per disk for the Exadata cells, taking their total from 240K to 480K. The general Oracle software pricing is based on a 12-named-user license scheme rather than per processor. The additional cost of support for the Exadata cells is somewhere between $1,100-2,200 per cell, so that adds an additional $12-24K to the three-year cost.

Config                               Partial ODM   Full ODM

Exadata Servers                          4             14
Small DB Servers (4 core)                0              0
Medium DB Servers (8 core)               4              8
Large DB Servers (16 core)               0              0

Page 11: My Ideal Data Warehouse System

Total Servers                            4              8
Total Cores                             32             64

1 Exadata Server cost                24,000         24,000
Total Storage cost                   96,000        336,000
1 DB server cost                     30,000         30,000
Total DB servers cost               120,000        240,000
Other items                          50,000         74,000
Total HW price                      266,000        650,000

Software price
Exadata server software per drive     5,000          5,000
Exadata server software per cell     60,000         60,000
Total Exadata server software       240,000        840,000

Oracle Licenses
Oracle database, enterprise edition  11,875         11,875
RAC option                            5,750          5,750
Partitioning option                   2,875          2,875
Advanced compression option           2,875          2,875
Tuning pack                           3,500          3,500
Diagnostics pack                      3,500          3,500
Total price per processor            30,375         30,375
After dual-core discount (50%)       15,188         15,188

Oracle License Cost                $486,016       $972,032

Total Software price               $725,760     $1,812,032
Total System price                 $992,769     $2,902,032

Table 1: ODM Solution Cost Projections

So it appears that the Exadata solution will cost about $450K less initially, for far fewer total IOPS (5,400 versus 2,000,000), with higher latency (5-7 ms versus 0.015-0.25 ms) and less capable servers. However, some of the latency issues are mitigated by the advanced query software that is resident in each cell.

When looking at costing you need to remember that support for the Exadata cell software is going to be paid yearly in perpetuity, so you need to add that cost into the overall picture.

Green Considerations

The energy consumption of an Exadata cell is projected to be 900 watts. For four cells that works out to 3.6 kW, compared to 600 W for each of the RamSan-440 systems and about 325 W for each RamSan-620, for a total of 3.05 kW. Over one year of operation the difference in energy and cooling costs could be close to $2K all by itself. Once all

Page 12: My Ideal Data Warehouse System

of the license and long term ongoing costs are considered, the ideal solution provides dramatic savings.
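The wattage comparison is simple arithmetic on the figures quoted above; the per-unit watts are the paper's, and the kWh-per-year difference is the only derived number (dollar impact then depends on local energy and cooling rates):

```python
exadata_kw = 4 * 900 / 1000             # four Exadata cells at 900 W each
ramsan_kw = (4 * 600 + 2 * 325) / 1000  # four RamSan-440s plus two RamSan-620s

# Energy difference over a year of continuous operation, before cooling
delta_kwh_per_year = (exadata_kw - ramsan_kw) * 24 * 365
print(exadata_kw, ramsan_kw, round(delta_kwh_per_year))  # 3.6 3.05 4818
```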

Note that aggressive marketing techniques from Oracle sales may do much to reduce the hardware and initial licensing costs for the Exadata solution; however, the ongoing costs will still play a significant factor. Expansion of the Exadata solution is by sets of 2 cells for redundancy. The ideal solution can expand to 5 terabytes on each of the RamSan-620s by adding cards. Additional RamSan-620s can be added in mirrored pairs of 2 TB base units.

You must use ASM and RMAN as a backup solution with the Exadata Database Machine. If your projected growth exceeds the basic machine we have proposed then you will have to add in the cost of more Exadata Cells and associated support costs, driving the Exadata solution well above and beyond the RamSan solution in initial and ongoing costs.

Remember that with the Exadata solution you must run Oracle11g version 11.1.0.7 at a minimum and for now it is limited to the Oracle supported Linux OS, so you have also given up the flexibility of the first solution.

Score CardLet’s sum up the comparison between the ideal system and the Oracle Data Warehouse Machine. Look at the chart in Table 2.

Consideration       Ideal Configuration   Oracle DWHM

OS Flexible         Yes                   No
DB Flexible         Yes                   No
Expandable          Yes                   Yes
High IO Bandwidth   Yes                   Yes
Low Latency         Yes                   No
High IOPS           Yes                   No
Initial cost        Higher                Best
Long term cost      Good                  Poor

Table 2: Comparison of Ideal with Oracle DWHM

From the chart in Table 2 we can see that the ODM is only on par with the ideal system on a few of the total considerations. However, the ODM does offer good flexibility and expandability as long as you stay within the supported OS and with Oracle11g databases. The ODM also offers better performance than standard hard disk arrays within its limitations.

Summary

In this paper I have shown what I would consider to be the ideal data warehouse system architecture. One thing to remember is that this type of system is a moving target, as

Page 13: My Ideal Data Warehouse System

technologies and definitions of what a data warehouse is supposed to be change. An ideal architecture allows high IO bandwidth, low latency, the capability for a high degree of parallel operations, and flexibility as far as future database systems and OSs are concerned. The savvy system purchaser will weigh all factors before selecting a solution that may block future movement to new OSs or databases as they become available.

Page 14: My Ideal Data Warehouse System

Appendix A: Server Configuration

PowerEdge R905
Starting Price: $20,224
Instant Savings: $1,800
Subtotal: $18,424
Preliminary Ship Date: 7/30/2009
Quote Date: 7/23/2009 9:36:05 AM Central Standard Time
Catalog Number: 4 Retail 04

Configuration (description, product code, qty, SKU):
- PowerEdge R905: 2x Quad Core Opteron 8393SE, 3.1GHz, 4x512K Cache, HT3 (90531S, 1, [224-5686])
- Additional Processor: Upgrade to Four Quad Core Opteron 8393SE 3.1GHz (4PS31, 1, [317-1156])
- Memory: 32GB Memory, 16x2GB, 667MHz (32G16DD, 1, [311-7990])
- Operating System: Red Hat Enterprise Linux 5.2AP, FI x64, 3yr, Auto-Entitle, Lic & Media (R52AP3, 1, [420-9802])
- Backplane: 1x8 SAS Backplane, for 2.5 Inch SAS Hard Drives only, PowerEdge R905 (1X825HD, 1, [341-6184])
- RAID Controller: Internal PERC RAID Controller, 2 Hard Drives in RAID 1 config (PRCR1, 1, [341-6175][341-6176])
- Primary Hard Drive: 73GB 10K RPM Serial-Attach SCSI 3Gbps 2.5-in HotPlug Hard Drive (73A1025, 1, [341-6095])
- 2nd Hard Drive: 73GB 10K RPM Serial-Attach SCSI 3Gbps 2.5-in HotPlug Hard Drive (73A1025, 1, [341-6095])
- Rack Rails: Dell Versa Rails for use in Third Party Racks, Round Hole (VRSRAIL, 1, [310-6378])
- Bezel: PowerEdge R905 Active Bezel (BEZEL, 1, [313-6069])
- Power Cords: 2x Power Cord, NEMA 5-15P to C14, 15 amp, wall plug, 10 feet / 3 meter (2WL10FT, 1, [310-8509])
- Integrated Network Adapters: 4x Broadcom NetXtreme II 5708 1GbE Onboard NICs with TOE (4B5708, 1, [430-2713])
- Integrated NIC Upgrade: LOM NICs are TOE, iSCSI Ready (R905/805) (ISCSI, 1, [311-8713])
- Network Card Upgrade: Intel PRO 10GbE SR-XFP Single Port NIC, PCIe-8 (10GSR, 1, [430-2685])
- Optical Drive: DVD-ROM Drive, Internal (DVD, 1, [313-5884])
- Documentation: Electronic System Documentation, OpenManage DVD Kit with DMC (EDOCSD, 1, [330-0242][330-5280])
- Hardware Support Services: 3Yr Basic Hardware Warranty Repair, 5x10 HW-Only, 5x10 NBD Onsite (U3OS, 1, [988-0072][988-4210][990-5809][990-6017][990-6038])

Page 15: My Ideal Data Warehouse System

Appendix B: RamSan-440 Specs

RamSan-440 highlights:

- The World's Fastest Storage®
- Over 600,000 random I/Os per second
- 4500 MB/s random sustained external throughput
- Full array of hardware redundancy to ensure availability
- IBM Chipkill technology protects against memory errors up to and including loss of a memory chip
- RAIDed RAM boards protect against the loss of an entire memory board
- Exclusive Active Backup® software constantly backs up data without any performance degradation; other SSDs only begin to back up data after power is lost
- Patented IO2 (Instant-On Input Output) software allows data to be accessed during a recovery; customers no longer have to wait for a restore to be completed before accessing their data

I/Os Per Second: 600,000
Capacity: 256-512 GB
Bandwidth: 4500 MB per second
Latency: Less than 15 microseconds

Fibre Channel Connection

- 4-Gigabit Fibre Channel (2-Gigabit capable)
- 2 ports standard; up to 8 ports available
- Supports point-to-point, arbitrated loop, and switched fabric topologies
- Interoperable with Fibre Channel Host Bus Adaptors, switches, and operating systems

Management

- Browser-enabled system monitoring, management, and configuration
- SNMP supported
- Telnet management capability
- Front panel displays system status and provides basic management functionality
- Optional email home feature

LUN Support

- 1 to 1024 LUNs with variable capacity per LUN
- Flexible assignment of LUNs to ports
- Hardware LUN masking

Data Retention

- Non-volatile solid state disk
- Redundant internal batteries (N+1) power the system for 25 minutes after power loss
- Automatically backs up data to Flash memory modules at 1.4 GB/sec

Reliability and Availability

Page 16: My Ideal Data Warehouse System

- Chipkill technology protects data against memory errors up to and including loss of an entire memory chip
- Internal redundancies:
  - Power supplies and fans
  - Backup battery power (N+1)
  - RAIDed RAM boards (RAID 3)
  - Flash memory modules (RAID 3)
- Hot-swappable components:
  - Five Flash memory modules (front access)
  - Power supplies
- Active Backup™ mode (optional) backs up data constantly to internal redundant Flash memory modules without impacting system performance, making shutdown time significantly shorter
- IO2 allows instant access to data when power is restored to the unit and while data is synced from Flash backup
- Soft Error Scrubbing: when a single-bit error occurs on a read, the RamSan automatically re-writes the data to memory, thus scrubbing soft errors; following the re-write, the system re-reads to verify the data is corrected

Backup Procedures

Supports two backup modes that are configurable per system or per LUN:

- Data Sync mode synchronizes data to redundant internal Flash memory modules before shutdown or with power loss
- Active Backup™ mode (optional) backs up data constantly to internal redundant Flash memory modules without impacting system performance

Size: 7" (4U) x 24"
Power Consumption (peak): 650 Watts
Weight (maximum): 90 lbs

Page 17: My Ideal Data Warehouse System

Appendix C: RamSan-620 Specifications

RamSan-620 highlights:

- 2-5 TB SLC Flash storage
- 250,000 IOPS random sustained throughput
- 3 GB/s random sustained throughput
- 325 watts power consumption
- Lower cost

Features

- A complete Flash storage system in a 2U rack
- Low overhead, low power
- High IOPS, bandwidth, and capacity
- Standard management capabilities
- Two Flash ECC correction levels
- Super capacitors for orderly power down
- Easy installation
- Fibre Channel or InfiniBand connectivity
- Low initial cost of ownership
- Easy incremental addition of performance and capacity

I/Os Per Second: 250,000 read and write
Capacity: 2-5 TB of SLC Flash
Bandwidth: 3 GB per second
Latency: 80 microseconds (writes); 250 microseconds (reads)

Fibre Channel Connection

- 4-Gigabit Fibre Channel
- 2 ports standard; up to 8 ports available
- Supports point-to-point and switched fabric topologies
- Interoperable with Fibre Channel Host Bus Adaptors, switches, and operating systems

Management

- Browser-enabled system monitoring, management, and configuration
- SNMP supported
- Telnet management capability
- SSH management capability
- Front panel displays system status and provides basic management functionality

LUN Support

- 1 to 1024 LUNs with variable capacity per LUN
- Flexible assignment of LUNs to ports

Data Retention

- Completely nonvolatile solid state disk

Reliability and Availability

Page 18: My Ideal Data Warehouse System

- Flash Layer 1: ECC (chip level)
- Flash Layer 2: board-level RAID
- Internal redundancies: power supplies and fans
- Hot-swappable components: power supplies

Size: 3.5" (2U) x 18"
Power Consumption (peak): 325 Watts
Weight (maximum): 35 lbs

Page 19: My Ideal Data Warehouse System

Appendix D: DataDomain DD120 Specifications

Remote Office Data Protection

- High-speed, inline deduplication storage
- 10-30x data reduction average
- Reliable backup and rapid recovery
- Extended disk-based retention
- Eliminate tape at remote sites
- Includes Data Domain Replicator software

Easy Integration

- Supports leading backup and archive applications from: Symantec, EMC, HP, IBM, Microsoft, CommVault, Atempo, BakBone, Computer Associates
- Supports leading enterprise applications including databases (Oracle, SAP, DB2), email (Microsoft Exchange), and virtual environments (VMware)
- Simultaneous use of NAS and Symantec OpenStorage (OST)

Multi-Site Disaster Recovery

- 99% bandwidth reduction
- Consolidate remote office backups
- Flexible replication topologies
- Replicate to larger Data Domain systems at a central site
- Multi-site tape consolidation
- Cost-efficient disaster recovery

Ultra-Safe Storage for Reliable Recovery

- Data Invulnerability Architecture
- Continuous recovery verification, fault detection, and healing

Operational Simplicity

- Lower administrative costs
- Power and cooling efficiencies for green operation
- Reduced hardware footprint
- Supports any combination of nearline applications in a single system

SPECIFICATIONS: DD120

Capacity (raw): 750 GB
Logical Capacity (standard): 7 TB
Logical Capacity (redundant): 18 TB
Maximum Throughput: 150 GB/hr
Power Dissipation: 257 W
Cooling Requirement: 876 BTU/hr
System Weight: 23 lbs (11 kg)
System Dimensions (WxDxH): 16.92 x 25.51 x 1.7 inches (43 x 64.8 x 4.3 cm) without rack-mounting ears and bezel.

Page 20: My Ideal Data Warehouse System

19 x 27.25 x 1.7 inches (48.3 x 69.2 x 4.3 cm) with rack-mounting ears and bezel.

Minimum Clearances: front (with bezel) 1" (2.5 cm); rear 5" (12.7 cm)
Operating Current: 115 VAC/230 VAC, 2.2/1.1 Amps
System Thermal Rating: 876 BTU/hr
Operating Temperature: 5°C to 35°C (41°F to 95°F)
Operating Humidity: 20% to 80%, non-condensing
Non-operating (Transportation) Temperature: -40°C to +65°C (-40°F to +149°F)
Operating Acoustic Noise: Max 7.0 BA at typical office ambient temperature (23 +/- 2°C)

REGULATORY APPROVALS

Safety: UL 60950-1, CSA 60950-1, EN 60950-1, IEC 60950-1, SABS, GOST, IRAM
Emissions: FCC Class A, EN 55022, CISPR 22, VCCI, BSMI, RRL
Immunity: EN 55024, CISPR 24
Power Line Harmonics: EN 61000-3-2