Copyright © 2004, SAS Institute Inc. All rights reserved.
Wayne Embry
Technical Account Manager
March 17, 2005
Delivering Enterprise Value with SAS® 9 Architecture:
GRID COMPUTING and SAS
Agenda
Defining Grid
Why is Grid Computing Important?
Who’s Interested in Grid and Why?
SAS Technology Behind Grid
Packaging
Architecture
Supported Platforms
Summary
Defining Grid in the IT World…
According to Gartner, "a grid is a collection of resources owned by multiple organizations that is coordinated to allow them to solve a common problem."
Gartner (and Wayne) further define three commonly recognized forms of grid:
Computing Grid - multiple computers to solve one application problem
Data Grid - multiple storage systems to host one very large data set
Collaboration Grid - multiple collaboration systems for collaborating on a common issue
Other:
Utility Grid - resources are chosen for you; ASPs
Why is Grid Computing Important to SAS?
SAS believes that 2005 will be the year customers begin to view grid computing as a practical solution to their business problems, so the timing is right for it to be an important focus.
Our ability to speak to our Grid capabilities will further position our solutions and toolsets as enterprise class, and substantially differentiate our offerings from those of competitors. We will also be able to build additional enterprise credibility.
Proof:
A recent IDC report projected that the grid computing market may exceed $12 billion by 2007.
Gartner reported that 56% of large IT customers had not been contacted by a single vendor regarding Grid.
Why is Grid Computing Important?
Grid computing leverages underutilized and untapped computing resources to drastically reduce processing times, which in turn saves money.
Grid computing allows organizations to further leverage their current IT investment by harnessing the collective processing power of existing computers to more rapidly solve complex problems and to run increasingly data-intensive applications.
IT spending continues to be substantially restricted while demands on the IT department continue to increase. Grid computing is a strategic way to resolve this dilemma, providing one of the biggest “bangs for the buck” in IT.
Reality Check: Who’s Interested and Why?…
Frugal Phyllis
Title: CIO of a business unit of a large corporation
Report to: CEO of the business unit
Computing Skills: Advanced
Top ETL-related issues:
1. Faced with processing ever-increasing volumes of data
2. Challenged to provide useable results in ever-shorter time-frames
3. Short on funds, especially for additional hardware
Reality Check: Who’s Interested and Why?…
Al the Architect
Title: Head Information Architect and “right-hand” to CIO
Report to: CIO of a business unit of a large corporation
Computing Skills: Expert
Top ETL-related issues:
1. Charged with building fast and flexible architectures without spending much money
2. Needs to find ways to cope with more jobs, and larger jobs, all being squeezed into the same batch window
3. Would like his solutions to the above to inspire the Enterprise as a whole, or at least integrate with its existing tools
Reality Check: Who’s Interested and Why?…
Silo Sandy (somewhat similar to Frugal Phyllis)
Title: CEO (or Director) of a business unit of a large corporation
Report to: CEO of the Enterprise
Computing Skills: Average
Top ETL-related issues:
1. Trying to build her own information organization because she is not satisfied with corporate IT
2. Needs to do so using only existing hardware resources
3. Needs solutions running quickly and with reliability and maintainability
Reality Check: Who’s Interested and Why?…
And a user persona who influences the above buyers:
Forever Fred
Title: Business Analyst (a.k.a. Power User)
Report to: Director or Sr. Manager of a business unit of a large corporation
Computing Skills: Power User
Top ETL-related issues:
1. Takes too long to load data for his job, so he misses batch windows
2. Constantly being admonished for monopolizing system resources
3. “Beaten up” for not delivering reports fast enough
Types of Applications Suitable for Grid
Long running jobs (batch window)
Many repetitive iterations of a fundamental task
  Simulation
  BY GROUP processing
Parallelism
  Independent tasks against large data sources
    Scoring, Risk analysis
  Pipeline parallelism (Piping)
  Both
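To make "independent tasks" over BY groups concrete, here is a minimal Python sketch. It is not SAS code and does not show how SAS/Connect or %Distribute works internally. The function names (`split_by_group`, `summarize`, `run_by_group`), the sample rows, and the thread pool standing in for grid nodes are all assumptions made for this illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_by_group(rows, key):
    """Partition rows into independent groups, one per distinct key value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


def summarize(group_rows):
    """The 'fundamental task' repeated for every group (hypothetical: a sum)."""
    return sum(row["amount"] for row in group_rows)


def run_by_group(rows, key, workers=3):
    """Run the same task on every group at once; no group needs another's data."""
    groups = split_by_group(rows, key)
    # The worker pool is a stand-in for grid nodes in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(summarize, grp) for name, grp in groups.items()}
        return {name: fut.result() for name, fut in futures.items()}


# Hypothetical sample data.
orders = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 5},
    {"region": "east", "amount": 30},
    {"region": "north", "amount": 7},
]
totals = run_by_group(orders, "region")
```

Because each group is processed without reference to the others, the work can be spread across as many workers as there are groups, which is what makes BY GROUP processing a natural fit for a grid.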
RFID COMPLEXITY
[Diagram labels: four RFID Data Collectors, each marked REALTIME; SAP/R3; DB/2; ORACLE; SYBASE]
SAS Technology Behind Grid – Today… Analytics Scenario
[Diagram labels: SAS with Connect Client and %Distribute; grid nodes 1 … n, each running Base, Connect, …]
SAS Technology Behind Grid – Today… Data Integration Scenario
[Diagram labels: ETL Studio; SAS MC with Schedule Manager; SAS Servers (Metadata Server, Workspace Server, Connect Client); LSF Job Scheduler; grid nodes 1 … n, each running Base, Connect, …]
SAS Technology Behind Grid – 2005… Improving our Capabilities
[Diagram labels: SAS Server with Connect Client and LSF; grid nodes 1 … n, each running Base, Connect, … and LSF]
SAS Grid – 2005…
[Diagram labels: ETL Studio; Enterprise Miner; SAS MC with Schedule Manager and Grid Manager (new); SAS Servers (Metadata Server, Workspace Server, Connect Client); LSF Job Scheduler; grid nodes 1 … n, each running Base, Connect, … and LSF]
SAS 9 Packaging…
Head Start – SAS/Connect is already included in:
ETL Server and EETL Server
Any solution including ETL Server
Supported Platforms…
Good News – Any platform that supports Base and Connect
Heterogeneous architecture
Architecture Guidelines
There are guidelines to keep in mind when architecting SAS Grid environments:
Permanent data
SASWORK
Data Accessibility - Where the data is, and how each machine on the grid is attached to it (NFS, SAN), greatly affects performance.
For help architecting SAS Grids, please call your SAS Account Representative.
Example Grid Job 1
[Diagram]
L8364 - 1 CPU (1.6 GHz; 2 GB RAM): ETL Studio; SAS Server; Workspace Server (Base, Connect)
Demo0505 – 2 CPU (3.06 GHz; 4 GB RAM): Base, Connect, Data Quality
Demo0507 – 2 CPU (3.06 GHz; 4 GB RAM): Base, Connect, Data Quality
Tables: Customer, Orders_grid, Order_item_grid
Example Grid Job 2
[Diagram]
L8364 - 1 CPU (1.6 GHz; 2 GB RAM): ETL Studio; SAS Server; Workspace Server (Base, Connect)
Demo0505 – 2 CPU (3.06 GHz; 4 GB RAM): Base, Connect, Data Quality
Demo0507 – 2 CPU (3.06 GHz; 4 GB RAM): Base, Connect, Data Quality
Other labels: Orders_grid, Order_item_grid; LXYZ; SASWORK; Customer
An Example - The Scenario…
Single Platform Job - Local_Complicated
Run locally on my laptop in sequential order
Source Data – 3 local SAS tables:
– Customer: 16 MB; 89,954 rows; 12 columns
– Orders_grid: 214 MB; 5,710,014 rows; 8 columns
– Order_item_grid: 315 MB; 4,487,718 rows; 7 columns
Target – 1 local SAS table with 15 columns
Local_Complicated Job
[Diagram]
L8364 - 1 CPU (1.6 GHz; 2 GB RAM): ETL Studio; SAS Server; Workspace Server (Data Quality, Base)
Tables: Order_item_grid, Orders_grid, Customer
Elapsed Wall Clock Time: 4 minutes
Leveraging the Grid - The Scenario…
Enable Job to Run on a SAS Grid - Remote_Complicated
Grid Strategies:
Independent parallelism – Independent data and processes
Pipeline parallelism
Source Data:
2 remote SAS tables:
– Orders_grid: 214 MB; 5,710,014 rows; 8 columns
– Order_item_grid: 315 MB; 4,487,718 rows; 7 columns
1 local SAS table:
– Customer: 16 MB; 89,954 rows; 12 columns
Target – 1 local SAS table with 15 columns
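The second strategy, pipeline parallelism, can be sketched in a few lines of Python. This is an illustration only, not the Remote_Complicated job and not SAS piping syntax. The three stage functions, the `DONE` sentinel, the queue sizes, and the sample strings are all hypothetical. The point is that each stage starts working on a row as soon as the previous stage hands it over, instead of waiting for the whole table.

```python
import queue
import threading

DONE = object()  # sentinel telling the next stage that no more rows are coming


def extract(rows, out_q):
    """Stage 1: emit source rows one at a time."""
    for row in rows:
        out_q.put(row)
    out_q.put(DONE)


def transform(in_q, out_q):
    """Stage 2: clean each row as soon as it arrives (hypothetical cleanup)."""
    while True:
        item = in_q.get()
        if item is DONE:
            out_q.put(DONE)
            break
        out_q.put(item.strip().upper())


def load(in_q, target):
    """Stage 3: append cleaned rows to the target while earlier stages still run."""
    while True:
        item = in_q.get()
        if item is DONE:
            break
        target.append(item)


def run_pipeline(rows):
    """Connect the stages with small bounded queues and run them concurrently."""
    q1, q2 = queue.Queue(maxsize=2), queue.Queue(maxsize=2)
    target = []
    stages = [
        threading.Thread(target=extract, args=(rows, q1)),
        threading.Thread(target=transform, args=(q1, q2)),
        threading.Thread(target=load, args=(q2, target)),
    ]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return target


result = run_pipeline([" acme ", "globex", " initech"])
```

The bounded queues keep a fast stage from running far ahead of a slow one, so intermediate data never has to be fully written out before the next stage begins.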
Remote_Complicated Job
[Diagram]
L7875 - 1 CPU (1.6 GHz; 1 GB RAM): ETL Studio; SAS Server; Workspace Server (Base, Connect)
Demo0505 – 2 CPU (3.06 GHz; 4 GB RAM): Base, Connect, Data Quality
Demo0507 – 2 CPU (3.06 GHz; 4 GB RAM): Base, Connect, Data Quality
Tables: Customer, Orders_grid, Order_item_grid
Elapsed Wall Clock Time: 30 seconds
Nearly 90% improvement (87.5% less elapsed time)!
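The arithmetic behind the comparison can be checked directly from the two elapsed times reported for Local_Complicated (4 minutes) and Remote_Complicated (30 seconds). The variable names below are illustrative only, and the figures are single wall-clock readings from the slides rather than a repeated benchmark.

```python
# Elapsed wall-clock times as reported on the two job slides.
local_seconds = 4 * 60   # Local_Complicated: 4 minutes
grid_seconds = 30        # Remote_Complicated: 30 seconds

# How many times faster the grid run finished.
speedup = local_seconds / grid_seconds

# Percentage of elapsed time removed by moving the job to the grid.
reduction_pct = 100 * (local_seconds - grid_seconds) / local_seconds

print(f"speedup: {speedup:.1f}x, elapsed time reduced by {reduction_pct:.1f}%")
```

The grid run is 8 times faster, a reduction in elapsed time of 87.5%, which is the basis for the improvement figure quoted on the slide.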
Performance Issues
Competition's answer to performance issues
Buy a bigger server (e.g., 32-way to 64-way)
Increase the number of RDBMS instances (e.g., Oracle)
More $$$$
SAS’ answer
Grid computing leverages underutilized and untapped heterogeneous computing resources to drastically reduce processing times
Grid computing allows organizations to further leverage their current IT investment by harnessing the collective processing power of existing computers
Save $$$$
How is it Set Up? The SAS Technology Behind the Scenario…
Components and Considerations:
Base, SAS/Connect
ETL Studio
Metadata Server
Data Quality
Closing Thoughts…
Mileage may vary
Next step in evolving the SAS9 Platform
Enterprise credibility
Competition
  Buy more servers and license more DBMS instances
  These 50 jobs will use this server, these 30 jobs run on this server…
Manageability
BI – Stored processes
EMiner and LSF Integration
ITMS – ITRM will have a generic collector to collect LSF performance data
Collateral…
White Papers
SUGI29 - http://support.sas.com/rnd/scalability/papers/sugi29_grid.pdf
Connect Syntax - http://support.sas.com/rnd/scalability/papers/mpconnect0401.pdf
%DISTRIBUTE - http://support.sas.com/rnd/scalability/papers/distConnect0401.pdf
Web Site - http://support.sas.com/rnd/scalability/grid/index.html
Customer Reference Stories - http://support.sas.com/rnd/scalability/grid/gridcust.html