IUP’s Production Data Warehouse

By Daniel J. Kuta
Indiana University of Pennsylvania

Agenda
  Introduction
  Overview of the hardware/software environment
  Overview of the data warehouse items that have been implemented
  Items that have worked well
  Items that have been a challenge
  Future directions
  Useful references

Introduction
  Database administrator at IUP
  Graduate of IUP
  March 2004 – 19 years at IUP
  Worked primarily with the Office of the Registrar, Graduate School, Undergraduate Admissions and some Financial Aid

About IUP
  Approx. 13,800 students; 1,800 employees
  Largest member of SSHE
  3 campuses; 1 center; 1 academy
  More than 100 undergraduate programs, close to 50 master’s degree programs and 8 programs leading to a doctoral degree
  Clock-hour programs

Banner at IUP
  Implemented five baseline modules and three “Web For” products, 1998-2000
  Banner 5.x (soon to be Banner 6)
  Oracle 9i, OAS (soon to be 9iAS)
  Sun Solaris

Post Implementation
  SSD, FAMIS, Workflow, TouchNet, Resource25, IDWorks, CSI
  Web For Admissions and Web For Alumni
  Quest Central For Oracle
  Dozens of custom-written programs and web applications
  Large data warehousing initiative

Banner IT Support at IUP
  Application Development Group
    1 Coordinator
    1 Senior Systems Analyst/DBA
    2 Senior DBAs
    7 Developers
  Miscellaneous Entities
    User Services, Tech. Services, Acad. Support Reps., Power Users

Hardware/software environment
  Development environment
    Dell PowerEdge 2500 server
    1.266 GHz CPU
    1 GB RAM
    72 GB disk storage
    Windows 2000 Server operating system
    Oracle 9.2.0.4 Enterprise Edition

  Production environment
    Sun SPARC Ultra-4 server
    4 x 296 MHz CPUs
    2 GB RAM
    208 GB total disk storage
    Sun Solaris 5.6 operating system
    Oracle 8.1.6.3 Enterprise Edition

Initial End User Base
  Staff within the office of the Vice Provost for Administration and Technology
  Staff within the University Planning and Analysis and Institutional Research area

Current projects and their impetus
  Replacement of an Institutional Research database.
  Migration of routines from MS Access queries to packaged PL/SQL procedures.
  Want the ability to “prove” or “justify” the data.

Current Meeting Structure
  Meet with representatives from the Associate Provost and the Planning and Analysis areas approximately every three weeks.
  Set agenda of topics to discuss.
  Email/phone call follow-ups as needed between meetings.

Overview of Data Warehouse items that have been implemented
  User starting point
    Warehouse web site
    Intro page lists the data sets or subject areas available with a brief description.
    Last extract/freeze date recorded.
    Next extract/freeze date identified.


  Student Grades
    Provides data that allows for the analysis of grades and program review.
  Course Master
    Allows further analysis of the above, plus enrollment and credit hours generated by student level within a course.


  Course Schedule
    Provides data that allows for the monitoring of enrollment levels in courses at determined intervals.
    Allows for attrition/migration analysis.
  Quarterly Financial Summarizations
    A rollup of financial data within Fund, Organization, Program and Account Code.


  Payroll
    Source is the bi-weekly payroll Focus feeds received from the SSHE payroll system.
    Data regarding earnings, benefits, deductions, etc. is recorded.
    Data is linked back to Banner position numbers, giving a tie back to the FOAPAL strings responsible for the expense.
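The payroll tie-back can be pictured as a join from feed rows to a position-to-FOAPAL distribution. The sketch below is only an illustration: Python with sqlite3 stands in for Oracle, and the tables payroll_feed and position_foapal, their columns, the FOAPAL string format and the percentage split are all assumptions, not Banner structures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: one row per payroll feed line, and a distribution of
# each position across FOAPAL strings. Names and columns are illustrative.
conn.executescript("""
CREATE TABLE payroll_feed (pay_period TEXT, position_no TEXT, category TEXT, amount REAL);
CREATE TABLE position_foapal (position_no TEXT, foapal TEXT, percent REAL);
INSERT INTO payroll_feed VALUES
  ('2004-05', 'P00100', 'EARNINGS', 2000.0),
  ('2004-05', 'P00100', 'BENEFITS',  500.0),
  ('2004-05', 'P00200', 'EARNINGS', 1000.0);
INSERT INTO position_foapal VALUES
  ('P00100', '1001-2100-6010-10', 100.0),
  ('P00200', '1001-2100-6010-10',  50.0),
  ('P00200', '2002-3300-6010-20',  50.0);
""")

def expense_by_foapal(conn, pay_period):
    """Distribute each feed amount across the FOAPAL strings funding its position."""
    return conn.execute(
        "SELECT d.foapal, SUM(f.amount * d.percent / 100.0)"
        " FROM payroll_feed f"
        " JOIN position_foapal d ON d.position_no = f.position_no"
        " WHERE f.pay_period = ?"
        " GROUP BY d.foapal ORDER BY d.foapal",
        (pay_period,),
    ).fetchall()

totals = expense_by_foapal(conn, "2004-05")
```

Joining on the position number keeps the feed rows untouched, so the same feed can be re-summarized if a position's funding is corrected later.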

Items that have worked well
  Extract all data needed to a staging database and build/rebuild from there.
    All columns traced back to their source.
    Once a table was “touched” with a column or columns, the entire row is extracted to the staging database. All columns are pulled.


    Staging tables mimic the layout of their source tables. They include an additional column that identifies their “freeze id.”
    All rows required from the source tables are extracted and tagged in the staging database with an indicator to tie them together: the “freeze id” column is populated.


    The builds of the data sets are now based on the staged data.
    Any subsequent rebuilds all run from the same staged data.


    Benefit: Consistent builds/rebuilds. Not hitting a moving target with data from the Banner production database.
    Benefit: We’re able to “prove” and “justify” the builds of the data sets.
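The freeze id idea can be shown end to end in a few lines. This is a minimal sketch, not IUP’s implementation: Python with sqlite3 stands in for PL/SQL and Oracle, and the grades table, its columns and the freeze id value 2004SPR are made up for the example.

```python
import sqlite3

prod = sqlite3.connect(":memory:")     # stands in for Banner production
staging = sqlite3.connect(":memory:")  # stands in for the staging database

# Hypothetical source table; real source tables have many more columns.
prod.execute("CREATE TABLE grades (student_id TEXT, course TEXT, grade TEXT)")
prod.executemany("INSERT INTO grades VALUES (?, ?, ?)",
                 [("S1", "MATH101", "A"), ("S2", "MATH101", "B")])

# The staging table mimics the source layout plus a freeze_id column.
staging.execute(
    "CREATE TABLE stg_grades (freeze_id TEXT, student_id TEXT, course TEXT, grade TEXT)")

def extract(freeze_id):
    """Copy entire source rows into staging, tagging each with the freeze id."""
    rows = prod.execute("SELECT student_id, course, grade FROM grades").fetchall()
    staging.executemany("INSERT INTO stg_grades VALUES (?, ?, ?, ?)",
                        [(freeze_id, *row) for row in rows])
    staging.commit()

def build(freeze_id):
    """Build a grade distribution from one freeze of staged data only."""
    return staging.execute(
        "SELECT course, grade, COUNT(*) FROM stg_grades"
        " WHERE freeze_id = ? GROUP BY course, grade ORDER BY course, grade",
        (freeze_id,)).fetchall()

extract("2004SPR")
first_build = build("2004SPR")

# Production keeps changing after the freeze...
prod.execute("UPDATE grades SET grade = 'C' WHERE student_id = 'S2'")

# ...but a rebuild from the same freeze returns the same result.
rebuild = build("2004SPR")
```

Because build reads only rows carrying the requested freeze id, any number can be traced back to the exact staged rows that produced it.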

Items that have worked well
  Model-based construction for the extracts from Banner production to staging.
    Parameter/profile tables were created to identify the source tables required to build the data sets for a subject area.
    The tables also identified any special SQL “FROM” or “WHERE” clause logic that was needed to extract the data.


    If a table was not listed as requiring any special SQL extract logic, the entire table was pulled.
    This was used to pull copies of required Banner validation tables, usually needed for some transformations or code descriptions.


    Parameter/profile tables assisted with...
      The generation of the scripts that created the tables in the staging database.
      The generation of the SQL extract scripts to pull data from the Banner production database to the staging database.
      Scripts to provide “record counts” of data pulled to staging.
      Scripts to delete data from test runs of the extract scripts in the staging database.
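A profile-driven generator along these lines might look like the sketch below. It is an illustration only, written in Python rather than the tools the project used. The TableProfile fields, the stg_ prefix, the :freeze_id bind name and the VARCHAR2 column types are all assumptions; the actual layout of the parameter/profile tables is not given in the slides.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class TableProfile:
    """One row of a hypothetical parameter/profile table."""
    subject_area: str
    source_table: str
    columns: List[str]                  # source column names, in order
    from_clause: Optional[str] = None   # special FROM logic, if any
    where_clause: Optional[str] = None  # special WHERE logic, if any

def create_table_sql(p: TableProfile) -> str:
    # Column types are simplified; a real generator would read them from
    # the source data dictionary.
    cols = ", ".join(f"{c} VARCHAR2(100)" for c in p.columns)
    return f"CREATE TABLE stg_{p.source_table} (freeze_id VARCHAR2(20), {cols})"

def extract_sql(p: TableProfile) -> str:
    # No special logic listed means the entire table is pulled.
    source = p.from_clause or p.source_table
    sql = f"SELECT :freeze_id AS freeze_id, {p.source_table}.* FROM {source}"
    if p.where_clause:
        sql += f" WHERE {p.where_clause}"
    return sql

def count_sql(p: TableProfile) -> str:
    return f"SELECT COUNT(*) FROM stg_{p.source_table} WHERE freeze_id = :freeze_id"

def delete_sql(p: TableProfile) -> str:
    return f"DELETE FROM stg_{p.source_table} WHERE freeze_id = :freeze_id"

profiles = [
    TableProfile("student_grades", "grades",
                 ["student_id", "course", "grade", "term"],
                 where_clause="grades.term = '200401'"),
    # A validation table with no special logic: pulled whole.
    TableProfile("student_grades", "grade_codes", ["code", "description"]),
]
scripts = {p.source_table: extract_sql(p) for p in profiles}
```

Keeping all four generators keyed on the same profile row means a new source table needs one new profile entry rather than four hand-written scripts.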

Items that have worked well
  Initially started with Java programs generating the scripts...
    CREATE TABLE scripts for the staging database
    The extract scripts
    The record count scripts
    The delete scripts
  Running the extract scripts in this manner worked well for high volume, low frequency extracts: 3 per year.
  It was a manageable process. However...

Items that have worked well
  PL/SQL packaged procedures were created to dynamically create and execute the SQL extract scripts.
    Need dictated by low volume, high frequency, off-hours extracts.
    Additional tables were created to record run-time parameters and the job’s results.
    Procedures run “unattended”, logging their results.
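An unattended extract that records its own parameters and outcome could be shaped roughly as follows. This is a sketch under stated assumptions: Python and sqlite3 stand in for the PL/SQL packages, and the job_log layout and the src_/stg_ table names are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_registration (term TEXT, crn TEXT, student_id TEXT);
CREATE TABLE stg_registration (freeze_id TEXT, term TEXT, crn TEXT, student_id TEXT);
CREATE TABLE job_log (job_name TEXT, parameters TEXT, started_at TEXT,
                      ended_at TEXT, status TEXT, row_count INTEGER, message TEXT);
INSERT INTO src_registration VALUES ('200401', '10001', 'S1'), ('200401', '10001', 'S2');
""")

def run_extract(job_name, freeze_id, term, table="registration"):
    """Build and run one extract statement, then record the outcome in job_log."""
    started = datetime.now(timezone.utc).isoformat()
    status, rows, message = "OK", 0, None
    try:
        # The statement text is assembled at run time; values are bound, not pasted in.
        sql = (f"INSERT INTO stg_{table} "
               f"SELECT ?, term, crn, student_id FROM src_{table} WHERE term = ?")
        conn.execute(sql, (freeze_id, term))
        rows = conn.execute(
            f"SELECT COUNT(*) FROM stg_{table} WHERE freeze_id = ?", (freeze_id,)
        ).fetchone()[0]
    except sqlite3.Error as exc:
        conn.rollback()
        status, message = "FAILED", str(exc)
    # The log row is written whether the extract succeeded or not.
    conn.execute(
        "INSERT INTO job_log VALUES (?, ?, ?, ?, ?, ?, ?)",
        (job_name, f"freeze_id={freeze_id};term={term}", started,
         datetime.now(timezone.utc).isoformat(), status, rows, message))
    conn.commit()
    return status

status = run_extract("nightly_registration", "F1", "200401")
```

Catching the error and still writing the log row is what makes an off-hours run reviewable the next morning without anyone having watched it.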

Items that have worked well
  Builds of the data sets are done by PL/SQL packaged procedures.
    Call to execute a “build procedure” with passed parameters that identify the “freeze data” to use.
    Vast majority of the transformations and description lookups coded as PL/SQL functions. Benefit: reusability
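A build procedure driven by a freeze id, with a transformation written once as a function, might be sketched like this. Python’s sqlite3 create_function plays the role of a PL/SQL function here; the grade-point scale, the table names and the function name grade_points are assumptions for illustration.

```python
import sqlite3

# Hypothetical 4.0 scale; the real transformation rules are not in the slides.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

def grade_points(grade):
    """Reusable transformation: letter grade to quality points (None if ungraded)."""
    return GRADE_POINTS.get(grade)

conn = sqlite3.connect(":memory:")
# Register the function so any build query can call it by name.
conn.create_function("grade_points", 1, grade_points)
conn.executescript("""
CREATE TABLE stg_grades (freeze_id TEXT, student_id TEXT, course TEXT, grade TEXT);
CREATE TABLE ds_course_gpa (freeze_id TEXT, course TEXT, avg_points REAL);
INSERT INTO stg_grades VALUES
  ('2004SPR', 'S1', 'MATH101', 'A'),
  ('2004SPR', 'S2', 'MATH101', 'B'),
  ('2003FAL', 'S3', 'MATH101', 'F');
""")

def build_course_gpa(conn, freeze_id):
    """Rebuild one data set from the staged rows of a single freeze."""
    with conn:  # commit on success, roll back on error
        conn.execute("DELETE FROM ds_course_gpa WHERE freeze_id = ?", (freeze_id,))
        conn.execute(
            "INSERT INTO ds_course_gpa"
            " SELECT freeze_id, course, AVG(grade_points(grade))"
            " FROM stg_grades WHERE freeze_id = ?"
            " GROUP BY freeze_id, course",
            (freeze_id,))

build_course_gpa(conn, "2004SPR")
```

Deleting the freeze’s existing rows before inserting makes the build safe to rerun, which fits the rebuild-from-staging approach.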

Items that have worked well
  The completed data sets are built in the staging database.
    This allows for an analysis of the builds by validation procedures.
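The slides do not say what the validation procedures check. As one plausible example, a build can be reconciled against the staged rows it came from. The sketch below assumes hypothetical tables stg_registration and ds_course_enrollment and uses Python with sqlite3 in place of PL/SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stg_registration (freeze_id TEXT, crn TEXT, student_id TEXT);
CREATE TABLE ds_course_enrollment (freeze_id TEXT, crn TEXT, enrolled INTEGER);
INSERT INTO stg_registration VALUES
  ('F1', '10001', 'S1'), ('F1', '10001', 'S2'), ('F1', '10002', 'S1');
INSERT INTO ds_course_enrollment VALUES ('F1', '10001', 2), ('F1', '10002', 1);
""")

def validate_enrollment(conn, freeze_id):
    """Return a list of problems; an empty list means the build reconciles."""
    problems = []
    staged = conn.execute(
        "SELECT COUNT(*) FROM stg_registration WHERE freeze_id = ?",
        (freeze_id,)).fetchone()[0]
    built = conn.execute(
        "SELECT COALESCE(SUM(enrolled), 0) FROM ds_course_enrollment"
        " WHERE freeze_id = ?", (freeze_id,)).fetchone()[0]
    if staged != built:
        problems.append(
            f"enrollment total {built} does not match {staged} staged registrations")
    # Each course should appear once per freeze in the built data set.
    dupes = conn.execute(
        "SELECT COUNT(*) FROM (SELECT crn FROM ds_course_enrollment"
        " WHERE freeze_id = ? GROUP BY crn HAVING COUNT(*) > 1)",
        (freeze_id,)).fetchone()[0]
    if dupes:
        problems.append(f"{dupes} course(s) appear more than once")
    return problems

problems = validate_enrollment(conn, "F1")
```

Returning a list of findings rather than a pass/fail flag gives the reviewer something concrete to follow up on.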

Items that have worked well
  After validation of the new data sets in the staging database, the new data sets are then copied into the data warehouse.
    Separate procedures are used to perform the migration of the data from staging to the data warehouse.
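A migration step of this kind can be sketched as a replace-by-freeze copy inside one transaction. In this illustration an attached SQLite database named warehouse stands in for the separate data warehouse, and the table names are hypothetical; the real procedures are PL/SQL and are not shown in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # the main schema plays the staging role
conn.execute("ATTACH DATABASE ':memory:' AS warehouse")
conn.executescript("""
CREATE TABLE ds_course_enrollment (freeze_id TEXT, crn TEXT, enrolled INTEGER);
CREATE TABLE warehouse.course_enrollment (freeze_id TEXT, crn TEXT, enrolled INTEGER);
INSERT INTO ds_course_enrollment VALUES ('F1', '10001', 2), ('F1', '10002', 1);
""")

def migrate(conn, freeze_id):
    """Replace one freeze of the data set in the warehouse, all or nothing."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "DELETE FROM warehouse.course_enrollment WHERE freeze_id = ?",
            (freeze_id,))
        conn.execute(
            "INSERT INTO warehouse.course_enrollment"
            " SELECT freeze_id, crn, enrolled FROM ds_course_enrollment"
            " WHERE freeze_id = ?", (freeze_id,))
    return conn.execute(
        "SELECT COUNT(*) FROM warehouse.course_enrollment WHERE freeze_id = ?",
        (freeze_id,)).fetchone()[0]

migrated = migrate(conn, "F1")
```

Running the delete and insert in one transaction means warehouse users never see a half-loaded freeze, and rerunning the migration does not duplicate rows.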

Items that have worked well
  Once the updated data sets have been migrated into the data warehouse…
    The data warehouse web site is updated to reflect the status of the data sets available.
    Keeping the web site updated and current is necessary to gain user buy-in to use it.
    Otherwise, expect phone calls asking for the status of...

Items that have been a challenge
  User-dictated design: replacement of the existing IR database.
    Too many databases and queries were already written and dependent on the existing structure.

Items that have been a challenge
  Discovery of all existing transformations
    Transformations hidden in a vast array of MS Access queries.
    Special “fix” routines coded in SQL scripts run through SQL*Plus.

Items that have been a challenge
  Missing data
    Analysis of the builds sheds light on data missing from the Banner production database.
    Resolution: Identify critical data. Verify it is available prior to performing the extracts.
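A pre-extract check for critical data could be as simple as counting gaps in a known list of columns. The sketch below is illustrative only: the student table, the choice of critical columns and the use of NULL as the sign of missing data are assumptions, and Python with sqlite3 stands in for the production tooling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id TEXT, major TEXT, level_code TEXT);
INSERT INTO student VALUES
  ('S1', 'MATH', 'UG'), ('S2', NULL, 'UG'), ('S3', 'HIST', NULL);
""")

# Hypothetical list of critical (table, column) pairs, maintained by hand.
CRITICAL_COLUMNS = [("student", "major"), ("student", "level_code")]

def missing_critical_data(conn):
    """Count NULL values per critical column; return only columns with gaps."""
    gaps = {}
    for table, column in CRITICAL_COLUMNS:
        # Identifiers come from the trusted list above, never from user input.
        count = conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL").fetchone()[0]
        if count:
            gaps[f"{table}.{column}"] = count
    return gaps

gaps = missing_critical_data(conn)
ready_to_extract = not gaps  # hold the extract until the gaps are resolved
```

Running the check before the freeze gives the source office time to fix the data in production rather than patching it downstream.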

Items that have been a challenge
  User availability
    Subject matter experts must be available to provide needed information and feedback in a timely manner.

Items that have been a challenge
  Parallel builds of the data
    Difficulty in coordinating parallel builds of the data sets within both systems in order to perform validation of the new procedures.

  User testing
    Parallel builds performed. Yeah!
    User participation in the validation of the builds was lacking.

Future directions/plans
  Complete the deactivation of the old IR database.
  SSHE-related semester freezes.
  Add additional functionality to the “job execution” environment.
    Currently logs start time, end time and duration of the entire job.

    Will have it log each “job step” or extract it is performing.
    Record the start time, end time and duration of the step.
    Metadata on the target table: initial storage requirements, its needs after the extract and the change in those requirements.

    Keep the “build” and “migration” procedures, but add procedure calls to perform the logging of the job’s metadata.
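Step-level logging of this kind can wrap existing procedures without changing them. The sketch below is one possible shape, not the planned design: Python with sqlite3 replaces PL/SQL, the job_step_log layout is hypothetical, and row counts are used as a stand-in for the storage figures, which in Oracle would come from the data dictionary.

```python
import sqlite3
import time
from contextlib import contextmanager

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stg_grades (freeze_id TEXT, student_id TEXT, grade TEXT);
CREATE TABLE job_step_log (job_id INTEGER, step_name TEXT, target_table TEXT,
    started_at REAL, ended_at REAL, duration_s REAL,
    rows_before INTEGER, rows_after INTEGER, rows_added INTEGER);
""")

def row_count(table):
    # Table names are supplied by the job definition, not by end users.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

@contextmanager
def job_step(job_id, step_name, target_table):
    """Log timing and before/after size of the target table for one step."""
    started, before = time.time(), row_count(target_table)
    try:
        yield
    finally:
        ended, after = time.time(), row_count(target_table)
        conn.execute(
            "INSERT INTO job_step_log VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
            (job_id, step_name, target_table, started, ended,
             ended - started, before, after, after - before))
        conn.commit()

# The existing extract logic runs unchanged inside the logging wrapper.
with job_step(1, "extract grades", "stg_grades"):
    conn.executemany("INSERT INTO stg_grades VALUES (?, ?, ?)",
                     [("F1", "S1", "A"), ("F1", "S2", "B")])
```

Wrapping each step keeps the build and migration code as it is and adds only the calls that record the metadata.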

Future directions/plans
  Existing project in the queue for financial reporting.
    Desire is to have flexible, responsive, rollup reporting.
    Detail data must be available for drilldown.
    Look to model budgets, commitments, payments, revenue, etc.

    Challenges:
      No intimate knowledge of Banner Finance.
      First truly dimensional model.
      Some ragged hierarchies.
      Implementation of change data capture procedures.

Future directions/plans
  Change the focus of the data warehousing projects.
    Currently, too heavy on mandated state reporting.
    Its focus is on reporting the past, or “what has happened.”

    Need to direct attention to the detection of trends and our reaction to them.
    And yes, you do need historical data to do that. But it must be in the proper format to easily answer the questions that are asked.

Future directions/plans
  As a simple example, running a University (or any business) is a lot like driving a car...
    Can you successfully get to where you want to be by constantly looking in the rear view mirror?
    You must look out the front windshield and focus on what you see. Like it or not, there’s stuff coming at you!


    You must navigate around any obstacles you encounter.
    But this is only short-term success, a nice leisurely drive.
    You need direction, a destination, and a “road map” to get there.


    The strategic plan of the university defines its goals, its “destination.”
    If so, what does our plan or “road map” look like in trying to reach that destination?
    Have we aligned our data warehouse initiatives with that plan?


    Are we collecting and analyzing the data needed to measure our progress at reaching that destination?
    What triggers a change, a “detour” or “alternate route” in the journey?

Conclusion
  Satisfied with the environment set up to perform the extracts, builds and migrations of the data sets.
  Users are satisfied with what they are receiving.

  Yes, I feel a level of frustration that the initiatives have focused on mandated reporting, the “What happened?” reporting.
  Need to implement structures to capture and provide more metadata on the data sets and the procedures and functions that build them.

Useful references
  Books
    Building the Data Warehouse, W. H. Inmon. © 1996, John Wiley & Sons
    The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses, Ralph Kimball. © 1996, John Wiley & Sons
    The Data Warehouse Toolkit, Second Edition: The Complete Guide to Dimensional Modeling, Ralph Kimball and Margy Ross. © 2002, Wiley Computer Publishing
    Data Warehouse Design Solutions, Christopher Adamson and Michael Venerable. © 1998, Wiley Computer Publishing

  The Data Warehousing Institute: www.dw-institute.com
  Intelligent Enterprise: www.intelligententerprise.com
  DM Review: www.dmreview.com

  Bill Inmon’s web sites: www.inmoncif.com, www.inmongif.com
  Ralph Kimball’s web site: www.ralphkimball.com
  Oracle 9.2 documentation set

Questions? Comments?

Dan Kuta

[email protected]

(724) 357-2887
