IUP’s Production Data Warehouse
By
Indiana University of Pennsylvania
Daniel J. Kuta
Agenda
  Introduction
  Overview of the hardware/software environment
  Overview of the data warehouse items that have been implemented
  Items that have worked well
  Items that have been a challenge
  Future directions
  Useful references
Introduction
  Database administrator at IUP
  Graduate of IUP
  March 2004 – 19 years at IUP
  Worked primarily with the Office of the Registrar, Graduate School, Undergraduate Admissions and some Financial Aid
About IUP
  Approx. 13,800 students; 1,800 employees
  Largest member, SSHE
  3 campuses; 1 center; 1 academy
  More than 100 undergraduate programs, close to 50 master’s degree programs and 8 programs leading to a doctoral degree
  Clock-hour programs
Banner at IUP
  Implemented five baseline modules and three “Web For” products, 1998-2000
  Banner 5.x (soon to be Banner 6)
  Oracle 9i, OAS (soon to be 9iAS)
  Sun Solaris
Post Implementation
  SSD, FAMIS, Workflow, TouchNet, Resource25, IDWorks, CSI
  Web For Admissions and Web For Alumni
  Quest Central For Oracle
  Dozens of custom-written programs and web applications
  Large data warehousing initiative
Banner IT Support at IUP
  Application Development Group
    1 Coordinator
    1 Senior Systems Analyst/DBA
    2 Senior DBAs
    7 Developers
  Miscellaneous Entities
    User Services, Tech. Services, Acad. Support Reps., Power Users
Hardware/software environment
  Development environment
    Dell PowerEdge 2500 server
    1.266 GHz CPU
    1 GB RAM
    72 GB disk storage
    Windows 2000 Server operating system
    Oracle 9.2.0.4 Enterprise Edition
  Production environment
    Sun Sparc Ultra-4 server
    4 x 296 MHz CPUs
    2 GB RAM
    208 GB total disk storage
    Sun Solaris 5.6 operating system
    Oracle 8.1.6.3 Enterprise Edition
Initial End User Base
  Staff within the office of the Vice Provost for Administration and Technology
  Staff within the University Planning and Analysis and Institutional Research area
Current projects and their impetus
  Replacement of an Institutional Research database
  Migration of routines from MS Access queries to packaged PL/SQL procedures
  Want the ability to “prove” or “justify” the data
Current Meeting Structure
  Meet with representatives from the Associate Provost and the Planning and Analysis areas approximately every three weeks
  Set agenda of topics to discuss
  Email/phone call follow-ups as needed between meetings
Overview of Data Warehouse items that have been implemented
  User starting point
    Warehouse web site
    Intro page lists the data sets or subject areas available with a brief description.
    Last extract/freeze date recorded.
    Next extract/freeze date identified.
  Student Grades
    Provides data that allows for the analysis of grades and program review.
  Course Master
    Allows further analysis of the above, plus enrollment and credit hours generated by student level within a course.
  Course Schedule
    Provides data that allows for the monitoring of enrollment levels in courses at determined intervals.
    Allows for attrition/migration analysis.
  Quarterly Financial Summarizations
    A rollup of financial data within Fund, Organization, Program and Account Code.
  Payroll
    Source is the bi-weekly payroll Focus feeds received from the SSHE payroll system.
    Data regarding earnings, benefits, deductions, etc. is recorded.
    Data is linked back to Banner position numbers, giving a tie back to the FOAPAL strings responsible for the expense.
Items that have worked well
  Extract all data needed to a staging database and build/rebuild from there
    All columns traced back to their source.
    Once a table was “touched” with a column or columns, the entire row is extracted to the staging database. All columns are pulled.
    Staging tables mimic the layout of their source tables. They include an additional column that identifies their “freeze id.”
    All rows required from the source tables are extracted and tagged in the staging database with an indicator to tie them together – the “freeze id” column is populated.
    The builds of the data sets are now based on the staged data.
    Any subsequent rebuilds all run from the same staged data.
    Benefit: Consistent builds/rebuilds. Not hitting a moving target with data from the Banner production database.
    Benefit: We’re able to “prove” and “justify” the builds of the data sets.
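The freeze id idea above can be illustrated with a small sketch. This is not the IUP implementation (which ran on Oracle); it uses Python with an in-memory SQLite database, and the table names `src_grades` and `stg_grades`, their columns, and the freeze id value are all hypothetical. It only shows why a rebuild from staged data gives the same answer even after the source has changed.

```python
import sqlite3


def extract_to_staging(conn, freeze_id):
    """Copy every row and column of the source table into staging, tagged with freeze_id."""
    conn.execute(
        "INSERT INTO stg_grades (freeze_id, student_id, course, grade) "
        "SELECT ?, student_id, course, grade FROM src_grades",
        (freeze_id,),
    )


def build_grade_counts(conn, freeze_id):
    """Build a small data set from the staged rows only, never from the live source."""
    rows = conn.execute(
        "SELECT grade, COUNT(*) FROM stg_grades "
        "WHERE freeze_id = ? GROUP BY grade ORDER BY grade",
        (freeze_id,),
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
# Hypothetical source table, standing in for a production table.
conn.execute("CREATE TABLE src_grades (student_id INTEGER, course TEXT, grade TEXT)")
# The staging table mimics the source layout plus one extra freeze_id column.
conn.execute(
    "CREATE TABLE stg_grades (freeze_id TEXT, student_id INTEGER, course TEXT, grade TEXT)"
)
conn.executemany(
    "INSERT INTO src_grades VALUES (?, ?, ?)",
    [(1, "MATH101", "A"), (2, "MATH101", "B"), (3, "MATH101", "A")],
)

extract_to_staging(conn, "FREEZE_1")
first_build = build_grade_counts(conn, "FREEZE_1")

# The source keeps moving after the freeze...
conn.execute("UPDATE src_grades SET grade = 'C' WHERE student_id = 1")

# ...but a rebuild from the same freeze id reproduces the first build.
rebuild = build_grade_counts(conn, "FREEZE_1")
```

Because both builds read only rows carrying the same freeze id, the later change to the source does not affect the rebuild.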
  Model-based construction for the extracts from Banner production to staging
    Parameter/profile tables were created to identify the source tables required to build the data sets for a subject area.
    The tables also identified any special SQL “FROM” or “WHERE” clause logic that was needed to extract the data.
    If a table was not listed as requiring any special SQL extract logic, the entire table was pulled.
    This was used to pull copies of required Banner validation tables, usually needed for some transformations or code descriptions.
    Parameter/profile tables assisted with...
      The generation of the scripts that created the tables in the staging database.
      The generation of the SQL extract scripts to pull data from the Banner production database to the staging database.
      Scripts to provide “record counts” of data pulled to staging.
      Scripts to delete data from test runs of the extract scripts in the staging database.
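As a rough illustration of the model-based approach above, the sketch below generates extract, record-count and delete statements from a list of profile entries. Everything in it is assumed for the example: the `ExtractProfile` class, the `STG_` prefix, the table names `TERM_CODES` and `COURSE_REG`, and the `freeze_id` column name. It is written in Python for brevity, whereas the original generators were Java programs and later PL/SQL. The CREATE TABLE script would follow the same pattern and is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractProfile:
    """One row of a hypothetical parameter/profile table."""
    source_table: str
    where_clause: Optional[str] = None  # None means: pull the entire table


def staging_name(profile):
    # Assumed naming convention for the staging copy of a source table.
    return "STG_" + profile.source_table


def extract_sql(profile, freeze_id):
    """Generate the statement that copies source rows to staging with a freeze id."""
    sql = (
        f"INSERT INTO {staging_name(profile)} "
        f"SELECT '{freeze_id}', src.* FROM {profile.source_table} src"
    )
    if profile.where_clause:
        sql += f" WHERE {profile.where_clause}"
    return sql


def count_sql(profile, freeze_id):
    """Generate the record-count statement for one staged table."""
    return (
        f"SELECT COUNT(*) FROM {staging_name(profile)} "
        f"WHERE freeze_id = '{freeze_id}'"
    )


def delete_sql(profile, freeze_id):
    """Generate the statement that removes the rows of a test run."""
    return (
        f"DELETE FROM {staging_name(profile)} "
        f"WHERE freeze_id = '{freeze_id}'"
    )


# Hypothetical profile entries: a validation table pulled whole, and a
# transactional table restricted by special WHERE logic.
profiles = [
    ExtractProfile("TERM_CODES"),
    ExtractProfile("COURSE_REG", "src.term_code = '200401'"),
]

scripts = {
    p.source_table: {
        "extract": extract_sql(p, "FREEZE_1"),
        "count": count_sql(p, "FREEZE_1"),
        "delete": delete_sql(p, "FREEZE_1"),
    }
    for p in profiles
}
```

The values are spliced into the SQL text here only because the output is a script file; code that executes statements directly would normally pass the freeze id as a bind variable instead.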
  Initially started with Java programs generating the scripts...
    CREATE TABLE scripts for the staging database
    The extract scripts
    The record count scripts
    The delete scripts
    Running the extract scripts in this manner worked well for high volume, low frequency extracts – 3 per year.
    It was a manageable process. However...
  PL/SQL packaged procedures were created to dynamically create and execute the SQL extract scripts
    Need dictated by low volume, high frequency, off-hours extracts.
    Additional tables were created to record run-time parameters and the job’s results.
    Procedures run “unattended”, logging their results.
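The unattended, self-logging extract described above might look roughly like the following. This is a sketch only, in Python with SQLite rather than PL/SQL on Oracle. The `job_log` table layout, the function `run_extract_job`, and the table names are assumptions made for the example. The point is that the job records its parameters and its outcome whether it succeeds or fails, so nobody has to watch it run.

```python
import sqlite3
from datetime import datetime, timezone


def run_extract_job(conn, job_name, freeze_id, statements):
    """Execute generated extract statements and log parameters and results."""
    started = datetime.now(timezone.utc).isoformat()
    row_count, status, message = 0, "OK", ""
    try:
        for sql in statements:
            # :freeze_id is supplied as a named bind parameter.
            cur = conn.execute(sql, {"freeze_id": freeze_id})
            row_count += cur.rowcount
    except sqlite3.Error as exc:
        # Undo any partial extract, but still record the failure below.
        conn.rollback()
        row_count, status, message = 0, "FAILED", str(exc)
    ended = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO job_log VALUES (?, ?, ?, ?, ?, ?, ?)",
        (job_name, freeze_id, started, ended, status, row_count, message),
    )
    conn.commit()
    return status


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_terms (term_code TEXT)")
conn.execute("CREATE TABLE stg_terms (freeze_id TEXT, term_code TEXT)")
# Hypothetical log table holding run-time parameters and job results.
conn.execute(
    "CREATE TABLE job_log (job_name TEXT, freeze_id TEXT, started TEXT, "
    "ended TEXT, status TEXT, row_count INTEGER, message TEXT)"
)
conn.executemany("INSERT INTO src_terms VALUES (?)", [("200401",), ("200405",)])
conn.commit()

good = run_extract_job(
    conn, "terms_extract", "FREEZE_1",
    ["INSERT INTO stg_terms SELECT :freeze_id, term_code FROM src_terms"],
)
# This statement targets a table that does not exist, so the job fails.
bad = run_extract_job(
    conn, "broken_extract", "FREEZE_1",
    ["INSERT INTO stg_missing SELECT :freeze_id, term_code FROM src_terms"],
)
log = conn.execute(
    "SELECT job_name, status, row_count FROM job_log ORDER BY rowid"
).fetchall()
```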
  Builds of the data sets are done by PL/SQL packaged procedures
    Call to execute a “build procedure” with passed parameters that identify the “freeze data” to use.
    Vast majority of the transformations and description lookups coded as PL/SQL functions. Benefit: reusability
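The reusability benefit can be shown with a short sketch: one lookup function shared by two build routines, each of which takes the freeze id as a parameter. It is in Python rather than PL/SQL, and the level codes, descriptions, field names and function names are all invented for the example.

```python
# Hypothetical code-to-description mapping; a real lookup would read a
# staged copy of a validation table.
LEVEL_DESCRIPTIONS = {"UG": "Undergraduate", "GR": "Graduate"}


def describe_level(code):
    """Shared description lookup, written once and reused by every build."""
    return LEVEL_DESCRIPTIONS.get(code, "Unknown")


def build_enrollment_by_level(staged_rows, freeze_id):
    """Count staged enrollments per level description for one freeze."""
    counts = {}
    for row in staged_rows:
        if row["freeze_id"] != freeze_id:
            continue
        label = describe_level(row["level_code"])
        counts[label] = counts.get(label, 0) + 1
    return counts


def build_credit_hours_by_level(staged_rows, freeze_id):
    """Sum staged credit hours per level description for one freeze."""
    totals = {}
    for row in staged_rows:
        if row["freeze_id"] != freeze_id:
            continue
        label = describe_level(row["level_code"])
        totals[label] = totals.get(label, 0) + row["credit_hours"]
    return totals


staged = [
    {"freeze_id": "FREEZE_1", "level_code": "UG", "credit_hours": 3},
    {"freeze_id": "FREEZE_1", "level_code": "UG", "credit_hours": 4},
    {"freeze_id": "FREEZE_1", "level_code": "GR", "credit_hours": 3},
    {"freeze_id": "FREEZE_2", "level_code": "UG", "credit_hours": 3},
]

enrollment = build_enrollment_by_level(staged, "FREEZE_1")
credit_hours = build_credit_hours_by_level(staged, "FREEZE_1")
```

If a description changes, only `describe_level` needs to change, and both builds pick it up on their next run.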
  The completed data sets are built in the staging database
    This allows for an analysis of the builds by validation procedures.
  After validation of the new data sets in the staging database, the new data sets are then copied into the data warehouse
    Separate procedures are used to perform the migration of the data from staging to the data warehouse.
  Once the updated data sets have been migrated into the data warehouse…
    The data warehouse web site is updated to reflect the status of the data sets available.
    Keeping the web site updated and current is necessary to gain user buy-in to use it.
    Otherwise, expect phone calls asking for the status of...
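The build, validate, then migrate sequence described above can be sketched as two separate routines, with migration refusing to run when validation reports problems. The checks shown (at least one row, no missing course) are placeholder assumptions, since the talk does not say what the validation procedures test. The table names are hypothetical, and Python with SQLite stands in for PL/SQL on Oracle.

```python
import sqlite3


def validate_build(conn, freeze_id):
    """Return a list of problems found in the staged data set (empty if none)."""
    total, missing = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(course IS NULL), 0) "
        "FROM stg_course_master WHERE freeze_id = ?",
        (freeze_id,),
    ).fetchone()
    problems = []
    if total == 0:
        problems.append("no rows built")
    if missing:
        problems.append(f"{missing} rows missing a course")
    return problems


def migrate_to_warehouse(conn, freeze_id):
    """Copy a validated data set from staging to the warehouse table."""
    problems = validate_build(conn, freeze_id)
    if problems:
        return problems  # nothing is copied when validation fails
    with conn:  # commit on success, roll back on error
        # Remove any earlier copy of this freeze so a re-migration is clean.
        conn.execute("DELETE FROM dw_course_master WHERE freeze_id = ?", (freeze_id,))
        conn.execute(
            "INSERT INTO dw_course_master "
            "SELECT * FROM stg_course_master WHERE freeze_id = ?",
            (freeze_id,),
        )
    return []


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_course_master (freeze_id TEXT, course TEXT, enrolled INTEGER)")
conn.execute("CREATE TABLE dw_course_master (freeze_id TEXT, course TEXT, enrolled INTEGER)")
conn.executemany(
    "INSERT INTO stg_course_master VALUES (?, ?, ?)",
    [
        ("FREEZE_1", "MATH101", 30),
        ("FREEZE_1", "ENGL101", 25),
        ("FREEZE_2", None, 12),  # a deliberately bad row
    ],
)

ok_result = migrate_to_warehouse(conn, "FREEZE_1")
bad_result = migrate_to_warehouse(conn, "FREEZE_2")
empty_result = migrate_to_warehouse(conn, "FREEZE_3")
```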
Items that have been a challenge
  User dictated design – Replacement of the existing IR database
    Too many databases and queries were already written and dependent on the existing structure.
  Discovery of all existing transformations
    Transformations hidden in a vast array of MS Access queries.
    Special “fix” routines coded in SQL scripts run through SQL*Plus.
  Missing data
    Analysis of the builds sheds light on data missing from the Banner production database.
    Resolution: Identify critical data. Verify it is available prior to performing the extracts.
  User availability
    Subject matter experts must be available to provide needed information and feedback in a timely manner.
  Parallel builds of the data
    Difficulty in coordinating parallel builds of the data sets within both systems in order to perform validation of the new procedures.
  User testing
    Parallel builds performed – Yeah!
    User participation in the validation of the builds was lacking.
Future directions/plans
  Complete the deactivation of the old IR database
  SSHE-related semester freezes
  Add additional functionality to the “job execution” environment
    Currently logs start time, end time and duration of the entire job.
    Will have it log each “job step” or extract it is performing.
    Record the start time, end time and duration of the step.
    Metadata on the target table: initial storage requirements, its needs after the extract and the change in those requirements.
    Keep the “build” and “migration” procedures, but add procedure calls to perform the logging of the job’s metadata.
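The planned per-step logging can be sketched as a thin wrapper around existing steps, so the build and migration code itself stays unchanged and only gains logging calls. The `JobLog` class and its field names are invented for this example, and the "storage" measure here is just a row count supplied by the caller, which is a stand-in for whatever storage metadata the real environment would record.

```python
import time
from contextlib import contextmanager


class JobLog:
    """Collects one log entry per job step (hypothetical structure)."""

    def __init__(self, job_name, clock=time.time):
        self.job_name = job_name
        self.clock = clock  # injectable so the timing can be tested
        self.steps = []

    @contextmanager
    def step(self, name, size_of_target=None):
        """Record start, end, duration and target size change for one step."""
        entry = {"step": name, "start": self.clock()}
        if size_of_target is not None:
            entry["size_before"] = size_of_target()
        try:
            yield entry
        finally:
            # Runs whether the step succeeded or raised, so every step is logged.
            entry["end"] = self.clock()
            entry["duration"] = entry["end"] - entry["start"]
            if size_of_target is not None:
                entry["size_after"] = size_of_target()
                entry["size_change"] = entry["size_after"] - entry["size_before"]
            self.steps.append(entry)


# Usage: the list stands in for a target table, and len() for its size.
target_table = []
log = JobLog("nightly_extract")

with log.step("extract_terms", size_of_target=lambda: len(target_table)):
    target_table.extend(["200401", "200405", "200408"])

with log.step("record_counts"):
    counted = len(target_table)
```

Each entry in `log.steps` then carries the step name, its timings, and, where a size function was given, the size before, after, and the change.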
  Existing project in the queue for financial reporting
    Desire is to have flexible, responsive, rollup reporting.
    Detail data must be available for drilldown.
    Look to model budgets, commitments, payments, revenue, etc.
    Challenges:
      No intimate knowledge of Banner Finance.
      First truly dimensional model.
      Some ragged hierarchies.
      Implementation of change data capture procedures.
  Change the focus of the data warehousing projects
    Currently, too heavy on mandated state reporting.
    Its focus is on reporting the past, or “what has happened.”
    Need to direct attention to the detection of trends and our reaction to them.
    And yes, you do need historical data to do that.
    But it must be in the proper format to easily answer the questions that are asked.
  As a simple example, running a University (or any business) is a lot like driving a car...
    Can you successfully get to where you want to be by constantly looking in the rear view mirror?
    You must look out the front windshield and focus on what you see.
    Like it or not, there’s stuff coming at you!
    You must navigate around any obstacles you encounter.
    But this is only short-term success, a nice leisurely drive.
    You need direction, a destination, and a “road map” to get there.
    The strategic plan of the university defines its goals – its “destination.”
    If so, what does our plan or “road map” look like in trying to reach that destination?
    Have we aligned our data warehouse initiatives with that plan?
    Are we collecting and analyzing the data needed to measure our progress at reaching that destination?
    What triggers a change, a “detour” or “alternate route” in the journey?
Conclusion
  Satisfied with the environment setup to perform the extracts, builds and migrations of the data sets.
  Users are satisfied with what they are receiving.
  Yes, I feel a level of frustration that the initiatives have focused on mandated reporting – the “What happened?” reporting.
  Need to implement structures to capture and provide more metadata on the data sets and the procedures and functions that build them.
Useful references
  Books
    Building the Data Warehouse, W. H. Inmon, © 1996, John Wiley & Sons
    The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses, Ralph Kimball, © 1996, John Wiley & Sons
    The Data Warehouse Toolkit, Second Edition: The Complete Guide to Dimensional Modeling, Ralph Kimball, Margy Ross, © 2002, Wiley Computer Publishing
    Data Warehouse Design Solutions, Christopher Adamson, Michael Venerable, © 1998, Wiley Computer Publishing
    Designing a Data Warehouse: Supporting Customer Relationship Management, Chris Todman, Hewlett Packard Professional Books, © 2001, Prentice Hall Publishing
    Mastering Data Warehouse Design: Relational and Dimensional Techniques, Claudia Imhoff, Nicholas Galemmo, Jonathan G. Geiger, © 2003, Wiley Publishing
  The Data Warehousing Institute: www.dw-institute.com
  Intelligent Enterprise: www.intelligententerprise.com
  DM Review: www.dmreview.com
  Bill Inmon’s web sites: www.inmoncif.com, www.inmongif.com
  Ralph Kimball’s web site: www.ralphkimball.com
  Oracle 9.2 documentation set