Data Warehouse Project

DATA WAREHOUSING

CHSI PROJECT MODULE

Team 1:

Abhinav Garg (11761380)

Tanu Srivastav (11772446)

Tejbeer Chhabra (11756746)

Table of Contents

Executive Summary

MDX Queries and their output

DMX Queries: Mining Model

Appendix

Executive Summary

With the increasing amount of data, the ability to store data and extract information from it becomes crucial. With On-Line Analytical Processing (OLAP) technology, we can not only store large amounts of temporal data but also perform several business intelligence operations. The CHSI data provided has more than five and a half million records. The aim of the project is to find meaningful insights that could bring innovation to rural health systems.

1. OLAP Cube and Queries

By executing MDX queries on the OLAP cube, we found several interesting facts and figures of business importance. We discovered how formulary type affects charge quantity, which year made the most profit on medication, which care settings are profitable, what the infusion time is for various IV-route medications, and which discontinue reasons occur most frequently. These findings could be of managerial importance to the hospital administration. As one of the objectives of our project was to learn MDX, we used several MDX functions and keywords in our queries, such as TOPCOUNT, CROSSJOIN, NON EMPTY, HEAD, FILTER, and SUBSET.

2. Data Mining

We built mining structures using DMX queries as well as Analysis Services in Visual Studio. Data mining models can be built on these structures, which allow us to predict the discontinue reason for a medication. This could tell us about the effectiveness of the medication. We implemented three data mining algorithms: Decision Tree, Neural Network, and Regression.

MDX Queries and their output
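
As a rough illustration of how the MDX set functions used in this project behave, the Python sketch below mimics TOPCOUNT, HEAD, SUBSET, FILTER, CROSSJOIN and NON EMPTY on a small in-memory set. It is not one of the project's queries: the care-setting names, the profit figures and every helper function name are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical net profit per care setting; None marks an empty cell.
cells = {
    "Inpatient": 120.0,
    "Outpatient": 45.5,
    "Emergency": None,
    "ICU": 300.0,
}


def non_empty(members, measure):
    """Like NON EMPTY: drop members whose cell has no value."""
    return [m for m in members if measure.get(m) is not None]


def topcount(members, n, measure):
    """Like TOPCOUNT(set, n, expr): sort descending by the measure, keep n.

    Empty cells sort last in this sketch.
    """
    def key(m):
        value = measure.get(m)
        return value if value is not None else float("-inf")

    return sorted(members, key=key, reverse=True)[:n]


def head(members, n=1):
    """Like HEAD(set, n): the first n members in their existing order."""
    return members[:n]


def subset(members, start, count=None):
    """Like SUBSET(set, start, count): zero-based start, optional count."""
    end = None if count is None else start + count
    return members[start:end]


def filter_set(members, predicate):
    """Like FILTER(set, condition): keep members satisfying the condition."""
    return [m for m in members if predicate(m)]


def crossjoin(set_a, set_b):
    """Like CROSSJOIN(set_a, set_b): every pairing, first set varying slowest."""
    return list(product(set_a, set_b))


populated = non_empty(list(cells), cells)
top_two = topcount(populated, 2, cells)
first_two = head(populated, 2)
middle = subset(populated, 1, 2)
profitable = filter_set(populated, lambda m: cells[m] > 100)
pairs = crossjoin(["2010", "2011"], ["IV", "Oral"])
```

Lists are used instead of Python sets because MDX sets are ordered, and HEAD and SUBSET depend on that order.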

DMX Queries: Mining Model

1. Creating Mining Structure

2. Creating Mining Model

3. Variable Importance:

The most important variable was found to be Infusion Time. This suggests that the discontinuation of a medication can be predicted from its infusion time.

4. Lift Chart Model comparison:

From the above table, we can say that the models fit the data set well and are of reasonably similar strength.
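
For readers unfamiliar with lift, the sketch below computes it in its usual sense: the share of positive cases captured in the top-scored fraction of the data, divided by that fraction. It is a minimal illustration, not the calculation performed by the mining tool, and the scores and labels are made up rather than taken from the project's models.

```python
def cumulative_lift(scores, labels, fraction):
    """Lift of the top `fraction` of cases, ranked by score, over random selection."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equally long")

    # Rank cases from the highest predicted score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(len(ranked) * fraction))

    hits_top = sum(label for _, label in ranked[:k])
    hits_all = sum(labels)
    if hits_all == 0:
        raise ValueError("lift is undefined without positive cases")

    # Share of positives captured, divided by share of population selected.
    return (hits_top / hits_all) / (k / len(ranked))


# Made-up predicted probabilities and true outcomes (1 = positive case).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 1, 0]

lift_top_quarter = cumulative_lift(scores, labels, 0.25)
lift_top_half = cumulative_lift(scores, labels, 0.5)
lift_everything = cumulative_lift(scores, labels, 1.0)
```

A lift of 1.0 means the model does no better than random selection. Models of similar strength produce similar lift values at each fraction.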

Appendix

1. Partitions

The data is partitioned by Date ID. We created four partitions for the entire dataset.
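
One possible way to picture range partitioning on Date ID is sketched below. The boundary values, the yyyymmdd form of the Date ID and the function name are all assumptions made for illustration. The report does not state the actual ranges used for the four partitions.

```python
from bisect import bisect_right

# Hypothetical boundaries (assumed yyyymmdd Date IDs). Each value is the first
# Date ID of the next partition, so three boundaries give four partitions.
BOUNDARIES = [20080101, 20090101, 20100101]


def partition_for(date_id):
    """Return the partition number (1 to 4) that a Date ID falls into."""
    return bisect_right(BOUNDARIES, date_id) + 1


# Group a few made-up Date IDs by partition.
sample_ids = [20071231, 20080101, 20091231, 20100615]
assignment = {date_id: partition_for(date_id) for date_id in sample_ids}
```

Contiguous, non-overlapping ranges ensure that every record lands in exactly one partition.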

2. Aggregations

3. Calculated Measures

1. Since we have Unit Cost and Unit Price, we created a calculated measure, “NET_PROFIT”, defined as Unit Price – Unit Cost.

2. For data modeling, we created a calculated measure called “Target_Discontinue”. It is binary: when the discontinue reason reflects a positive change in the patient’s health, it is coded as 1; otherwise it is coded as 0.
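
The two calculated measures above can be sketched in Python as follows. This is only an illustration of the logic, not the cube's actual measure definitions. In particular, the set of reasons counted as a positive change is hypothetical, since the report does not list them.

```python
# Hypothetical discontinue reasons treated as a positive change in the
# patient's health; the real list is not given in the report.
POSITIVE_REASONS = {"patient improved", "therapy completed"}


def net_profit(unit_price, unit_cost):
    """NET_PROFIT: Unit Price minus Unit Cost."""
    return unit_price - unit_cost


def target_discontinue(reason):
    """Target_Discontinue: 1 for a positive discontinue reason, otherwise 0."""
    if reason is None:
        return 0
    return 1 if reason.strip().lower() in POSITIVE_REASONS else 0


# Made-up example rows: (unit price, unit cost, discontinue reason).
rows = [
    (12.5, 8.0, "Patient improved"),
    (30.0, 32.0, "Adverse reaction"),
    (5.0, 5.0, None),
]
profits = [net_profit(price, cost) for price, cost, _ in rows]
targets = [target_discontinue(reason) for _, _, reason in rows]
```

Coding the target as 0 or 1 gives the mining models a single binary outcome to predict.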
