CIS 310 Management Information Systems Database Refresher.
CIS 310 Management Information Systems
Database Refresher
Database Refresher – Mental Model
Database = File Cabinet
Records = File Folder
Data = Contents of the folder
Database = Student Records
Table = A-G in the top Drawer
Record = BID 1234567890
Attribute, Entity or Data Item = Student Name, Address, GPA, email, etc.
Relating Tables
Student Data
• BID (unique)
• First Name
• Last Name
• Address
• City
• State
• Zip
• Phone
• eMail
Enrolled In, Winter 2013
• BID (unique)
• Class Number 1
• Class Number 2
• Class Number 3
• Class Number 4, etc.
Classes in Winter 2013
• Class No. (unique each term)
• Dept. No.
• Class Name
• Units
• Time
• Professor
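A minimal sketch of how the three tables above can be related, using Python's built-in sqlite3 module. The table names, column names and sample rows are made up for illustration. The sketch also assumes one row per enrollment instead of the Class Number 1, 2, 3, 4 columns listed on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE students (bid TEXT PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE classes (class_no INTEGER PRIMARY KEY, class_name TEXT, units INTEGER);
    -- Assumption: one row per (student, class) pair links the two tables.
    CREATE TABLE enrolled (
        bid TEXT REFERENCES students(bid),
        class_no INTEGER REFERENCES classes(class_no)
    );
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO students VALUES (?, ?, ?)",
                 [("1234567890", "Ana", "Lopez"), ("2345678901", "Ben", "Kim")])
conn.executemany("INSERT INTO classes VALUES (?, ?, ?)",
                 [(101, "Intro to MIS", 4), (202, "Databases", 4)])
conn.executemany("INSERT INTO enrolled VALUES (?, ?)",
                 [("1234567890", 101), ("1234567890", 202), ("2345678901", 101)])

# Relate the tables through the shared BID and class number.
rows = conn.execute("""
    SELECT s.first_name, c.class_name
    FROM enrolled e
    JOIN students s ON s.bid = e.bid
    JOIN classes c ON c.class_no = e.class_no
    ORDER BY s.first_name, c.class_name
""").fetchall()
print(rows)
```

The query answers "who is enrolled in what" without storing a student's name or a class name more than once.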
Relational Database Management System (RDBMS)
• Software that helps you link the tables and run reports and queries on the data in the database.
– Access (CIS 101)
– Sybase
– Oracle
– SQL Server (Microsoft)
– DB2 (IBM)
Entity Relationship Diagram (ERD)
• Design tool used to design and plan databases.
• Primary key
• Table
Attributes
ERDs Continued
Relationships• One-to-one
• One-to-many
• Many-to-many
ERD Example (engotzz.blogspot.com)
Bookstore ERD from jdonohue.com
End
• Is LastName a good primary key if you’re just using a small database for a class?
• Name three tables that would be in a database for pet adoption.
• What would data attributes for a pet adoption database be for a table called animals?
No. People have the same last name. Not expandable.
Animals, NewOwners, ShotHistory
petID, Breed, Age, Name, KidFriendly, OtherPetFriendly, Color…etc.
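The first answer above can be tried out with a short sketch, again using Python's sqlite3 module. The table names (owners_by_name, owners) and the sample people are hypothetical. It shows a LastName primary key rejecting a second person with the same last name, while a generated ID accepts both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Attempt 1: last name as the primary key (hypothetical table).
conn.execute("CREATE TABLE owners_by_name (last_name TEXT PRIMARY KEY, first_name TEXT)")
conn.execute("INSERT INTO owners_by_name VALUES ('Smith', 'Alice')")
try:
    conn.execute("INSERT INTO owners_by_name VALUES ('Smith', 'Bob')")
    duplicate_accepted = True
except sqlite3.IntegrityError:
    # The second Smith violates the uniqueness of the primary key.
    duplicate_accepted = False

# Attempt 2: a generated ID as the primary key.
conn.execute(
    "CREATE TABLE owners (owner_id INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT)"
)
conn.executemany("INSERT INTO owners (last_name, first_name) VALUES (?, ?)",
                 [("Smith", "Alice"), ("Smith", "Bob")])
owner_count = conn.execute("SELECT COUNT(*) FROM owners").fetchone()[0]

print(duplicate_accepted, owner_count)
```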
CIS 310 Management Information Systems
Data Warehousing, Data Marts, Data Integrity
Example: Rensselaer Polytechnic Institute Admissions Data Warehouse
• Attract the best and brightest and retain diversity, balance, geography and manage financial aid.
• Results
– Invested $1.2 million. Costs $537,000 annually to operate.
– Savings in improved data analysis: $820,000 annually
– Savings in financial aid: $500,000 annually
– Savings in labor for reporting: $320,000 annually
Source: Information Week, 2007
Example 2 – Cal Poly Data Warehouse
Data Warehouse
• Collection of data from several databases to support business analysis.
– Aggregate lots of data
– Internal and external sources
– Drill-down capability
• Benefit
– Focus on managerial decision making instead of operational decision making.
– Provide insight that was not available before because the data was never connected.
[Figure: External Data Sources (datawarehouse4u.info)]
Data Mart: Subset of DW (Gdwsolutions.com)
Data Cube (Multidimensional Analysis)
• Allows you to look at the data from different dimensions to perform analysis.
[Figure: data cube with three dimensions: stores (Store A to Store E), campaigns (Campaign A to Campaign C) and products (Product A to Product D)]
Slicing and Dicing
[Figures: two views of a data cube with dimensions stores (Store A to Store E), campaigns (Campaign A to Campaign C) and products (Product A to Product D)]
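A minimal sketch of slicing and dicing, assuming the cube is stored as a Python dictionary keyed by (store, campaign, product). The sales figures and the helper names slice_cube and roll_up are invented for illustration.

```python
from collections import defaultdict

# Hypothetical unit sales keyed by (store, campaign, product).
cube = {
    ("Store A", "Campaign A", "Product A"): 10,
    ("Store A", "Campaign B", "Product A"): 4,
    ("Store A", "Campaign A", "Product B"): 7,
    ("Store B", "Campaign A", "Product A"): 3,
    ("Store B", "Campaign C", "Product B"): 9,
}

def slice_cube(cube, store=None, campaign=None, product=None):
    """Keep only the cells that match every dimension value that was given."""
    wanted = (store, campaign, product)
    return {
        key: value
        for key, value in cube.items()
        if all(w is None or w == k for w, k in zip(wanted, key))
    }

def roll_up(cube, axis):
    """Total the cells along one dimension (0=store, 1=campaign, 2=product)."""
    totals = defaultdict(int)
    for key, value in cube.items():
        totals[key[axis]] += value
    return dict(totals)

store_a = slice_cube(cube, store="Store A")   # slice: fix one dimension
by_campaign = roll_up(store_a, axis=1)        # campaigns within Store A
by_store = roll_up(cube, axis=0)              # whole cube, viewed by store
print(by_campaign, by_store)
```

Fixing one dimension gives a slice; passing two or three values to slice_cube narrows the cube further, which is the "dice" half of the phrase.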
Questions
1. What store is the most productive and what products are the best sellers at those stores?
2. What clinics need the most blood during which time of year?
3. Which advertising campaign was most productive in which areas?
4. What elements are decisively different between my worst performing and best performing store?
Information Granularity
• Yard → foot → inches
• All sales → sales per region → sales per store
• Kids with the flu → kids with the flu by region → kids with the flu by region and age
• Granularity is how far you can dig into the detail of the data.
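The sales example above can be sketched as the same data totaled at three levels of granularity. The rows and the helper total_by are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales rows: (region, store, amount).
sales = [
    ("West", "Store A", 120.0),
    ("West", "Store B", 80.0),
    ("East", "Store C", 200.0),
    ("East", "Store C", 50.0),
]

def total_by(rows, key_func):
    """Sum the amount column for each group produced by key_func."""
    totals = defaultdict(float)
    for row in rows:
        totals[key_func(row)] += row[2]
    return dict(totals)

all_sales = sum(amount for _, _, amount in sales)      # coarsest view
per_region = total_by(sales, lambda r: r[0])           # finer
per_store = total_by(sales, lambda r: (r[0], r[1]))    # finest
print(all_sales, per_region, per_store)
```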
Data Integrity
• Not all data is ‘clean’. Sometimes data is erroneous or incomplete.
• You do not want to make a decision based upon bad data.
• High quality data can lead to better, or at least more informed, decisions.
5 Characteristics of High Quality Information
• Accuracy – the data is correct.
• Completeness – all the data needed is there.
• Consistency – data is uniform. A phone number has 10 characters, never any other length.
• Uniqueness – To have value, the data must uniquely inform the company.
• Timeliness – New and current data is better for current decision making.
Data Scrubbing
• ‘Cleaning’ the data to get rid of incomplete, inconsistent or erroneous data.
– Missing data or attributes
– Redundant records
– Missing keys
– Erroneous records
– Incorrect data
• Ex. How many fake accounts have you set up to try software for free?
• Ex. How many times did you sign up for something and then never return?
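A minimal scrubbing sketch, assuming records arrive as Python dictionaries with id, email and zip fields. The field names, the sample rows and the specific rules are illustrative assumptions; real scrubbing rules depend on the data set.

```python
# Hypothetical raw records with typical problems.
records = [
    {"id": 1, "email": "ana@example.com", "zip": "93407"},
    {"id": 1, "email": "ana@example.com", "zip": "93407"},     # redundant record
    {"id": None, "email": "ben@example.com", "zip": "93401"},  # missing key
    {"id": 3, "email": "", "zip": "93405"},                    # missing data
    {"id": 4, "email": "cara@example.com", "zip": "9340"},     # incorrect data
    {"id": 5, "email": "dev@example.com", "zip": "93410"},
]

def scrub(rows):
    """Split rows into clean records and (record, reason) rejections."""
    clean, rejected, seen_ids = [], [], set()
    for row in rows:
        if row["id"] is None:
            rejected.append((row, "missing key"))
        elif row["id"] in seen_ids:
            rejected.append((row, "redundant record"))
        elif not row["email"]:
            rejected.append((row, "missing data"))
        elif not (row["zip"].isdigit() and len(row["zip"]) == 5):
            rejected.append((row, "incorrect data"))
        else:
            seen_ids.add(row["id"])
            clean.append(row)
    return clean, rejected

clean, rejected = scrub(records)
print([row["id"] for row in clean])
print([reason for _, reason in rejected])
```

Keeping the rejected rows with a reason, instead of silently dropping them, makes it possible to fix the source of the bad data later.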
Information Accuracy Costs $$
• The more complete and accurate the data is, the more it costs.
• It takes resources to collect, verify and fix data.
• Another question: what is the cost of not having high-quality data?
– Having to redo things.
– Making bad decisions.
– Process failure.
Getting the Right Data In
• Online forms designed to prevent errors.
• Check for @ to confirm the entry is an email address, or email the account before activation.
• Required form fields (*). Don’t let the user continue until they fill out everything.
• Form fields of a specific format or length. A zip code has to be 5 digits or it is rejected.
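The form checks above can be sketched as a small validation function. The field names (name, email, zip) and the error messages are assumptions for illustration. The @ check is deliberately as simple as the slide describes and does not prove the address exists.

```python
import re

REQUIRED_FIELDS = ("name", "email", "zip")  # hypothetical form fields

def validate_form(form):
    """Return a list of error messages; an empty list means the form passes."""
    errors = []
    # Required fields: do not let the user continue until all are filled in.
    for field in REQUIRED_FIELDS:
        if not form.get(field, "").strip():
            errors.append(f"{field} is required")
    # Email: a bare-minimum check for the @ sign.
    email = form.get("email", "")
    if email and "@" not in email:
        errors.append("email must contain @")
    # Zip code: exactly 5 digits or it is rejected.
    zip_code = form.get("zip", "")
    if zip_code and not re.fullmatch(r"[0-9]{5}", zip_code):
        errors.append("zip must be exactly 5 digits")
    return errors

good = validate_form({"name": "Ana", "email": "ana@example.com", "zip": "93407"})
bad = validate_form({"name": "", "email": "ana.example.com", "zip": "9340"})
print(good, bad)
```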
End
• Is a monthly sales report an example of highly granular information?
• What is that OLAP thingy?
• Why would anyone use a data mart instead of the data warehouse?
No. It is more granular than a yearly sales report and less granular than a daily sales report.
Data cube…with the ability to slice and dice.
A data mart is a subset of a data warehouse, used for a more focused purpose. You would use the mart to make your analysis faster and maybe easier.
Data Mining & Data Analysis
Winter, 2013
Data Mining
Use a variety of techniques to uncover interesting things about the data
• Cluster Analysis
• Association Detection
• Statistical Analysis
Structured vs. Unstructured Data
• Structured data is already in a database or spreadsheet format.
– .mdb or .accdb (Access), .xlsx (Excel)
• Unstructured data doesn’t have an organized format to it.
– Photos
– Music
– PDF memos
– Emails
Cluster Analysis
• Grouping a set of objects in a way that clusters form around certain attributes.
• Ex. Zip code clustering can show where most sales, customers, etc. are from.
• Ex. Social media cluster analysis may predict what words are more likely to be next to each other.
– Music sales are directly linked to buzz on social media.
– Mapped into a chart where clusters form and grow on different words, predicting success.
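A minimal sketch of one common clustering technique, k-means, written in plain Python. The slides do not name an algorithm, so k-means is an assumption chosen for illustration, and the customer data and starting centroids are made up.

```python
import math

def k_means(points, centroids, rounds=10):
    """Assign each point to its nearest centroid, then move each centroid
    to the mean of its assigned points; repeat for a fixed number of rounds."""
    for _ in range(rounds):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroid  # leave an empty cluster's centroid in place
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical customers as (orders per year, average order value).
customers = [(1, 20), (2, 22), (1, 25), (10, 90), (11, 95), (12, 100)]
centroids, clusters = k_means(customers, centroids=[(0, 0), (15, 110)])
print(clusters)
```

The two groups that come out (occasional small buyers and frequent large buyers) were never labeled in the data; they form around the attributes, which is the idea behind cluster analysis.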
Association Detection
• Market Basket Analysis (also known as commodity bundle) – What is in your basket and how is it related/predictive?
• A student purchasing engineering books might also need a calculator.
• Amazon suggesting ‘customers who bought this also bought that.’ May entice you to purchase more books.
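A minimal market basket sketch, assuming each basket is a set of item names. The baskets are invented, and support and confidence are the two standard measures used here: support is the share of all baskets that contain both items, and confidence is the share of baskets with the first item that also contain the second.

```python
from collections import Counter
from itertools import combinations

# Hypothetical bookstore baskets.
baskets = [
    {"engineering book", "calculator"},
    {"engineering book", "calculator", "notebook"},
    {"engineering book", "notebook"},
    {"novel", "notebook"},
]

# Count how often each item and each pair of items appears.
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)

def support(item_a, item_b):
    """Share of all baskets that contain both items."""
    return pair_counts[tuple(sorted((item_a, item_b)))] / len(baskets)

def confidence(if_item, then_item):
    """Share of baskets containing if_item that also contain then_item."""
    return pair_counts[tuple(sorted((if_item, then_item)))] / item_counts[if_item]

book_calc_support = support("engineering book", "calculator")
book_to_calc = confidence("engineering book", "calculator")
calc_to_book = confidence("calculator", "engineering book")
print(book_calc_support, book_to_calc, calc_to_book)
```

In this made-up data every calculator buyer also bought an engineering book, so a "customers who bought this also bought that" suggestion would be stronger in that direction.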
Statistical Analysis
• Forecasting & Time Series
– Data can be collected at specific intervals to gain predictive insight from it.
– Ex. stock prices, power consumption, sales over time in response to a marketing campaign.
– Data could indicate seasonal or cyclic trends.
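One simple way to look for a trend in data collected at regular intervals is a moving average. This is a sketch under assumed data: the monthly sales figures are invented, and a moving average is only one of many time series techniques.

```python
def moving_average(values, window):
    """Average each run of `window` consecutive values to smooth out noise."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

# Hypothetical monthly sales during a marketing campaign.
monthly_sales = [100, 120, 90, 110, 130, 100, 120, 140]
smoothed = moving_average(monthly_sales, window=3)
print(smoothed)
```

The raw numbers jump up and down month to month, while the smoothed series rises steadily, which makes the underlying trend easier to see.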
Other Mining Opportunities
• Text Mining
– Searching through a massive number of emails for a company.
– Searching Twitter data.
• Web Mining
– Look at people’s browsing and buying or navigation habits.
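A very small text mining sketch: counting which terms come up most often across a set of emails. The emails, the stop-word list and the helper top_terms are hypothetical, and real text mining goes well beyond word counts.

```python
import re
from collections import Counter

# Hypothetical customer emails.
emails = [
    "The invoice is late again",
    "Please resend the invoice",
    "Late shipment and late invoice",
]

STOP_WORDS = {"the", "is", "and"}  # common words that carry little meaning

def top_terms(documents, n):
    """Return the n most frequent non-stop-words across all documents."""
    counts = Counter()
    for text in documents:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(word for word in words if word not in STOP_WORDS)
    return counts.most_common(n)

top = top_terms(emails, 2)
print(top)
```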
So what is BI again?
• It is a set of processes and analysis tools used to examine data and get something great from it.
• BI and Big Data are booming. There are lots of massive data sets, and we are only beginning to understand how to gain insight from all that data.
End