Sharing Enterprise Data
Data administration, data downloading, data warehousing
Data administration
Organization-wide activity (the DBA of a particular database is only a part of this)
Challenges:
  Many types of data exist
  Basic categories of data are not obvious
  The same data can have many names, descriptions, and formats
  Data are changed, often concurrently
  Political and organizational issues complicate operational issues
Marketing
Communicate existence of data administration to organization
Explain reason for existence of standards, policies, and guidelines
Describe in a positive light the services provided
Data standards and policies
Establish standard means for describing data items; standards include name, definition, description, processing restrictions, etc.
Establish data proponents
Establish organization-wide data policy; examples are security, data proponency, and distribution
Forum for data conflict resolution
Establish procedures for reporting conflicts
Provide means for hearing all perspectives and views
Have authority to make decision to resolve conflict
Return on organization's data investment
Focus attention on value of data investment
Investigate new methodologies and technologies
Take proactive attitude toward information management
Downloading: potential problems
Coordination
  Conform downloaded data to database constraints
  Coordinate local updates with downloads
Consistency
  Downloaded data should not be updated
  Applications need features to prevent updating
  Warn users of possible problems
Access control
  Data may be replicated on many computers
  More difficult data access control procedures
Risk of computer crime
  Disks and modem access are easy to conceal
  Illegal copying is difficult to prevent
Data warehousing
What if every department wants to download the organization's data?
  The data management problem becomes immense
Data warehouse: a centralized repository to facilitate management decision making and increase the value of the enterprise data assets
Integrated From Various Sources
Operational Data           Data Warehouse
appln. A - m, f
appln. B - male, female    m, f
appln. C - x, y
appln. D - 1, 0
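A minimal sketch of the integration step the figure describes: each application's gender encoding is translated into the single warehouse encoding (m, f). The mapping table and the function name `to_warehouse_gender` are hypothetical, and the slide does not say which of x/y or 1/0 stands for which value, so those two mappings are assumptions.

```python
# Hypothetical per-application encodings, modeled on the four applications
# in the figure. The warehouse stores only "m" or "f".
SOURCE_ENCODINGS = {
    "A": {"m": "m", "f": "f"},
    "B": {"male": "m", "female": "f"},
    "C": {"x": "m", "y": "f"},  # assumption: x means m, y means f
    "D": {"1": "m", "0": "f"},  # assumption: 1 means m, 0 means f
}


def to_warehouse_gender(application, raw_value):
    """Translate one application's gender code into the warehouse code."""
    mapping = SOURCE_ENCODINGS[application]
    key = str(raw_value).strip().lower()
    if key not in mapping:
        raise ValueError(
            f"unknown gender code {raw_value!r} for application {application}"
        )
    return mapping[key]


# Rows as they might arrive from the operational systems: (application, code).
operational_rows = [("A", "f"), ("B", "Male"), ("C", "y"), ("D", 1)]
warehouse_rows = [to_warehouse_gender(app, code) for app, code in operational_rows]
print(warehouse_rows)
```

Rejecting unknown codes, rather than passing them through, keeps inconsistent source values out of the warehouse at load time.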
Data in Data Warehouse
Older Detail Data: Sales Detail 1992-98
Current Detail: Sales Detail 1998-99
Lightly Summarized: Regional Sales by Week 83-98
Highly Summarized: National Sales by Month 85-98
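The summary levels can be derived from the detail level by aggregation. The sketch below assumes detail rows of the form (sale date, region, amount); the sample rows and both function names are made up for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical current-detail rows: (sale date, region, amount).
sales_detail = [
    (date(1998, 11, 2), "East", 100.0),
    (date(1998, 11, 4), "East", 50.0),
    (date(1998, 11, 4), "West", 70.0),
    (date(1998, 12, 1), "West", 30.0),
]


def regional_sales_by_week(rows):
    """Lightly summarized: total per (region, ISO year, ISO week)."""
    totals = defaultdict(float)
    for day, region, amount in rows:
        iso_year, iso_week, _weekday = day.isocalendar()
        totals[(region, iso_year, iso_week)] += amount
    return dict(totals)


def national_sales_by_month(rows):
    """Highly summarized: total per (year, month), regions collapsed."""
    totals = defaultdict(float)
    for day, _region, amount in rows:
        totals[(day.year, day.month)] += amount
    return dict(totals)


weekly = regional_sales_by_week(sales_detail)
monthly = national_sales_by_month(sales_detail)
print(weekly)
print(monthly)
```

Each level trades detail for size: the monthly national totals answer fewer questions than the weekly regional ones, but are far smaller and faster to query.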
Time Variant
Operational Data
  time horizon 60-90 days
  key may / may not have element of time
  can be updated
Data Warehouse
  time horizon 5-10 years
  key contains element of time
  once snapshot is made data cannot be updated
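One way to picture the warehouse side of this comparison is an append-only store whose key includes the snapshot date. This is a sketch under assumptions: the class `SnapshotStore` and its methods `load` and `as_of` are hypothetical names, not part of any particular warehouse product.

```python
from datetime import date
from types import MappingProxyType


class SnapshotStore:
    """Append-only store keyed by (business key, snapshot date)."""

    def __init__(self):
        self._snapshots = {}

    def load(self, business_key, snapshot_date, record):
        """Add a snapshot; an existing snapshot is never overwritten."""
        key = (business_key, snapshot_date)
        if key in self._snapshots:
            raise ValueError(f"snapshot {key} already exists and cannot be updated")
        # Copy the record and wrap it in a read-only view (top level only).
        self._snapshots[key] = MappingProxyType(dict(record))

    def as_of(self, business_key, when):
        """Return the latest snapshot taken on or before `when`, or None."""
        dates = [d for (k, d) in self._snapshots if k == business_key and d <= when]
        if not dates:
            return None
        return self._snapshots[(business_key, max(dates))]


store = SnapshotStore()
store.load("cust-1", date(1998, 1, 31), {"balance": 100})
store.load("cust-1", date(1998, 2, 28), {"balance": 250})
print(store.as_of("cust-1", date(1998, 2, 15))["balance"])
```

Because time is part of the key, a new month adds a row instead of changing an old one, which is what lets the warehouse answer questions over a 5-10 year horizon.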
Non - volatile
Operational Data
  Data is updated on a record-by-record basis
  Supporting record-by-record online updates requires technology with a very complex foundation
Data Warehouse
  Data is not updated
  At the physical design level, liberties can be taken to optimize access to data
Operational data operations: Replace, Insert, Change
Data warehouse operations: Load, Access
Data warehouse components
Data extraction tools
Extracted data
Metadata of warehouse contents
Warehouse DBMS(s)
Warehouse data management tools
Data delivery programs
End-user analysis tools
User training courses and materials
Warehouse consultants
Data warehouse requirements
Queries and reports with variable structure
OLAP: On-Line Analytical Processing
User-specified data aggregation
User-specified drill down
Graphical outputs
Integration with domain-specific programs
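User-specified aggregation and drill down can be illustrated with a small in-memory example. The fact rows, the dimension names, and the function `aggregate` are all hypothetical; a real OLAP tool would run the equivalent query against the warehouse DBMS.

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions and one measure per sale.
facts = [
    {"region": "East", "product": "shoes", "quarter": "Q1", "sales": 100},
    {"region": "East", "product": "hats", "quarter": "Q1", "sales": 40},
    {"region": "East", "product": "shoes", "quarter": "Q2", "sales": 60},
    {"region": "West", "product": "shoes", "quarter": "Q1", "sales": 80},
]


def aggregate(rows, dimensions, measure="sales", where=None):
    """Sum `measure` grouped by the chosen dimensions.

    `where` optionally restricts rows to fixed dimension values, which is
    how a drill down into one cell of a summary is expressed here.
    """
    where = where or {}
    totals = defaultdict(int)
    for row in rows:
        if all(row[dim] == value for dim, value in where.items()):
            totals[tuple(row[dim] for dim in dimensions)] += row[measure]
    return dict(totals)


# User-specified aggregation: total sales per region.
by_region = aggregate(facts, ["region"])
# Drill down: break the East total into products.
east_by_product = aggregate(facts, ["product"], where={"region": "East"})
print(by_region)
print(east_by_product)
```

The user picks the grouping dimensions at query time, so the same facts can be viewed by region, by product, by quarter, or by any combination.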
OLAP
-- to gain insight into data through fast, consistent, interactive access to wide variety of views
-- functionality characterized by dynamic multidimensional analysis of consolidated enterprise data
Data Extraction
-- ability to capture, convert, & deliver data to various sources
-- provides fast disk-to-disk transfer capabilities and automates data compression
Data Mining Tools
-- help by focusing end user attention on a smaller subset of data
-- subset is determined by data mining "discovery" process, which is done in advance of in-depth analysis
Executive Information System
-- for senior executives with little computing experience
-- available on demand with whatever level of detail (drill-down)
-- add value, improve strategic & financial control, market & economic information, better competitive analysis
Financial & Marketing Analysis
-- provides end user with highly value-added reports such as accounts receivable / payable, ledger mgmt., cost control, cost budgeting & planning
-- in marketing: product pricing, demand analysis, estimation
-- use non-technical language, run queries in fast, reliable manner
Report & Query Tools
-- most important & widely used
-- emphasize generating value-added reports
-- users have flexibility to use either common English or SQL
-- support graphical interface
Example: FINGERHUT
150 catalog mailings in 1997
  based on statistically predicted consumer response
30 million customers, 14% annual growth
database captures 1400 pieces of information about a household
  demographics, purchasing histories
Data warehouse challenges
Inconsistent data
  E.g., different timing, different domains...
Tool integration
  E.g., spreadsheets versus databases...
Lack of warehouse data management tools
  In-house software development (expensive)
Ad-hoc requirements