Metadata By N.Gopinath AP/CSE Metadata and it’s role in the lifecycle. The collection,...


Metadata

By N.Gopinath AP/CSE

• Metadata and its role in the lifecycle.
• The collection, maintenance, and deployment of metadata.
• Metadata and tool integration.


Metadata Definitions

• Metadata – additional data about the warehouse, used to understand what information is in the warehouse and what it means.
• Metadata Repository – a specialized database designed to maintain metadata, together with the tools and interfaces that allow a company to collect and distribute its metadata.
• Operational Data – elements from operational systems, external data, or other sources, mapped to the warehouse structures.
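As a rough illustration of the first two definitions, the sketch below models a metadata repository as an in-memory Python dictionary keyed by the element's physical name. The class names, field names, and sample element are hypothetical; a real repository would be a specialized database with its own collection and distribution interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class ElementMetadata:
    """Metadata about one warehouse data element (field names are illustrative)."""
    name: str           # physical name in the warehouse
    business_name: str  # what the element is called by end users
    definition: str     # what the element means
    source: str         # operational system or external source it came from


@dataclass
class MetadataRepository:
    """A toy, in-memory stand-in for a metadata repository."""
    elements: dict[str, ElementMetadata] = field(default_factory=dict)

    def collect(self, element: ElementMetadata) -> None:
        # Store (or overwrite) the metadata for one element, keyed by physical name.
        self.elements[element.name] = element

    def describe(self, name: str) -> str:
        # Distribute metadata: answer "what is this and what does it mean?"
        e = self.elements[name]
        return f"{e.business_name}: {e.definition} (source: {e.source})"


repo = MetadataRepository()
repo.collect(ElementMetadata(
    name="cust_rev_ytd",
    business_name="Customer revenue (YTD)",
    definition="Revenue booked in the current calendar year",
    source="billing system",
))
print(repo.describe("cust_rev_ytd"))
```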


Industry Trend

Why were early data warehouses able to succeed without collecting significant amounts of metadata?

• Usually a subset of data was targeted, making it easier to understand content, organization, and ownership.
• Usually a subset of (technically inclined) end users was targeted.

Early choices were made to ensure the success of initial data warehouse efforts.


Metadata Transition

Usually, metadata repositories already exist. Traditionally, metadata was aimed at overall systems management, such as aiding the maintenance of legacy systems through impact analysis and determining the appropriate reuse of legacy data structures.

Repositories can now aid in tracking metadata to help all data warehouse users understand what information is in the warehouse and what it means. Tools are now being positioned to help manage and maintain metadata.


Metadata Lifecycle

1. Collection: Identify metadata and capture it in a central repository.

2. Maintenance: Put in place processes to synchronize metadata automatically with the changing data architecture.

3. Deployment: Provide metadata to users in the right form and with the right tools.

The key to ensuring a high level of collection and maintenance accuracy is to incorporate as much automation as possible. The key to a successful metadata deployment is to correctly match the metadata offered to the specific needs of each audience.


Metadata Collection

• Collecting the right metadata at the right time is the basis for success. Without it, a user who does not already have an idea about what information would answer a question will not find anything helpful in the warehouse.

• Metadata spans many domains, from physical structure data to logical model data to business usage and rules.

• Typically the metadata that should be collected is already generated and processed by the development team anyway. Metadata collection preserves the analysis performed by the team.


Metadata Categories: Warehouse Data Sources

Information about the potential sources of data for a data warehouse (existing operational systems, external data, manually maintained information). The intent is to understand both the physical structure of the data and the meaning of the data. Typically the physical structure is easier to collect as it may exist in a repository that can be parsed automatically.
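A minimal sketch of automated physical-structure collection, assuming the source system's catalog can be queried the way SQLite's can: Python's built-in `sqlite3` module reads table names from `sqlite_master` and column definitions from `PRAGMA table_info`. The function name and sample tables are hypothetical, and other database products expose their catalogs differently.

```python
import sqlite3


def collect_physical_metadata(conn: sqlite3.Connection) -> dict[str, list[tuple[str, str]]]:
    """Return {table_name: [(column_name, declared_type), ...]} read from the SQLite catalog."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    structure = {}
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
        # The table name is quoted as an identifier; it comes from the catalog itself.
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        structure[table] = [(c[1], c[2]) for c in cols]
    return structure


# Hypothetical source system standing in for an operational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, cust_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT)")
print(collect_physical_metadata(conn))
```

Capturing the meaning of the data still requires human input; only the structural half is parsed automatically here.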


Metadata Categories: Data Models

Correlate the enterprise model to the warehouse model.

• Map entities in the enterprise model to their representation in the warehouse model. This will provide the basis for further change impact analysis and end user content analysis.

• Ensure the entity, element definition, business rules, valid values, and usage guidelines are transposed properly from the enterprise model to the warehouse model.
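One way to check that definitions were transposed properly is to store the enterprise-to-warehouse correlation as metadata and compare each mapped element's attributes. The sketch below assumes both models are available as Python dictionaries; all element names and values are invented for illustration.

```python
# Hypothetical enterprise-model and warehouse-model entries: element -> attributes.
enterprise_model = {
    "Customer.Status": {"definition": "Current standing of the customer", "valid_values": {"A", "I", "S"}},
    "Customer.Region": {"definition": "Sales region", "valid_values": {"N", "S", "E", "W"}},
}
warehouse_model = {
    "dim_customer.status_cd": {"definition": "Current standing of the customer", "valid_values": {"A", "I"}},
    "dim_customer.region_cd": {"definition": "Sales region", "valid_values": {"N", "S", "E", "W"}},
}
# Correlation metadata: enterprise element -> its representation in the warehouse.
model_map = {
    "Customer.Status": "dim_customer.status_cd",
    "Customer.Region": "dim_customer.region_cd",
}


def check_transposition(enterprise, warehouse, mapping):
    """Return (enterprise_element, attribute) pairs that were not carried over unchanged."""
    problems = []
    for ent_elem, wh_elem in mapping.items():
        for attr, value in enterprise[ent_elem].items():
            # A missing warehouse element or attribute also counts as a mismatch.
            if warehouse.get(wh_elem, {}).get(attr) != value:
                problems.append((ent_elem, attr))
    return problems


print(check_transposition(enterprise_model, warehouse_model, model_map))
```

Here the warehouse copy of `Customer.Status` has lost the valid value `S`, which is the kind of discrepancy the check is meant to surface.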


Metadata Categories: Warehouse Mappings

Map the operational data into the warehouse data structures.

• Each time a data element is mapped to the warehouse, the logical connection between the data elements, as well as any transformations, should be recorded.

• Along with being able to determine that an element in the warehouse is populated from specific sources of data, the metadata should also discern exactly what happens to those elements as they are extracted from the data sources, moved, transformed, and loaded into the warehouse.
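A sketch of what a recorded mapping might hold, assuming each warehouse element is populated by a single mapping: the target element, the source elements it draws on, and the ordered transformation steps applied between extraction and loading. The `Mapping` class, the `lineage` helper, and every element name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """One source-to-warehouse mapping (field names are hypothetical)."""
    target: str                       # warehouse element
    sources: tuple[str, ...]          # operational elements it is populated from
    transformations: tuple[str, ...]  # ordered steps applied between extract and load


mappings = [
    Mapping("fact_sales.amount_usd", ("billing.inv_amt", "fx.rate"),
            ("extract billing.inv_amt", "join fx.rate on invoice date",
             "multiply by rate", "round to 2 decimal places")),
    Mapping("dim_customer.name", ("crm.cust_nm",),
            ("trim whitespace", "convert to title case")),
]


def lineage(target: str, mappings: list[Mapping]) -> tuple[tuple[str, ...], tuple[str, ...]]:
    """Answer: where does this element come from, and what happened to it on the way?"""
    for m in mappings:
        if m.target == target:
            return m.sources, m.transformations
    raise KeyError(target)


srcs, steps = lineage("fact_sales.amount_usd", mappings)
print(srcs)
print(" -> ".join(steps))
```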


Metadata Categories: Warehouse Usage Information

Usage information can be used to:

• Understand what tables are being accessed, by whom, and how often. This can be used to fine-tune the physical structure of the data warehouse.
• Improve query reuse by identifying existing queries (cataloging queries and recording their authors and descriptions).
• Understand how data is being used to solve business problems.

This information is captured after the warehouse has been deployed. Typically, this information is not easy to collect.
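Usage information is usually derived from logs captured after deployment. Assuming a query log that records (user, table) pairs, a few lines of Python can answer the "which tables, by whom, how often" questions; the log format and all names are hypothetical.

```python
from collections import Counter

# Hypothetical query-log records captured after deployment: (user, table accessed).
query_log = [
    ("alice", "fact_sales"), ("bob", "fact_sales"), ("alice", "dim_customer"),
    ("carol", "fact_sales"), ("alice", "fact_sales"),
]

# How often each table is accessed.
table_hits = Counter(table for _, table in query_log)

# Who accesses each table.
users_by_table: dict[str, set[str]] = {}
for user, table in query_log:
    users_by_table.setdefault(table, set()).add(user)

print(table_hits.most_common())
print(sorted(users_by_table["fact_sales"]))
```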


Maintaining Metadata

As with any maintenance process, automation is key to maintaining current, high-quality information. The data warehouse tools can play an important role in how the metadata is maintained.

Most proposed database changes already go through appropriate verification and authorization, so adding a metadata maintenance requirement should not add significant overhead.

Capturing incremental changes is encouraged since metadata (particularly structure information) is usually very large.
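Capturing incremental changes can be as simple as comparing two structure snapshots and recording only the difference. The sketch assumes snapshots shaped as `{table: {column: type}}`; the function name and sample data are hypothetical.

```python
def schema_delta(old: dict[str, dict[str, str]],
                 new: dict[str, dict[str, str]]) -> list[tuple[str, str, str]]:
    """Compare two {table: {column: type}} snapshots and return only what changed."""
    changes = []
    for table in sorted(old.keys() | new.keys()):
        before, after = old.get(table, {}), new.get(table, {})
        for col in sorted(before.keys() | after.keys()):
            if col not in after:
                changes.append(("drop", table, col))
            elif col not in before:
                changes.append(("add", table, col))
            elif before[col] != after[col]:
                changes.append(("retype", table, col))
    return changes


old = {"orders": {"order_id": "INTEGER", "amount": "REAL"}}
new = {"orders": {"order_id": "INTEGER", "amount": "NUMERIC", "channel": "TEXT"},
       "returns": {"order_id": "INTEGER"}}
print(schema_delta(old, new))
```

Only the returned delta needs to be applied to the repository, rather than reloading the full structure each time.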


Maintaining the Warehouse

The warehouse team must have comprehensive impact analysis capabilities to respond to changes that may affect:

• Data extraction/movement/transformation routines
• Table structures
• Data marts and summary data structures
• Stored user queries
• Users who require new training (due to query or other changes)
• What business problems are addressed in part using the element that is changing (to help understand the significance of the change and how it may impact decision making)
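Impact analysis can be framed as a graph walk: if the repository records which objects depend directly on which, everything downstream of a changed element (routines, tables, marts, stored queries, users) is reachable by breadth-first search. The dependency data below is hypothetical.

```python
from collections import deque

# Hypothetical dependency metadata: object -> objects that directly depend on it.
dependents = {
    "billing.inv_amt": ["etl.load_sales"],
    "etl.load_sales": ["fact_sales.amount_usd"],
    "fact_sales.amount_usd": ["mart.monthly_revenue", "query.top_customers"],
    "mart.monthly_revenue": ["query.revenue_trend"],
    "query.top_customers": ["user.alice"],
}


def impact(changed: str, graph: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk returning every object downstream of a changed object."""
    seen, order, queue = {changed}, [], deque([changed])
    while queue:
        node = queue.popleft()
        for dep in graph.get(node, []):
            if dep not in seen:  # visit each object once, even if reached twice
                seen.add(dep)
                order.append(dep)
                queue.append(dep)
    return order


print(impact("billing.inv_amt", dependents))
```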


Metadata Deployment

Supply the right metadata to the right audience.

• Warehouse developers will primarily need the physical structure information for data sources. Further analysis on that metadata leads to the development of more metadata (mappings).

• Warehouse maintainers typically require direct access to the metadata as well.

• End users require an easy-to-access format. They should not be burdened with technical names or cryptic commands. Training, documentation, and other forms of help should be readily available.
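A sketch of matching metadata to its audience, assuming one repository entry holds both technical and business fields and each audience is shown a fixed subset of them. The field names and the audience-to-field split are assumptions for illustration.

```python
# Hypothetical repository entry holding both technical and business metadata.
entry = {
    "physical_name": "dim_customer.status_cd",
    "data_type": "CHAR(1)",
    "source": "crm.cust_stat",
    "business_name": "Customer status",
    "description": "Whether the customer is active (A) or inactive (I).",
}

# Which fields each audience sees; this split is an assumption for illustration.
VIEWS = {
    "developer": ["physical_name", "data_type", "source"],
    "maintainer": ["physical_name", "data_type", "source", "business_name", "description"],
    "end_user": ["business_name", "description"],
}


def view_for(audience: str, metadata: dict[str, str]) -> dict[str, str]:
    """Return only the metadata fields relevant to the given audience."""
    return {key: metadata[key] for key in VIEWS[audience]}


print(view_for("end_user", entry))
```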


End Users

Users of the warehouse are primarily concerned with two types of metadata:

1. A high-level topic inventory of the warehouse (what is in the warehouse and where it came from).

2. Existing queries that are pertinent to their search (reuse).

The important goal is that the user is easily able to correctly find and interpret the data they need.
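For end users, the two kinds of metadata above can be served by a single keyword search: matching topics from the inventory, plus stored queries whose title matches or which touch the matching topics' tables. The catalog structure and all entries are hypothetical.

```python
# Hypothetical catalog: a high-level topic inventory and a set of stored queries.
topics = [
    {"topic": "Sales revenue", "tables": ["fact_sales"], "source": "billing system"},
    {"topic": "Customer profile", "tables": ["dim_customer"], "source": "CRM"},
]
stored_queries = [
    {"title": "Monthly revenue by region", "author": "alice", "tables": ["fact_sales", "dim_customer"]},
    {"title": "Inactive customers", "author": "bob", "tables": ["dim_customer"]},
]


def search(term: str) -> tuple[list[dict], list[dict]]:
    """Case-insensitive search over topics, plus stored queries worth reusing."""
    t = term.lower()
    hit_topics = [x for x in topics if t in x["topic"].lower()]
    # Tables behind the matching topics; queries touching them are also candidates.
    topic_tables = {tbl for x in hit_topics for tbl in x["tables"]}
    hit_queries = [q for q in stored_queries
                   if t in q["title"].lower() or topic_tables & set(q["tables"])]
    return hit_topics, hit_queries


found_topics, found_queries = search("customer")
print([x["topic"] for x in found_topics], [q["title"] for q in found_queries])
```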


Integration with Data Access Tools

1. Side-by-side access to metadata and to real data. The user can browse metadata and write queries against the real data.

2. Populate query tool help text with metadata exported from the repository. The tool can then provide the user with context-sensitive help, at the cost of having to be updated whenever the metadata changes; until it is, the user may be working with outdated metadata.

3. Provide query tools that access the metadata directly to provide context-sensitive help. This eliminates the refresh issue and ensures the user always sees current metadata.

4. Full interconnectivity between the query tool and the metadata tool (transparent transactions between tools).
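The trade-off between options 2 and 3 can be seen in a toy model: a tool holding an exported copy of the help text keeps showing the old description after the repository changes, while a tool that reads the repository on each request shows the current one. The class names and sample element are hypothetical.

```python
class Repository:
    """Toy metadata repository mapping element names to descriptions."""
    def __init__(self) -> None:
        self.descriptions: dict[str, str] = {}


class ExportedHelpTool:
    """Option 2: help text is a copy exported from the repository at one point in time."""
    def __init__(self, repo: Repository) -> None:
        self.help_text = dict(repo.descriptions)  # snapshot; stale until re-exported

    def help(self, element: str) -> str:
        return self.help_text.get(element, "no help available")


class DirectAccessTool:
    """Option 3: the query tool reads the repository on every help request."""
    def __init__(self, repo: Repository) -> None:
        self.repo = repo

    def help(self, element: str) -> str:
        return self.repo.descriptions.get(element, "no help available")


repo = Repository()
repo.descriptions["amount_usd"] = "Invoice amount in USD"
exported, direct = ExportedHelpTool(repo), DirectAccessTool(repo)
repo.descriptions["amount_usd"] = "Invoice amount in USD, net of returns"  # metadata changes
print(exported.help("amount_usd"))  # still the old text
print(direct.help("amount_usd"))    # current text
```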


Thank you…