Data Profiler

Rohit Agarwal

OUTLINE

• Introduction
• Types of Profiling
• When should Data Profiling be done?
• General Model
• Methodology
• Conclusion
• References

Introduction

• Data is information, and for large organizations and companies it is an important asset, just as money and property are important assets for an individual.

• Data profiling is a technique for examining data to verify that it is correct, accurate, and properly structured, and to uncover any discrepancies.

• In simple terms, a data profiler is a tool that helps people at any level of an organization access data without any knowledge of the database and without writing a query.

• It also saves experts time and resources and reduces the chance of error. Organizations hold large amounts of data, and retrieving a small piece of information normally means writing complex queries, whereas with data profiling the data can be obtained directly.

Business Strategy

• Data profiling is always an initial step for business applications such as customer relationship management (CRM), enterprise resource planning (ERP), and data warehousing.

• Today almost all big organizations around the globe are implementing two primary enterprise applications, ERP and CRM: ERP is used to keep expenses in check, and CRM is used to build relationships with customers.

Types of Profiling

• Column Analysis
• Frequency Analysis
• Null Rule Analysis
• Constant Analysis
• Empty Column Analysis
• Unique Analysis
• Single Table Analysis
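The slides name these analyses without defining them, so the following is only a minimal sketch of one common reading of each name, not the project's actual implementation. The function names `profile_column` and `profile_table`, the sample rows, and the rule that `None` and blank strings count as missing are all assumptions made for illustration.

```python
from collections import Counter


def profile_column(values):
    """Compute simple profile statistics for one column (a list of values)."""
    total = len(values)
    # Assumption: None and blank strings both count as missing values.
    present = [v for v in values if v is not None and str(v).strip() != ""]
    freq = Counter(present)
    return {
        "row_count": total,                      # column analysis
        "null_count": total - len(present),      # null rule analysis
        "distinct_count": len(freq),
        "frequencies": dict(freq),               # frequency analysis
        "is_empty": len(present) == 0,           # empty column analysis
        "is_constant": len(freq) == 1,           # constant analysis
        # Unique analysis: every non-missing value occurs exactly once.
        "is_unique": len(present) > 0 and len(freq) == len(present),
    }


def profile_table(rows):
    """Single table analysis: profile every column of a list of row dicts."""
    columns = list(rows[0].keys()) if rows else []
    return {c: profile_column([r.get(c) for r in rows]) for c in columns}


# Hypothetical sample table used only to exercise the sketch.
rows = [
    {"id": 1, "country": "US", "fax": None, "status": "active"},
    {"id": 2, "country": "US", "fax": "", "status": "active"},
    {"id": 3, "country": "IN", "fax": None, "status": "active"},
]
profile = profile_table(rows)
for column, stats in profile.items():
    print(column, stats)
```

In this sample, `id` is the only unique column, `status` is constant, and `fax` is empty. A real profiler would read rows from the database rather than from an in-memory list.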

When should data profiling be done?

• “Clearly, data profiling should be done on all data quality assessment projects as well on all IT projects that either move data to another structure or migrate or consolidate data” [1].

• Large organizations have very important databases that hold a lot of data. Because this data changes often, the probability that it contains inaccuracies is very high, so it has to be reprofiled periodically.

General Model

• Data Profiling is basically a data quality assurance process.

• A data analyst usually performs the data profiling, either alone or as part of a team, and then sends the profiled data to the business analyst.

• The business analyst is the one who knows how to use that correct and accurate data.

• Developers are also involved in the process because they are the ones who built the application, manage the physical data, and give basic definitions to each rule. They can therefore be of real help at verification time.

• The database administrator provides the data analysts with the data to be profiled.

• Other staff can provide suggestions and insight during the process.

Methodology

• Bottom-up approach
• Top-down approach

Conclusion

• “Well begun is half done”: this phrase suits data profiling well, because for organizations or development teams where data plays an important role, data profiling should always be the first step. In this project we discussed different types of profiling, which give a great deal of insight into how data profiling really works, as well as the mechanism for storing the findings in Excel, which in this case serves as the data profiling repository.
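The slides do not show how findings are written to the Excel repository. As a rough sketch only, the example below writes findings as CSV, a format swapped in here because Excel can open it and Python's standard library can produce it. The column layout in `FIELDS`, the function name `write_findings`, and the sample findings are all hypothetical.

```python
import csv
import io

# Hypothetical repository layout: one row per finding.
FIELDS = ["table", "column", "analysis", "result"]


def write_findings(findings, stream):
    """Write profiling findings as CSV rows to a text stream."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(findings)


# Made-up findings, used only to exercise the sketch.
findings = [
    {"table": "customer", "column": "fax",
     "analysis": "empty column", "result": "all values missing"},
    {"table": "customer", "column": "status",
     "analysis": "constant", "result": "single value: active"},
]

buffer = io.StringIO()
write_findings(findings, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To produce a file that Excel can open, the same function can be given a file opened with `open(path, "w", newline="")` instead of the in-memory buffer.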

References

• DataFlux Corporation, “Data Profiling: The Foundation for Data Management,” 2003; http://infoimpact.com/articles/Data%20Profiling%20White%20Paper1003-final.pdf

• B. Dorr et al., “Data Profiling: Designing the Blueprint for Improved Data Quality,” SAS Institute Inc.; http://www2.sas.com/proceedings/sugi30/102-30.pdf

• J. Olson, Data Quality: The Accuracy Dimension, Morgan Kaufmann, 2003; http://books.google.com/books?hl=en&lr=&id=x8ahL57VOtcC&oi=fnd&pg=PP2&dq=data+quality+the+accuracy+dimension+jack+e+olson&ots=pXQrjhX6H0&sig=Ntqx7E384vIT2sqVBpsd_WlX7SA#v=onepage&q&f=false