Change Data Capture

Post on 29-Oct-2015


Basics of Change Data Capture

Change data capture records insert, update, and delete activity that is applied to a SQL Server table. This makes the details of the changes available in an easily consumed relational format. Column information and the metadata that is required to apply the changes to a target environment is captured for the modified rows and stored in change tables that mirror the column structure of the tracked source tables. Table-valued functions are provided to allow systematic access to the change data by consumers.

A good example of a data consumer that is targeted by this technology is an extraction, transformation, and loading (ETL) application. An ETL application incrementally loads change data from SQL Server source tables to a data warehouse or data mart. Although the representation of the source tables within the data warehouse must reflect changes in the source tables, an end-to-end technology that refreshes a replica of the source is not appropriate. Instead, you need a reliable stream of change data that is structured so that consumers can apply it to dissimilar target representations of the data. SQL Server change data capture provides this technology.

Change data capture is available only on the Enterprise, Developer, and Evaluation editions of SQL Server.

Change Data Capture Data Flow: The following illustration shows the principal data flow for change data capture.

The source of change data for change data capture is the SQL Server transaction log. As inserts, updates, and deletes are applied to tracked source tables, entries that describe those changes are added to the log. The log serves as input to the capture process, which reads the log and adds information about changes to the tracked table’s associated change table. Functions are provided to enumerate the changes that appear in the change tables over a specified range, returning the information in the form of a filtered result set. The filtered result set is typically used by an application process to update a representation of the source in some external environment.

Understanding Change Data Capture and the Capture Instance: Before changes to any individual tables within a database can be tracked, change data capture must be explicitly enabled for the database. This is done by using the stored procedure sys.sp_cdc_enable_db. When the database is enabled, source tables can be identified as tracked tables by using the stored procedure sys.sp_cdc_enable_table. When a table is enabled for change data capture, an associated capture instance is created to support the dissemination of the change data in the source table. The capture instance consists of a change table and up to two query functions. Metadata that describes the configuration details of the capture instance is retained in the change data capture metadata tables cdc.change_tables, cdc.index_columns, and cdc.captured_columns. This information can be retrieved by using the stored procedure sys.sp_cdc_help_change_data_capture.

All objects that are associated with a capture instance are created in the change data capture schema of the enabled database. The capture instance name must be a valid object name and must be unique across the database capture instances. By default, the name is <schema name_table name> of the source table. Its associated change table is named by appending _CT to the capture instance name. The function that is used to query for all changes is named by prepending fn_cdc_get_all_changes_ to the capture instance name. If the capture instance is configured to support net changes, the net_changes query function is also created and named by prepending fn_cdc_get_net_changes_ to the capture instance name.
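As a rough illustration of the naming rules above, the following sketch lists each capture instance together with the change table and query function names derived from it. It assumes a database named MyDB that is already enabled for change data capture; the output column aliases are made up for this example.

```sql
USE MyDB;
GO
-- Derive the object names that belong to each capture instance.
SELECT
    ct.capture_instance,
    OBJECT_SCHEMA_NAME(ct.source_object_id) AS source_schema,
    OBJECT_NAME(ct.source_object_id)        AS source_table,
    N'cdc.' + OBJECT_NAME(ct.object_id)     AS change_table,
    N'cdc.fn_cdc_get_all_changes_' + ct.capture_instance AS all_changes_function,
    -- NULL when the capture instance was created without net changes support.
    CASE WHEN ct.supports_net_changes = 1
         THEN N'cdc.fn_cdc_get_net_changes_' + ct.capture_instance
    END AS net_changes_function
FROM cdc.change_tables AS ct;
GO
```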

Change Table: The first five columns of a change data capture change table are metadata columns. These provide additional information that is relevant to the recorded change. The remaining columns mirror the identified captured columns from the source table in name and, typically, in type. These columns hold the captured column data that is gathered from the source table.

Each insert or delete operation that is applied to a source table appears as a single row within the change table. The data columns of the row that results from an insert operation contain the column values after the insert. The data columns of the row that results from a delete operation contain the column values before the delete. An update operation requires one row entry to identify the column values before the update, and a second row entry to identify the column values after the update.

Each row in a change table also contains additional metadata to allow interpretation of the change activity. The column __$start_lsn identifies the commit log sequence number (LSN) that was assigned to the change. The commit LSN both identifies changes that were committed within the same transaction, and orders those transactions. The column __$seqval can be used to order changes that occur within the same transaction. The column __$operation records the operation that is associated with the change: 1 = delete, 2 = insert, 3 = update (before image), and 4 = update (after image). The column __$update_mask is a variable bit mask with one defined bit for each captured column. For insert and delete entries, the update mask will always have all bits set. Update rows, however, will only have those bits set that correspond to changed columns.
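The metadata columns can be decoded with a query along the following lines. This is a minimal sketch for inspection only: it assumes a capture instance named dbo_MyTable and a hypothetical captured column named MyColumn, and it reads the change table directly, whereas consumers would normally go through the generated query functions.

```sql
USE MyDB;
GO
-- MyColumn is a hypothetical captured column; substitute a real one.
DECLARE @col_ordinal int =
    sys.fn_cdc_get_column_ordinal(N'dbo_MyTable', N'MyColumn');

SELECT
    __$start_lsn,
    __$seqval,
    CASE __$operation
        WHEN 1 THEN 'delete'
        WHEN 2 THEN 'insert'
        WHEN 3 THEN 'update (before image)'
        WHEN 4 THEN 'update (after image)'
    END AS operation_name,
    -- 1 when the bit for MyColumn is set in the update mask.
    sys.fn_cdc_is_bit_set(@col_ordinal, __$update_mask) AS my_column_changed
FROM cdc.dbo_MyTable_CT
ORDER BY __$start_lsn, __$seqval;
GO
```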

Change Data Capture Validity Interval for a Database: The change data capture validity interval for a database is the time during which change data is available for capture instances. The validity interval begins when the first capture instance is created for a database table, and continues to the present time.

Data that is deposited in change tables will grow unmanageably if you do not periodically and systematically prune the data. The change data capture cleanup process is responsible for enforcing the retention-based cleanup policy. First, it moves the low endpoint of the validity interval to satisfy the time restriction. Then, it removes expired change table entries. By default, three days of data is retained.

At the high end, as the capture process commits each new batch of change data, new entries are added to cdc.lsn_time_mapping for each transaction that has change table entries. Within the mapping table, both a commit Log Sequence Number (LSN) and a transaction commit time (columns start_lsn and tran_end_time, respectively) are retained. The maximum LSN value that is found in cdc.lsn_time_mapping represents the high water mark of the database validity window. Its corresponding commit time is used as the base from which retention based cleanup computes a new low water mark.
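To see the high water mark described above, a consumer might run something like the following. The sketch assumes a CDC-enabled database named MyDB; the column aliases are illustrative.

```sql
USE MyDB;
GO
-- Latest entry in the mapping table: commit LSN and commit time.
SELECT TOP (1) start_lsn, tran_end_time
FROM cdc.lsn_time_mapping
ORDER BY start_lsn DESC;

-- The same high endpoint obtained through the system functions.
SELECT
    sys.fn_cdc_get_max_lsn() AS max_lsn,
    sys.fn_cdc_map_lsn_to_time(sys.fn_cdc_get_max_lsn()) AS max_lsn_commit_time;
GO
```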

Because the capture process extracts change data from the transaction log, there is a built in latency between the time that a change is committed to a source table and the time that the change appears within its associated change table. While this latency is typically small, it is nevertheless important to remember that change data is not available until the capture process has processed the related log entries.

Change Data Capture Validity Interval for a Capture Instance: Although it is common for the database validity interval and the validity interval of individual capture instances to coincide, this is not always true. The validity interval of the capture instance starts when the capture process recognizes the capture instance and starts to log associated changes to its change table. As a result, if capture instances are created at different times, each will initially have a different low endpoint. The start_lsn column of the result set that is returned by sys.sp_cdc_help_change_data_capture shows the current low endpoint for each defined capture instance. When the cleanup process cleans up change table entries, it adjusts the start_lsn values for all capture instances to reflect the new low water mark for available change data. Only those capture instances that have start_lsn values that are currently less than the new low water mark are adjusted. Over time, if no new capture instances are created, the validity intervals for all individual instances will tend to coincide with the database validity interval.

The validity interval is important to consumers of change data because the extraction interval for a request must be fully covered by the current change data capture validity interval for the capture instance. If the low endpoint of the extraction interval is to the left of the low endpoint of the validity interval, there could be missing change data due to aggressive cleanup. If the high endpoint of the extraction interval is to the right of the high endpoint of the validity interval, the capture process has not yet processed through the time period that is represented by the extraction interval, and change data could also be missing.

The function sys.fn_cdc_get_min_lsn is used to retrieve the current minimum LSN for a capture instance, while sys.fn_cdc_get_max_lsn is used to retrieve the current maximum LSN value. When querying for change data, if the specified LSN range does not lie within these two LSN values, the change data capture query functions will fail.
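One way to guard against an out-of-range request is to compare the intended extraction interval with the validity interval before calling a query function. The sketch below assumes a capture instance named dbo_MyTable; the 24-hour look-back is an arbitrary choice for illustration.

```sql
USE MyDB;
GO
DECLARE @min_lsn binary(10) = sys.fn_cdc_get_min_lsn(N'dbo_MyTable');
DECLARE @max_lsn binary(10) = sys.fn_cdc_get_max_lsn();

-- Illustrative extraction interval: the last 24 hours up to the high endpoint.
DECLARE @from_lsn binary(10) = sys.fn_cdc_map_time_to_lsn(
    'smallest greater than or equal', DATEADD(hour, -24, GETDATE()));
DECLARE @to_lsn binary(10) = @max_lsn;

IF @from_lsn IS NULL OR @to_lsn IS NULL
   OR @from_lsn < @min_lsn OR @to_lsn > @max_lsn OR @from_lsn > @to_lsn
    PRINT 'Requested LSN range is not covered by the validity interval.';
ELSE
    SELECT *
    FROM cdc.fn_cdc_get_all_changes_dbo_MyTable(@from_lsn, @to_lsn, N'all');
GO
```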

Handling Changes to Source Tables: Accommodating column changes in the source tables that are being tracked is a difficult issue for downstream consumers. Although enabling change data capture on a source table does not prevent such DDL changes from occurring, change data capture helps to mitigate the effect on consumers by allowing the delivered result sets that are returned through the API to remain unchanged even as the column structure of the underlying source table changes. This fixed column structure is also reflected in the underlying change table that the defined query functions access.

To accommodate a fixed column structure change table, the capture process responsible for populating the change table will ignore any new columns that are not identified for capture when the source table was enabled for change data capture. If a tracked column is dropped, null values will be supplied for the column in the subsequent change entries. However, if an existing column undergoes a change in its data type, the change is propagated to the change table to ensure that the capture mechanism does not introduce data loss to tracked columns. The capture process also posts any detected changes to the column structure of tracked tables to the cdc.ddl_history table. Consumers who want to be alerted to adjustments that might have to be made in downstream applications can use the stored procedure sys.sp_cdc_get_ddl_history.

Typically, the current capture instance will continue to retain its shape when DDL changes are applied to its associated source table. However, it is possible to create a second capture instance for the table that reflects the new column structure. This allows the capture process to capture changes to the same source table into two distinct change tables having two different column structures. Thus, while one change table can continue to feed current operational programs, the second one can drive a development environment that is trying to incorporate the new column data. Allowing the capture mechanism to populate both change tables in tandem means that a transition from one to the other can be accomplished without loss of change data. This can happen any time the two change data capture timelines overlap. When the transition is effected, the obsolete capture instance can be removed.

Note: The maximum number of capture instances that can be concurrently associated with a single source table is two.
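The transition between two capture instances could be sketched as follows. It assumes the source table dbo.MyTable already has a capture instance with the default name dbo_MyTable; the second instance name dbo_MyTable_v2 is a hypothetical choice for this example.

```sql
USE MyDB;
GO
-- After the DDL change, create a second capture instance that picks up
-- the new column structure. The name dbo_MyTable_v2 is hypothetical.
EXEC sys.sp_cdc_enable_table
    @source_schema    = N'dbo',
    @source_name      = N'MyTable',
    @role_name        = NULL,
    @capture_instance = N'dbo_MyTable_v2';
GO
-- Once consumers have moved to the new change table, remove the old instance.
EXEC sys.sp_cdc_disable_table
    @source_schema    = N'dbo',
    @source_name      = N'MyTable',
    @capture_instance = N'dbo_MyTable';
GO
```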

Relationship Between the Capture Job and the Transactional Replication Logreader

The logic for the change data capture process is embedded in the stored procedure sp_replcmds, an internal server function built as part of sqlservr.exe and also used by transactional replication to harvest changes from the transaction log. When change data capture alone is enabled for a database, you create the change data capture SQL Server Agent capture job as the vehicle for invoking sp_replcmds. When replication is also present, the transactional logreader alone is used to satisfy the change data needs for both of these consumers. This strategy significantly reduces log contention when both replication and change data capture are enabled for the same database.

The switch between these two operational modes for capturing change data occurs automatically whenever there is a change in the replication status of a change data capture enabled database.

Important: Both instances of the capture logic require SQL Server Agent to be running for the process to execute.

The principal task of the capture process is to scan the log and write column data and transaction related information to the change data capture change tables. To ensure a transactionally consistent boundary across all the change data capture change tables that it populates, the capture process opens and commits its own transaction on each scan cycle. It detects when tables are newly enabled for change data capture, and automatically includes them in the set of tables that are actively monitored for change entries in the log. Similarly, disabling change data capture will also be detected, causing the source table to be removed from the set of tables actively monitored for change data. When processing for a section of the log is finished, the capture process signals the server log truncation logic, which uses this information to identify log entries eligible for truncation.

Note: When a database is enabled for change data capture, even if the recovery mode is set to simple recovery the log truncation point will not advance until all the changes that are marked for capture have been gathered by the capture process. If the capture process is not running and there are changes to be gathered, executing CHECKPOINT will not truncate the log.
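If the log of a CDC-enabled database keeps growing, a quick check of what is holding back truncation may help. The sketch below assumes the database is named MyDB. A log_reuse_wait_desc value of REPLICATION is commonly reported when changes marked for capture have not yet been gathered, but treat that reading as a hint rather than a definitive diagnosis.

```sql
-- Inspect why the log of MyDB cannot currently be truncated.
SELECT
    name,
    recovery_model_desc,
    is_cdc_enabled,
    log_reuse_wait_desc
FROM sys.databases
WHERE name = N'MyDB';
```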

The capture process is also used to maintain history on the DDL changes to tracked tables. The DDL statements that are associated with change data capture make entries to the database transaction log whenever a change data capture-enabled database or table is dropped or columns of a change data capture-enabled table are added, modified, or dropped. These log entries are processed by the capture process, which then posts the associated DDL events to the cdc.ddl_history table. You can obtain information about DDL events that affect tracked tables by using the stored procedure sys.sp_cdc_get_ddl_history.
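A consumer could review the recorded DDL events roughly as follows. The sketch assumes a capture instance named dbo_MyTable in a database named MyDB.

```sql
USE MyDB;
GO
-- DDL events recorded for one capture instance.
EXEC sys.sp_cdc_get_ddl_history @capture_instance = N'dbo_MyTable';
GO
-- The underlying history table can also be inspected directly.
SELECT source_object_id, object_id, required_column_update,
       ddl_command, ddl_lsn, ddl_time
FROM cdc.ddl_history
ORDER BY ddl_time;
GO
```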

Change Data Capture Agent Jobs: Two SQL Server Agent jobs are typically associated with a change data capture enabled database: one that is used to populate the database change tables, and one that is responsible for change table cleanup. Both jobs consist of a single step that runs a Transact-SQL command. The Transact-SQL command that is invoked is a stored procedure, defined by change data capture, that implements the logic of the job. The jobs are created when the first table of the database is enabled for change data capture. The cleanup job is always created. The capture job will only be created if there are no defined transactional publications for the database. The capture job is also created when both change data capture and transactional replication are enabled for a database, and the transactional logreader job is removed because the database no longer has defined publications.

Both the capture and cleanup jobs are created by using default parameters. The capture job is started immediately. It runs continuously, processing a maximum of 1000 transactions per scan cycle with a wait of 5 seconds between cycles. The cleanup job runs daily at 2 A.M. It retains change table entries for 4320 minutes or 3 days, removing a maximum of 5000 entries with a single delete statement.

The change data capture agent jobs are removed when change data capture is disabled for a database. The capture job can also be removed when the first publication is added to a database, and both change data capture and transactional replication are enabled.

Internally, change data capture agent jobs are created and dropped by using the stored procedures sys.sp_cdc_add_job and sys.sp_cdc_drop_job, respectively. These stored procedures are also exposed so that administrators can control the creation and removal of these jobs.

An administrator has no explicit control over the default configuration of the change data capture agent jobs. The stored procedure sys.sp_cdc_change_job is provided to allow the default configuration parameters to be modified. In addition, the stored procedure sys.sp_cdc_help_jobs allows current configuration parameters to be viewed. Both the capture job and the cleanup job extract configuration parameters from the table msdb.dbo.cdc_jobs on startup. Any changes made to these values by using sys.sp_cdc_change_job will not take effect until the job is stopped and restarted.

Two additional stored procedures are provided to allow the change data capture agent jobs to be started and stopped: sys.sp_cdc_start_job and sys.sp_cdc_stop_job.
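The job procedures named above might be combined as in the following sketch. The database name MyDB, the 10-second polling interval, and the 7-day retention are illustrative values, not recommendations.

```sql
USE MyDB;
GO
-- Show the current capture and cleanup job parameters.
EXEC sys.sp_cdc_help_jobs;
GO
-- Lengthen the wait between capture scan cycles to 10 seconds.
EXEC sys.sp_cdc_change_job
    @job_type        = N'capture',
    @pollinginterval = 10;
GO
-- The capture job reads its parameters on startup, so restart it.
EXEC sys.sp_cdc_stop_job  @job_type = N'capture';
EXEC sys.sp_cdc_start_job @job_type = N'capture';
GO
-- Keep change table entries for 7 days (10080 minutes).
-- The cleanup job picks this up the next time it starts.
EXEC sys.sp_cdc_change_job
    @job_type  = N'cleanup',
    @retention = 10080;
GO
```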

Note: Starting and stopping the capture job does not result in a loss of change data. It only prevents the capture process from actively scanning the log for change entries to deposit in the change tables. A reasonable strategy to prevent log scanning from adding load during periods of peak demand is to stop the capture job and restart it when demand is reduced.

Both SQL Server Agent jobs were designed to be flexible enough and sufficiently configurable to meet the basic needs of change data capture environments. In both cases, however, the underlying stored procedures that provide the core functionality have been exposed so that further customization is possible.

Change data capture cannot function properly when the Database Engine service or the SQL Server Agent service is running under the NETWORK SERVICE account. This can result in error 22832.

Security Model

This section describes the change data capture security model.

Configuration and Administration

To either enable or disable change data capture for a database, the caller of sys.sp_cdc_enable_db (Transact-SQL) or sys.sp_cdc_disable_db (Transact-SQL) must be a member of the fixed server sysadmin role. Enabling and disabling change data capture at the table level requires the caller of sys.sp_cdc_enable_table (Transact-SQL) and sys.sp_cdc_disable_table (Transact-SQL) to either be a member of the sysadmin role or a member of the database db_owner role.

To support the security model, a special change data capture user and change data capture schema that is owned by the change data capture database user are created in the database when change data capture is enabled. All change data capture objects that are not in the resource database are created in this schema, and owned by the change data capture user. This includes any gating roles that are created when a table is enabled for change data capture.

Use of the stored procedures to support the administration of change data capture jobs is restricted to members of the server sysadmin role and members of the db_owner role.

Change Enumeration and Metadata Queries

To gain access to the change data that is associated with a capture instance, the user must be granted select access to all the captured columns of the associated source table. In addition, if a gating role is specified when the capture instance is created, the caller must also be a member of the specified gating role. Other general change data capture functions for accessing metadata will be accessible to all database users through the public role, although access to the returned metadata will also typically be gated by using select access to the underlying source tables, and by membership in any defined gating roles.
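Granting a consumer access under this model could look like the sketch below. MyUser is a hypothetical database user that must already exist, and MyRole is assumed to be the gating role named when the capture instance was created.

```sql
USE MyDB;
GO
-- The consumer needs SELECT on the captured columns of the source table;
-- granting SELECT on the whole table covers all of them.
GRANT SELECT ON OBJECT::dbo.MyTable TO MyUser;
GO
-- The consumer must also belong to the gating role, if one was specified.
-- ALTER ROLE ... ADD MEMBER requires SQL Server 2012 or later;
-- on earlier versions sp_addrolemember serves the same purpose.
ALTER ROLE MyRole ADD MEMBER MyUser;
GO
```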

DDL Operations to Change Data Capture Enabled Source Tables

When a table is enabled for change data capture, DDL operations can only be applied to the table by a member of the fixed server role sysadmin, a member of the database role db_owner, or a member of the database role db_ddladmin. Users who have explicit grants to perform DDL operations on the table will receive error 22914 if they try these operations.

Configuring Change Data Capture

To use change data capture to capture insert, update, and delete activity that is applied to SQL Server tables, it must first be configured. This section provides information about how to enable and disable change data capture.

Enabling Change Data Capture

This topic covers how to enable change data capture for a database and a table.

Enabling Change Data Capture for a Database

Before a capture instance can be created for individual tables, a member of the sysadmin fixed server role must first enable the database for change data capture. This is done by running the stored procedure sys.sp_cdc_enable_db (Transact-SQL) in the database context. To determine if a database is already enabled, query the is_cdc_enabled column in the sys.databases catalog view. When a database is enabled for change data capture, the cdc schema, cdc user, metadata tables, and other system objects are created for the database. The cdc schema contains the change data capture metadata tables and, after source tables are enabled for change data capture, the individual change tables that serve as a repository for change data. The cdc schema also contains associated system functions used to query for change data.

Change data capture requires exclusive use of the cdc schema and cdc user. If either a schema or a database user named cdc currently exists in a database, the database cannot be enabled for change data capture until the schema and/or user are dropped or renamed.

See the Enable Database for Change Data Capture template for an example of enabling a database.

Important: To locate the templates in SQL Server Management Studio, go to View, click Template Explorer, and then select SQL Server Templates. Change Data Capture is a sub-folder. Under this folder, you will find all the templates referenced in this topic. There is also a Template Explorer icon on the SQL Server Management Studio toolbar.

Enable Database for CDC template

USE MyDB
GO
EXEC sys.sp_cdc_enable_db
GO
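A slightly more defensive variant checks the current state first. This is a sketch, assuming the database is named MyDB and the caller is a member of sysadmin.

```sql
USE MyDB;
GO
-- A pre-existing schema or user named cdc blocks enabling; list any found.
SELECT 'schema' AS object_kind, name FROM sys.schemas WHERE name = N'cdc'
UNION ALL
SELECT 'user', name FROM sys.database_principals WHERE name = N'cdc';
GO
-- Enable change data capture only if it is not already enabled.
IF (SELECT is_cdc_enabled FROM sys.databases WHERE name = DB_NAME()) = 0
    EXEC sys.sp_cdc_enable_db;
GO
-- Confirm the result.
SELECT name, is_cdc_enabled FROM sys.databases WHERE name = DB_NAME();
GO
```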

Enabling Change Data Capture for a Table: After a database has been enabled for change data capture, members of the db_owner fixed database role can create a capture instance for individual source tables by using the stored procedure sys.sp_cdc_enable_table. To determine whether a source table has already been enabled for change data capture, examine the is_tracked_by_cdc column in the sys.tables catalog view.

The following options can be specified when creating a capture instance:

Columns in the source table to be captured.

By default, all of the columns in the source table are identified as captured columns. If only a subset of columns need to be tracked, such as for privacy or performance reasons, use the @captured_column_list parameter to specify the subset of columns.

A filegroup to contain the change table.

By default, the change table is located in the default filegroup of the database. Database owners who want to control the placement of individual change tables can use the @filegroup_name parameter to specify a particular filegroup for the change table associated with the capture instance. The named filegroup must already exist. Generally, it is recommended that change tables be placed in a filegroup separate from source tables. See the Enable a Table Specifying Filegroup Option template for an example showing use of the @filegroup_name parameter.

Enable a Table Specifying Filegroup Option Template


EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name = N'MyTable',
    @role_name = N'MyRole',
    @filegroup_name = N'MyDB_CT',
    @supports_net_changes = 1
GO

A role for controlling access to a change table.

The purpose of the named role is to control access to the change data. The specified role can be an existing fixed server role or a database role. If the specified role does not already exist, a database role of that name is created automatically. Members of either the sysadmin or db_owner role have full access to the data in the change tables. All other users must have SELECT permission on all the captured columns of the source table. In addition, when a role is specified, users who are not members of either the sysadmin or db_owner role must also be members of the specified role.

If you do not want to use a gating role, explicitly set the @role_name parameter to NULL. See the Enable a Table Without Using a Gating Role template for an example of enabling a table without a gating role.

-- ===================================================

-- Enable a Table Without Using a Gating Role template

-- ===================================================

USE MyDB
GO
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name = N'MyTable',
    @role_name = NULL,
    @supports_net_changes = 1
GO

A function to query for net changes.

A capture instance will always include a table valued function for returning all change table entries that occurred within a defined interval. This function is named by appending the capture instance name to "cdc.fn_cdc_get_all_changes_". For more information, see cdc.fn_cdc_get_all_changes_<capture_instance> (Transact-SQL).

If the parameter @supports_net_changes is set to 1, a net changes function is also generated for the capture instance. This function returns only one change for each distinct row changed in the interval specified in the call. For more information, see cdc.fn_cdc_get_net_changes_<capture_instance> (Transact-SQL).

To support net changes queries, the source table must have a primary key or unique index to uniquely identify rows. If a unique index is used, the name of the index must be specified using the @index_name parameter. The columns defined in the primary key or unique index must be included in the list of source columns to be captured. See the Enable a Table for All and Net Changes Queries template for an example demonstrating the creation of a capture instance with both query functions.


-- =======================================================

-- Enable a Table for All and Net Changes Queries template

-- =======================================================

USE MyDB
GO
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name = N'MyTable',
    @role_name = N'MyRole',
    @supports_net_changes = 1
GO

Note: If change data capture is enabled on a table with an existing primary key, and the @index_name parameter is not used to identify an alternative unique index, the change data capture feature will use the primary key. Subsequent changes to the primary key will not be allowed without first disabling change data capture for the table. This is true regardless of whether support for net changes queries was requested when change data capture was configured. If there is no primary key on a table at the time it is enabled for change data capture, the subsequent addition of a primary key is ignored by change data capture. Because change data capture will not use a primary key that is created after the table was enabled, the key and key columns can be removed without restrictions.
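The column list and unique index options described above can be combined in one call, roughly as follows. The column names MyKeyColumn and MyColumn and the index name UX_MyTable_MyKeyColumn are hypothetical; the index is assumed to be a unique index on MyKeyColumn.

```sql
USE MyDB;
GO
-- Create the capture instance only if the table is not yet tracked.
IF NOT EXISTS (SELECT 1
               FROM sys.tables
               WHERE object_id = OBJECT_ID(N'dbo.MyTable')
                 AND is_tracked_by_cdc = 1)
    EXEC sys.sp_cdc_enable_table
        @source_schema        = N'dbo',
        @source_name          = N'MyTable',
        @role_name            = N'MyRole',
        -- Capture a subset of columns; the index key column must be included.
        @captured_column_list = N'MyKeyColumn, MyColumn',
        -- Unique index used to identify rows for net changes queries.
        @index_name           = N'UX_MyTable_MyKeyColumn',
        @supports_net_changes = 1;
GO
```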

Disabling Change Data Capture

This topic describes how to disable change data capture for a database and a table.

Disabling Change Data Capture for a Database

A member of the sysadmin fixed server role can run the stored procedure sys.sp_cdc_disable_db (Transact-SQL) in the database context to disable change data capture for a database. It is not necessary to disable individual tables before you disable the database. Disabling the database removes all associated change data capture metadata, including the cdc user and schema and the change data capture jobs. However, any gating roles created by change data capture will not be removed automatically and must be explicitly deleted. To determine if a database is enabled, query the is_cdc_enabled column in the sys.databases catalog view.

If a change data capture enabled database is dropped, change data capture jobs are automatically removed.

See the Disable Database for Change Data Capture template for an example of disabling a database.

Important: To locate the templates in SQL Server Management Studio, go to View, click Template Explorer, and then click SQL Server Templates. Change Data Capture is a sub-folder where you will find all the templates that are referenced in this topic. There is also a Template Explorer icon on the SQL Server Management Studio toolbar.

-- =================================

-- Disable Database for Change Data Capture template

-- =================================


EXEC sys.sp_cdc_disable_db
GO

Disabling Change Data Capture for a Table

Members of the db_owner fixed database role can remove a capture instance for individual source tables by using the stored procedure sys.sp_cdc_disable_table. To determine whether a source table is currently enabled for change data capture, examine the is_tracked_by_cdc column in the sys.tables catalog view. If there are no tables enabled for the database after the disabling takes place, the change data capture jobs are also removed.

If a change data capture-enabled table is dropped, change data capture metadata that is associated with the table is automatically removed.

See the Disable a Capture Instance for a Table template for an example of disabling a table.

-- ===============================================

-- Disable a Capture Instance for a Table template

-- ===============================================


EXEC sys.sp_cdc_disable_table
    @source_schema = N'dbo',
    @source_name = N'MyTable',
    @capture_instance = N'dbo_MyTable'
GO

Using Change Data

Change data is made available to change data capture consumers through table-valued functions (TVFs). All queries of these functions require two parameters to define the range of Log Sequence Numbers (LSNs) that are eligible for consideration when developing the returned result set. Both the upper and lower LSN values that bound the interval are considered to be included within the interval.

Several functions are provided to help determine appropriate LSN values for use in querying a TVF. The function sys.fn_cdc_get_min_lsn returns the smallest LSN that is associated with a capture instance validity interval. The validity interval is the time interval for which change data is currently available for its capture instances. The function sys.fn_cdc_get_max_lsn returns the largest LSN in the validity interval. The functions sys.fn_cdc_map_time_to_lsn and sys.fn_cdc_map_lsn_to_time are available to help place LSN values on a conventional timeline. Because change data capture uses closed query intervals, it is sometimes necessary to generate the next LSN value in a sequence to ensure that changes are not duplicated in consecutive query windows. The functions sys.fn_cdc_increment_lsn and sys.fn_cdc_decrement_lsn are useful when an incremental adjustment to an LSN value is required.
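Because both endpoints are inclusive, a consumer that polls repeatedly would typically start each window one LSN past the previous high endpoint. The following sketch assumes a capture instance named dbo_MyTable; @previous_to_lsn is a hypothetical value that the consumer would load from its own bookkeeping, left NULL here to represent a first run.

```sql
USE MyDB;
GO
-- Hypothetical: high endpoint saved by the previous extraction (NULL = first run).
DECLARE @previous_to_lsn binary(10) = NULL;

-- Start just after the previous window, or at the low endpoint on the first run.
DECLARE @from_lsn binary(10) =
    CASE WHEN @previous_to_lsn IS NULL
         THEN sys.fn_cdc_get_min_lsn(N'dbo_MyTable')
         ELSE sys.fn_cdc_increment_lsn(@previous_to_lsn)
    END;
DECLARE @to_lsn binary(10) = sys.fn_cdc_get_max_lsn();

-- If nothing new has been captured, @from_lsn exceeds @to_lsn; skip the query.
IF @from_lsn <= @to_lsn
    SELECT *
    FROM cdc.fn_cdc_get_all_changes_dbo_MyTable(@from_lsn, @to_lsn, N'all');

-- The consumer would then persist @to_lsn as the next @previous_to_lsn.
GO
```

The sketch does not handle the case where cleanup has moved the low endpoint past the saved LSN; a production consumer would also compare @from_lsn with sys.fn_cdc_get_min_lsn.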

Validating LSN Boundaries

We recommend validating the LSN boundaries that are to be used in a TVF query before their use. Null endpoints or endpoints that lie outside the validity interval for a capture instance will force an error to be returned by a change data capture TVF.

For example, the following error is returned for a query for all changes when a parameter that is used to define the query interval is not valid, or is out of range, or the row filter option is invalid.

Msg 313, Level 16, State 3, Line 1

An insufficient number of arguments were supplied for the procedure or function cdc.fn_cdc_get_all_changes_ ...

The corresponding error returned for a net changes query is the following:

Msg 313, Level 16, State 3, Line 1

An insufficient number of arguments were supplied for the procedure or function cdc.fn_cdc_get_net_changes_ ...


It is recognized that the message for Msg 313 is misleading and does not convey the actual cause of the failure. This awkward usage stems from the inability to raise an explicit error from within a TVF. Nevertheless, the value of returning a recognizable, if inaccurate, error was deemed preferable to simply returning an empty result. An empty result set would not be distinguishable from a valid query returning no changes.

Authorization failures return the following error when querying for all changes:

Msg 229, Level 14, State 5, Line 1

The SELECT permission was denied on the object 'fn_cdc_get_all_changes_...', database 'MyDB', schema 'cdc'.

The same is true when querying for net changes:

Msg 229, Level 14, State 5, Line 1

The SELECT permission was denied on the object 'fn_cdc_get_net_changes_...', database 'MyDB', schema 'cdc'.

See the template Enumerate Net Changes Using TRY CATCH for a demonstration of how to intercept these known TVF errors and return more meaningful information about the failure.
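The following is a rough sketch of that interception pattern, not a copy of the template. It assumes a capture instance named dbo_MyTable with net changes support, and the replacement messages are invented for illustration.

```sql
-- Assumption: capture instance 'dbo_MyTable' was enabled with @supports_net_changes = 1.
DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn('dbo_MyTable');
DECLARE @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();

BEGIN TRY
    SELECT *
    FROM cdc.fn_cdc_get_net_changes_dbo_MyTable(@from_lsn, @to_lsn, N'all');
END TRY
BEGIN CATCH
    DECLARE @msg nvarchar(2048) = ERROR_MESSAGE();

    IF ERROR_NUMBER() = 313
        -- Msg 313 from a CDC TVF signals a bad LSN endpoint or row filter option.
        RAISERROR('Invalid LSN range or row filter option for the net changes query.', 16, 1);
    ELSE IF ERROR_NUMBER() = 229
        -- Msg 229 signals that the caller lacks permission on the query function.
        RAISERROR('The caller is not authorized to query this capture instance.', 16, 1);
    ELSE
        -- Pass any other error through unchanged.
        RAISERROR('%s', 16, 1, @msg);
END CATCH;
```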


To locate change data capture templates in SQL Server Management Studio, on the View menu, click Template Explorer, expand SQL Server Templates, and then expand the Change Data Capture folder.

Query Functions

Depending on the characteristics of the source table being tracked and the way in which its capture instance is configured, either one or two TVFs for querying change data are generated.

The function cdc.fn_cdc_get_all_changes_<capture_instance> returns all changes that occurred for the specified interval. This function is always generated. Entries are always returned sorted, first by the transaction commit LSN of the change, and then by a value that sequences the change within its transaction. Depending on the row filter option chosen, either the final row is returned on update (row filter option 'all') or both the new and old values are returned on update (row filter option 'all update old').

The function cdc.fn_cdc_get_net_changes_<capture_instance> is generated when the parameter @supports_net_changes is set to 1 when the source table is enabled. This option is supported only if the source table has a defined primary key or if the parameter @index_name has been used to identify a unique index.

The net changes function returns one change per modified source table row. If more than one change is logged for the row during the specified interval, the column values will reflect the final contents of the row. To correctly identify the operation that is necessary to update the target environment, the TVF must consider both the initial operation on the row during the interval and the final operation on the row. When the row filter option 'all' is specified, the operations that are returned by a net changes query will either be insert, delete, or update (new values). This option always returns the update mask as null because there is a cost associated with computing an aggregate mask. If you require an aggregate mask that reflects all changes to a row, use the 'all with mask' option. If downstream processing does not require inserts and updates to be distinguished, use the 'all with merge' option. In this case, the operation value will only take on two values: 1 for delete and 5 for an operation that could be either an insert or an update. This option eliminates the additional processing needed to determine whether the derived operation should be an insert or an update, and can improve the performance of the query when this differentiation is not necessary.

The update mask that is returned from a query function is a compact representation that identifies all columns that changed in a row of change data. Typically, this information is only required for a small subset of the captured columns. Functions are available to assist in extracting information from the mask in a form that is more directly usable by applications. The function sys.fn_cdc_get_column_ordinal returns the ordinal position of a named column for a given capture instance, whereas the function sys.fn_cdc_is_bit_set reports whether the bit at the ordinal position passed in the function call is set in the provided mask. Together, these two functions allow information from the update mask to be efficiently extracted and returned with the request for change data. See the template Enumerate Net Changes Using All With Mask for a demonstration of how these functions are used.
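As an illustrative sketch of combining the two functions, the query below adds a flag showing whether one column changed. The capture instance dbo_MyTable and the captured column Price are hypothetical names.

```sql
-- Assumptions: capture instance 'dbo_MyTable' supports net changes and captures a column named 'Price'.
DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn('dbo_MyTable');
DECLARE @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();

-- Look up the ordinal once, outside the query, so it is not recomputed per row.
DECLARE @price_ordinal int =
    sys.fn_cdc_get_column_ordinal('dbo_MyTable', 'Price');

SELECT sys.fn_cdc_is_bit_set(@price_ordinal, __$update_mask) AS price_changed,
       *
FROM cdc.fn_cdc_get_net_changes_dbo_MyTable(@from_lsn, @to_lsn, N'all with mask');
```

The 'all with mask' option is used because the 'all' option returns a null update mask for net changes.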

Query Function Scenarios

The following sections describe common scenarios for querying change data capture data by using the query functions cdc.fn_cdc_get_all_changes_<capture_instance> and cdc.fn_cdc_get_net_changes_<capture_instance>.

Querying for All Changes Within the Capture Instance Validity Interval

The most straightforward request for change data is one that returns all of the current change data in a capture instance's validity interval. To make this request, first determine the lower and upper LSN boundaries of the validity interval. Then, use these values to identify the parameters @from_lsn and @to_lsn passed to the query function cdc.fn_cdc_get_all_changes_<capture_instance> or cdc.fn_cdc_get_net_changes_<capture_instance>. Use the function sys.fn_cdc_get_min_lsn to obtain the lower bound, and sys.fn_cdc_get_max_lsn to obtain the upper bound. See the template Enumerate All Changes for the Valid Range for sample code to query for all current valid changes by using the query function cdc.fn_cdc_get_all_changes_<capture_instance>. See the template Enumerate Net Changes for the Valid Range for a similar example of using the function cdc.fn_cdc_get_net_changes_<capture_instance>.
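In its simplest form, and assuming a capture instance named dbo_MyTable that was enabled with net changes support, the request might look like this:

```sql
-- Assumption: capture instance 'dbo_MyTable' exists and supports net changes.
DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn('dbo_MyTable'); -- low end of validity interval
DECLARE @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();              -- high end of validity interval

-- Every change row recorded in the interval.
SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_MyTable(@from_lsn, @to_lsn, N'all');

-- One row per modified source row in the interval.
SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_MyTable(@from_lsn, @to_lsn, N'all');
```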

Querying for All New Changes Since the Last Set of Changes

For typical applications, querying for change data will be an ongoing process, making periodic requests for all of the changes that occurred since the last request. For such queries, you can use the function sys.fn_cdc_increment_lsn to derive the lower bound of the current query from the upper bound of the previous query. This method ensures that no rows are repeated because the query interval is always treated as a closed interval where both end-points are included in the interval. Then, use the function sys.fn_cdc_get_max_lsn to obtain the high end-point for the new request interval. See the template Enumerate All Changes Since Previous Request for sample code to systematically move the query window to obtain all changes since the last request.
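One possible way to move the window is sketched below. It assumes a capture instance named dbo_MyTable and a hypothetical bookkeeping table dbo.cdc_watermark (capture_instance sysname, last_to_lsn binary(10)) that already holds a row for the capture instance. Neither is part of change data capture itself.

```sql
-- Assumption: dbo.cdc_watermark is an application-owned table holding the previous upper bound.
DECLARE @last_to_lsn binary(10), @from_lsn binary(10), @to_lsn binary(10);

SELECT @last_to_lsn = last_to_lsn
FROM dbo.cdc_watermark
WHERE capture_instance = N'dbo_MyTable';

-- First run: start at the low end of the validity interval.
-- Later runs: start one LSN past the previous (inclusive) upper bound.
SET @from_lsn = CASE
                    WHEN @last_to_lsn IS NULL THEN sys.fn_cdc_get_min_lsn('dbo_MyTable')
                    ELSE sys.fn_cdc_increment_lsn(@last_to_lsn)
                END;
SET @to_lsn = sys.fn_cdc_get_max_lsn();

-- If nothing new has been captured, @from_lsn is greater than @to_lsn and the query is skipped.
IF @from_lsn <= @to_lsn
BEGIN
    SELECT *
    FROM cdc.fn_cdc_get_all_changes_dbo_MyTable(@from_lsn, @to_lsn, N'all');

    -- Remember the upper bound for the next request.
    UPDATE dbo.cdc_watermark
    SET last_to_lsn = @to_lsn
    WHERE capture_instance = N'dbo_MyTable';
END;
```

Skipping the query when the lower bound exceeds the upper bound avoids passing an invalid range to the TVF during periods with no new changes.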

Querying for All New Changes Up Until Now

A typical constraint that is placed on the changes returned by a query function is to include only the changes that occurred between the previous request and the current date and time. For this query, apply the function sys.fn_cdc_increment_lsn to the @to_lsn value that was used in the previous request to determine the lower bound. Because the upper bound on the time interval is expressed as a specific point in time, it must be converted to an LSN value before it can be used by a query function. Before the datetime value can be converted to a corresponding LSN value, you must ensure that the capture process has processed all changes that were committed through the specified upper bound. This is required to ensure that all the qualifying changes have been propagated to the change table. One way to do this is to structure a wait loop that periodically checks whether the commit time of the current maximum commit LSN recorded for any database change table has passed the desired end time of the request interval.

After the delay loop verifies that the capture process has already processed all the relevant log entries, use the function sys.fn_cdc_map_time_to_lsn to determine the new high end-point expressed as an LSN value. To ensure that all entries that were committed through the specified time are retrieved, call the function sys.fn_cdc_map_time_to_lsn, and use the option 'largest less than or equal'.


In periods of inactivity, a dummy entry is added to the table cdc.lsn_time_mapping to mark the fact that the capture process has processed the changes up to a given commit time. This prevents it from appearing that the capture process has fallen behind when there are simply no recent changes to process.

The template Enumerate All Changes Up Until Now demonstrates how to use the previous strategy to query for change data.
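The wait-and-map strategy described above could be sketched as follows. The capture instance dbo_MyTable, the ten-second delay, and the cap of 30 attempts are illustrative assumptions. The lower bound here is a stand-in for the incremented upper bound of a previous request.

```sql
-- Assumption: capture instance 'dbo_MyTable' exists.
DECLARE @end_time datetime = GETDATE();
DECLARE @attempts int = 0;

-- Stand-in lower bound; an ongoing process would use
-- sys.fn_cdc_increment_lsn(<the @to_lsn of the previous request>) instead.
DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn('dbo_MyTable');
DECLARE @to_lsn   binary(10);

-- Wait until the capture process has processed the log through @end_time.
WHILE sys.fn_cdc_map_lsn_to_time(sys.fn_cdc_get_max_lsn()) < @end_time
      AND @attempts < 30
BEGIN
    WAITFOR DELAY '00:00:10';
    SET @attempts = @attempts + 1;
END;

-- Convert the time bound to the last LSN committed at or before it.
SET @to_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal', @end_time);

IF @to_lsn IS NOT NULL AND @from_lsn <= @to_lsn
    SELECT *
    FROM cdc.fn_cdc_get_all_changes_dbo_MyTable(@from_lsn, @to_lsn, N'all');
```

The attempt cap is a defensive choice so that the batch cannot wait indefinitely if the capture job is stopped.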

Adding a Commit Time to an All Changes Result Set

The commit time of each transaction with an associated entry in a database change table is available in the table cdc.lsn_time_mapping. By joining the __$start_lsn value returned in a request for all changes with the start_lsn value of a cdc.lsn_time_mapping table entry, you can return the tran_end_time along with the change data to stamp the change with the commit time of the transaction at the source. The template Append Commit Time to All Changes Result Set demonstrates how to perform this join.
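A minimal sketch of this join, assuming a capture instance named dbo_MyTable, is:

```sql
-- Assumption: capture instance 'dbo_MyTable' exists.
DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn('dbo_MyTable');
DECLARE @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();

SELECT m.tran_end_time AS commit_time,  -- commit time of the source transaction
       c.*
FROM cdc.fn_cdc_get_all_changes_dbo_MyTable(@from_lsn, @to_lsn, N'all') AS c
JOIN cdc.lsn_time_mapping AS m
    ON m.start_lsn = c.__$start_lsn
ORDER BY c.__$start_lsn, c.__$seqval;
```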

Joining Change Data with Other Data from the Same Transaction

Occasionally, it is useful to join change data with other information gathered about the transaction when it committed at the source. The tran_begin_lsn column in the table cdc.lsn_time_mapping provides the information needed to perform such a join. When the update of the source occurs, the value for database_transaction_begin_lsn from the system dynamic view sys.dm_tran_database_transactions must be saved along with any other information to be joined with the change data. Use the function fn_convertnumericlsntobinary to compare the database_transaction_begin_lsn and tran_begin_lsn values. The code to create this function is available in the template Create Function fn_convertnumericlsntobinary. The template Return All Changes with a Given tran_begin_lsn demonstrates how to effect the join.

Querying Using Datetime Wrapper Functions

A typical application scenario for querying for change data is to periodically request change data by using a sliding window bounded by datetime values. For this class of consumers, change data capture provides the stored procedure sys.sp_cdc_generate_wrapper_function that generates scripts to create custom wrapper functions for the change data capture query functions. These custom wrappers allow the query interval to be expressed as a datetime pair.

Calling options for the stored procedure allow for wrappers to be generated for all capture instances that the caller has access to, or only a specified capture instance. Supported options also include the ability to specify whether the high end-point of the capture interval should be open or closed, which of the available captured columns should be included in the result set, and which of the included columns should have associated update flags. The procedure returns a result set with two columns: the generated function name, which is derived from the capture instance name, and the create statement for the wrapper function. The function to wrap the all changes query is always generated. If the @supports_net_changes parameter was set when the capture instance was created, the function to wrap the net changes function is also generated.

It is the responsibility of the application designer to call the script generation stored procedure to generate the create statements for the wrapper functions, and to execute the resulting create scripts to create the functions. This does not occur automatically when a capture instance is created.
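The two steps, generating the scripts and then executing them, might be sketched as follows. The capture instance name dbo_MyTable is an assumption. Omitting @capture_instance would generate wrappers for every capture instance the caller can access.

```sql
-- Holds the two-column result set returned by the procedure.
DECLARE @wrapper_functions TABLE (
    function_name sysname,
    create_script nvarchar(max)
);
DECLARE @create_script nvarchar(max);

-- Step 1: generate the create scripts (closed upper bound requested explicitly).
INSERT INTO @wrapper_functions
EXEC sys.sp_cdc_generate_wrapper_function
     @capture_instance = N'dbo_MyTable',
     @closed_high_end_point = 1;

-- Step 2: execute each script to create the wrapper function.
DECLARE wrapper_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT create_script FROM @wrapper_functions;

OPEN wrapper_cursor;
FETCH NEXT FROM wrapper_cursor INTO @create_script;
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC sys.sp_executesql @create_script;
    FETCH NEXT FROM wrapper_cursor INTO @create_script;
END;
CLOSE wrapper_cursor;
DEALLOCATE wrapper_cursor;
```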

Datetime wrappers are owned by the user, and are created in the default schema of the caller. The generated function is suitable without modification for most users. However, further customization can always be applied to the generated script prior to creating the function.

The name of the function to wrap the all changes query is fn_all_changes_ followed by the capture instance name. The prefix that is used for the net changes wrapper is fn_net_changes_. Both functions take three arguments, just as their associated change data capture TVFs do. However, the query interval for the wrappers is bounded by two datetime values rather than by two LSN values. The @row_filter_option parameter is the same for both sets of functions.

The generated wrapper functions support the following convention for systematically walking the change data capture timeline: It is expected that the @end_time parameter of the previous interval be used as the @start_time parameter of the subsequent interval. The wrapper function takes care of mapping the datetime values to LSN values and ensuring that no data is lost or repeated if this convention is followed.

The wrappers can be generated to support either a closed upper bound or an open upper bound on the specified query window. That is, the caller can specify whether entries having a commit time equal to the upper bound of the extraction interval are to be included within the interval. By default, the upper bound is included.

While the generated query TVFs fail if supplied a null value for either the @from_lsn value or the @to_lsn value, the datetime wrapper functions use null to allow the datetime wrappers to return all current changes. That is, if null is passed as the low end-point of the query window to the datetime wrapper, the low end point of the capture instance validity interval is used in the underlying SELECT statement that is applied to the query TVF. Similarly, if null is passed as the high end-point of the query window, the high end-point of the capture instance validity interval is used when selecting from the query TVF.
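As a hedged illustration of these conventions, the calls below assume that a wrapper for the capture instance dbo_MyTable has already been created and that the caller's default schema is dbo. The dates are arbitrary and would need to fall within the validity interval.

```sql
-- Assumption: dbo.fn_all_changes_dbo_MyTable was created from the generated script.
DECLARE @t0 datetime = '2015-10-01T00:00:00';
DECLARE @t1 datetime = '2015-10-02T00:00:00';
DECLARE @t2 datetime = '2015-10-03T00:00:00';

-- Consecutive windows: the previous @end_time becomes the next @start_time.
SELECT * FROM dbo.fn_all_changes_dbo_MyTable(@t0, @t1, N'all');
SELECT * FROM dbo.fn_all_changes_dbo_MyTable(@t1, @t2, N'all');

-- NULL endpoints: fall back to the low and high ends of the validity interval.
SELECT * FROM dbo.fn_all_changes_dbo_MyTable(NULL, NULL, N'all');
```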

The result set returned by a wrapper function includes all the requested columns followed by an operation column, recoded as one or two characters to identify the operation that is associated with the row. If update flags have been requested, they appear as bit columns after the operation code, in the order specified in the @update_flag_list parameter. For information about the calling options for customizing the generated datetime wrappers, see sys.sp_cdc_generate_wrapper_function (Transact-SQL).

The template Instantiate a Wrapper TVF With Update Flag shows how to customize a generated wrapper function to append an update flag for a specified column to the result set returned by a net changes query. The template Instantiate CDC Wrapper TVFs for a Schema shows how to instantiate the Datetime Wrappers for the Query TVFs for all of the capture instances created for the source tables in a given database schema.

For an example that uses a datetime wrapper to query for change data, see the template Get Net Changes Using Wrapper With Update Flags. This template demonstrates how to query for net changes with a wrapper function when the wrapper is configured to return update flags. Note that the row filter option 'all with mask' is required for the underlying query function to return a non-null update mask on update. Null values are passed for both the lower and upper datetime interval boundaries to signal the function to use the low end point and the high end point of the validity interval for the capture instance when performing the underlying LSN based query. The query returns one row for each source row that was modified within the valid range for the capture instance.

Using the Datetime Wrapper Functions to Transition Between Capture Instances

Change data capture supports up to two capture instances for a single tracked source table. The principal use of this capability is to accommodate a transition between multiple capture instances when data definition language (DDL) changes to the source table expand the set of available columns for tracking. When transitioning to a new capture instance, one way to protect higher application levels from changes in the names of the underlying query functions is to use a wrapper function to wrap the underlying call. Then, ensure that the name of the wrapper function remains the same. When the switch is to occur, the old wrapper function can be dropped, and a new one with the same name created that references the new query functions. By first modifying the generated script to create a wrapper function of the same name, you can make the switch to a new capture instance without affecting higher application layers.