Please do more research before implementing the following.
CDC Implementation - Prerequisites
- SQL Server Agent Service should be running in order to get notified of the CDC actions
- Enable CDC for database by a member of the sysadmin fixed server role·
- Enable CDC for individual table by the members of the db_owner fixed database role
CDC Implementation - Steps
1. Enable CDC for the database
use <Database>
GO
exec sys.sp_cdc_enable_db
GO
A new column is_cdc_enabled is added to sys.databases table. To check whether the CDC has been enabled for a database: select is_cdc_enabled from sys.databases
where name = ‘<Database>
--1GO
Once CDC has been enabled for a database, a new schema named ‘cdc' and a new user named ‘cdc' will be created. Also, following 5 new tables will be created
Contains one row for each column tracked in a capture instance. By default, all columns of the source table are captured. However, columns can be included or excluded when the source table is enabled for change data capture by specifying a column list.
Contains one row for each change table in the database.
Contains schema modifications to source table
Contains one row for each index column associated with a change table. The index columns are used by change data capture to uniquely identify rows in the source table. By default, the columns of the primary key of the source table are included. However, if a unique index on the source table is specified when change data capture is enabled on the source table, columns in that index are used instead. A primary key or unique index is required on the source table if net change tracking is enabled.
Contains one row for each transaction having rows in a change table. This table is used to map between log sequence number (LSN) commit values and the time the transaction committed. Entries may also be logged for which there are no change tables entries. This allows the table to record the completion of LSN processing in periods of low or no change activity.
2. Enable CDC for the table
exec sys.sp_cdc_enable_table
@source_schema = 'dbo' --mandatory
, @source_name = 'Employee' --mandatory
, @role_name = 'cdc_test' --mandatory
, @supports_net_changes = 1
, @index_name = N'PK_Employee_EmployeeID'
, @captured_column_list = N'EmployeeID,EmpName, EmpCity'
, @filegroup_name = N'PRIMARY';
GO
@source_schema = Is the name of the schema to which the source table belongs. source_schema is sysname, with no default, and cannot be NULL.
@source_name = Is the name of the source table on which to enable change data capture. source_name is sysname, with no default, and cannot be NULL.
source_name must exist in the current database. Tables in the cdc schema cannot be enabled for change data capture.
@role_name = Is the name of the database role used to gate access to change data. role_name is sysname and cannot be NULL. If the role currently exists, it is used. If the role does not exist, an attempt is made to create a database role with the specified name.
@supports_net_changes = Indicates whether support for querying for net changes is to be enabled for this capture instance.
supports_net_changes is bit with a default of 0. If 0 or if not specified, only support for querying for all changes is allowed. That is, only the system functions to query for all changes are created.
If 1, the system function to query for net changes is created in addition to the system function to query for all changes. If supports_net_changes is set to 1, index_name must be specified, or the source table must have a defined primary key. The columns defined in index_name or in the primary key are used to uniquely identify a row in the source table. These columns are added to the table cdc.index_columns.
@index_name = The name of a unique index to use instead of the primary key index to uniquely identify rows in the source table. index_name is sysname and can be NULL. index_name must be a valid unique index on the source table.
@captured_column_list = Identifies the source table columns that are to be included in the change table. captured_column_list is nvarchar(max) and can be NULL. If NULL, all columns are included in the change table. Column names must be valid columns in the source table. Columns defined in a primary key index, or columns defined in index_name must be included.
captured_column_list is a comma-separated list of column names. Individual column names within the list can be optionally quoted with either 'or []. If a column name contains an embedded comma, the column name must be quoted.
captured_column_list cannot contain the following reserved column names: __$start_lsn, __$end_lsn, __$seqval, __$operation, and __$update_mask as these are the metadata columns to allow interpretation of the change activity in the ‘change table'.
captured_column_list cannot contain columns defined with a data type introduced in SQL Server 2008.
@capture_instance = Is the name that you assign to this particular CDC instance; you can have up two instances for a given table. (default:schema_sourcetable)
@filegroup_name = Is the filegroup to be used for the change table created for the capture instance. filegroup_name is sysname and can be NULL. filegroup_name must exist in the current database. If NULL, the default filegroup is used. A separate filegroup is recommended for creating change data capture change tables keeping the speed of data retrieval in the view.
To check whether the CDC has been enabled for a table:
select is_tracked_by_cdc from sys.tables
where name = ‘<tablename>'
Note: If you make any schema changes to the table, you need to disable CDC at the table level and then re-enable it in order to capture changes to the data for the new columns.
To disable CDC on a table -
exec sys.sp_cdc_disable_table
@source_schema = 'dbo',
@source_name = 'Employee',
@capture_instance = 'dbo_Employee'
(or 'all', if you have more than one capture_instance name)
SQL Agent Jobs:
When the first table in a database has been enabled for change data capture, two new jobs are created and started by SQL Server Agent, as mentioned below -
Job 'cdc.TestDb_capture' started successfully.
Job 'cdc.TestDb_cleanup' started successfully.
A capture job is created using the default values when the first table in the database is enabled for change data capture and no transactional publications exist for the database. When a transactional publication exists, the transactional log reader is used to drive the capture mechanism, and this separate capture job is neither required nor allowed.
The cleanup job is created using the default values when the first table in the database is enabled for change data capture.
cdc.TestDb_capture job:
This job scans the database transaction log to pickup changes to the tables that have CDC enabled. The first step of the job is to raise an error event to the client indicating the start of the change data capture session using the T-SQL command RAISERROR(22801, 10, -1) and queries sys.messages for id = 22801. Once this is done, it executes the system stored procedure [sp_MScdc_capture_job] to start the Change Data Capture Collection Agent.
cdc.TestDb_cleanup job:
This SQL Server job purges the change tables periodically by executing the system stored procedure sys.sp_MScdc_cleanup_job.
Please note that the capture job should be kept running in order to capture changes to the source tables and the cleanup job is scheduled to run at regular intervals to remove the data from the change table. If the SQL Server agent or the capture job is stopped for any reason, all the changes that happened during this period will get captured from the transaction log when the job is started again.
Audit (Change Data) Table
When a table has CDC enabled, a new change table is created which corresponds to the main table. When the name of the change table is not specified at the time the source table is enabled, the table is named as: cdc.capture_instance_CT where capture_instance is the combination of schema name of the source table and the source table name in the format schema_table. (Ex: cdc.dbo_Employee_CT).
This table contains some book keeping columns such as __$start_lsn, __$end_lsn (always populates with NULL; may be there for future use), __$seqval, __$operation, and __$update_mask.
The column __$operation has various codes as shown below -
Insert - 2
Update (row with before image) - 3
Update (row with after image) - 4
Delete - 1
Data consumers access change data through table-valued functions (fn_cdc_get_all_changes_<capture_instance> & fn_cdc_get_net_changes_<capture_instance>) rather than by querying the change tables directly.
In general, querying for all changes is more efficient than querying for net changes. If, for example, you know that the change data will consist entirely of inserts, a query for all changes using the 'all' row filter option and a query for net changes using the 'all' or 'all with mask' option will return essentially the same resultset. The performance of the cdc.fn_cdc_get_all_changes_<capture_instance> function, however, is likely to be significantly better. If a primary key or unique index is not defined for a table, only queries using the cdc.fn_cdc_get_all_changes_<capture_instance> function are supported.
The function cdc.fn_cdc_get_net_changes_<capture_instance> returns only one change per row. If multiple changes are logged for the row during the request interval, the column values will reflect the final contents of the row.
Data is requested for changes having commit log sequence numbers (LSNs) that lie within a specified range. If a source row had multiple changes during the interval, each change is represented in the returned result set. In addition to returning the change data, four metadata columns provide the information you need to apply the changes to another data source. Row filtering options govern the content of the metadata columns as well as the rows returned in the result set. When the 'all' row filter option is specified, each change has exactly one row to identify the change. When the 'all update old' option is specified, update operations are represented as two rows: one containing the values of the captured columns before the update and another containing the values of the captured columns after the update.
For further details refer MSDN link below
http://msdn.microsoft.com/en-us/library/cc627369.aspx