Wednesday 23 February 2011

Delta change script for SQL Server

One of the things I have been working on recently (and finally cracked today) is a way of comparing two record sets and finding if anything has changed. This is handy if you want to integrate data from one system into another and need a constant feed of data. Obviously you don't want to be updating every single record every time; as soon as you get into thousands of records there's a big performance hit and it's slow.

I found this script on a website somewhere but can't remember the original source:

SELECT
refno, digit, pro_refno, par_refno,
title, forename, surname, nino, DOB, CorrelationID
FROM
tblDataOld
WHERE
CorrelationID IN(
SELECT DISTINCT
CorrelationId
FROM (
SELECT
*
FROM
tblDataOld
UNION ALL
SELECT
*
FROM
tblDataNew)
AS
xxx
GROUP BY
refno, digit, pro_refno, par_refno,
title, forename, surname, nino, DOB, correlationid
HAVING
COUNT(*) = 1)

You can see that I'm pulling data from two tables - tblDataOld and tblDataNew. The idea is that you have an out of date data set to compare against a brand new one. Replace my column names with your own and the script will show you all the rows that have a change. The correlation ID needs to be unique for each row; what I did was concatenate the first four fields together. In the dataset that I have used these are all numbers and I'm pretty certain that if you lump the four together it's unique from any other combination.

A note about the method too; originally I had three tables - Old, New and Delta. I would compare Old and New and put the changes into Delta to be processed. This does work but I found that if something happened in the job - one of the tables got truncated but not rebuilt properly - that the next time it ran it would see everything as a change and try to process all my records. As I have said above - not good. To get around this my tblDataNew is a temporary table that I'm populating for the duration of the identification process. Once I'm done I drop the table. Also, I don't use a table for the Deltas, I run the script inside an ASP.NET application and process each Delta in memory using the SqlDataReader object. I only need to store one table - tblDataOld - in my local database and I repopulate this at the end of the process so new becomes old.

It's much neater and I like the fact that it has a small footprint. If you search Google for Delta scripts you get a lot of links to applications that will identify changes for you but for something as simple as this one SQL script will suffice.

If anyone is interested in the full application let me know and I can mail you the source code; it includes all the steps for building the temporary table, processing the changes and tidying up at the end.

No comments: