SQL Diff Query Comparing Two Tables

In this example, we’re comparing the dataset on our local machine to one on Azure, so we need to use a linked server to connect Azure to the SQL install on our local machine.  You could easily adapt this technique to two similar tables in the same database however.

The Components

  1. We need a linked server, because the queries are all run locally but will need to both the remote Azure server, and the local server.
  2. We use a query to load a copy of the data we want to compare from Azure into a local temp table.
  3. We also use a query to load a copy of the data we want to compare from the local server into a local temp table.
  4. The queries only include data fields we want to compare. We’re stripping out things like id’s because we just want to see which rows have changed data. If we include id fields, then adding one row would cause hundreds to change.
  5. Finally is the actual “diff” query, which actually looks pretty simple.

Part 1: Drop If Exists

We want to drop the temp tables if they exist, so that we are sure we are comparing the most recent copies of data.

IF EXISTS(SELECT [name] FROM tempdb.sys.tables WHERE [name] like ‘#remote_temp%’) BEGIN
DROP TABLE #remote_temp;
END;
IF EXISTS(SELECT [name] FROM tempdb.sys.tables WHERE [name] like ‘#local_temp%’) BEGIN
DROP TABLE #local_temp;
END;

Part 2: Create Temp Tables

To make a temp table you need to define the data types for each of the columns.  Only make columns for the data fields you are comparing and selecting in the next part, not id columns.

CREATE TABLE #local_temp
(
Report_Name NVARCHAR(255),
Report_Label NVARCHAR(255),
…etc…
);

CREATE TABLE #remote_temp
(
Report_Name NVARCHAR(255),
Report_Label NVARCHAR(255),
…etc…
);

Part 3: Populate the Temp Tables

I’m not going to reproduce the queries here, since they are long and specific to my dataset.

The interesting bit here is that the queries are the same, except the FROM statement. When selecting from Azure, we are using our linked server code (Creating Linked Azure Server) so that we just need to refer to the table name with “azure.” in front, and we’ll select the data from Azure instead.

INSERT INTO #local_temp
SELECT
l_report.report_name AS Report_Name,
l_report.report_label AS Report_Label,
…etc…
FROM isnap.dbo.report AS l_report
WHERE blahblah = ‘blah’;

INSERT INTO #remote_temp
SELECT
r_report.report_name AS Report_Name,
r_report.report_label AS Report_Label,
…etc…
FROM azure.isnap.dbo.report AS r_report
WHERE blahblah = ‘blah’;

Part 4: The Diff Query

The query is simple, we just take:

  • The local data set minus the remote data set, and name it local (rows that are only on the local data set)
  • The remote data set minus the local data set, and name it remote (rows that are only on the remote data set)
  • and union those two things together so when we look at the results of the diff query:
    • If a record is deleted, a copy of the record will show up with Stage = Remote only, no row for Local
    • If a record is being added, a copy of the record will show up with Stage = Local only, no row for Remote
    • If a record is being updated, there will be two rows, one with Local (the new version) and one with Remove (the old version)
(SELECT
     'Local' as Stage, *
     FROM (
          (SELECT * FROM #local_temp)
          EXCEPT
          (SELECT * FROM #remote_temp)
     ) OnLocal)
UNION
(SELECT
     'Remote' as Stage, *
     FROM (
          (SELECT * FROM #remote_temp)
          EXCEPT
          (SELECT * FROM #local_temp)
     ) OnRemote)
ORDER BY Report_Name, Category_Name, Subcategory_Name, Description_Name, Subdescription_Name, Year_Name, Stage

Leave a Reply

Your email address will not be published. Required fields are marked *