Making use of SQL Server JSON for change tracking

Tracking data changes in SQL Server, especially in SQL Azure, can be a bit of a pain. Thanks to new JSON Data support, there's another option which is simple, maintainable and most of all flexible.

All I want to do is track data changes!

A viable change tracking solution for my purposes needed to:

  • track data changes over time,
  • work on SQL Azure,
  • work on an existing schema with minimal changes
  • and be simple to maintain as table structures change over the system's lifetime.

As always, there are a number of options for this in SQL Server such as Change Data Capture (CDC), Change Tracking, and Temporal Tables

If you're working with Azure SQL, however, CDC and Change Tracking are not supported, so they are out.

Temporal Tables are a new addition to SQL Server 2016 and are supported by Azure SQL. Temporal Tables allow SQL Server to take care of putting data into your designated history table as and when a record is altered.

There are however a couple of stipulations about what a Temporal Table's structure should be.

Any tracked table must have a startdate and enddate attribute on it. When working with an existing schema, this didn't appeal.

You must also maintain the history table structure yourself too, which must mirror the attributes of the source table. Especially when using SQL Server Data Tools (SSDT), this can be a real pain point.

Were those the only options I had?

What about using SQL Server JSON?

There is another option though it is bit more of a manual effort. SQL Server has introduced support for JSON Data.

Consider the following:

SELECT  
    FirstName, 
    LastName 
FROM  
    crm.Customer
FOR JSON AUTO  

Running this will result in a JSON string similar to:

[
    {
        "FirstName": "John",
        "LastName": "Smith"
    }
]

Wow, great you say (with a hint of sarcasm). Well, what if we took advantage of this in a trigger on a table, and simply used it to serialise the record being changed.

Ah, now we're talking!

Change tracking using JSON

Within a trigger, we have access to two special tables: DELETED and INSERTED. These tables hold the before and after state of the record that is being changed. So, if we serialise and store those as JSON data in a table, we can track changes over time.

Let's get to an example. Consider that we have a Customer table and we want to start tracking the data changes that are made.

We'll need a history table to store the before and after state of the table. Something like this will do:

CREATE TABLE [audit].[CustomerHistory]  
(
    [Id] BIGINT NOT NULL IDENTITY(1,1),
    [Before] NVARCHAR(MAX) NULL,                
    [After] NVARCHAR(MAX) NULL,     
    [CreatedOn] DATETIMEOFFSET(7) NOT NULL,
    CONSTRAINT [PK_audit.History] PRIMARY KEY (Id)
)

It's then simply a case of creating a trigger which can serialise the DELETED and INSERTED tables to JSON:

CREATE TRIGGER crm.AuditCustomer  
    ON crm.Customer FOR INSERT, UPDATE, DELETE AS
BEGIN  
    SET NOCOUNT ON

    DECLARE @beforeJson NVARCHAR(MAX),
            @afterJson NVARCHAR(MAX)

    SET @beforeJson = (SELECT * FROM DELETED FOR JSON AUTO)
    SET @afterJson = (SELECT * FROM INSERTED FOR JSON AUTO)

    INSERT INTO [audit].[CustomerHistory]
    (                   
        [Before],
        [After],        
        CreatedOn
    )
    SELECT      
        @beforeJson AS [Before],
        @afterJson AS [After],      
        SYSDATETIMEOFFSET()
END  

As we make changes to the Customer table, records will be made to the CustomerHistory table too:

Screenshot showing JSON serialised before and after states

Making a query on the history table

That's great, we've now got a history of all changes made to the table. So how would we get the data out to actually make it useful?

By using OPENJSON we can serialise back out into table:

SELECT  
    h.*,    
    ch.CreatedOn
FROM  
    [audit].[CustomerHistory] ch
    CROSS APPLY OPENJSON(ch.[After], '$[0]')
    WITH (
            Id             int             '$.Id' ,
            FirstName      varchar(200)    '$.FirstName' ,
            Surname        varchar(200)    '$.Surname'         
            -- Further Parts of the schema can go here
    ) h
WHERE  
    h.Id = 1
ORDER BY  
    ch.CreatedOn DESC

So what we're doing above is, for each CustomerHistory record, use OPENJSON on the After attribute to format it into a table with columns for FirstName and Surname.

Running this gives us something like the following, showing the current state first, going back in time:

How about when the table schema changes over time?

The best part is that we're protected from potential issues from schema changes over time.

Let's say that we introduce a CustomerNumber attribute to the table. Well, none of our previous history records have that attribute, but that doesn't matter.

In fact, we can go ahead right now and change our query to include it (any other non-existent attribute) to the OPENJSON clause.

SELECT  
    h.*,    
    ch.CreatedOn
FROM  
    [audit].[CustomerHistory] ch
    CROSS APPLY OPENJSON(ch.[After], '$[0]')
    WITH (
            Id             int             '$.Id' ,
            FirstName      varchar(200)    '$.FirstName' ,
            Surname        varchar(200)    '$.Surname',
            CustomerNumber varchar(25)     '$.CustomerNumber',         
            SomeOtherField varchar(255)    '$.SomeOtherField'
    ) h
WHERE  
    h.Id = 1
ORDER BY  
    ch.CreatedOn DESC

Running this gives us the result shown below. Note that we haven't added a CustomerNumber field to the table yet, so none of the JSON will have that as a property. This is highlighted by the SomeOtherField column I also added.

Screenshot showing extra columns safely showing null

If we now add the CustomerNumber attribute to the table and update a record, the most recent history record's JSON, will include a CustomerNumber property.

Re-running the query gives the following result:

Screenshot showing CustomerNumber column showing data

To me, that is just simply awesome! It means I just don't have to worry about how my schema changes may affect the history records as and when they happen.

That's it in a nutshell

It's a simple approach, if not a little manual at first glance.

The advantages that JSON provides us here are huge:

  • It works nicely with SQL Server Data Tools (SSDT) as you simply modify the schema as normal.
  • No schema maintenance on the history table as new attributes are added to the source table. In fact, with a little more effort, the same "history" table can be used for all tables should you wish.
  • No changes are needed to the table structure of tracked tables, it just needs a trigger.
  • We're not limited in any way in how to query the history data; we can always still get it out and query if we need to.

The code above is just for demonstration purposes, but hopefully, you can see how this could be improved for your situation.