Create a Slowly Changing Dimension Type 2 with T-SQL

Creating a slowly changing dimension (SCD) is an essential part of Data Warehousing. There are many tools for the job but I prefer to do the work using T-SQL against the database rather than using something like SSIS.

This will generally involve staging the data from source and then merging the source copy against target.

Source > Stage > Target

In this post I use a sample data set and go through how to accomplish this using a merge statement in a sub query with an output clause.

I’ve created the below with T-SQL against SQL Server 2016.

Example Data Set

Lets say we have our sample data of Imperial employees from the Star Wars universe.

You can follow along with the below SQL.

DROP TABLE IF EXISTS dbo.Employee;
GO

CREATE TABLE dbo.Employee 
	(
	 EmployeeId			int IDENTITY(1,1)	NOT NULL
	,Forename			varchar(100)		NOT NULL
	,Surname			varchar(250)		NOT NULL
	,CurrentLocation	varchar(100)		NOT NULL
	,CurrentStatus		varchar(50)			NOT NULL

	CONSTRAINT [PK_dbo_Employee_EmployeeId] PRIMARY KEY CLUSTERED (EmployeeId)
	);
GO

INSERT INTO dbo.Employee (Forename, Surname, CurrentLocation, CurrentStatus)
VALUES
	 ('Emperor', 'Palpatine' , 'Coruscant', 'Active')
	,('Darth', 'Vader' , 'Mustafar', 'Active')
	,('Wilhuff', 'Tarkin' , 'Death Star', 'Active')
	,('Orson', 'Krennic' , 'Scarif', 'Active')
	,('Galen', 'Erso' , 'Eadu', 'Active')
GO

The SCD Type 2 Table

I now need my target SCD Type 2 table that captures the changes through a start and end date range. Every-time a row is changed in our sample data a new record needs to be inserted into the target. The only exception to this will be when a row is deleted.

DROP TABLE IF EXISTS dbo.dimEmployeeT2;		
GO
CREATE TABLE dbo.dimEmployeeT2 
	(
	 EmployeeKey		int IDENTITY(1,1)	NOT NULL
	,EmployeeId			int					NOT NULL
	,Forename			varchar(100)		NOT NULL
	,Surname			varchar(250)		NOT NULL
	,CurrentLocation	varchar(100)		NOT NULL
	,CurrentStatus		varchar(50)			NOT NULL
	,_$DateFrom			datetime2			NOT NULL
	,_$DateTo			datetime2			NULL
	,_$IsCurrent		bit					NOT NULL
	,_$IsDeleted		bit					NOT NULL

	CONSTRAINT [PK_dbo_Employee_EmployeeKey] PRIMARY KEY CLUSTERED (EmployeeKey)
	);
GO

You’ll notice that we have a column for identifying when a record has been deleted, _$IsDeleted. When a row is deleted we will mark it with an end date and and set _$IsDeleted to true.

The table also contains another boolian column _$IsCurrent for reconigsing when a row is the most current record.

You’ll also notice that we have a surrogate key named EmployeeKey with IDENTITY and the business key EmployeeId so we can join source and target rows in our merge statement.

The SCD SQL Logic

This SQL logic is long so I’m going to break it down into parts as I explain but first here it is in its entirety. You can run and create this procedure in the same db as your test tables from above.

DROP PROCEDURE IF EXISTS dbo.uspEtlDimEmployeeT2
GO

CREATE PROCEDURE dbo.uspEtlDimEmployeeT2
AS
BEGIN TRY

	/*******************************
	* Title:	Merge dbo.dimEmployeeT2
	* SCD Type:	Type 2
	********************************/
	
	SET NOCOUNT ON;

	-- Variables
	DECLARE  @Date	datetime2  = GETDATE()

	-- Insert changed record.
	INSERT INTO dbo.dimEmployeeT2 -- Target
		(
		EmployeeId
		,Forename
		,Surname
		,CurrentLocation
		,CurrentStatus
		,_$DateFrom
		,_$IsCurrent
		,_$IsDeleted
		)
	SELECT
		 M.EmployeeId
		,M.Forename
		,M.Surname
		,M.CurrentLocation
		,M.CurrentStatus
		,@Date
		,1 -- IsCurrent
		,0 -- IsDeleted
	FROM (
		-- Merge
		MERGE dbo.dimEmployeeT2 AS T -- Target
		USING dbo.Employee AS S -- Source
			-- Join on business key.
		   ON T.EmployeeId = S.EmployeeId 
		WHEN MATCHED AND
			-- Match the record on the latest (IsCurrent) and there being a diference.
			_$IsCurrent = 1
			AND
			EXISTS
			( -- Column values are different.
				SELECT T.Forename ,T.Surname ,T.CurrentLocation ,T.CurrentStatus
				EXCEPT
				SELECT S.Forename ,S.Surname ,S.CurrentLocation ,S.CurrentStatus
			)
		   THEN UPDATE SET
			 _$DateTo		= @Date
			,_$IsCurrent	= 0
		-- Insert new rows.
		WHEN NOT MATCHED BY TARGET
			THEN INSERT
			(
			 EmployeeId
			,Forename
			,Surname
			,CurrentLocation
			,CurrentStatus
			,_$DateFrom
			,_$IsCurrent
			,_$IsDeleted
			)
			VALUES
			(
			 S.EmployeeId
			,S.Forename
			,S.Surname
			,S.CurrentLocation
			,S.CurrentStatus
			,@Date
			,1
			,0
			)
		-- When deleted update deleted flag. 
		WHEN NOT MATCHED BY SOURCE 
			AND _$IsCurrent = 1
			THEN UPDATE SET
			 _$DateTo		= @Date
			,_$IsCurrent	= 0
			,_$IsDeleted	= 1
		-- Output the rows that have been updated in order to insert in outer query.
		OUTPUT $ACTION AS ChangeType
			,S.EmployeeId
			,S.Forename
			,S.Surname
			,S.CurrentLocation
			,S.CurrentStatus
		) AS M
	WHERE 
		-- Return rows that where updated inside the MERGE sub query.
		M.ChangeType = 'UPDATE'
		AND EmployeeId IS NOT NULL -- Don't include deleted rows.

END TRY

BEGIN CATCH
	THROW
END CATCH
GO

The Merge Statement

Firstly lets start with the merge statement. The merge is inside the sub query and essentially it’s joining on the business key and looking for differences. The logic works like this:

  • If the row has changed then mark it as no longer current and add an end date.
  • If a row is new (exists in the source but not target) then insert a new row with a current start date and no end date. Mark as the most current record.
  • If a row has been deleted (exists in target but not in source) then mark it as deleted and add an end date.

The Output Clause

Output clauses are useful for archiving, logging changes or throwing up messages from results to an application layer. In this instance we want to use it as if we’re archiving. So the procedure outputs changes from the sub query and in the SELECT we insert the new changes into the target table.

Lets Test the SCD

So now we have all our SQL objects ready lets run the procedure and check the results.

EXEC dbo.uspEtlDimEmployeeT2
GO

SELECT * FROM dbo.dimEmployeeT2
GO

Excellent, we have all our records and they’re the most current. But what happens when Galen Erso runs away from the Empire before Rogue One? Run the following to see.

-- Galen runs away.
UPDATE dbo.Employee
	SET CurrentStatus = 'Missing'
WHERE Surname = 'Erso'
GO
-- Update the SCD.
EXEC dbo.uspEtlDimEmployeeT2
GO

SELECT * FROM dbo.dimEmployeeT2
GO

As you can see we now have two records for Erso. The first one is no longer marked as active and has an end date. The second row is a reflection of the updated row in the source table.

This time lets run through some more changes with the below…

-- Krennic and Erso don't make it.
UPDATE dbo.Employee
	SET CurrentStatus = 'KIA'
WHERE Surname IN ('Krennic','Erso')
GO

EXEC dbo.uspEtlDimEmployeeT2
GO
-- Erso gets deleted from imperial records.
DELETE FROM dbo.Employee WHERE Surname = 'Erso'
GO

EXEC dbo.uspEtlDimEmployeeT2
GO

SELECT * FROM dbo.dimEmployeeT2
GO

Now we can see an example of further updates and a deletion.

So there you go. Hope that’s helpful!

For finding out the useful applications of Slowly Changing Dimensions I highly recommend reading The Data Warehouse Toolkit by Ralph Kimball and Margy Ross.