Home > Uncategorized > Deal with NULLs and Uniqueness in Microsoft SQL Server

Deal with NULLs and Uniqueness in Microsoft SQL Server

How do you apply uniqueness to only non-NULL values? Find out the answer here.

As an example lets say you have a field in a table that requires a unique constraint.  In this row however you have some records with a NULL value.  With SQL server you can not directly apply a unique constraint the this row because of the NULL records that exist.

 

This particular issue of uniqueness around NULLs is somewhat controversial within the SQL community. Because NULL = NULL is false, how can two NULLs not be unique? In the case of SQL Server, as you’ve discovered, two NULLs are not considered unique, but in other databases, notably Oracle, they are. Be that as it may, there are a couple of ways to deal with this problem. I’ll show you both.

Just to set the stage clearly for the problem, here’s a table:

CREATE TABLE Employee (
EmployeeID INT IDENTITY NOT NULL,
LastName VARCHAR(50) NOT NULL,
FirstName VARCHAR(50) NOT NULL,
SSN CHAR(11) NULL,
CONSTRAINT PK_Employee PRIMARY KEY (EmployeeID));

CREATE UNIQUE INDEX IDX_Employee_SSN
ON Employee (SSN);

Notice that I’ve created a unique index on the SSN. So now insert some rows:

INSERT INTO Employee (LastName, FirstName, SSN)
VALUES (‘Forte’, ‘Steve’, NULL);

INSERT INTO Employee (LastName, FirstName, SSN)
VALUES (‘Campbell’, ‘Richard’, NULL);

The second insert fails because the second null SSN violates the uniqueness constraint.

The first solution I have to this is to use a view to get rid of the nulls, and then index the view. So the first step is to remove the unique index I created earlier, then create a view and put a unique index on the view:

DROP INDEX IDX_Employee_SSN ON dbo.Employee;

CREATE VIEW Employee_SSN_NOT_NULL
WITH SCHEMABINDING AS
SELECT LastName, FirstName, SSN
FROM dbo.Employee
WHERE SSN IS NOT NULL;

CREATE UNIQUE CLUSTERED INDEX Employee_SSN_Unique
ON Employee_SSN_NOT_NULL (SSN);

Notice the view I’ve created has the "WITH SCHEMABINDING" clause on it, which tells SQL Server that the underlying table cannot be modified in a way that affects the view without first dropping the view. SQL Server requires that constraint so I can execute the final statement of the three, adding an index to the view. And it’s that index that provides the uniqueness constraint, but only on the rows in Employee that have an SSN.

This is a somewhat elaborate way to address the issue, but it certainly works. Some folks don’t like it because they use views for other things and don’t want views for uniqueness cluttering up their schema. There is a simpler, somewhat "hackier" way to work around the problem, using a thing called the "nullbuster" as coined by Steve Kass.

The simplest way to understand the nullbuster in action is to look at the code:

CREATE TABLE Employee (
EmployeeID INT IDENTITY NOT NULL,
LastName VARCHAR(50) NOT NULL,
FirstName VARCHAR(50) NOT NULL,
SSN CHAR(11) NULL,
nullbusterSSN AS
(CASE WHEN SSN IS NULL THEN EmployeeID ELSE 0 END),
CONSTRAINT PK_Employee PRIMARY KEY (EmployeeID));

CREATE UNIQUE INDEX IDX_Employee_SSN
ON Employee (SSN, nullbusterSSN);

So once again I’ve created the Employee table and added an index with a unique constraint on it. The difference is the nullbusterSSN column. The nullbusterSSN column is a derived column that uses a CASE statement to check to see if SSN is NULL. If it is, nullbusterSSN contains the EmployeeID. If SSN is not NULL, then nullbusterSSN contains 0. The unique index I created is a composite index containing both SSN and nullbusterSSN.

The result is that you can have as many NULL SSNs as you like and the unique constraint won’t fire, but as soon as you try and write two rows with the same non-NULL SSN, you’ll hit the uniqueness constraint.

So now instead of having an extra view in the database schema, I have an extra column in the table. This might annoy you even more; it’s very much a personal choice. Both approaches work equally well. Choose whichever one you prefer.

We could build a more elaborate schema that had a separate table for SSNs and dealt with uniqueness that way, but I find that even more complicated and fraught with peril. As soon as you introduce another table, you run the risk that an employee has more than one SSN assigned to them. I’d prefer either one of these hacks to that.

Advertisements
Categories: Uncategorized
  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: