Download the Code
The benefits and drawbacks of GUIDs as primary keys are generally known. Despite the drawbacks, some people use them
to fulfill specific business requirements (the IDs are unique across multiple
servers). For a concise explanation
of the drawbacks, see the second paragraph of this
blog
entry by Marcus Mac Innes.
Thankfully, SQL Server 2005 introduces a new way to create GUIDs that
generates them in sequence. This gives you the benefit of sequential IDs
(like using an IDENTITY) with the uniqueness benefits of GUIDs.
There are some very important things to understand about this feature. From
SQL Server 2005 Books Online:
"Creates a GUID that is greater than any GUID previously generated by this function on a specified computer."
"Important: If privacy is a concern, do not use this function. It is possible to guess the value of the next generated GUID, and therefore access data associated with that GUID."
"The GUIDs generated by NEWSEQUENTIALID() are unique only within a particular computer if the computer does not have a network card."
"You can use NEWSEQUENTIALID() to generate GUIDs to reduce page contention at the leaf level of indexes."
Read that first sentence carefully, because
some have interpreted it incorrectly. It means that if you don't have a network card, you could create GUIDs that match GUIDs
created on another machine.
Let's see how this works by looking at GUIDs created one after the other
on two different machines.
This code will get us started:
CREATE TABLE t1 (EntryID uniqueidentifier DEFAULT NewID())
CREATE TABLE t2 (EntryID uniqueidentifier DEFAULT NewSequentialID())
INSERT INTO t1 DEFAULT VALUES
INSERT INTO t1 DEFAULT VALUES
INSERT INTO t1 DEFAULT VALUES
INSERT INTO t2 DEFAULT VALUES
INSERT INTO t2 DEFAULT VALUES
INSERT INTO t2 DEFAULT VALUES
SELECT EntryID, CONVERT(binary(16), EntryID) AS ConvertBinary FROM t1
SELECT EntryID, CONVERT(binary(16), EntryID) AS ConvertBinary FROM t2
|
|
EntryID ConvertBinary
------------------------------------ ----------------------------------
853C63ED-D7CC-4E86-965E-F425A53E1BA2 0xED633C85CCD7864E965EF425A53E1BA2
032F7CCC-70DF-41F5-AD24-4296922D6E3C 0xCC7C2F03DF70F541AD244296922D6E3C
16208550-D184-4E57-BBFD-35DE50D95C31 0x5085201684D1574EBBFD35DE50D95C31
EntryID ConvertBinary
------------------------------------ ----------------------------------
24AA2604-39F6-DA11-8F73-00096B6FFD84 0x0426AA24F63911DA8F7300096B6FFD84
25AA2604-39F6-DA11-8F73-00096B6FFD84 0x0426AA25F63911DA8F7300096B6FFD84
26AA2604-39F6-DA11-8F73-00096B6FFD84 0x0426AA26F63911DA8F7300096B6FFD84
|
|
EntryID ConvertBinary
------------------------------------ ----------------------------------
5E11A586-534C-4CB8-8073-BD60B9D9F57F 0x86A5115E4C53B84C8073BD60B9D9F57F
DA29DBAF-B273-4648-914C-B069A0072060 0xAFDB29DA73B24846914CB069A0072060
6C0AE722-F93A-40F3-88F0-F58FD470E9C1 0x22E70A6C3AF9F34088F0F58FD470E9C1
EntryID ConvertBinary
------------------------------------ ----------------------------------
6009C875-3BF6-DA11-8104-000E35C144DE
0x75C80960F63B11DA8104000E35C144DE
6109C875-3BF6-DA11-8104-000E35C144DE
0x75C80961F63B11DA8104000E35C144DE
6209C875-3BF6-DA11-8104-000E35C144DE
0x75C80962F63B11DA8104000E35C144DE |
Based on the documentation, there would never be a collision between the GUIDs created on different machines.
As expected, multiple tables using NewSequentialID() will create sequential IDs. This is clear from a simple example.
CREATE TABLE dbo.t1 (col1 uniqueidentifier DEFAULT NewSequentialID())
CREATE TABLE dbo.t2 (col1 uniqueidentifier DEFAULT NewSequentialID())
CREATE TABLE dbo.t3 (col1 uniqueidentifier DEFAULT NewSequentialID())
-- Three inserts into t1
INSERT INTO dbo.t1 DEFAULT VALUES
INSERT INTO dbo.t1 DEFAULT VALUES
INSERT INTO dbo.t1 DEFAULT VALUES
-- Alternating inserts into t2 and t3
INSERT INTO dbo.t2 DEFAULT VALUES
INSERT INTO dbo.t3 DEFAULT VALUES
INSERT INTO dbo.t2 DEFAULT VALUES
INSERT INTO dbo.t3 DEFAULT VALUES
INSERT INTO dbo.t2 DEFAULT VALUES
INSERT INTO dbo.t3 DEFAULT VALUES
SELECT CONVERT(binary(16), Col1) AS 't1Col1' FROM dbo.t1
SELECT CONVERT(binary(16), Col1) AS 't2Col1' FROM dbo.t2
SELECT CONVERT(binary(16), Col1) AS 't3Col1' FROM dbo.t3
|
|
t1Col1
----------------------------------
0xBC0AB27AF25311DA8F7300096B6FFD84
[Sequence: 1]
0xBC0AB27DF25311DA8F7300096B6FFD84
[Sequence: 2]
0xBC0AB27FF25311DA8F7300096B6FFD84
[Sequence: 3]
t2Col1
----------------------------------
0xBC0AB281F25311DA8F7300096B6FFD84
[Sequence: 4]
0xBC0AB287F25311DA8F7300096B6FFD84
[Sequence: 6]
0xBC0AB28BF25311DA8F7300096B6FFD84
[Sequence: 8]
t3Col1
----------------------------------
0xBC0AB284F25311DA8F7300096B6FFD84
[Sequence: 5]
0xBC0AB289F25311DA8F7300096B6FFD84
[Sequence: 7]
0xBC0AB28DF25311DA8F7300096B6FFD84
[Sequence: 9]
|
I built a test database to compare NewID()
GUIDs, NewSequentialID() GUIDs, and integer IDENTITY() as clustered primary (surrogate) keys. The basic setup is a credit card database. There are three tables:
Customer, Card, and Purchase. Each table has an ID column, and each child table has a foreign key
to the parent's primary key.
The script creates 100,000 customers, two cards per customer, and 100
purchases per card. (Note that the attached script has the 100,000 number
commented out. Generating the full data set took
almost 43 hours.)
| Table |
Type |
Rows |
Data Size (KB) |
Index Size (KB) |
| Customer1 |
Sequential GUID |
100,000 |
5,264 |
2,760 |
| Customer2 |
GUID |
100,000 |
7,736 |
2,760 |
| Customer3 |
bigint |
100,000 |
4,472 |
1,944 |
|
|
|
|
|
| Card1 |
Sequential GUID |
20,000 |
9,704 |
12,496 |
| Card2 |
GUID |
20,000 |
14,288 |
17,088 |
| Card3 |
bigint |
20,000 |
6,536 |
7,632 |
|
|
|
|
|
| Purchase1 |
Sequential GUID |
20,000,000 |
1,126,768 |
707,736 |
| Purchase2 |
GUID |
20,000,000 |
1,635,232 |
1,067,176 |
| Purchase3 |
bigint |
20,000,000 |
812,184 |
381,960 |
|
Using a sequential GUID here saves significant space versus a standard NewID() GUID. The bigint is still much better than either of the GUID alternatives.
INSERT ... OUTPUT ... INTO
When inserting into an identity column, SCOPE_IDENTITY() will return
the identity column value from the just-added record.
For SQL Server 2005, when inserting into a
column with NewID() or NewSequentialID() as the primary (surrogate) key, you
could retrieve the ID based on the
natural key. But the exciting development in 2005 is the OUTPUT clause. Try
this simple test:
CREATE TABLE t1 (EntryID uniqueidentifier DEFAULT NewID())
CREATE TABLE t2 (EntryID uniqueidentifier DEFAULT NewSequentialID())
INSERT INTO t1 OUTPUT INSERTED.* DEFAULT VALUES
INSERT INTO t2 OUTPUT INSERTED.* DEFAULT VALUES
|
|
EntryID
------------------------------------
FE06B4B9-0838-4AC5-AF45-827D570A9F75
EntryID
------------------------------------
53644D63-38F6-DA11-8F73-00096B6FFD84 |
The OUTPUT clause gives you back the GUID that was just created. This is a
clear advantage over doing a SELECT to look up the GUID based on data in the
record.
I always want to understand how changes in design and implementation affect
performance. (All tests were run on a single-processor desktop with a 2.66 GHz
P4, 2 GB RAM, and a single-spindle 80 GB IDE drive running Windows 2003 Server
and SQL Server 2005 SP1.)
In this simple test, the execution plan shows that the insert into Customer1 (NewSequentialID) takes 62% of the overall batch time
(INSERT: 56%, SELECT: 6%), while the insert into Customer3 (bigint IDENTITY) takes only
38%. If you leave out the post-insert SELECT statements, the numbers are
60%/40%, so the OUTPUT statement is causing a hit by itself.
DECLARE @outputTblCustomer1 TABLE (CustomerID uniqueidentifier)
-- Customer1
INSERT INTO dbo.Customer1 (CustomerNumber, LastName)
OUTPUT INSERTED.CustomerID INTO @outputTblCustomer1
VALUES (-1, N'LastName')
SELECT CustomerID FROM @outputTblCustomer1
-- Customer3
INSERT INTO dbo.Customer3 (CustomerNumber, LastName)
VALUES (-1, N'LastName')
SELECT SCOPE_IDENTITY() AS CustomerID |
If you have avoided using GUIDs because of page splits and other issues, reconsider this option
as you move to SQL Server 2005.
Because of the performance disadvantage of the ID
retrieval, the larger amount of space required (16 bytes versus 8 bytes), and
the inherent difficulty in using GUIDs, I believe bigint IDENTITY() still wins
over NewSequentialID(). If you have some compelling reason to use GUIDs, then I
suggest using NewSequentialID() rather than NewID() ... unless you cannot accept
the predictability of NewSequentialID(), then use NewID().
It always comes down to making a decision based on your unique business
requirements.