January 2005 - Posts

Hiring the top 0.5%

Here's a humorous bit from Joel Spolsky...

Rant mode on

I wonder, do people not consider that those 200 resumes they receive for their ill-defined, badly written, horrible job req might be the BOTTOM 200 resumes? So they're really hiring the top person from the bottom of the pile. Good job, employers.

That's what's going to keep happening as long as employers demand things like I've seen on recent job postings: "5+ years practical DBA experience. At least 5 years SQL Server. At least 5 years Oracle. Expert knowledge of C++. Expert knowledge of UNIX. Expert knowledge of TCPIP ..."

Huh? TCPIP?? And what is "expert knowledge"? Can anyone really be an expert in that many technologies? Really???

And of course, all of the job postings have the same sad requirement: "Good communication skills"

When's the last time you had a resume submitted, along with a cover letter that said, "Me no talk good, but I willing to learn. But me is be C++ expert!"

If you're a C++ expert, you probably have good enough communication skills to get by as a C++ developer!!!!

And the salary for these wonderful job posts is always the punchline. They generally come in around 20% under market.

So good luck getting those "experts", employers!

Rant mode off

Bitmask Handling, part 2: Bitmask reconstitution

Posting the first part of my series on bitmasks (yes, this is now officially a series) taught me a lot about my readers: You don't care about handling bitmasks in the database. And I respect you for that! I'm overjoyed, as a matter of fact! That article has received the least hits of anything I've posted in this blog to date. So good for you for not clicking.

However, I'm going to continue and take this thing all the way to the bitter end. Why? I'm not sure, but I think it has something to do with my freshman year computer science course, "Data Structures". It was a course taught in C++, and the very first project was to create a series of classes that could support working with very large numbers. Of course, this was all handled via a bunch of bit manipulation, but I didn't know that because slacker that I was, I never went to class and only started the project the night before it was due.

I never finished the project. But out of the kindness of his heart -- or madness in his brain -- my professor let it slide and I passed the class anyway (I started going to class and completed the rest of the projects after that little slip!)

So here I am with this gaping hole in my education. I never finished the project for large numbers. And I'm really rusty on my C++ at the moment, but I know my way around SQL pretty well.

So what's next? I'm thinking we'll start with standard logical operators: AND, OR, NOT. And then move on to right-shift and left-shift. From there, it should be a pretty simple jump to addition, subtraction, multiplication, and division. I think. My other goal -- if I can figure out how to do it -- is to provide a mechanism by which the user will be able to view a decimal representation (in the form of a string) of the number contained in the bitmask. So this isn't really "handling bitmasks" anymore -- it's now become "how to represent really big numbers".

Anyway, on to the meat of this episode (for the 5 people who are probably going to bother reading this)...

We now have a way of taking a bitmask of arbitrary size (up to 4096 bytes, but could be bigger with more rows in the numbers table) and producing a table of integers representing which "bits" are set to 1 in the bitmask. That's thanks to the splitBitmask function.

Integral to finishing this project is a method of going the other way around -- we need to reconstitute bitmasks from a table of integers representing bit positions.

Since this series is a subseries of my series of things not to do in SQL Server, I'm allowed to use all sorts of dirty tricks to accomplish my goal. The trick du jour is aggregate concatenation (don't you love all of these intra-blog links?). I considered looping and tried to think of a truly set-based solution, but nothing beats the fun of letting SQL Server handle the dirty work in a totally undocumented way.

As you may recall, the BitmaskNumbers table defined in the first article contains three columns: A "Number", which represents an individual bit, a "Byte", which represents a collection of 8 bits, and a "BitValue", which represents the decimal value of that bit within the byte.

Assuming that we have a table of integers representing bit positions, we can join to the BitmaskNumbers table to get the associated bytes and bit values for those bits. So for instance, if we were to split the bitmask 0x1001F3 using the splitBitmask function:

SELECT
	BitmaskNumbers.Number,
	BitmaskNumbers.Byte,
	BitmaskNumbers.Bitvalue
FROM dbo.splitBitmask(0x1001F3) x
JOIN BitmaskNumbers ON BitmaskNumbers.Number = x.Number


Number	Byte	Bitvalue 
------	------	-------- 
1	1	1
2	1	2
5	1	16
6	1	32
7	1	64
8	1	128
9	2	1
21	3	16

We have three bytes, and the de-aggregated values for the bits in those bytes. Aggregating them is a simple matter of using the SUM() function:

SELECT
	BitmaskNumbers.Byte,
	SUM(BitmaskNumbers.Bitvalue) AS TotalValue
FROM dbo.splitBitmask(0x1001F3) x
JOIN BitmaskNumbers ON BitmaskNumbers.Number = x.Number
GROUP BY BitmaskNumbers.Byte


Byte	TotalValue  
------	----------- 
1	243
2	1
3	16

Of course, that's a decimal representation and we require hexadecimal...

SELECT
	BitmaskNumbers.Byte,
	CONVERT(VARBINARY(1), SUM(BitmaskNumbers.Bitvalue)) AS TotalValue
FROM dbo.splitBitmask(0x1001F3) x
JOIN BitmaskNumbers ON BitmaskNumbers.Number = x.Number
GROUP BY BitmaskNumbers.Byte
ORDER BY BitmaskNumbers.Byte DESC


Byte	TotalValue 
------	---------- 
3	0x10
2	0x01
1	0xF3

... And now you can see our original three bytes: 0x10, 0x01, and 0xF3. Lucky for those of us who want to do weird things with big bitmasks, SQL Server treats binary values like strings when using the + operator:

-- 0x10 + 0x01 != 17

-- Instead:

SELECT 0x10 + 0x01 AS Concat


Concat 
------ 
0x1001

... And one other property that will be useful for aggregate concatenation:

SELECT 0x00 + 0xFF AS Concat


Concat 
------ 
0x00FF

--That's not very useful; how do we initialize a NULL varbinary variable to an empty value?


SELECT 0x + 0xFF AS Concat


Concat 
------ 
0xFF


--0x it is...

From here, it's a simple step to rebuilding the original bitmask:

DECLARE @Bitmask VARBINARY(4096)
SET @Bitmask = 0x

SELECT
	@Bitmask = @Bitmask + 
		CONVERT(VARBINARY(1), 
			SUM(BitmaskNumbers.Bitvalue)
		)
FROM dbo.splitBitmask(0x1001F3) x
JOIN BitmaskNumbers ON BitmaskNumbers.Number = x.Number
GROUP BY BitmaskNumbers.Byte
ORDER BY BitmaskNumbers.Byte DESC

SELECT @Bitmask AS TheBitmask


TheBitmask
------------
0x1001F3

But this bitmask has no blank spots. What if we switch to 0xFF00FF?

TheBitmask
------------
0xFFFF

--That's not good!

Changing the INNER JOIN to a RIGHT JOIN and adding a NULL check solves the problem a bit:

DECLARE @Bitmask VARBINARY(4096)
SET @Bitmask = 0x

SELECT
	@Bitmask = @Bitmask + 
		CONVERT(VARBINARY(1), 
			SUM(CASE
				WHEN x.Number IS NULL THEN 0
				ELSE BitmaskNumbers.BitValue
				END)
		)
FROM dbo.splitBitmask(0xFF00FF) x
RIGHT JOIN BitmaskNumbers ON BitmaskNumbers.Number = x.Number
GROUP BY BitmaskNumbers.Byte
ORDER BY BitmaskNumbers.Byte DESC

SELECT @Bitmask AS TheBitmask


TheBitmask
-------------
0x00000000000000000000000000000000000000000000 ... FF00FF

--Long string of zeroes followed by the original bitmask truncated for brevity

To keep things sparse, we should only take as many bytes as we require. We can filter the results post-join, and also take advantage of the index that was created on the Byte column in the first installment:

DECLARE @Bitmask VARBINARY(4096)
SET @Bitmask = 0x

SELECT
	@Bitmask = @Bitmask + 
		CONVERT(VARBINARY(1), 
			SUM(CASE
				WHEN x.Number IS NULL THEN 0
				ELSE BitmaskNumbers.BitValue
				END)
		)
FROM dbo.splitBitmask(0xFF00FF) x
RIGHT JOIN BitmaskNumbers ON BitmaskNumbers.Number = x.Number
WHERE BitmaskNumbers.Byte <=
	(SELECT
		CASE MAX(Number) % 8
			WHEN 0 THEN (MAX(Number) - 1) / 8
			ELSE  MAX(Number) / 8
		END + 1
	FROM dbo.splitBitmask(0xFF00FF))
GROUP BY BitmaskNumbers.Byte
ORDER BY BitmaskNumbers.Byte DESC

SELECT @Bitmask AS TheBitmask


TheBitmask
------------
0xFF00FF

--Finally, the correct re-constituted output

A small optimization is using a table variable for the results of splitBitmask so that it will only have to be executed once. So the final pattern I'll present for solving this problem, ready for insertion into a variety of new UDFs, is:

DECLARE @Bitmask VARBINARY(4096)
SET @Bitmask = 0x11002200330044

DECLARE @BitsInBitmask TABLE(Number SMALLINT)
INSERT @BitsInBitmask
SELECT Number
FROM dbo.splitBitmask(@Bitmask)

SET @Bitmask = 0x

SELECT @Bitmask = @Bitmask +
	CONVERT(VARBINARY(1), 
		SUM(CASE
			WHEN x.Number IS NULL THEN 0
			ELSE BitmaskNumbers.BitValue
			END)
		)
FROM @BitsInBitmask x
RIGHT JOIN BitmaskNumbers ON BitmaskNumbers.Number = x.Number
WHERE BitmaskNumbers.Byte <=
	(SELECT
		CASE MAX(Number) % 8
			WHEN 0 THEN (MAX(Number) - 1) / 8
			ELSE  MAX(Number) / 8
		END + 1
	FROM @BitsInBitmask)
GROUP BY BitmaskNumbers.Byte
ORDER BY BitmaskNumbers.Byte DESC

SELECT @Bitmask AS TheBitmask


TheBitmask
------------
0x11002200330044

... And there you have it. Next time I'll investigate how to use this technique to very easily implement binary logical operators. And perhaps I'll disclose more tales about my poor study habits. Or maybe not.

Correction on bitmask handling

In the article on handling bitmasks I posted the other day, I made a fatal error in the splitBitmask function. The function treated the low byte as the first byte, instead of the high byte. Therefore:

0x01 != 0x0001

... and that is not good!

So here's a corrected version that fixes the problem:

CREATE FUNCTION dbo.splitBitmask
(
	@Bitmask VARBINARY(4096)
)
RETURNS TABLE
AS
RETURN
(
	SELECT Number
	FROM BitmaskNumbers
	WHERE (SUBSTRING(@Bitmask, DATALENGTH(@Bitmask) - Byte + 1, 1) & BitValue) = BitValue
		AND Byte <= DATALENGTH(@Bitmask)
)
GO

More to come soon... Bit shifting, logical operators, and other fun ways to annoy this guy:

Dealing with very large bitmasks

Continuing in my series of things you should probably not do in SQL Server but sometimes have to, I'm going to do a few posts on dealing with very large bitmasks.

Let me first state my utter hatered of bitmasks in databases. I think they're annoying, make the system difficult to understand, and whether or not they violate the First Normal Form (that's up for discussion), using them is just a sign of bad design.

But as I've said in other posts, I realize that short deadlines and tiny budgets are an issue at most shops, and sometimes we just need to shoehorn in a solution real quick (yeah, as if it's not going to last for the next 5+ years?)

In one case in the past, bitmasks were a very convenient solution to a problem I faced with an access control system. But alas, I only had 8 bytes available to me. Only enough for 64 values. And so that solution failed. And the company failed. And many, many tears were shed... If only I'd been able to figure out how to manipulate a bigger bitmask, I might have saved the little children...

I won't go into any more detail on that particular issue since there are still a few pending lawsuits, but suffice it to say that if that situation were happening today, I probably wouldn't use a bitmask anyway. But you might need one -- so here's how you do it:

First we're going to modify the table of numbers that I'm always telling you that you absolutely must have in every single database.

SELECT (a.number * 256 + b.number) AS Number,
	CASE (a.number * 256 + b.number) % 8 
		WHEN 0 THEN ((a.number * 256 + b.number) - 1) / 8
		ELSE (a.number * 256 + b.number) / 8 
		END + 1 AS Byte,
	POWER(2, CASE (a.number * 256 + b.number) % 8 
		WHEN 0 THEN 8
		ELSE (a.number * 256 + b.number) % 8 
		END-1) AS BitValue
INTO BitmaskNumbers
FROM
	master..spt_values a,
	master..spt_values b
WHERE 
	a.type = 'p'
	AND b.type = 'p'
	AND (a.number * 256 + b.number) BETWEEN 1 AND 32767
GO

CREATE CLUSTERED INDEX IX_Byte ON BitmaskNumbers (Byte)
GO

This will produce a table with 32767 rows. Each row has a Number, which will represent a bit position in our bitmask, a Byte, which will help to parse the bitmask, and a BitValue, which is the value that the individual bits within each byte represent. Feel my 1960-esque skill!

The brighter bulbs in my audience have now figured out that I'm going to show you how to handle a 4096-byte bitmask -- capable of handling up to 32767 values. Not bad. But if you need more, just put more rows in the BitmaskNumbers table.

So what do you want to do with bitmasks? Most of the queries I've seen involve access control And for those queries, you want to use a logical and and see if it evaluates to a number other than 0. That is, we want to see if both bitmasks we're comparing have any of the same bits set.

Using what little math knowledge I have managed to retain, I conjured up the following, which indicates which bit positions, based on the "number", are filled in a bitmask. For instance:

DECLARE@x VARBINARY(4096)
SET @Bitmask = 0x1F0000000000000000000000000000000000000000000000000000000000000000000000123000000000000000000000000001

SELECT Number
FROM BitmaskNumbers
WHERE (SUBSTRING(@Bitmask, Byte, 1) & BitValue) = BitValue
	AND Byte <= DATALENGTH(@Bitmask)


Number
----------
1
2
3
4
5
290
293
301
302
401

Fun stuff, no?

The Byte <= DATALENGTH(@x) allows SQL Server to utilize the clustered index on Byte, so that a full table scan doesn't have to happen every single time. Small optimization. I couldn't think of any others. If you can, drop me a line...

Those of you who've read this far are probably yawning and wondering where the access control stuff is... Who cares about chunking up the bitmask into its bit positions? Well, it's simply the first step. Bear with me.

What we need to do is wrap this in a UDF. Then if we had two bitmasks, we could join the bit positions to eliminate those that aren't in common...

CREATE FUNCTION dbo.splitBitmask
(
	@Bitmask VARBINARY(4096)
)
RETURNS TABLE
AS
RETURN
(
	SELECT Number
	FROM BitmaskNumbers
	WHERE (SUBSTRING(@Bitmask, Byte, 1) & BitValue) = BitValue
		AND Byte <= DATALENGTH(@Bitmask)
)
GO

DECLARE @Bitmask1 VARBINARY(4096)
SET @Bitmask1 = 0x1F0000000000000000000000000000000000000000000000000000000000000000000000123000000000000000000000000001

DECLARE @Bitmask2 VARBINARY(4096)
SET @Bitmask2 = 0x0E0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

SELECT x.Number
FROM dbo.splitBitmask(@Bitmask1) x
JOIN dbo.splitBitmask(@Bitmask2) y ON x.Number = y.Number
GO


Number
----------
2
3
4

We now know that these two bitmasks share bits 2, 3, and 4 in common. But for most access control situations, we don't care what bits they share in common -- just that they share some.

CREATE FUNCTION dbo.HasAccess
(
	@Bitmask1 VARBINARY(4096),
	@Bitmask2 VARBINARY(4096)
)
RETURNS BIT
AS
BEGIN
	DECLARE @Result BIT

	SELECT @Result = 
		CASE COUNT(*)
			WHEN 0 THEN 0
			ELSE 1
		END
	FROM dbo.splitBitmask(@Bitmask1) x
	JOIN dbo.splitBitmask(@Bitmask2) y ON x.Number = y.Number

	RETURN (@Result)
END
GO

DECLARE @Bitmask1 VARBINARY(4096)
SET @Bitmask1 = 0x1F0000000000000000000000000000000000000000000000000000000000000000000000123000000000000000000000000001

DECLARE @Bitmask2 VARBINARY(4096)
SET @Bitmask2 = 0x0E0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

SELECT dbo.HasAccess(@Bitmask1, @Bitmask2) AS HasAccess
GO


HasAccess
-------------
1

... And that is pretty much it for this installment. We can now determine whether or not two bitmasks have bits in common, and if necessary which bits they share.

Future installments will cover how to manipulate large bitmasks in other ways -- flip specific bits, perform a logical and that produces a bitmask instead of a result set, and perform a logical or that produces a bitmask. All very useful stuff if you need to work with these bitmasks. But now I just need to figure out how to do all of that stuff.

So until next time.... Don't use this technique.

Feel like utter crap today?

Don't worry, you're not alone... According to this article, January 24 has been identified as the most depressing day of the year.

So relax, have a drink, and rest assured that brighter times are on the horizon!

Firefox performance tweak

Off-topic for this blog, but I figured I'd share anyway... I have no clue why it works or what it's really doing, so if someone reading can fill me in and tell me if this is a bad idea...

Anyway, in Firefox open a new window/tab and in the address bar enter about:config and hit enter.

Then change the following values:

  • network.http.pipelining to true
  • network.http.pipelining.maxrequests to 23
  • network.http.proxy.pipelining to true

End result: Firefox seems quite a bit faster. Again, I have no idea what's going on here, but no complaints yet!

Thanks to Lorin for this tip!

Blog Ethics

Interesting article from the Boston Globe, Blogger influence raises ethical questions.

Probably not much of a concern in technical blogs, but certainly an interesting read none the less!

Batch Compilation, Recompilation, and Plan Caching Issues in SQL Server 2005

Although it was published in July 2004, I just found a whitepaper by Arun Marathe on Batch Compilation, Recompilation, and Plan Caching Issues in SQL Server 2005.

This is a very well-written paper on how plan caching works, and most of the content applies to both SQL Server 2000 and 2005. I highly recommend that anyone interested in performance issues read this paper.

Blog Spammer #2

Welcome to the blog spammer hall of shame, Mr. or Ms. (I'm not sure) Sarahi Calista!

This piece of trash peddles everything from online poker to viagara, and hails from middlecay.org:

  Domain ID:D80297510-LROR
  Domain Name:MIDDLECAY.ORG
  Created On:27-Nov-2001 00:39:32 UTC
  Last Updated On:13-Jan-2005 07:23:12 UTC
  Expiration Date:27-Nov-2007 00:39:32 UTC
  Sponsoring Registrar:Moniker Online Services Inc. (R145-LROR)
  Status:CLIENT DELETE PROHIBITED
  Status:CLIENT TRANSFER PROHIBITED
  Status:CLIENT UPDATE PROHIBITED
  Status:TRANSFER PROHIBITED
  Registrant ID:moniker21494
  Registrant Name:Sarahi Calista
  Registrant Organization:Sarahi Calista
  Registrant Street1:Fairway Rd
  Registrant Street2:26
  Registrant Street3:
  Registrant City:Lytton
  Registrant State/Province:Puerto Rico
  Registrant Postal Code:83623
  Registrant Country:US
  Registrant Phone:+1.9017635434
  Registrant Phone Ext.:
  Registrant FAX:
  Registrant FAX Ext.:
  Registrant Email:mail28@support-2000.net
  Admin ID:moniker21494
  Admin Name:Sarahi Calista
  Admin Organization:Sarahi Calista
  Admin Street1:Fairway Rd
  Admin Street2:26
  Admin Street3:
  Admin City:Lytton
  Admin State/Province:Puerto Rico
  Admin Postal Code:83623
  Admin Country:US
  Admin Phone:+1.9017635434
  Admin Phone Ext.:
  Admin FAX:
  Admin FAX Ext.:
  Admin Email:mail28@support-2000.net
  Tech ID:moniker21494
  Tech Name:Sarahi Calista
  Tech Organization:Sarahi Calista
  Tech Street1:Fairway Rd
  Tech Street2:26
  Tech Street3:
  Tech City:Lytton
  Tech State/Province:Puerto Rico
  Tech Postal Code:83623
  Tech Country:US
  Tech Phone:+1.9017635434
  Tech Phone Ext.:
  Tech FAX:
  Tech FAX Ext.:
  Tech Email:mail28@support-2000.net
  Name Server:NS0.DNS-4U.ORG
  Name Server:NS1.DNS-4U.ORG

Code Camp 3 Sessions

Thom Robbins has posted an initial look at the Code Camp 3 Sessions, along with a note that "the Data Track folks are coming out strong"!

There are some great sessions lined up from Aaron Weiker, Hilary Cotter, Andrew Novick, Paul Corriveau (who I cannot find a link for), and, of course, your all-time favorite SQL blogger.

Good stuff! I'll keep you posted as I find out more... And as far as I know Thom is still accepting submissions, so be sure to send one if you have something you'd like to present on.

Validate a URL from SQL Server

In the category of things you probably shouldn't do...

Validate a URL from SQL Server

Blog Spammer #1

Hey blog spammers! Every time I receive a blog spam from a new domain, I'm going to post your info here.

So here we go, the official start of my list... We welcome Mr. Thomas Reece, purveyor of fine perscription drugs. Give him a call or e-mail and say hi.

  Domain Name: MEDIAVISOR.COM
  
  Registrant:
   Reece, Thomas
   249 W 89 Street
   NYC
   NY
   US
   10024
  
  Administrative Contact:
   Reece, Thomas (NIC-21871) contact100@team-support-24x7.net
   Thomas Reece
   249 W 89 Street
   NYC
   NY, US
   10024
   Phone: 2128732251
  
  Billing Contact:
   Reece, Thomas (NIC-21871) contact100@team-support-24x7.net
   Thomas Reece
   249 W 89 Street
   NYC
   NY, US
   10024
   Phone: 2128732251
  
  Technical Contact:
   Reece, Thomas (NIC-21871) contact100@team-support-24x7.net
   Thomas Reece
   249 W 89 Street
   NYC
   NY, US
   10024
   Phone: 2128732251
  
  Domain servers in listed order:
  
   NS0.DNS-1995.NET
   NS1.DNS-1995.NET
  
   Record created on 2000-01-11 16:32:00.0
   Database last updated on 2004-12-24 01:34:58.64
   Domain Expires on 2006-01-11 16:32:00.0