MS MVP
On being an Microsoft Most Valuable Professional
Two of the cool new features that we talk about in Essential SQL Server 2005 and Guerrilla SQL Server 2005 are:
- Data Definition Language (DDL) Triggers, which allow us to
catalog code that gets fired synchronously with event that trigger it. This is
useful in cases where you want to hook into the processing sequence for
commands like CREATE TABLE, ALTER VIEW or DROP ASSEMBLY, potentially logging
who did what or issuing a ROLLBACK to cancel that work.
- EVENT NOTIFICATIONS, which route information about events
asynchronously to Service Broker queue after the event has been processed.
EVENT NOTIFICATIONS are available for 193 specific events, including the events
we can use for DDL triggers.
This processing behavior difference is critical. EVENT NOTIFICATIONS
cannot easily be used to cancel work before it gets done, which a DDL trigger
can do. On the other hand, DDL triggers can’t be written for all 193 possible
events. So there are times we find ourselves wanting to use a synchronous
trigger for non-DDL events. For example, consider the routine situation where --
at the end of the month -- a given SQL Server needs to be dedicated to running
long-running and demanding batches and procedures to close out the financial
records. Normally, a "black-out" period is set by management for some interval (like
8:00 PM on the last day of the month) until work resumes the next day. The
question becomes: how can we easily automate the enforcement of this policy?
Well, if we could inject code into the logon process, it
would be easy. However, SQL Server hasn’t really made it easy to do that. At
least before Service Pack Two for SQL Server 2005. In this service pack a new
type of trigger is introduced: the LOGON TRIGGER. Let’s see how they work by
creating one.
What I’d like to do for auditing and troubleshooting
purposes is first create a database with just one table in it that allows me to
record logon activity. There’s nothing new about coding that up:
use master
go
if not db_id('audit') is null
drop database audit
go
create database audit
go
use audit
go
create schema admin
go
create table admin.LogonAudit(eventData xml,denied bit default 0)
go
use master
go
Let’s also create a user to test with:
if not suser_sid('userLogin') is null
drop login userLogin
go
create login userLogin with password = 'someEasyPassword'
go
Our trigger is actually fairly simple to write:
-- Drop our trigger just in case an older version exists...
if exists(select * from master.sys.server_triggers where name like 'noUserLogins%')
drop trigger noUserLoginsDuringMonthEnd on all server
go
-- Name it
create trigger noUserLoginsDuringMonthEnd
-- Applied at the instance level
on all server
-- Execute this code as the person who cataloged it
with execute as self
-- New, fires when a logon is attempted
for logon
as begin
-- Get the range of the backout period
-- starts on the last day of the month at 8PM
-- and ends at 9 hours later
declare @bob datetime,@boe datetime
select @bob = dateadd(m,1,getdate())
select @bob = convert( datetime,
cast(datepart(yyyy,@bob) as char(4))+'-'+
cast(datepart(MM,@bob) as char(2))+'-'+
'01 00:00:00'+' ', 120)
set @bob = dateadd(hh,-4,@bob)
set @boe = dateadd(hh,9,@bob)
-- If its currently in the back period and
-- a disallowed login is tried...
if (getdate() between @bob and @boe) and
(original_login() = 'userLogin')
begin
-- Stop the login
rollback
-- Record the attempt
insert into audit.admin.logonAudit values (eventData(),1)
-- See what happens to messages...
raisError('Attempted user login during month-end blackout',16,1) with log
end else
-- Record the attempt
insert into audit.admin.logonAudit values (eventData(),0)
end
go
So what happens if somebody tries to logon using the
userLogin we created during the black-out period? They get a message like this:

Personally, I have mixed feelings about this. It is great
because it works and does a nice job NOT displaying my potentially
security-sensitive error message (from RaisError) to the user. However, the
error message shown gives away two pieces of information it doesn’t need to:
the fact that a trigger nixed the log-on ("due to trigger execution") and that
there might very well be some database on the system called "master." I’m not
really sure why the SQL Server team decided to have this dialogue show those,
so I filled a bug about it. If you agree that this isn’t good, you can show
your support by voting for my bug by voting in it at https://connect.microsoft.com/SQLServer/feedback/Vote.aspx?FeedbackID=237008
That said there’s no doubt that this is a very nice addition
to SQL Server 2005 and your bag of tricks.
Turns out that I'm spending an extra week in Los Angeles, but this time its for something new for me: I'm helping Brock Allen and Jason Diamond with next run of Guerrilla ASP.NET 2.0 at the Hilton South Bay.
In some ways, this is coming full circle for me. Many years ago, I took DevelopMentor's ASP.NET 1.0 class (when the product was still in Beta) and that's what got me excited about what DM does. Its going to be fun to getting back to ASP.NET after a year or so away from it on a day-to-day basis.
Time to fire up your favorite RSS feed reader and get to subscribing:
http://blogs.msdn.com/sqlqueryprocessing/rss.xml
No SQL Developer should leave home without it.
A commonly asked question is "how can I load an existing XSD file into an XML Schema Collection?" It's pretty easy:
declare @don xmlselect @don=bulkcolumn from openrowset(bulk 'c:\users\ktegels\some.xsd',single_blob) as screate xml schema collection dbo.collectionName as @dongoSubsituting the location for your file following the BULK provider in the OpenRowSet source.
Are you Datamart Builder with few bones to pick? Are your cubes too slow?
Ok, so I can't do anything to help you with that. But you can help Microsoft help you. You see, they are in the planning phase for the next version of SQL Server. Your input counts! Visit the SQLTIPs blog for a link to the online surveys.
One of the three best PSS SQL Server Engineers -- Bob Ward -- has written up a great set of white papers on SQL Server I/O. More about them on the SQL Blog.
The 18th. So, okay, that's not a very interesting question in and of itself. A better question might be "what's the Nth occurrence of the Mth day in each month?" Like, what are all of the third Thursdays in 2017?
The literal -- and rather boring -- answer is January 19, February 16, March 16, April 20, May 18, June 15, July 20, August 17, September 21, October 19, November 16, December 21.
So what's the interesting question here? How can one compute these using a T-SQL query. If you are using SQL Server 2005, that's not too hard as this Table Valued Function demonstrates:
create function dbo.NthGivenWeekDaysForYear(@year smallint,@gap tinyint,@dayOfWeek tinyint)
returns @d table(date datetime) as begin
-- fill up a table variable for the days in the requested year
declare @days table(date datetime,dayOfWeek tinyint,Month tinyint);
declare @ds datetime, @de datetime;
set @ds = cast(cast(@year as varchar(4)) + '-01-01' as datetime)
set @de = dateadd(yyyy,1,@ds);
while(@ds LESS-THAN @de)
begin
insert into @days values (@ds,datepart(dw,@ds),datepart(MM,@ds));
set @ds = dateadd(dd,1,@ds);
end;
-- This CTE computes the occurrence number for each weekday within the given
-- month (e.g., first Monday gets a seq of 1, second Monday is 2...) Resets
-- with the month changes. Filters down to just the desired date.
with c as (select date,row_number() over (partition by month order by date) as seq from @days where dayOfWeek=@dayOfWeek)
-- We can now insert the target dates. We know that seq has the
-- nth occurrence of that weekday in the month.
insert into @d select date from c where seq = @gap;
return;
end;
The magic here is in the row_number() function and in its over clause. Here the row_number() is computed as a sequentially increasing value for each occurrence of a given weekday within a month. We sort by date here so the first day of interest gets the value of one in the sequence. The partition by reset the sequence to one when the month changes.
So why would we want to write this as a TVF rather than say a stored procedure? The immediate answer might not be so obvious, but here's a use case: suppose your company has started a bonus program for newly hired employees. Employees are eligible to participate in the program so long as they were hired before the third Friday in the month when payroll data get generated. All it takes to refractor this function is a couple of tweaks. First, we need to have the function accept a hiredate as a parameter. Second, we need to modify the final insert statement to this:
insert into @d select date from c where seq = @gap and datepart(m,date) = datepart(m,@hireDate) and datepart(d,@hireDate) LESS-THAN-OR-EQUAL-TO datepart(d,date)
We can now leverage the TVF using the new APPLY operator, in this case, the OUTER APPLY variant thusly:
select EmployeeID,hireDate,date from adventureworks.HumanResources.Employee e outer apply dbo.NthGivenWeekDaysForYear(2002,3,6,e.hiredate) where datepart(yyyy,e.hiredate) = 2002
Because Employee 284 missed the cutoff by one day, the TVF returns a null value for that row, and we know that this employee didn't qualify for any bonus for May of 2002.
We're just wrapping a teach of Guerrilla SQL Server and I was reading the evals. While I was happy with all of the comments, but one really deserves calling out:
"niels f*cking rocks! best instructor i've ever had period."
I agree! Way to go Niels!
And just so you know, it rocks even more to work with him.
There's a Zen proverb that accurately describes this book: "If you understand, things are just as they are; if you do not understand, things are just as they are."
What do I mean by this? If you already understand writing OLAP queries, this book doesn't really introduce anything that likely to be new to you. If don't, Joe does a decent job of explaining the basics and demonstrating useful techniques in standard SQL and even, albeit briefly, Microsoft's MDX language. He presents things just as they are. There's a few things I like about this book and some I don't. Let's start with the negative.
- Its Joe being Joe. Okay, that's not bad or wrong per se, but here again he goes on his traditional rants about vendor-specific extensions, the differences between records and rows and the inappropriateness of identity based primary keys. If you are used to reading or listening to Joe, you expect this. If not, then you'll probably find these tangents interrupt the reading flow significantly.
- Its not T-SQL: If you're buying this book specifically to become a better T-SQL Developer, it can certainly help you do that. However, understand that Joe writes standard SQL, not Microsoft's T-SQL, so not all of his examples are directly usable. This is especially true for one of his ROW_NUMBER() examples. On page 73, he has a query that uses this function to compute the median of vector and references the computed values in the where clause. SQL Server 2005 doesn't support this. The example at the end of this post shows how to do write a similar query in SQL Server 2005.
- There might be a bug or two: Joe states on pages 51 and 52 that the MERGE INTO statement exists in SQL Server 2005. So far, I've been unable to prove this, at least as far the RTM version goes. I'm working with Niels Berglund to see if it was a feature in early versions (e.g., Alphas, Betas and CTPs).
Those negatives granted, what was good about this book -- and why do I suggest buy it?
- Its Joe being Joe. Personally, I appreciate his get to the point and show me code style. This is a book you can pick up and read in a few hours and come out with a better understanding of how to use SQL to do analytic queries. Its also a good -- but limited -- primer on OLAP concepts and terminology. If you're looking for a good place to take your first steps into this world, this book is a good choice.
- Chapter 11 on computing and using Correlations is outstanding: A while ago, there was another book on the market -- Data Mining and Statistical Analysis Using SQL by Lovett and Trueblood -- that was the book on doing statistical computing with SQL Server. I'm not sure that APRESS is still printing it, but I see that its available on EBay and Amazon Marketplace. Joe does a nice job covering this particular topic in five pages.
- There's lots of other chapters worth noting: Chapters 12 and 14 on distributions and regressions respectively are also very good. Chapter 18 on MDX is a too-brief but solid introduction to that topic. Joe doesn't try to teach you everything you should know on this complex topic, but again, its a great place to get your first foothold.
I think this book is a solid choice to make if you are interested in learning more about OLAP and Analytics from the SQL language point of view. It helps you understand these concepts "just as they are."
Median-by-RowNumber() Example for SQL Server 2005
-- Prepare table and fill with data
drop table dbo.foobar
go
create table dbo.foobar(X smallint not null)
go
set nocount on
declare @i int
set @i = 0
while(@i < 1001) begin
insert into dbo.foobar values (cast(rand() * 32767 as smallint))
set @i = @i + 1
end
go
-- Get the computed average
select avg(X) as Average from dbo.foobar
go
/* Replica of Joe's query
select avg(X),
row_number() over (order by X asc) as hi,
row_number() over (order by X desc) as lo
from dbo.foobar
where hi in (lo,lo+1,lo-1)
*/
-- Find the median values
with f as (
select X,
row_number() over (order by X asc) as hi,
row_number() over (order by X desc) as lo
from dbo.foobar)
select X as median from f
where hi in (lo,lo+1,lo-1)
go
A few months ago, I published a step-by-step procedure for using certificate to sign an assembly so an external access or unsafe assembly could be run without having to set the hosting database to trustworthy. The major problem with that process was its weight: a lot of steps and some still requirements in terms of certificate management. Turns out that yes, there is an easier way to make it work.
Part one -- preparing the master database
- As needed, start a series of queries against the MASTER database.
- Create the target database (meaning execute a CREATE DATABASE query).
- Code and compile the assembly you want to deploy. Make sure you sign the assembly with a strong name key file. This can be done with a PFX file generated by Visual Studio.
- Create an asymmetric key from from the compiled assembly using a statement like this: create asymmetric key <key_name> from executable file = '<path_to_dll_file>'
- Create a login based on that asymmetric key using a statement like this: create login <login_name> from asymmetric key <key_name>
- Grant that login the right to create either or both an unsafe or external access assembly (as needed) using: grant unsafe assembly to <login_name>
Part two -- preparing the hosting database
- As needed, start a series of queries against the desired user database.
- Create a user in that database mapped to the login created in part, step 5. (e.g. create user <db_user> from login <login_name>)
- Give that user the right to catalog an assembly, e.g.: grant create assembly to <db_user>
- Catalog the desired assembly using the now trusted asymmetric key with a statement like: create assembly <assembly_name> authorization <db_user> from '<path_to_dll_file>' with permission_set = unsafe
From there on, its just a matter of mapping your methods and classes to the desired T-SQL objects.
Note: the following is just to help Google find this post.
is not authorized for PERMISSION_SET = UNSAFE. The assembly is authorized when either of the following is true: the database owner (DBO) has UNSAFE ASSEMBLY permission and the database has the TRUSTWORTHY database property on; or the assembly is signed with a certificate or an asymmetric key that has a corresponding login with UNSAFE ASSEMBLY permission. RegistryReader
Jacob dually notes at http://blogs.msdn.com/jacobcy/archive/2006/08/09/693810.aspx that Omaha had a larger crowd for their August MSDN event than Boston. Good for the folks back home, it is great to see the community staying strong. If you were wondering where I was...
Boston. Not at the MSDN Event either.
Next time I'm thinking about going. --just to see if we can make a it an attendance tie.
Yesterday I was relaxing a bit at the Red Carpet Club (D7 gate) as the Dulles Airport when an Instant Message popped up. It was from one of my fellow DevelopMentor instructors. He was wondering why this query was complaining that RN is an invalid column:
SELECT ROW_NUMBER() OVER (ORDER BY IMAGEID) AS RN,IMAGEID,IMG FROM DBO.IMAGES WHERE RN BETWEEN 10 AND 19
Unless you know how these ranking functions like ROW_NUMBER work, you might find yourself asking the same question. The trick here is that the value of the any of the ranking functions is computed after the underlying query against the table is completed. Therefore, the column RN isn't available in the WHERE predicate. So the next question is how do we work around this for making pages of data? That, in turn produced what seemed like a good from my cohort. Here's two variations on a solution:
SELECT RN,IMAGEID,IMG FROM (SELECT ROW_NUMBER() OVER(ORDER BY IMAGEID) AS RN,* FROM DBO.IMAGES) AS T WHERE RN BETWEEN 10 AND 19
This solution uses a sub-query to feed the computed result -- the query within the parens -- up to another query -- where the computed value for the row number is available. Another solution for the problem is:
WITH C(RN,IMAGEID,IMG) AS (SELECT ROW_NUMBER() OVER (ORDER BY IMAGEID),IMAGEID,IMG FROM DBO.IMAGES) SELECT IMAGEID,IMG FROM C WHERE RN BETWEEN 10 AND 19
This uses a Common Table Experssion (the with... as() part) to do the same thing as the previous query. So the question becomes, is this a more efficient query than doing something like this:
SELECT TOP(10) IMAGEID,IMG FROM DBO.IMAGES WHERE IMAGEID >= 10 ORDER BY IMAGEID
The reason for the question was that the first two queries have two select within them, and that might lead one to believe that two queries over the data are required to produce the desired result. However this is not the case. Both the sub-query and the CTE generate operationally identical query plans, namely:
select <- filter <- top <- sequence project (compute scalar) <- compute scalar <- segment <- clustered index scan
meaning that first SQL Server finds a batch of row values from the primary key index on the table, reduces that to a set rows and then select that out to the ten records of interest. So there's while there's no second recordset created, there is an intermediate working set created. The last query, the SELECT TOP(10)..., avoid this the intermediate result set. This generates a more efficient query plan (8.5% lower cost) of:
select <- top <- clustered index scan
For a query over a four-thousand row example table.
The most important thing to note about the query efficiency here that unless these queries are definetly benefiting from having a cluster primary index key on the table. Repeating the cost analysis without these indexes. In the CTE example, the estimated query cost is roughly 25-times more expensive than the query over the indexed table. The once cheaper SELECT TOP() query becomes just as expense without the index.
 |
Congrats, Michael!
So if you've been wondering why I never wrote a book about SQL Server 2005 or XML, here's why -- there's one straight from the source.
For those of you with a passion of XML and SQL Server 2005, this is the book for you. My copy has shipped, when are you going to get yours? |
I'll be presenting to the New Jersey SQL Server user's group tommorow (http://njsql.org/blogs/user_group_news/archive/2006/05/15/30.aspx) night. Details here.
Hope to see you there!
On Sunday I confessed an interest in things GIS. Recently that interest has been rekindled by the ESRI podcasts and because the MapPoint team was kind enough to send the MVPs a copy of MapPoint 2006. I’ve used MapPoint for things like Driving Directions in the past. MP2004 as pretty good for that, but you could tell the road maps were getting a bit long in the tooth. I also felt it was a good tool for team driving: one person piloting the car, one using MP to navigate.
My first experiment with MP2006 was as an in-drive navigation aid like TomTom or NeverLost. I really only have nits to pick. First, it is hard to hear the voice guidance on my lap given where the speakers are, so I found myself pulling over and cranking up the audio. It still wasn’t as loud as I’d like, but it worked. Having used NeverLost, it really liked its tone telling you that you made correct turn or needed to make a turn now. MP doesn’t give that kind of feedback and I wished it would.
But one thing that I like MP2006 for is that it can record a trip in a trail. You can save the trail right now, but that’s it. I have noticed, however, that you can copy and paste the trail to the clipboard and with a little programming, you can get the data in the trail from the clipboard. What I’ve not fully figured out is how to parse this into useful data. Ideally I’d like to get the position and time data and I do think that’s possible. If and when I figure out how, I’ll post that here.
The other cool thing in MP2006 I’m enjoying the exportability of demographic data to Excel format. For example, I’ve been doing a couple of demos where I’ve extracted the number of households in a given zip code that frequent flyer plans and used SSIS to massage that into SQL Server. It makes for a good test of a lot of data at least.
"order has shipped."
My copy of Bob and Dan's "A Developer's Guide to SQL Server 2005 has finally departed from whatever cave Amazon loaded them into. Can't wait to see how different it is. Its going to be a lot of reading too, its accompanied by Johnson's ADO.NET Advanced Topics book that I'm also looking forward too.
Is it time for me to take up the writing effort again? Maybe. I've got ideas, but what would you be interested in?
Doh! Sorry to be late getting this on the wire but the Heartland Developers' Conference for 2006 is has been announced. This year its in Omaha and runs from 26 October and 27 October. Hopefully I won't find myself in China again...
In its 3rd year, HDC06 is the largest independent annual professional Microsoft development conference in the nation, organized by user groups, and focused on .NET development and other emerging Microsoft technologies. HDC provides regional developers the opportunity to experience sessions usually reserved for Tech-Ed and PDC at a much lower cost while retaining the same nationally known presenters. Over 2 days, 23 sessions, 2 tracks, and several networking opportunities, HDC06 is where you’re find top knowledge experts that can prepare you for next level development.
For only $125 Early Bird/$175 standard, each Attendee Pass includes:
1 Conference pass to 2 keynotes and all 23 ninety minute sessions
- Meals including breakfast, lunch, snacks, and coffee breaks
- Access to the “Developer Lounge” expo and activity center
- Wireless internet access throughout conference area
- Pre conference networking party including drinks and food
- Thursday night “Developer Jam” including food, drink, and a casino for prizes!
- Attendee Kit and a chance to win a monster 64 Bit Alienware computer
The speaker list is looking awesome too: Microsoft's own Jeff Brand, Don Bryner and Jacob Cynamon. MVPs Dave Donaldson, Robert Hurlbut, Rocky Lhotka, Javier Lozano and Andrew Troelsen and community rockstars Craig Utley, Phil Wolfe, Robert Boedigheimer, Tim Gifford, Matt Milner and Nick Parker. I'm planning of giving a couple of talks on SQL Server 2005 for Developers.
Register here. And remember, we're talking a mere $125 one of the best Microsoft-focused technical conferences in the Midwest.
File this under this how to know what you have fix before you fix an XML Schema Collection
When I’m introducing folks to XML Schema Collections in SQL Server 2005, conversation usually goes something like this:
Blah blah blah blah use to make sure that XML instances are valid according to an XML Schema blah blah blah and you create them using CREATE XML SCHEMA COLLECTION name AS blah blah blah. Naturally, you can delete such a schema collection using DROP XML SCHEMA COLLECTION but you have to make sure there are no dependent objects on that before doing so. Some more talk and the then I have to it. The part of the talk I like the second least.
“So there is an alter for XML schema collections and to nobody’s surprise that starts with ALTER XML SCHEMA COLLECTION followed by the qualified name of the schema collection and a verb. In SQL Server 2005 there is one supported verb: ADD.”
That means that the alter for an XML schema collection is really only good for doing either of things: adding additional elements to default namespace, or adding a schema for a new namespace. And that’s basically it. There’s really not a good way to fix a broken or out of date schema for an existing namespace. And that means – oh joy – that you need to alter all of the objects that use the schema to not do so before you go about fixing the schema. And the first step in that is, of course, getting an inventory of the objects you’ll need to touch.
Continued here, due to length.
Conor Cunnigham kicks it off with:
On behalf of the Query Optimization Team for Microsoft's SQL Server product, welcome to our humble virtual abode. We decided that we'd start a blog to help people better understand query plans, physical schema design in databases, making your application perform better, and anything else related to query optimization. Furthermore, we'll be posting tips and tricks that we learn from our work with customers so that others can benefit from those exchanges.
They are coming out of the gate strong, too. There's already an article up about usimg Computed Columns to optimize certain types of queries in SQL Server 2005.
RSS Feed Here.
As usual, some of the best questions come from the newsgroups. And yet again, there was one that piqued by attention. The question essentially boiled down to “can you put a trigger on a login?” That is, the questioner wanted to execute some arbitrary T-SQL code whenever a user logged into a given instance