May 2006 - Posts

Stored procedures are not parameterized views

Peter van Ooijen over at CodeBetter.com posted in his blog about some observations he had when working with stored procedures in a recent project. What I found to be interesting about his post was his comment that a stored procedure can be, "a view with parameters."  I've run into this assertion before, and it's something I think needs some clarification for a lot of developers. I do not feel that there is any real similarity between stored procedures and views -- they are entirely different types of objects in an SQL database, and should not be considered forms of one another in any way.

Following is an edited version of the response I left in Peter's blog; I thought it warranted its own post:

Stored procedures are not -- and never can be -- "parameterized views".  A view in an SQL database can be treated the same as a table in virtually every context.  Consider:

SELECT *
FROM Tbl

vs.

SELECT *
FROM View

One of the great things about working with views and tables is that the person querying the database does not need to know whether the base object being queried is a view or a table. For the sake of writing SQL queries, they are one and the same. Both a view and a table have well-defined columns, with well-defined datatypes.

These assertions cannot be made for a stored procedure as compared with a view.  A stored procedure is related to a view only in as much as both are defined using SQL syntax. But beyond there, the two diverge into completely different types of entities. First of all, consider:

SELECT *
FROM StoredProcedure

This will not work, and will only result in an "invalid object name" exception. The reason? Stored procedures expose no explicit output contract.  Thanks to conditional branching, dynamic SQL, and SELECT *, a stored procedure can output vastly different results beween invocations, or based on different input parameters. It is quite possible to code a stored procedure that will output no result sets for one set of input parameters, two result sets for another, and four for another.  Or, it's possible to change the returned result sets, e.g. by outputting different column names or datatypes. Please note, this is an extremely poor (and very dangerous) coding habit to get into -- but the point is, it is impossible to verify the output of a stored procedure for a given set of input parameters without running it.

Furthermore, a stored procedure "late binds" to the base objects being queried.  This adds to the difficulty in verifying the output of a stored procedure, and is why you can create the following stored procedure without getting an exception (until you try to run it, of course):

CREATE PROC XYZ
AS
    SELECT *
    FROM ThisTableDoesNotExist
GO

These stored procedure behaviors are in stark contrast to the way views work.  Views provide a couple of means of verification:
  • The output columns/data types can be verified, and bound to, before actually querying the view
  • A view can be "schema bound", meaning that the underlying base tables (or other views) which the view is based on cannot be changed, schema-wise, unless the view is dropped.  
For the first point, simply query the INFORMATION_SCHEMA.COLUMNS or sys.Columns views, and column information can be determined for a view without having to query it.

The second point adds to the first in a few ways. Schema binding brings to views a certain sense of "early binding," which as I mentioned is missing in stored procedures. Although no view can be created if one of its base objects does not exist, schema binding takes it one step further and guarantees that the base objects used to create the view must exist, and must not be changed, for as long as the view is present in the database. This means that if a schemabound view is created that outputs a certain set of columns with certain datatypes, it is guaranteed to do so for as long as it exists in the database -- in other words, its contract is bound to the schema, and changes to other objects cannot affect it.  This is a powerful guarantee, which stored procedures fail to make.

So now the question is, if a stored procedures isn't a parameterized view then what is? The answer, as of SQL Server 2000 (and continuing in 2005), is the table-valued UDF.  A table-valued UDF is parameterized, has an explicit and verifyable output contract*, and can be schema bound.  If you are looking to implement a solution that makes use of a form of parameterized views, stored procedures are probably not the right choice.  I think that table-valued UDFs are quite underused and deserve a second (or first!) look from many T-SQL developers who may have glossed over them in the past.

* Note: Unlike for a view, the column list for a table-valued UDF cannot be queried from the INFORMATION_SCHEMA.COLUMNS table. The column list is, however, available from sys.Columns.

T-SQL Variables: Multiple Value Assignment

Tony Rogerson brings us an interesting blog post about T-SQL variable assignment and SET vs. SELECT.  The issue?  With SELECT you can assign values to multiple variables simultaneously.  But with SET, you can set up your assignment such that you get an exception if more than one row is assigned to the variable.  Both are desirable qualities... But unfortunately, as Tony shows us, it's difficult to achieve both multiple assignment and getting the exception thrown, at the same time.  Tony shows us a solution involving checking for the number of rows affected after the assignment. Creative and effective, but it still has an issue: Unlike with SET when it throws an exception, with Tony's solution the variables will still have been affected by the assignment.

As I was reading Tony's post, I couldn't help but think that there must be another way.  And low and behold, there is -- at least, in SQL Server 2005.  Thanks to the power of windowed aggregates we can have our multiple pieces of cake and eat them, all at the same time. Wonderful stuff.

So, here's what you do: Set up a CTE that selects the columns you'd like to assign to your variables, and also get COUNT(*), partitioned by 1 (or some other arbitrary literal). By partitioning by a literal, we will end up with the row count for the entire set. In the outer query, express the assignments from the columns returned by the CTE, but add an additional WHERE clause that compares the value of the COUNT(*) column with a subquery against a table of numbers. In the following example which I've adapted from Tony's blog, I'm using master..spt_values for the numbers, but you are encouraged to use a properly-indexed table of numbers, should you decide to use this technique:

DECLARE
    @reserved INT,

    @rowcnt INT,
    @used INT

SET @reserved = -1
SET @rowcnt = -1
SET @used = -1

;WITH x AS
(
    SELECT
        reserved,
        rowcnt,
        used,
        COUNT(*) OVER(PARTITION BY 1) AS theCount
    FROM sysobjects so
    INNER JOIN sysindexes si ON si.id = so.id
    WHERE
        so.name = 'sysrowsets'

)
SELECT
    @reserved = reserved,

    @rowcnt = rowcnt,
    @used = used
FROM x
WHERE theCount =
    (
        SELECT
            number
        FROM master..spt_values
        WHERE
            TYPE = 'p'
            AND number BETWEEN 1 AND theCount
    )

SELECT @reserved, @rowcnt, @used

As you'll see if you run this on your end, an exception is thrown and the values of the variables are not affected.  This works because the subquery used in the WHERE clause will return more than one value if theCount is greater than 1, thereby violating the rule that subqueries must only return one value.

The price you'll pay for this convenience?  Extremely complex code for a simple variable assignment, in addition to a slight performance penalty.  Is it worth it?  Probably not, at least for me.  To be honest, I seriously doubt I will ever use this -- I've never been especially concerned with the chance of multiple rows screwing up my variable assignment, and those times that it has happened, I've remedied the situation other ways (e.g., defining a better primary key). That said, I think this was an interesting T-SQL challenge, and if anyone comes up with a more elegant solution than Tony's or mine, I'd love to see it!

Blogging on Amazon

I've started a new blog on Amazon, via the Connect feature that they've added recently. This feature lets authors blog on their book's page, giving readers additional information or insights (or giving authors a soapbox for their political agenda, as a certain author I know seems to be trying to use it for).

On my Amazon blog, I'll be discussing issues that pertain to the chapters I wrote in Pro SQL Server 2005.  So far, I've written two posts; one about SqlUserDefinedAggregate, and another about SqlFacet. So, if you purchased the book and want some additional content, or if you're just interested in what I have to say, check out the blog.

Speaking engagements canceled

I heard that there were some questions about my whereabouts at the Code Camp this last weekend, so I just want to give a quick update.  I unfortunately managed to get a large chunk of glass embedded deep in my foot (don't ask...) and have a surgery scheduled this Thursday to have it removed.  In the meantime, I can't walk further than to my sofa (with the aid of a cane, at that!) so there's no way I could stand in front of an audience for 90 minutes and deliver an effective presentation.  As a result, I had to cancel on both Code Camp 5 and DevTeach -- quite a disappointment for me, as I was really looking forward to both events!

Luckily, there are a couple of surrogates standing in for me at DevTeach: Paul Nielsen will be doing a T-SQL Enhancements for SQL Server 2005 talk in my stead, and Jean-Rene Roy will be hosting the SQL Server 2005 Round Table discussion that I was scheduled to lead.

I hope everyone has a great week at DevTeach, and I look forward to seeing you at the next one!