Enjoy Every Sandwich

Thoughts on SQL, XML, .NET and sometimes beer.

<December 2008>
SuMoTuWeThFrSa
30123456
78910111213
14151617181920
21222324252627
28293031123
45678910


Navigation

Tools

List O'Links

Kent's Other Stuff

Subscriptions

News

Please read these
Notices and Disclamiers

Post Categories

Article Categories



SQL CLR and XML: Saw the woman in two carefully

One highly underrated Warren Zevon song -- "For My Next Trick I'll Need a Volunteer" -- offers us an insightful piece of wisdom about this topic.

I can saw a woman in two, but you won't want to look in the box when I am through.

Aaron Weiker post drew my attention to Sahil Malik posting about SQLCLR UDTs, and Shawn Wildermuth's response. This really stuck in my head after reading an article Bill Sheldon had in the recent Developer .NET email update for SQL Server Magazine where he had some interesting (in the sense of unusual) things to say about SQLCLR. I thought about posting responses to that these things individually. Seems to me the issues are bigger than that. Since I could write a book about this topic but just do not have the time, let us see if I can't boil my thoughts down into Six Easy Pieces*:

  1. Suppose you need to represent a complex data type in a SQL Server database. You have at least four choices:
    1. Break the atomic bonds of the data, storing what you can in normal scalar types as individual fields. This approach tends to reduce implementation complexity while increasing computational complexity. Take the simple case of storing an imaginary number like the square root of negative four. You could do with two fields: one to hold the numeric part and a bit field to indicate is if the numeric part is an imaginary number of not. Easy, right? Not so fast my friend: suppose you need to some simple operation like square a stored value. Sure, that is easy because you just need to square the numeric part, but remember that you have to multiple the result by -1 to reflect the imaginary part. Remembering that is not always so simple. So we can see that this is simply trading computational complexity for implementation complexity: a zero-sum game at best, hugely painful at worst.
    2. Of course, you could just store the result as a varying text type. This is really a sub-class of the first choice in my mind, and it is a worse choice too because in order to do any computational work on it, you get to parse the string too. Not much joy (or performance) in that.
    3. You could choose to store the data as an XML document or fragment in using the XML type. On the up side, this allows you store and treat the data atomically. It also gives you the ability, when the instance is typed (bound to a Schema) to use it parts as their type in XQuery calculations. For our complex number example, that is probably just as painful as either of the first two types representation.
    4. You could certainly choose to write a User Defined Type to both store the data and to expose methods on it. This also allows you to treat the data atomically and use it within T-SQL queries using the .value or .ToString methods. There are limits to the amount of data you can store in a UDT and there are challenges to cope with if you have to tweak the internals of them.
  2. In my mind, there is two right times to choose XML over the other types of representation. First, use XML if the internal data is semantically order dependent.  Second, use XML when there is when the represented object itself does not have any natural methods to it (or at least ones we want to use within the SQL Server context). Take storing somebody's resume for example. I believe the XML type is a good choice for storing this type of information since you do not "add" to resumes together or "divide" one by another. You may, however, need to query them for those persons reporting they have a given skill or qualification. That is an example of use where there are no natural (meaningful) methods. Another "good fit" is storing knowledge documents (think business reports, product specs, etc) that can be represented as XML in the database. Trying to "shred" these documents into scalar fields would seem to me to be an exercise in madness. If you needed to reassemble that document in its original order, you are likely facing one heck of problem.
  3. The one time I would pick to use a UDT is when that type -- whatever it represents -- does have natural methods that I am interested using in a query. A complex number is a good example of where makes to me use a UDT since I would likely need to incorporate methods for arithmetic operations on that data.
  4. For the most part, I am against the idea of using UDTs to encapsulate business logic within the database context. Two reasons: First, if you have to tweak the logic within the UDT -- as frequently happens with a significant amount of business logic -- you're may have to first extract the value of that UDT out to some working storage, then drop any objects -- including columns -- that are bound to that type. With that done, you will get to re-register the updated assembly, the build any dropped objects needed then re-insert the data out of working storage. Do not forget to clean up your working data as well. Microsoft has talked about designs for future betas and CTPs for UDTs that will simplify that somewhat.
  5. Be careful about to use XML only when it makes sense too. It is critical to remember that while XML has a "high degree of cool" it should not be used for everything. Even with schema binding and indexing in place, query performance over XML instances is going to be at least somewhat slower than querying over the same data represented as scalars in a pure tabular form. You also have to consider the cost of potentially dealing with duplicated data and minimal RI versus the cost and complexity of shredding the XML into a tabular form and recomposing it as XML. For simple documents, that might not be much of a consideration. Beyond that, an art is best dealt with through learning and experimentation.
  6. Last but certainly not least, do not confuse one for the other: SQL Server 2005's XML features are not dependent on the SQL CLR integration features, nor do the CLR features depend on the XML features. Neither is the reason for the other. Granted, the two do work together: CLR objects can consume, create and return XML instances. You can certainly use the XML type to store XML serializations of .NET, Java and other types of objects.

If you take nothing else from this ramble, let it an understanding of Warren's wisdom:  either a CLR UDT or XML, you can do some amazing things. UDTs offer a natural way to represent objects that have useful methods. You should use them when you have need for such features as part of how you will the object as data. XML is capable of doing some equally interesting things and makes a good alternative to UDTs when you are dealing with order-sensitive data or complex data that lacks any natural methods. While both offer some advantages, they can bring new challenges too. Fixing an UDT -- especially one that is in critical production use -- is not trivial. Storing XML in a schema-bound column has similar management implications. So while you might be able to achieve magic on par with sawing a woman in two, it is equally it important to be sure you want to look in the box when you're through.

posted on Monday, February 07, 2005 3:49 AM by ktegels





Powered by Dot Net Junkies, by Telligent Systems