Michael Rys

Musings on XML, XQuery and more...








Does IBM really move the database goalposts?

In my ongoing quest to bring substance and correctness to web articles about XML and databases :-), I felt that I needed to comment on a recent Register article.

Recently, Philip Howard of Bloor Research wrote an article for The Register on the announcement of the new XML support in an upcoming IBM database release. Unfortunately, I find the article badly researched and full of misinformation about the status quo and about the architectural advantages and disadvantages. (Note: I will send Philip a pointer to this article and hope he will provide us with some comments from his side.)

Let's first start with the technical claims of the article:

"IBM has concluded, rightly in my view, that using a relational approach is not adequate for processing XML."

As readers of my weblog know, IBM is neither the only nor the first vendor to have come to this conclusion.

"Either you store it in relational format, in which case you get a major performance hit because you have to convert it to and from tabular format whenever you store or retrieve it, or you have to store it as a binary large object, in which case you can’t do any processing with it."

Now this statement is misleading. As I have outlined in my XQuery book chapter (still looking for a Christmas present? :-)) and in my earlier comments on the very informative InfoWorld article, you need to distinguish between the logical and the physical storage model for XML. The logical model means that you want the XML to provide Infoset or XQuery data model fidelity. At the storage level, this can be achieved in a variety of ways, including relational, BLOB or some other storage form. And while it is true that there are performance trade-offs between a compact BLOB and a storage form that shreds the data in some way (not necessarily relationally) when it comes to recomposing and repurposing the data, there is nothing that prohibits optimizations in this area. For example, SQL Server 2005 uses an internal binary representation in a BLOB as the primary storage for the XML documents, but nevertheless provides XQuery queryability on the XML. Indices can be used to optimize the query expression transparently and based on cost.
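The logical/physical distinction can be made concrete with a small sketch. The Python below is only an illustration under my own assumptions, not how SQL Server, Oracle or DB2 store XML: it uses SQLite and `xml.etree.ElementTree`, and the table names (`blob_store`, `nodes`), the sample document and the function names are all hypothetical. It stores the same document once as a single BLOB-like value and once shredded into one row per element, and then answers the same logical question ("what are the item texts?") against both.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for this illustration.
DOC = "<order id='7'><item sku='a1'>pen</item><item sku='b2'>ink</item></order>"


def store_as_blob(conn, doc):
    """Physical model 1: keep the whole document in a single column."""
    conn.execute("CREATE TABLE blob_store (doc TEXT)")
    conn.execute("INSERT INTO blob_store VALUES (?)", (doc,))


def store_shredded(conn, doc):
    """Physical model 2: one row per element, numbered in document order."""
    conn.execute(
        "CREATE TABLE nodes "
        "(ord INTEGER PRIMARY KEY, parent INTEGER, tag TEXT, text TEXT)"
    )
    counter = 0
    stack = [(ET.fromstring(doc), None)]
    while stack:
        elem, parent = stack.pop()
        counter += 1
        conn.execute(
            "INSERT INTO nodes VALUES (?, ?, ?, ?)",
            (counter, parent, elem.tag, elem.text),
        )
        # Push children in reverse so they are popped in document order.
        for child in reversed(list(elem)):
            stack.append((child, counter))


def item_texts_from_blob(conn):
    """Answer the query by parsing the stored document at query time."""
    (doc,) = conn.execute("SELECT doc FROM blob_store").fetchone()
    return [elem.text for elem in ET.fromstring(doc).iter("item")]


def item_texts_from_rows(conn):
    """Answer the same query with plain SQL over the shredded rows."""
    rows = conn.execute("SELECT text FROM nodes WHERE tag = 'item' ORDER BY ord")
    return [text for (text,) in rows]


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    store_as_blob(demo, DOC)
    store_shredded(demo, DOC)
    print(item_texts_from_blob(demo))
    print(item_texts_from_rows(demo))
```

Both functions return the same answer, which is the point: the caller sees one logical model, while the engine is free to pick the physical layout. The trade-off shows up elsewhere. The single-value store returns the full document cheaply but has to parse it to query it, and the shredded store queries cheaply but has to recompose the document on retrieval.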

"So, using relational storage is inadequate for one reason or another, and IBM has concluded that another approach is necessary. The company’s next generation database will therefore have two storage engines: one relational store and one native XML store. And let me be quite clear about this: these engines will be completely separate, with separate tablespaces, separate indexes (Btrees and so forth on the one hand, and hierarchical on the other), and so on."

Honestly, I fail to see why having two physical storage engines is an advantage. Does this mean that you have to back them up separately and that you have to manage the separation of the XML and relational data at the user level (i.e., you could not easily associate XML data with relational data and vice versa)? That, I think, would be really bad. Or does it mean that under the covers, XML data is not stored in a relational table but in some proprietary, internal format, and that some special indices and some physical layout management functionality are available? That certainly would be nice, but is nothing earth-shattering. For example, SQL Server 2005 provides XML-specific indices as well (and whether they are B-, R-, H- or other trees is fairly irrelevant as long as they provide the performance).

Based on some additional information that I received from Jim Kleewein, one of the IBM DB2 architects and distinguished engineers (thanks again Jim!), it seems that they basically store the XML in some internal format and extend and utilize their existing indexing infrastructure and query engine. This is good engineering and is similar to the approach that SQL Server 2005 takes.
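To make the second reading (an internal document format plus XML-specific indices) a bit more tangible, here is a minimal Python sketch of a secondary path/value index built over documents that stay stored whole. It is an assumption-laden illustration, not the index layout of SQL Server 2005 or DB2: the sample documents, the `(path, text)` key shape and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical primary storage: document id -> whole XML document.
DOCS = {
    1: "<order><customer>ann</customer><item>pen</item></order>",
    2: "<order><customer>bob</customer><item>ink</item></order>",
    3: "<order><customer>ann</customer><item>ink</item></order>",
}


def path_values(doc):
    """Yield (path, text) for every element with non-blank text."""
    stack = [(ET.fromstring(doc), "")]
    while stack:
        elem, prefix = stack.pop()
        path = f"{prefix}/{elem.tag}"
        text = (elem.text or "").strip()
        if text:
            yield path, text
        # Push children in reverse so they are visited in document order.
        for child in reversed(list(elem)):
            stack.append((child, path))


def find_by_scan(docs, path, value):
    """Baseline: parse every stored document and test it."""
    return sorted(
        doc_id for doc_id, doc in docs.items()
        if (path, value) in set(path_values(doc))
    )


def build_path_index(docs):
    """Secondary structure: (path, text) -> set of matching document ids."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for key in path_values(doc):
            index[key].add(doc_id)
    return index


def find_by_index(index, path, value):
    """Answer the same lookup without touching the stored documents."""
    return sorted(index.get((path, value), ()))


if __name__ == "__main__":
    idx = build_path_index(DOCS)
    print(find_by_scan(DOCS, "/order/customer", "ann"))
    print(find_by_index(idx, "/order/customer", "ann"))
```

The lookup result is the same either way; the index only changes how much work is needed to get it. Which tree or hash structure sits behind such an index matters far less to the user than whether the query processor can pick it up transparently.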

This leads me to the state of the art:

Since 9iR2, Oracle has provided an XMLType with Infoset fidelity; it is a subset of the SQL-2003 standard XML type. SQL Server 2005 provides an XML datatype that also provides Infoset/XQuery data model fidelity and is basically an extended SQL-2003 standard XML type. So based on the information provided in that article, I would not say that either of the two vendors is "well behind the curve", especially since the IBM offering is described as an alpha version, whereas both Oracle's and SQL Server's offerings are being used in production systems today. It is possible that IBM is offering additional things not mentioned in the article, but based on the information in it, IBM seems to be playing catch-up.

Finally, let me comment on the author's expectations of the competition's reaction:

"Finally, I expect to see Oracle, in particular, to froth at the mouth at this announcement. It will no doubt declare that this is the wrong direction and the wrong road. In my opinion it will be Oracle that is wrong: you just can’t get both the necessary flexibility and performance that you need for XML unless you are prepared to move away from a purely relational approach."

I doubt that Oracle (or Microsoft) is frothing at this announcement because of the architecture. We may, however, get excited about reports that are somewhat clueless and imply that we are behind while we are ahead. We certainly will evaluate the IBM architecture once more is known publicly. But based on the limited information available through such announcements and the information I received from Jim, I don't think that the IBM architecture will be radically better or worse for managing XML data per se than SQL Server 2005's.

To conclude: IBM moves the DB2 goalposts away from the rather complex and proprietary XML extender that this technology will (finally) replace, but so far, this looks more like they are moving them closer to Oracle's and Microsoft's, which will be good for keeping competition and innovation going.

posted on Thursday, December 16, 2004 3:55 PM by mrys


# Does IBM really move the database goalposts? @ Thursday, December 16, 2004 7:42 PM

mrys

# XQuery support in SQL Server (soon) and .NET (maybe later) @ Friday, December 17, 2004 8:22 AM

mrys

# re: XQuery support in SQL Server (soon) and .NET (maybe later) @ Friday, December 17, 2004 1:31 PM

mrys

# XML 2004: Lated short trip report and links to my presentation @ Wednesday, December 29, 2004 5:43 PM

mrys

# Late XML 2004 Trip report and links to my XML 2004 presentation @ Wednesday, December 29, 2004 9:16 PM

mrys

# re: XQuery support in SQL Server (soon) and .NET (maybe later) @ Thursday, January 06, 2005 5:39 PM

mrys

# On "native" XML support in databases @ Thursday, April 21, 2005 3:37 PM

A couple of months ago, I provided feedback on an article on XML support in DB2 written by...

mrys

# Authentic Analysis and Argumentation? @ Friday, April 22, 2005 7:33 AM

I've been a bit out of the habit of writing here - for the last couple of months most of my free time...

mrys



