Michael Rys

Musings on XML, XQuery and more...

<July 2008>
SuMoTuWeThFrSa
293012345
6789101112
13141516171819
20212223242526
272829303112
3456789


Navigation

Papers

SQL Server XML Whitepapers

Weblogging Links

MS Bloggers

Recommended Books

Other Blogs

Recommended Links

Presentations (Upcoming)

Presentations (Recent)

Subscriptions

News


Upcoming Presentations


TechEd 2007, Orlando, June 4 to June 8, 2007


Books I co-authored



www.flickr.com
This is a Flickr badge showing public photos from Michael Rys. Make your own badge here.
eXTReMe Tracker

Post Categories

Article Categories



Tim Anderson's interview on SQL Server Yukon: Some clarifications

In July 2003, Tim interviewed Euan Garden, our PUM for SQL Server Tools. In that interview, they also talked about the XML datatype comming up in SQL Server 2005. Since some of the answers have recently led to some confusion on the microsoft.public.sqlserver.xml newsgroup and are a bit out of date (almost a year later), I will review the XML-specific Q&A part of the interview in this post. Note that the interview is copyrighted by Tim and I reproduce the relevant questions and answers with Tim's permission.

Tim: What benefits do I get from the XML column type?

Euan: The first thing is that we can index the XML natively. Today when we store XML we actually shred it for you, or you can store it in a varchar. You can't do much interesting in terms of indexing a varchar, particularly if it's a large document. By storing it we can index it, so you index it by the different components inside the document. That's very powerful. We can very easily do inserts and updates into individual nodes and elements without loading the entire thing into the DOM. We're using SAX-based technology to run through the entire document, and find the place where you want to insert or change even one character in the document. We can be much more efficient and therefore scaleable.

Michael: What Euan is saying here is that an XML instance can be stored as either an XML datatype or a string type (such as varchar(), although varbinary() works similar as well). Once you store the XML in a string type, the system does not know whether the data is really XML. However, you can use an IFilter, full-text index the column and use the contains function to query for data in the XML. 

If you store the XML instance in an XML datatype, we will use an XML Reader that will parse the XML and store it in an internal binary format that - among other things - is efficiently mappable into an internal relational form that most queries execute over. That relational form can be generated by the primary XML index to avoid the runtime generation during queries and updates (see below). The XQuery and update expressions are being translated into internal algebra operations that work against the internal relational form. This means that we are neither using SAX-based nor DOM-based technology to query or update the document. The queries and updates should be efficient and scalable since we leverage the existing relation engine (extended with new operators) to minimize the number of accessed tuples in the internal relation form.

Note that this is a simplified explanation of what is really going on. Cost-based optimization may chose completely different strategies...

Tim: So let's say I had a Person document which has a LastName element. Can I index that LastName element and then search for LastName = "xyz"?

Euan: You can't index elements but we will index on the document itself, so it will be much more efficient. If you're doing a lookup by LastName, it will be orders of magnitude faster than it is today. We're still closing down some of the XML-based indexing. So that's one of the places where we're looking for feedback.

Michael: I already mentioned that the user can define a primary XML index to generate the internal queryable representation. Programmers also have the ability to define additional secondary indices on the primary XML index to index values, names, paths etc. The optimizer will use them to answer queries more efficiently as appropriate. Since this is version 1, we had to implement what we can deliver with this release and are planning on extending the secondary indices over time.

Euan: Supporting XQuery now allows us a much richer query language for insert, update and select operations. And we've also implemented this in a way that allows us to mix and match relational and XML data. What we've found is that people generally don't store only XML. For example, in a Customer table they might have relational columns for lastname, zipcode and things like that, but because there could be multiple mobile phone numbers, maybe that column is actually in XML. By using XQuery technologies we can pull that data out, including XML data, and present it to you as a relational result set. Or we can pull out the data as an XML document, including what is really relational data, and you can interact with it as XML. document. It comes back to choice.

Michael: On this I have no additional comment :-)

posted on Thursday, April 15, 2004 8:40 PM by mrys


# Michael Rys Clarifies Xml in Yukon @ Friday, April 16, 2004 8:01 AM

mrys

# Michael Rys on the Benefits of the XML Datatype in SQL Server 2005 @ Friday, April 16, 2004 8:14 AM

mrys

# Michael Rys Clarifies Xml in Yukon @ Friday, April 16, 2004 8:34 AM

mrys

# Yukon! @ Friday, November 19, 2004 12:30 AM

SQL Server 2005 is too cool for words, the new features rock. Go do some reading. Microsoft SQL Server- Introducing...

mrys

# Watching the Press @ Thursday, December 16, 2004 7:43 PM

mrys




Powered by Dot Net Junkies, by Telligent Systems