Michael Rys

Musings on XML, XQuery and more...




Monday, May 10, 2004 - Posts

Some comments on “Kiss the Middle-tier Goodbye with SQL Server Yukon”

In March, Klaus Aschenbrenner published an article on SQL Server 2005's XML support titled “Kiss the Middle-tier Goodbye with SQL Server Yukon” (not that I would recommend that :-)). It is a pretty good, general overview of the Beta1 functionality. Since I have started commenting on such articles, let me add some comments and clarifications to this one. I will leave out nitpicks about missing parens in the code and typos, and point out some differences introduced in Beta2.

This article, like many others, has the potential to introduce some of the previously mentioned confusions. Here are some additional technical comments:

  1. XML Schema management has changed in Beta2. Instead of registering a single XML Schema and using its target namespace to constrain an XML datatype, you create an XML Schema collection, which can contain more than one schema, using a slightly modified DDL statement of the form CREATE XML SCHEMA COLLECTION [SchemaColl] AS .... The XML column can then be constrained with the collection name, as in XML(SchemaColl).
  2. I don't know what the value of the table mapping XSD to SQL datatypes is. The value() method casts to the given SQL datatype. That table may indicate an internal storage representation, but it does not tell you what you can cast a value to. For example, you certainly can cast an xs:dateTime value to a SQL string type and do not need to map it to a varbinary.
  3. The XML DML statement “update ... to ...” was changed in Beta2, for grammar reasons, to “replace value of ... with ...”. Stricter static typing rules have also been added. The original “update value of” had the same semantics as “update”.
  4. Regarding the mentioned limitations:
    1. XML can't be cast to text or ntext.
      This is by design. TEXT and NTEXT are being deprecated and replaced by (n)varchar(max). So there is no need to cast to these two types.
    2. An XML column can't be part of a primary or foreign key constraint.
      Correct. Note, however, that supporting this would require a clear understanding of when two XML instances are the same, a topic that even the XQuery WG is struggling with.
    3. Only strings can be cast to XML.
      Starting with Beta2 (even with earlier internal builds), you can also cast binary types to XML (this allows better transport of instance based encoding information).
    4. An XML column can't be used in a GROUP BY statement.
      This, like 2 (and 5, if I understand it correctly), comes from the fact that we have no comparison operation defined on the XML datatype, and neither does the ISO SQL-2003 standard. The reason: there is no clear agreement on how such an operation should be defined. Should you include or exclude comments? PIs? Should you compare typed values or string values?
    5. An XML column can't be part of an index.
      Obviously you can define an index on the XML column. But since there is no comparison defined, you cannot use the column as an index key.
    6. It is only possible to create 32 XML columns per table.
      We never had such a limit.
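
To make the comparison question from items 2 and 4 concrete, here is a minimal sketch in Python (not SQL Server code). The two sample instances and the helper name same_xml are invented for illustration, and C14N 2.0 canonicalization from the Python standard library (3.8 or later) stands in as just one of many possible definitions of equality. The same pair of instances comes out equal or different depending on whether comments count:

```python
from xml.etree.ElementTree import canonicalize

# Two hypothetical instances that differ only in attribute order and a comment.
DOC_A = '<order id="1" status="open"><item>pen</item></order>'
DOC_B = '<order status="open" id="1"><!-- rush --><item>pen</item></order>'


def same_xml(left: str, right: str, with_comments: bool) -> bool:
    """Compare two XML strings after C14N 2.0 canonicalization.

    Canonicalization sorts attributes, so attribute order never matters;
    the with_comments flag decides whether comments are kept or dropped.
    """
    return (canonicalize(left, with_comments=with_comments)
            == canonicalize(right, with_comments=with_comments))


print(same_xml(DOC_A, DOC_B, with_comments=False))  # comments ignored
print(same_xml(DOC_A, DOC_B, with_comments=True))   # comments significant
```

The first call reports the instances as equal and the second as different. A database would have to pick one such definition for keys, GROUP BY and index keys, and that is exactly the choice on which there is no agreement yet.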

The XML schema limitations:

    1. Annotations (like comments) are not stored in the metasystem of Yukon.
      This is correct. Although we plan to support this in a future version.
    2. The XSD ID attribute is not supported.
      This is incorrect. We support ID/IDREF(S). We do not currently support key/keyrefs.
    3. Default values can't be longer than 4000 unicode characters.
      This is correct. If there is a need for longer default values, we will look into adding support in a future release.
    4. XML schemas can't be converted to their origin state. Therefore, you need to manage XML schemas separately.
      They can be converted into a schema that is equivalent to the original one w.r.t. the supported validation. Support of the XML schema collections as collections of XML data type instances representing the original schemata will be looked at for a future release (where concepts may be added that would make this a more natural mechanism).

In summary, Klaus' article is a good overview of the SQL Server 2005 Beta1 XML functionality. My comments above point out some small mistakes and provide some information about where we improved the design from Beta1 to Beta2. Thanks to Klaus for his good write-up.

 

posted Monday, May 10, 2004 9:15 PM by mrys with 6 Comments

Putting XML support in SQL Server 2000 into perspective

Many articles I have recently read start out with saying how clumsy the XML support in SQL Server 2000 is. While this is somewhat true if you want to store XML natively, we need to be fair: The reason for that is that SQL Server 2000's design goals were to take the first step in a series of steps to XML-enable SQL Server.

SQL Server 2000 focused on enabling relational-to-XML publishing and on shredding relational data in XML form back into tables. In doing so, SQL Server 2000 addressed the main XML use case of relational database users from 1999 to about 2001/2002, which was to publish their existing relational data as XML and to consume structured data published as XML in order to build loosely-coupled systems. This scenario does not require storing XML natively, which is the primary use case for the XML data type.

Programmers have become more confident in their use of XML and now often want to store XML data that does not lend itself to the relational model and they want to store it natively. Thus, we - as well as our competitors - are adding native XML storage support in the database. For example, the ISO SQL-2003 standard defines an XML data type (which corresponds to the XML data type in SQL Server 2005).

By adding the XML data type (and the related technologies), SQL Server 2005 provides the next (but by far not the last) step in enabling full XML integration into SQL Server, thus making it a general data management platform (together with WinFS, CLR objects, relational data etc.). More steps are forthcoming, both in SQL Server and in the industry. For example, see the current activity in the ANSI/INCITS SQL standardization group, or read my book chapter in the XQuery from the Experts book :-).

Adding an XML data type, however, does not mean that the SQL Server 2000 XML-relational mapping support is now obsolete. Instead, the SQL Server 2000 features are still an important aspect of providing XML support in SQL Server. Heck, SQLXML is planning to take over SQL Server :-). They may be clumsy if used to address scenarios that they were not designed for, but they still excel in their intended domain.

In summary, the step from SQL Server 2000 to 2005 has to be seen as an evolution and not a revolution. And the evolution continues!

posted Monday, May 10, 2004 8:40 PM by mrys with 5 Comments

What to call a conforming instance of an XML datatype

In my series on potential areas of confusion, let me cover one that has more to do with terminological correctness: What is the term for a valid instance of an XML datatype that is not constrained by an XML Schema collection?

Many writers that I have read over the last couple of weeks used the term “well-formed” (and I would not be surprised if we used the same term in some of our own literature). The term “well-formed” is an important term in XML 1.0:

2.1 Well-Formed XML Documents

[Definition: A textual object is a well-formed XML document if:]

  1. Taken as a whole, it matches the production labeled document.

  2. It meets all the well-formedness constraints given in this specification.

  3. Each of the parsed entities which is referenced directly or indirectly within the document is well-formed.

Document

[1]    document    ::=    prolog element Misc*

Matching the document production implies that:

  1. It contains one or more elements.

  2. [Definition: There is exactly one element, called the root, or document element, no part of which appears in the content of any other element.] For all other elements, if the start-tag is in the content of another element, the end-tag is in the content of the same element. More simply stated, the elements, delimited by start- and end-tags, nest properly within each other.

However, an XML datatype according to the ISO SQL-2003 standard allows any valid element content at the top. Basically it can match:

XML datatype ::= prolog  content

Obviously, this allows instances that are not well-formed according to the definition above (such as top-level text nodes, zero or more than 1 top-level element). So what should we call such XML fragments?
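
The difference between the two productions can be sketched in a few lines of Python (again just an illustration, not how SQL Server checks instances). The function names are invented, the wrapper element name is arbitrary, and the sketch ignores the optional prolog, since an XML declaration would not survive being wrapped:

```python
import xml.etree.ElementTree as ET


def is_well_formed_document(text: str) -> bool:
    """True if text parses as an XML 1.0 document (exactly one root element)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def is_well_balanced_content(text: str) -> bool:
    """True if text parses as element content.

    Checked by wrapping the text in a throwaway root element, so top-level
    text and zero or several top-level elements are all accepted.
    """
    return is_well_formed_document(f"<wrapper>{text}</wrapper>")


samples = ["<a/>", "<a/><b/>", "just text", "", "<a><b></a></b>"]
for sample in samples:
    print(repr(sample),
          is_well_formed_document(sample),
          is_well_balanced_content(sample))
```

Only the first sample is a well-formed document. The first four are all acceptable as content, and the last one, with its improperly nested tags, is neither.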

Some terms that I have heard and used myself over time are:

1. XML fragment
2. well-balanced XML
3. well-balanced XML fragment
4. XML content
5. XML datatype instance

Now 5 is a no-brainer, but then it is tautological :-).

Please feel free to express your preference and propose your own terms.

posted Monday, May 10, 2004 8:12 PM by mrys with 2 Comments

Confusing XML Namespaces and XML Schemata

I have been reading lots of articles and reviewing several upcoming book chapters about SQL Server's XML support and other XML topics, and have come across a couple of potential areas of confusion.

Many people confuse the notion of an XML namespace and an XML Schema. And articles that say things like "The column is a typed XML column. Because of this, you need to provide the XML namespace declaration in the XQuery statements." do not help to clear up this confusion.

An XML namespace is a syntactic mechanism to provide scoping of element and attribute names in XML markup. A namespace does not have to be associated with a schema.

An XML schema describes the structure of an XML document. As such, it can be used to describe the structure of the vocabulary for a given namespace, but it does not need to (the target namespace may be the empty namespace). Assuming that we define the empty namespace as a special namespace, we can say that every XML Schema describes a namespace, but not every namespace needs a schema.

Also, the XQuery prolog's namespace declaration is not needed for schema import (they are imported implicitly in SQL Server 2005), but just to declare a prefix that can be used in the query to refer to the namespace (regardless of whether there is a schema associated with it or not).
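
The same point can be shown outside of SQL Server with a small Python sketch. The namespace URI, element names and prefixes below are made up for illustration, and the namespaces mapping passed to find() plays a role loosely analogous to a namespace declaration in the XQuery prolog. No schema exists anywhere in this example:

```python
import xml.etree.ElementTree as ET

# A namespaced instance with no schema anywhere; the URI is a made-up example.
DOC = ('<inv:invoice xmlns:inv="urn:example:invoices">'
       '<inv:total>42</inv:total>'
       '</inv:invoice>')

root = ET.fromstring(DOC)

# The parser resolves the prefix to the namespace URI without any schema.
print(root.tag)

# The query side binds its own prefix ("x") to the same URI. The prefix used
# in the document ("inv") is irrelevant; only the URI has to match.
total = root.find("x:total", namespaces={"x": "urn:example:invoices"})
print(total.text)

# Without a prefix binding, the unqualified name matches nothing.
print(root.find("total"))
```

The prefix declaration is what lets the query name the element at all. Whether a schema is associated with that namespace is a separate question.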

Is that clearer? Please let me know.

posted Monday, May 10, 2004 7:50 PM by mrys with 1 Comments

TechEd 2004 San Diego is coming closer

TechEd 2004 San Diego is getting nearer. I am currently polishing my slides and my demo code. I decided to first show SQL statements demoing the XML capabilities (using SQL workbench) before using summary and detail slides to recap and show one or two details that I did not cover during the demos. In addition, the decks will contain some hidden slides giving an example of a statement per feature. So the presentation should be about a 50/50 mix of slideware and live demo.

My first presentation, DAT319 (click on link to download a calendar schedule), will cover the new server-side XML functionality in SQL Server 2005: the XML datatype, XML Schema collections, FOR XML etc., and provide some guidelines on when to use XML vs. relational data.

My second presentation, DAT327, will be in the last session slot on the last day. I hope that the topic of XQuery will lure enough people to stay until the end :-). My co-presenter, Arpan Desai, and I will present an overview of XQuery and how it will be made available both inside SQL Server 2005 (my part) and in the .NET Framework (Arpan's part).

Addison-Wesley agreed to send me a couple of XQuery books (XQuery from the Experts and Michael Brundage's XQuery book) that I will use as prizes for the best question asked during my presentations as judged by the audience and the presenters.

For the bloggers among us, there is a TechEd Blog aggregator (RSS) available that I just signed up to. They only seem to have either Architect or Developer, but not Program Manager categories. So I signed up as an Architect, since I have not written any large amount of code in a while :-).

See you all in San Diego.

posted Monday, May 10, 2004 7:27 AM by mrys with 2 Comments



