Yukon's XML type and Full Text Indexing (Search)
I had to do a double take during Michael Rys's presentation on the XML Data Type in Yukon. Suppose you have an XML-typed column and you want to perform some content or free-text (full-text) searching on it. The classic example might be loading the XML versions of Word documents representing resumes into an XML-typed column, then wanting to search the resumes for patterns of words. More likely than not, XPath and XQuery won't be an efficient means to that end, but full-text searching would be. However, the full-text indexing process ignores attribute values.
At the heart of the issue is the document-centric vs. data-centric XML debate when it comes to the use of elements and attributes. I asked Michael about this after the session. His point, if I'm recalling it correctly, is that in document-centric XML, attributes aren't normally of primary interest, since they are used to either indicate or modify metadata. It is unlikely that anybody would really be interested in doing a full-text search on that kind of data. But that's not always the case. Consider the following schema, then suppose you want to search on the type attribute. This could happen if the set of allowed type values is not constrained in a meaningful way.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="address">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="street"/>
        <xs:element ref="routing"/>
        <xs:element ref="local"/>
        <xs:element ref="region"/>
        <xs:element ref="postalCode"/>
      </xs:sequence>
      <xs:attribute name="type" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="local" type="xs:string"/>
  <xs:element name="postalCode" type="xs:int"/>
  <xs:element name="region" type="xs:string"/>
  <xs:element name="routing" type="xs:string"/>
  <xs:element name="street" type="xs:string"/>
</xs:schema>
Now, I agree that you could make the type attribute a child element of the address element, but suppose that's not an option since adhering to the schema is paramount here. The next best choice? If you really need to do this kind of search, consider extracting the attribute value of interest into a simple scalar string value (use an appropriately sized nvarchar, for example) and do your searching on that. While denormalization can be considered a sin, here it may make some sense.
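To make the extraction step concrete, here is a minimal sketch in Python rather than anything Yukon-specific. It is an illustration under stated assumptions, not how the server does it: the sample instance document, the function names extract_address_type and to_row, and the address_xml / address_type column names are all hypothetical. It simply pulls the type attribute off the root address element and pairs it with the original XML, the way a row with an extra scalar column would.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document that conforms to the address schema above.
# The free-text value of the `type` attribute is what we want to search on.
SAMPLE_ADDRESS = """\
<address type="seasonal cabin, summer only">
  <street>12 Lakeshore Road</street>
  <routing>RR 4</routing>
  <local>Springfield</local>
  <region>OR</region>
  <postalCode>97477</postalCode>
</address>
"""


def extract_address_type(xml_text: str) -> str:
    """Return the value of the required `type` attribute on the root <address>."""
    root = ET.fromstring(xml_text)
    if root.tag != "address":
        raise ValueError(f"expected <address> root element, got <{root.tag}>")
    value = root.get("type")
    if value is None:
        raise ValueError("<address> is missing its required type attribute")
    return value


def to_row(xml_text: str) -> dict:
    """Pair the original XML with the denormalized attribute value.

    The keys stand in for two columns (names are assumptions): the XML-typed
    column and a plain string column that can be searched on its own.
    """
    return {"address_xml": xml_text, "address_type": extract_address_type(xml_text)}


row = to_row(SAMPLE_ADDRESS)
print(row["address_type"])
```

Whatever does the extraction in practice, the idea is the same: the attribute value ends up in an ordinary string column alongside the XML, and the searching happens against that column.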
Searching the extracted value isn't going to be as functional as FTS over the whole document, where there is a metric known as "rank" that computes the relative "goodness of fit" of an item to a search phrase. When you denormalize the attribute like this, the underlying factors that are considered when computing this rank are lost, so the ranking computation loses much of its value. I'm not sure if there is a good workaround for this effect.
Yes, this is a bit of a contrived example: you probably would constrain the allowed values for the address type so you could perform normal T-SQL filtering rather than FTS in the first place. Ah, if only all such things could be discovered and addressed in design...