Bob Beauchemin Again: Early XQuery schema validation
Bob blogs about XQuery again, pointing out that if the XQueries are issued against typed XML instances in SQL Server 2005 are validated before T-SQL Query Processing, potentially leading to interesting results. Following on from Yesterday's schema management post...
-- replace with some suitably named DB
use scratch
go
-- don't panic if these fail...
drop table dbo.badExample
go
drop xml schema collection dbo.schema1
go
-- start with a plain schema
create xml schema collection dbo.schema1 as
N'<?xml version="1.0" encoding="UTF-16"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified">
<xs:element name="data">
<xs:complexType>
<xs:sequence>
<xs:element name="reading" type="xs:byte"
maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>'
go
-- Create an example, even if its bad
create table dbo.badExample
(
pkid tinyint identity(1,1) primary key,
doc xml(schema1)
)
go
-- inserting good data works
insert into dbo.badExample(doc) values (
N'<?xml version="1.0" encoding="UTF-16"?>
<data>
<reading>1</reading>
<reading>2</reading>
<reading>3</reading>
</data>')
go
-- Is this invalid or just false?
-- XQuery: There is no element named 'error'
select pkid,doc.exist('/error')
from dbo.badExample
go
-- Unbind
alter table dbo.badExample
alter column doc xml
go
-- Is this invalid or just false?
-- The Zero tells us lots...
select pkid,doc.exist('/error')
from dbo.badExample
go
The big questions here are:
- Why isn't this considered a bug?
- Why is this behavior a feature?
To answer the first question, you need to keep in mind that XQuery is, at its core, an XML constructor. When you XQuery on a typed instance, the results are typed to that instance. Even in this simple example, we're trying to construct a sequence of elements that wouldn't conform to schema of the underlying instance.
Update: As Bob points out and as BOL agrees, XQuery's return is always untyped shooting my assertion to pieces. Now that I think about it again, the output couldn't be schema-bound since one of the goals of using XQuery to, in a sense, transform from schema to schema. Duh! My Bad, Sorry
As to the second question: I suspect, but don't really know, that this happens because a typed "buffer" for intermediate storage of results is preallocated at the start of XQuery processing. When querying against a typed instance, that pre-allocation is based on the type of instance. When querying against untyped instances, a more generic buffer might be allocated. The more structured this buffer is, the better decisions a cost-based query processor could make. As soon as the "buffer" would break, you can give up on the whole query.
Update: I'm digging into why example does what it does.
Subble, but very cool indeed.