Michael Rys

Musings on XML, XQuery and more...


An Introduction to the XQuery (and XPath 2.0) Type System: The Impact on XQuery and XPath

In the first two installments of this series on the XQuery Type System, we looked at the general concepts and terminology and at the XQuery types. In this posting, we will discuss the impact of the type system on the XQuery/XPath language (the SequenceType production and the type-based operations), the reasons for dynamic and static typing, and how the two relate. In the last part of the series, we will finally see how XQuery in SQL Server 2005 will implement the type system and how I think we should evolve our type implementation and behaviour over time.

Impact of the type system on XQuery/XPath

XQuery and XPath 2.0 provide several type-related expressions:

  1. Type assertions (XQuery only):
    Every variable declaration (e.g., a global variable or a variable bound by let, for, some, or every) has an optional “as Type” clause that allows a query writer to assert that the inferred type of the expression bound to the variable has to be a subtype of the asserted (or, using our earlier terminology, required) type. If no type assertion is made, the variable's type is the same as the inferred type.

    Parameter types and function return types are also type assertions; they define the required types for the input and the return value, respectively.
  2. Type inspections:
    The instance of (XQuery and XPath) and typeswitch (XQuery only) expressions allow run-time inspection of a value's dynamic type. castable (XQuery and XPath), on the other hand, checks whether an atomic value can be cast to the indicated type. For the precise semantics of these operators, I refer you to the W3C XQuery specification.
  3. Type casting:
    XQuery/XPath also provides three (or four, depending on how you count) mechanisms to change either the static or dynamic type of an expression or value.

    The first expression is called treat as (XQuery and XPath). It does not affect the dynamic type or the value, but changes the static type of the expression to the indicated type and guarantees that an error is raised at run time if the value is not an instance of that type. This is useful in a statically typed system when a more precise type needs to be provided than the one the static typing has inferred.

    Unlike treat as, which can take any item or atomic type as its argument, the casting expression cast as (XQuery and XPath) can only take a concrete atomic type as its type argument, and it actually changes the value and the dynamic type of the instance in addition to guaranteeing the static type of the expression. cast as comes in two cardinality flavors: Expr cast as T, which guarantees that the result is not empty, and Expr cast as T?, which allows an empty result.

    A semantic sibling of the cast as T? expression is the notion of constructor functions for atomic values (XQuery and XPath). For every concrete atomic type that is available in the static processing context (currently with the exception of the type xs:QName and any of its derived types), there exists a constructor function with the same name as the type (including the namespace) that takes an atomic value and returns the value cast to that type. This includes all the atomic types that are implicitly or explicitly imported via schema import. Some people may ask why we then have both syntaxes. The reason has to do with the resolution of unprefixed names: since constructor functions are functions, an unprefixed constructor name is resolved against the default function namespace rather than against the namespace used for type names. So in order to cast to a type that is in no namespace, you need to use the explicit cast as syntax.

    Finally, the validate expression (XQuery only) retypes whole subtrees according to schema validation semantics. Note that, as a result of the Last Call feedback, element construction no longer performs implicit validation; instead, it provides the option of either untyping or preserving the types of contained nodes (but not of atomic values). Also, validation was simplified by removing the schema context specification.
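To make the difference between inspecting a type and changing it more concrete, here is a small Python sketch. It is a toy model, not an XQuery processor: atomic values are assumed to be (type name, Python value) pairs, sequences are Python lists, and the type hierarchy, the conversion table and the helper names (instance_of, treat_as, cast_as, castable_as) are hypothetical simplifications. A purely dynamic model cannot show the static half of treat as (narrowing the static type), so only its run-time check appears.

```python
from decimal import Decimal, InvalidOperation

# Hypothetical slice of the atomic type hierarchy: type -> base type.
BASE = {
    "xs:anyAtomicType": None,
    "xs:string": "xs:anyAtomicType",
    "xs:double": "xs:anyAtomicType",
    "xs:decimal": "xs:anyAtomicType",
    "xs:integer": "xs:decimal",
}

# Hypothetical Python converters standing in for the XQuery casting rules.
CONVERT = {
    "xs:string": str,
    "xs:double": float,
    "xs:decimal": Decimal,
    "xs:integer": int,
}


class XQueryError(Exception):
    """Stands in for an XQuery type or dynamic error."""


def derives_from(type_name, target):
    """True if type_name is target or (transitively) derived from it."""
    while type_name is not None:
        if type_name == target:
            return True
        type_name = BASE[type_name]
    return False


def instance_of(item, target):
    """`$v instance of T`: inspects the dynamic type, never converts."""
    return derives_from(item[0], target)


def treat_as(item, target):
    """`$v treat as T`: run-time check only; value and dynamic type unchanged."""
    if not instance_of(item, target):
        raise XQueryError(f"{item[0]} is not an instance of {target}")
    return item


def cast_as(seq, target, allow_empty=False):
    """`E cast as T` (allow_empty=False) or `E cast as T?` (allow_empty=True)."""
    if not seq:
        if allow_empty:
            return []
        raise XQueryError(f"cannot cast () to {target} without '?'")
    if len(seq) > 1:
        raise XQueryError("cast as expects at most one atomic value")
    _, value = seq[0]
    try:
        new_value = CONVERT[target](value)
    except (ValueError, InvalidOperation, OverflowError) as exc:
        raise XQueryError(f"cannot cast {value!r} to {target}") from exc
    # Unlike treat_as, both the value and the dynamic type change here.
    return [(target, new_value)]


def castable_as(seq, target, allow_empty=False):
    """`E castable as T`: True exactly when the corresponding cast succeeds."""
    try:
        cast_as(seq, target, allow_empty)
    except XQueryError:
        return False
    return True
```

In this model, treat_as(("xs:integer", 42), "xs:decimal") hands back the item unchanged (still typed xs:integer), while cast_as([("xs:string", "42")], "xs:integer") produces a new value of a new dynamic type, [("xs:integer", 42)].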

All these type expressions (except for validate) take a type expression as one of their arguments, which the XQuery/XPath specification calls the SequenceType. As advocated in my earlier posting on the XQuery design controversy, the syntax and semantics of the element and attribute node type expressions in the SequenceType have (fortunately) changed. I hope to find some time to explain the new syntax and semantics in another posting. In any case, the SequenceType allows the user to specify a named item type (either a node type with name and structure information, or an atomic type name) with an optional occurrence indicator (?, *, or +).
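The occurrence indicator part of SequenceType matching is simple enough to sketch directly. The Python below is a toy model under stated assumptions: items are (type name, value) pairs, an item type is reduced to an exact atomic type name (no subtyping, no node tests), and the names atomic and matches are hypothetical.

```python
# Occurrence indicators and the sequence lengths they accept.
OCCURRENCE = {
    "": lambda n: n == 1,   # T   exactly one item
    "?": lambda n: n <= 1,  # T?  zero or one item
    "*": lambda n: True,    # T*  any number of items
    "+": lambda n: n >= 1,  # T+  one or more items
}


def atomic(type_name):
    """Hypothetical item-type test: exact atomic type name, no subtyping."""
    return lambda item: item[0] == type_name


def matches(seq, item_type, occurrence=""):
    """True if the length fits the indicator and every item passes item_type."""
    return OCCURRENCE[occurrence](len(seq)) and all(item_type(i) for i in seq)
```

For instance, the empty sequence matches xs:integer? and xs:integer* but not xs:integer or xs:integer+, and a sequence mixing integers and strings matches none of the xs:integer variants.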

As we can see, XQuery and XPath have been extended with a lot of type-related functionality. But that is not the only impact we can observe. Many of the operations also have type-specific behaviour: they provide strong typing semantics for explicitly typed values, automatic type promotion (for example, from xs:decimal to xs:double), and weak typing behaviour in the presence of untyped atomic values, as discussed in the previous post.
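The following toy Python model sketches how an addition might combine these three behaviours. It assumes operands are (type name, value) pairs; add_values, NUMERIC_RANK, TO_PYTHON and the error class are hypothetical names, and only the four primitive numeric types are covered. Untyped atomic operands are cast to xs:double, mixed numeric operands are converted to the wider of the two types, and any other explicitly typed operand raises a type error.

```python
from decimal import Decimal

# Numeric types from narrowest to widest (toy ordering for this sketch).
NUMERIC_RANK = {"xs:integer": 0, "xs:decimal": 1, "xs:float": 2, "xs:double": 3}

# Python stand-ins for each numeric type (xs:float is approximated by float).
TO_PYTHON = {
    "xs:integer": int,
    "xs:decimal": Decimal,
    "xs:float": float,
    "xs:double": float,
}


class XQueryTypeError(Exception):
    """Stands in for the type error raised for non-numeric operands."""


def add_values(left, right):
    """Toy model of `left + right` on two atomic (type, value) operands."""
    operands = []
    for type_name, value in (left, right):
        if type_name == "xs:untypedAtomic":
            # Weak typing: untyped data is cast to xs:double for arithmetic.
            type_name, value = "xs:double", float(value)
        if type_name not in NUMERIC_RANK:
            # Strong typing: explicitly typed non-numeric values are rejected.
            raise XQueryTypeError(f"{type_name} is not a numeric type")
        operands.append((type_name, value))
    # Convert both operands to the wider of the two numeric types.
    result_type = max((t for t, _ in operands), key=NUMERIC_RANK.get)
    convert = TO_PYTHON[result_type]
    return (result_type, convert(operands[0][1]) + convert(operands[1][1]))
```

So an untyped "41" plus the integer 1 yields the double 42.0, whereas a value explicitly typed as the string "41" plus 1 is a type error.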

Dynamic vs Static typing

As mentioned earlier, XQuery gives implementers and users the choice between conservative static typing and dynamic typing behaviour (which may make use of partial static typing). It does this in a way that improves interoperability between static and dynamic typing implementations: a static typing implementation may be conservative in detecting and raising static errors, but if it returns a non-error result, that result must be the same as the one a dynamically typed implementation returns. This basically means that static type information cannot lead to a different result than dynamic type information.

While this seems like a reasonable design, it has some interesting consequences: all type-based decisions have to be dynamic. This means that an operation that accepts a static union type and has different semantics depending on the members of the union has to defer its operational dispatch to run time. For example, a static typing system will process an expression such as (if ($cond) then 4.2e1 else 42) + 1 as follows: static type inference for the conditional expression yields the union type xs:double | xs:integer. The addition then infers the static type xs:double | xs:integer, since a dynamically typed implementation (and thus every XQuery implementation) will return either a double or an integer value depending on the Boolean value of $cond. The compiler therefore cannot select a single addition operator at compile time; it has to pick the double or the integer addition at run time. I expect that this requirement for dynamic dispatch will lead some performance-oriented XQuery implementations to raise static errors instead of performing such dynamic dispatch (at least in the beginning).
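A toy Python model of this example may help. It is a sketch under simplifying assumptions, not how any particular implementation works: types are plain strings, a static type is a frozenset of possible type names, and PLUS_IMPL, dynamic_plus, strict_compile_plus and run are hypothetical names. The dynamic path looks up the addition from the operand's actual type at run time; strict_compile_plus models an implementation that raises a static error rather than defer the dispatch.

```python
class StaticTypeError(Exception):
    """Raised by the hypothetical strict compiler below."""


# One addition implementation per (left, right) operand type pair.
PLUS_IMPL = {
    ("xs:integer", "xs:integer"): lambda a, b: ("xs:integer", a + b),
    ("xs:double", "xs:integer"): lambda a, b: ("xs:double", a + float(b)),
}


def static_type_of_if(then_type, else_type):
    """The static type of if/then/else is the union of both branch types."""
    return frozenset({then_type, else_type})


def static_type_of_plus(left_types, right_type):
    """Union of the result types over every possible left operand type."""
    return frozenset(
        "xs:double" if "xs:double" in (t, right_type) else "xs:integer"
        for t in left_types
    )


def dynamic_plus(left, right):
    """Run-time dispatch on the dynamic types of the actual operand values."""
    impl = PLUS_IMPL[(left[0], right[0])]
    return impl(left[1], right[1])


def strict_compile_plus(left_types, right_type):
    """Pick one implementation at compile time, or fail statically."""
    if len(left_types) != 1:
        raise StaticTypeError(f"operand type {sorted(left_types)} needs dynamic dispatch")
    (left_type,) = left_types
    return PLUS_IMPL[(left_type, right_type)]


def run(cond):
    """(if ($cond) then 4.2e1 else 42) + 1, evaluated dynamically."""
    value = ("xs:double", 4.2e1) if cond else ("xs:integer", 42)
    return dynamic_plus(value, ("xs:integer", 1))
```

Here run(True) yields the double 43.0 and run(False) the integer 43, while strict_compile_plus rejects the union type up front instead of deferring the choice.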

So what are some of the advantages of the different approaches?

The advantage of the dynamic typing approach is that it can more easily deal with open-world type systems, where instance documents may contain types of which the static environment has no knowledge. It also makes it easier to write queries in situations where the occurrence and type information at run time matches the user's knowledge and expectations more closely than the inferred static type would (for example, when the user knows that a path returns exactly one node, but the inferred static type allows zero or more).

On the other hand, the static typing approach moves type checking from run-time execution to the compilation phase. This can lead to much better performance and scalability of the query. Several of the static typing rules in XQuery also make it easier to catch programming errors. Besides the general type safety guarantee of a conservative static typing approach, XQuery provides the rule that an error is raised if the static type empty() is inferred (some exceptions exist to deal with expressions of the form ()). This makes it possible, for example, to catch typos in path expressions that cannot be detected with dynamic typing (although partial static typing could help here).
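The typo-catching benefit can be sketched with a toy Python model. The schema, the tuple-based document and the names infer_path and evaluate_path are hypothetical, and only child steps over element names are modelled. Static inference raises an error as soon as a step's inferred type is empty, while dynamic evaluation of the same misspelled path silently returns the empty sequence.

```python
# Hypothetical schema: element name -> names of its allowed child elements.
SCHEMA = {
    "order": {"customer", "item"},
    "customer": {"name"},
    "item": {"price", "quantity"},
}


class StaticError(Exception):
    """Stands in for the static error raised when empty() is inferred."""


def infer_path(root, steps):
    """Statically infer which element names a child-step path can return."""
    current = {root}
    for step in steps:
        allowed = any(step in SCHEMA.get(name, set()) for name in current)
        current = {step} if allowed else set()
        if not current:
            raise StaticError(f"step '{step}' has static type empty(); typo?")
    return current


def evaluate_path(node, steps):
    """Dynamically evaluate the path on a (name, children) tuple tree."""
    nodes = [node]
    for step in steps:
        nodes = [child for n in nodes for child in n[1] if child[0] == step]
    return nodes
```

With the misspelled step custmer, infer_path fails at compile time, whereas evaluate_path simply returns an empty list, which is exactly the silent result the empty() rule is meant to surface.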

Conclusion

XQuery and XPath 2.0 have been designed from the ground up to work well with both typed and untyped data and mixtures thereof. However, some type-related features are still missing. For example, there is no support for casting to a named complex type or a non-atomic simple type, there is no type introspection, and some of the more complex aspects of local, anonymous and structural typing have not been provided. Some of the missing functionality should probably never be made available, since its complexity cost would be too high; other functionality may come in a future release after more experience with the current functionality has been gathered.

In the next (and most likely last) post on the XQuery Type System, I will describe how SQL Server 2005's XQuery implementation fits into the type system outlined so far and how I personally see our implementations and the type system evolve over the next decade.

posted on Saturday, June 12, 2004 9:34 PM by mrys


# An Introduction to the XQuery (and XPath 2.0) Type System: The Impact on XQuery and XPath @ Sunday, June 13, 2004 5:07 AM

A fairly high level introduction.

mrys

# Take Outs: The Omega Installment @ Sunday, June 13, 2004 8:54 PM

Take Outs: The Omega Installment

mrys



