In the first installment of the posting series on the XQuery Type System, we looked at the general concepts and terminology. In this posting, we will look at the XQuery type system more closely to see which types exist and how they fit into the different categories of types. In upcoming postings, we will discuss the impact of the type system on the XQuery language (the Sequence Type production and the type-based operations), the reasons for dynamic and static typing, and finally, how XQuery in SQL Server 2005 will implement the type system and how I think we should evolve our type implementation and behaviour over time.
As always, please feel free to ask questions or leave comments by replying to this posting.
Type Hierarchy Overview
The following image, which I created for the XQuery specification, shows the type hierarchy (it is now copyright W3C). The version shown is from the Nov 2003 last call draft. There have been two changes since then: xs:anyType has also become a concrete instance type (see below for the reasons), and xdt:untypedAny is now called xdt:untyped.

As you can see, the type hierarchy does not really have a single root. This is because XQuery (and XML Schema) have the notion of item types (node and atomic types) and structure types. The node types such as elements and attributes have a name and a structure type. The atomic types (such as xs:integer) are also structure types, and then we have the remaining structure types, which are either complex types or simple list or union types. The most general supertype of the item types is called item() (using the XQuery sequence type syntax) and the most general structure type is called xs:anyType. Let's take a closer look...
Node Types
Every one of the node types is an item type. The generic node type is called node().
Some of them have names (attribute(), element()), others don't (text(), comment(), document-node(), processing-instruction()). A processing instruction actually does have a name, but that name is not part of the static type; it is checked at runtime instead.
Some node types have a structural type component. They are element(), attribute() and document-node().
Node types are strongly typed towards each other: if an operation expects an attribute and gets an element, a type error is raised. Node types are, however, partially weakly typed towards the atomic types. This behaviour is called atomization in the XQuery spec and basically means that if a function requires an atomic type (such as xs:integer) and a node is passed, the typed value of the node is extracted. The result is then handled based on the relationship between the type of the typed value and the required type (see below).
Structure Types
There are three main categories of structure types: atomic types, simple types and complex types. They are all named (if XML Schema has an anonymous type, a name is invented), and named typing is used to determine the subtyping relationship.
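To make named typing concrete, here is a minimal sketch in Python (not XQuery). It assumes a hand-written parent map covering only a small slice of the built-in types; the map and the function name derives_from are illustrative, not part of any XQuery API. Subtyping is decided purely by walking the declared derivation chain by name, never by comparing the shape of the content.

```python
# Toy model of named subtyping. PARENT lists, for a handful of built-in
# structure types, the type each one is derived from. It is deliberately
# incomplete and only serves as an illustration.
PARENT = {
    "xs:anySimpleType": "xs:anyType",
    "xdt:untyped": "xs:anyType",
    "xdt:anyAtomicType": "xs:anySimpleType",
    "xdt:untypedAtomic": "xdt:anyAtomicType",
    "xs:string": "xdt:anyAtomicType",
    "xs:decimal": "xdt:anyAtomicType",
    "xs:integer": "xs:decimal",
    "xs:long": "xs:integer",
    "xs:IDREFS": "xs:anySimpleType",
}


def derives_from(actual, expected):
    """Return True if `actual` is `expected` or is derived from it by name."""
    current = actual
    while current is not None:
        if current == expected:
            return True
        # Step up one level in the declared derivation chain.
        current = PARENT.get(current)
    return False


print(derives_from("xs:long", "xs:decimal"))    # xs:long -> xs:integer -> xs:decimal
print(derives_from("xs:string", "xs:decimal"))  # unrelated branches
```

Every chain in this map ends at xs:anyType, which matches its role as the most general structure type.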
Atomic Types
Atomic types are not only structure types but also item types (thus they have two most general supertypes, depending on how they are used). When used as a structure type, they always appear as a structural component of a node type, such as element(name, xs:integer), where xs:integer is the structure type. When they appear on their own in a sequence type production (such as xs:integer), they serve as an item type. The most general atomic type is the abstract type xdt:anyAtomicType, which can be used in function signatures and type tests to check for any atomic type.
Most atomic types are strongly typed (while allowing subtypes to substitute). The exceptions are as follows:
- The so-called untyped atomic type (xdt:untypedAtomic) is weakly typed: when a value of this type is passed where another type is required, it is, most of the time, converted to the required type (or to an operator-specific type for some built-in operations). However, other types are never implicitly converted to xdt:untypedAtomic in XQuery except in construction (see the XQuery spec). See the section on untyped types below for some more information about this type.
- The so-called numeric types (xs:decimal, xs:float, xs:double) provide a type promotion hierarchy that treats xs:decimal (or any of its subtypes) as weakly typed towards xs:float or xs:double and treats xs:float as weakly typed towards xs:double.
- The type xs:anyURI will most likely become weakly typed towards the type xs:string (in the next version of the XQuery spec).
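The three exceptions above can be sketched as a small decision function. This is a toy model in Python under stated assumptions: the parent map is a tiny, hand-picked slice of the hierarchy, the function name how_accepted and its result labels are made up for illustration, and the xs:anyURI case is left out because the text above only says it will most likely change. A result of "cast" means a conversion is attempted, which can still fail at runtime for an unsuitable value.

```python
# Tiny slice of the atomic type hierarchy (child -> parent), for illustration only.
PARENT = {
    "xdt:untypedAtomic": "xdt:anyAtomicType",
    "xs:string": "xdt:anyAtomicType",
    "xs:decimal": "xdt:anyAtomicType",
    "xs:float": "xdt:anyAtomicType",
    "xs:double": "xdt:anyAtomicType",
    "xs:integer": "xs:decimal",
}

# Numeric promotion: xs:decimal may be promoted to xs:float or xs:double,
# and xs:float may be promoted to xs:double. Never the other way round.
PROMOTES_TO = {
    "xs:decimal": ("xs:float", "xs:double"),
    "xs:float": ("xs:double",),
}


def ancestors(type_name):
    """Yield the type itself followed by everything it derives from."""
    current = type_name
    while current is not None:
        yield current
        current = PARENT.get(current)


def how_accepted(actual, required):
    """Classify how a value of atomic type `actual` fits a `required` type."""
    if required in ancestors(actual):
        return "match"    # same type or subtype substitution
    if actual == "xdt:untypedAtomic":
        return "cast"     # weakly typed: converted to the required type
    for candidate in ancestors(actual):
        if required in PROMOTES_TO.get(candidate, ()):
            return "promote"  # numeric type promotion
    return "error"        # strongly typed: no implicit conversion


print(how_accepted("xs:integer", "xs:double"))          # subtype of xs:decimal, promoted
print(how_accepted("xs:string", "xdt:untypedAtomic"))   # never converted implicitly
```

Note that subtypes of xs:decimal such as xs:integer take part in promotion because the lookup walks up the derivation chain before consulting the promotion table.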
Simple Types
Every atomic type is also a special case of a simple type and thus a subtype of xs:anySimpleType. However, there are other simple types: list and union types. Unlike atomic types, these simple types are named structure types that can only appear as a structural component of one of the node types and cannot be used as an item type.
Atomization of a node type with such a structure type will lead to a type expression that defines the simple type. For example, atomization of a type element(name, xs:IDREFS) will lead to the type xs:IDREF+, which means a list of at least one instance of the atomic type xs:IDREF.
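As a rough illustration of the xs:IDREFS example, the following Python sketch models atomization of a node whose structure type is one of the built-in XML Schema list types. The function name atomize_typed_value and the (type, value) tuple representation are assumptions made for this sketch; a real processor works on the validated data model, not on raw strings, and user-defined list and union types are not modeled here.

```python
# Built-in XML Schema list types and the atomic type of their items.
LIST_ITEM_TYPE = {
    "xs:IDREFS": "xs:IDREF",
    "xs:NMTOKENS": "xs:NMTOKEN",
    "xs:ENTITIES": "xs:ENTITY",
}


def atomize_typed_value(structure_type, lexical_value):
    """Return the atomized value as a list of (atomic type, token) pairs."""
    item_type = LIST_ITEM_TYPE.get(structure_type)
    if item_type is None:
        # Not a known list type: treat it as a single atomic value.
        return [(structure_type, lexical_value)]
    # A list type is a whitespace-separated sequence of items, so the result
    # is a sequence of instances of the item type.
    return [(item_type, token) for token in lexical_value.split()]


for pair in atomize_typed_value("xs:IDREFS", "a1 b2  c3"):
    print(pair)
```

The result never carries the list type itself, which reflects the point that list types cannot be used as item types.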
Non-atomic simple types are all strongly typed.
Complex Types
Finally, every simple type and all so-called complex types derive from the most general structure type xs:anyType. All complex types except xs:anyType and xdt:untyped are user-defined types that have been defined in an XML Schema.
All complex types are structure types and thus can only appear as structural components of node types. As such they are all strongly typed.
The outcome of atomization of node types with complex type structures depends on the complex types:
- If the type is xdt:untyped, the atomized value will have the type xdt:untypedAtomic.
- Static typing will make atomization of any other complex type a type error.
- The current last call working draft allows atomization of mixed content complex types during dynamic typing, returning an instance of xdt:untypedAtomic. However, based on last call comments, there are ongoing discussions about whether the result should instead be of the stronger type xs:string or whether it should be a type error. Please let me know what you prefer!
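The three rules above can be summarized in a short Python sketch. The function name atomized_type, its parameters, and the type name my:Paragraph are hypothetical. The sketch follows the last call draft behaviour described in this posting for mixed content; treating dynamic atomization of element-only content as an error is an assumption of the sketch, and other content kinds are not modeled.

```python
def atomized_type(structure_type, content="element-only", static=False):
    """Return the type of the atomized value for a node with a complex type.

    `content` is "mixed" or "element-only"; `static` selects static typing.
    Raises TypeError where the rules treat atomization as a type error.
    """
    if structure_type == "xdt:untyped":
        # Untyped content always atomizes to an untyped atomic value.
        return "xdt:untypedAtomic"
    if static:
        # Static typing rejects atomization of any other complex type.
        raise TypeError("cannot atomize " + structure_type + " under static typing")
    if content == "mixed":
        # Last call draft behaviour for dynamic typing (still under discussion).
        return "xdt:untypedAtomic"
    # Assumption of this sketch: element-only content is a dynamic type error.
    raise TypeError("cannot atomize element-only type " + structure_type)


print(atomized_type("xdt:untyped"))
print(atomized_type("my:Paragraph", content="mixed"))
```

If the working group settles on xs:string or on a type error for mixed content, only the branch for content == "mixed" would need to change.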
Untyped vs Typed
Finally, let me quickly address the function of the two types xdt:untyped and xdt:untypedAtomic and the relationship of xdt:untyped to xs:anyType.
xdt:untyped is a type that denotes the structural content type of elements that have not been validated. This type guarantees that any element or attribute that may appear further below in the tree will also be untyped/not validated. This is different from xs:anyType, which indicates that the given element does not have a more precise, known type, but that elements or attributes below it may indeed have a known type other than xdt:untyped/xdt:untypedAtomic (although there is no guarantee). xs:anyType is basically used to indicate partially validated structure, such as in a lax validation section. While this distinction seems somewhat complex, it is needed to guarantee type safety when performing static typing.
xdt:untypedAtomic is the type given as the structural type of an attribute node that has not been validated, or for which no precise type information beyond xs:anySimpleType could be found. In this, it is not quite the same as xdt:untyped, since a validated attribute node may actually get the type xdt:untypedAtomic. The XQuery working group felt that, since there is no way to preserve the actual instance type in the serialization for an attribute node that the schema declares as type xs:anySimpleType, we can just as well map the two instance types to each other. Thus, in the XQuery type system, the type xs:anySimpleType becomes a purely abstract type: no instance has it as its most specific type.
Both untyped types serve as the bridge to support weakly typed semantics of XQuery and XPath 2.0 on non-validated, untyped XML data and thus provide meaningful and useful query semantics even in the absence of XML Schema or otherwise typed data.