An Introduction to the XQuery (and XPath 2.0) Type System: The general concepts
One of the most controversial and, in my opinion, most often misunderstood aspects of XQuery 1.0/XPath 2.0 is their relationship to types. Many people understand that the type system is based on the W3C XML Schema language (which is true) and then conclude that it is complex (which is only partially true). Also, too many people mix up the notions of weak, strong, static, and dynamic typing. In this and some following posts, I would like to bring some order into the confusion and explain how the type system in XQuery and XPath 2.0 works. First I will introduce a whole bunch of terms and talk about the general concepts. In subsequent posts I will look at the actual XQuery types and some examples and scenarios for weak, strong, static, and dynamic typing.
Terminology
First let's define some terms.
The actual type of a value is the type that a value possesses at runtime.
The inferred type of an expression is the type that a type inference system has inferred based on the types of the input arguments and the type rules of the operations forming the expression.
The required type is the type that an operation expects for one of its arguments. For example, + expects each of its arguments to be of a numeric type.
A type A is a subtype of a type B if all values of type A are also instances of type B. For example, in the XQuery type system, xs:int is a subtype of xs:integer, since every xs:int value is also an instance of type xs:integer. Subtype relationships can be determined based on structure or on name. The first approach is known as structural typing, the second as named typing.
Structural typing means that if a type S has the structure {a,b,c} and a type T has the same structure as S, they are considered to be the same type. Subtyping is determined by whether an instance matches the structure of both types.
Named typing means that a type S with structure {a,b,c} is not the same type as any other structurally equivalent type with a different name. Subtyping is determined by the declared relationships between named types. For example, in the XQuery type system, S is a subtype of T only if S has been explicitly declared as derived from T, regardless of their structure.
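To make the difference concrete, here is a minimal sketch in Python. It is a toy model, not how any XQuery processor is implemented: the Type class, its fields, and both helper functions are names I made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Type:
    """Hypothetical toy type: a name, a structure, and a declared base type."""
    name: str
    fields: frozenset
    base: Optional["Type"] = None


def same_type_structural(s, t):
    # Structural typing: only the structure matters, the name is ignored.
    return s.fields == t.fields


def is_subtype_named(s, t):
    # Named typing: walk the explicitly declared derivation chain upward.
    current = s
    while current is not None:
        if current.name == t.name:
            return True
        current = current.base
    return False


S = Type("S", frozenset({"a", "b", "c"}))
T = Type("T", frozenset({"a", "b", "c"}))
U = Type("U", frozenset({"a", "b", "c"}), base=T)  # explicitly derived from T

print(same_type_structural(S, T))  # True: same structure
print(is_subtype_named(S, T))      # False: no declared relationship
print(is_subtype_named(U, T))      # True: U was declared as derived from T
```

S and T are interchangeable under the structural view, but under the named view only U, which was declared as derived from T, counts as a subtype of T.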
A weak type is a type whose values are implicitly coerced into the required type of the operation that is being applied to them. In the XQuery type system, the type xdt:untypedAtomic, which represents atomic values that have not been associated with a type through the validation process (untyped atomic values), is a weak type. Adding 1 to an untyped value 41 will - depending on the coercion rules of the operator - result in the value 42.
A strong type is a type that leads to a type error unless it is a subtype of the required type of the operation that is being applied to the value. In the XQuery type system, xs:string is an example of a strong type: a string value can only be used where a value of type xs:string (or a supertype thereof) is required. Adding 1 to the string value "41" will raise a type error.
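The contrast between a weak and a strong type can be sketched with a toy addition operator in Python. The Atomic class and the helper names are hypothetical, only two numeric types are modeled, and I assume that the operator coerces untyped values by casting them to xs:double.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atomic:
    """Hypothetical atomic value carrying its type annotation."""
    type_name: str
    value: object


# Simplification: only two numeric types are modeled here.
NUMERIC = {"xs:integer", "xs:double"}


def as_numeric_operand(item):
    if item.type_name == "xdt:untypedAtomic":
        # Weak type: implicitly coerced (assumed here: cast to xs:double).
        return Atomic("xs:double", float(item.value))
    if item.type_name in NUMERIC:
        return item
    # Strong type that is not numeric: no coercion, just a type error.
    raise TypeError(f"{item.type_name} is not a numeric type")


def add(left, right):
    a = as_numeric_operand(left)
    b = as_numeric_operand(right)
    if a.type_name == b.type_name == "xs:integer":
        return Atomic("xs:integer", a.value + b.value)
    return Atomic("xs:double", float(a.value) + float(b.value))


one = Atomic("xs:integer", 1)
print(add(Atomic("xdt:untypedAtomic", "41"), one).value)  # 42.0

try:
    add(Atomic("xs:string", "41"), one)
except TypeError as err:
    print("type error:", err)
```

Both inputs carry the same characters 41. Only the type annotation decides whether the addition succeeds.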
A promotable type is a strong type that, for certain required types, acts like a weak type by implicitly coercing into the required type. For example, in the XQuery type system, xs:decimal (and its subtypes) can be promoted to xs:float or xs:double, and xs:float can be promoted to xs:double. When the integer value 42 is passed to a function requiring a value of type xs:double, it is promoted to the double 42e0 before being passed to the function.
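Here is a rough Python sketch of how argument passing with promotion could look. The tables and function names are my own, and the derivation table covers only a handful of types. Python has no single-precision float, so xs:float values are represented as ordinary Python floats.

```python
# Declared derivation chain for a few types (derived type -> base type).
BASE = {
    "xs:int": "xs:long",
    "xs:long": "xs:integer",
    "xs:integer": "xs:decimal",
}

# Promotion rules: source type -> set of types it may be promoted to.
PROMOTABLE_TO = {
    "xs:decimal": {"xs:float", "xs:double"},
    "xs:float": {"xs:double"},
}


def is_subtype(actual, required):
    """Named subtyping: follow the declared chain from actual upward."""
    current = actual
    while current is not None:
        if current == required:
            return True
        current = BASE.get(current)
    return False


def pass_argument(type_name, value, required):
    """Return the (type, value) pair the function body would receive."""
    if is_subtype(type_name, required):
        return (type_name, value)  # matches as-is, no conversion
    for source, targets in PROMOTABLE_TO.items():
        if is_subtype(type_name, source) and required in targets:
            return (required, float(value))  # promoted
    raise TypeError(f"{type_name} does not match {required}")


print(pass_argument("xs:integer", 42, "xs:double"))  # ('xs:double', 42.0)
print(pass_argument("xs:int", 7, "xs:integer"))      # ('xs:int', 7)
```

Promotion only goes in one direction in this sketch: passing an xs:double where xs:float is required, or an xs:string where xs:double is required, raises a TypeError.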
Dynamic typing means that type errors are detected at runtime by checking the actual types of the values against the required types (the XQuery spec says that the value needs to match the required type).
Static typing means that type errors are detected during compilation by checking that the inferred type is a subtype of the required type. There are two categories of static typing: conservative static typing and - what I call - partial static typing.
Conservative static typing means that all type errors are raised statically, and that an error is raised whenever the inferred type is not a subtype of the required type. This means that even expressions that would work at runtime some or even most, but not all, of the time may be rejected. For example, a call to a function that requires a type T for its argument will fail statically if the inferred type of the argument is an optional T (meaning that the empty sequence may be passed as well).
Partial static typing, on the other hand, is more lenient. It raises an error statically only if the operation would raise a type error in every possible case at runtime. Thus it checks whether the intersection of the inferred type and the required type is empty. All other type checks are deferred to runtime. Since this leads to the same type errors as dynamic typing, partial static typing can be seen as an optimization of the dynamic typing case, where type errors that can be detected statically will be reported early.
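The three checking strategies can be compared in a small Python sketch. As a simplifying assumption, a type is modeled as the set of possibilities it allows, with "empty" standing for the empty sequence, so an optional T is the set {"T", "empty"}. All function names are hypothetical.

```python
def conservative_static_ok(inferred, required):
    """Accept only if every possible value is acceptable (subset check)."""
    return inferred <= required


def partial_static_ok(inferred, required):
    """Reject only if no possible value is acceptable (empty intersection)."""
    return len(inferred & required) > 0


def dynamic_ok(actual, required):
    """Runtime check of the one actual type against the required type."""
    return actual in required


required = {"T"}
optional_t = {"T", "empty"}   # inferred type: optional T
unrelated = {"xs:string"}     # inferred type that can never match

print(conservative_static_ok(optional_t, required))  # False: rejected at compile time
print(partial_static_ok(optional_t, required))       # True: deferred to runtime
print(partial_static_ok(unrelated, required))        # False: can never succeed
print(dynamic_ok("T", required))                     # True
print(dynamic_ok("empty", required))                 # False: runtime type error
```

For the optional T, the conservative check rejects the expression outright, while the partial check lets it through and leaves the decision to the runtime check, which fails only if the empty sequence actually shows up.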
What is XPath 1.0's type system?
XPath 1.0 had a very limited type universe without subtyping. Thus the distinction between named and structural typing does not apply. The types themselves were all weak types that would automatically coerce, sometimes even based on what the instance looked like.
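As an illustration of this kind of coercion, here is a simplified Python sketch of how XPath 1.0 turns operands into numbers before adding them. The function names are made up, and node-sets are left out entirely.

```python
import math
import re

# XPath 1.0 number syntax: optional whitespace, optional minus sign,
# digits with an optional fraction (or a leading dot), optional whitespace.
_XPATH1_NUMBER = re.compile(r"[ \t\r\n]*-?(\d+(\.\d*)?|\.\d+)[ \t\r\n]*")


def xpath1_number(value):
    """Coerce a boolean, number, or string the way XPath 1.0 number() does."""
    if isinstance(value, bool):
        return 1.0 if value else 0.0
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        # A string that looks like a number becomes that number,
        # anything else becomes NaN instead of raising an error.
        if _XPATH1_NUMBER.fullmatch(value):
            return float(value)
        return math.nan
    raise TypeError("node-sets are not modeled in this sketch")


def xpath1_add(left, right):
    return xpath1_number(left) + xpath1_number(right)


print(xpath1_add("41", 1))   # 42.0
print(xpath1_add("abc", 1))  # nan
print(xpath1_add(True, 1))   # 2.0
```

The string "41" is accepted by the addition only because it happens to look like a number, and the string "abc" quietly turns into NaN rather than causing a type error.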
So what does XQuery 1.0's and XPath 2.0's type system provide?
XQuery's type system is based on named typing. It provides weak types, promotable types, and strong types, and it gives implementations the choice of implementing either dynamic typing or conservative static typing, with partial static typing as an optimization option (which is therefore not explicitly called out as an option in the XQuery specification).
XPath 2.0's backward-compatibility mode adds the ability to recover from some type errors, making strong types behave like weak types in order to stay closer to the weak typing semantics of XPath 1.0.
In the next posting, we will look at the XQuery type system a bit more closely to see which types exist and how they fit into the different categories of types. We will also look at the reasons for dynamic and static typing and finally at how XQuery in SQL Server 2005 will implement the type system and how I think we should evolve it over time.
Stay tuned and send me your questions in the meantime.