XQuery design controversy: What should element(foo) mean in SequenceType
During the last call phase of XQuery, many interesting issues have been brought up, some of which I agree with and some of which I don't.
One of the most controversial ones that I care about is the semantics of the so-called SequenceType production element(foo). The comment is MS-XQ-LC1-041. A discussion about that topic starts here. In the following, I will give you a quick introduction to the issue and urge you, as a general XQuery user, to tell me which design you prefer by commenting here, sending me an email, or contributing to the discussion in the W3C forum. Questions are welcome as well.
If you see an expression such as
/a/*[. instance of element(b)] (: E1 :)
or
(: E2 :)
declare function foo($x as element(b)?) as element(c)?
{ $x/c }
foo(doc("some-uri")/a/b)
what would you expect?
Would you expect this to work if you operate on a non-validated document?
I would. I would for the following reasons:
1. The simpler the semantics of the construct, the simpler the language syntax should be and vice versa.
2. Users that operate only on untyped (non-validated) documents should not be forced to think about types.
However, the W3C spec currently says (I removed some bugs and some non-essential wording; note the third form, element(ElementName)):
An ElementTest may take one of the following forms:
- element(), element(*), and element(*,*) match any single element node, regardless of its name or type.
- element(ElementName, TypeName) matches a given element node if:
  - the name of the element node matches ElementName, and:
  - type-matches(TypeName, AT) is true, where AT is the type of the given element node. However, if the given element node has the nilled property, then this rule is satisfied only if TypeName is followed by the keyword nillable.
  For this form, there is no requirement that ElementName be defined in the in-scope element declarations.
  Example: element(person, surgeon) matches a non-nilled element node whose name is person and whose type annotation is surgeon.
  Example: element(person, surgeon nillable) matches an element node whose name is person and whose type annotation is surgeon, and permits the element node to have the nilled property.
- element(ElementName) matches an element node if:
  - the name of the element node matches ElementName or matches the name of an element in a substitution group headed by an element with the name ElementName, and:
  - type-matches(ST, AT) is true, where ST is the simple or complex type of element ElementName in the in-scope element declarations, and AT is the type of the given element node. However, if the given element node has the nilled property, then this rule is satisfied only if ST includes the nillable option.
  Example: element(person) matches an element node whose name is person and whose type matches the type of the top-level person element declaration in the in-scope element declarations.
- element(ElementName, *) matches an element node of any type if the name of the element matches ElementName or matches the name of an element in a substitution group headed by an element with the name ElementName.
  For this form, there is no requirement that ElementName be defined in the in-scope element declarations.
  Example: element(person, *) matches any element node whose name is person, regardless of its type.
- ...
So this means that the above examples E1 and E2 would only work if a declaration of a global element b is loaded into the ISSD (the static type context) and the expressions passed in return an element that has been validated against that ISSD element declaration.
Now maybe I don't understand language design, but this definition clearly violates the two design principles I listed above. Instead you have to write either element(b, *), element(b, xs:anyType) (both meaning any element with any type) or element(b, xdt:untyped) (only unvalidated elements).
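To make the difference concrete, here is a minimal Python sketch of the current draft rules for element(b), element(b, *) and element(b, TypeName). It is a toy model, not an XQuery processor: the type bType, the BASE hierarchy, the Element class and the function names are all made up for illustration, type-matches is reduced to "is the same type or derives from it", and substitution groups and nilled elements are ignored.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Toy type hierarchy mapping each type to its base type.
# "bType" is a hypothetical schema type used only for illustration.
BASE: Dict[str, str] = {"xdt:untyped": "xs:anyType", "bType": "xs:anyType"}


def type_matches(expected: str, actual: str) -> bool:
    """Simplified type-matches: actual is expected or derives from it."""
    current: Optional[str] = actual
    while current is not None:
        if current == expected:
            return True
        current = BASE.get(current)
    return False


@dataclass
class Element:
    name: str
    type: str = "xdt:untyped"  # annotation of a non-validated element


def element_test(node: Element, name: str, type_name: Optional[str] = None,
                 issd: Optional[Dict[str, str]] = None) -> bool:
    """Model element(name), element(name, *) and element(name, type_name).

    issd maps global element names to their declared type.
    """
    if node.name != name:
        return False
    if type_name == "*":
        return True  # element(name, *): any type annotation
    if type_name is not None:
        return type_matches(type_name, node.type)
    # element(name): requires a global declaration in the ISSD. A missing
    # declaration is reported as "no match" here (an assumption of the sketch).
    declared = (issd or {}).get(name)
    return declared is not None and type_matches(declared, node.type)


untyped_b = Element("b")             # from a non-validated document
validated_b = Element("b", "bType")  # validated against the declaration of b
issd = {"b": "bType"}

print(element_test(untyped_b, "b", issd=issd))      # element(b), untyped data
print(element_test(validated_b, "b", issd=issd))    # element(b), validated data
print(element_test(untyped_b, "b", "*"))            # element(b, *)
print(element_test(untyped_b, "b", "xdt:untyped"))  # element(b, xdt:untyped)
```

Under these assumptions only the first call fails: the plain element(b) test rejects the untyped element even when a declaration for b is in scope, while the longer forms accept it.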
I would prefer the following rules. They adhere to the two design principles outlined above, make it syntactically clear in all cases (and not just some, as today) when one depends on the static context for element name matching, and incidentally simplify the syntax somewhat by no longer requiring a wildcard on types.
The ISSD-unaware element tests, i.e., the forms for which there is no requirement that ElementName be defined in the in-scope element declarations, are:
- element() matches any single element node, regardless of its name or type.
- element(ElementName) matches a single element node if the name of the element node matches ElementName.
- element(ElementName, TypeName) matches a single element node if:
  - the name of the element node matches ElementName, and
  - derives_from(AT, TypeName) is true (AT is derived from TypeName), where AT is the type of the given element node. Note that since we do not access the ISSD to look up an element declaration, we cannot require an element to be nillable.
  Example: element(person, surgeon) matches an element node whose name is person and whose type annotation is surgeon or a derived type thereof.
- element(*, TypeName) matches a single element node regardless of its name, if derives_from(AT, TypeName) is true (AT is derived from TypeName), where AT is the type of the given element node.
  Example: element(*, surgeon) matches any non-nilled element node whose type annotation is surgeon, regardless of its name.
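The four ISSD-unaware forms can be modelled in a few lines of Python. This is only a sketch of the proposal as worded above: the Element class, the BASE hierarchy and the type chiefSurgeon are hypothetical, and derives_from is taken to include the type itself, as the surgeon example suggests. Nilled elements are not modelled.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Toy hierarchy: "chiefSurgeon" is a hypothetical type derived from surgeon.
BASE: Dict[str, str] = {
    "xdt:untyped": "xs:anyType",
    "surgeon": "xs:anyType",
    "chiefSurgeon": "surgeon",
}


def derives_from(actual: str, expected: str) -> bool:
    """True if actual is expected or is derived from it."""
    current: Optional[str] = actual
    while current is not None:
        if current == expected:
            return True
        current = BASE.get(current)
    return False


@dataclass
class Element:
    name: str
    type: str = "xdt:untyped"  # annotation of a non-validated element


def element_test(node: Element, name: Optional[str] = None,
                 type_name: Optional[str] = None) -> bool:
    """Proposed ISSD-unaware tests; no schema lookup happens anywhere.

    element()      -> element_test(node)
    element(N)     -> element_test(node, "N")
    element(N, T)  -> element_test(node, "N", "T")
    element(*, T)  -> element_test(node, "*", "T")
    """
    if name not in (None, "*") and node.name != name:
        return False
    return type_name is None or derives_from(node.type, type_name)


untyped_person = Element("person")
chief = Element("person", "chiefSurgeon")

print(element_test(untyped_person, "person"))             # element(person)
print(element_test(chief, "person", "surgeon"))           # element(person, surgeon)
print(element_test(untyped_person, "person", "surgeon"))  # untyped is not a surgeon
print(element_test(chief, "*", "surgeon"))                # element(*, surgeon)
```

The point of the sketch is that the name-only test never consults a schema, so it behaves the same on validated and non-validated documents.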
The following element tests require that ElementName is defined in the in-scope element declarations. To make the relationship to the second and third tests above (element(ElementName) and element(ElementName, TypeName)) clear, the case of a single global element name is marked explicitly in the syntax.
- element(global ElementName) [Alternate syntax could be element(/ElementName)] matches a single element node if:
  - the name of the element node matches ElementName or matches the name of an element in a substitution group headed by an element with the name ElementName, and
  - derives_from(AT, ST) is true, where ST is the simple or complex type of element ElementName in the in-scope element declarations, and AT is the type of the given element node. However, if the given element node has the nilled property, then this rule is satisfied only if ST includes the nillable option.
  Example: element(global person) matches an element node whose name is person or has an element name that is a member of the substitution group of person and whose type is derived from the type of the top-level person element declaration in the in-scope element declarations.
- element(global ElementName, TypeName) matches a single element node if:
  - the name of the element node matches ElementName, or matches the name of an element in a substitution group headed by an element with the name ElementName, and
  - derives_from(AT, TypeName) and derives_from(TypeName, ST) are both true, where ST is the simple or complex type of element ElementName in the in-scope element declarations, and AT is the type of the given element node. However, if the given element node has the nilled property, then this rule is satisfied only if TypeName is followed by the keyword nillable.
  Example: element(global person, surgeon) matches a non-nilled element node whose name is person or has an element name that is a member of the substitution group of person and whose type annotation is surgeon or a derived type thereof.
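The two ISSD-aware forms can be sketched the same way. Again this is a toy Python model under stated assumptions: the Decl record, the employee element, the personType type and the function names are hypothetical, the ISSD is a plain dictionary, and a missing declaration is modelled as a KeyError, which is a choice of this sketch and not something the proposal specifies.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Toy hierarchy; "personType" is a hypothetical declared type of person.
BASE: Dict[str, str] = {
    "xdt:untyped": "xs:anyType",
    "personType": "xs:anyType",
    "surgeon": "personType",
}


def derives_from(actual: str, expected: str) -> bool:
    """True if actual is expected or is derived from it."""
    current: Optional[str] = actual
    while current is not None:
        if current == expected:
            return True
        current = BASE.get(current)
    return False


@dataclass
class Decl:                      # a global element declaration in the ISSD
    type: str
    nillable: bool = False
    head: Optional[str] = None   # head of its substitution group, if any


@dataclass
class Element:
    name: str
    type: str = "xdt:untyped"
    nilled: bool = False


def in_substitution_group(issd: Dict[str, Decl], name: str, head: str) -> bool:
    """True if name is head or is (transitively) substitutable for it."""
    seen = set()
    current: Optional[str] = name
    while current is not None and current not in seen:
        if current == head:
            return True
        seen.add(current)
        decl = issd.get(current)
        current = decl.head if decl else None
    return False


def global_element_test(node: Element, name: str, issd: Dict[str, Decl],
                        type_name: Optional[str] = None,
                        nillable: bool = False) -> bool:
    """element(global name) and element(global name, type_name [nillable])."""
    decl = issd[name]  # KeyError if name is not declared (sketch's choice)
    if not in_substitution_group(issd, node.name, name):
        return False
    if type_name is None:
        if node.nilled and not decl.nillable:
            return False
        return derives_from(node.type, decl.type)
    if node.nilled and not nillable:
        return False
    return (derives_from(node.type, type_name)
            and derives_from(type_name, decl.type))


issd = {
    "person": Decl("personType"),
    "employee": Decl("personType", head="person"),  # hypothetical member
}
print(global_element_test(Element("employee", "surgeon"), "person", issd))
print(global_element_test(Element("person"), "person", issd))
```

Here the first call succeeds through the substitution group, and the second fails because an untyped element does not derive from the declared type, which is exactly the dependency on validation that the global keyword is meant to make visible.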
While this proposal introduces a new global ISSD indicator (either global or /), it gets rid of the * in the type position. It also makes it syntactically clear when an ISSD context is requested and integrates with the notion of the schema context (which is an issue that I will write about at a later time).
Since Jonathan Robie has argued against this proposal and does not believe that users will find this simpler than the current specification, I would like to ask you, as prospective users, to provide me with your comments and preference.