Implementing XQuery Cardinalities
June 2, 2006
How cardinalities is implemented is one of the most influential parts of an XQuery implementation's type system. Patternist, KDE's XQuery 1.0/XSL-T 2.0/XPath 2.0, is about to get its cardinality handling rewritten for the second time.
What is a cardinality in the context of XQuery? In the XQuery/XPath Data Model, data, which always is a sequence of "items", has a type, a so called sequence type, which consists of two parts: an item type, and a cardinality. An item type, for example xs:integer, describes what kind each item is, while the cardinality describes the size of the sequence. For example, the following sequence type:
xs:base64Binary+
describes a sequence of items where each item is a chunk of xs:base64Binary data, and where the sequence's length is one or larger. text()?, to take a second example, describes none or exactly one text node.
In the XPath 2.0/XQuery 1.0 language, sequence types are used for casting between different types, declaring a function's argument(s), and so on. Behind the curtains of an implementation, cardinalities has a more visible role than in the language, because it is the life and bread of the type inferencing and optimization, areas XQuery was explicitly designed for.
Originally, Patternist's implementation was rather similar to Saxon's implementation. By being an enum with the flags Empty, ExactlyOne, TwoOrMore OR'd together to create the cardinalities zero-or-one, exactly-one, zero-or-more, it was very simple, capable, and performance-wise fast.
But after having brought it to use, I was nevertheless not very happy with it. Bitwise-operators aren't very readable(and in C++, easily error prone) and if one wants to build functionality around it one must do it in a functional way, since object oriented thinking is a no go. So I switched the cardinality handling to a set of singleton instances of a class that used an enum internally and object-address comparison for equalness testing. This surely increased readability. However, it was probably a bit slower, the implementation messy, and it was not more expressive than using an enum directly.
Most regular expression-dialects allows cardinalities to be expressed in ranges, instead of just the kleene operators(+, ?, *). XQuery doesn't allow ranges to be expressed(but perhaps it will in a later version). Nevertheless, there are two areas where ranges are useful: in data that is modelled with W3C XML Schema, and inside the implementation.
So, I sketched a third incarnation, which allows arbitrary ranges to be expressed. It has primarily two advantages: it is sufficiently expressive to handle data types in W3C XML Schema if we ever get such a stack, and it opens up for a whole range of optimizations and helpful compiler messages. For example, the following expression:
for $i in (1, 2, 3) return ($i, $i + 1)
would have the static type xs:integer[6], which is quite a bit more exact than "one or more integers".
And that would rock.