Implementing XPath’s Builtin Functions

October 31, 2006

XPath 2.0/XQuery has, as many other languages, a set of builtin functions, that the implementation somehow needs to provide. One way is to implement them in the host language. Another approach is to directly use XPath constructs, to some degree. What approach is best?

The more academic correct question is perhaps: should a builtin function be an opaque, black box in the intermediate representation(IR) or should the implementation be a first-class citizen in the IR? Of course, the answer is not as simple as the question.

In Patternist, an XPath/XQuery implementation in C++, functions are exclusively coded in C++ as opposed to XQuery in some form. For example, for fn:resolve-uri() this function implements the core functionality(the comment is unfortunately true!):

Item::Ptr ResolveURIFN::evaluateSingleton(const DynamicContext::Ptr &context) const
{
    const Item::Ptr rel(m_operands.first()->evaluateSingleton(context));

    /* TODO: Verify that base and relative are valid xs:anyURI-values. */

    if(rel)
    {
        const Item::Ptr base(m_operands.last()->evaluateSingleton(context));
        return AnyURI::resolveURI(rel->stringValue(), base->stringValue());
    }
    else
        return Item::Ptr();
}

In addition this is run at compile time(of the query):

Expression::Ptr ResolveURIFN::typeCheck(const StaticContext::Ptr &context,                                        const SequenceType::Ptr &reqType)
{
    Q_ASSERT(m_operands.count() == 1 || m_operands.count() == 2);

    if(m_operands.count() == 1)
    {
        if(context->baseURI())
            m_operands.append(wrapLiteral(context->baseURI()));
        else
        {
            context->issueStaticError(i18n("No base URI was supplied and the static base "
                                                                   "URI is undefined."),
                                                            ReportContext::FONS0005);
            return Expression::Ptr(this);
        }
    }

    return FunctionCall::typeCheck(context, reqType);
}

And this snippet sets up the function signature, and declares that the cardinality of the return type can be infered from the first argument(among other things):

s = addNewFunctionWithProp("resolve-uri", 1, 2, CommonSequenceTypes::ZeroOrOneAnyURI,
                                                      Expression::EmptynessFollowsChild | Expression::RewriteToEmptyOnEmpty);
s->appendArgument(QLatin1String("relative"), CommonSequenceTypes::ZeroOrOneString);
s->appendArgument(QLatin1String("base"), CommonSequenceTypes::ExactlyOneString);

So, that’s the required code to set things up, adapt the function call based on how the user called, do the necessary error checking and infer the static types.

Another approach is to write as much as possible in XQuery. Here’s fn:resolve-uri() again, this time implemented in XQuery:

declare function fn:resolve-uri($relative as xs:string?,
                                $base as xs:string = fn:static-base-uri()) as xs:anyURI?
{
    if($relative)
    then if($relative castable as xs:anyURI)
            then if($base castable as xs:anyURI)
                    then internal:core-resolve-uri($relative, $base)
                    else error(fn:QName('http://www.w3.org/2005/xqt-errors', 'err:FORG0002'),
                                     concat($base, " is not a valid xs:anyURI value."))
            else error(fn:QName('http://www.w3.org/2005/xqt-errors', 'err:FORG0002'),
                             concat($relative, " is not a valid xs:anyURI value."))
    else ()
};

So, the only thing the hypotethical function internal:core-resolve-uri() would do, is to check if $relative is relative or absolute, and then resolve it against $base. The rest is taken care of from the XQuery code.

The disadvantage of the C++ implementation is that it is rather dumb. Even if it is statically known(XQuery jargon for compile time) that $relative will never be the empty sequence it will always test whether it is empty: if(rel) … . It will also always validate that the arguments are valid xs:anyURI-values no matter what that is known at compile time.

However, when the XQuery version is in the abstract syntax tree and possibly also inlined, all this changes because Patternist’s optimizations kicks in. The best-case scenario is a single call to internal:core-resolve-uri():

  • The if($relative)-test is removed if $relative‘s static type is exactly-one
  • The castable as xs:anyURI tests are removed based on the item type
  • The return type is automatically infered by the first if-expression
  • Defaulting to the static base URI is automatically handled by the function argument.
  • The body is automatically rewritten to the empty sequence if $relative is that too, even if $base isn’t. Not that this is a common scenario.

That’s for the advantages. Unfortunately, I see spots too:

  • It’s not trivial to implement. Each query acting as function-implementation would be stored in the binary with QResources and compiled on request. Getting error reporting right is a topic in itself.
  • The compilation step is extremely expensive. Instead of some simple function call binding, it now requires the parser to be invoked and the costs associated with having a larger AST tree, and what not.
  • The medium and worst-case scenarios can be slower. A simple conditional test in C++ is cheap. Having an AST node that performs the test instead, involving virtual call dispatching and what not, is incomparably slower.

What seems to shoot down this idea is that even if the generated C++ code is inefficient and this can rather easily be detected at compile time, it is in many scenarios faster than the correctly generated code. Brute force wins.

In the end, whether it’s faster at all, depends on what kind of queries that’s written. Primarily how well-typed they are.

There are many attempts to improve the compilation of XQuery queries. Michael Kay has gotten far and apparently rather successfully with compiling XQuery into Java code in Saxon. But unfortunately this move doesn’t seem to be one of the smarter ones. At least not in Patternist’s case. But perhaps if you generate Java code from the resulting compilation?

6 Responses to “Implementing XPath’s Builtin Functions”

  1. Vincent Says:

    Very interesting entry…

  2. Aron Stansvik Says:

    As always 🙂

  3. Michael Kay Says:

    Yes, it’s an interesting idea – not one that I’ve pursued. In XPath 1.0 Saxon started off with custom Java implementations of all the 25 or so functions – which was probably necessary as there was very little redundancy present. I’ve stuck to that approach in 2.0 even though there’s much more redundancy now. It would certainly be nice to find better ways of doing it; but it’s difficult in practice because there are so many opportunities for special-casing, e.g. in the static type inferencing. I just fall back to using inheritance and delegation and table-driven code as much as possible to get reuse of common logic where possible.


  4. Nice post indeed, its quite informative.

  5. Idetrorce Says:

    very interesting, but I don’t agree with you
    Idetrorce


  6. I’ve read some just right stuff here. Certainly price bookmarking for revisiting. I surprise how a lot effort you place to make this sort of wonderful informative web site.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: