<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Profound Titles No One Gets</title>
	<atom:link href="http://englich.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://englich.wordpress.com</link>
	<description>A well-defined blog.</description>
	<lastBuildDate>Wed, 13 Apr 2011 21:46:58 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='englich.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Profound Titles No One Gets</title>
		<link>http://englich.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://englich.wordpress.com/osd.xml" title="Profound Titles No One Gets" />
	<atom:link rel='hub' href='http://englich.wordpress.com/?pushpress=hub'/>
		<item>
		<title>XML to QObjects: QXmlToQObjectCreator</title>
		<link>http://englich.wordpress.com/2008/10/23/xml-to-qobjects-qxmltoqobjectcreator/</link>
		<comments>http://englich.wordpress.com/2008/10/23/xml-to-qobjects-qxmltoqobjectcreator/#comments</comments>
		<pubDate>Thu, 23 Oct 2008 08:44:29 +0000</pubDate>
		<dc:creator>englich</dc:creator>
				<category><![CDATA[HTML/XML/XHTML]]></category>
		<category><![CDATA[Qt]]></category>
		<category><![CDATA[QtXmlPatterns]]></category>

		<guid isPermaLink="false">http://englich.wordpress.com/?p=58</guid>
		<description><![CDATA[Thank you to all who attended Dev Days 2008 in Munich. For me it was really great to talk to so many users and hear about all the baffling projects that people pull off with Qt. And of course, to hear how people use Qt&#8217;s XML support and what they need from it. One [...]]]></description>
			<content:encoded><![CDATA[<p>Thank you to all who attended Dev Days 2008 in Munich. For me it was really great to talk to so many users and hear about all the baffling projects that people pull off with Qt. And of course, to hear how people use Qt&#8217;s XML support and what they need from it.</p>
<p>One customer described how sub-classes of <a href="http://doc.trolltech.com/4.4/qobject.html">QObject</a> are used for representing data, and how they are created from XML. So, why not add a little helper class to Qt for this?</p>
<p>The class, which is currently only a research idea and missed the feature freeze for Qt 4.5, is called QXmlToQObjectCreator, and hopefully the documentation explains it all:</p>
<p><a href="http://englich.files.wordpress.com/2008/10/qxmltoqobjectcreator.pdf">QXmlToQObjectCreator API Documentation</a></p>
<p>In other words, it&#8217;s a very simple class that builds a QObject tree corresponding to the output of <a href="http://doc.trolltech.com/qtextended4.4/qxmlquery.html">QXmlQuery</a>. The code sketched so far is pasted <a href="http://rafb.net/p/n0tkM811.html">here</a>, for those interested.</p>
<p>In what way can this class be made more useful?</p>
]]></content:encoded>
			<wfw:commentRss>http://englich.wordpress.com/2008/10/23/xml-to-qobjects-qxmltoqobjectcreator/feed/</wfw:commentRss>
		<slash:comments>16</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/d6f9c624bb5dedc3b2a1d26c7e33c451?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">englich</media:title>
		</media:content>
	</item>
		<item>
		<title>XSL-T and Qt</title>
		<link>http://englich.wordpress.com/2008/09/10/xsl-t-and-qt/</link>
		<comments>http://englich.wordpress.com/2008/09/10/xsl-t-and-qt/#comments</comments>
		<pubDate>Wed, 10 Sep 2008 14:35:53 +0000</pubDate>
		<dc:creator>englich</dc:creator>
				<category><![CDATA[HTML/XML/XHTML]]></category>
		<category><![CDATA[Qt]]></category>
		<category><![CDATA[QtXmlPatterns]]></category>

		<guid isPermaLink="false">http://englich.wordpress.com/?p=49</guid>
		<description><![CDATA[A couple of weeks ago, I merged the development branch for XSL-T into our main line, heading for Qt 4.5. The idea is that Qt will carry an XSL-T 2.0 implementation that, as usual, is cross-platform, solidly documented, and easy to use. Using it should be straightforward. Either on the command line: xmlpatterns yourStylesheet.xsl [...]]]></description>
			<content:encoded><![CDATA[<p>A couple of weeks ago, I merged the development branch for <a href="http://en.wikipedia.org/wiki/XSL-T">XSL-T</a> into our main line, heading for Qt 4.5. The idea is that Qt will carry an XSL-T 2.0 implementation that, as usual, is cross-platform, solidly documented, and easy to use.</p>
<p>Using it should be straightforward. Either on the command line:</p>
<pre>xmlpatterns yourStylesheet.xsl yourInputDocument -param myParam=myValue</pre>
<p>Or using the C++ API[1]:</p>
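<pre>// Sketch: evaluate an XSL-T 2.0 stylesheet with a bound parameter.
QXmlQuery myQuery(QXmlQuery::XSLT20);
// Bind the parameter first, then load the stylesheet.
myQuery.bindVariable("myParam", QVariant("myValue"));
// Wrap the address in a QUrl so it is fetched, not parsed as stylesheet source.
myQuery.setQuery(QUrl("http://example.com/myStylesheet.xsl"));
QFile out("outFile.xml");
out.open(QIODevice::WriteOnly);

myQuery.evaluateTo(&amp;out);</pre>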
<p>See the <a href="http://doc.trolltech.com/main-snapshot/qxmlquery.html">documentation for the QXmlQuery</a> class on the overloads available for setQuery() and evaluateTo(), for instance.</p>
<p>However, because XSL-T 2.0 is such a beast (I agree that it&#8217;s larger than XQuery), we&#8217;ve decided to do this according to the &#8220;release early, release often&#8221; approach. The first release, in Qt 4.5, will carry a subset, which will subsequently be extended in Qt 4.6. The current status is documented on the main page for the QtXmlPatterns module, which can be viewed in the <a href="http://doc.trolltech.com/main-snapshot/qtxmlpatterns.html#features-and-conformance">documentation snapshot</a>.</p>
<p>Therefore, while the current implementation probably falls short on more complex applications (such as DocBook XSL), it can run simpler things, users can plan ahead, and we trolls can receive feedback on which features/APIs are missing, and what needs focus. So feel free to do that: send a mail to qt-bugs@trolltech.com, or say hello on IRC (FransE, on Freenode).</p>
<p>The code is accessible through the <a href="http://trolltech.com/developer/downloads/qt/snapshots">Qt snapshots</a>.</p>
<h1>What is XSL-T anyway?</h1>
<p>XSL-T is a programming language for transforming XML into XML, HTML or text. Some implementations, such as QtXmlPatterns or Saxon, provide mechanisms to map XML to other data sources and hence widen the scope of the language by letting the XML act as an abstract interface. Wikipedia has a good article on <a href="http://en.wikipedia.org/wiki/XSL-T">XSL-T</a>. Version 2.0 of XSL-T extends the language heavily by putting a rigid type system and data model in the backbone, and adds many features that were painful to be without when programming in XSL-T 1.0. XSL-T 2.0 uses XPath 2.0, and shares the same large <a href="http://www.w3.org/TR/xpath-functions/">function library</a> as XQuery.</p>
<p>1. Over time, Java bindings through QtJambi and ECMAScript bindings through QtScript will likely arrive.</p>
]]></content:encoded>
			<wfw:commentRss>http://englich.wordpress.com/2008/09/10/xsl-t-and-qt/feed/</wfw:commentRss>
		<slash:comments>21</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/d6f9c624bb5dedc3b2a1d26c7e33c451?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">englich</media:title>
		</media:content>
	</item>
		<item>
		<title>QIODevice and QXmlQuery</title>
		<link>http://englich.wordpress.com/2007/12/11/qiodevice-and-qxmlquery/</link>
		<comments>http://englich.wordpress.com/2007/12/11/qiodevice-and-qxmlquery/#comments</comments>
		<pubDate>Tue, 11 Dec 2007 14:12:16 +0000</pubDate>
		<dc:creator>englich</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://englich.wordpress.com/2007/12/11/qiodevice-and-qxmlquery/</guid>
		<description><![CDATA[I have not yet seen an API for XQuery in which integrating the data model, atomic values, nodes and all, into the interfacing language has been a walk in the park. At the top of the list of things people tend to ask on forums is &#8220;How do I get XML represented as [...]]]></description>
			<content:encoded><![CDATA[<p>I have not yet seen an API for XQuery in which integrating the data model, atomic values, nodes and all, into the interfacing language has been a walk in the park.</p>
<p>At the top of the list of things people tend to ask on forums is &#8220;How do I get XML represented as a sequence of bytes in Java/C++ into my query?&#8221; The desired result is clear, a tree fragment for the query to operate on, but how to get there is not that obvious, if you ask me.</p>
<p>There is no &#8220;bytestream&#8221; type in XQuery. Should the user build the tree herself and then pass the tree to the query? Should the implementation in some voodooish way be instructed how to treat a string or custom type? Shouldn&#8217;t the query engine do it, so that its scope of analysis is increased and it&#8217;s done the way the engine prefers?</p>
<p>What I sense has been the problem with some solutions is that they mix the data, the bytestream, with its interpretation.</p>
<p>In Qt this manifests itself in the question of how the content of a <a href="http://doc.trolltech.com/4.1/qiodevice.html">QIODevice</a> should appear in a <a href="http://doc.trolltech.com/main-snapshot/qxmlquery.html">QXmlQuery</a>. The way it&#8217;s now provided is that when a QIODevice is bound to a variable using <a href="http://doc.trolltech.com/main-snapshot/qxmlquery.html#bindVariable-2">QXmlQuery::bindVariable()</a>, the query sees a URI (an instance of <code>xs:anyURI</code>) which behind the scenes maps to the QIODevice the user bound. Hence, if the purpose is to build an XML document, one passes the URI to the builtin <a href="http://www.w3.org/TR/xpath-functions/#func-doc">fn:doc()</a> function.</p>
<p>I hope this is clean. Since it&#8217;s handled like any other URI, custom extensions stay at a minimum, error reporting is consistent, and the interpretation hasn&#8217;t been coupled with the data. For instance, later on I hope to merge in support for XInclude and XQuery Update, and in those cases the URI is again simply passed to, for instance, <a href="http://www.w3.org/TR/xquery-update-10/#id-func-put">fn:put()</a>.</p>
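<p>The idea of keeping the bytes and their interpretation apart can be sketched in plain C++. This is only an illustration under stated assumptions, not how QtXmlPatterns does it internally: <code>StreamRegistry</code>, <code>readAll()</code> and the <code>bound-stream:</code> URI scheme are made-up names, and <code>std::istream</code> stands in for QIODevice.</p>

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical registry: hands out a URI for each bound stream and
// resolves that URI back to the stream later on.
class StreamRegistry {
public:
    // Binds a stream to a variable name and returns the URI the query would see.
    std::string bind(const std::string &variableName, std::istream &stream) {
        const std::string uri = "bound-stream:" + variableName; // made-up scheme
        streams_[uri] = &stream;
        return uri;
    }

    // Returns the stream behind a URI, or nullptr if the URI is unknown.
    std::istream *resolve(const std::string &uri) const {
        const auto it = streams_.find(uri);
        return it == streams_.end() ? nullptr : it->second;
    }

private:
    std::map<std::string, std::istream *> streams_;
};

// Stand-in for what a doc()-like function does: interpret the bytes.
// Here the "interpretation" is just reading everything into a string.
std::string readAll(const StreamRegistry &registry, const std::string &uri) {
    std::istream *stream = registry.resolve(uri);
    assert(stream != nullptr && "URI was never bound");
    std::ostringstream buffer;
    buffer << stream->rdbuf();
    return buffer.str();
}
```

<p>The caller only ever passes the URI around. Whether the bytes end up parsed as a document, included, or written to is decided by whichever function receives that URI.</p>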
<p>One can lean quite heavily on URIs and the abstraction the XPath Data Model provides, it seems.</p>
]]></content:encoded>
			<wfw:commentRss>http://englich.wordpress.com/2007/12/11/qiodevice-and-qxmlquery/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/d6f9c624bb5dedc3b2a1d26c7e33c451?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">englich</media:title>
		</media:content>
	</item>
		<item>
		<title>Query Your Toaster</title>
		<link>http://englich.wordpress.com/2007/11/15/query-your-toaster/</link>
		<comments>http://englich.wordpress.com/2007/11/15/query-your-toaster/#comments</comments>
		<pubDate>Thu, 15 Nov 2007 10:52:34 +0000</pubDate>
		<dc:creator>englich</dc:creator>
				<category><![CDATA[HTML/XML/XHTML]]></category>
		<category><![CDATA[Qt]]></category>
		<category><![CDATA[QtXmlPatterns]]></category>
		<category><![CDATA[Software development]]></category>

		<guid isPermaLink="false">http://englich.wordpress.com/2007/11/15/query-your-toaster/</guid>
		<description><![CDATA[People have asked for Qt&#8217;s XQuery &#38; XPath support to not be locked to a particular tree backend such as QDom, but to be able to work on arbitrary backends. Any decent implementation (such as XQilla or Saxon) provides that nowadays in some way or another, but I&#8217;d say Patternist&#8217;s approach is novel, with its own share [...]]]></description>
			<content:encoded><![CDATA[<p>People have asked for Qt&#8217;s XQuery &amp; XPath support to not be locked to a particular tree backend such as QDom, but to be able to work on arbitrary backends.</p>
<p>Any decent implementation (such as XQilla or Saxon) provides that nowadays in some way or another, but I&#8217;d say Patternist&#8217;s approach is novel, with its own share of advantages. So let me introduce what Qt&#8217;s snapshot carries.</p>
<blockquote>
<pre>&lt;ul&gt;
    {
        for $file in $exampleDirectory//file[@suffix = "cpp"]
        order by xs:integer($file/@size)
        return &lt;li&gt;
                    {string($file/@fileName)}, size: {string($file/@size)}
                  &lt;/li&gt;
    }
&lt;/ul&gt;</pre>
</blockquote>
<p>and the query itself was set up with:</p>
<blockquote>
<pre>QXmlQuery query;
FileTree fileTree(query.namePool());
query.setQuery(&amp;file, QUrl::fromLocalFile(file.fileName()));
query.bindVariable("exampleDirectory", fileTree.nodeFor(QLibraryInfo::location(QLibraryInfo::ExamplesPath)));
if(!query.isValid())
     return InvalidQuery;
QFile out;
out.open(stdout, QIODevice::WriteOnly);
query.serialize(&amp;out);</pre>
</blockquote>
<p>These two snippets are taken from the example found in examples/xmlpatterns/filetree/, which, with about 250 lines of code, virtualizes the file system as an XML document.</p>
<p>In other words, with the tree backend FileTree that the example has, it&#8217;s possible to query the file system, without converting it to a textual XML document or anything like that.</p>
<p>And that&#8217;s what the query does: it finds all the .cpp files on any level in Qt&#8217;s example directory, and generates an HTML list, ordered by their file size. Maybe generating a view for image files in a folder would have been a tad more useful.</p>
<p>The usual approach to this is an abstract interface/class for dealing with nodes. That brings disadvantages: heap allocations, and the need to create such node structures, which in turn can force changes to the implementation of whatever one is going to query.</p>
<p>But a long time ago Patternist was rewritten to use Qt&#8217;s items &amp; models pattern, which means any existing structure can be queried, without touching it. That&#8217;s what the FileTree class does: it subclasses <a href="http://doc.trolltech.com/main-snapshot/qsimplexmlnodemodel.html">QSimpleXmlNodeModel</a> and hands out <a href="http://doc.trolltech.com/main-snapshot/qxmlnodemodelindex.html">QXmlNodeModelIndex</a> instances, which are light, stack-allocated values.</p>
<p>Combined with the engine evaluating in a streamed and lazy manner to the degree that it thinks it can, this means fairly efficient solutions should be doable.</p>
<p>So what does this mean? It means that if you would like to, you can relatively cheaply use the XQuery language on top of your custom data structure, as long as it is somewhat hierarchical.</p>
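<p>To make the index idea concrete, here is a minimal sketch in standard C++. It is not the QSimpleXmlNodeModel API: <code>Folder</code>, <code>NodeIndex</code>, <code>FolderModel</code> and <code>countNodes()</code> are hypothetical names, chosen only to show how a small value type can point into an existing structure that itself stays unchanged.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// An existing application structure; it knows nothing about nodes or queries.
struct Folder {
    std::string name;
    std::vector<Folder> children;
};

// Small value type identifying one node. It only carries a pointer into the
// existing data, so it is cheap to copy and needs no heap allocation.
struct NodeIndex {
    const Folder *item;
    bool isNull() const { return item == nullptr; }
};

// Hypothetical model: answers tree questions about a NodeIndex by looking
// at the untouched Folder structure.
class FolderModel {
public:
    explicit FolderModel(const Folder &root) : root_(&root) {}

    NodeIndex root() const { return NodeIndex{root_}; }

    std::string name(NodeIndex index) const {
        assert(!index.isNull());
        return index.item->name;
    }

    std::size_t childCount(NodeIndex index) const {
        return index.item->children.size();
    }

    // Returns a null index when position is out of range.
    NodeIndex child(NodeIndex index, std::size_t position) const {
        if (position >= index.item->children.size())
            return NodeIndex{nullptr};
        return NodeIndex{&index.item->children[position]};
    }

private:
    const Folder *root_;
};

// Counts all nodes below and including 'index', using only the model API.
std::size_t countNodes(const FolderModel &model, NodeIndex index) {
    std::size_t total = 1;
    for (std::size_t i = 0; i < model.childCount(index); ++i)
        total += countNodes(model, model.child(index, i));
    return total;
}
```

<p>Anything that walks the tree, such as <code>countNodes()</code> here, talks to the model and passes indexes by value. The <code>Folder</code> type never has to inherit from a node interface.</p>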
<p>For instance, a backend could bridge the QObject tree, such that the XQuery language could be used to find Human Interface Guideline violations within widgets; molecular patterns in a chemistry application could be identified concisely with a two- or three-line XPath expression; and the <a href="http://doc.trolltech.com/main-snapshot/qtxmlpatterns.html">documentation</a> carries on with a couple of other examples. No need to convert QWidgets to nodes, or force a compact representation to sub-class an abstract interface.</p>
<p>A case I find intriguing would be a web robot that models the links between different pages as a graph, and finds invalid documents &amp; broken links using the <a href="http://www.w3.org/TR/xpath-functions/#func-doc-available">doc-available()</a> function, or reports URIs that a website shouldn&#8217;t be linking to (such as a public site referencing intranet pages).</p>
<p>Our API freeze is approaching. If something is needed but missing, let me know.</p>
]]></content:encoded>
			<wfw:commentRss>http://englich.wordpress.com/2007/11/15/query-your-toaster/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/d6f9c624bb5dedc3b2a1d26c7e33c451?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">englich</media:title>
		</media:content>
	</item>
		<item>
		<title>Integrating Compiler Messages</title>
		<link>http://englich.wordpress.com/2007/10/23/integrating-compiler-messages/</link>
		<comments>http://englich.wordpress.com/2007/10/23/integrating-compiler-messages/#comments</comments>
		<pubDate>Tue, 23 Oct 2007 09:26:32 +0000</pubDate>
		<dc:creator>englich</dc:creator>
				<category><![CDATA[Qt]]></category>
		<category><![CDATA[QtXmlPatterns]]></category>
		<category><![CDATA[Software development]]></category>

		<guid isPermaLink="false">http://englich.wordpress.com/2007/10/23/integrating-compiler-messages/</guid>
		<description><![CDATA[Attention to detail is OK, but compiler messages have historically not received it. Here&#8217;s an example of GCC&#8217;s output: qt/src/xml/query/expr/qcastingplatform.cpp: In member function 'bool CastingPlatform::prepareCasting(): qt/src/xml/query/expr/qcastas.cpp:117: instantiated from here qt/src/xml/query/expr/qcastingplatform.cpp:85: error: no matching function for call to 'locateCaster(int)' qt/src/xml/query/expr/qcastingplatform.cpp:93: note: candidates are: locateCaster(const bool&#38;) Typically compiler messages have been subject to crude printf approaches and [...]]]></description>
			<content:encoded><![CDATA[<p>Attention to detail is OK, but compiler messages have historically not received it. Here&#8217;s an example of GCC&#8217;s output:</p>
<blockquote><p><code>qt/src/xml/query/expr/qcastingplatform.cpp: In member function 'bool CastingPlatform::prepareCasting():<br />
qt/src/xml/query/expr/qcastas.cpp:117:   instantiated from here<br />
qt/src/xml/query/expr/qcastingplatform.cpp:85: error: no matching function for call to 'locateCaster(int)'<br />
qt/src/xml/query/expr/qcastingplatform.cpp:93: note: candidates are: locateCaster(const bool&amp;)</code></p></blockquote>
<p>Typically compiler messages have been subject to crude <code>printf</code> approaches and dignity has been left out: localization, translation, consistency in quoting style (for instance), adapting language to users (e.g., not phrasing things the way compiler engineers prefer), good English, and just generally looking sensible.</p>
<p>Solving that requires quite some work, which probably explains why it is often left out. To have line numbers, error codes, names of functions, and whatever else available and flowing through the system requires quite some plumbing and room in the design.</p>
<p>Another thing is that nowadays we really should expect that compiler messages within IDEs or other graphical applications should be sanely typeset. If not, we&#8217;ve lost ourselves in all this UNIX stuff. Keywords and important phrases should be italic, emphasized, colorized depending on the GUI style.</p>
<p>For shuffling compiler messages around it is customary to pass a set of properties: a URI, line number, column number, a descriptive string, and possibly an error code. Apart from falling short of the goals outlined in this text, that approach runs into a problem which I think is illustrated in the above example with GCC. What does one do if the message involves several locations?</p>
<p>Even if a message involves several locations, it is still one message and should be treated and presented as such. The approach of using a struct with properties falls short here, and chops the message into as many parts as it has locations.</p>
<p>For <a href="http://labs.trolltech.com/page/Projects/Internet/Patternist">Patternist</a> I wanted to make an attempt at improving messages. So far it is an improvement at least. For instance, for this message that the command line tool <code>patternist</code> outputs:</p>
<p><a href="void(0)" id="file-link-42" title="cli.png" class="file-link image">  			</a><a href="http://englich.files.wordpress.com/2007/10/cli.png" title="Direct link to file"><img src="http://englich.files.wordpress.com/2007/10/cli.png?w=645&#038;h=239" alt="cli.png" height="239" width="645" /></a></p>
<p>the installed <a href="http://doc.trolltech.com/main-snapshot/qabstractmessagehandler.html">QAbstractMessageHandler</a> was passed a <a href="http://doc.trolltech.com/main-snapshot/qsourcelocation.html">QSourceLocation</a> and a message which read:</p>
<blockquote><p><code>&lt;p&gt;Operator &lt;span class='XQuery-keyword'&gt;+&lt;/span&gt; is not available between atomic values of type &lt;span class='XQuery-type'&gt;xs:integer&lt;/span&gt; and &lt;span class='XQuery-type'&gt;xs:string&lt;/span&gt;.&lt;/p&gt;</code></p></blockquote>
<p>It was subsequently converted to local encoding and formatted with ECMA-48 color codes. (The format is not spec&#8217;d yet, it will probably be XHTML with specified class ids.)</p>
<p>While using markup for the message is a big improvement, since it opens the door for formatting and all, this API still has the problem of dealing with multiple locations.</p>
<p>What is the solution to that?</p>
<p>Perhaps the way to strike the balance between programmatic interpretation (so that, for instance, source document navigation is doable) and a message that reads naturally as one coherent unit is to&#8230; duplicate the information, each time tailored for a particular consumer?</p>
<blockquote><p><code>&lt;p xmlns:l="http://example.com/"&gt;In &lt;l:location href="myQuery.xq" line="57" column="3"&gt;myQuery.xq at line 57, column 3&lt;/l:location&gt;, function &lt;span class="XQuery-keyword"&gt;fn:doc()&lt;/span&gt; failed with code &lt;span class="XQuery-keyword"&gt;XPTY0004&lt;/span&gt;: the file &lt;l:location href="myDocument.xml" line="93" column="9"&gt;myDocument.xml failed to parse at line 93, column 9&lt;/l:location&gt;: unexpected token &lt;span class="XQuery-keyword"&gt;&amp;amp;&lt;/span&gt;.&lt;/p&gt;</code></p></blockquote>
<p>This is complicated by the fact that language strings cannot be concatenated, since that prevents translation. But I think the above paragraph is possible to implement. As above, the message reads coherently, but still allows programmatic extraction. A plain language string and structured data sit at opposite extremes, and maybe markup is the balance between them.</p>
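<p>One way such a message could be assembled without concatenating language strings is to translate a whole sentence with numbered placeholders and substitute marked-up fragments afterwards. The following is a rough sketch under that assumption, not Patternist&#8217;s implementation: <code>Location</code>, <code>locationMarkup()</code> and <code>fillTemplate()</code> are hypothetical helpers, the element name is borrowed from the example above, and the fragments are not escaped here.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A source position, kept as data so tools can navigate to it.
struct Location {
    std::string uri;
    int line;
    int column;
};

// Wraps human-readable text in a location element (hypothetical markup,
// modelled on the l:location element in the example above).
std::string locationMarkup(const Location &where, const std::string &text) {
    return "<l:location href=\"" + where.uri +
           "\" line=\"" + std::to_string(where.line) +
           "\" column=\"" + std::to_string(where.column) + "\">" +
           text + "</l:location>";
}

// Replaces %1, %2, ... in a translated template with the given fragments.
// Only single-digit placeholders are handled in this sketch.
std::string fillTemplate(const std::string &translated,
                         const std::vector<std::string> &fragments) {
    std::string result;
    for (std::size_t i = 0; i < translated.size(); ++i) {
        const bool isPlaceholder = translated[i] == '%' &&
                                   i + 1 < translated.size() &&
                                   translated[i + 1] >= '1' &&
                                   translated[i + 1] <= '9';
        if (!isPlaceholder) {
            result += translated[i];
            continue;
        }
        const std::size_t number = static_cast<std::size_t>(translated[i + 1] - '1');
        assert(number < fragments.size() && "template refers to a missing fragment");
        result += fragments[number];
        ++i; // skip the digit
    }
    return result;
}
```

<p>Because the translator sees the whole sentence, the placeholders can be reordered freely for another language, while each location stays machine-readable inside its own element.</p>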
<p>Would this give good compiler messages and allow slick IDE integration? If not, what would?</p>
]]></content:encoded>
			<wfw:commentRss>http://englich.wordpress.com/2007/10/23/integrating-compiler-messages/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/d6f9c624bb5dedc3b2a1d26c7e33c451?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">englich</media:title>
		</media:content>

		<media:content url="http://englich.files.wordpress.com/2007/10/cli.png" medium="image">
			<media:title type="html">cli.png</media:title>
		</media:content>
	</item>
		<item>
		<title>XPath &amp; XQuery in Qt</title>
		<link>http://englich.wordpress.com/2007/09/18/xpath-xquery-in-qt/</link>
		<comments>http://englich.wordpress.com/2007/09/18/xpath-xquery-in-qt/#comments</comments>
		<pubDate>Tue, 18 Sep 2007 10:03:38 +0000</pubDate>
		<dc:creator>englich</dc:creator>
				<category><![CDATA[Qt]]></category>
		<category><![CDATA[QtXmlPatterns]]></category>

		<guid isPermaLink="false">http://englich.wordpress.com/2007/09/18/xpath-xquery-in-qt/</guid>
		<description><![CDATA[The Qt snapshots now include support for XPath 2.0 and XQuery 1.0. Since this is part of the XML library, the idea is that Qt 4.4 will ship with a C++ API for running and evaluating such queries. On the side there is also a command line tool called patternist, for quickly testing queries, scripting and old-school web [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://trolltech.com/developer/downloads/qt/snapshots">Qt snapshots</a> now include support for XPath 2.0 and XQuery 1.0.</p>
<p>Since this is part of the XML library, the idea is that Qt 4.4 will ship with a C++ API for running and evaluating such queries. On the side there is also a command line tool called <code>patternist</code>, for quickly testing queries, scripting and old-school web solutions. But who cares, blogs with screenshots are the thing:</p>
<p><a href="http://englich.files.wordpress.com/2007/09/cli.png" title="Direct link to file"><img src="http://englich.files.wordpress.com/2007/09/cli.png?w=480" alt="cli.png" /></a></p>
<p>Stronger XML support in Qt has been consistently asked for by users over a long time, with XPath being one of the main requests. Hopefully Patternist, with the help of KDE folks, users, and customers expressing what&#8217;s missing, will meet those needs. Considering the similarities of XQuery and XSL-T, Patternist also serves as a foundation for implementing XSL-T, if so decided.</p>
<p>For KDE folks all this might ring a bell. Patternist was indeed first developed for a long time in the KDE repository, as part of KDOM. We just thought it would be of a lot more use as part of Qt.</p>
<p>And I think exactly that makes this exciting. W3C&#8217;s XQuery working group has registered an astonishing number of exciting implementations. But for users, reliability is what matters in the end: whether bugs will be fixed, whether people can answer questions, whether the piece is maintained and documented. Persistence. Trolltech swiftly carries this on its shoulders (assuming I brush my teeth and all that).</p>
<p>Combined with the fact that Qt is open source, and that the Patternist SDK used for development is <a href="http://labs.trolltech.com/page/Projects/Internet/Patternist">as well</a>, this is like eating some <a href="http://hovbyno9.se/">nasty chocolate</a> while at <em>the same time</em> singing a little duet with <a href="http://en.wikipedia.org/wiki/Miss_Piggy">Miss Piggy</a>. I can&#8217;t sing, nor can Piggy (although she tries), but you get my point.</p>
<p>Humble modesty aside, it is worth mentioning that this still needs work. About 94% of the test suite is passed, the API needs more work, and there are performance issues.</p>
<p>Nailing test cases and trimming code paths are problems that have known solutions (though typically horrible to carry out). It is harder to know what people need and how they need it. It&#8217;s hard to guess what kind of APIs or extensions Amarok or KOffice or a GNOME or web application needs.</p>
<p>If you have input, feel free to add a comment to the blog, send <a href="http://trolltech.com/bugreport-form">a report</a> to Trolltech, grab me (FransE) on the Open Projects IRC network, or ask a question or two on the <a href="http://lists.trolltech.com/qt-interest/">qt-interest</a> mailing list.</p>
<p>The documentation starts over <a href="http://doc.trolltech.com/main-snapshot/qtxquery.html">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://englich.wordpress.com/2007/09/18/xpath-xquery-in-qt/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/d6f9c624bb5dedc3b2a1d26c7e33c451?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">englich</media:title>
		</media:content>

		<media:content url="http://englich.files.wordpress.com/2007/09/cli.png" medium="image">
			<media:title type="html">cli.png</media:title>
		</media:content>
	</item>
		<item>
		<title>Representing XML</title>
		<link>http://englich.wordpress.com/2007/01/11/representing-xml/</link>
		<comments>http://englich.wordpress.com/2007/01/11/representing-xml/#comments</comments>
		<pubDate>Thu, 11 Jan 2007 11:18:17 +0000</pubDate>
		<dc:creator>englich</dc:creator>
				<category><![CDATA[HTML/XML/XHTML]]></category>

		<guid isPermaLink="false">http://englich.wordpress.com/2007/01/11/representing-xml/</guid>
		<description><![CDATA[Patternist, the XQuery/XPath/XSL-T framework, is abstracted to be able to use different tree-implementations, in concept like Saxon. Up until now, Patternist has been using one that wrapped Qt&#8217;s QDom. When I started writing that very first tree backend it was with the purpose of bootstrapping the rest of the code, a temporary solution that [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://patternist.sourceforge.net/documentation/API/">Patternist</a>, the XQuery/XPath/XSL-T framework, is abstracted to be able to use different tree-implementations, in concept like Saxon. Up until now, Patternist has been using one that wrapped <a href="http://doc.trolltech.com/4.2/qtxml.html#the-qt-dom-classes">Qt&#8217;s QDom</a>. When I started writing that very first tree backend it was with the purpose of bootstrapping the rest of the code, a temporary solution that got the job done until the solution for production use arrived. QDom&#8217;s massive memory usage (my measurements say roughly 18 times the document size) is people&#8217;s usual complaint. The reason I stalled was that the XPath Data Model simply couldn&#8217;t be implemented with QDom, let alone efficiently. So what now?</p>
<p>This blog entry is tinkering &#8212; although without accompanying code &#8212; on how to represent XML.</p>
<p><span id="more-39"></span></p>
<h1>Introduction</h1>
<p>For the purpose of writing a pleasing backend, I started reading up on the research. There is a lot of work in this area and I still have no trouble finding contributing papers. The focus of the papers I&#8217;ve read so far (and, I suspect, of the field at large) is representing XML with respect to storage size and efficient evaluation of XPath paths, with a strong eye on axes.</p>
<p>One thing worth noticing: even though XQuery&#8217;s support for static typing of sequences&#8217; cardinality is ardently supported and readily implemented, I fail to find literature that utilizes it. Well, perhaps that will arrive later on.</p>
<p>Although it may sound obvious, &#8220;storing XML&#8221; is not a homogeneous term, which surely must be taken into account when starting to pick and choose between the different proposals. There is a clear dominance of persistent, relational storage (not that it is a surprise to anyone), which tailors the discussions (table normalization, what to index, database optimizations to exploit, etc.) in a direction that typically isn&#8217;t that useful for the more old-fashioned in-memory approaches.</p>
<p>However, one does find significant amounts in other directions. For example, <a href="http://db.cis.upenn.edu/DL/04/CMDP.pdf">Efficient Path Query Processing on Encoded XML</a> discusses how to query streamed XML on resource-constrained devices, and <a href="http://adrem.ua.ac.be/~hidders/talks.php5">Efficient XPath Axis Evaluation for DOM Data Structures</a> discusses pre/post numbering close to DOM.</p>
<p>A common pattern through all is diversity, which perhaps can be derived from the genericity of XML and its organic structure. People use XQuery and XML in such different ways that ideas travel poorly between different scenarios.</p>
<p>Sometimes I think this goes so far as losing touch with reality; for example, one paper, in the competition of implementing XPath axes the best way, does so at the cost of as many as four full-document indexes and the size ratio that follows. My impression, judging from for example vendor forums, is that document size is one of the most common problems people have. Even if a certain proposal for axes is very fast, who can in practice take the storage hit it requires?</p>
<p>Therefore, the techniques one decides on are extremely dependent on what the scenarios are. Unless one is lucky enough to be able to constrain what kind of documents one is reading (count out Patternist), the best one can do is perhaps to adapt at runtime.</p>
<p>One of the key elements that the &#8220;XML era&#8221; of data management brings is the increased freedom and diversity, which is as concrete as XML&#8217;s syntax compared to a tabular format. &#8220;<a href="http://xml.apache.org/xindice/faq.html#faq-N1002C">While it&#8217;s generally very easy to map relational data into XML, trying to map XML, into relational data can be incredibly difficult.</a>&#8221;</p>
<p>However, one trend is not to support XML&#8217;s unpredictability but to go the opposite way, by excluding node types, XPath axes and features such as pessimistic typing of cardinalities. What makes me excited about XML is the increased flexibility it gives users at the expense of computing power. And that&#8217;s the right direction, although it does mean implementors have to make an effort.</p>
<h1>Requirements</h1>
<p>For Patternist, the requirements are as follows:</p>
<ul>
<li>Implement the <a href="http://www.w3.org/TR/xpath-datamodel/">XPath Data Model</a>. This means for example that for a given node, it must be possible to determine the namespaces in scope, that processing instructions and comments are represented, and that text nodes are text nodes, not only the string value of an element. However, CDATA and regular text nodes aren&#8217;t distinguished (which is a good thing), and system entities and system entity references are not part of the data model.</li>
<li>Be XML 1.0/Namespaces 1.0 based, as opposed to 1.1, for a plethora of reasons.</li>
<li>Support serialization/roundtripping to the degree <a href="http://www.w3.org/TR/xslt-xquery-serialization/">XSLT 2.0 and XQuery 1.0 Serialization</a> says.</li>
<li>Memory consumption must be acceptable for users, which means that it must be a radical improvement compared to QDom. However, it&#8217;s not the only requirement, and hence it can&#8217;t be pursued blindly at the cost of the other requirements.</li>
<li>The faster XQuery queries are, the better. For example, if the time for sorting two arbitrary nodes in document order is linear in the distance between the two in document order (as is the case with QDom and most other DOM implementations), one can pretty much declare game over.</li>
<li>It&#8217;s about representing and writing arbitrary documents, in arbitrary locations, in arbitrary amounts. For that reason bulk loading into a persistent storage is not an option.</li>
<li>Updates must be possible to the degree that the <a href="http://www.w3.org/TR/xqupdate/">XQuery Update Facility</a> requires. Why not even more fine-grained write support? Because I believe XQuery Update does a good job of taking users&#8217; needs into account while providing implementors with good options for doing things efficiently.</li>
</ul>
<p>Which tradeoffs to make on this game board I will readily discuss. Except for conformance.</p>
<h1>Solutions</h1>
<h2>Node Numbering Schemes</h2>
<p>&#8220;Node Numbering is one of the more interesting, and important design choices in an XML database,&#8221; says <a href="http://www.oracle.com/webapps/dialogue/dlgpage.jsp?p_dlg_id=5039737&amp;src=4945225&amp;Act=5">Anatomy of an XML Database: Oracle Berkeley DB XML</a> (which gives a good overview of databases and XML), and I agree. It seems to be the key, and largely affects how the rest is wired.</p>
<p>The reason stems from XPath&#8217;s requirement that the results of path expressions be free from duplicates and sorted in document order. In order to optimize those operations (de-duplication, sorting), implementors attempt to encode the structural relationship (e.g., whether a node is a following sibling of another, and so forth) in the node identifier, in order to avoid additional lookups. The otherwise needed indirection is relatively cheap for in-memory models (since it&#8217;s typically a pointer being dereferenced, or some kind of array lookup), while for implementations on top of relational databases it requires an additional join for performing a structural join.</p>
<p>One widely popular node numbering scheme is <a href="http://portal.acm.org/citation.cfm?id=802184">Dietz&#8217;s pre-post numbering</a>, which is excellently explained on the <a href="http://pathfinder-xquery.org/research/xpath-accel">MonetDB project&#8217;s</a> site. Its major advantage is that the relationship between two nodes can be determined in constant time, but it does so at the cost of requiring a re-numbering of the whole tree when nodes are inserted.</p>
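<p>To make the constant-time claim concrete, here is a minimal sketch of pre-post numbering over a toy in-memory tree. The <code>Node</code> type and the function names are made up for illustration and are not Patternist code; a real backend would more likely keep the two numbers in a table than in a tree of structs.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical in-memory tree node, for illustration only.
struct Node {
    std::vector<Node> children;
    std::uint32_t pre = 0;   // rank in a preorder (document order) walk
    std::uint32_t post = 0;  // rank in a postorder walk
};

// Assigns pre and post ranks in a single depth-first walk.
void number(Node &n, std::uint32_t &pre, std::uint32_t &post)
{
    n.pre = pre++;
    for (Node &child : n.children)
        number(child, pre, post);
    n.post = post++;
}

// a is an ancestor of d iff a is entered before d and left after it.
bool isAncestor(const Node &a, const Node &d)
{
    return a.pre < d.pre && a.post > d.post;
}

// a comes before d in document order iff its pre rank is smaller.
bool precedes(const Node &a, const Node &d)
{
    return a.pre < d.pre;
}
```

<p>Both checks are a couple of integer comparisons and need no tree walk. The price is that <code>number()</code> has to be re-run over the tree after an insert, which is the weakness of the scheme.</p>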
<p>There are many variants of pre-post numbering, such as replacing the post number with an <a href="http://portal.acm.org/citation.cfm?id=672035">approximated child count</a> to reduce the number of re-numberings on updates. The problem with the different variants is that they all have some kind of flaw, still leaving one with a bad taste.</p>
<p>However, the <a href="http://dbs.uni-leipzig.de/files/projekte/XML/Boehme_DLN_DIWeb_CR.pdf">Dynamic Level Numbering</a> (DLN) scheme, nicely explained on <a href="http://exist.sourceforge.net/xmlprague06.html">eXist&#8217;s site</a>, which essentially extends <a href="http://en.wikipedia.org/wiki/Dewey_decimal">Dewey&#8217;s Decimal Classification</a> with support for incremental updates and encodings for efficient storage, seems to be free from the other schemes&#8217; issues, at the cost of being of variable length. In practice this means that each identifier needs a heap allocation, which is costly considering how node identifiers are used.</p>
<p>However, in C++ I believe this can be tackled quite well with a classic <code>union</code>-trick: it is only at abnormal tree depths or very large node counts that identifiers grow to considerable sizes, meaning one can store the identifier in 63 bits of a 64-bit word and let the last bit signal whether the word is instead used as a pointer to a heap-allocated structure (otherwise the identifier is simply stored in the 63 bits).</p>
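<p>A minimal sketch of that trick could look as follows. It tags a single 64-bit integer instead of using a literal <code>union</code>, and it assumes that heap pointers are at least 2-byte aligned so that their lowest bit is free. The <code>NodeId</code> class and its members are hypothetical names, and how a DLN identifier is actually packed into the 63 bits is left out.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical identifier type; the names are illustrative, not Patternist's.
class NodeId {
public:
    using Levels = std::vector<std::uint32_t>;

    // Small identifiers live in the upper 63 bits; the low bit is set as a tag.
    explicit NodeId(std::uint64_t inlineBits)
        : m_word((inlineBits << 1) | 1u)
    {
        assert((inlineBits >> 63) == 0);  // must fit in 63 bits
    }

    // Large identifiers live on the heap. Assumption: operator new returns
    // an aligned pointer, so its low bit is zero and doubles as the tag.
    explicit NodeId(const Levels &levels)
        : m_word(reinterpret_cast<std::uintptr_t>(new Levels(levels)))
    {
    }

    ~NodeId() { if (!isInline()) delete heap(); }
    NodeId(const NodeId &) = delete;             // copying omitted in this sketch
    NodeId &operator=(const NodeId &) = delete;

    bool isInline() const { return (m_word & 1u) != 0; }
    std::uint64_t inlineBits() const { assert(isInline()); return m_word >> 1; }
    const Levels &levels() const { assert(!isInline()); return *heap(); }

private:
    Levels *heap() const
    {
        return reinterpret_cast<Levels *>(static_cast<std::uintptr_t>(m_word));
    }

    std::uint64_t m_word;
};

static_assert(sizeof(NodeId) == 8, "the identifier should stay one 64-bit word");
```

<p>The common case costs no allocation at all, and only the rare oversized identifier pays for the heap.</p>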
<p><a href="http://www.cs.toronto.edu/db/WebPage/files/www2003.pdf">The XML Web: a First Study</a> says that of the roughly 200 000 XML documents they scanned, &#8220;99% of the documents have fewer than 8 levels. The average depth is 4,&#8221; which sounds promising for the DLN scheme. The hit taken by large documents is more realistic, but on the other hand it only occurs once one has run out of &#8220;local&#8221; bits.</p>
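<p>For readers who have not met Dewey-style labels, the following sketch shows why they are attractive. It uses plain Dewey labels, one number per tree level, and deliberately leaves out what DLN adds on top (sublevels for inserting between siblings, and the compact bit encoding). The type and function names are assumptions made for this example.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// A plain Dewey label: the child positions on the path from the root.
using DeweyId = std::vector<std::uint32_t>;

// Document order is the lexicographic order of the labels.
bool precedes(const DeweyId &a, const DeweyId &b)
{
    return std::lexicographical_compare(a.begin(), a.end(), b.begin(), b.end());
}

// a is an ancestor of d iff a is a proper prefix of d.
bool isAncestor(const DeweyId &a, const DeweyId &d)
{
    return a.size() < d.size() && std::equal(a.begin(), a.end(), d.begin());
}

// Appending a last child never touches the labels of existing nodes.
DeweyId appendChild(const DeweyId &parent, std::uint32_t existingChildren)
{
    DeweyId id = parent;
    id.push_back(existingChildren + 1);
    return id;
}
```

<p>With an average depth of 4, such a label is only a handful of small numbers, which is why fitting it into a single word looks realistic for most documents.</p>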
<h2>Compression &amp; Text Nodes</h2>
<p>KDE developer Ariya Hidayat reaches impressive size ratios in his <a href="http://ariya.blogspot.com/2006/11/memory-efficient-dom-part-2.html">KoXmlReader experiments</a> with the use of heavy compression (among other things). The requirements for KoXmlReader and an XQuery implementation are vastly different. For functionality that is supposed to actually access (query) the nodes, one could quickly conclude that the tradeoff of compression (slow access) weighs heavily, but I don&#8217;t think that necessarily holds.</p>
<p>I believe it was <a href="http://saxonica.blogharbor.com/">Michael Kay</a> who somewhere wrote that lazy parsing of source documents could be sensible for XSL-T transforms because often not all nodes are needed (and hence don&#8217;t need to be represented). By the same reasoning, compressing text nodes could be sensible as well. After all, selecting a whole document is not a very useful query.</p>
<p>It&#8217;s also worth considering what is being compressed. If I&#8217;ve read the KoXmlReader experiments correctly, it was compressing western codepoints in UTF-16 encoding, which I wouldn&#8217;t be surprised to find is rewarding input for many compression algorithms. Another possibility is to store UTF-8 instead of UTF-16, if that encoding as well as western content is common. A web study reporting on encoding distribution would be helpful.</p>
<p>Another possibility is to let the parser not decode at all (performing validity checks only) and store the text in the document&#8217;s encoding, which possibly could speed up serialization, which is notably expensive. However, doing so requires rather heavy surgery, and rules out features such as system entities (which on the other hand are absent in the XPath Data Model anyway), to mention one of the few things one can consider.</p>
<p>Another possibility is to compress text nodes consisting of only whitespace and gain from that knowledge when doing string matches as well. However, implementing this with Qt/C++ is tricky, considering QString&#8217;s non-polymorphic design. Unfortunately The XML Web: a First Study doesn&#8217;t discuss the content of text nodes, which makes it difficult to tell how relevant this discussion is, although folklore and <a href="http://englich.wordpress.com/2007/01/09/xmlstat/">sporadic tests</a> suggest whitespace is of significant size.</p>
<p>One approach is to compensate for the difficulty of accessing the compressed nodes with a full-text index, which has the upside of being designed for actually querying text nodes. Or rather, perhaps the compressed nodes are the payment for a full-text feature. One could also attempt to base primitive searches such as <a href="http://www.w3.org/TR/xpath-functions/#func-contains">fn:contains()</a> on the full-text index. Not that I have started implementing <a href="http://www.w3.org/TR/xquery-full-text/">Full-Text</a>.</p>
<h2>Name Pools</h2>
<p>Name pools, typically implemented such that each name is stored as a string only once and subsequently referred to with an id, are popular both in relational databases and in-memory representations. The reason seems to simply be that structural content is overwhelming in an XML document (as the Web Study discusses in section 3.2 Node Distribution), as well as that name comparisons are optimized. Name pooling seems to be one of the most efficient space optimizations one can apply.</p>
<p>Worth commenting on is populating the name pool. Saxon&#8217;s <a href="http://www.saxonica.com/documentation/javadoc/net/sf/saxon/om/NamePool.html">NamePool</a> loops over the name index each time an id is requested, in order to determine the index of the identical, previously inserted string, if any. This is likely as slow as an insert gets, which others decide to optimize by keeping a hash with the string index as value.</p>
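<p>The hash-based variant can be sketched in a few lines. This is a toy, with a hypothetical <code>NamePool</code> class that ignores namespaces and prefixes, and for brevity it keeps a second copy of each name as the hash key, which a real pool would avoid.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical name pool: each distinct name gets one small integer id.
class NamePool {
public:
    using Id = std::uint32_t;

    // Returns the id of name, inserting it on first sight. The hash lookup
    // replaces a linear scan over all names seen so far.
    Id idFor(const std::string &name)
    {
        const auto it = m_ids.find(name);
        if (it != m_ids.end())
            return it->second;
        const Id id = static_cast<Id>(m_names.size());
        m_names.push_back(name);
        m_ids.emplace(name, id);
        return id;
    }

    const std::string &nameFor(Id id) const { return m_names.at(id); }
    std::size_t size() const { return m_names.size(); }

private:
    std::vector<std::string> m_names;           // id -> name
    std::unordered_map<std::string, Id> m_ids;  // name -> id (extra copy, see above)
};
```

<p>Nodes then store an <code>Id</code> instead of a string, and comparing two names becomes an integer comparison.</p>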
<p>The index costs space (although not much, since it&#8217;s linear in the number of different names, as opposed to the count of all names), which possibly could be skipped if one could approximate which nodes appear often. For example, the <code>html</code> element in an XHTML document appears only once and the same applies more or less for the following 3-4 elements, but they are nevertheless inserted first in the name index and are therefore iterated over for every following name for no useful reason.</p>
<p>If one could approximate where the common elements start (the Web Study finds many intriguing, strong patterns) one could offset the first names further up the index, or store names in two different indexes (&#8220;rare&#8221; and &#8220;common&#8221;).</p>
<p>Being able to approximate which names will appear often is also of interest for doing pre-allocations. For example, as can be seen in the <a href="http://englich.wordpress.com/2007/01/09/xmlstat/">xmlstat blog entry</a>, there are about 13 names that occur a vast number of times. An index that has name as key and node identifiers as values will likely use growth strategies that are not optimal for those names.</p>
<h2>File Size as Parameter</h2>
<p>For a wide variety of choices, the XML representation consists of integers whose required range depends on the document size. Examples are the maximum number of nodes, name distribution, or tree depth. For instance, using 64 bits for document order numbers is excessive if the actual number of nodes easily fits in 16 bits.</p>
<p>One approach here could be to use the file size (which is relatively cheap to retrieve for a local file that is to be parsed), if one is available, as a hint for the XML representation, which subsequently could use more appropriate storage types. More practically, this could for example be implemented in C++ with a template implementation which is instantiated for a set of common sizes: &#8220;small&#8221;, &#8220;medium&#8221;, and &#8220;large&#8221;.</p>
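<p>As a rough sketch of the template idea: the store is parameterized on the integer type used for node numbers, and the file size picks the instantiation. All names here are hypothetical, and the thresholds rest on the assumption that every node occupies at least one byte in the file, so that the file size bounds the node count from above.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical document store, parameterized on the node number type.
template <typename Index>
struct DocumentStore {
    std::vector<Index> parentOf;  // parent node number for each node
    static constexpr std::size_t bytesPerNode() { return sizeof(Index); }
};

using SmallStore  = DocumentStore<std::uint16_t>;
using MediumStore = DocumentStore<std::uint32_t>;
using LargeStore  = DocumentStore<std::uint64_t>;

enum class SizeClass { Small, Medium, Large };

// Picks a size class from the file size in bytes. Assumption: at least one
// byte per node, so the file size is an upper bound on the node count.
// This is a hint only; the store still has to check bounds while building.
SizeClass classify(std::uint64_t fileSizeInBytes)
{
    if (fileSizeInBytes <= std::numeric_limits<std::uint16_t>::max())
        return SizeClass::Small;
    if (fileSizeInBytes <= std::numeric_limits<std::uint32_t>::max())
        return SizeClass::Medium;
    return SizeClass::Large;
}
```

<p>The 8904307-byte catalog file from the xmlstat entry would land in the medium class under these thresholds, halving the per-node cost of this one array compared to 64-bit numbers.</p>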
<p>This space optimization probably applies mostly to small documents, which can be significant for resource-constrained devices or when many small documents are opened, such as a collection of small documents used as a database (a usage scenario which <a href="http://xml.apache.org/xindice/">Apache&#8217;s Xindice</a> seems to be optimized for).</p>
<p>The first step would be to determine which parameters in the implementation are dependent on the document size. For example, to be able to answer the question &#8220;What is the maximum file size that can be handled if I store node count in 16 bits?&#8221; and other peculiar puzzles with the XML syntax.</p>
<p>Even if one decides to trust one&#8217;s calculations, there is the possibility that a file is encoded in a codec with a smaller per-character encoding than a byte (possibly as a malicious attack by installing such a codec), so it is probably a risky business to skip bounds checking for fixed sizes.</p>
<p>File size can also be used as a hint for pre-allocations, especially if ratios for the different stores in the representation are known.</p>
<h2>Indexes</h2>
<p>Due to the requirements for Patternist&#8217;s backend, indexes aren&#8217;t that relevant, except for a particular case: a map from name to attribute and element nodes.</p>
<p>One reason is that it allows threaded execution of path expressions such as <code>a/b/c/d</code>, because <code>a/b</code> and <code>c/d</code> can be executed concurrently, to afterwards be joined. Many XPath operators can be threaded in typical implementations (not that I can name any that does) because the operands do not depend on each other, but without such an index the operands of a path expression do.</p>
<p>Another reason is the common abuse, but also valid use, of the descendant axis, which should be significantly faster for many scenarios when using a name index.</p>
<p>It is hard to tell whether the storage for such an index is worth it, for several reasons. For starters, such an index could be of significant size; for the document in the <a href="http://englich.wordpress.com/2007/01/09/xmlstat/">xmlstat blog entry</a> it would probably be around 2 MB assuming each node identifier consumes 64 bits (although it should be taken into account that the source document is relatively big: 8.5 MB).</p>
<h2>Query as Parameter</h2>
<p>One way to improve both space and time consumption is to use query analysis to adapt the XML representation for documents to be loaded. Examples follow.</p>
<h3>Node Construction</h3>
<p>Path expressions tell in a very clear way which nodes will be used. Hence one can selectively build representations for processing instructions, comments, attributes, and text nodes. For instance, the Web Study concludes that &#8220;for documents larger than 4096 bytes, there are proportionally more attribute nodes than element nodes(51.13% vs. 37.83%),&#8221; which suggests that significant space gains can be made.</p>
<p>Doing such a query analysis is straightforward, but needs consideration. For instance:</p>
<ul>
<li>If positional predicates are present, skipped nodes must be compensated by &#8220;artificial&#8221; place holders in order to preserve node numberings.</li>
<li>Access to text nodes or the string value property is commonly implicit through operators and functions, which is one reason why coding as much as possible of an XQuery implementation in XQuery itself is of interest, since it simplifies this kind of analysis (as discussed in a previous <a href="http://englich.wordpress.com/2006/10/31/implementing-xpaths-builtin-functions/">blog entry</a>).</li>
</ul>
<h3>Index Building</h3>
<p>Selectively building name indexes based on what&#8217;s needed can reduce the associated space costs. For example, if no name tests for attributes are present but the attribute axis is nevertheless used in some way, a name index doesn&#8217;t have to be populated for attributes.</p>
<p>If one assumes that the <code>attribute</code> and <code>child</code> axes are relatively fast, node tests such as <code>@name</code> or <code>name</code> are relatively inexpensive. However, <code>//name</code> or <code>//*[@name]</code> are of a different magnitude. One way to improve the decision for index building is to use weighting, as I call it, of constructs.</p>
<p>For example, considering the high representation of attribute nodes in documents, notably expensive path expressions involving attributes must be present before populating an index with attribute nodes. This could make the cost of a name index bearable. Similarly, if a query contains a path expression with element tests that can be evaluated by a sequential walk (no complex axes, no sorting or de-duplication), it perhaps does not qualify for an index build either.</p>
<h1>Where to Go</h1>
<p>Vaporware and speculation are what characterize this text. Inventing approaches is not difficult; the hard, time-consuming, but not very interesting part is implementing and measuring, possibly for the mere result of realizing that the long path led to a dead end.</p>
<p>A still wide-open issue is how users should access the result of a query (as verbosely discussed in <a href="http://englich.wordpress.com/2006/11/18/xml-apis/">XML APIs</a>). A traditional approach, as seen in <a href="http://www.w3.org/TR/DOM-Level-3-XPath/">W3C&#8217;s XPath DOM Module</a> or <a href="http://jcp.org/en/jsr/detail?id=225">XQuery API for Java</a>, is the use of iterators. Another approach could be more in the direction of the <a href="http://www.w3.org/TR/xproc/">XProc pipeline language</a>, where for a query one would declare and name output pipes which could be sent to with a <code>to-pipe("qualified-name-of-pipe", expression)</code> function. I believe it could possibly:</p>
<ul>
<li>Allow more operations to be carried out within the query, instead of the user inspecting nodes from an iterator</li>
<li>Give an improved understanding of how the results will be used, and hence a stronger foundation for further optimizations</li>
<li>Make it easier to design practical APIs for retrieving the result.</li>
</ul>
<p>However, I think one thing can be strongly concluded: declarative approaches are the way to go. One reason I am opposed to exposing functionality to users through the traditional tree API or low-level APIs in general is that they kill every attempt at doing clever things, in addition to delegating the task of conforming to standards to the user.</p>
<p>One can&#8217;t apply a pipeline/early-exit strategy if the user can retrieve child nodes as a list; one cannot attempt efficient lookups if the user is doing it with a <code>for</code>-loop; threading becomes the user&#8217;s responsibility if a higher perspective is absent; and it is difficult to adapt document building if all the user has said is that a document needs to be loaded, and so on. That&#8217;s why I believe XQuery and pipeline languages like XProc are good ways to go.</p>
]]></content:encoded>
			<wfw:commentRss>http://englich.wordpress.com/2007/01/11/representing-xml/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/d6f9c624bb5dedc3b2a1d26c7e33c451?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">englich</media:title>
		</media:content>
	</item>
		<item>
		<title>xmlstat</title>
		<link>http://englich.wordpress.com/2007/01/09/xmlstat/</link>
		<comments>http://englich.wordpress.com/2007/01/09/xmlstat/#comments</comments>
		<pubDate>Tue, 09 Jan 2007 16:09:10 +0000</pubDate>
		<dc:creator>englich</dc:creator>
				<category><![CDATA[HTML/XML/XHTML]]></category>

		<guid isPermaLink="false">http://englich.wordpress.com/2007/01/09/xmlstat/</guid>
		<description><![CDATA[I wrote a small tool for extracting statistics about XML documents. If I were less lazy, it could be more useful. Still, I think it is of some use. With XML documents, as with many other things, it is difficult to perceive actual circumstances. Hence we construct measuring devices. Here&#8217;s the output from invoking [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=englich.wordpress.com&amp;blog=212810&amp;post=40&amp;subd=englich&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>I wrote a small tool for extracting statistics about XML documents. If I were less lazy, it could be more useful. Still, I think it is of some use.</p>
<p><span id="more-40"></span></p>
<p>With XML documents, as with many other things, it is difficult to perceive actual circumstances. Hence we construct measuring devices. Here&#8217;s the output from invoking <code>xmlstat</code> on the catalog file that describes the tests in <a href="http://www.w3.org/XML/Query/test-suite/">W3C&#8217;s XQuery Test Suite</a>:</p>
<blockquote>
<pre><code>Document:         file:///home/fenglich/kde/src/playground/utils/xmlstat/XQTSCatalog.xml
File size(bytes): 8904307
-------------------------------------------
Elements:                  69062
Attributes:                170741
Non-Whitespace Text nodes: 32936
Whitespace Text nodes:     80810
Total text-node count:     113746
Comments:                  0
Processing Instructions:   2
Total:                     239805
-------------------------------------------
XQueryXQueryOffsetPath                                                        1
{http://www.w3.org/2005/02/query-test-XQTSCatalog}contextItem                 2
{http://www.w3.org/2005/02/query-test-XQTSCatalog}scenario                    4
{http://www.w3.org/2005/02/query-test-XQTSCatalog}input-document              5
{http://www.w3.org/2005/02/query-test-XQTSCatalog}role                        6
schema                                                                        8
{http://www.w3.org/2005/02/query-test-XQTSCatalog}citation-spec               9
input-document                                                                11
{http://www.w3.org/2005/02/query-test-XQTSCatalog}note                        14
{http://www.w3.org/2005/02/query-test-XQTSCatalog}input-URI                   23
{http://www.w3.org/2005/02/query-test-XQTSCatalog}context-property            30
{http://www.w3.org/2005/02/query-test-XQTSCatalog}implementation-defined-item 37
namespace                                                                     43
{http://www.w3.org/2005/02/query-test-XQTSCatalog}input-query                 44
{http://www.w3.org/2005/02/query-test-XQTSCatalog}source                      49
{http://www.w3.org/2005/02/query-test-XQTSCatalog}module                      58
FileName                                                                      75
ID                                                                            77
featureOwner                                                                  94
last-mod                                                                      138
{http://www.w3.org/2005/02/query-test-XQTSCatalog}test-group                  358
{http://www.w3.org/2005/02/query-test-XQTSCatalog}title                       359
{http://www.w3.org/2005/02/query-test-XQTSCatalog}expected-error              1141
{http://www.w3.org/2005/02/query-test-XQTSCatalog}output-file                 9638
{http://www.w3.org/2005/02/query-test-XQTSCatalog}test-case                   10519
is-XPath2                                                                     10544
date                                                                          10563
Creator                                                                       10585
{http://www.w3.org/2005/02/query-test-XQTSCatalog}input-file                  10636
variable                                                                      10703
{http://www.w3.org/2005/02/query-test-XQTSCatalog}description                 11380
section-title                                                                 13823
spec                                                                          13860
role                                                                          20303
name                                                                          21540
Total name count:                                                             71
-------------------------------------------
{http://www.w3.org/2005/02/query-test-XQTSCatalog}
{http://www.w3.org/2001/XMLSchema-instance}xsi
Total bindings count: 2
-------------------------------------------
Non-Whitespace chars: 1346407
Whitespace chars:     1706529</code></pre>
</blockquote>
<p>While the output is raw and its labeling blunt and confusing, there are some things one can see:</p>
<ul>
<li>The document has in total 71 different qualified names, but they appear 239803 times (element + attribute count), almost a quarter of a million. This is partly why name pools are popular among both relational storage models and tree implementors: each name is stored once and referred to by a small identifier, which reduces memory usage considerably.</li>
<li>The attribute count is more than twice as big as the element count. That puts things in perspective, at least for me. That ratio is a bit off on a broader scale, but not that far, as we shall see.</li>
<li>Whitespace (basically always formatting) takes up more characters than actual content (<code>Non-Whitespace chars</code> versus <code>Whitespace chars</code>).</li>
</ul>
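<p>The name pool idea from the first point can be sketched in a few lines. This is only an illustration, not how <code>xmlstat</code> or any particular XML implementation does it; the <code>NamePool</code> class and its methods are made-up names. Each distinct qualified name is stored once and every occurrence carries a small integer instead of the full string.</p>

```python
class NamePool:
    """Minimal name pool: stores each qualified name once and hands out integer codes."""

    def __init__(self):
        self._codes = {}  # qualified name -> integer code
        self._names = []  # integer code -> qualified name

    def intern(self, name):
        """Return the code for name, allocating a new one on first sight."""
        code = self._codes.get(name)
        if code is None:
            code = len(self._names)
            self._codes[name] = code
            self._names.append(name)
        return code

    def name(self, code):
        """Translate a code back to the qualified name it stands for."""
        return self._names[code]

    def size(self):
        """Number of distinct names stored."""
        return len(self._names)


pool = NamePool()
ns = "{http://www.w3.org/2005/02/query-test-XQTSCatalog}"
occurrences = [ns + "test-case", "name", ns + "test-case", "name", ns + "description"]
codes = [pool.intern(name) for name in occurrences]
print(codes)        # prints: [0, 1, 0, 1, 2]
print(pool.size())  # prints: 3
```

<p>With 71 names and roughly 240000 occurrences, as in the catalog above, the long namespace-qualified strings would be kept 71 times rather than once per node.</p>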
<p>These &#8220;conclusions&#8221; aren&#8217;t that interesting since they&#8217;re only based on one document, but I nevertheless state them because they roughly mimic what <a href="http://citeseer.ist.psu.edu/mignet03xml.html">The XML Web: a First Study</a> says.</p>
<p>That paper contains statistics over XML documents found on the web, but is for these purposes relatively old (from 2003). I would for that reason not be surprised if the numbers have changed since then, considering the widespread use of feeds, for instance. Another point is that their statistics in some cases are quite heavily influenced by individual sites (such as http://rpmfind.net/ or http://w3.org/), which to me suggests that too small a data sample was used.</p>
<p>Here&#8217;s a very concentrated list of their conclusions:</p>
<ul>
<li>&#8220;WAP and RDF make up 26% and 17% of all document on the XML Web, respectively.&#8221;</li>
<li>&#8220;Our results show that XML documents are in fact relatively shallow: 99% of them have less than 8 levels of element nesting. Also, 15% of the documents we analyzed have recursive content, in which there is much regularity.&#8221;</li>
<li>&#8220;Only 75 different DTDs are referenced in our sample&#8221; (which is about 200,000 documents). &#8220;92% of all DTD references are made to norms 1.1 or 1.2 of the WAP protocol.&#8221;</li>
<li>&#8220;Only 0.09% of the documents use either the attribute label SchemaLocation or noNameSpaceSchemaLocation&#8221; (but as in the case with DTDs, not referencing a schema from the document is not equivalent to not using a schema for that document).</li>
<li>&#8220;For documents up to 4096 bytes, the number of element nodes dominates the distributions&#8221;</li>
<li>&#8220;For documents larger than 4096 bytes, there are proportionally more attribute nodes than element nodes (51.13% vs. 37.83%).&#8221;</li>
<li>&#8220;These observations led us to conclude that the structural information found in XML documents is in fact dominant over the textual content.&#8221;</li>
<li>&#8220;It turns out that 782,602 elements(5% in total) have mixed content. Surprisingly, these elements belong to 138,298 documents(72% of all documents).&#8221;</li>
<li>&#8220;The prevailing assumption in this community[database community] is that attributes and mixed element content are not as important as element content.&#8221;</li>
<li>&#8220;99% of the documents have fewer than 8 levels[level refers to tree depth]. The average depth is 4, and the deepest document has 135 levels.&#8221;</li>
<li>&#8220;On average, the second level contains more attributes than any other level. In fact, 89% of all attribute are found in the first 3 levels of the documents.&#8221;</li>
<li>&#8220;77% of all element nodes and 6% of all text are found in the first 3 levels of the documents.&#8221;</li>
<li>&#8220;28,208 XML document(14.81% of the total) contain recursive elements.&#8221;</li>
<li>&#8220;The average document size is 4kb&#8221;</li>
</ul>
<p>I find statistics like these very useful and I believe they can play an important role in discussing implementation approaches.</p>
<p>I would surely not mind a second such study. Perhaps it should have a larger document sample in order to not be thrown off by individual sites. It would also be nice to see the distribution of encodings used, and the relationship between whitespace-only and regular text nodes.</p>
<p>And of course, <code>xmlstat</code> could be a lot more useful. Ideally it would produce an XHTML page with bar charts describing name distributions, node type distributions, concentrations in relation to depth, and so on. That would help with making implementation decisions. It wouldn&#8217;t surprise me if it were useful for things like XSLT debugging as well.</p>
<p>In either case, feel free to use or improve this simple and primitive tool. It&#8217;s in KDE&#8217;s SVN repository, <code>playground/utils/xmlstat</code>, GNU GPL licensed and based on QtCore, QtXml and qmake.</p>
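<p>For a feel of what such counting involves, here is a rough sketch using Python&#8217;s standard <code>xml.dom.minidom</code>. It is not the <code>xmlstat</code> source (that one is built on QtCore and QtXml); the sample document and the <code>count_nodes</code> function are invented for illustration, and loading the whole tree like this would be a poor fit for very large files.</p>

```python
from collections import Counter
from xml.dom import Node, minidom

# Invented sample document; not part of the XQuery Test Suite.
SAMPLE = """<?xml version="1.0"?>
<?style sheet?>
<catalog version="1">
  <!-- two cases -->
  <case name="a" role="x">first</case>
  <case name="b"/>
</catalog>"""

XML_WHITESPACE = " \t\r\n"


def count_nodes(node, stats):
    """Walk the tree depth-first and tally node kinds into stats."""
    if node.nodeType == Node.ELEMENT_NODE:
        stats["elements"] += 1
        stats["attributes"] += len(node.attributes)
    elif node.nodeType == Node.TEXT_NODE:
        if node.data.strip(XML_WHITESPACE) == "":
            stats["whitespace text"] += 1
        else:
            stats["non-whitespace text"] += 1
    elif node.nodeType == Node.COMMENT_NODE:
        stats["comments"] += 1
    elif node.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
        stats["processing instructions"] += 1
    for child in node.childNodes:
        count_nodes(child, stats)
    return stats


stats = count_nodes(minidom.parseString(SAMPLE), Counter())
for label in sorted(stats):
    print(label, stats[label])
```

<p>Even in this tiny sample the whitespace-only text nodes outnumber the one text node with real content, the same pattern as in the catalog output above.</p>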
<br /><img alt="" border="0" src="http://feeds.wordpress.com/1.0/categories/englich.wordpress.com/40/" /> <img alt="" border="0" src="http://feeds.wordpress.com/1.0/tags/englich.wordpress.com/40/" /> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/englich.wordpress.com/40/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/englich.wordpress.com/40/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godelicious/englich.wordpress.com/40/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/delicious/englich.wordpress.com/40/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gofacebook/englich.wordpress.com/40/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/facebook/englich.wordpress.com/40/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gotwitter/englich.wordpress.com/40/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/twitter/englich.wordpress.com/40/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gostumble/englich.wordpress.com/40/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/stumble/englich.wordpress.com/40/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godigg/englich.wordpress.com/40/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/digg/englich.wordpress.com/40/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/goreddit/englich.wordpress.com/40/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/reddit/englich.wordpress.com/40/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=englich.wordpress.com&amp;blog=212810&amp;post=40&amp;subd=englich&amp;ref=&amp;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://englich.wordpress.com/2007/01/09/xmlstat/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/d6f9c624bb5dedc3b2a1d26c7e33c451?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">englich</media:title>
		</media:content>
	</item>
		<item>
		<title>XQuery Papers</title>
		<link>http://englich.wordpress.com/2007/01/08/xquery-papers/</link>
		<comments>http://englich.wordpress.com/2007/01/08/xquery-papers/#comments</comments>
		<pubDate>Mon, 08 Jan 2007 09:44:22 +0000</pubDate>
		<dc:creator>englich</dc:creator>
				<category><![CDATA[HTML/XML/XHTML]]></category>

		<guid isPermaLink="false">http://englich.wordpress.com/2007/01/08/xquery-papers/</guid>
		<description><![CDATA[I&#8217;ve been reading research papers about XQuery recently and I am impressed. I&#8217;ve always had the impression that the number of papers has increased significantly during the XPath 2.0/XQuery 1.0 &#8220;era&#8221;, but my conviction that the organic nature of XML makes it hopeless to query and store efficiently held until now, to mention one [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=englich.wordpress.com&amp;blog=212810&amp;post=38&amp;subd=englich&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been reading research papers about XQuery recently and I am impressed. I&#8217;ve always had the impression that the number of papers has increased significantly during the XPath 2.0/XQuery 1.0 &#8220;era&#8221;, but my conviction that the organic nature of XML makes it hopeless to query and store efficiently held until now, to mention one of the few interesting discoveries I&#8217;ve made while scanning papers.</p>
<p>But that&#8217;s the positive side of it.</p>
<p><span id="more-38"></span>The major problem is that scientists seems to live in somekind of denial.</p>
<p>For example, <a href="http://citeseer.ist.psu.edu/liefke99xmill.html">XMill: an Efficient Compressor for XML Data</a> discusses various techniques for efficiently compressing XML documents. It compares a log of HTTP traffic compressed and uncompressed in a custom text format, to the same data stored in an XML format, and discusses the effectiveness of different compression techniques applied on the custom format, the XML document, and the XML document. Their compression applied on the XML document(which is a lot more verbose than the custom text format) yeilds a smaller file than the custom format compressed, which is impressive &#8212; nice numbers.</p>
<p>The problem is that in order to reach those compression ratios, one has to use custom compression algorithms and in addition feed a configuration file that teaches the compressor about the XML format being compressed. In other words, it turns the XML document into a custom, binary format which counteracts the precise reason to why XML was chosen in the first place. So no matter how good those compression ratios are, they can&#8217;t be deployed except for within a closed, proprietary network, and hence they will most likely stay on that paper.</p>
<p>Another problem is presentation. Many papers spends a good portion of their text translating terms into formalism and simply other names for things. Sometimes this formalism is necessary, but it always results in requiring quite some effort to decypher a single paper&#8217;s home made symbols, conventions and terms. If all I did was reading papers I presume it wouldn&#8217;t be problematic, but until that happens it is.<a href="http://www.w3.org/TR/xmlschema-1/"></a></p>
<p><a href="http://www.w3.org/TR/xmlschema-1/">W3C XML Schema</a> perhaps gets <a href="http://nothing-more.blogspot.com/2005/06/xsd-aka-w3c-xml-schema.html">criticized</a> the most for being an academic exercise. That&#8217;s for example why I (relatively) enjoyed reading <a href="http://www.w3.org/TR/xproc/">XProc: An XML Pipeline Language</a>. Even though its subject is technical, it doesn&#8217;t try to sound so or academic. It adapts to me as a reader.</p>
<p>A related topic is how good papers are at explaining their theories. Some papers I grasp after the first read, while others I am still trying with. The same applies to Wikipedia entries. Is my brain having a latent glitch? Can it all be explained as that some theories are inherently more difficult to express? I&#8217;d say that a scientist&#8217;s success is not only dependent on what he quotes or factual data he or she have managed to consume, it also depends on this person&#8217;s pedagogical skills.</p>
<p>I have lost count of how many papers I&#8217;ve read that somewhere have sneaked into their discussion about algorithms for axes and storing XML, that processing instructions, comments, namespace instructions, text nodes or something else is ignored. Hello? I&#8217;m sure that&#8217;s a very comfortable way to do things, but it&#8217;s quite difficult to implement that super nifty storing scheme or that super cool axis algorithm for XQuery if the cost is that&#8230; you don&#8217;t implement XQuery.</p>
<p>The same can be seen with some XML parser &#8220;optimizations&#8221;. When it turns out that it&#8217;s done at the cost of not conforming to the XML specification, it&#8217;s not impressive any longer. It&#8217;s just wrong. It&#8217;s not cool to write an algorithm or implement an optimization that ignore reality. It&#8217;s <em>cool</em> to play by the rules and on top of that bring actual improvements, instead of bending the world towards what is practical for the researcher.</p>
<br /><img alt="" border="0" src="http://feeds.wordpress.com/1.0/categories/englich.wordpress.com/38/" /> <img alt="" border="0" src="http://feeds.wordpress.com/1.0/tags/englich.wordpress.com/38/" /> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/englich.wordpress.com/38/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/englich.wordpress.com/38/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godelicious/englich.wordpress.com/38/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/delicious/englich.wordpress.com/38/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gofacebook/englich.wordpress.com/38/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/facebook/englich.wordpress.com/38/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gotwitter/englich.wordpress.com/38/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/twitter/englich.wordpress.com/38/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gostumble/englich.wordpress.com/38/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/stumble/englich.wordpress.com/38/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godigg/englich.wordpress.com/38/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/digg/englich.wordpress.com/38/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/goreddit/englich.wordpress.com/38/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/reddit/englich.wordpress.com/38/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=englich.wordpress.com&amp;blog=212810&amp;post=38&amp;subd=englich&amp;ref=&amp;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://englich.wordpress.com/2007/01/08/xquery-papers/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/d6f9c624bb5dedc3b2a1d26c7e33c451?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">englich</media:title>
		</media:content>
	</item>
		<item>
		<title>Plain and Heard Before</title>
		<link>http://englich.wordpress.com/2007/01/07/plain-and-heard-before/</link>
		<comments>http://englich.wordpress.com/2007/01/07/plain-and-heard-before/#comments</comments>
		<pubDate>Sun, 07 Jan 2007 17:15:29 +0000</pubDate>
		<dc:creator>englich</dc:creator>
				<category><![CDATA[Human Computer Interaction]]></category>

		<guid isPermaLink="false">http://englich.wordpress.com/2007/01/07/plain-and-heard-before/</guid>
		<description><![CDATA[Celeste is doing historical research on KDE&#8217;s usability. To her request for my comment on things, I modestly replied: I think my effective contributions are modest, although one could say I&#8217;ve tried. But I can of course always express my view. Her response, which made me think, was: Your view is what matters to me, [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=englich.wordpress.com&amp;blog=212810&amp;post=37&amp;subd=englich&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Celeste is doing historical research on KDE&#8217;s usability. To her request for my comment on things, I modestly replied:</p>
<blockquote><p>I think my effective contributions are modest, although one could say I&#8217;ve tried. But I can of course always express my view.</p></blockquote>
<p>Her response, which made me think, was:</p>
<blockquote><p> Your view is what matters to me, not some generalized or idealistic view from the usability contributors themselves. I have certainly learned some interesting things from the developers.</p></blockquote>
<p>One could of course take that as a negative comment towards &#8220;the usability contributors&#8221;, but I think it was meant to address a certain problem.</p>
<p><a href="http://aseigo.blogspot.com/2006/12/website-does-not-make-you-smart.html">Aaron wrote</a> in a recent blog entry:</p>
<blockquote><p>[...] It punishes developers like Tim for speaking openly about the challenges we face. the free software community relies on our ability to speak openly and honestly to each other; if we start to get punished for it then we have a real problem.</p></blockquote>
<p>Although the blog entry as a whole is about a certain article, in that paragraph Aaron is simply pointing out that being able to talk about and address issues is dead important.</p>
<p>As a reply to one of Celeste&#8217;s questions on KDE&#8217;s usability, I wrote:</p>
<blockquote><p>One thing I admire the GNOME project much of, is their ability to change. They manage to get ideas /implemented/ in their main<br />
line, without getting shot down at the proposal-stage. Those ideas might one disagree with or they are perhaps even downright wrong, but the ability to<br />
change, to test new ideas, is a prerequisite for reaching the right ideas. Progress isn&#8217;t a linear progression of constantly correct changes, and the<br />
working process must be adapted for that.</p>
<p>I can&#8217;t name a particular achievement, but each time a usability idea advances from being a proposal to being tried on the practical level, progress<br />
is happening.</p></blockquote>
<p>which likewise merely says &#8220;don&#8217;t shoot down ideas just because they&#8217;re different or sound bad.&#8221; I&#8217;m of course only speculating from my view on things, but it wouldn&#8217;t surprise me if many nod in agreement that it can be difficult to keep an idea from being stalled while it is still only a suggestion.</p>
<p>Open Source and Free Software, at least if we go back in time, was a liberator from what was sick in the IT industry, and will continue to be so as long as those values are upheld. But perhaps the community is too consumed with its achievements on the democratic side to see the sides of itself that fight its own mindset.</p>
<p>Belief fucks up mankind in spectacular ways. &#8220;We just need a revolution from system X to system Y and we will have no more corruption&#8221;, &#8220;It is ok to reduce the democratic rights for Them because They are not Us&#8221;, &#8220;We don&#8217;t have to listen because we are right&#8221;, and countless other examples demonstrate people thinking there is a difference between people as long as they have a different skin color, operating system, religion, political system, desktop environment, and so on.</p>
<p>My point is simple and well repeated: openness is important. This time, it&#8217;s being emphasized for the open source community. Things will stall if ideas from GNOME are touted as evil on mailing lists, if less technically minded users are What&#8217;s Wrong, if KDE is considered to always be perfect, or if new ideas are shot down for not being what we have. And blogs aren&#8217;t the only way ideas are expressed; which ideas get implemented in software is another.</p>
<br /><img alt="" border="0" src="http://feeds.wordpress.com/1.0/categories/englich.wordpress.com/37/" /> <img alt="" border="0" src="http://feeds.wordpress.com/1.0/tags/englich.wordpress.com/37/" /> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/englich.wordpress.com/37/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/englich.wordpress.com/37/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godelicious/englich.wordpress.com/37/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/delicious/englich.wordpress.com/37/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gofacebook/englich.wordpress.com/37/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/facebook/englich.wordpress.com/37/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gotwitter/englich.wordpress.com/37/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/twitter/englich.wordpress.com/37/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gostumble/englich.wordpress.com/37/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/stumble/englich.wordpress.com/37/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godigg/englich.wordpress.com/37/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/digg/englich.wordpress.com/37/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/goreddit/englich.wordpress.com/37/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/reddit/englich.wordpress.com/37/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=englich.wordpress.com&amp;blog=212810&amp;post=37&amp;subd=englich&amp;ref=&amp;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://englich.wordpress.com/2007/01/07/plain-and-heard-before/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/d6f9c624bb5dedc3b2a1d26c7e33c451?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">englich</media:title>
		</media:content>
	</item>
	</channel>
</rss>
