Vectorising and Querying XML

26 April, 2004 04:00 PM - 05:30 PM

e-Science Institute, 15 South College Street, Edinburgh


Dr Dave Berry


Any slides or other material generated as a result of this event can be found at:

Vectorising and Querying XML

Most current implementations of XQuery are either "toy" (they break on large documents) or "fake" (they are SQL masquerading as XQuery). We shall describe some recent work on a native XML store and an interpreter for a useful subset of XQuery that scales in the way one would expect of a database query language. Preliminary results on a large-ish (80GB) data set show that the techniques produce performance comparable with well-tuned SQL queries running on the same data in a commercial RDBMS.

The technique combines two existing ideas. The first extends a very old idea, column-based storage of tabular data, to the storage of XML. An XML document is separated into a "skeleton", which describes the structure of the document, and a set of "vectors", each of which is the sequence of data values appearing under all paths bearing a given sequence of tag names. The second idea is to generate a query-friendly compressed version of the skeleton.
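As a rough illustration of the skeleton/vector split described above, the following Python sketch separates a small document into a text-free skeleton and one vector per tag path. It is a minimal sketch under stated assumptions, not the store presented in the talk: the function name vectorise, the nested-tuple skeleton, the "/db/book/title" path-string format and the sample document are all hypothetical, attributes and mixed content are ignored, and no skeleton compression is attempted.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def vectorise(xml_text):
    """Split a document into (skeleton, vectors).

    skeleton: nested (tag, [children]) tuples with all text removed.
    vectors:  dict mapping a tag path such as "/db/book/title" to the
              list of text values found under that path, in document order.

    Hypothetical representation chosen for illustration only.
    Attributes and tail text (mixed content) are ignored.
    """
    root = ET.fromstring(xml_text)
    vectors = defaultdict(list)

    def walk(elem, parent_path):
        path = parent_path + "/" + elem.tag
        text = (elem.text or "").strip()
        if text:
            # All values sharing the same sequence of tag names
            # are stored together, column-style.
            vectors[path].append(text)
        children = [walk(child, path) for child in elem]
        return (elem.tag, children)

    skeleton = walk(root, "")
    return skeleton, dict(vectors)


# Made-up sample document.
doc = """<db>
  <book><title>A</title><year>2001</year></book>
  <book><title>B</title><year>2003</year></book>
</db>"""

skeleton, vectors = vectorise(doc)
```

Under these assumptions, a simple path query such as /db/book/title only needs to read the single vector stored under that path, which is the property that makes the column-style layout attractive for scanning large documents.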

The talk will describe vectorisation, skeleton compression, query evaluation and some preliminary results. It will also include a brief discussion of issues with XML schemas (small "s"). The work has involved contributions from a substantial fraction of the Database Group at Edinburgh: Peter Buneman, Byron Choi, Wenfei Fan, Rob Hutchison, Bob Mann, and Stratis Viglas. The talk will probably be given by Peter Buneman and Byron Choi.


The National e-Science Centre, e-Science Institute in Edinburgh is a centre for education and research for e-Science, and provides new state-of-the-art facilities including an Access Grid system.


Registration for this event is now closed. To enquire about an application or to cancel a previous application please contact NeSC Administration.


Enquiries should be made directly to our Conference Administrator.

Travel: The e-Science Institute is less than 15 minutes' walk from Waverley rail station and from St Andrew Square bus station. It is approximately 20 minutes by taxi from Edinburgh airport (40 minutes by bus). Please see our web site for a map of the area.


This is an archived website, preserved and hosted by the School of Physics and Astronomy at the University of Edinburgh. The School of Physics and Astronomy takes no responsibility for the content, accuracy or freshness of this website. Please email webmaster [at] ph [dot] ed [dot] ac [dot] uk for enquiries about this archive.