Vectorising and Querying XML

26 April, 2004 04:00 PM - 05:30 PM

e-Science Institute, 15 South College Street, Edinburgh


Dr Dave Berry


Any slides or other material generated as a result of this event can be found at:

Vectorising and Querying XML

Most current implementations of XQuery are either "toy" (they break on large documents) or "fake" (they are SQL masquerading as XQuery). We shall describe some recent work on a native XML store and an interpreter for a useful subset of XQuery that scales in the way one would expect of a database query language. Preliminary results on a large-ish (80GB) data set show that the techniques deliver performance comparable with that of well-tuned SQL queries running on the same data in a commercial RDBMS.

The technique is based on a combination of two existing ideas. The first is to extend a very old idea, column-based storage of tabular data, to the storage of XML. An XML document is separated into a "skeleton", which describes the structure of the document, and a set of "vectors", which are the sequences of data values appearing under all paths bearing a given sequence of tag names. The second idea is to generate a query-friendly compressed version of the skeleton.
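To make the skeleton/vector split concrete, here is a minimal Python sketch. It is an illustration only, not the implementation described in the talk. The function name `vectorise`, the `#` placeholder for a removed data value, and the tuple encoding of the skeleton are all hypothetical. The compression shown (storing each distinct subtree once and sharing it) is an assumption, since the abstract does not say which scheme is used. Attributes and mixed content are ignored for brevity.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TEXT = "#"  # hypothetical placeholder: marks where a data value was removed


def vectorise(xml_text):
    """Split a document into (skeleton, vectors, number of distinct skeleton nodes)."""
    vectors = defaultdict(list)  # tag path -> data values, in document order
    pool = {}                    # each distinct subtree is stored exactly once

    def share(node):
        # Assumed compression step: return the first stored copy of an equal subtree.
        return pool.setdefault(node, node)

    def walk(elem, path):
        path = path + (elem.tag,)
        children = []
        text = (elem.text or "").strip()
        if text:
            vectors["/".join(path)].append(text)
            children.append(TEXT)
        for child in elem:
            children.append(walk(child, path))
        return share((elem.tag, tuple(children)))

    skeleton = walk(ET.fromstring(xml_text), ())
    return skeleton, dict(vectors), len(pool)


doc = """<db>
  <book><title>A</title><year>2001</year></book>
  <book><title>B</title><year>2003</year></book>
</db>"""

skeleton, vectors, distinct_nodes = vectorise(doc)
print(vectors)         # one vector per tag path
print(distinct_nodes)  # distinct subtrees kept after sharing
```

In this toy document both `book` elements have the same shape, so the skeleton keeps a single shared `book` subtree, while the titles and years sit in two separate vectors. A query that touches only years would need to read only the `db/book/year` vector, which is the column-store intuition carried over to XML.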

The talk will describe vectorisation, skeleton compression, query evaluation and some preliminary results. It will also include a brief discussion of issues with XML schemata (small "s"). The work has involved contributions from a substantial fraction of the Database Group at Edinburgh: Peter Buneman, Byron Choi, Wenfei Fan, Rob Hutchison, Bob Mann, and Stratis Viglas. The talk will probably be given by Peter Buneman and Byron Choi.


