Data Provenance and Annotation

1 December 12:30pm - 3 December 12:00pm 2003

e-Science Institute, 15 South College Street, Edinburgh


Dave Berry (NeSC), Peter Buneman, Michael Wilde, and Yannis Ioannidis


This workshop is a follow-up to a workshop at Argonne National Labs in October 2002. It will further investigate the issues of data provenance, data derivation, and data annotation.

These issues are important to many aspects of scientific computation. In molecular biology, where data is repeatedly copied, corrected, and transformed as it passes through numerous genomic databases, understanding where data has come from and how it arrived in the user's database is crucial to the trust a scientist will put in that data, yet this information is seldom captured properly. In astronomy, useful results may have been obtained by filtering, transforming, and analyzing some base data by a complex assemblage of programs, yet we lack good tools for recording how these programs were connected and the context in which they were run.

The importance of provenance goes well beyond verification. It is closely related to archiving and annotation, which are also important in the context of scientific data. Moreover, it may be used in data discovery. Knowing the provenance of a data item may help the biologist to make connections with other useful data. The astronomer may want to understand a derivation in order to repeat it with modified parameters, and being able to describe a derivation may help a researcher to discover whether a particular kind of analysis has already been performed.

Annotation is closely related to provenance. Researchers do more than produce and consume data: they comment on it and refer to it, and to the results of queries upon it. Annotation is therefore an important aspect of scientific communication. One researcher may want to highlight a point in data space for another to investigate further. They may wish to annotate the result of a query so that similar queries show the annotation.

All these issues raise fundamental questions of data management and integration. This workshop will bring together researchers who are actively working on these questions.


This event is by invitation only. Participants will include researchers who have confronted issues of data derivation, data provenance or data annotation, either in specific situations or in the development of generic principles and technology. If you have received an invitation and would like to attend, please register using the link below.

Please note online applications will not be available after the 24th November 2003. Thereafter registration enquiries should be made directly to our Conference Administrator or the event organiser.

Draft programme:

Monday 1 December 2003

12:30pm - 1:30pm  Lunch and Registration
1:30pm - 3:00pm  Welcome (30 minutes)
Survey. Coarse-grain and workflow characterisations of provenance (1 hour)
3:30pm - 4:00pm  Tea
4:00pm - 6:00pm  Survey. Fine-grain and database characterisations of provenance (30 minutes)
Survey. Annotations of special structures: Images, Sequences, Ontologies(?). (30 minutes)
Survey. Archival characterisations of provenance Breakout groups and planning. (30 minutes)
7:30pm  Conference Dinner

Tuesday 2 December 2003

9:00am - 10:30am  Survey. Legal and Ownership issues. Relationship to provenance. (30 minutes)
Breakout Sessions (1 hour)
10:30am - 11:00am  Coffee
11:00am - 12:30pm  Breakout Sessions (90 minutes)
12:30pm - 1:30pm  Lunch
1:30pm - 3:00pm  Panel discussion 1. What is Data Curation? (45 minutes)
Panel discussion 2. Producers and consumers of annotation tools (45 minutes)
3:00pm - 3:30pm  Tea
3:30pm - 5:00pm  Talks, discussions and demos.

Wednesday 3 December 2003

9:00am - 10:30am  Reports from the breakout sessions.
10:30am - 11:00am  Coffee

The event will be hosted by the e-Science Institute in Edinburgh, which is a centre for education and research in e-Science and provides new state-of-the-art facilities, including an Access Grid system.

Travel: The e-Science Institute is less than 15 minutes' walk from Waverley rail station and from St Andrew Square bus station. It is approximately 20 minutes by taxi from Edinburgh airport (40 minutes by bus). Please see our web site for a map of the area.