Toward a common data and command representation for quantum chemistry

05 April 2004 10:00 - 06 April 2004 16:00

e-Science Institute, 15 South College Street, Edinburgh


Dr Philip Couch


Any slides or other material generated as a result of this event can be found at:

The realisation of Grid technologies has provided a strong technological framework for the interoperability of computer codes. This has significant benefits for many scientific communities, including quantum chemistry. However, communication between such codes is hindered by the many different file formats they use. A common data and command representation would alleviate this difficulty.

In addition, the storage of data in a 'universal' format with suitable meta-data would simplify its analysis, interpretation and appropriate re-use. This meeting aims to address issues associated with the implementation of such a representation, including design implications, software tools and existing efforts such as the XML-based Chemical Markup Language.



Registration for this event is now closed. To enquire about an application or to cancel a previous application please contact NeSC Administration.


Provisional Agenda

Please note that this agenda is subject to change.

Monday 5th April

09.30 - 10.00 Refreshments and registration
10.00 - 10.35 Introduction to the meeting - Dr. Philip Couch, Daresbury Laboratory, CLRC
10.35 - 11.10 Prof. Kim Baldridge, San Diego Supercomputer Center
  Representation of Computational Quantum Chemistry Data in a Structured Format and Incorporation into a Scientific Workflow
Accessing, processing, and querying data obtained from computational chemistry simulations requires that the data be stored in a rational, structured form. At present, data is typically output in a form that is "human" readable, but is not suitable for database storage or for transferring data from one program to another. XML documents provide a format that is readily adaptable to database storage and may serve as a framework for representing data as a serialized object that may be readily transferred from databases to a series of simulation and analysis codes. Here, we will describe our efforts in designing and implementing a mechanism for producing XML output from a quantum chemistry program (GAMESS), passing the data in the form of an XML document to associated analysis codes, our initial efforts in designing a quantum chemistry database, and our implementation of a scientific workflow environment in order to initiate a series of computations on remote resources and return the data to the user.
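As a minimal illustration of the idea, the Python sketch below serializes a molecule and a total energy to XML using the standard-library `xml.etree.ElementTree` module. A separate "analysis" function then reads the energy back without parsing human-readable output. This is not the GAMESS implementation described in the talk. The element and attribute names (`calculation`, `atom`, `property`, `totalEnergy`) are hypothetical, and the numbers are illustrative only.

```python
import xml.etree.ElementTree as ET

def results_to_xml(atoms, energy_hartree):
    """Serialize atoms [(symbol, (x, y, z)), ...] and a total energy to an XML string.
    Element and attribute names are hypothetical, not a published schema."""
    root = ET.Element("calculation")
    molecule = ET.SubElement(root, "molecule")
    for symbol, (x, y, z) in atoms:
        ET.SubElement(molecule, "atom", symbol=symbol, x=repr(x), y=repr(y), z=repr(z))
    energy = ET.SubElement(root, "property", name="totalEnergy", units="hartree")
    energy.text = repr(energy_hartree)
    return ET.tostring(root, encoding="unicode")

def total_energy(xml_text):
    """What a downstream analysis code might do: select one value by name."""
    root = ET.fromstring(xml_text)
    return float(root.find("property[@name='totalEnergy']").text)

# Usage with illustrative numbers:
doc = results_to_xml([("H", (0.0, 0.0, 0.0)), ("H", (0.0, 0.0, 0.74))], -1.117)
print(total_energy(doc))  # -> -1.117
```

Because the value is retrieved by name rather than by its position in a printed log, the same document can be stored in a database or passed to any code that understands the markup.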
11.10 - 11.25 Refreshments
11.25 - 12.00 Dr. Theresa Windus, Pacific Northwest National Laboratory, Richland, Washington

Data Management and Representations in Ecce and CMCS

The Extensible Computational Chemistry Environment (Ecce) is a sophisticated problem-solving environment that enables scientists to set up and run calculations efficiently, and to store, retrieve and analyze the data produced by computational chemistry studies. Data representations are available for the setup parameters and the runtime environment, as well as for the output properties. The Collaboratory for Multi-scale Chemical Science (CMCS) is an environment that enables chemical information to be communicated, translated and annotated across several chemical scales. The ultimate goal of this project is to provide a dynamic environment in which to perform new informatics-based manipulations. The initial scales are the molecular (computational, ab initio data), thermochemical, kinetic, kinetic mechanism and numerical simulation scales (including computational and experimental data). This talk will present the data involved, the formats used to describe this data, the pedigree information associated with the data, and the collaboratory infrastructure and portal that enable researchers to access, annotate and manipulate the data.

12.00 - 12.35 Dr. Elda Rossi, CINECA, Bologna, Italy
  Looking for a (Standard) Common Format for Computational (Quantum) Chemistry
  This talk is about the activity recently carried out in the framework of "COST in Chemistry", an EC-funded project. The final aim is to provide a "workflow" tool that allows researchers to collaborate by exchanging different programs. To this end, the first problem we faced, and the core of the presentation, was that of defining a common format for quantum chemistry programs. An XML-based format is proposed, designed to describe a quantum mechanical system in a fairly general way. This format is used for a repository in which all data on the system under investigation are maintained. Data are retrieved from the repository and converted into the input stream of the specific program to be run. The conversion is done by a wrapper code, designed specifically for each program. Two possible ways of writing the wrappers are discussed, using the Fortran and Python programming languages respectively.
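The repository-plus-wrapper arrangement can be sketched in Python, one of the two languages mentioned. Both the XML layout and the keyword-style input deck below are hypothetical. They are not the COST format, and they do not match the input syntax of any real program. The point of the sketch is that each program gets its own small wrapper, while the repository format stays the same.

```python
import xml.etree.ElementTree as ET

# Hypothetical repository entry describing the system under investigation.
REPOSITORY = """<system>
  <molecule charge="0" multiplicity="1">
    <atom symbol="H" x="0.0" y="0.0" z="0.0"/>
    <atom symbol="H" x="0.0" y="0.0" z="0.74"/>
  </molecule>
  <method name="HF" basis="STO-3G"/>
</system>"""

def to_input_deck(xml_text):
    """Wrapper for one imaginary program: XML repository entry -> keyword input stream."""
    root = ET.fromstring(xml_text)
    mol = root.find("molecule")
    method = root.find("method")
    lines = [
        f"METHOD {method.get('name')}",
        f"BASIS {method.get('basis')}",
        f"CHARGE {mol.get('charge')}",
        f"MULTIPLICITY {mol.get('multiplicity')}",
        "GEOMETRY",
    ]
    for atom in mol.iter("atom"):
        # One line per atom: symbol padded to two characters, then x, y, z.
        coords = (float(atom.get(axis)) for axis in "xyz")
        lines.append(f"{atom.get('symbol'):<2}" + "".join(f"{c:12.6f}" for c in coords))
    lines.append("END")
    return "\n".join(lines) + "\n"

print(to_input_deck(REPOSITORY), end="")
```

A wrapper for a second program would read the same repository entry and differ only in the text it emits.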
12.35 - 13.35 Lunch
13.35 - 14.10 Prof. Peter Murray-Rust, Unilever Center for Molecular Informatics, Cambridge University
  An Architecture for Computational Chemistry
  Computational chemistry is crippled by non-interoperability at all levels: encoding, syntax, semantics and ontology. XML addresses all these problems, solving the first two completely and providing a framework for collaborative action on the others. Chemical Markup Language (CML) supports many subdomains (Molecules, Reactions, Crystallography and Condensed Matter, Spectra, and Computation "CMLComp"). These are described as components in XMLSchema, and the current design allows a "mix-and-match" approach to creating a schema for a given purpose (e.g. computational solid state). Ontology is added through XMLSchema-like "dictionaries", which are flexible and extensible. Each code (MOPAC, GULP, SIESTA, ...) has its own dictionary describing the concepts and constraints on its information, much of it developed through the involvement of authors and community activity. This leads directly to dictionary-aware libraries, embedded in the codes, that communicate with a CMLDOM. In this way codes can communicate directly without information loss. Dictionaries can be combined or linked so that the community can grow its ontology in an evolutionary fashion. Language independence is provided by automatic generation of custom CMLDOMs from the XMLSchema; so far Java, C++, Python and F90 have been explored. When the source of a code is not accessible (e.g. for legal reasons), we have developed template-driven parsers (JumboMarker) that translate output logs into structured XML and hence into CMLComp. [All material ("Jumbo") will be available and OpenSource.]
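The template-driven approach for codes whose source is unavailable can be illustrated with a few regular expressions. In this hypothetical Python sketch, each template maps a dictionary term to a pattern, and any matches in the output log become structured XML. The log lines, the patterns and the `term` attribute are all invented for illustration. They do not reflect JumboMarker's actual template syntax or the CML schema.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical templates: dictionary term -> regex with one capture group for the value.
TEMPLATES = {
    "totalEnergy": re.compile(r"TOTAL ENERGY\s*=\s*(-?\d+\.\d+)"),
    "dipoleMoment": re.compile(r"DIPOLE MOMENT\s*=\s*(-?\d+\.\d+)"),
}

def log_to_xml(log_text, templates=TEMPLATES):
    """Apply each template to a human-readable log; record the first match as XML."""
    root = ET.Element("propertyList")
    for term, pattern in templates.items():
        match = pattern.search(log_text)
        if match:
            prop = ET.SubElement(root, "property", term=term)
            prop.text = match.group(1)
    return root

# Usage with an invented log fragment:
SAMPLE_LOG = """ SCF CONVERGED AFTER 12 CYCLES
 TOTAL ENERGY =    -75.98
 DIPOLE MOMENT =   2.1
"""
print(ET.tostring(log_to_xml(SAMPLE_LOG), encoding="unicode"))
```

Keeping the templates as data, separate from the parsing loop, means that supporting a new code is a matter of writing new templates rather than new parser logic.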
14.10 - 15.40 Practical sessions
15.40 - 16.00 Refreshments
16.00 - 17.30 Practical sessions
19.00 Conference dinner


Tuesday 6th April

09.00 - 09.35 Dr. Jon Wakelin, Department of Earth Sciences, Cambridge University
  Markup for Computational Chemistry and Physics: Implementation and Issues
09.35 - 10.10 Wayne Boucher, Department of Biochemistry, Cambridge University
10.10 - 10.25 Refreshments
10.25 - 11.00 Shoaib Sufi, Daresbury Laboratory, CLRC
  CCLRC Scientific Metadata Model
11.00 - 11.35 Prof. Alberto Garcia, Departamento de Fisica de la Materia Condensada, Universidad del Pais Vasco.

A native Fortran XML parser: Design and applications in scientific computing.

The parser has been designed to be a useful tool for the extraction and analysis of data in the context of scientific computing. There are two programming interfaces. The first is based on the SAX model: the parser calls routines provided by the user to handle certain events, such as encountering the beginning or end of an element, or reading character data. This interface is enhanced with routines that build data arrays directly from the data. The second is based on the XPath standard. Only a limited subset of the full XPath specification has been implemented, but it is already enough to make this interface quite useful. Two examples of applications will be discussed: the processing of an output data file from the Siesta ab initio code, and the efforts towards a unified pseudopotential file format and handling library for first-principles calculations. The talk will also discuss the implications of recent work by Jon Wakelin on implementing a DOM framework on top of the parser's SAX API.
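The SAX event model is not specific to Fortran. As an analogy only, the sketch below uses Python's standard-library `xml.sax` module rather than the Fortran parser's API. It registers start-element, character-data and end-element callbacks and builds numeric arrays directly from the character data. The `<array name=...>` markup is hypothetical.

```python
import xml.sax

class ArrayHandler(xml.sax.ContentHandler):
    """Collects whitespace-separated numbers inside <array> elements into lists."""

    def __init__(self):
        super().__init__()
        self.arrays = {}
        self._current = None   # name of the array being read, if any
        self._chunks = []

    def startElement(self, name, attrs):
        if name == "array":
            self._current = attrs.get("name", "unnamed")
            self._chunks = []

    def characters(self, content):
        # Character data may arrive in several pieces, so buffer it.
        if self._current is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "array" and self._current is not None:
            text = "".join(self._chunks)
            self.arrays[self._current] = [float(v) for v in text.split()]
            self._current = None

def read_arrays(xml_text):
    handler = ArrayHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.arrays

print(read_arrays('<output><array name="eigenvalues">-20.5 -1.3 -0.7</array></output>'))
```

The handler buffers character data until the closing tag because a SAX parser may split it across several `characters` calls. Python's `xml.etree.ElementTree` likewise supports only a limited subset of XPath.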

More information:

11.35 - 12.10 Dr Martin Westhead, EPCC, Edinburgh 
  The Data Format Description Language (DFDL)
  The Data Format Description Language is a standard under development in the GGF to provide a uniform XML description of data formatted in non-XML ways, such as binary or text. The aim of the language is to capture structural and semantic information about the contents of binary and text files to facilitate conversion, manipulation, archiving and application-specific description of data. This presentation outlines the aims of the working group and places them in context with other technologies in this area, including BinX. It will also describe the status of the standard. More information on DFDL can be found at: More information on BinX can be found at:
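The sketch below gives a flavour of what a format description enables. It decodes a fixed-layout binary record using a small layout table and Python's `struct` module, then emits the record as XML. The layout table is a hypothetical stand-in written in Python. It is not DFDL or BinX syntax, and the field names and values are invented.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical layout description (NOT DFDL or BinX syntax):
# (field name, struct format code), read in order with no padding.
LAYOUT = [("natoms", "i"), ("energy", "d"), ("converged", "?")]

def binary_to_xml(data, layout=LAYOUT, byte_order="<"):
    """Decode one binary record described by `layout` and return it as an XML element."""
    fmt = byte_order + "".join(code for _, code in layout)
    values = struct.unpack(fmt, data)  # raises struct.error if the size is wrong
    record = ET.Element("record")
    for (name, _), value in zip(layout, values):
        ET.SubElement(record, name).text = str(value)
    return record

# Usage: write a little-endian record, then decode it from its description.
raw = struct.pack("<id?", 3, -76.02, True)
print(ET.tostring(binary_to_xml(raw), encoding="unicode"))
```

Because the description is kept separate from the decoder, the same code can convert any file whose layout has been described.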
12.10 - 13.10 Lunch
13.10 - 14.25 Open discussion
14.25 - 14.40 Refreshments
14.40 - 16.00 Open discussion
16.00 Meeting close


The event will be hosted by the e-Science Institute in Edinburgh, which is a centre for education and research in e-Science and provides new state-of-the-art facilities, including an Access Grid system.

Travel: The e-Science Institute is less than 15 minutes' walk from Waverley rail station and from the St Andrew Square bus station. It is approximately 20 minutes by taxi from Edinburgh airport (40 minutes by bus). Please see our web site for a map of the area.


This is an archived website, preserved and hosted by the School of Physics and Astronomy at the University of Edinburgh. The School of Physics and Astronomy takes no responsibility for the content, accuracy or freshness of this website. Please email webmaster [at] ph [dot] ed [dot] ac [dot] uk for enquiries about this archive.