e-Science Workflow Services

3 December 12:00pm - 5 December 3:30pm 2003

e-Science Institute, 15 South College Street, Edinburgh

Organisers:

Dave Berry (NeSC), Savas Parastatidis

Abstracts

Putting Workflow and BPM Standards into Context
Sharon Boyes-Schiller, WfMC

Workflow Architecture and its Streaming Infrastructure
Geoffrey Fox, Indiana University

We define a simple workflow architecture and identify the management of the streams linking components as an interesting feature. We discuss an approach to stream management that uses distributed control and hence offers scalable performance.

Pegasus: Planning for Execution in Grids
Ewa Deelman, Center for Grid Technologies, USC Information Sciences Institute

Many of today's Grid applications can be viewed as complex workflows that consist of various transformations performed on the data. In this talk, we describe Pegasus, a workflow management system developed as part of the GriPhyN project. Pegasus is a configurable system that can map complex workflows onto the Grid. The workflow description provided to Pegasus is an abstract workflow, where the activities are described as logical transformations performed on logical files. This type of abstract workflow can be generated using the GriPhyN Chimera system. Pegasus maps this abstract description onto an executable workflow and hands it off to Condor's DAGMan for execution. Pegasus can take into account dynamic information about existing data products as well as the available system resources. To date, Pegasus has been used in a variety of data-intensive applications in fields including high-energy physics, gravitational-wave physics, and astronomy.
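As a rough illustration of the mapping step described above, the following Python sketch takes an abstract workflow (logical transformations over logical files), skips jobs whose outputs already exist, and assigns each remaining job to a site that offers its transformation. It is not Pegasus code: the names AbstractJob, plan, existing and site_catalog are hypothetical, and the site choice (first match in alphabetical order) is a placeholder for real resource selection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbstractJob:
    name: str
    transformation: str   # logical transformation name
    inputs: tuple         # logical file names consumed
    outputs: tuple        # logical file names produced


def plan(jobs, requested, existing, site_catalog):
    """Map an abstract workflow onto (job name, site) pairs.

    existing     -- set of logical files that are already materialised
    site_catalog -- hypothetical mapping: site name -> transformations it offers
    """
    producer = {lfn: job for job in jobs for lfn in job.outputs}
    ordered, seen = [], set()

    def require(lfn):
        if lfn in existing:          # data product already exists: nothing to run
            return
        job = producer.get(lfn)
        if job is None:
            raise ValueError(f"no job produces {lfn!r}")
        if job.name in seen:
            return
        seen.add(job.name)
        for needed in job.inputs:    # schedule producers of the inputs first
            require(needed)
        ordered.append(job)

    for lfn in requested:
        require(lfn)

    executable = []
    for job in ordered:
        sites = sorted(s for s, txs in site_catalog.items()
                       if job.transformation in txs)
        if not sites:
            raise ValueError(f"no site offers {job.transformation!r}")
        executable.append((job.name, sites[0]))
    return executable


jobs = [
    AbstractJob("extract", "clean", ("raw.dat",), ("clean.dat",)),
    AbstractJob("analyse", "fit", ("clean.dat",), ("result.dat",)),
]
site_catalog = {"site_a": {"clean"}, "site_b": {"clean", "fit"}}

full = plan(jobs, ["result.dat"], {"raw.dat"}, site_catalog)
reduced = plan(jobs, ["result.dat"], {"raw.dat", "clean.dat"}, site_catalog)
print(full)
print(reduced)
```

When clean.dat is already available, the extract job drops out of the plan, which is one way of reading "taking into account existing data products".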

Kepler: A Workflow Tool for Heterogeneous Ecological Data Analysis
Chad Berkley, USA National Center for Ecological Analysis and Synthesis (NCEAS)

The Science Environment for Ecological Knowledge (SEEK) project, in collaboration with the Scientific Data Management (SDM) Center, is currently developing a flexible workflow system called Kepler to process and ingest heterogeneous ecological data from ecologists and other domain scientists. Kepler will provide access to data and computational services through emerging Grid technologies and will provide services for optimizing workflow execution. Finally, it will introduce an advanced system for semantic typing and ontological reasoning to assist users in workflow creation and execution.

Kepler: Scientific Workflows Based on Dataflow Process Networks
Bertram Ludaescher, SDSC

Based on experiences with scientific workflows from different domains (genomics, ecology, neuroscience, and geosciences) gained in various projects, we argue that actor-oriented dataflow process networks provide a more suitable formalism than conventional business-oriented workflow approaches. The technical challenges in scientific workflows include the heterogeneity, complexity, volume, and physical distribution of scientific data. In addition, the scientific workflow designer is faced with the difficult task of composing complex "analysis pipelines" from analytical steps, as well as data transformation and querying steps. This composition of (legacy) components into larger scientific workflows based on specific models of computation is precisely the strength of dataflow process networks (and a weakness of business workflow languages). We also introduce a cross-project collaboration called "Kepler" that is building an open source scientific workflow system based on the Ptolemy-II system.
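To make the idea of composing an "analysis pipeline" from independent actors more concrete, here is a minimal Python sketch in which each actor is a generator that consumes tokens from an upstream channel and emits tokens downstream. It is only an approximation of a dataflow process network (single-threaded and pull-based), and it does not use the Kepler or Ptolemy-II APIs; the actor names source, map_actor and filter_actor are assumptions made for this example.

```python
def source(items):
    """Actor that emits each item of a data set as a token."""
    for item in items:
        yield item


def map_actor(fn, upstream):
    """Actor that applies a transformation step to every incoming token."""
    for token in upstream:
        yield fn(token)


def filter_actor(keep, upstream):
    """Actor that forwards only the tokens accepted by a query predicate."""
    for token in upstream:
        if keep(token):
            yield token


def is_number(text):
    """Return True when the text can be parsed as a float."""
    try:
        float(text)
    except ValueError:
        return False
    return True


# Wire the actors together: clean up -> discard bad readings -> convert.
raw = [" 12.5", "n/a", "14.0 "]
pipeline = map_actor(
    float,
    filter_actor(is_number, map_actor(str.strip, source(raw))),
)
result = list(pipeline)
print(result)
```

Each actor knows nothing about its neighbours beyond the tokens it receives, so a legacy component can be wrapped as one more actor without changing the rest of the pipeline.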

Specifying Scientific Applications by Means of Scientific Dataflow
Eric Simon, INRIA

In scientific applications, a typical task of a scientist is to specify and implement a "Data Processing chain" (DP chain) that consists of input data sets and successive data transformation steps accomplished by scientific programs such as image processing programs, modelling and simulation programs, or visualization programs. After reviewing the problems met by scientists with the specification and implementation of DP chains, we propose scientific dataflow as a possible common notation for the specification of DP chains. Scientific dataflow adds flexibility to scientific applications by enabling a declarative specification of DP chains and facilitating their partial reuse. Scientific dataflow also facilitates the publication of data transformation programs and derived data sets, and can be a vehicle to assess the quality of derived data sets. The talk will present work in progress on scientific dataflow that is being carried out within two European projects on environmental information systems (Thetis and Decair). Several examples taken from concrete situations will illustrate and motivate our approach.

Requirements for Complex Interactive Workflows in Biomedical Research
Jeff Grethe, UCSD

The steps needed to conduct biomedical research form an often complicated sequence of activities, ranging from data collection via specialized instruments, through data processing and analysis, to deposition of annotated results into community databases. A general requirement of any computing infrastructure is to provide a flexible solution for researchers to assemble and manage these scientific workflows. These may bring data from disparate resources, process and analyze that data, visualize it, and then perhaps repeatedly iterate through parts of the cycle. Workflows may take many forms, ranging from persistent processes with a stable sequence of actions to dynamic or iterative processes with multiple instances for conditional refinement. These workflows present unique requirements (e.g. interactivity, human subjects protection, data validation) that must be addressed to provide researchers with transparent access to a computing environment that supports their natural working paradigm while taking advantage of the emerging international cyberinfrastructure.

Providing Web Service Coordination to Bioinformaticians
Matthew Addis, IT Innovation

As web service technology matures there is growing interest in exploiting workflow techniques to coordinate web services. Bioinformaticians are a user community who combine web resources to perform in silico experiments. These users are scientists and not information technology experts; they require workflow solutions which have a low cost of entry for service users and providers, including ease of use and open source tool support.

As a result, the EPSRC-funded myGrid project has, in collaboration with the European Bioinformatics Institute and the Human Genome Mapping Project, developed a graphical toolset and workflow enactor which uses its own high-level representation of a process flow: the Simple conceptual unified flow language (Scufl). The extensibility of Scufl, supported by these tools, means that workflow and use of web services can be matched to how users view their problem. Users see Scufl through the Taverna workbench (http://taverna.sourceforge.net) for authoring, editing and testing workflows. Taverna uses the Freefluo enactment engine (http://freefluo.sourceforge.net) for workflow execution. Taverna gives users an environment for browsing resources available on the web, constructing workflows that combine these resources, and testing them out in a way that suits the exploratory, information-gathering workflows that are our prime concern.

The alignment of workflow to the users' conceptual model for expressing their problem is key to bridging the gap between the needs of the scientist and the current world of Web Services, which is replete with multiple, overlapping and low-level standards for Web Service coordination, yet relatively barren in terms of easy-to-use open-source tool support. This is where the real success of Scufl, Taverna and Freefluo lies since users need very little explanation on how to use the tool and language; they just see a tool that uses applications on the Web as they would expect.

Service Workflow: Programming the Grid
Prof. Yike Guo, Department of Computing, Imperial College London

Workflow is becoming an important research field in various areas including analytical computing, business process management and high performance computing. On the one hand, workflow provides a mechanism for representing process knowledge. On the other hand, workflow offers a very flexible model for programming distributed computational resources such as the Grid. In this presentation, Service Workflow is proposed as a uniform model for building compositional services in the context of grid computing. In particular, we will discuss various key technical issues in developing a practical Service Workflow system. We will also demonstrate some real-world applications being developed in the UK e-Science pilot project, Discovery Net.

DAME: Workflow Requirements and Implementation
Tom Jackson, University of York

The presentation will describe the workflow requirements for the demonstrator system being built for the DAME (Distributed Aircraft Maintenance Environment) project. As well as describing some example workflows, we will discuss the mechanisms used to specify and capture workflow requirements. We also describe the current workflow implementation, which deploys a workflow engine and resource broker, illustrating how the workflow engine is responsible not only for workflow enactment but also for security and role management. We will conclude by identifying a number of emerging requirements for workflow management that will need to be addressed in future versions of the demonstrator system.

Hierarchical Task Network Planning for Grid/Web Services Composition and Workflow
Austin Tate, AIAI, University of Edinburgh

Hierarchical Task Network (HTN) Planning offers an approach for web and grid services composition. It allows a task specification or outline "plan" to be expanded or refined using entries from a library of well worked out components (example abstract workflows) or descriptions of primitive service steps. AI planning methods allow for interactions between steps to be resolved, unsatisfied (pre-)conditions to be achieved, and temporal, resource and other constraints to be checked and catered for. Two example planners which could form the basis for a new web and grid services workflow composition tool will be described: O-Plan and I-X.

Underlying these systems is the <I-N-C-A> (Issues, Nodes, Constraints and Annotations) ontology, which offers an effective and extensible representation to support the generation, refinement, analysis and enactment of workflows. O-Plan is an HTN planner created in the period 1983-1999 that is already running as a web service with an HTTP interface and which can support various plan and workflow composition tasks. I-X is a new, conceptually simpler approach that can also use HTN planning in a composition and dynamic enactment context. It is based on a simple architecture of handling issues and checking constraints.
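The refinement of an outline plan from a library of components can be sketched in a few lines of Python. This is a toy HTN-style decomposer written for illustration, not O-Plan or I-X: the function name plan, the task names and the representation of preconditions and effects as sets of strings are all assumptions. Compound tasks are expanded using alternative decompositions from a library, and a primitive step is accepted only if its preconditions hold in the current state.

```python
def plan(tasks, state, methods, primitives):
    """Expand a list of tasks into primitive steps, or return None on failure.

    methods    -- compound task -> list of alternative decompositions
    primitives -- primitive step -> (preconditions, effects), both sets
    state      -- frozenset of facts that currently hold
    """
    if not tasks:
        return []
    head, rest = tasks[0], tasks[1:]
    if head in primitives:
        preconditions, effects = primitives[head]
        if not preconditions <= state:       # unsatisfied precondition
            return None
        tail = plan(rest, state | effects, methods, primitives)
        return None if tail is None else [head] + tail
    for decomposition in methods.get(head, []):
        # Try each library entry in turn; backtrack if it cannot be completed.
        result = plan(list(decomposition) + rest, state, methods, primitives)
        if result is not None:
            return result
    return None


methods = {
    "analyse_sequence": [
        ["fetch_cached", "align"],
        ["fetch_remote", "align"],
    ],
}
primitives = {
    "fetch_cached": ({"cache_populated"}, {"have_sequence"}),
    "fetch_remote": (set(), {"have_sequence"}),
    "align": ({"have_sequence"}, {"have_alignment"}),
}

steps = plan(["analyse_sequence"], frozenset(), methods, primitives)
print(steps)
```

With an empty initial state the first decomposition fails on its precondition, so the planner falls back to the second one. Temporal and resource constraints, which the abstract also mentions, are left out of this sketch.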

Ordering and Time-dependency in Workflows
John Brooke, SVE group, Manchester Computing

Workflow management systems can express the ordering of tasks in workflows via support for Directed Acyclic Graphs (DAGs). However, scientific applications may have needs beyond this, in particular the ability to express synchronicity between components of the workflow. We examine some real situations in which this is necessary, looking at some of the use cases presented by the RealityGrid EPSRC pilot project. These cases involve workflows in which different tasks or components are deployed for computation, visualization and steering control. There is explicit synchronicity between such components, which must be expressed in the workflow in an abstract manner, since the components may be dynamically migrated between different physical resources during a running enactment of the workflow.

We discuss the support necessary for time-dependency of workflows at both compile-time and run-time and issues arising from the storage and re-enactment of such workflows. We briefly discuss the importance of Grid schedulers and resource brokers in the context of providing this support.
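The distinction between ordering and synchronicity can be illustrated with a short Python sketch (Python 3.9 or later, for the standard graphlib module). A DAG yields stages of tasks that may run once their predecessors are done, but it says nothing about tasks that must run at the same time. The helper co_scheduled below is a hypothetical check, introduced only for this example, that a group of components intended to run concurrently ends up in the same stage; the task names are likewise invented.

```python
from graphlib import TopologicalSorter


def stages(dag):
    """Group tasks into successive stages; dag maps task -> set of predecessors."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())   # tasks whose predecessors are done
        result.append(ready)
        sorter.done(*ready)
    return result


def co_scheduled(stage_list, group):
    """Return True if every task in group appears in one and the same stage."""
    return any(set(group) <= set(stage) for stage in stage_list)


dag = {
    "setup": set(),
    "simulate": {"setup"},
    "visualise": {"setup"},
    "steer": {"setup"},
    "archive": {"simulate", "visualise"},
}

order = stages(dag)
print(order)
print(co_scheduled(order, ["simulate", "visualise", "steer"]))
```

Here the three interactive components happen to share a stage, but only because of how the edges were drawn. The DAG itself carries no statement that they must run together, which is the gap in expressiveness the abstract points to.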

JISGA: A Jini-Based Service-Oriented Grid Architecture and Its Workflow Language
Yan Huang, Cardiff University

This presentation will introduce a Service-based Workflow Description Language (SWFL), and explain how an application composed of interacting services can be described in SWFL. The conversion of a composite application specified in an SWFL document into executable Java code, and its subsequent execution, are also discussed. JISGA, a workflow engine and supporting execution environment for SWFL-specified applications, will be introduced, and its functionality, main services and components will be described. The presentation will look at the processing of scientific workflows that involve parallelism.

'Workflow' Issues in Data Access and Integration: An OGSA-DAI/DAIS perspective
Mario Antonioletti, EPCC, University of Edinburgh

OGSA-DAI and the GGF DAIS WG are in the process of specifying and implementing a framework that allows data resources to be exposed to OGSA Grids. To reduce data movement, an extensible document-based interface has been developed that attempts to minimise the number of interactions with a data service by allowing data to flow between activities that represent the data service capabilities. With data integration, more complex scenarios arise which involve elements of workflow. This presentation will give an overview of the solutions that have been adopted, their strengths, and their weaknesses.

Workflow and Triana Services
Matthew Shields, Cardiff

Triana is a workflow-based data analysis tool, also called a Problem Solving Environment. This presentation will follow the use of Triana in composing and executing workflows through Triana's distributed architecture. We will describe Triana's workflow format and mechanisms as well as Triana's ability to read and write other workflow formats such as BPEL4WS.

Virtual Triana services, the basic building block of Triana's distributed architecture, are described along with an overview of how these resources are managed within a distributed environment. The middleware-independent nature of this implementation, through the use of an application-driven API called the GAT, is then outlined. The implementation can work within both P2P and Grid computing scenarios through the use of a GAT adapter (called the GAP) for advertising, discovery and communication with Web and P2P services, and soon through the Globus bindings within the GAT engine. The GAP adapter supports bindings to Web Services and P2P infrastructures, such as JXTA and a simplified P2P technology called P2PS (P2P Simplified). High-level Triana distribution mechanisms, which use this underlying infrastructure, are given for both the high throughput and pipelined distributed execution of sub-task-graphs.

Workflow and Job Control in Astrogrid
Jeff Lusted, Leicester University

Astrogrid is the UK's first virtual observatory. Its prime aim is to provide astronomers with a system that can be used to design and run complex workflows to search astronomical archives, and to process the results of those searches. The archives are inherently distributed between data centers. Workflows consist of multiple steps that can be executed in parallel, in sequence, or in any combination of the two, potentially across different sites. Each step executes a tool at a data center, whether a search tool or a tool aimed at manipulating the results of searches. Resources are passed from step to step via a virtual distributed resource system within Astrogrid known as VOSpace. The talk will outline the techniques we have developed so far to describe a workflow and also to control the execution environment.

Complex job-shop scheduling problems and applications
Peter Brucker

A semantic based workflow management in a virtual organization
Jessica Chen-Burger, AIAI, University of Edinburgh

Modern organizations are virtual entities. People working in organizations are located in different places, each with different capabilities and responsibilities. They need to work collaboratively to accomplish tasks and to achieve common organizational goals together. The tasks to be accomplished are often not trivial but require specialized expertise and sophisticated technologies that are based on local knowledge and experience. It therefore cannot be taken for granted that co-ordination among distributed sites is always carried out smoothly and effectively. Workflow and Business Process Modeling techniques are well recognised for their value in promoting and achieving effective and efficient co-ordination of distributed organizational operations. In this talk I will present Edinburgh's effort in providing semantic-based workflow management to help such collaborative effort within a virtual organization.

Web Services Choreography
Guus Ramackers, ORACLE

This presentation describes the goals underlying the recent work of the W3C Web Services Choreography Working Group in defining a standard for the coordination of web service transactions that involve multiple parties. Web Service Choreography is expected to be a major component of future web architectures. The basic concepts of the WS-CDL proposal made by Oracle and others will be discussed, and illustrated with an example based on a simple Buyer/Seller use case. Detailed concepts such as Token, Channel, State and Reaction that are necessary to define and monitor the dynamics of a business choreography are also discussed.

Chimera: a virtual data model for workflow specification
Mike Wilde, Argonne National Laboratory

Much scientific data is not obtained from measurements but rather derived from other data by the application of computational procedures. The explicit representation of these procedures can enable documentation of data provenance, discovery of available methods, and on-demand data generation (so-called "virtual data").

To explore this idea, we have developed the Chimera Virtual Data System, which provides a catalog that can be used by application environments to describe a set of application programs ("transformations"), and then track all the data files produced by executing those applications ("derivations"). Chimera contains the mechanism to locate the "recipe" to produce a given logical file, in the form of an abstract program execution graph.

These abstract graphs are then turned into a Grid-executable form by the Pegasus planner, which is described in the following talk.
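The catalog idea in this abstract can be pictured with a small Python sketch: transformations are registered by name, each derivation records which transformation turned which input files into which output files, and a "recipe" for a logical file is the list of derivations needed to regenerate it, in dependency order. This is not the Chimera API or its virtual data language; the class VirtualDataCatalog, its methods and the file names are hypothetical, and the derivation graph is assumed to be acyclic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Derivation:
    transformation: str   # name of the registered application program
    inputs: tuple         # logical files read
    outputs: tuple        # logical files written


class VirtualDataCatalog:
    def __init__(self):
        self.transformations = set()
        self._producer = {}               # logical file -> Derivation

    def add_transformation(self, name):
        self.transformations.add(name)

    def add_derivation(self, transformation, inputs, outputs):
        if transformation not in self.transformations:
            raise KeyError(f"unknown transformation {transformation!r}")
        derivation = Derivation(transformation, tuple(inputs), tuple(outputs))
        for lfn in derivation.outputs:
            self._producer[lfn] = derivation
        return derivation

    def recipe(self, lfn):
        """Return the derivations needed to produce lfn, dependencies first."""
        ordered = []

        def visit(name):
            derivation = self._producer.get(name)
            if derivation is None or derivation in ordered:
                return                    # raw input, or already in the recipe
            for needed in derivation.inputs:
                visit(needed)
            ordered.append(derivation)

        visit(lfn)
        return ordered


catalog = VirtualDataCatalog()
catalog.add_transformation("calibrate")
catalog.add_transformation("histogram")
catalog.add_derivation("calibrate", ["raw.dat"], ["calibrated.dat"])
catalog.add_derivation("histogram", ["calibrated.dat"], ["hist.dat"])

names = [d.transformation for d in catalog.recipe("hist.dat")]
print(names)
```

The same records serve two purposes: read forwards they document the provenance of hist.dat, and read backwards they say how to regenerate it on demand.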