Paper ID: 1362

Grid Based Conversion of Unstructured Data using a Common Semantic Model
Sarah Bearder, Paul Donachy, Terry Harmer, Karen Loughran, Ron H Perrott, Mark Prentice, Jens Rasch

Appeared in: Proceedings of the UK e-Science All Hands Conference 2004 website: http://www.allhands.org.uk/2004/
Page Numbers: 1039 - 1043
Publisher: Engineering and Physical Sciences Research Council
Year: 2004
ISBN/ISSN: 1-904425-21-6
Contributing Organisation(s):
Field of Science: e-Science

URL: http://www.allhands.org.uk/2004/proceedings/papers/166.pdf

Abstract: Managing unstructured data is a problem that has existed for as long as people have used computers to store and retrieve information electronically. As commercial and social demands for data collection increase, so too does the number of formats and structures in which data is stored. Additionally, the sheer volume of data presents challenges for timely access and conversion. To compound the problem, datasets are expected to grow exponentially in the near future as demand for information increases. There is therefore a need to access and convert large quantities of data from a variety of formats in a common, parallel and structured manner. This paper presents the background, motivation and experiences of developing a Common Semantic Model (CSM) to assist in the conversion of unstructured data within the industrial UK e-Science project GEDDM. The model will facilitate the conversion of data residing in a range of formats, including email, PDF, web log and various database formats, into a common format for subsequent data mining operations. A common approach to this problem, along with an architecture for implementation within a Grid environment, is presented. A roadmap is outlined for implementing the model under an OGSA-DAI based framework, with a view to supporting access to and integration of a wider range of data sources.

Keywords: e-Science, AHM 2004



Last Updated: 22 Jun 12 11:02
