Bibliographic Database


Paper ID: 1362

Grid Based Conversion of Unstructured Data using a Common Semantic Model
Sarah Bearder, Paul Donachy, Terry Harmer, Karen Loughran, Ron H Perrott, Mark Prentice, Jens Rasch

Appeared in: Proceedings of the UK e-Science All Hands Conference 2004 (website: http://www.allhands.org.uk/2004/)
Page Numbers: 1039 - 1043
Publisher: Engineering and Physical Sciences Research Council
Year: 2004
ISBN/ISSN: 1-904425-21-6
Contributing Organisation(s):
Field of Science: e-Science

URL: http://www.allhands.org.uk/2004/proceedings/papers/166.pdf

Abstract: Managing unstructured data has been a problem for as long as people have used computers to store and retrieve information. As commercial and social demands for data collection increase, so too does the number of formats and structures in which data is stored. Additionally, the sheer volume of data presents challenges for timely access and conversion. Compounding the problem, dataset sizes are expected to increase exponentially in the near future as demand for information grows. There is therefore a need to access and convert large quantities of data from a variety of formats in a common, parallel and structured manner. This paper presents the background, motivation and experiences of developing a Common Semantic Model (CSM) to assist in the conversion of unstructured data within the industrial UK e-Science project GEDDM. The model will facilitate the conversion of data residing in a range of formats, including email, PDF, web log and various database formats, into a common format for subsequent data mining operations. A common approach to this problem, along with an architecture for implementation within a Grid environment, is presented. A roadmap is outlined for implementing the model under an OGSA-DAI based framework, with a view to supporting access to and integration of a wider range of data sources.

Keywords: e-Science, AHM 2004



Last Updated: 22 Jun 12 11:02