|NeSC Bibliographic Database|
GEDDM: Comparisons of OGSA-DAI and GridFTP for access to and conversion of remote unstructured data in legal data mining.
Karen Loughran, Mark Prentice, Paul Donachy, Terry Harmer, Ron H Perrott, Sarah Bearder, Jens Rasch
Appeared in: Proceedings of the UK e-Science All Hands Conference 2005 (website: http://www.allhands.org.uk/2005/)
Publisher: Engineering and Physical Sciences Research Council
Field of Science: e-Science
Abstract: Managing unstructured data has been a problem for as long as people have used computers to process information electronically. As demand for data collection increases, so does the number of formats and structures in which data is stored, presenting inherent problems for data mining applications. Additionally, the sheer volume of data presents challenges for timely access and conversion. To compound the problem, the size of datasets will increase exponentially in future. There is therefore a need to access and convert large quantities of data from a variety of formats in a common, parallel and structured manner. GEDDM is a collaborative industrial e-Science project in conjunction with BESC and industrial partners Datactics Ltd. A Common Semantic Model (CSM) is defined to assist with the representation and conversion of data from various sources. This model facilitates the conversion of data residing in a range of formats into a common format for subsequent data mining. The project exposes CSM conversion capabilities via a suite of Grid Services called Data Conversion Services (DCS). This paper presents two implementations of the DCS, one under OGSA-DAI and another under GridFTP. The implementations and results are discussed and evaluated, and conclusions are presented.
Keywords: e-Science, AHM 2005
|Last Updated: 22 Jun 12 11:02|