Paper ID: 1667

Scalable clustering on the Data Grid
Patrick Wendel, Moustafa Ghanem, Yike Guo

Appeared in: Proceedings of the UK e-Science All Hands Conference 2005 (website: http://www.allhands.org.uk/2005/)
Page Numbers:
Publisher: Engineering and Physical Sciences Research Council
Year: 2005
ISBN/ISSN: 1-904425-53-4
Contributing Organisation(s):
Field of Science: e-Science

URL: http://www.allhands.org.uk/2005/proceedings/papers/440.pdf

Abstract: Even within e-Science Grid infrastructures, mining distributed data sets remains a challenge. We present a framework for distributed clustering in which the data set is partitioned between several sites and the output is a mixture of Gaussian models. The data providers generate clustering models using different clustering techniques and return them to one central site, which then uses these models as starting observations for EM iterations to build the final model. An initial version of the framework has been implemented and deployed on the Discovery Net infrastructure. We present empirical results that show the advantages of this approach and show that the accuracy of the final model is preserved for very large distributed data sets.
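
The workflow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation or the Discovery Net API. It assumes 1-D data, uses a plain k-means at each site as a stand-in for the "different clustering techniques", and has the central site run a weighted EM over the returned cluster summaries (count, mean, variance). The names `local_summaries` and `central_em` are hypothetical, and evaluating each summary only at its mean in the E-step is a simplification made here.

```python
import math
import random


def local_summaries(points, k, iters=20):
    """Site-side step (assumed): 1-D k-means, returning (count, mean, variance) per cluster."""
    pts = sorted(points)
    # Deterministic initialisation: spread the centres across the quantiles.
    centers = [pts[(2 * i + 1) * len(pts) // (2 * k)] for i in range(k)]
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in pts:
            j = min(range(k), key=lambda c: (x - centers[c]) ** 2)
            groups[j].append(x)
        centers = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
    summaries = []
    for g in groups:
        if g:
            m = sum(g) / len(g)
            v = sum((x - m) ** 2 for x in g) / len(g)
            summaries.append((len(g), m, v))
    return summaries


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def central_em(summaries, k, iters=50, min_var=1e-6):
    """Central step (assumed): weighted EM that treats the site summaries as observations."""
    total = sum(n for n, _, _ in summaries)
    ordered = sorted(summaries, key=lambda s: s[1])
    means = [ordered[(2 * i + 1) * len(ordered) // (2 * k)][1] for i in range(k)]
    grand_mean = sum(n * m for n, m, _ in summaries) / total
    grand_var = sum(n * (v + (m - grand_mean) ** 2) for n, m, v in summaries) / total
    variances = [max(min_var, grand_var)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each mixture component for each summary,
        # evaluated at the summary mean only (a simplification).
        resp = []
        for _, m, _ in summaries:
            p = [w * normal_pdf(m, mu, var) for w, mu, var in zip(weights, means, variances)]
            s = sum(p)
            resp.append([pi / s for pi in p] if s > 0 else [1.0 / k] * k)
        # M-step: counts act as observation weights; the within-cluster variance
        # of each summary is added back so the spread of the raw data is not lost.
        for j in range(k):
            nj = sum(r[j] * n for r, (n, _, _) in zip(resp, summaries))
            if nj == 0:
                continue
            means[j] = sum(r[j] * n * m for r, (n, m, _) in zip(resp, summaries)) / nj
            spread = sum(r[j] * n * (v + (m - means[j]) ** 2)
                         for r, (n, m, v) in zip(resp, summaries)) / nj
            variances[j] = max(min_var, spread)
            weights[j] = nj / total
    return list(zip(weights, means, variances))


# Two hypothetical sites, each holding a partition of the same synthetic data set.
random.seed(0)
site_a = [random.gauss(0, 1) for _ in range(300)] + [random.gauss(10, 1) for _ in range(200)]
site_b = [random.gauss(0, 1) for _ in range(200)] + [random.gauss(10, 1) for _ in range(300)]
summaries = local_summaries(site_a, 4) + local_summaries(site_b, 4)
model = central_em(summaries, 2)
for weight, mean, var in sorted(model, key=lambda c: c[1]):
    print(f"weight={weight:.2f} mean={mean:.2f} var={var:.2f}")
```

In this sketch only the per-cluster summaries leave each site, so the amount of data sent to the central site depends on the number of local clusters and not on the size of each partition.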

Keywords: e-Science, AHM 2005


Last Updated: 22 Jun 12 11:02