Paper ID: 1525

Automating Metadata Extraction: Genre Classification
Yunhyong Kim, Seamus Ross

Appeared in: Proceedings of the UK e-Science All Hands Conference 2006 (website: http://www.allhands.org.uk/2006/)
Page Numbers: 385-389
Publisher: National e-Science Centre
Year: 2006
ISBN/ISSN: 0-9553988-0-0
Contributing Organisation(s):
Field of Science: e-Science

URL: http://www.allhands.org.uk/2006/proceedings/papers/663.pdf

Abstract: A problem that frequently arises in the management and integration of scientific data is the lack of context and semantics that would link data encoded in disparate ways. To bridge this gap, it often helps to mine scientific texts to aid the understanding of the database, and such text mining is significantly aided by the availability of descriptive and semantic metadata. The Digital Curation Centre (DCC) has undertaken research to automate the extraction of metadata from documents in PDF, which may include scientific journal papers, lab notes or even emails. We suggest genre classification as a first step toward automating metadata extraction. The classification method will examine documents from five directions: as an object with a specific visual format, as a layout of strings with characteristic grammar, as an object with stylometric signatures, as an object with meaning and purpose, and as an object linked to previously classified objects and external sources. Some results of experiments relating to the first two directions are described here; they are meant to indicate the promise of this multi-faceted approach.
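The idea of classifying a document by combining evidence from several facets can be illustrated with a small sketch. The Python below is not the authors' classifier. It assumes each facet has already been reduced to a list of discrete features, and the feature names, genres and training data are all hypothetical. Two facets, loosely standing in for the visual-format and text directions, are each modelled with a multinomial naive Bayes model, and their log-likelihoods are summed. Summing treats the facets as conditionally independent given the genre, which is a simplifying assumption of this sketch.

```python
import math
from collections import Counter, defaultdict


class FacetNB:
    """Multinomial naive Bayes over the discrete features of one facet (illustrative)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                  # Laplace smoothing constant
        self.counts = defaultdict(Counter)  # genre -> feature frequencies
        self.totals = Counter()             # genre -> number of features seen
        self.vocab = set()                  # all features seen in training

    def fit(self, docs, labels):
        for feats, genre in zip(docs, labels):
            self.counts[genre].update(feats)
            self.totals[genre] += len(feats)
            self.vocab.update(feats)
        return self

    def log_likelihood(self, feats, genre):
        # Smoothed log P(features | genre); a feature never seen with this genre
        # still receives probability alpha / denom instead of zero.
        denom = self.totals[genre] + self.alpha * len(self.vocab)
        return sum(math.log((self.counts[genre][f] + self.alpha) / denom) for f in feats)


def classify(models, priors, doc):
    """Return the genre maximising log prior + sum of per-facet log-likelihoods."""
    scores = {
        genre: math.log(p) + sum(m.log_likelihood(doc[name], genre) for name, m in models.items())
        for genre, p in priors.items()
    }
    return max(scores, key=scores.get)


# Hypothetical toy training data: "visual" features stand in for layout cues,
# "text" features for tokens taken from the first page of each document.
train = [
    ({"visual": ["cols:2", "title:large", "refs:yes"], "text": ["abstract", "introduction", "references"]}, "journal_paper"),
    ({"visual": ["cols:2", "title:large", "refs:yes"], "text": ["abstract", "method", "results"]}, "journal_paper"),
    ({"visual": ["cols:1", "title:none", "refs:no"], "text": ["from", "to", "subject"]}, "email"),
    ({"visual": ["cols:1", "title:none", "refs:no"], "text": ["dear", "regards", "subject"]}, "email"),
    ({"visual": ["cols:1", "title:small", "refs:no"], "text": ["date", "sample", "observed"]}, "lab_notes"),
]
labels = [genre for _, genre in train]
models = {
    facet: FacetNB().fit([doc[facet] for doc, _ in train], labels)
    for facet in ("visual", "text")
}
priors = {g: n / len(labels) for g, n in Counter(labels).items()}

print(classify(models, priors, {"visual": ["cols:2", "title:large", "refs:yes"], "text": ["abstract", "results"]}))
# -> journal_paper
```

Under this design, adding another facet (for example, stylometric features) would only mean training another `FacetNB` and adding it to `models`. A working system would also need real feature extraction from PDF files, which this sketch leaves out.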

Keywords: e-Science, AHM 2006


