NeSC Bibliographic Database
Using hand-crafted rules and machine learning to infer SciXML document structure
Appeared in: Proceedings of the UK e-Science All Hands Conference 2007
Website: http://www.allhands.org.uk/2007/
Publisher: National e-Science Centre
Field of Science: e-Science
Abstract: SciXML is designed to represent the standard hierarchical structure of scientific articles and to promote interoperability among text-mining components. We describe a new system for inferring SciXML from a presentational level of description, such as PDF. General-purpose components, including expert hand-coded rules, supervised machine learning, and the intuitive tags of SciXML, combine to provide an effective adaptation strategy for new, unseen journal styles. The error reduction rate is almost 50%.
Keywords: e-Science, AHM 2007
Last Updated: 22 Jun 12 11:02