Title: CHIC - Converting Hamburgers Into Cows
Authors: Townsend, Joseph A
Keywords: workflow; semantics; conversion; SAF; XML
Issue Date: Dec-2009
Abstract: We have developed a methodology and workflow (CHIC) for the automatic semantification and structuring of legacy textual scientific documents. CHIC imports common document formats (PDF, DOCX and (X)HTML) and uses a number of toolkits to extract components and convert them into SciXML. The SciXML is split into text-rich and data-rich streams, and stand-off annotation (SAF) is created for each. Embedded domain-specific objects can be converted into XML (Chemical Markup Language). The different workflow streams can then be recombined and typically converted into RDF (Resource Description Framework).
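Illustration (not part of the original record): the abstract relies on stand-off annotation, where the source text is left untouched and annotations are stored separately as character offsets into it. The following is a minimal sketch of that general idea only. The `Annotation` class, the `annotate` function and the "chemical" label are hypothetical names chosen for this example; they are not taken from CHIC or from the SAF format, whose actual structure is not described here.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Annotation:
    """A stand-off annotation: it points into the text instead of modifying it."""
    start: int  # inclusive character offset into the text
    end: int    # exclusive character offset into the text
    label: str  # hypothetical annotation type, e.g. "chemical"


def annotate(text: str, term: str, label: str) -> List[Annotation]:
    """Return one annotation per case-insensitive occurrence of term.

    The text itself is never changed; only offsets are recorded.
    """
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return [Annotation(m.start(), m.end(), label) for m in pattern.finditer(text)]


text = "Ethanol was added to the flask. The ethanol was removed."
annotations = annotate(text, "ethanol", "chemical")

# Each annotation can be resolved back to the span it describes.
for a in annotations:
    print(a.label, a.start, a.end, text[a.start:a.end])
```

Because the annotations live outside the text, several independent annotation layers (for example one per workflow stream) could in principle refer to the same unchanged source and be merged later.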
URI: ftp://pubftp.computer.org/Press/Outgoing/proceedings./Patrick/e-science09/data/3877a337.pdf
http://www.dspace.cam.ac.uk/handle/1810/223837
Appears in Collections: Scholarly Works - Unilever Centre for Molecular Informatics

Files in This Item:

File: IEEE-talk.pptx
Size: 2.55 MB
Format: Unknown

This item has been accessed 1272 times.
