<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <channel>
    <title>DSpace Collection:</title>
    <link>http://www.dspace.cam.ac.uk:80/handle/1810/196184</link>
    <description />
    <pubDate>Wed, 22 May 2013 13:15:39 GMT</pubDate>
    <dc:date>2013-05-22T13:15:39Z</dc:date>
    <item>
      <title>SPECTRa-T / TheOREM Test Corpus</title>
      <link>http://www.dspace.cam.ac.uk:80/handle/1810/219208</link>
      <description>Title: SPECTRa-T / TheOREM Test Corpus
Authors: Day, Nick; Townsend, Joseph A
Description: These theses were used as test documents in the JISC-sponsored SPECTRa-T and TheOREM projects. The former looked at text mining from thesis documents; the latter researched techniques for describing the structure of theses in the OAI ORE standard.</description>
      <pubDate>Mon, 01 Jan 2007 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">http://www.dspace.cam.ac.uk:80/handle/1810/219208</guid>
      <dc:date>2007-01-01T00:00:00Z</dc:date>
    </item>
    <item>
      <title>An Introduction to SPECTRa</title>
      <link>http://www.dspace.cam.ac.uk:80/handle/1810/219189</link>
      <description>Title: An Introduction to SPECTRa
Authors: Downing, Jim
Abstract: SPECTRa is delivering tools that enable chemists to prepare and submit Open Data into DSpace Institutional Repositories.
Description: This poster was used as a presentation to the NEREUS consortium at the University of Maastricht.</description>
      <pubDate>Sat, 30 Sep 2006 23:00:00 GMT</pubDate>
      <guid isPermaLink="false">http://www.dspace.cam.ac.uk:80/handle/1810/219189</guid>
      <dc:date>2006-09-30T23:00:00Z</dc:date>
    </item>
    <item>
      <title>Maximum Entropy Models for Text mining from the Life Sciences Literature</title>
      <link>http://www.dspace.cam.ac.uk:80/handle/1810/218855</link>
      <description>Title: Maximum Entropy Models for Text mining from the Life Sciences Literature
Authors: Nikolov, Nikolay
Abstract: The life sciences nowadays are characterized by rapid growth. Due to the huge number of publications per year – in the hundreds of thousands and growing – it is becoming increasingly difficult for researchers to stay abreast of the latest developments. Thus, automated methods of analysing scientific information grow in importance.
Text mining in the Life Sciences aims at extracting information from textual data (usually abstracts or full texts of scientific publications, but also non-publications like clinical histories or patents). It normally involves some kind of machine learning technique that requires training data from the given thematic domain.
Our case study concerns the automatic identification of chemical named entities (e.g. compounds, reaction names) from the life science literature. We investigate the impact of data heterogeneity on the performance of Maximum Entropy Markov models and explore possible solutions to this problem.
This is, to the best of our knowledge, the first study to explore thematic heterogeneity in the chemistry-related life science literature and its impact on named entity recognition. Thus, it is necessarily general: its role is to collect evidence, establish basic facts and explore possible solutions.
In doing so, our study suggests that the genre structure is especially important for high-precision recognition. It also suggests that, for a system aiming at recall rather than precision, transferring training data from one domain to another is a useful strategy (especially for domains with smaller training datasets).
But, most importantly, this study provides motivation for a model that explicitly represents the thematic heterogeneity of the life science literature. It explores possible solutions and the practical issues of such an implementation.
Description: This is supporting data and software for an MPhil project report submitted on 2009-08-18 by Nikolay Nikolov. The data should be used in conjunction with the OSCAR3 software as described in the project report.</description>
      <pubDate>Thu, 17 Sep 2009 23:00:00 GMT</pubDate>
      <guid isPermaLink="false">http://www.dspace.cam.ac.uk:80/handle/1810/218855</guid>
      <dc:date>2009-09-17T23:00:00Z</dc:date>
    </item>
    <item>
      <title>Web Feeds and Repositories</title>
      <link>http://www.dspace.cam.ac.uk:80/handle/1810/206424</link>
      <description>Title: Web Feeds and Repositories
Authors: Downing, Jim
Abstract: Web feeds are an important way of adding value to repository resources. This presentation introduces web feeds, shows some examples of what it is possible to publish and consume, and covers some technical details of using conditional GET, archived feeds, etc.</description>
      <pubDate>Tue, 09 Dec 2008 12:18:02 GMT</pubDate>
      <guid isPermaLink="false">http://www.dspace.cam.ac.uk:80/handle/1810/206424</guid>
      <dc:date>2008-12-09T12:18:02Z</dc:date>
    </item>
    <item>
      <title>Embedding Metadata and Other Semantics In Word-Processing Documents</title>
      <link>http://www.dspace.cam.ac.uk:80/handle/1810/206423</link>
      <description>Title: Embedding Metadata and Other Semantics In Word-Processing Documents
Authors: Sefton, Peter; Barnes, Ian; Ward, Ron; Downing, Jim
Abstract: This paper describes a technique for embedding document metadata, and potentially other semantic references, inline in word processing documents, which the authors have implemented with the help of a software development team. Several assumptions underlie the approach: it must be available across computing platforms and work with both Microsoft Word (because of its user base) and OpenOffice.org (because of its free availability). Further, the application needs to be acceptable to and usable by users, so the initial implementation covers only a small number of features, which will only be extended after user-testing.

Within these constraints the system provides a mechanism not only for encoding simple metadata, but also for inferring hierarchical relationships between metadata elements from a "flat" word processing file.

The paper includes links to open source code implementing the techniques as part of a broader suite of tools for academic writing. This addresses tools and software, semantic web and data curation, and integrating curation into research workflows, and will provide a platform for integrating work on ontologies, vocabularies and folksonomies into word processing tools.
Description: This paper was presented at the International Digital Curation Conference in Edinburgh in Dec 2008.</description>
      <pubDate>Mon, 08 Dec 2008 16:29:41 GMT</pubDate>
      <guid isPermaLink="false">http://www.dspace.cam.ac.uk:80/handle/1810/206423</guid>
      <dc:date>2008-12-08T16:29:41Z</dc:date>
    </item>
    <item>
      <title>Results files for organic solid-state PM6 calculations from Nick Day's PhD thesis</title>
      <link>http://www.dspace.cam.ac.uk:80/handle/1810/197582</link>
      <description>Title: Results files for organic solid-state PM6 calculations from Nick Day's PhD thesis
Authors: Day, Nicholas E
Abstract: Results files for organic solid-state PM6 calculations from Nick Day's PhD thesis. For each calculation, there are CIF, MOP, OUT and CML files available.</description>
      <pubDate>Thu, 31 Jul 2008 05:35:20 GMT</pubDate>
      <guid isPermaLink="false">http://www.dspace.cam.ac.uk:80/handle/1810/197582</guid>
      <dc:date>2008-07-31T05:35:20Z</dc:date>
    </item>
    <item>
      <title>Results files for inorganic solid-state PM6 calculations from Nick Day's PhD thesis</title>
      <link>http://www.dspace.cam.ac.uk:80/handle/1810/197581</link>
      <description>Title: Results files for inorganic solid-state PM6 calculations from Nick Day's PhD thesis
Authors: Day, Nicholas E
Abstract: Results files for inorganic solid-state calculations using the PM6 method in MOPAC2007 from Nick Day's PhD thesis. Contains CIF, MOP, OUT and CML files for each calculation.</description>
      <pubDate>Thu, 31 Jul 2008 05:30:59 GMT</pubDate>
      <guid isPermaLink="false">http://www.dspace.cam.ac.uk:80/handle/1810/197581</guid>
      <dc:date>2008-07-31T05:30:59Z</dc:date>
    </item>
    <item>
      <title>CrystalEye - From Desktop to Data Repository</title>
      <link>http://www.dspace.cam.ac.uk:80/handle/1810/196186</link>
      <description>Title: CrystalEye - From Desktop to Data Repository
Authors: Downing, O J; Day, Nicholas E; Murray-Rust, Peter
Abstract: CrystalEye is a public data system consisting of processed open crystallographic data. Its development and evolution hold some lessons for the development of other data repositories.
Description: This presentation was delivered at the Open Repositories 2008 conference in Southampton on 2 April 2008.</description>
      <pubDate>Fri, 11 Apr 2008 09:33:17 GMT</pubDate>
      <guid isPermaLink="false">http://www.dspace.cam.ac.uk:80/handle/1810/196186</guid>
      <dc:date>2008-04-11T09:33:17Z</dc:date>
    </item>
    <item>
      <title>A preview of the TheOREM project</title>
      <link>http://www.dspace.cam.ac.uk:80/handle/1810/196185</link>
      <description>Title: A preview of the TheOREM project
Authors: Downing, O J
Abstract: This presentation was delivered at the European roll-out meeting of the Open Archives Initiative standard for Object Re-use and Exchange (ORE), to introduce a Joint Information Systems Committee (JISC) funded experiment looking to apply ORE to doctoral theses.
Description: The presentation was originally written (and delivered) in the Macintosh Keynote software.</description>
      <pubDate>Fri, 11 Apr 2008 09:12:49 GMT</pubDate>
      <guid isPermaLink="false">http://www.dspace.cam.ac.uk:80/handle/1810/196185</guid>
      <dc:date>2008-04-11T09:12:49Z</dc:date>
    </item>
  </channel>
</rss>

