<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <channel>
    <title>DSpace Collection: Published papers and preprints</title>
    <link>http://www.dspace.cam.ac.uk:80/handle/1810/739</link>
    <description>Published papers and preprints</description>
    <pubDate>Fri, 24 May 2013 11:09:30 GMT</pubDate>
    <dc:date>2013-05-24T11:09:30Z</dc:date>
    <item>
      <title>Predicting the mechanism of phospholipidosis</title>
      <link>http://www.dspace.cam.ac.uk:80/handle/1810/241669</link>
      <description>Title: Predicting the mechanism of phospholipidosis
Authors: Lowe, Robert; Mussa, Hamse Y; Nigsch, Florian; Glen, Robert C; Mitchell, John BO
Abstract: Abstract The mechanism of phospholipidosis is still not well understood. Numerous different mechanisms have been proposed, varying from direct inhibition of the breakdown of phospholipids to the binding of a drug compound to the phospholipid, preventing breakdown. We have used a probabilistic method, the Parzen-Rosenblatt Window approach, to build a model from the ChEMBL dataset which can predict from a compound's structure both its primary pharmaceutical target and other targets with which it forms off-target, usually weaker, interactions. Using a small dataset of 182 phospholipidosis-inducing and non-inducing compounds, we predict their off-target activity against targets which could relate to phospholipidosis as a side-effect of a drug. We link these targets to specific mechanisms of inducing this lysosomal build-up of phospholipids in cells. Thus, we show that the induction of phospholipidosis is likely to occur by separate mechanisms when triggered by different cationic amphiphilic drugs. We find that both inhibition of phospholipase activity and enhanced cholesterol biosynthesis are likely to be important mechanisms. Furthermore, we provide evidence suggesting four specific protein targets. Sphingomyelin phosphodiesterase, phospholipase A2 and lysosomal phospholipase A1 are shown to be likely targets for the induction of phospholipidosis by inhibition of phospholipase activity, while lanosterol synthase is predicted to be associated with phospholipidosis being induced by enhanced cholesterol biosynthesis. This analysis provides the impetus for further experimental tests of these hypotheses.
Description: RIGHTS : This article is licensed under the BioMed Central licence at  http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'.  In brief you may : copy, distribute, and display the work; make derivative works; or make commercial use of the work  - under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.</description>
      <pubDate>Thu, 26 Jan 2012 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">http://www.dspace.cam.ac.uk:80/handle/1810/241669</guid>
      <dc:date>2012-01-26T00:00:00Z</dc:date>
    </item>
    <item>
      <title>CML: Evolution and Design</title>
      <link>http://www.dspace.cam.ac.uk:80/handle/1810/241597</link>
      <description>Title: CML: Evolution and Design
Authors: Murray-Rust, Peter; Rzepa, Henry S
Abstract: Abstract A retrospective view of the design and evolution of Chemical Markup Language (CML) is presented by its original authors.
Description: RIGHTS : This article is licensed under the BioMed Central licence at  http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'.  In brief you may : copy, distribute, and display the work; make derivative works; or make commercial use of the work  - under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.</description>
      <pubDate>Thu, 13 Oct 2011 23:00:00 GMT</pubDate>
      <guid isPermaLink="false">http://www.dspace.cam.ac.uk:80/handle/1810/241597</guid>
      <dc:date>2011-10-13T23:00:00Z</dc:date>
    </item>
    <item>
      <title>The semantics of Chemical Markup Language (CML): dictionaries and conventions</title>
      <link>http://www.dspace.cam.ac.uk:80/handle/1810/239929</link>
      <description>Title: The semantics of Chemical Markup Language (CML): dictionaries and conventions
Authors: Murray-Rust, Peter; Townsend, Joe A; Adams, Sam E; Phadungsukanan, Weerapong; Thomas, Jens
Abstract: Abstract The semantic architecture of CML consists of conventions, dictionaries and units. The conventions conform to a top-level specification and each convention can constrain compliant documents through machine-processing (validation). Dictionaries conform to a dictionary specification which also imposes machine validation on the dictionaries. Each dictionary can also be used to validate data in a CML document, and provide human-readable descriptions. An additional set of conventions and dictionaries are used to support scientific units. All conventions, dictionaries and dictionary elements are identifiable and addressable through unique URIs.
Description: RIGHTS : This article is licensed under the BioMed Central licence at  http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'.  In brief you may : copy, distribute, and display the work; make derivative works; or make commercial use of the work  - under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.</description>
      <pubDate>Thu, 13 Oct 2011 23:00:00 GMT</pubDate>
      <guid isPermaLink="false">http://www.dspace.cam.ac.uk:80/handle/1810/239929</guid>
      <dc:date>2011-10-13T23:00:00Z</dc:date>
    </item>
    <item>
      <title>Semantic science and its communication - a personal view</title>
      <link>http://www.dspace.cam.ac.uk:80/handle/1810/239927</link>
      <description>Title: Semantic science and its communication - a personal view
Authors: Murray-Rust, Peter
Abstract: Abstract The articles in this special issue represent the culmination of about 15 years working with the potential of the web to support chemical and related subjects. The selection of papers arises from a symposium held in January 2011 ('Visions of a Semantic Molecular Future') which gave me an opportunity to invite many people who shared the same vision. I have asked them to contribute their papers and most have been able to do so. They cover a wide range of content, approaches and styles and apart from the selection of the speakers (and hence the authors) I have not exercised any control over the content.
Description: RIGHTS : This article is licensed under the BioMed Central licence at  http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'.  In brief you may : copy, distribute, and display the work; make derivative works; or make commercial use of the work  - under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.</description>
      <pubDate>Thu, 13 Oct 2011 23:00:00 GMT</pubDate>
      <guid isPermaLink="false">http://www.dspace.cam.ac.uk:80/handle/1810/239927</guid>
      <dc:date>2011-10-13T23:00:00Z</dc:date>
    </item>
    <item>
      <title>Open Bibliography for Science, Technology, and Medicine</title>
      <link>http://www.dspace.cam.ac.uk:80/handle/1810/239926</link>
      <description>Title: Open Bibliography for Science, Technology, and Medicine
Authors: Jones, Richard; MacGillivray, Mark; Murray-Rust, Peter; Pitman, Jim; Sefton, Peter; O'Steen, Ben; Waites, William
Abstract: Abstract The concept of Open Bibliography in science, technology and medicine (STM) is introduced as a combination of Open Source tools, Open specifications and Open bibliographic data. An Openly searchable and navigable network of bibliographic information and associated knowledge representations, a Bibliographic Knowledge Network, across all branches of Science, Technology and Medicine, has been designed and initiated. For this large scale endeavour, the engagement and cooperation of the multiple stakeholders in STM publishing - authors, librarians, publishers and administrators - is sought.
Description: RIGHTS : This article is licensed under the BioMed Central licence at  http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'.  In brief you may : copy, distribute, and display the work; make derivative works; or make commercial use of the work  - under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.</description>
      <pubDate>Thu, 13 Oct 2011 23:00:00 GMT</pubDate>
      <guid isPermaLink="false">http://www.dspace.cam.ac.uk:80/handle/1810/239926</guid>
      <dc:date>2011-10-13T23:00:00Z</dc:date>
    </item>
    <item>
      <title>The Quixote project: Collaborative and Open Quantum Chemistry data management in the Internet age</title>
      <link>http://www.dspace.cam.ac.uk:80/handle/1810/239925</link>
      <description>Title: The Quixote project: Collaborative and Open Quantum Chemistry data management in the Internet age
Authors: Adams, Sam; de Castro, Pablo; Echenique, Pablo; Estrada, Jorge; Hanwell, Marcus D; Murray-Rust, Peter; Sherwood, Paul; Thomas, Jens; Townsend, Joe A
Abstract: Abstract Computational Quantum Chemistry has developed into a powerful, efficient, reliable and increasingly routine tool for exploring the structure and properties of small to medium sized molecules. Many thousands of calculations are performed every day, some offering results which approach experimental accuracy. However, in contrast to other disciplines, such as crystallography, or bioinformatics, where standard formats and well-known, unified databases exist, this QC data is generally destined to remain locally held in files which are not designed to be machine-readable. Only a very small subset of these results will become accessible to the wider community through publication. In this paper we describe how the Quixote Project is developing the infrastructure required to convert output from a number of different molecular quantum chemistry packages to a common semantically rich, machine-readable format and to build repositories of QC results. Such an infrastructure offers benefits at many levels. The standardised representation of the results will facilitate software interoperability, for example making it easier for analysis tools to take data from different QC packages, and will also help with archival and deposition of results. The repository infrastructure, which is lightweight and built using Open software components, can be implemented at individual researcher, project, organisation or community level, offering the exciting possibility that in future many of these QC results can be made publicly available, to be searched and interpreted just as crystallography and bioinformatics results are today. Although we believe that quantum chemists will appreciate the contribution the Quixote infrastructure can make to the organisation and exchange of their results, we anticipate that greater rewards will come from enabling their results to be consumed by a wider community.
As the repositories grow they will become a valuable source of chemical data for use by other disciplines in both research and education. The Quixote project is unconventional in that the infrastructure is being implemented in advance of a full definition of the data model which will eventually underpin it. We believe that a working system which offers real value to researchers based on tools and shared, searchable repositories will encourage early participation from a broader community, including both producers and consumers of data. In the early stages, searching and indexing can be performed on the chemical subject of the calculations, and well defined calculation meta-data. The process of defining more specific quantum chemical definitions, adding them to dictionaries and extracting them consistently from the results of the various software packages can then proceed in an incremental manner, adding additional value at each stage. Not only will these results help to change the data management model in the field of Quantum Chemistry, but the methodology can be applied to other pressing problems related to data in computational and experimental science.
Description: RIGHTS : This article is licensed under the BioMed Central licence at  http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'.  In brief you may : copy, distribute, and display the work; make derivative works; or make commercial use of the work  - under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.</description>
      <pubDate>Thu, 13 Oct 2011 23:00:00 GMT</pubDate>
      <guid isPermaLink="false">http://www.dspace.cam.ac.uk:80/handle/1810/239925</guid>
      <dc:date>2011-10-13T23:00:00Z</dc:date>
    </item>
    <item>
      <title>Ami - The Chemist's Amanuensis</title>
      <link>http://www.dspace.cam.ac.uk:80/handle/1810/239924</link>
      <description>Title: Ami - The Chemist's Amanuensis
Authors: Brooks, Brian J; Thorn, Adam L; Smith, Matthew; Matthews, Peter; Chen, Shaoming; O'Steen, Ben; Adams, Sam E; Townsend, Joe A; Murray-Rust, Peter
Abstract: Abstract The Ami project was a six month Rapid Innovation project sponsored by JISC to explore the Virtual Research Environment space. The project brainstormed with chemists and decided to investigate ways to facilitate monitoring and collection of experimental data. A frequently encountered use-case was identified of how the chemist reaches the end of an experiment, but finds an unexpected result. The ability to replay events can significantly help make sense of how things progressed. The project therefore concentrated on collecting a variety of dimensions of ancillary data - data that would not normally be collected due to practicality constraints. There were three main areas of investigation: 1) Development of a monitoring tool using infrared and ultrasonic sensors; 2) Time-lapse motion video capture (for example, videoing 5 seconds in every 60); and 3) Activity-driven video monitoring of the fume cupboard environs. The Ami client application was developed to control these separate logging functions. The application builds up a timeline of the events in the experiment and around the fume cupboard. The videos and data logs can then be reviewed after the experiment in order to help the chemist determine the exact timings and conditions used. The project experimented with ways in which a Microsoft Kinect could be used in a laboratory setting. Investigations suggest that it would not be an ideal device for controlling a mouse, but it shows promise for usages such as manipulating virtual molecules.
Description: RIGHTS : This article is licensed under the BioMed Central licence at  http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'.  In brief you may : copy, distribute, and display the work; make derivative works; or make commercial use of the work  - under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.</description>
      <pubDate>Thu, 13 Oct 2011 23:00:00 GMT</pubDate>
      <guid isPermaLink="false">http://www.dspace.cam.ac.uk:80/handle/1810/239924</guid>
      <dc:date>2011-10-13T23:00:00Z</dc:date>
    </item>
    <item>
      <title>Open Data, Open Source and Open Standards in chemistry: The Blue Obelisk five years on</title>
      <link>http://www.dspace.cam.ac.uk:80/handle/1810/239920</link>
      <description>Title: Open Data, Open Source and Open Standards in chemistry: The Blue Obelisk five years on
Authors: O'Boyle, Noel M; Guha, Rajarshi; Willighagen, Egon L; Adams, Samuel E; Alvarsson, Jonathan; Bradley, Jean-Claude; Filippov, Igor V; Hanson, Robert M; Hanwell, Marcus D; Hutchison, Geoffrey R; James, Craig A; Jeliazkova, Nina; Lang, Andrew SID; Langner, Karol M; Lonie, David C; Lowe, Daniel M; Pansanel, Jerome; Pavlov, Dmitry; Spjuth, Ola; Steinbeck, Christoph; Tenderholt, Adam L; Theisen, Kevin J; Murray-Rust, Peter
Abstract: Abstract Background The Blue Obelisk movement was established in 2005 as a response to the lack of Open Data, Open Standards and Open Source (ODOSOS) in chemistry. It aims to make it easier to carry out chemistry research by promoting interoperability between chemistry software, encouraging cooperation between Open Source developers, and developing community resources and Open Standards. Results This contribution looks back on the work carried out by the Blue Obelisk in the past 5 years and surveys progress and remaining challenges in the areas of Open Data, Open Standards, and Open Source in chemistry. Conclusions We show that the Blue Obelisk has been very successful in bringing together researchers and developers with common interests in ODOSOS, leading to development of many useful resources freely available to the chemistry community.
Description: RIGHTS : This article is licensed under the BioMed Central licence at  http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'.  In brief you may : copy, distribute, and display the work; make derivative works; or make commercial use of the work  - under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.</description>
      <pubDate>Thu, 13 Oct 2011 23:00:00 GMT</pubDate>
      <guid isPermaLink="false">http://www.dspace.cam.ac.uk:80/handle/1810/239920</guid>
      <dc:date>2011-10-13T23:00:00Z</dc:date>
    </item>
    <item>
      <title>OSCAR4: a flexible architecture for chemical text-mining</title>
      <link>http://www.dspace.cam.ac.uk:80/handle/1810/239919</link>
      <description>Title: OSCAR4: a flexible architecture for chemical text-mining
Authors: Jessop, David M; Adams, Sam E; Willighagen, Egon L; Hawizy, Lezan; Murray-Rust, Peter
Abstract: Abstract The Open-Source Chemistry Analysis Routines (OSCAR) software, a toolkit for the recognition of named entities and data in chemistry publications, has been developed since 2002. Recent work has resulted in the separation of the core OSCAR functionality and its release as the OSCAR4 library. This library features a modular API (based on reduction of surface coupling) that permits client programmers to easily incorporate it into external applications. OSCAR4 offers a domain-independent architecture upon which chemistry specific text-mining tools can be built, and its development and usage are discussed.
Description: RIGHTS : This article is licensed under the BioMed Central licence at  http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'.  In brief you may : copy, distribute, and display the work; make derivative works; or make commercial use of the work  - under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.</description>
      <pubDate>Thu, 13 Oct 2011 23:00:00 GMT</pubDate>
      <guid isPermaLink="false">http://www.dspace.cam.ac.uk:80/handle/1810/239919</guid>
      <dc:date>2011-10-13T23:00:00Z</dc:date>
    </item>
    <item>
      <title>Mining chemical information from Open patents</title>
      <link>http://www.dspace.cam.ac.uk:80/handle/1810/239918</link>
      <description>Title: Mining chemical information from Open patents
Authors: Jessop, David M; Adams, Sam E; Murray-Rust, Peter
Abstract: Abstract Linked Open Data presents an opportunity to vastly improve the quality of science in all fields by increasing the availability and usability of the data upon which it is based. In the chemical field, there is a huge amount of information available in the published literature, the vast majority of which is not available in machine-understandable formats. PatentEye, a prototype system for the extraction and semantification of chemical reactions from the patent literature, has been implemented and is discussed. A total of 4444 reactions were extracted from 667 patent documents that comprised 10 weeks' worth of publications from the European Patent Office (EPO), with a precision of 78% and recall of 64% with regard to determining the identity and amount of reactants employed and an accuracy of 92% with regard to product identification. NMR spectra reported as product characterisation data are additionally captured.
Description: RIGHTS : This article is licensed under the BioMed Central licence at  http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'.  In brief you may : copy, distribute, and display the work; make derivative works; or make commercial use of the work  - under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.</description>
      <pubDate>Thu, 13 Oct 2011 23:00:00 GMT</pubDate>
      <guid isPermaLink="false">http://www.dspace.cam.ac.uk:80/handle/1810/239918</guid>
      <dc:date>2011-10-13T23:00:00Z</dc:date>
    </item>
    <item>
      <title>CMLLite: a design philosophy for CML</title>
      <link>http://www.dspace.cam.ac.uk:80/handle/1810/239917</link>
      <description>Title: CMLLite: a design philosophy for CML
Authors: Townsend, Joe A.; Murray-Rust, Peter
Abstract: Abstract CMLLite is a collection of definitions and processes which provide strong and flexible validation for a document in Chemical Markup Language (CML). It consists of an updated CML schema (schema3), conventions specifying rules in both human and machine-understandable forms and a validator available both online and offline to check conformance. This article explores the rationale behind the changes which have been made to the schema, explains how conventions interact and how they are designed, formulated, implemented and tested, and gives an overview of the validation service.
Description: RIGHTS : This article is licensed under the BioMed Central licence at  http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'.  In brief you may : copy, distribute, and display the work; make derivative works; or make commercial use of the work  - under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.</description>
      <pubDate>Thu, 13 Oct 2011 23:00:00 GMT</pubDate>
      <guid isPermaLink="false">http://www.dspace.cam.ac.uk:80/handle/1810/239917</guid>
      <dc:date>2011-10-13T23:00:00Z</dc:date>
    </item>
    <item>
      <title>The semantic architecture of the World-Wide Molecular Matrix (WWMM)</title>
      <link>http://www.dspace.cam.ac.uk:80/handle/1810/239916</link>
      <description>Title: The semantic architecture of the World-Wide Molecular Matrix (WWMM)
Authors: Murray-Rust, Peter; Adams, Sam E; Downing, Jim; Townsend, Joe A; Zhang, Yong
Abstract: Abstract The World-Wide Molecular Matrix (WWMM) is a ten-year project to create a peer-to-peer (P2P) system for the publication and collection of chemical objects, including over 250,000 molecules. It has now been instantiated in a number of repositories which include data encoded in Chemical Markup Language (CML) and linked by URIs and RDF. The technical specification and implementation is now complete. We discuss the types of architecture required to implement nodes in the WWMM and consider the social issues involved in adoption.
Description: RIGHTS : This article is licensed under the BioMed Central licence at  http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'.  In brief you may : copy, distribute, and display the work; make derivative works; or make commercial use of the work  - under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.</description>
      <pubDate>Thu, 13 Oct 2011 23:00:00 GMT</pubDate>
      <guid isPermaLink="false">http://www.dspace.cam.ac.uk:80/handle/1810/239916</guid>
      <dc:date>2011-10-13T23:00:00Z</dc:date>
    </item>
    <item>
      <title>Chemistry in Bioinformatics</title>
      <link>http://www.dspace.cam.ac.uk:80/handle/1810/238097</link>
      <description>Title: Chemistry in Bioinformatics
Authors: Murray-Rust, Peter; Mitchell, John B O; Rzepa, Henry S
Abstract: Abstract Chemical information is now seen as critical for most areas of life sciences. But unlike Bioinformatics, where data is openly available and freely re-usable, most chemical information is closed and cannot be re-distributed without permission. This has led to a failure to adopt modern informatics and software techniques and therefore paucity of chemistry in bioinformatics. New technology, however, offers the hope of making chemical data (compounds and properties) free during the authoring process. We argue that the technology is already available; we require a collective agreement to enhance publication protocols.
Description: Rights : This article is licensed under the BioMed Central license at  http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution License'.  In brief you may : copy, distribute, and display the work; make derivative works; or make commercial use of the work  - under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.</description>
      <pubDate>Mon, 06 Jun 2005 23:00:00 GMT</pubDate>
      <guid isPermaLink="false">http://www.dspace.cam.ac.uk:80/handle/1810/238097</guid>
      <dc:date>2005-06-06T23:00:00Z</dc:date>
    </item>
    <item>
      <title>Scoring functions and enrichment: a case study on Hsp90</title>
      <link>http://www.dspace.cam.ac.uk:80/handle/1810/238052</link>
      <description>Title: Scoring functions and enrichment: a case study on Hsp90
Authors: Konstantinou-Kirtay, Chrysi; Mitchell, John B O; Lumley, James A
Abstract: Abstract Background The need for fast and accurate scoring functions has been driven by the increased use of in silico virtual screening twinned with high-throughput screening as a method to rapidly identify potential candidates in the early stages of drug development. We examine the ability of some of the most common scoring functions (GOLD, ChemScore, DOCK, PMF, BLEEP and Consensus) to discriminate correctly and efficiently between active and non-active compounds among a library of ~3,600 diverse decoy compounds in a virtual screening experiment against heat shock protein 90 (Hsp90). Results Firstly, we investigated two ranking methodologies, GOLDrank and BestScorerank. GOLDrank is based on ranks generated using GOLD. The various scoring functions, GOLD, ChemScore, DOCK, PMF, BLEEP and Consensus, are applied to the pose ranked number one by GOLD for that ligand. BestScorerank uses multiple poses for each ligand and independently chooses the best ranked pose of the ligand according to each different scoring function. Secondly, we considered the effect of introducing the Thr184 hydrogen bond tether to guide the docking process towards a particular solution, and its effect on enrichment. Thirdly, we considered normalisation to account for the known bias of scoring functions to select larger molecules. All the scoring functions gave fairly similar enrichments, with the exception of PMF which was consistently the poorest performer. In most cases, GOLD was marginally the best performing individual function; the Consensus score usually performed similarly to the best single scoring function. Our best results were obtained using the Thr184 tether in combination with the BestScorerank protocol and normalisation for molecular weight. For that particular combination, DOCK was the best individual function; DOCK recovered 90% of the actives in the top 10% of the ranked list; Consensus similarly recovered 89% of the actives in its top 10%.
Conclusion Overall, we demonstrate the validity of virtual screening as a method for identifying new leads from a pool of ligands with similar physicochemical properties and we believe that the outcome of this study provides useful insight into the setting up of a suitable docking and scoring protocol, resulting in enrichment of 'target active' compounds.</description>
      <pubDate>Fri, 26 Jan 2007 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">http://www.dspace.cam.ac.uk:80/handle/1810/238052</guid>
      <dc:date>2007-01-26T00:00:00Z</dc:date>
    </item>
    <item>
      <title>A novel hybrid ultrafast shape descriptor method for use in virtual screening</title>
      <link>http://www.dspace.cam.ac.uk:80/handle/1810/237988</link>
      <description>Title: A novel hybrid ultrafast shape descriptor method for use in virtual screening
Authors: Cannon, Edward O; Nigsch, Florian; Mitchell, John B O
Abstract: Abstract Background We have introduced a new Hybrid descriptor composed of the MACCS key descriptor encoding topological information and Ballester and Richards' Ultrafast Shape Recognition (USR) descriptor. The latter one is calculated from the moments of the distribution of the interatomic distances, and in this work we also included higher moments than in the original implementation. Results The performance of this Hybrid descriptor is assessed using Random Forest and a dataset of 116,476 molecules. Our dataset includes 5,245 molecules in ten classes from the 2005 World Anti-Doping Agency (WADA) dataset and 111,231 molecules from the National Cancer Institute (NCI) database. In a 10-fold Monte Carlo cross-validation this dataset was partitioned into three distinct parts for training, optimisation of an internal threshold that we introduced, and validation of the resulting model. The standard errors obtained were used to assess statistical significance of observed improvements in performance of our new descriptor. Conclusion The Hybrid descriptor was compared to the MACCS key descriptor, USR with the first three (USR), four (UF4) and five (UF5) moments, and a combination of MACCS with USR (three moments). The MACCS key descriptor was not combined with UF5, due to similar performance of UF5 and UF4. Superior performance in terms of all figures of merit was found for the MACCS/UF4 Hybrid descriptor with respect to all other descriptors examined. These figures of merit include recall in the top 1% and top 5% of the ranked validation sets, precision, F-measure, area under the Receiver Operating Characteristic curve and Matthews Correlation Coefficient.</description>
      <pubDate>Mon, 18 Feb 2008 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">http://www.dspace.cam.ac.uk:80/handle/1810/237988</guid>
      <dc:date>2008-02-18T00:00:00Z</dc:date>
    </item>
    <item>
      <title>Simultaneous feature selection and parameter optimisation using an artificial ant colony: case study of melting point prediction</title>
      <link>http://www.dspace.cam.ac.uk:80/handle/1810/237942</link>
      <description>Title: Simultaneous feature selection and parameter optimisation using an artificial ant colony: case study of melting point prediction
Authors: O'Boyle, Noel M; Palmer, David S; Nigsch, Florian; Mitchell, John B O
Abstract: Background: We present a novel feature selection algorithm, Winnowing Artificial Ant Colony (WAAC), that performs simultaneous feature selection and model parameter optimisation for the development of predictive quantitative structure-property relationship (QSPR) models. The WAAC algorithm is an extension of the modified ant colony algorithm of Shen et al. (J Chem Inf Model 2005, 45: 1024–1029). We test the ability of the algorithm to develop a predictive partial least squares (PLS) model for the Karthikeyan dataset (J Chem Inf Model 2005, 45: 581–590) of melting point values. We also test its ability to perform feature selection on a support vector machine (SVM) model for the same dataset. Results: Starting from an initial set of 203 descriptors, the WAAC algorithm selected a PLS model with 68 descriptors which has an RMSE on an external test set of 46.6°C and R² of 0.51. The number of components chosen for the model was 49, which was close to optimal for this feature selection. The selected SVM model has 28 descriptors (cost of 5, ε of 0.21) and an RMSE of 45.1°C and R² of 0.54. This model outperforms a kNN model (RMSE of 48.3°C, R² of 0.47) for the same data and has similar performance to a Random Forest model (RMSE of 44.5°C, R² of 0.55). However, it is much less prone to bias at the extremes of the range of melting points, as shown by the slope of the line through the residuals: -0.43 for WAAC/SVM, -0.53 for Random Forest. Conclusion: With a careful choice of objective function, the WAAC algorithm can be used to optimise machine learning and regression models that suffer from overfitting. Where model parameters also need to be tuned, as is the case with support vector machine and partial least squares models, it can optimise these simultaneously. The moving probabilities used by the algorithm are easily interpreted in terms of the best and current models of the ants, and the winnowing procedure promotes the removal of irrelevant descriptors.</description>
      <pubDate>Wed, 29 Oct 2008 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">http://www.dspace.cam.ac.uk:80/handle/1810/237942</guid>
      <dc:date>2008-10-29T00:00:00Z</dc:date>
    </item>
    <item>
      <title>Pybel: a Python wrapper for the OpenBabel cheminformatics toolkit</title>
      <link>http://www.dspace.cam.ac.uk:80/handle/1810/237730</link>
      <description>Title: Pybel: a Python wrapper for the OpenBabel cheminformatics toolkit
Authors: O'Boyle, Noel M; Morley, Chris; Hutchison, Geoffrey R
Abstract: Background: Scripting languages such as Python are ideally suited to common programming tasks in cheminformatics such as data analysis and parsing information from files. However, for reasons of efficiency, cheminformatics toolkits such as the OpenBabel toolkit are often implemented in compiled languages such as C++. We describe Pybel, a Python module that provides access to the OpenBabel toolkit. Results: Pybel wraps the direct toolkit bindings to simplify common tasks such as reading and writing molecular files and calculating fingerprints. Extensive use is made of Python iterators to simplify loops, such as the loop over all the molecules in a file. A Pybel Molecule can easily be converted to and from an OpenBabel OBMol to access those methods or attributes not wrapped by Pybel. Conclusion: Pybel allows cheminformaticians to rapidly develop Python scripts that manipulate chemical information. It is open source, available cross-platform, and offers the power of the OpenBabel toolkit to Python programmers.</description>
      <pubDate>Sun, 09 Mar 2008 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">http://www.dspace.cam.ac.uk:80/handle/1810/237730</guid>
      <dc:date>2008-03-09T00:00:00Z</dc:date>
    </item>
    <item>
      <title>Towards Lensfield: data management, processing and semantic publication for vernacular e-science</title>
      <link>http://www.dspace.cam.ac.uk:80/handle/1810/223838</link>
      <description>Title: Towards Lensfield: data management, processing and semantic publication for vernacular e-science
Authors: Downing, Jim; Day, Nick; Murray-Rust, Peter; Hawizy, Lezan; Adams, Nico
Abstract: Lensfield is a desktop and filesystem-based tool designed as a “personal data management assistant” for the scientist. It combines distributed version control (DVCS), software transactional memory (STM) and linked open data (LOD) publishing to create a novel data management, processing and publication tool. The application “just looks after” these technologies for the scientist, providing simple interfaces for typical uses. It is built with Clojure and includes macros which define steps in a common workflow. Functions and Java libraries provide facilities for automatic processing of data which is ultimately published as RDF in a web application. The progress of data processing is tracked by a fine-grained data structure that can be serialized to disk, with the potential to include manual steps and programmatic interrupts in largely automated processes through seamless resumption. Flexibility in operation and minimizing barriers to adoption are major design features.
Description: This paper was presented at the IEEE eScience conference 2009, hosted by the Oxford eResearch Centre and held at the Kassam Stadium outside Oxford.</description>
      <pubDate>Tue, 01 Dec 2009 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">http://www.dspace.cam.ac.uk:80/handle/1810/223838</guid>
      <dc:date>2009-12-01T00:00:00Z</dc:date>
    </item>
    <item>
      <title>CHIC - Converting Hamburgers Into Cows</title>
      <link>http://www.dspace.cam.ac.uk:80/handle/1810/223837</link>
      <description>Title: CHIC - Converting Hamburgers Into Cows
Authors: Townsend, Joseph A
Abstract: We have developed a methodology and workflow (CHIC) for the automatic semantification and structuring of legacy textual scientific documents. CHIC imports common document formats (PDF, DOCX and (X)HTML) and uses a number of toolkits to extract components and convert them into SciXML. This is sectioned into text-rich and data-rich streams, and stand-off annotation (SAF) is created for each. Embedded domain-specific objects can be converted into XML (Chemical Markup Language). The different workflow streams can then be recombined and typically converted into RDF (Resource Description Framework).</description>
      <pubDate>Tue, 01 Dec 2009 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">http://www.dspace.cam.ac.uk:80/handle/1810/223837</guid>
      <dc:date>2009-12-01T00:00:00Z</dc:date>
    </item>
    <item>
      <title>Computational Chemistry Robots</title>
      <link>http://www.dspace.cam.ac.uk:80/handle/1810/223836</link>
      <description>Title: Computational Chemistry Robots
Authors: Townsend, Joseph A; Murray-Rust, Peter; Tyrrell, Simon M; Zhang, Yong
Abstract: Millions of compounds are now Openly available (e.g. PubChem) and we describe the automatic computation of their geometries and properties. Using completely automatic procedures, based on modular components and workflow technology (Taverna), we can:

* extract structures from 3D databases or crystallographic publications
* determine a cost-effective level of theory
* optimise ground state geometry and calculate properties
* disseminate the results Openly.

Although error rates are low, their management must be completely robotic.

By using spare capacity (Condor) we have calculated 250,000 molecules at PM5 (MOPAC) and over 10,000 at B3LYP/6-31G* (GAMESS), and analysed the data robotically, including:
* variability between crystallographic experiment and levels of theory
* geometric variability within instances of a given functional group
* detection of molecular features that give rise to serious errors or pathological computation.

The results in our WorldWideMolecularMatrix (WWMM, http://wwmm.ch.cam.ac.uk) are Openly available in our DSpace repository (http://www.dspace.cam.ac.uk/handle/1810/724).
Description: ACS Fall Conference 2005</description>
      <pubDate>Wed, 31 Aug 2005 23:00:00 GMT</pubDate>
      <guid isPermaLink="false">http://www.dspace.cam.ac.uk:80/handle/1810/223836</guid>
      <dc:date>2005-08-31T23:00:00Z</dc:date>
    </item>
  </channel>
</rss>

