Title: Classifying the World Anti-Doping Agency’s 2005 Prohibited List Using the Chemistry Development Kit Fingerprint
Authors: Cannon, Edward O
Mitchell, John B O
Keywords: drugs in sport
machine learning
classification
k-Nearest Neighbours
Random Forest
Issue Date: 2006
Publisher: Springer
Citation: Lecture Notes in Bioinformatics, 4216, 173-182 (2006)
Abstract: We used the freely available Chemistry Development Kit (CDK) fingerprint to classify 5235 representative molecules taken from ten banned classes in the 2005 World Anti-Doping Agency’s (WADA) prohibited list, including molecules taken from the corresponding activity classes in the MDL Drug Data Report (MDDR). We used both Random Forest and k-Nearest Neighbours (kNN) algorithms to generate classifiers. The kNN classifiers with k = 1 gave a very slightly better Matthews Correlation Coefficient than the Random Forest classifiers; the latter, however, predicted fewer false positives. The performance of kNN classifiers tended to decline with increasing k. The performance of the CDK fingerprint is essentially equivalent to that of Unity 2D. Our results suggest that it will be possible to use freely available chemoinformatics tools to aid the fight against drugs in sport, while minimising the risk of wrongfully penalising innocent athletes.
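The abstract compares k-Nearest Neighbours classifiers (best at k = 1) against Random Forest, scoring both with the Matthews Correlation Coefficient (MCC). As a rough illustration only, the Python sketch below shows a kNN classifier over binary fingerprints, stored as sets of on-bit indices, together with a one-versus-rest MCC for a multi-class problem. The Tanimoto similarity measure, the majority-vote tie-breaking, the one-versus-rest scoring of each class, and the toy fingerprints and labels are all assumptions made for this example; none of them is taken from the paper, and real CDK fingerprints would have to be generated with the CDK itself.

```python
from collections import Counter
from math import sqrt

# A fingerprint is modelled as the set of bit positions that are switched on.
Fingerprint = frozenset


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two binary fingerprints."""
    shared = len(a & b)
    union = len(a) + len(b) - shared
    # Convention assumed here: two empty fingerprints have similarity 0.
    return shared / union if union else 0.0


def knn_predict(query: Fingerprint,
                training: list[tuple[Fingerprint, str]],
                k: int = 1) -> str:
    """Return the majority label among the k most similar training molecules."""
    # sorted() is stable, so equally similar neighbours keep their training order.
    ranked = sorted(training, key=lambda item: tanimoto(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    # most_common() orders equal counts by first appearance, so a tied vote
    # goes to the label of the more similar neighbour.
    return votes.most_common(1)[0][0]


def mcc_one_vs_rest(y_true: list[str], y_pred: list[str], positive: str) -> float:
    """Matthews Correlation Coefficient treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # If any marginal count is zero the MCC is undefined; report 0 by convention.
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical toy fingerprints, not real CDK output.
    training = [
        (frozenset({1, 2, 3, 4}), "anabolic agents"),
        (frozenset({1, 2, 3, 5}), "anabolic agents"),
        (frozenset({10, 11, 12}), "stimulants"),
        (frozenset({10, 11, 13, 14}), "stimulants"),
    ]
    query = frozenset({1, 2, 4})
    print(knn_predict(query, training, k=1))  # prints "anabolic agents"
    print(mcc_one_vs_rest(["A", "A", "B", "B"], ["A", "B", "B", "B"], "A"))
```

With k = 1 the prediction is simply the label of the single most similar training molecule. Random Forest is left out of the sketch because a practical implementation would normally come from a third-party library such as scikit-learn.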
Description: Presented at CompLife 2006, Cambridge, 27-29 September 2006.
URI: http://www.dspace.cam.ac.uk/handle/1810/194743
ISBN: 978-3-540-45767-1
Appears in Collections: Scholarly Works - Unilever Centre for Molecular Informatics

Files in This Item:

File | Description | Size | Format
Cannon_CompLife_Final.pdf | PDF of final pre-publication version of article | 191.93 kB | Adobe PDF
CDK_RF_spreadsheet.xls | Spreadsheet for Random Forest - CDK fingerprint | 77 kB | Microsoft Excel
CDK_kNN_spreadsheet.xls | Spreadsheet for k-Nearest Neighbours - CDK fingerprint | 183.5 kB | Microsoft Excel
All_methods_spreadsheet.xls | Spreadsheet for RF and kNN predictions using various fingerprints | 3.43 MB | Microsoft Excel

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.