Distance phenomena in high-dimensional chemical descriptor spaces: consequences for similarity-based approaches

2009 Wiley Periodicals, Inc.

Bibliographic details
Published in: Journal of computational chemistry. - 1984. - 30(2009), no. 14 (15 Nov.), pp. 2285-96
First author: Rupp, Matthias (author)
Other authors: Schneider, Petra; Schneider, Gisbert
Format: Online article
Language: English
Published: 2009
Parent work: Journal of computational chemistry
Subject headings: Journal Article; Research Support, Non-U.S. Gov't; Pharmaceutical Preparations
Description
Abstract:
Measuring the (dis)similarity of molecules is important for many cheminformatics applications, such as compound ranking, clustering, and property prediction. In this work, we focus on real-valued vector representations of molecules (as opposed to the binary spaces of fingerprints). We demonstrate the influence that the choice of (dis)similarity measure can have on results, and provide recommendations for such choices. We review the mathematical concepts used to measure (dis)similarity in vector spaces, namely norms, metrics, inner products, and similarity coefficients, as well as the relationships between them, using (dis)similarity measures commonly used in cheminformatics as examples. We present several phenomena in high-dimensional descriptor spaces that are not encountered in two and three dimensions: the empty space phenomenon, sphere volume related phenomena, and distance concentration. These phenomena are characterized theoretically and illustrated on both artificial and real (bioactivity) data.
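The following is a minimal sketch, not the authors' code, illustrating two of the phenomena named in the abstract under simplifying assumptions: points drawn uniformly from the unit hypercube, Euclidean distance, and the standard formula for the volume of a d-dimensional ball. The function names `relative_contrast` and `ball_to_cube_volume_ratio` are hypothetical. Distance concentration appears as a shrinking relative contrast (D_max - D_min) / D_min between nearest and farthest neighbours; the sphere volume phenomenon appears as the vanishing share of the cube [-1, 1]^d that is occupied by its inscribed unit ball.

```python
import math
import random


def euclidean(x, y):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def relative_contrast(dim, n_points=200, seed=0):
    """(D_max - D_min) / D_min of the distances from one random query point
    to n_points points, all drawn uniformly from [0, 1]^dim.
    Assumption: uniform synthetic data, not real descriptor vectors."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    query = [rng.random() for _ in range(dim)]
    dists = [
        euclidean(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


def ball_to_cube_volume_ratio(dim):
    """Volume of the unit-radius ball divided by the volume of its bounding
    cube [-1, 1]^dim, i.e. pi^(d/2) / Gamma(d/2 + 1) / 2^d.
    Computed in log space so large dim underflows to 0.0 instead of
    overflowing in math.gamma."""
    log_ratio = (
        (dim / 2) * math.log(math.pi)
        - math.lgamma(dim / 2 + 1)
        - dim * math.log(2)
    )
    return math.exp(log_ratio)


if __name__ == "__main__":
    for d in (2, 3, 10, 100, 1000):
        print(
            f"d={d:5d}  contrast={relative_contrast(d):8.3f}  "
            f"ball/cube={ball_to_cube_volume_ratio(d):.3e}"
        )
```

Running the script should show the relative contrast shrinking and the ball-to-cube ratio falling toward zero as d grows; the exact contrast values depend on the random seed and on the uniform-data assumption.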
Description: Date Completed 19 Jan. 2010
Date Revised 31 Aug. 2009
published: Print
Citation Status MEDLINE
ISSN:1096-987X
DOI:10.1002/jcc.21218