Improving the accuracy of predicting disulfide connectivity by feature selection

Copyright 2010 Wiley Periodicals, Inc.

Bibliographic Details
Published in: Journal of computational chemistry. - 1984. - 31(2010), 7, 30 May, pages 1478-85
Main author: Zhu, Lin (Author)
Other authors: Yang, Jie; Song, Jiang-Ning; Chou, Kuo-Chen; Shen, Hong-Bin
Format: Online article
Language: English
Published: 2010
Access to the parent work: Journal of computational chemistry
Subjects: Journal Article; Research Support, Non-U.S. Gov't; Disulfides; Proteins; Cysteine; K848JZ4886
Description
Summary: Disulfide bonds are primary covalent cross-links formed between two cysteine residues in the same or different polypeptide chains, and they play important roles in the folding and stability of proteins. However, computational prediction of disulfide connectivity directly from protein primary sequences is challenging for two reasons: disulfide bonds are nonlocal in the context of the sequence, and the number of possible disulfide patterns grows exponentially as the number of cysteine residues increases. In previous studies, disulfide connectivity prediction was usually performed in a high-dimensional feature space, which can cause a variety of problems in statistical learning, such as the curse of dimensionality, overfitting, and feature redundancy. In this study, we propose an efficient feature selection technique for analyzing the importance of each feature component. On the basis of this approach, we selected the most important features for predicting the connectivity pattern of intra-chain disulfide bonds. Our results show that the high-dimensional features contain redundant information, and that prediction performance can be further improved when these features are reduced to a lower-dimensional, more compact space. Our results also indicate that global protein features contribute little to the formation and prediction of disulfide bonds, while local sequential and structural information plays an important role. These findings provide important insights for structural studies of disulfide-rich proteins.
Description: Date Completed 25.10.2010
Date Revised 21.11.2013
published: Print
Citation Status MEDLINE
ISSN:1096-987X
DOI:10.1002/jcc.21433
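
Note on the size of the search space: the summary states that the number of possible disulfide patterns grows quickly with the number of cysteine residues. The short Python sketch below illustrates this under one simplifying assumption that is not taken from the article: every cysteine in the chain is paired in exactly one intra-chain bond, so a pattern is a perfect matching of the cysteines. The function names `count_patterns` and `enumerate_patterns` are hypothetical and chosen only for this illustration.

```python
from math import factorial


def count_patterns(n_bonds: int) -> int:
    """Count the ways to pair 2*n_bonds cysteines into n_bonds bonds.

    Assumes every cysteine is bonded exactly once (a perfect matching),
    which gives (2n)! / (2^n * n!) patterns.
    """
    if n_bonds < 0:
        raise ValueError("n_bonds must be non-negative")
    return factorial(2 * n_bonds) // (2 ** n_bonds * factorial(n_bonds))


def enumerate_patterns(cysteines):
    """Yield every pairing of the given cysteine positions.

    Each pattern is a list of (i, j) tuples. An odd number of positions
    yields no pattern, because one cysteine would be left unpaired.
    """
    if not cysteines:
        yield []
        return
    first, rest = cysteines[0], cysteines[1:]
    for i, partner in enumerate(rest):
        # Pair the first cysteine with each candidate, then recurse on the rest.
        remaining = rest[:i] + rest[i + 1:]
        for tail in enumerate_patterns(remaining):
            yield [(first, partner)] + tail


if __name__ == "__main__":
    for bonds in range(1, 6):
        print(bonds, "bonds:", count_patterns(bonds), "patterns")
```

Under this assumption, two bonds allow 3 patterns and five bonds already allow 945, which is why scoring every candidate pattern becomes costly for cysteine-rich proteins.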