Improving machine learning performance by removing redundant cases in medical data sets
Published in: | Proceedings. AMIA Symposium. - 1998. - (1998) dated: 13., pages 523-7 |
---|---|
Main Author: | |
Other Authors: | , |
Format: | Article |
Language: | English |
Published: | 1998 |
Access to parent work: | Proceedings. AMIA Symposium |
Subjects: | Journal Article; Research Support, Non-U.S. Gov't; Research Support, U.S. Gov't, P.H.S. |
Summary: | Neural network models and other machine learning methods have successfully been applied to several medical classification problems. These models can be periodically refined and retrained as new cases become available. Since training neural networks by backpropagation is time consuming, it is desirable that a minimum number of representative cases be kept in the training set (i.e., redundant cases should be removed). The removal of redundant cases should be carefully monitored so that classification performance is not significantly affected. We made experiments on data removal on a data set of 700 patients suspected of having myocardial infarction and show that there is no statistical difference in classification performance (measured by the differences in areas under the ROC curve on two previously unknown sets of 553 and 500 cases) when as many as 86% of the cases are randomly removed. A proportional reduction in the amount of time required to train the neural network model is achieved. |
---|---|
Description: | Date Completed 16.03.1999; Date Revised 10.12.2019; published: Print; Citation Status MEDLINE |
ISSN: | 1531-605X |
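
The summary describes randomly removing training cases and comparing areas under the ROC curve on held-out sets. The following is a minimal Python sketch of those two steps only, under stated assumptions: the function names `random_removal` and `auc` are hypothetical, the AUC is computed with the standard pairwise (Mann-Whitney) formulation, and neither the neural network nor the statistical test used in the paper is reproduced here, since this record does not describe them.

```python
import random


def random_removal(cases, removal_fraction, seed=None):
    """Return a random subset of `cases` after removing the given fraction.

    Hypothetical helper: the record only says cases were "randomly removed".
    """
    if not 0.0 <= removal_fraction <= 1.0:
        raise ValueError("removal_fraction must lie in [0, 1]")
    rng = random.Random(seed)
    n_remove = round(len(cases) * removal_fraction)
    # Sampling without replacement keeps each retained case at most once.
    return rng.sample(list(cases), len(cases) - n_remove)


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts the fraction of (positive, negative) pairs in which the positive
    case receives the higher score; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Removing 86% of a 700-case training set leaves 98 cases.
training_ids = list(range(700))
reduced_ids = random_removal(training_ids, 0.86, seed=0)
```

In an experiment of the kind summarized above, a model would be trained once on the full set and once on the reduced set, and `auc` would be evaluated for both models on each held-out set before testing whether the difference is statistically significant.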