Improving machine learning performance by removing redundant cases in medical data sets


Bibliographic details
Published in: Proceedings. AMIA Symposium. - 1998. - (1998) dated: 13., pages 523-7
Main author: Ohno-Machado, L (Author)
Other authors: Fraser, H S, Ohrn, A
Format: Article
Language: English
Published: 1998
Access to the parent work: Proceedings. AMIA Symposium
Subjects: Journal Article; Research Support, Non-U.S. Gov't; Research Support, U.S. Gov't, P.H.S.
Description
Abstract: Neural network models and other machine learning methods have been successfully applied to several medical classification problems. These models can be periodically refined and retrained as new cases become available. Because training neural networks by backpropagation is time-consuming, it is desirable to keep only a minimum number of representative cases in the training set (i.e., redundant cases should be removed). The removal of redundant cases should be carefully monitored so that classification performance is not significantly affected. We conducted data-removal experiments on a data set of 700 patients suspected of having myocardial infarction and show that there is no statistically significant difference in classification performance (measured by the differences in areas under the ROC curve on two previously unseen sets of 553 and 500 cases) when as many as 86% of the cases are randomly removed. A proportional reduction in the time required to train the neural network model is achieved.
Description: Date Completed 16.03.1999
Date Revised 10.12.2019
published: Print
Citation Status MEDLINE
ISSN:1531-605X
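
Illustrative sketch (not part of the catalog record or the paper): the experiment summarized in the abstract, randomly removing training cases and comparing areas under the ROC curve on a held-out set, might look roughly like the Python below. Everything in it is an assumption made for illustration: the data are synthetic rather than the myocardial infarction data, a small logistic regression stands in for the neural network trained by backpropagation, and the function names (`remove_random_cases`, `compare_auc`, and so on) are hypothetical. Only the set sizes (700 training cases, 500 test cases) and the 86% removal rate are taken from the abstract.

```python
import math
import random


def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def remove_random_cases(cases, fraction_removed, rng):
    """Return a random subset that keeps (1 - fraction_removed) of the cases."""
    keep = max(1, round(len(cases) * (1.0 - fraction_removed)))
    return rng.sample(cases, keep)


def train_logistic(cases, epochs=50, lr=0.1):
    """Stand-in model (assumption): logistic regression fitted by plain SGD."""
    w = [0.0] * len(cases[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in cases:
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            err = 1.0 / (1.0 + math.exp(-z)) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def score(model, x):
    """Raw linear score; monotone in the predicted probability, so AUC is unchanged."""
    w, b = model
    return b + sum(wi * xi for wi, xi in zip(w, x))


def make_synthetic_cases(n, rng):
    """Synthetic two-class data (assumption): three Gaussian features per case."""
    cases = []
    for _ in range(n):
        y = 1 if rng.random() < 0.3 else 0
        x = [rng.gauss(1.0 if y else 0.0, 1.0) for _ in range(3)]
        cases.append((x, y))
    return cases


def compare_auc(train, test, fraction_removed, rng):
    """Train on the full and on the reduced training set; return both test AUCs."""
    labels = [y for _, y in test]
    full_model = train_logistic(train)
    reduced_model = train_logistic(remove_random_cases(train, fraction_removed, rng))
    auc_full = auc([score(full_model, x) for x, _ in test], labels)
    auc_reduced = auc([score(reduced_model, x) for x, _ in test], labels)
    return auc_full, auc_reduced


rng = random.Random(0)
train_set = make_synthetic_cases(700, rng)  # same size as the training set in the abstract
test_set = make_synthetic_cases(500, rng)   # same size as one of the two test sets
auc_full, auc_reduced = compare_auc(train_set, test_set, 0.86, rng)
print(f"AUC, full training set:    {auc_full:.3f}")
print(f"AUC, 86% of cases removed: {auc_reduced:.3f}")
```

The sketch only prints the two AUC values; deciding whether their difference is statistically significant, as the abstract reports, would need an additional test for comparing ROC curves that is not shown here.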