Evolving Fully Automated Machine Learning via Life-Long Knowledge Anchors
Automated machine learning (AutoML) has achieved remarkable progress on various tasks, which is attributed to its minimal involvement of manual feature and model designs. However, most of existing AutoML pipelines only touch parts of the full machine learning pipeline, e.g., neural architecture sear...
Veröffentlicht in: | IEEE transactions on pattern analysis and machine intelligence. - 1979. - 43(2021), 9 vom: 29. Sept., Seite 3091-3107 |
---|---|
1. Verfasser: | |
Weitere Verfasser: | , , , , , , , , , , |
Format: | Online-Aufsatz |
Sprache: | English |
Veröffentlicht: |
2021
|
Zugriff auf das übergeordnete Werk: | IEEE transactions on pattern analysis and machine intelligence |
Schlagworte: | Journal Article Research Support, Non-U.S. Gov't |
Zusammenfassung: | Automated machine learning (AutoML) has achieved remarkable progress on various tasks, which is attributed to its minimal involvement of manual feature and model designs. However, most of existing AutoML pipelines only touch parts of the full machine learning pipeline, e.g., neural architecture search or optimizer selection. This leaves potentially important components such as data cleaning and model ensemble out of the optimization, and still results in considerable human involvement and suboptimal performance. The main challenges lie in the huge search space assembling all possibilities over all components, as well as the generalization ability over different tasks like image, text, and tabular etc. In this paper, we present a first-of-its-kind fully AutoML pipeline, to comprehensively automate data preprocessing, feature engineering, model generation/selection/training and ensemble for an arbitrary dataset and evaluation metric. Our innovation lies in the comprehensive scope of a learning pipeline, with a novel "life-long" knowledge anchor design to fundamentally accelerate the search over the full search space. Such knowledge anchors record detailed information of pipelines and integrates them with an evolutionary algorithm for joint optimization across components. Experiments demonstrate that the result pipeline achieves state-of-the-art performance on multiple datasets and modalities. Specifically, the proposed framework was extensively evaluated in the NeurIPS 2019 AutoDL challenge, and won the only champion with a significant gap against other approaches, on all the image, video, speech, text and tabular tracks |
---|---|
Beschreibung: | Date Completed 29.09.2021 Date Revised 29.09.2021 published: Print-Electronic Citation Status PubMed-not-MEDLINE |
ISSN: | 1939-3539 |
DOI: | 10.1109/TPAMI.2021.3069250 |