Integration of improved YOLOv5 for face mask detector and auto-labeling to generate dataset for fighting against COVID-19

Detailed description

Bibliographic details
Published in:	The Journal of supercomputing. - 1998. - 79(2023), 8 dated: 06., pages 8966-8992
Main author:	Pham, Thi-Ngot (Author)
Other authors:	Nguyen, Viet-Hoan, Huh, Jun-Ho
Format:	Online article
Language:	English
Published:	2023
Access to the parent work:	The Journal of supercomputing
Subjects:	Journal Article; Auto-labeling; COVID-19; Coordinate attention; Deep learning; Face mask detection; YOLO; YOLOv5; You Only Look Once

Description
Abstract:	© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. Wearing face masks is one of the most effective ways to prevent the spread of the virus during the COVID-19 pandemic. Deep learning face mask detection networks have been integrated into COVID-19 monitoring systems to provide effective supervision of public areas. However, previous works have two limitations: real-time performance (i.e., fast inference at the cost of low accuracy) and training datasets. The current study proposes a comprehensive solution by creating a new face mask dataset and improving the YOLOv5 baseline to balance accuracy and detection time. In particular, we improve YOLOv5 by adding a coordinate attention (CA) module to the baseline backbone following two different schemes, namely YOLOv5s-CA and YOLOv5s-C3CA. In detail, we train three models on a Kaggle dataset of 853 images covering three classes: without a mask ("NM"), with a mask ("M"), and incorrectly worn mask ("IWM"). The experimental results show that our modified YOLOv5 with the CA module achieves the highest accuracy, an mAP@0.5 of 93.9% compared with 87% for the baseline, with a detection time per image of 8.0 ms (125 FPS). In addition, we build an integrated system that combines the improved YOLOv5-CA with an auto-labeling module to create a new face mask dataset of 7110 images with more than 3500 labels for three categories from YouTube videos. Our proposed YOLOv5-CA and state-of-the-art detection models (i.e., YOLOX, YOLOv6, and YOLOv7) are trained on our 7110-image dataset. On our dataset, YOLOv5-CA improves to an mAP@0.5 of 96.8%. The results indicate that the improved YOLOv5-CA model compares favorably with several state-of-the-art works.
Description:	Date Revised 16.09.2024; published: Print-Electronic; Citation Status: PubMed-not-MEDLINE
ISSN:	0920-8542
DOI:	10.1007/s11227-022-04979-2
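
Note on the auto-labeling step: the abstract mentions that detector output is used to label new images, but this record does not say how. The sketch below shows one plausible way such a step could write labels, assuming the widely used YOLO text label format (class id, then box center and size normalized by the image size). The names Detection, CLASS_IDS, to_yolo_lines and the min_conf threshold are hypothetical and are not taken from the article.

```python
from collections import namedtuple

# Class ids are an assumption; the abstract only names the three categories.
CLASS_IDS = {"NM": 0, "M": 1, "IWM": 2}

# One predicted box in pixel coordinates (hypothetical structure).
Detection = namedtuple(
    "Detection", "label confidence x_min y_min x_max y_max"
)


def to_yolo_lines(detections, img_w, img_h, min_conf=0.5):
    """Convert confident detections to YOLO-format label lines.

    Each line is "class x_center y_center width height", with the box
    values normalized to [0, 1] by the image width and height.
    """
    lines = []
    for d in detections:
        if d.confidence < min_conf:
            continue  # skip low-confidence boxes (threshold is an assumption)
        x_c = (d.x_min + d.x_max) / 2 / img_w
        y_c = (d.y_min + d.y_max) / 2 / img_h
        w = (d.x_max - d.x_min) / img_w
        h = (d.y_max - d.y_min) / img_h
        lines.append(
            f"{CLASS_IDS[d.label]} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"
        )
    return lines
```

In a pipeline of this kind, the returned lines would typically be written to a text file named after the video frame, and boxes below the threshold would be left for manual review.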