A method for optimizing text preprocessing and text classification using multiple cycles of learning with an application on shipbrokers emails

© 2024 Informa UK Limited, trading as Taylor & Francis Group.

Bibliographic Details
Published in: Journal of applied statistics. - 1991. - 51(2024), issue 13, day 05, pages 2592-2626
Main Author: Papageorgiou, Grigorios (Author)
Other Authors: Economou, Polychronis, Bersimis, Sotirios
Format: Online Article
Language: English
Published: 2024
Access to parent work: Journal of applied statistics
Subjects: Journal Article; Text vectorization; machine learning; performance evaluation metrics; text classification; text labeling; validation
LEADER 01000caa a22002652 4500
001 NLM377773468
003 DE-627
005 20240919233104.0
007 cr uuu---uuuuu
008 240918s2024 xx |||||o 00| ||eng c
024 7 |a 10.1080/02664763.2024.2307535  |2 doi 
028 5 2 |a pubmed24n1539.xml 
035 |a (DE-627)NLM377773468 
035 |a (NLM)39290353 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Papageorgiou, Grigorios  |e verfasserin  |4 aut 
245 1 2 |a A method for optimizing text preprocessing and text classification using multiple cycles of learning with an application on shipbrokers emails 
264 1 |c 2024 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Revised 19.09.2024 
500 |a published: Electronic-eCollection 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a © 2024 Informa UK Limited, trading as Taylor & Francis Group. 
520 |a Optimizing text preprocessing and text classification algorithms is an important, everyday task in large organizations and companies and it usually involves a labor-intensive and time-consuming effort. For example, the filtering and sorting of a large number of electronic mails (emails) are crucial to keeping track of the received information and converting it automatically into useful and profitable knowledge. Business emails are often unstructured, noisy, and with many abbreviations and acronyms, which makes their handling a challenging procedure. To overcome those challenges, a two-step classification approach is proposed, along with a two-cycle labeling procedure in order to speed up the labeling process. Every step incorporates a heuristic classification approach to assign emails to predefined classes by comparing several classification and text vectorization algorithms. These algorithms are compared and evaluated using the F1 score and balanced accuracy. The implementation of the proposed algorithm is demonstrated in a shipbroker agent operating in Greece with excellent performance, improving organization and administration while reducing expenses 
650 4 |a Journal Article 
650 4 |a Text vectorization 
650 4 |a machine learning 
650 4 |a performance evaluation metrics 
650 4 |a text classification 
650 4 |a text labeling 
650 4 |a validation 
700 1 |a Economou, Polychronis  |e verfasserin  |4 aut 
700 1 |a Bersimis, Sotirios  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t Journal of applied statistics  |d 1991  |g 51(2024), 13 vom: 05., Seite 2592-2626  |w (DE-627)NLM098188178  |x 0266-4763  |7 nnns 
773 1 8 |g volume:51  |g year:2024  |g number:13  |g day:05  |g pages:2592-2626 
856 4 0 |u http://dx.doi.org/10.1080/02664763.2024.2307535  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 51  |j 2024  |e 13  |b 05  |h 2592-2626