Human Versus Machine Intelligence : Assessing Natural Language Generation Models Through Complex Systems Theory

The introduction of Transformer architectures - with the self-attention mechanism - in automatic Natural Language Generation (NLG) is a breakthrough in solving general task-oriented problems, such as the simple production of long text excerpts that resemble ones written by humans. While the performa...


Bibliographic Details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - 46(2024), 7, 15 June, pages 4812-4829
Main author: De Santis, Enrico (Author)
Other authors: Martino, Alessio; Rizzi, Antonello
Format: Online article
Language: English
Published: 2024
Access to the parent work: IEEE transactions on pattern analysis and machine intelligence
Subjects: Journal Article
LEADER 01000caa a22002652 4500
001 NLM367563320
003 DE-627
005 20240606232328.0
007 cr uuu---uuuuu
008 240125s2024 xx |||||o 00| ||eng c
024 7 |a 10.1109/TPAMI.2024.3358168  |2 doi 
028 5 2 |a pubmed24n1430.xml 
035 |a (DE-627)NLM367563320 
035 |a (NLM)38265904 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a De Santis, Enrico  |e verfasserin  |4 aut 
245 1 0 |a Human Versus Machine Intelligence  |b Assessing Natural Language Generation Models Through Complex Systems Theory 
264 1 |c 2024 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Revised 06.06.2024 
500 |a published: Print-Electronic 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a The introduction of Transformer architectures - with the self-attention mechanism - in automatic Natural Language Generation (NLG) is a breakthrough in solving general task-oriented problems, such as the simple production of long text excerpts that resemble ones written by humans. While the performance of GPT-X architectures is there for all to see, many efforts are underway to penetrate the secrets of these black-boxes in terms of intelligent information processing whose output statistical distributions resemble that of natural language. In this work, through the complexity science framework, a comparative study of the stochastic processes underlying the texts produced by the English version of GPT-2 with respect to texts produced by human beings, notably novels in English and programming codes, is offered. The investigation, of a methodological nature, consists first of all of an analysis phase in which the Multifractal Detrended Fluctuation Analysis and the Recurrence Quantification Analysis - together with Zipf's law and approximate entropy - are adopted to characterize long-term correlations, regularities and recurrences in human and machine-produced texts. Results show several peculiarities and trends in terms of long-range correlations and recurrences in the latter case. The synthesis phase, on the other hand, uses the complexity measures to build synthetic text descriptors - hence a suitable text embedding - which serve as the features fed to a machine learning system designed to perform feature selection through an evolutionary technique. Using multivariate analysis, the grouping tendency of the three analyzed text types is then shown, which places GPT-2 texts in between natural language texts and computer codes. Similarly, the classification task demonstrates that, given the high accuracy obtained in the automatic discrimination of text classes, the proposed set of complexity measures is highly informative. 
These interesting results allow us to add another piece to the theoretical understanding of the surprising results obtained by NLG systems based on deep learning and let us improve the design of new informetrics or text mining systems for text classification, fake news detection, or even plagiarism detection. 
650 4 |a Journal Article 
700 1 |a Martino, Alessio  |e verfasserin  |4 aut 
700 1 |a Rizzi, Antonello  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on pattern analysis and machine intelligence  |d 1979  |g 46(2024), 7 vom: 15. Juni, Seite 4812-4829  |w (DE-627)NLM098212257  |x 1939-3539  |7 nnns 
773 1 8 |g volume:46  |g year:2024  |g number:7  |g day:15  |g month:06  |g pages:4812-4829 
856 4 0 |u http://dx.doi.org/10.1109/TPAMI.2024.3358168  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 46  |j 2024  |e 7  |b 15  |c 06  |h 4812-4829