Anchor-Free Correlated Topic Modeling

In topic modeling, identifiability of the topics is an essential issue. Many topic modeling approaches have been developed under the premise that each topic has a characteristic anchor word that only appears in that topic. The anchor-word assumption is fragile in practice, because words and terms ha...

Bibliographic Details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - 41(2019), 5, 4 May, pages 1056-1071
Main Author: Fu, Xiao (Author)
Other Authors: Huang, Kejun; Sidiropoulos, Nicholas D; Shi, Qingjiang; Hong, Mingyi
Format: Online Article
Language: English
Published: 2019
Access to the parent work: IEEE transactions on pattern analysis and machine intelligence
Subjects: Journal Article
LEADER 01000naa a22002652 4500
001 NLM28636168X
003 DE-627
005 20231225051530.0
007 cr uuu---uuuuu
008 231225s2019 xx |||||o 00| ||eng c
024 7 |a 10.1109/TPAMI.2018.2827377  |2 doi 
028 5 2 |a pubmed24n0954.xml 
035 |a (DE-627)NLM28636168X 
035 |a (NLM)29993625 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Fu, Xiao  |e verfasserin  |4 aut 
245 1 0 |a Anchor-Free Correlated Topic Modeling 
264 1 |c 2019 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Revised 20.11.2019 
500 |a published: Print-Electronic 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a In topic modeling, identifiability of the topics is an essential issue. Many topic modeling approaches have been developed under the premise that each topic has a characteristic anchor word that only appears in that topic. The anchor-word assumption is fragile in practice, because words and terms have multiple uses; yet it is commonly adopted because it enables identifiability guarantees. Remedies in the literature include using three- or higher-order word co-occurrence statistics to come up with tensor factorization models, but such statistics need many more samples to obtain reliable estimates, and identifiability still hinges on additional assumptions, such as consecutive words being persistently drawn from the same topic. In this work, we propose a new topic identification criterion using second-order statistics of the words. The criterion is theoretically guaranteed to identify the underlying topics even when the anchor-word assumption is grossly violated. An algorithm based on alternating optimization and an efficient primal-dual algorithm are proposed to handle the resulting identification problem. The former exhibits high performance and is completely parameter-free; the latter affords up to 200 times speedup relative to the former, but requires step-size tuning and a slight sacrifice in accuracy. A variety of real text corpora are employed to showcase the effectiveness of the approach, where the proposed anchor-free method demonstrates substantial improvements compared to a number of anchor-word based approaches under various evaluation metrics. 
650 4 |a Journal Article 
700 1 |a Huang, Kejun  |e verfasserin  |4 aut 
700 1 |a Sidiropoulos, Nicholas D  |e verfasserin  |4 aut 
700 1 |a Shi, Qingjiang  |e verfasserin  |4 aut 
700 1 |a Hong, Mingyi  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on pattern analysis and machine intelligence  |d 1979  |g 41(2019), 5 vom: 04. Mai, Seite 1056-1071  |w (DE-627)NLM098212257  |x 1939-3539  |7 nnns 
773 1 8 |g volume:41  |g year:2019  |g number:5  |g day:04  |g month:05  |g pages:1056-1071 
856 4 0 |u http://dx.doi.org/10.1109/TPAMI.2018.2827377  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 41  |j 2019  |e 5  |b 04  |c 05  |h 1056-1071