Approximate Fisher Kernels of Non-iid Image Models for Image Categorization

The bag-of-words (BoW) model treats images as sets of local descriptors and represents them by visual word histograms. The Fisher vector (FV) representation extends BoW, by considering the first and second order statistics of local descriptors. In both representations local descriptors are assumed t...

Bibliographic Details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - 38(2016), 6, 06 June, pages 1084-98
Main author: Cinbis, Ramazan Gokberk (Author)
Other authors: Verbeek, Jakob, Schmid, Cordelia
Format: Online article
Language: English
Published: 2016
Access to the parent work: IEEE transactions on pattern analysis and machine intelligence
Subjects: Journal Article
LEADER 01000naa a22002652 4500
001 NLM253436702
003 DE-627
005 20231224170223.0
007 cr uuu---uuuuu
008 231224s2016 xx |||||o 00| ||eng c
024 7 |a 10.1109/TPAMI.2015.2484342  |2 doi 
028 5 2 |a pubmed24n0844.xml 
035 |a (DE-627)NLM253436702 
035 |a (NLM)26441445 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Cinbis, Ramazan Gokberk  |e verfasserin  |4 aut 
245 1 0 |a Approximate Fisher Kernels of Non-iid Image Models for Image Categorization 
264 1 |c 2016 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Completed 05.06.2017 
500 |a Date Revised 05.06.2017 
500 |a published: Print-Electronic 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a The bag-of-words (BoW) model treats images as sets of local descriptors and represents them by visual word histograms. The Fisher vector (FV) representation extends BoW, by considering the first and second order statistics of local descriptors. In both representations local descriptors are assumed to be identically and independently distributed (iid), which is a poor assumption from a modeling perspective. It has been experimentally observed that the performance of BoW and FV representations can be improved by employing discounting transformations such as power normalization. In this paper, we introduce non-iid models by treating the model parameters as latent variables which are integrated out, rendering all local regions dependent. Using the Fisher kernel principle we encode an image by the gradient of the data log-likelihood w.r.t. the model hyper-parameters. Our models naturally generate discounting effects in the representations; suggesting that such transformations have proven successful because they closely correspond to the representations obtained for non-iid models. To enable tractable computation, we rely on variational free-energy bounds to learn the hyper-parameters and to compute approximate Fisher kernels. Our experimental evaluation results validate that our models lead to performance improvements comparable to using power normalization, as employed in state-of-the-art feature aggregation methods 
650 4 |a Journal Article 
700 1 |a Verbeek, Jakob  |e verfasserin  |4 aut 
700 1 |a Schmid, Cordelia  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on pattern analysis and machine intelligence  |d 1979  |g 38(2016), 6 vom: 06. Juni, Seite 1084-98  |w (DE-627)NLM098212257  |x 1939-3539  |7 nnns 
773 1 8 |g volume:38  |g year:2016  |g number:6  |g day:06  |g month:06  |g pages:1084-98 
856 4 0 |u http://dx.doi.org/10.1109/TPAMI.2015.2484342  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 38  |j 2016  |e 6  |b 06  |c 06  |h 1084-98