Scalable Distributed Hashing for Approximate Nearest Neighbor Search

Hashing has been widely applied to the large-scale approximate nearest neighbor search problem owing to its high efficiency and low storage requirement. Most investigations concentrate on learning hashing methods in a centralized setting. However, in existing big data systems, data is often stored a...

Bibliographic Details
Published in: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 31(2022), 07, pages 472-484
Main Author: Cao, Yuan (Author)
Other Authors: Liu, Junwei, Qi, Heng, Gui, Jie, Li, Keqiu, Ye, Jieping, Liu, Chao
Format: Online Article
Language: English
Published: 2022
Access to the parent work: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Subjects: Journal Article
LEADER 01000naa a22002652 4500
001 NLM33411487X
003 DE-627
005 20231225223030.0
007 cr uuu---uuuuu
008 231225s2022 xx |||||o 00| ||eng c
024 7 |a 10.1109/TIP.2021.3130528  |2 doi 
028 5 2 |a pubmed24n1113.xml 
035 |a (DE-627)NLM33411487X 
035 |a (NLM)34874853 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Cao, Yuan  |e verfasserin  |4 aut 
245 1 0 |a Scalable Distributed Hashing for Approximate Nearest Neighbor Search 
264 1 |c 2022 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Revised 16.12.2021 
500 |a published: Print-Electronic 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a Hashing has been widely applied to the large-scale approximate nearest neighbor search problem owing to its high efficiency and low storage requirement. Most investigations concentrate on learning hashing methods in a centralized setting. However, in existing big data systems, data is often stored across different nodes. In some situations, data is even collected in a distributed manner. A straightforward way to solve this problem is to aggregate all the data into the fusion center to obtain the search result (aggregating method). However, this strategy is not feasible because of the prohibitive communication cost. Although a few distributed hashing methods have been proposed to reduce this cost, they only focus on designing a distributed algorithm for a specific global optimization objective without considering scalability. Moreover, existing distributed hashing methods aim at finding a distributed solution to hashing, meanwhile avoiding accuracy loss, rather than improving accuracy. To address these challenges, we propose a Scalable Distributed Hashing (SDisH) model in which most existing hashing methods can be extended to process distributed data with no changes. Furthermore, to improve accuracy, we utilize the search radius as a global variable across different nodes to achieve a global optimum search result for every iteration. In addition, a voting algorithm is presented based on the results produced by multiple iterations to further reduce search errors. Theoretical analyses of communication, computation, and accuracy demonstrate the superiority of the proposed model. Numerical simulations on three large-scale and two relatively small benchmark datasets also show that the SDisH model achieves up to 44.75% and 10.23% accuracy gains compared to the aggregating method and state-of-the-art distributed hashing methods, respectively 
650 4 |a Journal Article 
700 1 |a Liu, Junwei  |e verfasserin  |4 aut 
700 1 |a Qi, Heng  |e verfasserin  |4 aut 
700 1 |a Gui, Jie  |e verfasserin  |4 aut 
700 1 |a Li, Keqiu  |e verfasserin  |4 aut 
700 1 |a Ye, Jieping  |e verfasserin  |4 aut 
700 1 |a Liu, Chao  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on image processing : a publication of the IEEE Signal Processing Society  |d 1992  |g 31(2022) vom: 07., Seite 472-484  |w (DE-627)NLM09821456X  |x 1941-0042  |7 nnns 
773 1 8 |g volume:31  |g year:2022  |g day:07  |g pages:472-484 
856 4 0 |u http://dx.doi.org/10.1109/TIP.2021.3130528  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 31  |j 2022  |b 07  |h 472-484