Distance Encoded Product Quantization for Approximate K-Nearest Neighbor Search in High-Dimensional Space

Bibliographic details
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence. - 1979. - 41(2019), 9, 12 Sept., pages 2084-2097
Main author: Heo, Jae-Pil (Author)
Other authors: Lin, Zhe; Yoon, Sung-Eui
Format: Online article
Language: English
Published: 2019
Collection access: IEEE Transactions on Pattern Analysis and Machine Intelligence
Subjects: Journal Article; Research Support, Non-U.S. Gov't
Description
Abstract: Approximate K-nearest neighbor search is a fundamental problem in computer science. The problem is especially important for high-dimensional and large-scale data. Recently, many techniques encoding high-dimensional data to compact codes have been proposed. The product quantization and its variations that encode the cluster index in each subspace have been shown to provide impressive accuracy. In this paper, we explore a simple question: is it best to use all the bit-budget for encoding a cluster index? We have found that as data points are located farther away from the cluster centers, the error of estimated distance becomes larger. To address this issue, we propose a novel compact code representation that encodes both the cluster index and quantized distance between a point and its cluster center in each subspace by distributing the bit-budget. We also propose two distance estimators tailored to our representation. We further extend our method to encode global residual distances in the original space. We have evaluated our proposed methods on benchmarks consisting of GIST, VLAD, and CNN features. Our extensive experiments show that the proposed methods significantly and consistently improve the search accuracy over other tested techniques. This result is achieved mainly because our methods accurately estimate distances
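Illustration: the encoding idea summarized in the abstract (a cluster index plus a quantized point-to-center distance in each subspace) might look roughly like the sketch below. This is not the authors' implementation. The names `encode`, `bits_per_subspace`, `codebooks`, and `boundaries` are hypothetical, the codebooks and distance thresholds are assumed to have been trained beforehand, and the paper's two distance estimators are not reproduced here.

```python
import math


def split(vec, m):
    """Split vec into m equal-length subvectors (assumes len(vec) % m == 0)."""
    d = len(vec) // m
    return [vec[i * d:(i + 1) * d] for i in range(m)]


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def encode(vec, codebooks, boundaries):
    """Return one (cluster_index, distance_bin) pair per subspace.

    codebooks[s]  : cluster centers of subspace s (hypothetical, pre-trained)
    boundaries[s] : ascending distance thresholds of subspace s (hypothetical)
    """
    code = []
    subvectors = split(vec, len(codebooks))
    for sub, centers, bounds in zip(subvectors, codebooks, boundaries):
        dists = [euclidean(sub, c) for c in centers]
        # Cluster index: nearest center in this subspace.
        k = min(range(len(centers)), key=dists.__getitem__)
        # Distance bin: number of thresholds the distance to that center reaches.
        b = sum(dists[k] >= t for t in bounds)
        code.append((k, b))
    return code


def bits_per_subspace(n_centers, n_bins):
    """Bit-budget of one subspace, shared between cluster index and distance bin."""
    return math.ceil(math.log2(n_centers)) + math.ceil(math.log2(n_bins))


# Toy setup: 2 subspaces, 2 centers each, 2 distance bins (one threshold) each.
codebooks = [[(0.0, 0.0), (4.0, 4.0)], [(0.0, 0.0), (4.0, 4.0)]]
boundaries = [[1.0], [1.0]]
print(encode([0.1, 0.2, 4.0, 6.0], codebooks, boundaries))  # [(0, 0), (1, 1)]
print(bits_per_subspace(64, 4))  # 8 bits: 6 for the index, 2 for the distance
```

In this toy setup, the second subvector lies at distance 2.0 from its nearest center, so it falls in the outer distance bin. A plain product quantization code would store only the cluster index and lose that information.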
Description: Date Completed 11.09.2019
Date Revised 11.09.2019
published: Print-Electronic
Citation Status PubMed-not-MEDLINE
ISSN: 1939-3539
DOI: 10.1109/TPAMI.2018.2853161