Learning with Average Precision: Training Image Retrieval with a Listwise Loss

Published 2022-08-10 · Category: image retrieval


Original paper: Learning with Average Precision: Training Image Retrieval with a Listwise Loss

Official implementation: https://github.com/naver/deep-image-retrieval

Abstract

Image retrieval can be formulated as a ranking problem where the goal is to order database images by decreasing similarity to the query. Recent deep models for image retrieval have outperformed traditional methods by leveraging ranking-tailored loss functions, but important theoretical and practical problems remain. First, rather than directly optimizing the global ranking, they minimize an upper-bound on the essential loss, which does not necessarily result in an optimal mean average precision (mAP). Second, these methods require significant engineering efforts to work well, e.g. special pre-training and hard-negative mining. In this paper we propose instead to directly optimize the global mAP by leveraging recent advances in listwise loss formulations. Using a histogram binning approximation, the AP can be differentiated and thus employed to end-to-end learning. Compared to existing losses, the proposed method considers thousands of images simultaneously at each iteration and eliminates the need for ad hoc tricks. It also establishes a new state of the art on many standard retrieval benchmarks. Models and evaluation scripts have been made available at https://github.com/naver/deep-image-retrieval

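In short: image retrieval is a ranking problem in which database images are sorted from most to least similar to the query. Deep models trained with ranking-oriented losses already beat traditional methods, but they have two weaknesses. They minimize an upper bound on the essential loss instead of optimizing the global ranking itself, so the resulting mean average precision (mAP) is not necessarily optimal, and they need considerable engineering to work well, such as special pre-training and hard-negative mining. The paper instead optimizes the global mAP directly with a listwise loss. A histogram binning approximation makes AP differentiable, which allows end-to-end training. Compared with existing losses, the method considers thousands of images at each iteration, needs no ad hoc tricks, and sets a new state of the art on many standard retrieval benchmarks. The models and evaluation scripts are open source: https://github.com/naver/deep-image-retrieval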
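To make the "histogram binning approximation" more concrete, here is a minimal, framework-free sketch of how a binned AP might be computed for a single query. It is an illustration under stated assumptions, not the official implementation: the function name `quantized_ap`, the triangular soft-assignment kernel, the default of 20 bins, and the assumption that similarities are cosine values in [-1, 1] are all choices made for this example. Each similarity is spread over its two nearest bin centers, and precision and incremental recall are then accumulated bin by bin, so the result is built only from sums, products, and piecewise-linear operations that an autodiff framework could differentiate.

```python
def quantized_ap(scores, labels, num_bins=20):
    """Soft-histogram approximation of average precision for one query.

    scores: similarities to the query, assumed to lie in [-1, 1].
    labels: 1 for a relevant database image, 0 otherwise.
    num_bins: number of histogram bins (hypothetical default).
    """
    if num_bins < 2:
        raise ValueError("num_bins must be at least 2")
    num_pos = sum(labels)
    if num_pos == 0:
        raise ValueError("at least one relevant item is required")

    width = 2.0 / (num_bins - 1)
    # Bin centers run from the highest similarity (1) down to the lowest (-1).
    centers = [1.0 - m * width for m in range(num_bins)]

    cum_all = 0.0  # soft count of all items seen so far
    cum_pos = 0.0  # soft count of relevant items seen so far
    ap = 0.0
    for center in centers:
        # Triangular kernel: weight 1 at the center, 0 one bin width away.
        weights = [max(1.0 - abs(s - center) / width, 0.0) for s in scores]
        bin_pos = sum(w * y for w, y in zip(weights, labels))
        cum_all += sum(weights)
        cum_pos += bin_pos
        if cum_all > 0.0:
            precision = cum_pos / cum_all
            ap += precision * (bin_pos / num_pos)  # precision * recall gain
    return ap


good = quantized_ap([0.9, 0.8, -0.5, -0.7], [1, 1, 0, 0])
bad = quantized_ap([0.9, 0.8, -0.5, -0.7], [0, 0, 1, 1])
print(good, bad)
```

In this toy run the first call ranks both relevant images on top and the second ranks them last, so the first value should be higher. In training, one would presumably minimize one minus this quantity, averaged over the queries in a batch.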