Oxford5k and Paris6k

Posted on 2022-10-24 in image retrieval

Overview

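Oxford5k is a widely used landmark retrieval dataset. It contains 5,062 images at a resolution of 1024 × 768 and covers 11 Oxford landmark buildings photographed from different viewpoints. Each building has 5 query images, for 55 queries in total.

Paris6k is a landmark retrieval dataset from the same VGG group. It contains 6,412 images at a resolution of 1024 × 768, collected from Flickr, and covers 12 buildings in Paris.

Oxford5k and Paris6k share the same annotation scheme, evaluation protocol, and distractor set (100K images in total, collected from Flickr).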

Annotation

Each image is given one of four labels:

Good - A nice, clear picture of the object/building.
OK - More than 25% of the object is clearly visible.
Bad - The object is not present.
Junk - Less than 25% of the object is visible, or there are very high levels of occlusion or distortion.

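During evaluation, Good and OK images are the positives (the ground truth). Junk images are ignored: they are skipped as if they were absent from the ranked list, so they neither help nor hurt the score. Every other image, including those labeled Bad, counts as a negative.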
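The way each image is treated when scoring one query can be sketched as a small lookup, following what compute_ap.cpp does with its positive and ambiguous sets. This is only an illustration: the names Role and role_of are hypothetical and do not appear in the benchmark code.

```cpp
#include <set>
#include <string>

// Role an image plays when scoring one query (illustrative names).
enum class Role { Positive, Ignored, Negative };

// Map an image to its role given the query's Good, OK and Junk sets.
// Anything not listed in those three sets (Bad or unlabeled) is a negative.
Role role_of(const std::string& image,
             const std::set<std::string>& good,
             const std::set<std::string>& ok,
             const std::set<std::string>& junk)
{
  if (good.count(image) || ok.count(image)) return Role::Positive;
  if (junk.count(image)) return Role::Ignored;
  return Role::Negative;
}
```

An Ignored image is dropped from the ranked list before precision and recall are computed, while a Negative image lowers precision.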

Evaluation protocol

Oxford5k and Paris6k use mAP as the evaluation metric, and the dataset authors provide code that computes the AP of a single query:

// compute_ap.cpp
// This is a modified version of the retrieval benchmark program provided at
// http://www.robots.ox.ac.uk/~vgg/data/oxbuildings/
// It is modified to explicitly include cstdlib.
#include <fstream>
#include <iostream>
#include <set>
#include <string>
#include <vector>
#include <cstdlib>

using namespace std;

/*
 * Read the file line by line into a vector.
 */
vector<string>
load_list(const string& fname)
{
  vector<string> ret;
  ifstream fobj(fname.c_str());
  if (!fobj.good()) { cerr << "File " << fname << " not found!\n"; exit(-1); }
  string line;
  while (getline(fobj, line)) {
    ret.push_back(line);
  }
  return ret;
}

/*
 * Convert a vector into a set.
 */
template<class T>
set<T> vector_to_set(const vector<T>& vec)
{ return set<T>(vec.begin(), vec.end()); }

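/**
 * Compute the average precision (AP) of one ranked list.
 * @param pos positive set (Good and OK images)
 * @param amb ambiguous set (Junk images), skipped during scoring
 * @param ranked_list ranked retrieval results, best match first
 * @return AP of the query
 */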
float
compute_ap(const set<string>& pos, const set<string>& amb, const vector<string>& ranked_list)
{
  float old_recall = 0.0;
  float old_precision = 1.0;
  float ap = 0.0;

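  // Number of positives retrieved so far
  size_t intersect_size = 0;
  size_t i = 0;  // position in the ranked list
  size_t j = 0;  // number of non-Junk images scored so far
  // Walk down the ranked list
  for ( ; i<ranked_list.size(); ++i) {
    // Junk (ambiguous) images are skipped as if they were not in the list
    if (amb.count(ranked_list[i])) continue;
    // A positive image raises the count of retrieved positives
    if (pos.count(ranked_list[i])) intersect_size++;
    // Anything that is neither positive nor Junk is a negative (Bad or unlabeled)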

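    // Recall = positives retrieved so far / total number of positives.
    // A positive raises recall; a negative leaves it unchanged.
    float recall = intersect_size / (float)pos.size();
    // Precision = positives retrieved so far / non-Junk images scored so far.
    // A positive raises precision (or keeps it at 1); a negative lowers it.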
    float precision = intersect_size / (j + 1.0);

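    // AP is the area under the precision-recall curve
    // (x axis: recall, y axis: precision).
    // Previous point: (old_recall, old_precision)
    // Current point:  (recall, precision)
    // Add the trapezoid between the two points to approximate that area.
    // For a negative, recall is unchanged, so the term is 0.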
    ap += (recall - old_recall)*((old_precision + precision)/2.0);

    old_recall = recall;
    old_precision = precision;
    j++;
  }
  return ap;
}

int
main(int argc, char** argv)
{
  if (argc != 3) {
    // Arguments: ground-truth file prefix of the query + ranked list file (one image name per line)
    cout << "Usage: ./compute_ap [GROUNDTRUTH QUERY] [RANKED LIST]\n";
    return -1;
  }

  string gtq = argv[1];

  // Load the ranked list
  vector<string> ranked_list = load_list(argv[2]);
  // Load the Good images for this query
  set<string> good_set = vector_to_set( load_list(gtq + "_good.txt") );
  // Load the OK images for this query
  set<string> ok_set = vector_to_set( load_list(gtq + "_ok.txt") );
  // Load the Junk images for this query
  set<string> junk_set = vector_to_set( load_list(gtq + "_junk.txt") );

  // Good and OK images together form the positive set
  set<string> pos_set;
  pos_set.insert(good_set.begin(), good_set.end());
  pos_set.insert(ok_set.begin(), ok_set.end());

  // Junk images form the ambiguous set, which is skipped during scoring
  // Compute the average precision of this query
  float ap = compute_ap(pos_set, junk_set, ranked_list);
  
  cout << ap << "\n";

  return 0;
}
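
As a worked check of the loop in compute_ap, take a hypothetical query with two positives a and b, one Junk image u, and the ranked list a, u, x, b, where x appears in none of the ground-truth files. The image names are made up for illustration. Here k indexes the non-Junk images in ranked order, r is recall and p is precision:

```latex
\begin{align*}
\text{AP} &= \sum_{k=1}^{n} (r_k - r_{k-1}) \cdot \frac{p_{k-1} + p_k}{2},
  \qquad r_0 = 0,\; p_0 = 1 \\
a\ (\text{positive}) &: r_1 = \tfrac{1}{2},\; p_1 = \tfrac{1}{1} = 1,
  \quad \Delta = \tfrac{1}{2} \cdot \tfrac{1 + 1}{2} = \tfrac{1}{2} \\
u\ (\text{Junk}) &: \text{skipped, no term} \\
x\ (\text{negative}) &: r_2 = \tfrac{1}{2},\; p_2 = \tfrac{1}{2},
  \quad \Delta = 0 \cdot \tfrac{1 + 1/2}{2} = 0 \\
b\ (\text{positive}) &: r_3 = 1,\; p_3 = \tfrac{2}{3},
  \quad \Delta = \tfrac{1}{2} \cdot \tfrac{1/2 + 2/3}{2} = \tfrac{7}{24} \\
\text{AP} &= \tfrac{1}{2} + 0 + \tfrac{7}{24} = \tfrac{19}{24} \approx 0.79
\end{align*}
```

The Junk image u changes nothing, while the negative x contributes no area itself but lowers the precision that enters the next trapezoid.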

mAP is the mean of the AP values obtained over all queries.

大海
