COCO Dataset and Its API

This post describes the COCO annotation format and how to use its API.

Introduction

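COCO stands for Common Objects in Context. The dataset was originally released by Microsoft and can be used for tasks such as object detection, semantic segmentation, and keypoint detection.

Its official website¹ describes it as follows: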

“COCO is a large-scale object detection, segmentation, and captioning dataset. COCO has several features: Object segmentation, Recognition in context, Superpixel stuff segmentation, 330K images (>200K labeled), 1.5 million object instances, 80 object categories, 91 stuff categories, 5 captions per image, 250,000 people with keypoints.”

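COCO stores its annotations in JSON and encodes masks compactly, which keeps the annotation files small. It also ships with several APIs. Together these have made COCO one of the most widely used image datasets in computer vision. Many self-annotated datasets adopt the COCO format as well, so it is worth understanding the format in detail.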

COCO Data Format

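The following uses a semantic segmentation dataset I have at hand to explain what each part of the COCO JSON means.

An excerpt of the annotation file is shown below. The entries were sampled independently, so the image_id and category_id values under annotations do not point to the images and categories listed in this excerpt.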

{
	"info": {
		"contributor": "...",
		"about": "...",
		"date_created": "...",
		"description": "...",
		"url": "...",
		"version": "1.0",
		"year": 2020
	},

	"categories": [
		{"supercategory": "person", "id": 1, "name": "person"},
		{"supercategory": "vehicle", "id": 2, "name": "bicycle"},
		{"supercategory": "vehicle", "id": 3, "name": "car"},
		{"supercategory": "vehicle", "id": 4, "name": "motorcycle"},
		{"supercategory": "vehicle", "id": 5, "name": "airplane"}
	],

	"images": [{
			"id": 20289,
			"file_name": "000000020289.jpg",
			"width": 300,
			"height": 300
		},
		{
			"id": 45176,
			"file_name": "000000045176.jpg",
			"width": 300,
			"height": 300
		}
    ],

	"annotations": [{
		"id": 377545,
		"image_id": 44153, 
		"segmentation": [
			[152.0, 180.0, 156.0, 176.0, 160.0, 181.0, 156.0, 186.0, 152.0, 180.0]
        ],
		"area": 42.0,
        "bbox": [152.0, 152.0, 28.0, 8.0],
		"category_id": 100,
		"iscrowd": 0
	}, {
		"id": 446305,
		"image_id": 52178,
		"segmentation": [
			[257, 123, 243, 123, 243, 112, 257, 112, 257, 123]
		],
		"area": 154.0,
		"bbox": [123, 243, 134, 14],
		"category_id": 100,
		"iscrowd": 0
	}]
}

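The sub-objects of the JSON above have the following meaning:

info: a general description of the dataset.
categories: the list of categories. Each category belongs to a supercategory and has a unique id. When building your own dataset, you can reuse COCO's categories or define your own list.
images: the images in the dataset. Each image has a unique id.
annotations: the annotations in the dataset, where
- image_id matches the id field of an entry in images. It is needed because one image can have many annotations.
- segmentation: taking [152.0, 180.0, 156.0, 176.0, 160.0, 181.0, 156.0, 186.0, 152.0, 180.0] as an example, the values are read in pairs as (x, y) boundary points of the annotated region. Connecting consecutive points gives the polygon that covers the annotated pixels.
  
  In some datasets segmentation holds several polygons for one annotation, for example when an object is partly covered by another object inside the same bounding box and is split into separate pieces.
- area: the size of the annotated region in pixels.
- bbox: the position of the bounding box. The coordinate system has its origin at the top-left corner of the image. The first two values are the (x, y) position of the box's top-left corner, and the last two are the box's width and height.
- category_id: the category of the region, matching an id in categories.
- iscrowd: 0 or 1, indicating whether the annotation covers a single object or a group of objects. Groups are annotated with RLE.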
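The relationships described above can be sketched in a few lines of plain Python. This is only an illustration: the dataset below is made up, and the helper names (polygon_points, polygon_bbox, annotations_by_image) are hypothetical and not part of the COCO API.

```python
from collections import defaultdict


def polygon_points(flat):
    """Turn a flat [x1, y1, x2, y2, ...] list into [(x1, y1), (x2, y2), ...]."""
    if len(flat) % 2 != 0:
        raise ValueError("a polygon needs an even number of coordinates")
    return list(zip(flat[0::2], flat[1::2]))


def polygon_bbox(flat):
    """Return the [x, y, width, height] box that encloses the polygon vertices."""
    xs, ys = flat[0::2], flat[1::2]
    return [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]


def annotations_by_image(dataset):
    """Group annotations by image_id (one image can have many annotations)."""
    grouped = defaultdict(list)
    for ann in dataset["annotations"]:
        grouped[ann["image_id"]].append(ann)
    return grouped


# Made-up data for illustration only.
dataset = {
    "images": [{"id": 1, "file_name": "a.jpg", "width": 100, "height": 80}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 3, "iscrowd": 0,
         "segmentation": [[10, 20, 40, 20, 40, 60, 10, 60]],
         "area": 1200.0, "bbox": [10, 20, 30, 40]},
        {"id": 11, "image_id": 1, "category_id": 1, "iscrowd": 0,
         "segmentation": [[50, 10, 70, 10, 60, 30]],
         "area": 200.0, "bbox": [50, 10, 20, 20]},
    ],
}

grouped = annotations_by_image(dataset)
for ann in grouped[1]:
    polygon = ann["segmentation"][0]
    print(ann["id"], polygon_points(polygon), polygon_bbox(polygon))
```

For these two polygons the box computed from the vertices matches the stored bbox, which is a quick sanity check worth running on self-annotated data.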

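In addition, segmentation may be stored with RLE (run-length encoding). Instead of listing boundary points, RLE walks the mask column by column and stores the lengths of alternating runs of background and annotated pixels, starting with background. For example: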

{
    "segmentation": {
        "counts": [179,27,392,41,…,55,20],
        "size": [426,640]
    },
    "area": 220834,
    "iscrowd": 1,
    "image_id": 250282,
    "bbox": [0,34,639,388],
    "category_id": 1,
    "id": 900100250282
}
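To make the uncompressed RLE layout concrete, here is a minimal pure-Python sketch. It assumes that counts alternates between runs of 0 and runs of 1, starting with 0, that pixels are visited column by column, and that size is [height, width]. The functions rle_decode and rle_encode are hypothetical helpers written for this post, not the pycocotools implementation.

```python
def rle_decode(counts, size):
    """Expand uncompressed RLE counts into a row-major list-of-lists mask."""
    height, width = size
    flat = []
    value = 0  # the first run always describes background pixels
    for run in counts:
        flat.extend([value] * run)
        value = 1 - value
    if len(flat) != height * width:
        raise ValueError("counts do not add up to height * width")
    # flat is in column-major order: pixel (row, col) sits at col * height + row
    return [[flat[col * height + row] for col in range(width)]
            for row in range(height)]


def rle_encode(mask):
    """Encode a row-major 0/1 mask as uncompressed, column-major RLE."""
    height, width = len(mask), len(mask[0])
    counts = []
    previous, run = 0, 0
    for col in range(width):
        for row in range(height):
            pixel = mask[row][col]
            if pixel == previous:
                run += 1
            else:
                counts.append(run)
                previous, run = pixel, 1
    counts.append(run)
    return {"counts": counts, "size": [height, width]}


# Tiny made-up mask, 2 rows by 3 columns.
mask = [[0, 1, 1],
        [0, 1, 0]]
rle = rle_encode(mask)
print(rle)  # {'counts': [2, 3, 1], 'size': [2, 3]}
assert rle_decode(rle["counts"], rle["size"]) == mask
```

If a mask starts with an annotated pixel, the first count is 0, so the alternation still begins with background.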

To learn more about the RLE format and the data formats of other tasks, see the blog post², the video³, and the Data Format⁴ page on the COCO website.

Results Format

A generated result entry looks like this in JSON:

{
	"image_id": 16307,
	"category_id": 100,
	"score": 102.91002765062223,
	"segmentation": {
		"size": [300, 300],
		"counts": "`l>Y1R83N1O00O10000001OO1000000000000000000000000000000000000000000O10O10000000000000000000000000001O0000000000000000000000000000000000000000000000O10000000000001O00000000000000000001O0000000O10000000000000000000000000000000000000000000000001O00000000000000000000000000000000000000001O000001O000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000d1"
	},
	"bbox": [50.0, 246.0, 249.0, 46.0]
}

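Here segmentation is similar to the RLE format above, but counts is stored as a compressed string rather than a list of integers. The decode() function of the COCO mask API turns it back into a binary mask; encode() goes the other way, from a binary mask to this compressed RLE.

For details, see the Results format⁵ page on the official website and the COCO API demos⁶.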
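A results file is a JSON list of entries like the one above. The sketch below builds such a list for bounding-box results only, using the standard library. The detections are made up, and corners_to_coco_bbox and make_result are hypothetical helper names. It assumes the detector outputs corner coordinates (x1, y1, x2, y2), which have to be converted to COCO's [x, y, width, height].

```python
import json


def corners_to_coco_bbox(x1, y1, x2, y2):
    """Convert corner coordinates to COCO's [x, y, width, height]."""
    return [float(x1), float(y1), float(x2 - x1), float(y2 - y1)]


def make_result(image_id, category_id, score, box_corners):
    """Build one detection result entry."""
    return {
        "image_id": image_id,
        "category_id": category_id,
        "score": float(score),
        "bbox": corners_to_coco_bbox(*box_corners),
    }


# Made-up detections: (image_id, category_id, score, (x1, y1, x2, y2)).
detections = [
    (16307, 100, 0.91, (50, 246, 299, 292)),
    (16307, 100, 0.47, (10, 12, 60, 40)),
]

results = [make_result(*det) for det in detections]
payload = json.dumps(results)
print(payload)
```

The first made-up box converts to [50.0, 246.0, 249.0, 46.0], the same bbox as in the sample entry above.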

COCO API

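COCO provides APIs for Matlab, Lua, and Python. For usage, see the demos in the repository⁶.

The COCO API only covers basic needs. To go further, you have to read and modify its source code; the Python API is easy to read and easy to change.

The code below shows my understanding of the showAnns API; see the comments for details.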

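# showAnns is a method of the COCO class in pycocotools/coco.py;
# it relies on these module-level imports.
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import PatchCollection
from matplotlib.patches import Polygon
from pycocotools import mask as maskUtils


def showAnns(self, anns, draw_bbox=False):
    """
    Display the specified annotations.
    :param anns (array of object): annotations to display
    :param draw_bbox (bool): also draw each annotation's bounding box
    :return: None
    """
    if len(anns) == 0:
        return 0
    # Work out the task type from the first annotation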
    if "segmentation" in anns[0] or "keypoints" in anns[0]:
        datasetType = "instances"
    elif "caption" in anns[0]:
        datasetType = "captions"
    else:
        raise Exception("datasetType not supported")
    if datasetType == "instances":
        ax = plt.gca()
        ax.set_autoscale_on(False)
        polygons = []
        color = []
        for ann in anns:
            c = (np.random.random((1, 3)) * 0.6 + 0.4).tolist()[0]  # random color by default; change this line to use a fixed color
            if "segmentation" in ann:
                if type(ann["segmentation"]) == list:
                    # polygon
                    # This is the format stored in the original dataset; connecting consecutive points yields the polygon mask
                    for seg in ann["segmentation"]:
                        poly = np.array(seg).reshape((int(len(seg) / 2), 2))
                        polygons.append(Polygon(poly))
                        color.append(c)
                else:
                    # mask
                    # Data stored as RLE, which needs special handling
                    t = self.imgs[ann["image_id"]]
                    if type(ann["segmentation"]["counts"]) == list:
                        rle = maskUtils.frPyObjects(
                            [ann["segmentation"]], t["height"], t["width"]
                        )
                    else:
                        rle = [ann["segmentation"]]
                    m = maskUtils.decode(rle)
                    img = np.ones((m.shape[0], m.shape[1], 3))
                    if ann["iscrowd"] == 1:
                        color_mask = np.array([2.0, 166.0, 101.0]) / 255
                    if ann["iscrowd"] == 0:
                        color_mask = np.random.random((1, 3)).tolist()[0]
                    for i in range(3):
                        img[:, :, i] = color_mask[i]
                    ax.imshow(np.dstack((img, m * 0.5)))
            if "keypoints" in ann and type(ann["keypoints"]) == list:
                # turn skeleton into zero-based index
                sks = np.array(self.loadCats(ann["category_id"])[0]["skeleton"]) - 1
                kp = np.array(ann["keypoints"])
                x = kp[0::3]
                y = kp[1::3]
                v = kp[2::3]
                for sk in sks:
                    if np.all(v[sk] > 0):
                        plt.plot(x[sk], y[sk], linewidth=3, color=c)
                plt.plot(
                    x[v > 0],
                    y[v > 0],
                    "o",
                    markersize=8,
                    markerfacecolor=c,
                    markeredgecolor="k",
                    markeredgewidth=2,
                )
                plt.plot(
                    x[v > 1],
                    y[v > 1],
                    "o",
                    markersize=8,
                    markerfacecolor=c,
                    markeredgecolor=c,
                    markeredgewidth=2,
                )
            if draw_bbox:
                [bbox_x, bbox_y, bbox_w, bbox_h] = ann["bbox"]
                poly = [
                    [bbox_x, bbox_y],
                    [bbox_x, bbox_y + bbox_h],
                    [bbox_x + bbox_w, bbox_y + bbox_h],
                    [bbox_x + bbox_w, bbox_y],
                ]
                np_poly = np.array(poly).reshape((4, 2))
                polygons.append(Polygon(np_poly))
                color.append(c)
        p = PatchCollection(polygons, facecolor=color, linewidths=0, alpha=0.4)
        ax.add_collection(p)
        p = PatchCollection(polygons, facecolor="none", edgecolors=color, linewidths=2)
        ax.add_collection(p)
    elif datasetType == "captions":
        for ann in anns:
            print(ann["caption"])
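The draw_bbox branch of showAnns turns a COCO bbox into four corner points before handing them to Polygon. Isolated from matplotlib, that step can be sketched as follows; bbox_to_corners is a hypothetical helper name, and the corner order simply mirrors the one used in the code above.

```python
def bbox_to_corners(bbox):
    """Expand [x, y, width, height] into four [x, y] corner points.

    Order: top-left, bottom-left, bottom-right, top-right
    (image coordinates, with y growing downwards).
    """
    x, y, w, h = bbox
    return [[x, y], [x, y + h], [x + w, y + h], [x + w, y]]


# bbox taken from the result sample in the Results Format section.
corners = bbox_to_corners([50.0, 246.0, 249.0, 46.0])
print(corners)
```

For a 300 x 300 image, the right edge at x = 299.0 and the bottom edge at y = 292.0 both stay inside the image, which is a useful check before drawing.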

Reference

1. https://cocodataset.org “COCO - Common Objects in Context”
2. https://www.immersivelimit.com/tutorials/create-coco-annotations-from-scratch “Create COCO Annotations From Scratch”
3. https://www.youtube.com/watch?v=h6s61a_pqfM “COCO Dataset Format - Complete Walkthrough”
4. https://cocodataset.org/#format-data “Data format”
5. https://cocodataset.org/#format-results “Results format”
6. https://github.com/cocodataset/cocoapi “cocodataset/cocoapi: COCO API - Dataset”
