将 bdd100k 数据集转换为 yolo 格式
封面图《見上げてごらん、夜空の星》

前言

因为原先从 COCO 中提取的数据集的检测效果一直不好，在观察了几个数据集后发现，COCO 数据集的图片中我需要的目标都比较小，且我需要的一些类样本不均衡，比如 truck 类就特别少，尽管 YOLOv4 的论文中说可以使用 focal loss 来解决样本不均衡，但是最简单的还是换一个数据集，因此打算新做一个数据集。

BDD100K 数据集

官网：https://bdd-data.berkeley.edu/
论文：https://arxiv.org/abs/1805.04687
github：https://github.com/bdd100k/bdd100k

BDD100K 是伯克利大小 AI 实验室发布的一个自动驾驶领域的数据集。具体内容可以看论文，在此我只关心我需要的内容，各个类别的数据量。

BDD100K 中车辆行人的数量都很多，truck 有 4296 个，Bus 有 16505 个。而 COCO 中我想要 2000 个 truck 最后只返回了我 1400 左右，因此在 COCO 提取训练集训练测试结果的时候发现 truck 和 bus 分不清，且 truck 的 AP50 很低。

数据集转换

由于 BDD100K 的数据集格式和 darknet 需要的 yolo 格式不一样，因此需要将 label 进行转换，顺便对 label 进行一下提取，去掉几个我不需要的类。
BDD100K 的标签文件格式遵循 Scalabel Format，此外还有一些 BDD100K 自己的格式，不过与我的需求没有关系。因此先来看一下 scalabel 格式

- name: string
- url: string
- videoName: string (optional)
- attributes: a dictionary of frame attributes
- intrinsics
    - focal: [x, y]
    - center: [x, y]
    - nearClip:
- extrinsics
    - location
    - rotation
- timestamp: int64 (epoch time ms)
- frameIndex: int (optional, frame index in this video)
- labels [ ]:
    - id: string
    - category: string (classification)
    - manualShape: boolean
    - manualAttributes: boolean
    - score: float
    - attributes: a dictionary of label attributes
    - box2d:
        - x1: float
        - y1: float
        - x2: float
        - y2: float
    - box3d:
        - alpha:
        - orientation:
        - location: ()
        - dimension: (3D point, height, width, length)
    - poly2d: an array of objects, with the structure
        - vertices: [][]float (list of 2-tuples [x, y])
        - types: string
        - closed: boolean

再看一看实际文件，还是符合的

代码

参考了 github 的一个现有转换脚本进行了一定的修改，添加了命令行支持和类别提取。代码如下。

import os
import json
from tqdm import tqdm
import cv2

TRAIN_LABEL_NAME = 'bdd100k_labels_images_train.json'
VAL_LABEL_NAME = 'bdd100k_labels_images_val.json'

classes = [
    "person",
    "bike",
    "car",
    "motor",
    "bus",
    "truck",
    "rider",
    "train"
]

counter = {}

for c in classes:
    counter[c] = 0


def get_args():
    import argparse

    parser = argparse.ArgumentParser()

    parser.add_argument('-i', '--images', help="images path",
                        type=str, default='./images/100k')
    parser.add_argument('-l', '--labels', help="labels path",
                        type=str, default="./labels")
    parser.add_argument('-t', '--type', help="type of dataset",
                        choices=['train', 'val'], default='train')

    return parser.parse_args()


def convertBdd100k2yolo(imageFileName, label):
    img = cv2.imread(imageFileName)
    width, height = img.shape[1], img.shape[0]
    dw = 1.0/width
    dh = 1.0/height

    catName = label['category']
    classIndex = classes.index(catName)
    roi = label['box2d']

    w = roi['x2']-roi['x1']
    h = roi['y2']-roi['y1']
    x_center = roi['x1'] + w/2
    y_center = roi['y1'] + h/2

    x_center, y_center, w, h = x_center*dw, y_center*dh, w*dw, h*dh

    return "{} {} {} {} {}\n".format(classIndex, x_center, y_center, w, h)


if __name__ == '__main__':
    args = get_args()
    # 转为文件位置
    if args.type == 'train':
        imageRootPath = os.path.join(args.images, 'train')
        labelFilePath = os.path.join(args.labels, TRAIN_LABEL_NAME)
    else:
        imageRootPath = os.path.join(args.images, 'val')
        labelFilePath = os.path.join(args.labels, VAL_LABEL_NAME)

    with open(labelFilePath) as file:
        lines = json.load(file)
        print("loaded labels")

    for line in tqdm(lines):
        # 图像名
        name = line['name']
        labels = line['labels']
        imagePath = os.path.join(imageRootPath, name)
        labelPath = imagePath.replace('jpg', 'txt')
        if not os.path.isfile(os.path.realpath(imagePath)):
            continue
        with open(labelPath, 'w') as file:
            # 遍历label
            for label in labels:
                cat = label['category']
                if cat in classes:
                    counter[cat] += 1
                    file.write(convertBdd100k2yolo(imagePath,label))

    print(counter)

效果如下，训练集因为中途缩放了一下 vscode 窗口，所以进度条就这样了。除去一个 20% 量的 test，类别数量还是和论文里面的差不多的。

删除无标签图片

经过训练发现存在一些无标签图片，因此需要进行删除，代码如下

import os


def get_args():
    import argparse

    parser = argparse.ArgumentParser()

    parser.add_argument('-i', '--images', help="images path",
                        type=str, default='./images/100k')
    parser.add_argument('-t', '--type', help="type of dataset",
                        choices=['train', 'val'], default='train')

    return parser.parse_args()


def getFileList(dir, extract):
    fileList = []
    filenames = os.listdir(dir)
    for filename in filenames:
        ext = os.path.splitext(filename)[-1]
        if ext == extract:
            fileList.append(filename)
    return fileList


if __name__ == '__main__':
    args = get_args()

    if args.type == 'train':
        imageRootPath = os.path.join(args.images, 'train')
    else:
        imageRootPath = os.path.join(args.images, 'val')

    imageName = getFileList(imageRootPath, '.jpg')
    labelName = getFileList(imageRootPath, '.txt')

    labelName = [label.replace(".txt", ".jpg") for label in labelName]
    lackImages = set(imageName) - set(labelName)

    for file in lackImages:
        os.remove(os.path.join(imageRootPath,file))
        #print(os.path.join(imageRootPath,file))

总结

标签转换完成，接下来生成 train.txt 和 val.txt 就和之前的 coco2yolo 一样了。随后的效果就得训练后才能知道了。

参考

[1]Yu F, Chen H, Wang X, et al. Bdd100k: A diverse driving dataset for heterogeneous multitask learning[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020: 2636-2645.
[2]BDD100K：一个大规模、多样化的驾驶视频数据集
[3]bdd100k_to_yolo_converter