PyTorch (1.3.0+): Road Segmentation with UNet on the CamVid Dataset

Warning

Please pay attention to the PyTorch version in use!!!


Background

Semantic segmentation is an important research direction in deep learning, and UNet is one of its classic models. In this post I use UNet to segment roads in the CamVid dataset; the expected effect looks roughly like this:

Original image

[image]

Road segmentation result

[image]

The code in this post draws on the following repositories:

https://github.com/milesial/Pytorch-UNet
https://github.com/qubvel/segmentation_models.pytorch

Dataset Introduction and Processing

In earlier posts I rarely said much about the datasets, because they were fairly simple; the CamVid dataset, however, made my head spin, and it took me two or three hours to work out what it actually contains.

Dataset download link

Although the dataset's homepage is still reachable, the download links there appear to be dead, so in the end I used the copy hosted on AWS:

https://s3.amazonaws.com/fast-ai-imagelocal/camvid.tgz

Data description

The CamVid dataset contains three kinds of important information: RGB images, semantic segmentation maps, and a label legend.
The RGB images need little explanation; they are standard three-channel RGB.
The segmentation maps are single-channel; each pixel value encodes the class of that pixel, and the mapping is stored in the label legend.
The label legend maps segmentation pixel values to class names, as follows:

0   Animal
1   Archway
2   Bicyclist
3   Bridge
4   Building
5   Car
6   CartLuggagePram
7   Child
8   Column_Pole
9   Fence
10  LaneMkgsDriv
11  LaneMkgsNonDriv
12  Misc_Text
13  MotorcycleScooter
14  OtherMoving
15  ParkingBlock
16  Pedestrian
17  Road
18  RoadShoulder
19  Sidewalk
20  SignSymbol
21  Sky
22  SUVPickupTruck
23  TrafficCone
24  TrafficLight
25  Train
26  Tree
27  Truck_Bus
28  Tunnel
29  VegetationMisc
30  Void
31  Wall
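
To sanity-check the legend, it helps to load one label map and inspect which pixel values actually occur. A minimal sketch (the filename below is a placeholder; adjust to your own layout):

import cv2
import numpy as np

# read one label map as single-channel; every pixel value is a class id from the legend above
mask = cv2.imread('D:/camvid/camvid/labels/0001TP_006690.png', 0)  # placeholder filename
print(np.unique(mask))  # class ids present in this frame; 17 means Road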

Data processing

The download is an archive containing images and labels, corresponding to the RGB images and the per-pixel label maps respectively.
Two processing steps are needed first:

  1. Rename the label files, removing the _P in their names, so that label and image filenames match.
  2. Split the original dataset into train:valid:test at a 7:2:1 ratio.

rename.py

import os

cur_path = 'D:/camvid/camvid/labels'  # path to your labels folder

labels = os.listdir(cur_path)

for label in labels:
    old_label = str(label)
    # drop the _P suffix so label names match image names
    new_label = label.replace('_P.png', '.png')
    print(old_label, new_label)
    os.rename(os.path.join(cur_path, old_label), os.path.join(cur_path, new_label))
    

split_dataset.py

import os
import random
import shutil

# dataset paths
dataset_path = 'D:/camvid/camvid'
images_path = 'D:/camvid/camvid/images'
labels_path = 'D:/camvid/camvid/labels'

images_name = os.listdir(images_path)
images_num = len(images_name)
alpha = int(images_num * 0.7)  # end of the train split
beta = int(images_num * 0.9)   # end of the valid split

print(images_num)

random.shuffle(images_name)

train_list = images_name[0:alpha]
valid_list = images_name[alpha:beta]
test_list = images_name[beta:images_num]

# confirm the split is correct
print('train list: ', len(train_list))
print('valid list: ', len(valid_list))
print('test list: ', len(test_list))
print('total num: ', len(test_list) + len(valid_list) + len(train_list))

# create the train, valid and test folders
train_images_path = os.path.join(dataset_path, 'train_images')
train_labels_path = os.path.join(dataset_path, 'train_labels')
os.makedirs(train_images_path, exist_ok=True)
os.makedirs(train_labels_path, exist_ok=True)

valid_images_path = os.path.join(dataset_path, 'valid_images')
valid_labels_path = os.path.join(dataset_path, 'valid_labels')
os.makedirs(valid_images_path, exist_ok=True)
os.makedirs(valid_labels_path, exist_ok=True)

test_images_path = os.path.join(dataset_path, 'test_images')
test_labels_path = os.path.join(dataset_path, 'test_labels')
os.makedirs(test_images_path, exist_ok=True)
os.makedirs(test_labels_path, exist_ok=True)

# copy images and labels into their target folders (filenames match after renaming)
for image in train_list:
    shutil.copy(os.path.join(images_path, image), os.path.join(train_images_path, image))
    shutil.copy(os.path.join(labels_path, image), os.path.join(train_labels_path, image))

for image in valid_list:
    shutil.copy(os.path.join(images_path, image), os.path.join(valid_images_path, image))
    shutil.copy(os.path.join(labels_path, image), os.path.join(valid_labels_path, image))

for image in test_list:
    shutil.copy(os.path.join(images_path, image), os.path.join(test_images_path, image))
    shutil.copy(os.path.join(labels_path, image), os.path.join(test_labels_path, image))

Code

Code link: https://github.com/Yannnnnnnnnnnn/learnPyTorch/blob/master/road%20segmentation%20(camvid).ipynb

# import libraries
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import numpy as np
import cv2
import matplotlib.pyplot as plt


# set dataset paths
DATA_DIR = 'D:/camvid/camvid'  # adjust to your own path

x_train_dir = os.path.join(DATA_DIR, 'train_images')
y_train_dir = os.path.join(DATA_DIR, 'train_labels')

x_valid_dir = os.path.join(DATA_DIR, 'valid_images')
y_valid_dir = os.path.join(DATA_DIR, 'valid_labels')

x_test_dir = os.path.join(DATA_DIR, 'test_images')
y_test_dir = os.path.join(DATA_DIR, 'test_labels')

# import PyTorch
import torch
from torch.utils.data import DataLoader
from torch.utils.data import Dataset as BaseDataset
import torch.nn as nn
import torch.nn.functional as F
from torch import optim

# custom Dataset
class Dataset(BaseDataset):
    """CamVid Dataset. Read images and apply augmentation transformations.
    
    Args:
        images_dir (str): path to images folder
        masks_dir (str): path to segmentation masks folder
        augmentation (albumentations.Compose): data transformation pipeline 
            (e.g. flip, scale, etc.)
    
    """
    
    def __init__(
            self, 
            images_dir, 
            masks_dir, 
            augmentation=None,
    ):
        self.ids = os.listdir(images_dir)
        self.images_fps = [os.path.join(images_dir, image_id) for image_id in self.ids]
        self.masks_fps = [os.path.join(masks_dir, image_id) for image_id in self.ids]
        
        self.augmentation = augmentation

    
    def __getitem__(self, i):
                
        # read data
        image = cv2.imread(self.images_fps[i])
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        mask = cv2.imread(self.masks_fps[i], 0)
        
        # apologies for the crude code: mark the road pixels (class 17 in the legend)
        # as foreground and everything else as background
        mask = (mask==17)
        mask = mask.astype('float')
        
        # apply augmentations
        if self.augmentation:
            sample = self.augmentation(image=image, mask=mask)
            image, mask = sample['image'], sample['mask']
       
        # the mask must be reshaped to (1, 320, 320): after augmentation its shape
        # is (320, 320), but the loss expects an explicit channel dimension
        return image, mask.reshape(1,320,320)
        
    def __len__(self):
        return len(self.ids)

# Data augmentation
# I won't belabor how to use albumentations here.
# What is worth mentioning: I originally intended to use PyTorch's built-in transforms,
# but I never figured out how to augment image and mask consistently with them.
# Calling the transform twice gives image and mask different random augmentations, which is clearly wrong.
# Stacking [image; mask] together before the transform keeps the augmentations consistent,
# but then the final ToTensor step normalizes the mask, which is also unacceptable.
# (A workable torchvision alternative is sketched after the albumentations code below.)
import albumentations as albu
def get_training_augmentation():
    train_transform = [
        albu.HorizontalFlip(p=0.5),
        albu.Resize(height=320, width=320, always_apply=True),
        albu.ShiftScaleRotate(scale_limit=0.1, rotate_limit=20, shift_limit=0.1, p=1, border_mode=0),
    ]
    return albu.Compose(train_transform)

def get_test_augmentation():
    test_transform = [
        albu.Resize(height=320, width=320, always_apply=True),
    ]
    return albu.Compose(test_transform)

augmented_dataset = Dataset(
    x_train_dir, 
    y_train_dir, 
    augmentation=get_training_augmentation(), 
)
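
For reference, the paired-augmentation problem can also be solved in plain torchvision by sampling the random parameters once and applying them to image and mask through the functional API. This is only a sketch of the idea (assuming PIL inputs), not code used in this post:

import random
import numpy as np
import torch
from PIL import Image
import torchvision.transforms.functional as TF

def paired_transform(image, mask):
    # both inputs are PIL images; apply the same random horizontal flip to each
    if random.random() < 0.5:
        image = TF.hflip(image)
        mask = TF.hflip(mask)
    # resize both; nearest neighbour for the mask so class ids are not blended
    image = TF.resize(image, (320, 320))
    mask = TF.resize(mask, (320, 320), interpolation=Image.NEAREST)
    # to_tensor scales the image to [0, 1]; build the mask tensor by hand so it is not normalized
    image = TF.to_tensor(image)
    mask = torch.from_numpy(np.array(mask)).unsqueeze(0).float()
    return image, mask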


# Define the basic UNet building blocks
# code from https://github.com/milesial/Pytorch-UNet
class DoubleConv(nn.Module):
    """(convolution => [BN] => ReLU) * 2"""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.double_conv = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        return self.double_conv(x)

class Down(nn.Module):
    """Downscaling with maxpool then double conv"""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.maxpool_conv = nn.Sequential(
            nn.MaxPool2d(2),
            DoubleConv(in_channels, out_channels)
        )

    def forward(self, x):
        return self.maxpool_conv(x)

class Up(nn.Module):
    """Upscaling then double conv"""

    def __init__(self, in_channels, out_channels, bilinear=True):
        super().__init__()

        # if bilinear, use the normal convolutions to reduce the number of channels
        if bilinear:
            self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True)
        else:
            self.up = nn.ConvTranspose2d(in_channels // 2, in_channels // 2, kernel_size=2, stride=2)

        self.conv = DoubleConv(in_channels, out_channels)

    def forward(self, x1, x2):
        x1 = self.up(x1)
        # input is CHW
        diffY = torch.tensor([x2.size()[2] - x1.size()[2]])
        diffX = torch.tensor([x2.size()[3] - x1.size()[3]])

        x1 = F.pad(x1, [diffX // 2, diffX - diffX // 2,
                        diffY // 2, diffY - diffY // 2])
        # if you have padding issues, see
        # https://github.com/HaiyongJiang/U-Net-Pytorch-Unstructured-Buggy/commit/0e854509c2cea854e247a9c615f175f76fbb2e3a
        # https://github.com/xiaopeng-liao/Pytorch-UNet/commit/8ebac70e633bac59fc22bb5195e513d5832fb3bd
        x = torch.cat([x2, x1], dim=1)
        return self.conv(x)
        
class OutConv(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(OutConv, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):
        return self.conv(x)

# UNet
class UNet(nn.Module):
    def __init__(self, n_channels, n_classes, bilinear=True):
        super(UNet, self).__init__()
        self.n_channels = n_channels
        self.n_classes = n_classes
        self.bilinear = bilinear

        # given my GPU's limited memory I reduced the channel counts; a necessary compromise
        self.inc = DoubleConv(n_channels, 32)
        self.down1 = Down(32, 64)
        self.down2 = Down(64, 128)
        self.down3 = Down(128, 256)
        self.down4 = Down(256, 256)
        self.up1 = Up(512, 128, bilinear)
        self.up2 = Up(256, 64, bilinear)
        self.up3 = Up(128, 32, bilinear)
        self.up4 = Up(64, 32, bilinear)
        self.outc = OutConv(32, n_classes)
        self.out  = torch.sigmoid  # remember the sigmoid here: the network outputs probabilities
    def forward(self, x):
        x1 = self.inc(x)
        x2 = self.down1(x1)
        x3 = self.down2(x2)
        x4 = self.down3(x3)
        x5 = self.down4(x4)
        x = self.up1(x5, x4)
        x = self.up2(x, x3)
        x = self.up3(x, x2)
        x = self.up4(x, x1)
        logits = self.outc(x)
        logits = self.out(logits)
        return logits
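
A quick shape check confirms the decoder's channel bookkeeping (up1 takes 512 input channels because the 256-channel x5, after upsampling, is concatenated with the 256-channel x4). A minimal sanity test on a throwaway instance:

# push a dummy batch through the reduced UNet to verify input/output shapes
model = UNet(n_channels=3, n_classes=1)
x = torch.randn(1, 3, 320, 320)
with torch.no_grad():
    y = model(x)
print(y.shape)  # torch.Size([1, 1, 320, 320]); values lie in (0, 1) because of the sigmoid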

# Set up the train dataset
# Forgive the laziness: there is no validation set here, because I did not train for many epochs
train_dataset = Dataset(
    x_train_dir, 
    y_train_dir, 
    augmentation=get_training_augmentation(), 
)
train_loader = DataLoader(train_dataset, batch_size=8, shuffle=True)
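
Before training, it is worth pulling one batch to check shapes; note the images arrive channels-last, which is why the training loop below permutes them:

# fetch one batch to verify shapes (default collate turns the numpy arrays into tensors)
images, labels = next(iter(train_loader))
print(images.shape)  # torch.Size([8, 320, 320, 3])
print(labels.shape)  # torch.Size([8, 1, 320, 320])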

# Prepare for training and define the model; I only do binary segmentation (lazy, I know)
# Also, since I modified the UNet, the encoder cannot use pretrained weights anyway
# Besides, I really dislike reaching for a pretrained model every time; there is no sense of achievement...
net = UNet(n_channels=3, n_classes=1)

# Training
net.cuda()

# A note on how I actually trained:
# first lr=0.01 for about 40 epochs,
# then lr=0.005 for about 40 epochs,
# and finally lr=0.0001 for about 20 epochs
optimizer = optim.RMSprop(net.parameters(), lr=0.01, weight_decay=1e-8)  # the first stage of the schedule above

# this loss is specifically for binary classification; as I recall, Andrew Ng's course covers it in the first few lectures
criterion = nn.BCELoss()
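
Two side notes on this setup. First, since the model already ends in a sigmoid, BCELoss is the right pairing; dropping the sigmoid and using nn.BCEWithLogitsLoss instead would be the more numerically stable variant. Second, the staged learning-rate schedule described above can be expressed with a scheduler rather than manual restarts; a sketch, not what I actually ran:

from torch.optim.lr_scheduler import LambdaLR

# piecewise-constant factors on top of the base lr=0.01:
# epochs 0-39 -> 0.01, epochs 40-79 -> 0.005, epochs 80+ -> 0.0001
scheduler = LambdaLR(optimizer, lr_lambda=lambda e: 1.0 if e < 40 else (0.5 if e < 80 else 0.01))
# then call scheduler.step() once at the end of each epoch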

device = 'cuda'
for epoch in range(10):
    
    net.train()
    epoch_loss = 0
    
    for data in train_loader:
        
        # adjust the data format
        images, labels = data
        images = images.permute(0,3,1,2)  # swap to channels-first (NCHW) order
        images = images/255.  # normalize image values to [0, 1]
        images = images.to(device=device, dtype=torch.float32)
        labels = labels.to(device=device, dtype=torch.float32)
        

        pred = net(images)
        
        # I copied this from some code I can no longer find.
        # At first I foolishly wrote loss = criterion(pred.view(-1), labels.view(-1)),
        # and the loss refused to decrease for a long time.
        # I still do not know why.
        loss = criterion(pred, labels)
        epoch_loss += loss.item()
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        print('loss: ', loss.item())
       
# Testing
# (note: for this demo the samples are drawn from the training folders)
test_dataset_noaug = Dataset(
    x_train_dir,
    y_train_dir,
    augmentation=get_test_augmentation(),
)

image, mask = test_dataset_noaug[77]
show_image = image
with torch.no_grad():
    image = image/255.
    image = image.astype('float32')
    image = torch.from_numpy(image)
    image = image.permute(2,0,1)
    print(image.shape)
    
    pred = net(image.unsqueeze(0).cuda())
    pred = pred.cpu()

# count a pixel as road only when the predicted probability exceeds 0.5
pred = pred>0.5
# the visualization looks like this:
visualize(image=show_image, GT=mask[0,:,:], Pred=pred[0,0,:,:])

[image]
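
For completeness: visualize is a small matplotlib plotting helper defined in the accompanying notebook; a minimal version might look like this (keyword argument names become subplot titles):

def visualize(**images):
    # plot the given arrays side by side, using the keyword names as titles
    n = len(images)
    plt.figure(figsize=(16, 5))
    for i, (name, img) in enumerate(images.items()):
        plt.subplot(1, n, i + 1)
        plt.title(name)
        plt.axis('off')
        plt.imshow(img)
    plt.show()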


Results and Analysis

Let's look at the final results, discuss them, and sum up the lessons learned.

Results

Here are a few randomly chosen results; they look reasonable to me.
[images: a few sample segmentation results]

Analysis

This was the first time I trained a segmentation network, so here are a few lessons worth writing down.

  1. At first I was greedy and trained on the full-resolution 720×960 images; the network then had far too many parameters to train at all, and the results were poor. Only after downsampling did training work properly and the results improve.
  2. Before training, make absolutely sure you understand the dataset's format; otherwise you have no idea what you are actually training on.
  3. When choosing the segmentation target I initially picked car, but that class clearly covers very few pixels in these images and the results stayed poor. Switching to well-represented classes such as sky, road, and wall worked much better, which shows that sample quantity really matters (a quick way to measure this is sketched below).
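
To quantify the last point, one can measure what fraction of all pixels each class occupies across the training labels. A minimal sketch, under the same path assumptions as above:

import os
import cv2
import numpy as np

labels_dir = 'D:/camvid/camvid/train_labels'  # adjust to your own path

# accumulate a histogram of class ids over all training label maps (32 classes in the legend)
counts = np.zeros(32, dtype=np.int64)
for name in os.listdir(labels_dir):
    mask = cv2.imread(os.path.join(labels_dir, name), 0)
    counts += np.bincount(mask.ravel(), minlength=32)

share = counts / counts.sum()
print('Road: %.3f  Car: %.3f' % (share[17], share[5]))  # pixel share of Road (17) vs Car (5)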