A Journey Through Fastbook (AJTFB) - Chapter 6: Regression
It's the "more things you can do with computer vision" chapter of "Deep Learning for Coders with fastai & PyTorch"! Having looked at both multiclass and multi-label classification, we now turn our attention to regression tasks. In particular, we'll look at the key point regression models covered in chapter 6. Soooo let's go!
Other posts in this series:
A Journey Through Fastbook (AJTFB) - Chapter 1
A Journey Through Fastbook (AJTFB) - Chapter 2
A Journey Through Fastbook (AJTFB) - Chapter 3
A Journey Through Fastbook (AJTFB) - Chapter 4
A Journey Through Fastbook (AJTFB) - Chapter 5
A Journey Through Fastbook (AJTFB) - Chapter 6a
Regression
A regression task is all about predicting a continuous value rather than a particular category.
Here we'll consider a particular type of regression problem called image regression, where the "independent variable is an image, and the dependent variable is one or more floats." Our model is going to be a key point model that aims to predict a point (i.e., two values, the x and y coordinates) on the image, which in our example is the center of a person's face.
Defining your DataBlock
Again, the `DataBlock` is a blueprint for everything required to turn your raw data (images and labels) into something that can be fed through a neural network (`DataLoaders` with a numerical representation of both your images and labels). Below is the one presented in this chapter.
from fastai.vision.all import *
path = untar_data(URLs.BIWI_HEAD_POSE)
path.ls()
"There are 24 directories numbered from 01 to 24 (they corresond to the different people photographed), and a corresponding .obj file for each (we won't need them here)
(path/'01').ls()
Looks like each person has multiple images, and for each image there is a text file telling us where the point is. We can write a function to get the .txt file for any given image as such
img_files = get_image_files(path)
def img2pose(img_fpath):
    # Swap the trailing "rgb.jpg" for "pose.txt" to get the image's pose file
    return Path(f"{str(img_fpath)[:-7]}pose.txt")
img2pose(img_files[0])
img = PILImage.create(img_files[0])
print(img.shape)
img.to_thumb(160)
And the book provides the function used to extract the x/y coordinates (our point), which is given as ...
cal = np.genfromtxt(path/'01'/'rgb.cal', skip_footer=6)

def get_img_center(img_fpath):
    # Read the head center from the image's pose file and project it onto the
    # image plane using the camera calibration data
    ctr = np.genfromtxt(img2pose(img_fpath), skip_header=3)
    x = ctr[0] * cal[0][0]/ctr[2] + cal[0][2]
    y = ctr[1] * cal[1][1]/ctr[2] + cal[1][2]
    return tensor([x,y])

get_img_center(img_files[0])
And with the above info and methods, we can now construct our DataBlock
dblock = DataBlock(
    blocks=(ImageBlock, PointBlock),
    get_items=get_image_files,
    get_y=get_img_center,
    splitter=FuncSplitter(lambda o: o.parent.name=='13'),
    batch_tfms=[*aug_transforms(size=(240,320)), Normalize.from_stats(*imagenet_stats)]
)
Let's break down our blueprint!
- Define the data types for our inputs and targets via the `blocks` argument. Here our targets are of type `PointBlock`. "This is necessary so that fastai knows that the labels represent coordinates ... it knows that when doing data augmentation, it should do the same augmentation to these coordinates as it does to the images."
- Define how we're going to get our images via `get_items`. We can just use `get_image_files` since we will be passing the `path` into our `DataBlock.dataloaders()` method.
- Define how, from the raw data, we're going to create our labels via `get_y`. We'll simply use the `get_img_center` function we defined above since we will be getting a bunch of paths to images.
- Define how we're going to create our validation dataset via `splitter`. Here we define a custom splitter using `FuncSplitter`, which gives us complete control over how our validation set is determined. In this case it will be all the images associated with person "13".
- Define things we want to do for each item via `item_tfms`. Nothing for this example.
- Define things we want to do for each mini-batch of items via `batch_tfms`. For each mini-batch of data, we'll resize each image to 320x240 pixels and apply the default augmentations specified in `aug_transforms`. We'll also normalize our images using the ImageNet mean/standard deviation since our pretrained model was trained on ImageNet.

Note: If you want to serialize your `Learner`, do not use lambda functions for defining your DataBlock methods! They can't be pickled (see the sketch below for a picklable alternative).
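On that note, here's a minimal sketch (not from the book; the function name is my own) of swapping the lambda for a named, module-level function so that the `DataBlock`, and any `Learner` built from it, can be pickled:

def is_validation_person(o):
    # Put every image belonging to person "13" in the validation set
    return o.parent.name == '13'

dblock = DataBlock(
    blocks=(ImageBlock, PointBlock),
    get_items=get_image_files,
    get_y=get_img_center,
    splitter=FuncSplitter(is_validation_person),
    batch_tfms=[*aug_transforms(size=(240,320)), Normalize.from_stats(*imagenet_stats)]
)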
dls = dblock.dataloaders(path)
dls.show_batch(max_n=9, figsize=(8,6))
To get a feel for what our `item_tfms` and `batch_tfms` are doing, we can `show_batch` using a single image as we do below.
dls.show_batch(unique=True)
xb, yb = dls.one_batch()
xb.shape, yb.shape, yb[0]
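Notice that the target points in `yb` have been rescaled to fall between -1 and 1 (fastai does this for `PointBlock` targets). We'll constrain the model's final activations to that same range with `y_range`, which fastai implements using `sigmoid_range`; plotting it below (`plot_function` is a helper from the fastbook utilities, not fastai itself) shows how any activation gets squashed into (-1, 1).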
plot_function(partial(sigmoid_range,low=-1,high=1), min=-4, max=4)
Learner
Define your loss function
As we didn't define a loss function, fastai will pick one for us based on our task. Here it will be `MSELoss` (mean squared error loss).

"... when coordinates are used as the dependent variable, most of the time we're likely to be trying to predict something as close as possible; that's basically what `MSELoss` does"
dls.loss_func
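For intuition, MSE is just the mean of the squared differences between the predicted and true coordinates. A quick sketch with made-up numbers (not from the book):

import torch.nn.functional as F

preds = tensor([[0.10, -0.20]])  # hypothetical predicted (x, y), in the -1 to 1 range
targs = tensor([[0.15, -0.25]])  # hypothetical ground-truth point
F.mse_loss(preds, targs)         # mean of the squared errors -> tensor(0.0025)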
learn = cnn_learner(dls, resnet18, y_range=(-1,1))
learn.lr_find()
learn.fit_one_cycle(5, 2e-2)
"Generally when we run this we got a loss of around 0.0001, which correspondes to this average coordinate prediction error:"
math.sqrt(0.0001)
# 0.01
This is pretty accurate ...
learn.show_results(ds_idx=1, max_n=3, figsize=(6,8))
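And just as in the classification chapters, `Learner.predict` works for a single image; a minimal sketch (the variable names are mine, and the decoded point comes back in pixel coordinates):

# Minimal sketch: predict the face center for one image
pred_point, dec_pred, raw_pred = learn.predict(img_files[0])
pred_point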
Summary
Pick your loss and metrics according to your task ...
- For single-label classification: `nn.CrossEntropyLoss` and accuracy, precision, recall, F1, etc...
- For multi-label classification: `nn.BCEWithLogitsLoss` and accuracy, precision, recall, F1, etc...
- For regression: `nn.MSELoss` and the square root of the validation loss as the metric
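If you'd rather have that square root reported for you during training, fastai also ships an `rmse` metric you can pass to the learner; a small sketch (not part of the book's example):

# Minimal sketch: have fastai report root-mean-squared error each epoch
learn = cnn_learner(dls, resnet18, y_range=(-1,1), metrics=rmse)
learn.fit_one_cycle(5, 2e-2)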
Resources
- https://book.fast.ai - The book's website; it's updated regularly with new content and recommendations on everything from which GPUs to use, to how to run things locally and on the cloud, etc...