- Tip 1: Use the "presizing trick"
- Tip 2: Create a "baseline"
- Tip 3: Normalize your data
- Tip 4: Use "progressive resizing"
- Tip 5: Use Test Time Augmentation (TTA)
- Tip 6: Use MixUp
- Tip 7: Use "Label Smoothing"
Other posts in this series:
A Journey Through Fastbook (AJTFB) - Chapter 1
A Journey Through Fastbook (AJTFB) - Chapter 2
A Journey Through Fastbook (AJTFB) - Chapter 3
A Journey Through Fastbook (AJTFB) - Chapter 4
A Journey Through Fastbook (AJTFB) - Chapter 5
A Journey Through Fastbook (AJTFB) - Chapter 6a
A Journey Through Fastbook (AJTFB) - Chapter 6b
Imagenette is a subset of the ImageNet dataset that "contains a subset of 10 very different categories from the original ImageNet dataset, making for quicker training when we want to experiment."
from fastai.vision.all import *

path = untar_data(URLs.IMAGENETTE)
See chapter 5, pp.189-191. The idea here is to first crop the image to a larger-than-final size so that further augmentations can be applied without creating empty space (via `item_tfms`), with those augmentations being applied on the GPU on batches of images for speed (via `batch_tfms`).
On the training set, the initial crop area is chosen randomly, with its size set to cover the entire width or height of the image; the random crop to the final size and the other augmentations are then done on the GPU.
On the validation set, the center square of the image is always used in the first step, and only a resize is applied on the GPU to make the image width/height equal to the final size needed.
dblock = DataBlock(
    blocks=(ImageBlock(), CategoryBlock()),
    get_items=get_image_files,
    get_y=parent_label,
    item_tfms=Resize(460),
    batch_tfms=aug_transforms(size=224, min_scale=0.75)
)

dls = dblock.dataloaders(path, bs=64)
model = xresnet50()
learn = Learner(dls, model, loss_func=CrossEntropyLossFlat(), metrics=accuracy)
learn.fit_one_cycle(5, 3e-3)
Use the `Normalize` transform to apply this a batch at a time (here we pass `imagenet_stats` instead of calculating the mean/std ourselves).
# (because we aren't using normalization yet, you'll see the mean and standard deviation are not very close to
# 0 and 1 respectively)
x, y = dls.one_batch()
x.mean(dim=[0,2,3]), x.std(dim=[0,2,3])
(TensorImage([0.4518, 0.4554, 0.4344], device='cuda:0'), TensorImage([0.2868, 0.2783, 0.2998], device='cuda:0'))
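Under the hood, normalizing just subtracts the per-channel mean and divides by the per-channel standard deviation. Here's a minimal sketch of what the `Normalize` transform does with ImageNet statistics (the values below are the ones behind fastai's `imagenet_stats`; the `normalize_batch` helper is just for illustration):

import torch

# ImageNet per-channel statistics (the values behind fastai's `imagenet_stats`)
mean = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
std  = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)

def normalize_batch(x):
    # subtract the mean and divide by the std for each channel of a (bs, 3, h, w) batch
    return (x - mean) / std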
def get_dls(batch_size, image_size):
    dblock = DataBlock(
        blocks=(ImageBlock(), CategoryBlock()),
        get_items=get_image_files,
        get_y=parent_label,
        item_tfms=Resize(460),
        batch_tfms=[*aug_transforms(size=image_size, min_scale=0.75), Normalize.from_stats(*imagenet_stats)]
    )
    dls = dblock.dataloaders(path, bs=batch_size)
    return dls
dls = get_dls(64, 224)

# an example of normalization calculated on a batch of images
# (because we are using normalization now, the mean and standard deviation are very close to 0 and 1 respectively)
x, y = dls.one_batch()
print(x.mean(dim=[0,2,3]), x.std(dim=[0,2,3]))

# does this normalization improve our model? Let's see ...
model = xresnet50()
learn = Learner(dls, model, loss_func=CrossEntropyLossFlat(), metrics=accuracy)
learn.fit_one_cycle(5, 3e-3)
TensorImage([-0.0816, -0.0114, 0.0695], device='cuda:0') TensorImage([1.1806, 1.1762, 1.2825], device='cuda:0')
"... start training using small images, and end training using large images. Spending most of the epochs training with small images helps training complete faster."
"... the kinds of features that are learned by convolutional neural networks are not in any way specific to the size of the image .... So, when we change the image size in the middle of training, it doesn't mean that we have to find totally different parameters for our model."
dls = get_dls(128, 128)
learn = Learner(dls, xresnet50(), loss_func=CrossEntropyLossFlat(), metrics=accuracy)
learn.fit_one_cycle(4, 3e-3)

# simply replace `Learner.dls` with new `DataLoaders` and continue training
learn.dls = get_dls(64, 224)
learn.fine_tune(5, 1e-3)
Notice that after creating new `DataLoaders` with bigger images, we simply assign them to `learn.dls` and continue training.
"On the other hand, if the transfer learning task is going to use images that are of different sizes, shapes, or styles than those used in the pretraining task, progressive resizing will probably help"
"... select a number of areas to crop from the original rectangular image, pass each of them through our model, and take the maximum or average of the predictions. In fact, we can do this not just for different crops, but for different values across all of our test time augmentation parameters"
What is the problem TTA addresses and why use it?
"When we use random cropping, fastai will automatically use center-cropping for the validation set" which can be probelmatic, for example, in multi-label tasks where "sometimes there are small objects toward the edges of an image" that might be cropped out entirely or perhaps features on the fringe that are required for any classification task.
preds, targs = learn.tta()
accuracy(preds, targs).item()
"TTA gives us a good boost in performance, with no additional training required.
"Mixup ... is a powerful data augmentation technique that can provide dramatically higher accuracy, especially when you don't have much data and don't have a pretrained model that was trained on data similar to your dataset"
It is a dataset-independent form of data augmentation = it can be applied without the domain knowledge of your dataset that other forms of data augmentation require to be configured properly (e.g., whether to flip and to what degree, etc.).
How does Mixup work?
1. Select another random image
2. Pick a weight at random
3. Take a weighted average of the selected image with your image = Your independent variable
4. Take a weighted average of the selected image's labels with your image's labels = Your dependent variable
5. Use #3 to predict #4
img2, targ2 = dataset[randint(0, len(dataset))]
t = random_float(0.5, 1.0)
new_img = t * img1 + (1-t) * img2
new_targ = t * targ1 + (1-t) * targ2
See pp.247-249 for a detailed example of how Mixup works and is used in fastai
model = xresnet50()
learn = Learner(dls, model, loss_func=CrossEntropyLossFlat(), metrics=accuracy, cbs=MixUp)
learn.fit_one_cycle(5, 3e-3)
"In the theoretical expression of loss, in classification problems, our targets are one hot encoded .... That means the model is trained to return 0 for all categories but one, for which it is trained to return 1.... This encourages overfitting and gives you at inference time a model that is not going to give meaningful probabilities: it will always say 1 for the predicted category even if it's not too sure, just because it was trained that way.
"In general, your data will never be perfect. Even if the labels were manually produced by humans, they could make mistakes, or have differences of opinions on images that are harder to label"
What is the solution to this?
"... we could replace all our 1s with a number a bit less than 1, and our 0s with a number a bit more than 0, and then train. This is" = Label smoothing. "By encouraging your model to be a less confident, label smoothing will make your training more robust, even if there is mislabeled data. The result will be a model that generalizes better at inference."
See pp.249-251 for a detailed explanation and example of how label smoothing operates. To use it, we just have to change our loss function.
model = xresnet50()
learn = Learner(dls, model, loss_func=LabelSmoothingCrossEntropy(), metrics=accuracy)
learn.fit_one_cycle(5, 3e-3)
https://book.fast.ai - The book's website; it's updated regularly with new content and recommendations on everything from which GPUs to use to how to run things locally and on the cloud, etc.
Bag of Tricks for Image Classification with Convolutional Neural Networks discusses a variety of techniques you can use with CNNs
How to Train Your ResNet 8: Bag of Tricks discusses a variety of techniques you can use to train ResNets.
IceVision is a fastai-friendly library and a great resource for all things computer vision. You may want to follow these Twitter accounts as well: @ai_fast_track and @Fra_Pochetti (creator of IceVision).