- What is a ResNet & why use it for computer vision tasks?
- ResNet best practices
- An example using the high-level API
ResNet is arguably the best architecture for most computer vision tasks. Here we take a look at what it is and how it can be used in fastai for a variety of such tasks.
A ResNet (residual network) is a convolutional neural network architecture that has proven to work well on computer vision (CV) tasks. Several variants exist with different numbers of layers. The larger ones take longer to train and are more prone to overfitting, especially on smaller datasets.
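What separates a ResNet from a plain stack of layers is the residual (skip) connection: each block computes a transformation F(x) and adds its input x back onto the result, so the block only has to learn a correction to x. The sketch below is a toy illustration of that idea in plain Python. It assumes a tiny fully connected F in place of the convolutions and batch normalization that real ResNet blocks use, and the function names are hypothetical; it is not fastai's or torchvision's implementation.

```python
# Toy residual block on plain Python lists (no convolutions, no batch norm).
# Core ResNet idea: output = ReLU(F(x) + x), where "+ x" is the skip connection.

def relu(v):
    """Element-wise ReLU."""
    return [max(0.0, a) for a in v]

def linear(v, weights, bias):
    """Dense layer: `weights` holds one row per output unit."""
    return [sum(w * a for w, a in zip(row, v)) + b for row, b in zip(weights, bias)]

def residual_block(x, w1, b1, w2, b2):
    """F(x) = linear -> ReLU -> linear, then add the unchanged input x back."""
    f = linear(relu(linear(x, w1, b1)), w2, b2)
    # Skip connection: add the input to F(x) before the final ReLU
    return relu([fi + xi for fi, xi in zip(f, x)])

x = [1.0, 2.0, 3.0]
zero_w = [[0.0] * 3 for _ in range(3)]
zero_b = [0.0] * 3
print(residual_block(x, zero_w, zero_b, zero_w, zero_b))  # [1.0, 2.0, 3.0]
```

With all weights at zero, F(x) is zero and the block passes a non-negative x through unchanged. One common intuition for why very deep ResNets stay trainable is exactly this: an extra block can fall back to (roughly) the identity instead of degrading the signal.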
The number in a variant's name, such as resnet34, represents the number of layers in that variant: "(other options are 18, 50, 101, and 152) ... model architectures with more layers take longer to train and are more prone to overfitting ... on the other hand, when using more data, they can be quite a bit more accurate." [2]
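To make the naming concrete, the depth in each name counts the weighted layers: the first 7x7 convolution, the convolutions inside the residual blocks, and the final fully connected layer (the 1x1 projection convolutions on some shortcuts are conventionally not counted). The per-stage block counts below follow the original ResNet paper; the dictionary and function names are only for illustration.

```python
# Blocks per stage for each standard ResNet variant (He et al., 2015).
# 18 and 34 use "basic" blocks (2 conv layers each);
# 50, 101 and 152 use "bottleneck" blocks (3 conv layers each).
RESNET_CONFIGS = {
    18: ("basic", [2, 2, 2, 2]),
    34: ("basic", [3, 4, 6, 3]),
    50: ("bottleneck", [3, 4, 6, 3]),
    101: ("bottleneck", [3, 4, 23, 3]),
    152: ("bottleneck", [3, 8, 36, 3]),
}

CONVS_PER_BLOCK = {"basic": 2, "bottleneck": 3}

def count_layers(block_type, blocks_per_stage):
    """Initial 7x7 conv + every conv inside the blocks + final fully connected layer."""
    return 1 + CONVS_PER_BLOCK[block_type] * sum(blocks_per_stage) + 1

for depth, (block_type, stages) in RESNET_CONFIGS.items():
    print(f"resnet{depth}: {count_layers(block_type, stages)} layers")
```

For example, resnet18 has 2 + 2 + 2 + 2 = 8 basic blocks, giving 1 + 2 * 8 + 1 = 18 layers.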
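Beyond ordinary photos, the same approach can handle tasks such as sound, time series, and malware classification, as long as the data is first converted into images: "a good rule of thumb for converting a dataset into an image representation: if the human eye can recognize categories from the images, then a deep learning model should be able to do so too." [1]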
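As one illustration of that rule of thumb, an approach used for malware classification treats each byte of a binary file as a grayscale pixel intensity (0 to 255) and wraps the bytes into fixed-width rows. The sketch below assumes a row width of 256 pixels, an arbitrary choice for illustration, and a hypothetical helper name; the resulting 2-D array could then be saved as an image and fed to a ResNet like any photo.

```python
import math

def bytes_to_grayscale(data: bytes, width: int = 256):
    """Reshape raw bytes into rows of `width` grayscale pixels (values 0-255).

    The last row is zero-padded so that every row has the same length.
    """
    height = math.ceil(len(data) / width)
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]

rows = bytes_to_grayscale(b"\x00\xff\x10\x20\x30", width=2)
print(rows)  # [[0, 255], [16, 32], [48, 0]]
```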
It performs pretty well, apparently (at least at the time this post was written) ...
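```python
from fastai.vision.all import *

# Download and extract the Oxford-IIIT Pet dataset
path = untar_data(URLs.PETS)/'images'

# Cat breed file names start with an uppercase letter, dog breeds with lowercase
def is_cat(x): return x[0].isupper()

# Label each image from its file name, hold out 20% for validation,
# and resize every image to 224x224 pixels
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=is_cat, item_tfms=Resize(224))
```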
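The labels come from the file names: in the Oxford-IIIT Pet dataset, cat breed names are capitalized and dog breed names are lowercase, and `from_name_func` passes each file's name to `label_func`. The snippet below exercises the label function on a few file names written in the dataset's naming style, in plain Python and without downloading anything. It also shows why the check must look at the first character only: a full name such as `Abyssinian_1.jpg` always contains lowercase letters and the `.jpg` extension, so calling `.isupper()` on the whole string would label every image as a dog.

```python
def is_cat(x: str) -> bool:
    """Cat breeds are capitalized in the Oxford-IIIT Pet file names."""
    return x[0].isupper()

# Sample names in the dataset's style (breed_index.jpg)
filenames = ["Abyssinian_1.jpg", "Bengal_12.jpg", "beagle_3.jpg", "pug_7.jpg"]
print({f: is_cat(f) for f in filenames})

# Checking the whole name instead of the first character is always False here
print([f.isupper() for f in filenames])
```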
Why do we make images 224x224 pixels?
"This is the standard size for historical reasons (old pretrained models require this size exactly) ... If you increase the size, you'll often get a model with better results since it will be able to focus on more details." [3]
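One way to see why larger images let the model "focus on more details": a standard ResNet shrinks the image by a factor of 32 before its final pooling (a stride-2 7x7 convolution, a stride-2 max pool, and three stride-2 stages), so a 224x224 input ends as a 7x7 grid of features. The sketch below applies the usual convolution output-size formula to those downsampling layers only; the stride-1 layers do not change the grid size, and the function names are hypothetical.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Standard output-size formula for a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

def resnet_final_grid(size: int) -> int:
    """Spatial size of the last feature map of a standard ResNet for a square input."""
    size = conv_out(size, kernel=7, stride=2, padding=3)  # stem 7x7 convolution
    size = conv_out(size, kernel=3, stride=2, padding=1)  # 3x3 max pool
    for _ in range(3):                                    # stages 2-4 each halve the grid
        size = conv_out(size, kernel=3, stride=2, padding=1)
    return size

for s in (224, 448):
    g = resnet_final_grid(s)
    print(f"{s}x{s} input -> {g}x{g} final feature map")
```

Doubling the input to 448x448 gives a 14x14 grid, i.e. four times as many spatial positions for the model to work with. Torchvision's ResNets end in adaptive average pooling, so they accept sizes other than 224 at the cost of more computation.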
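```python
# Create a learner with a pretrained ResNet-18; error_rate is reported on the validation set.
# (In fastai 2.7 and later this function is named vision_learner; cnn_learner is a deprecated alias.)
learn = cnn_learner(dls, resnet18, metrics=error_rate)
```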
As you can see above, the architecture used here is resnet18, a ResNet with 18 layers.
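The `metrics=error_rate` argument reports the share of validation images the model gets wrong. fastai computes it on prediction tensors as 1 minus accuracy; the plain-Python function below, hypothetically named `simple_error_rate`, is only a sketch of the same arithmetic on lists of predicted and true labels.

```python
def simple_error_rate(preds, targets):
    """Fraction of predictions that do not match their targets (1 - accuracy)."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    wrong = sum(p != t for p, t in zip(preds, targets))
    return wrong / len(targets)

# Predicted vs. true "is cat" labels for four validation images
print(simple_error_rate([True, False, True, True], [True, True, True, False]))  # 0.5
```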