ResNet is arguably the best architecture for most computer vision tasks; here we take a look at it and at how it can be used in fastai for a variety of those tasks.
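To ground that claim, here is a rough sketch of using a pretrained ResNet in fastai for image classification. It follows the standard fastai v2 quick-start pattern; the Pets dataset and the is_cat label function are illustrative assumptions rather than part of these notes, and in older fastai releases vision_learner was called cnn_learner.

from fastai.vision.all import *

def is_cat(filename):
    # in the Oxford-IIIT Pets naming convention, cat images start with an uppercase letter
    return filename[0].isupper()

path = untar_data(URLs.PETS)/'images'
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=is_cat, item_tfms=Resize(224))

learn = vision_learner(dls, resnet34, metrics=error_rate)  # pretrained ResNet-34 backbone
learn.fine_tune(1)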


Why is visualization important?

Because it lets you know both what your NN is doing/learning and whether it is learning anything at all. The former is helpful because it gives you confidence that your model is looking at the right information and offers insights into how to improve it; the latter because a model that isn't learning anything (i.e., isn't able to update its parameters so as to improve itself) isn't a helpful or useful model.

Tip: Learn how to visualize and understand your activations and gradients
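As a starting point, here is a minimal PyTorch sketch (not taken from either book referenced below) that captures a layer's activations and gradients with hooks so they can be inspected or plotted; the model and layer choice are just illustrative.

import torch
import torchvision

model = torchvision.models.resnet18()  # untrained here; works the same for a trained model
activations, gradients = {}, {}

def save_activation(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

def save_gradient(name):
    def hook(module, grad_input, grad_output):
        gradients[name] = grad_output[0].detach()
    return hook

# register hooks on the first convolutional layer
model.conv1.register_forward_hook(save_activation('conv1'))
model.conv1.register_full_backward_hook(save_gradient('conv1'))  # PyTorch >= 1.8

x = torch.randn(1, 3, 224, 224)    # a dummy batch of one RGB image
model(x).sum().backward()          # forward + backward pass to populate the hooks

print(activations['conv1'].shape)  # torch.Size([1, 64, 112, 112])
print(gradients['conv1'].shape)    # torch.Size([1, 64, 112, 112])

These tensors can then be plotted (e.g., with plt.imshow on individual channels) to see which inputs excite a layer and whether gradients are actually flowing through it.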

Visualizing computer vision models

The top of this image is a visualization of the weights (what the model is learning), and the bottom is a visualization of the activations; in particular, the parts of the training images that most strongly match each set of weights above. 1

Tip: This kind of visualization is particularly helpful in transfer learning, as it allows us to infer which layers may require more or less training for our task. For example, the layer above probably requires little to no training, as it appears to be identifying edges and gradients, things likely helpful and necessary for all computer vision tasks.
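For instance, here is a rough sketch (not the fastbook's code) of plotting the first-layer filters of a pretrained ResNet; early filters typically look like the edge and gradient detectors described above. It assumes torchvision >= 0.13 for the weights argument (older versions use pretrained=True).

import torchvision
import matplotlib.pyplot as plt

model = torchvision.models.resnet34(weights='IMAGENET1K_V1')
filters = model.conv1.weight.detach().cpu()                             # shape: (64, 3, 7, 7)
filters = (filters - filters.min()) / (filters.max() - filters.min())  # rescale to [0, 1] for display

fig, axes = plt.subplots(8, 8, figsize=(8, 8))
for ax, f in zip(axes.flat, filters):
    ax.imshow(f.permute(1, 2, 0).numpy())  # channels last for imshow
    ax.axis('off')
plt.show()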

Examples

Vectors into 2D grayscale images (MNIST)

Courtesy of Abhishek Thakur's "Approaching (almost) any Machine Learning Problem" 2

from sklearn import datasets
import matplotlib.pyplot as plt

# fetch MNIST as 70,000 flattened 784-dimensional vectors
inputs, targets = datasets.fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)
targets = targets.astype(int)

inputs.shape, targets.shape  # always helpful to see the shape of things
# ((70000, 784), (70000,))

# reshape each 784-vector into a 28x28 grayscale image (NumPy reshape;
# see https://stackoverflow.com/questions/49643225/whats-the-difference-between-reshape-and-view-in-pytorch for the PyTorch equivalent)
images = inputs.reshape((-1, 28, 28))

print(images.shape)
# (70000, 28, 28)
plt.imshow(images[0], cmap='gray')
# <matplotlib.image.AxesImage at 0x7f91fc62b210>

Vectors as clusters (MNIST)

Courtesy of Abhishek Thakur's "Approaching (almost) any Machine Learning Problem" 3

from sklearn import manifold
import numpy as np
import pandas as pd
import seaborn as sns

# project the first 1,000 digit vectors down to 2 dimensions with t-SNE
tsne = manifold.TSNE(n_components=2, random_state=42)
transformed_data = tsne.fit_transform(inputs[:1000])  # reduces the dimensionality of each vector to 2
cluster_data = np.column_stack((transformed_data, targets[:1000]))
cluster_data.shape  # transformed_data's 2 dims (call them x and y) + targets' 1 dim = 3
# (1000, 3)

tsne_df = pd.DataFrame(cluster_data, columns=['x', 'y', 'targets'])
print(len(tsne_df))
tsne_df.head(2)
# 1000
#            x          y  targets
# 0  22.735518  14.271368      5.0
# 1  45.913292   0.439934      0.0

viz = sns.FacetGrid(tsne_df, hue='targets', height=8)
viz.map(plt.scatter, 'x', 'y').add_legend()
# <seaborn.axisgrid.FacetGrid at 0x7f91fa9adc90>
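The same plot can be produced with seaborn's scatterplot API, which is a little more compact than FacetGrid (a stylistic alternative, not from the book):

plt.figure(figsize=(8, 8))
sns.scatterplot(data=tsne_df, x='x', y='y', hue='targets', palette='tab10', s=15)
plt.show()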

1. "Chaper 1: Your Deep Learning Journey". In The Fastbook pp.33-36 provides several visualizations of what the parameters and activations look like at different layers in a CNN.

2. "Supervised vs unsupervised learning". In Approaching (almost) any Machine Learning Problem p.11

3. Ibid., pp. 12-13.