Here we look at machine learning in general (of which deep learning is a subset), as well as the process of fine-tuning a pretrained ML model. When you think of deep learning ... think neural networks.

An explanation

"Suppowe we arrange for some automatic means of testing the effectiveness of any current weight assignment in terms of actual performance and provide a mechanism for altering the weight assignemnt so as to maximize the performance. We need not go into the details of such a procedure to see that it could be made entirely automatic and to see that a machine so programmed would 'learn' from its experince" - Arthur Samuel 1

Architecture vs. model

... a model is a special kind of program: it's one that can do many different things, depending on the weights. 2

The functional form of the model is called its architecture.

Note: The architecture is "the template of the model that we’re trying to fit; i.e., the actual mathematical function that we’re passing the input data and parameters to" ... whereas the model is a particular set of parameters + the architecture.
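To make the distinction concrete, here is a minimal sketch (the linear form and the specific values are illustrative, not from the book): the architecture is the function template, and the model is that template plus one particular weight assignment.

```python
import torch

# Architecture: the functional form, e.g. a linear function
# y = w*x + b. It says nothing about which w and b to use.
def linear_architecture(x, w, b):
    return w * x + b

# Model: the architecture plus one particular weight assignment.
w, b = torch.tensor(2.0), torch.tensor(0.5)

def model(x):
    return linear_architecture(x, w, b)

print(model(torch.tensor(3.0)))  # tensor(6.5000)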

Parameters

Weights are just variables, and a weight assignment is a particular choice of values for those variables. [Weights] are generally referred to as model parameters ... the term weights being reserved for a particular type of model parameter. 3

The weights are called parameters.

Note: These parameters are the things that are "learnt"; the values that can be updated, whereas activations in a neural network are simply numbers as the result of some calculation.
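A minimal PyTorch sketch of the difference (`nn.Linear` is just an example layer):

```python
import torch
from torch import nn

layer = nn.Linear(3, 2)  # a layer with learnable parameters

# Parameters: the values that are learnt, i.e., updated by the optimizer.
for name, p in layer.named_parameters():
    print(name, tuple(p.shape), p.requires_grad)  # weight (2, 3) True; bias (2,) True

# Activations: plain numbers resulting from a calculation on an input;
# they are recomputed on every forward pass, not stored and updated.
x = torch.randn(1, 3)
activations = layer(x)
print(activations.shape)  # torch.Size([1, 2])
```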

Inputs vs. labels

The inputs, also known as your independent variable(s) [your X], are what your model uses to make predictions. 4

The labels, also known as your dependent variable(s) [your y], represent the correct target value for your task. 5
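In code, inputs and labels are typically just two parallel arrays; the values below are made up for illustration:

```python
import torch

# Independent variables (X): what the model sees. Three examples,
# two features each.
X = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])

# Dependent variable (y): the correct target for each example.
y = torch.tensor([0, 1, 1])

# The task: given a row of X, predict the corresponding entry of y.
```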

Loss

The [model's] measure of performance is called the loss ... [the value of which depends on how well your model is able to predict] the correct labels. 6

The loss is a measure of model performance that SGD can use to make your model better. A good loss function provides useful gradients (slopes), so that even very small changes to the weights produce a measurable change in the loss. Visually, you want gentle rolling hills rather than abrupt steps or jagged peaks.

Note: You can think of the loss as the model's own metric, that is, how it both understands how good it is and how it can improve.
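A small sketch of why smoothness matters, using PyTorch autograd (the numbers are arbitrary):

```python
import torch

w = torch.tensor(1.0, requires_grad=True)
x, target = torch.tensor(2.0), torch.tensor(5.0)

# A smooth loss (squared error): tiny changes to w change the loss
# smoothly, so the gradient tells SGD which way to nudge w.
loss = (w * x - target) ** 2
loss.backward()
print(w.grad)  # tensor(-12.): the loss falls if w increases

# A step-like measure (e.g., "is the prediction within 0.5 of the
# target?") is flat almost everywhere, so its gradient is zero and
# gives SGD nothing to work with.
```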

Transfer learning

Transfer learning is the process of taking a "pretrained model" that has been trained on a very large dataset with proven SOTA results and "fine-tuning" it for your specific task. Your task is likely similar, to one degree or another, to the task the pretrained model was trained for, but it is not necessarily the same.

How does it work?

  1. The head of your model (the newly added part specific to your dataset/task) should be trained first, since it is the only part with completely random weights (see the sketch after this list).
  2. How much the pretrained model's weights need to be updated depends on how similar your data is to the data it was trained on: the more dissimilar, the more the weights will need to change.
  3. Your model will only be as good as the data it was trained on, so make sure what you have is representative of what it will see in the real world. It "can learn to operate on only the patterns seen in the input data used to train it."
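A hedged PyTorch-style sketch of step 1 (the `body`/`head` split and the 10-class task are illustrative assumptions, not the book's exact procedure):

```python
import torch
from torch import nn
from torchvision import models

# Pretrained body plus a new head for a hypothetical 10-class task.
body = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
body.fc = nn.Identity()      # drop the original classifier layer
head = nn.Linear(512, 10)    # new head: completely random weights

# Step 1: freeze the pretrained weights so only the head trains first.
for p in body.parameters():
    p.requires_grad = False

optimizer = torch.optim.SGD(head.parameters(), lr=1e-2)

# Forward pass: pretrained features in, task-specific predictions out.
features = body(torch.randn(1, 3, 224, 224))  # shape (1, 512)
logits = head(features)                       # shape (1, 10)
```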

The process of training (or fitting) the model is the process of finding a set of parameter values (or weights) that specialize that general architecture into a model that works well for our particular kind of data [and task].

What is the high-level approach in fastai?

fastai provides a fine_tune method that uses proven tricks and hyperparameters for various DL tasks, which the authors have found work well most of the time. 7
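For example, the chapter's first training run looks roughly like this (the dataset and labelling rule follow the book's pet-classifier example; the exact arguments here are a sketch):

```python
from fastai.vision.all import *

# Oxford-IIIT Pet images; in this dataset, cat breeds have
# capitalized filenames, which gives us a labelling function.
path = untar_data(URLs.PETS)/'images'

def is_cat(x):
    return x[0].isupper()

dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=is_cat, item_tfms=Resize(224))

# fine_tune trains the randomly initialized head first, then
# unfreezes and trains the whole model with sensible defaults.
learn = vision_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)
```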


What do we have at the end of training (or finetuning)?

... once the model is trained - that is, once we've chosen our final weight assignments - then we can think of the weights as being part of the model since we're not varying them anymore. 8

This means a trained model can be treated like a typical function.
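Continuing the sketch above, fastai lets you save the trained model and then call it like a function (the image path is hypothetical):

```python
# Once the weights are fixed, save architecture + weights together...
learn.export('model.pkl')

# ...and later load the result and treat it as an ordinary function
# from an input to a prediction.
learn_inf = load_learner('model.pkl')
pred, pred_idx, probs = learn_inf.predict('images/some_cat.jpg')  # hypothetical path
```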


1. "Chaper 1: Your Deep Learning Journey". In The Fastbook p.21

2. Ibid.

3. Ibid., pp.21-22

4. Ibid., p.22

5. Ibid.

6. Ibid.

7. Ibid., pp.32-33. Includes a full discussion of how the method works.

8. Ibid., p.22