When it comes to both your inputs and targets, knowing whether they are categorical or continuous guides how you represent them, the loss function you use, and the metrics you choose to measure performance.


What is a categorical datatype?

Categorical data "contains values that are one of a discrete set of choices", such as gender, occupation, day of week, etc. 1

What if our target is categorical?

If your target/labels are categorical, then you have either a multi-class classification problem (e.g., you are trying to predict a single class) or a multi-label classification problem (e.g., you are trying to predict whether your example belongs to zero, one, or multiple classes).

Multi-class classification tasks

For multi-class classification tasks, a sensible loss function is cross entropy loss (nn.CrossEntropyLoss), and useful metrics are likely to include error rate, accuracy, F1, recall, and/or precision depending on your business objectives and the makeup of your dataset. For example, if you're dealing with a highly imbalanced dataset, choosing accuracy would give an inflated sense of model performance, since the model may just be learning to predict the most common class.
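Here's a minimal sketch in plain PyTorch (the logits and labels below are made up purely for illustration) of how nn.CrossEntropyLoss expects its inputs for a multi-class problem: raw logits of shape (batch, n_classes) and one integer class index per example.

import torch
import torch.nn as nn

# hypothetical batch: 4 examples, 3 possible classes
logits = torch.randn(4, 3)                      # raw model outputs (no softmax needed)
targets = torch.tensor([0, 2, 1, 2])            # one integer class index per example

loss = nn.CrossEntropyLoss()(logits, targets)   # applies log-softmax + NLL internally

# a simple accuracy metric: compare the argmax of the logits to the targets
preds = logits.argmax(dim=1)
accuracy = (preds == targets).float().mean()
print(loss, accuracy)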

Note: What if you need to predict "None"? This comes up often in the real world and is covered nicely in Zach Mueller’s Recognizing Unknown Images (or the Unknown Label problem).

Multi-label tasks

For multi-label tasks, a sensible loss function is binary cross entropy loss (BCE) (nn.BCEWithLogitsLoss), and useful metrics are likely to include F1, recall, and/or precision depending on your business objectives and the makeup of your dataset. Notice that I didn't include error rate or its complement, accuracy, because multi-label datasets are generally highly imbalanced.
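And a minimal sketch of the multi-label case (again with made-up values): the targets are multi-hot vectors with the same shape as the logits, and nn.BCEWithLogitsLoss scores each (example, label) pair independently.

import torch
import torch.nn as nn

# hypothetical batch: 4 examples, 3 possible labels (each example can carry 0 to 3 of them)
logits = torch.randn(4, 3)                      # one logit per (example, label) pair
targets = torch.tensor([[1., 0., 1.],           # multi-hot targets, same shape as the logits
                        [0., 0., 0.],
                        [0., 1., 0.],
                        [1., 1., 1.]])

loss = nn.BCEWithLogitsLoss()(logits, targets)  # sigmoid + binary cross entropy per label

# predictions come from thresholding each sigmoided logit independently
preds = (torch.sigmoid(logits) > 0.5).float()
print(loss, preds)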

What if our input is categorical?

Categorical inputs are generally represented by an embedding (i.e., a vector of numbers). Why? Mostly because it gives your model the ability to learn a more complex representation of your category than a single number would.

For example, imagine that one of your inputs is day of week (e.g., Sunday, Monday, etc.) ... what does that mean? When combined with other inputs, its meaning is likely to be much more nuanced than a single number can represent, so we'd like to use multiple learned numbers instead. This is what an embedding is.
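Here's a minimal sketch of that idea using nn.Embedding (the embedding size of 4 is an arbitrary choice for illustration):

import torch
import torch.nn as nn

# 7 categories (days of week), each represented by 4 learned numbers
day_emb = nn.Embedding(num_embeddings=7, embedding_dim=4)

# a batch of category indices, e.g., 0=Sunday, 1=Monday, ..., 6=Saturday
days = torch.tensor([0, 1, 6])

vectors = day_emb(days)   # shape (3, 4): one learned vector per input day
print(vectors.shape)

Those 4 numbers per day start out random and are learned during training like any other model parameter.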


What is a continuous datatype?

Continuous data is numerical data that represents a quantity, such as age, salary, prices, etc.

What if our target is continuous?

If your target/labels are continuous, then you have a regression problem. The most likely loss function you would choose is mean squared error loss (MSE) (nn.MSELoss), and your metric can be MSE as well.

"... MSE is already a a useful metric for this task (although its' probably more interpretable after we take the square root)" ... the RMSE (% fn 3 %}

Note: For tasks that predict a continuous number, consider using y_range to constrain the network to predicting a value in the known range of valid values.2
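The idea behind y_range is to pass the final activation through a scaled sigmoid (fastai implements this as sigmoid_range). A minimal sketch of that idea in plain PyTorch, assuming we want predictions constrained to, say, 0 to 5:

import torch

def sigmoid_range(x, lo, hi):
    # squash raw activations into the range (lo, hi) via a scaled sigmoid
    return torch.sigmoid(x) * (hi - lo) + lo

raw = torch.tensor([-3.0, 0.0, 3.0])   # unconstrained model outputs
print(sigmoid_range(raw, 0, 5))        # all values now fall between 0 and 5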

What if our input is continuous?

In many cases there isn't anything special you need to do; in others, it makes sense to scale these numbers so they are in the same range (usually 0 to 1) as the rest of your continuous inputs. This process is called normalization.4 The reason you would want to do this is so continuous values with a bigger range (say 1000) don't drown out those with a smaller range (say 5) during model training.

Normalization

Note: "When training a model, if helps if your input data is normalizaed - that is, has a mean of 0 and a standard deviation of 1.

See How To Calculate the Mean and Standard Deviation — Normalizing Datasets in Pytorch

import torch

print('Example 1')
nums = torch.tensor([0, 50, 100], dtype=float)
print(f'Some raw values: {nums}')

# 1. calculate their mean and standard deviation
m = nums.mean()
std = nums.std()
print(f'Their mean is {m} and their standard deviation is {std}')

# 2. normalize their values 
normalized = (nums - m) / std
print(f'Here are their values after normalization: {normalized}')
print('')

print('Example 2')
nums = torch.tensor([0, 5000, 10000], dtype=float)
print(f'Some raw values: {nums}')

# 1. calculate their mean and standard deviation
m = nums.mean()
std = nums.std()
print(f'Their mean is {m} and their standard deviation is {std}')

# 2. normalize their values 
normalized = (nums - m) / std
print(f'Here are their values after normalization: {normalized}')
print('')
Example 1
Some raw values: tensor([  0.,  50., 100.], dtype=torch.float64)
Their mean is 50.0 and their standard deviation is 50.0
Here are their values after normalization: tensor([-1.,  0.,  1.], dtype=torch.float64)

Example 2
Some raw values: tensor([    0.,  5000., 10000.], dtype=torch.float64)
Their mean is 5000.0 and their standard deviation is 5000.0
Here are their values after normalization: tensor([-1.,  0.,  1.], dtype=torch.float64)

fastai supplies a Normalize transform you can use to do this ... "it acts on a whole mini-batch at once, so you can add it to the batch_tfms section of your data block ... you need to pass to this transform the mean and standard deviation that you want to use"; if you don't, fastai will automatically calculate them from a single batch of your data (p.241).
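As a rough sketch (assuming an image DataBlock and the standard ImageNet statistics; path_to_images is a placeholder), adding it might look like this:

from fastai.vision.all import *

# hypothetical image DataBlock; Normalize runs on whole mini-batches via batch_tfms
dblock = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    get_y=parent_label,
    item_tfms=Resize(224),
    batch_tfms=[*aug_transforms(), Normalize.from_stats(*imagenet_stats)],
)
# dls = dblock.dataloaders(path_to_images)
# pass Normalize() with no stats instead, and fastai calculates them from one batch of your data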

Note: "This means that when you distribute a model, you need to also distribute the statistics used for normalization." (p.242)
Important: "... if you’re using a model that someone else has trained, make sure you find out what normalization statistics they used and match them" (p.242)

1. "Chaper 1: Your Deep Learning Journey". In The Fastbook p.46

3. Ibid. p.236. A good examle of how RMSE provides a reasonable metric for regression tasks is included on this page in reference to KeyPoint detection (e.g., detecting a point/coordinate, an x and y)

2. Ibid., p.47

4. Ibid., pp.241-42, 320 includes an extended discussion of the why, how, and where "normalization" is needed.