After recently fixing some previous models that were broken (see the tl;dr section in this notebook to see where I had gone wrong previously), I decided to see if I could do the same with some other models. In this notebook I will attempt to fix my previous attempt at a two-linear-layer model that uses PyTorch Modules to construct the architecture and fastai's Learner to create and train the model from that architecture.
The places where I was going wrong previously are as follows:

- The predictions/outputs of simple_net were transposed compared to those of my earlier simple_nn model; batch_accuracy and rmse were therefore edited accordingly in order to accommodate the differing shape.
- The DataLoader that I was creating for our test/submission data was not that which was expected by our Learner's get_preds method when trying to make predictions on it. Where I previously had test_dl = DataLoader(test_dset, batch_size=len(test_dset)), the fix was to access the dataset property of test_dset rather than trying to use test_dset directly: test_dl = DataLoader(test_dset.dataset, batch_size=len(test_dset)).
- I also tried the Learner's predict method, but in the end the above way of using get_preds worked.

# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All"
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session
# install fastkaggle if not available
try: import fastkaggle
except ModuleNotFoundError:
    !pip install -Uq fastkaggle
from fastkaggle import *
comp = 'digit-recognizer'
path = setup_comp(comp, install='fastai "timm>=0.6.2.dev0"')
Warning: Your Kaggle API key is readable by other users on this system! To fix this, you can run 'chmod 600 /Users/gaz/.kaggle/kaggle.json'
from fastai.vision.all import *
path.ls()
(#3) [Path('digit-recognizer/test.csv'),Path('digit-recognizer/train.csv'),Path('digit-recognizer/sample_submission.csv')]
df = pd.read_csv(path/'train.csv')
df
label | pixel0 | pixel1 | pixel2 | pixel3 | pixel4 | pixel5 | pixel6 | pixel7 | pixel8 | ... | pixel774 | pixel775 | pixel776 | pixel777 | pixel778 | pixel779 | pixel780 | pixel781 | pixel782 | pixel783 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
41995 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
41996 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
41997 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
41998 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
41999 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
42000 rows × 785 columns
train_data_split = df.iloc[:33_600,:]
valid_data_split = df.iloc[33_600:,:]
len(train_data_split)/42000,len(valid_data_split)/42000
(0.8, 0.2)
pixel_value_columns = train_data_split.iloc[:,1:]
label_value_column = train_data_split.iloc[:,:1]
pixel_value_columns = pixel_value_columns.apply(lambda x: x/255)
train_data = pd.concat([label_value_column, pixel_value_columns], axis=1)
train_data.describe()
label | pixel0 | pixel1 | pixel2 | pixel3 | pixel4 | pixel5 | pixel6 | pixel7 | pixel8 | ... | pixel774 | pixel775 | pixel776 | pixel777 | pixel778 | pixel779 | pixel780 | pixel781 | pixel782 | pixel783 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 33600.000000 | 33600.0 | 33600.0 | 33600.0 | 33600.0 | 33600.0 | 33600.0 | 33600.0 | 33600.0 | 33600.0 | ... | 33600.000000 | 33600.000000 | 33600.000000 | 33600.000000 | 33600.000000 | 33600.000000 | 33600.0 | 33600.0 | 33600.0 | 33600.0 |
mean | 4.459881 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.000801 | 0.000454 | 0.000255 | 0.000086 | 0.000037 | 0.000007 | 0.0 | 0.0 | 0.0 | 0.0 |
std | 2.885525 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.024084 | 0.017751 | 0.013733 | 0.007516 | 0.005349 | 0.001326 | 0.0 | 0.0 | 0.0 | 0.0 |
min | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
25% | 2.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
50% | 4.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
75% | 7.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
max | 9.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.996078 | 0.996078 | 0.992157 | 0.992157 | 0.956863 | 0.243137 | 0.0 | 0.0 | 0.0 | 0.0 |
8 rows × 785 columns
pixel_value_columns_tensor = torch.tensor(train_data.iloc[:,1:].values).float()
label_value_column_tensor = torch.tensor(train_data.iloc[:,:1].values).float()
train_ds = list(zip(pixel_value_columns_tensor,label_value_column_tensor))
We’ll make this a function, so that we can do the same again for our validation data.
train_dl = DataLoader(train_ds, batch_size=256)
train_xb,train_yb = first(train_dl)
train_xb.shape,train_yb.shape
(torch.Size([256, 784]), torch.Size([256, 1]))
def dataset_from_dataframe(dframe):
    pixel_value_columns = dframe.iloc[:,1:]
    label_value_column = dframe.iloc[:,:1]
    pixel_value_columns = pixel_value_columns.apply(lambda x: x/255)
    pixel_value_columns_tensor = torch.tensor(pixel_value_columns.values).float()
    label_value_column_tensor = torch.tensor(label_value_column.values).float()
    return list(zip(pixel_value_columns_tensor, label_value_column_tensor))
valid_ds = dataset_from_dataframe(valid_data_split)
valid_dl = DataLoader(valid_ds, batch_size=256)
To ease my mind and help spot places where I could be making errors, I'll make a function that can visually show a particular input (digit image) to me.
def show_image(item):
    item = item.view(28,28) * 255
    plt.gray()
    plt.imshow(item, interpolation='nearest')
    plt.show()
Now, for my sanity, I'll test an image from train_xb.
show_image(train_xb[0])
number_of_classes = 10
def one_hot(yb):
    batch_size = len(yb)
    one_hot_yb = torch.zeros(batch_size, number_of_classes)
    x_coordinates_array = torch.arange(len(one_hot_yb))
    # used `.squeeze()` because yb originally has the size (batch_size, 1) and we just want a size of (batch_size). ([1, 2, 3, ...] instead of [[1], [2], [3], ...])
    # used `.long()` because: "tensors used as indices must be long, int, byte or bool tensors"
    y_coordinates_array = yb.squeeze().long()
    # set to `1.` rather than `1` because: Index put requires the source and destination dtypes match, got Float for the destination and Long for the source.
    one_hot_yb[x_coordinates_array, y_coordinates_array] = torch.tensor(1.)
    return one_hot_yb.T
def rmse(a, b):
    b = one_hot(b)
    mse = nn.MSELoss()
    loss = torch.sqrt(mse(a, b))
    return loss
one_hot(train_yb),one_hot(train_yb).shape,one_hot(train_yb)[:,0],train_yb[0]
(tensor([[0., 1., 0., ..., 0., 0., 0.],
[1., 0., 1., ..., 0., 0., 1.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 1., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]]),
torch.Size([10, 256]),
tensor([0., 1., 0., 0., 0., 0., 0., 0., 0., 0.]),
tensor([1.]))
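Each column of that one-hot output should contain exactly one 1, so a quick extra check (not part of the original run) is to sum down the columns:
print(one_hot(train_yb).sum(dim=0))   # expect a tensor of 256 ones, one per image in the batch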
Let’s run it on our test batch predictions from above.
def calc_grad(batch_inputs, batch_labels, batch_model):
    batch_preds = batch_model(batch_inputs)
    loss = rmse(batch_preds, batch_labels)
    loss.backward()
def train_epoch(dl, batch_model, params, lr):
    for xb,yb in dl:
        calc_grad(xb, yb, batch_model)
        for p in params:
            p.data -= p.grad*lr
            p.grad.zero_()
def get_predicted_label(pred):
    # returns the index of the highest value in the tensor, which conveniently is also directly the digit/label that it corresponds to
    return torch.argmax(pred)
get_predicted_label(torch.tensor([0,4,3,2,6,1]))
tensor(4)
def batch_accuracy(preds, yb):
    # remember each column in our preds is an individual prediction, so we transpose preds in order to iterate through each prediction in our list comprehension below
    preds = torch.tensor([get_predicted_label(pred) for pred in preds.T])
    # is_correct is a tensor of True and False values
    is_correct = preds==yb.squeeze()
    # now we turn all True values into 1 and all False values into 0, then return the mean of those values
    return is_correct.float().mean()
def validate_epoch(dl, batch_model):
    accuracies = [batch_accuracy(batch_model(xb),yb) for xb,yb in dl]
    # turn the list of tensors into one single tensor of stacked values, so that we can then calculate the mean across all those values
    stacked_tensor = torch.stack(accuracies)
    mean_tensor = stacked_tensor.mean()
    # round() works on a plain Python number, so we use item() to get the value out of the tensor (and then round to four decimal places)
    return round(mean_tensor.item(), 4)
Let's create our model architecture using PyTorch modules for our Linear and ReLU layers, and then we can take advantage of fastai's Learner and the SGD (stochastic gradient descent) optimiser for our training. Perhaps that will show even further improvements.
simple_net = nn.Sequential(
nn.Linear(784,30),
nn.ReLU(),
nn.Linear(30,10)
)
Note how we do not need to initialise the params manually here; we just pass the desired shapes of our params to nn.Linear and it initialises them internally.
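As a quick sanity check (not part of the original run), we could peek at the parameters that nn.Linear has created for us:
first_layer = simple_net[0]                               # the nn.Linear(784,30) layer
print(first_layer.weight.shape, first_layer.bias.shape)   # torch.Size([30, 784]) torch.Size([30])
print(len(list(simple_net.parameters())))                 # 4 parameter tensors: 2 weight matrices + 2 bias vectors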
After a bit of experimentation with this method I noticed that the predictions/output of simple_net (see previous notebook) were not in the same shape as those of simple_nn (built without use of the PyTorch nn modules); rather, they were in the transposed format. For that reason I have amended batch_accuracy and rmse to include the relevant pieces of data manipulation needed (see the relevant comments in the code itself).
def batch_accuracy(preds, yb):
    # preds no longer needs to be transposed like it was before
    preds = torch.tensor([get_predicted_label(pred) for pred in preds])
    # is_correct is a tensor of True and False values
    is_correct = preds==yb.squeeze()
    # now we turn all True values into 1 and all False values into 0, then return the mean of those values
    return is_correct.float().mean()
def rmse(a, b):
    # the one hot encoded labels needed transposing to match the shape of the predictions/outputs of `simple_net`
    b = one_hot(b.squeeze()).T
    mse = nn.MSELoss()
    loss = torch.sqrt(mse(a, b))
    return loss
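To sanity-check the amended batch_accuracy against the new row-per-image layout, here is a quick made-up example (not from the notebook's data), using the batch_accuracy defined just above:
# made-up activations for a batch of 3 images, one row per image (the layout simple_net produces)
fake_preds = torch.tensor([[0.1, 0.9, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],   # argmax is 1
                           [0.0, 0.0, 0.8, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1],   # argmax is 2
                           [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 0.7]])  # argmax is 9
fake_labels = torch.tensor([[1.], [2.], [8.]])   # the last label is deliberately wrong
print(batch_accuracy(fake_preds, fake_labels))   # expect tensor(0.6667)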
### End of Changes Made
dls = DataLoaders(train_dl,valid_dl)
learn = Learner(dls, simple_net, opt_func=SGD,
loss_func=rmse, metrics=batch_accuracy)
Learner contains the very handy lr_find() method, which runs through a range of possible lr values and locates a 'sweet spot' for its value. This saves us having to manually trial-and-error different values ourselves!
learn.lr_find()
SuggestedLRs(valley=0.05754399299621582)
learn.fit(10, lr=0.01)
epoch | train_loss | valid_loss | batch_accuracy | time |
---|---|---|---|---|
0 | 0.306814 | 0.295378 | 0.269286 | 00:00 |
1 | 0.285192 | 0.279350 | 0.501577 | 00:00 |
2 | 0.272066 | 0.267298 | 0.623780 | 00:00 |
3 | 0.260594 | 0.256262 | 0.698631 | 00:01 |
4 | 0.250645 | 0.246955 | 0.737024 | 00:01 |
5 | 0.242500 | 0.239447 | 0.761756 | 00:00 |
6 | 0.235948 | 0.233414 | 0.777887 | 00:00 |
7 | 0.230643 | 0.228504 | 0.789018 | 00:00 |
8 | 0.226261 | 0.224414 | 0.797976 | 00:00 |
9 | 0.222562 | 0.220934 | 0.805089 | 00:00 |
So with the same lr (learning rate) as simple_nn we have achieved ~80.5% accuracy, but this time in just 10 epochs.
We can load the test data directly into learn and then use it to make our predictions on that data.
Notice how we didn't include the final step of creating a dataloader here, like we did for our training and validation data earlier. That's because, for some reason, we now need to use the test_dl method on the dls object of our learn. I think it is something to do with how I have done this all manually (for the sake of learning) in a way that we normally wouldn't in real practice. For example, there is the fact that I earlier used DataLoader on the training and validation data instead of ImageDataLoaders or DataBlock with the appropriate input and output types declared (ImageBlock and CategoryBlock respectively).
The methods used below were acquired from this comment in the fastai forums.
test_df = pd.read_csv(path/'test.csv')
test_df.describe()
pixel0 | pixel1 | pixel2 | pixel3 | pixel4 | pixel5 | pixel6 | pixel7 | pixel8 | pixel9 | ... | pixel774 | pixel775 | pixel776 | pixel777 | pixel778 | pixel779 | pixel780 | pixel781 | pixel782 | pixel783 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 28000.0 | 28000.0 | 28000.0 | 28000.0 | 28000.0 | 28000.0 | 28000.0 | 28000.0 | 28000.0 | 28000.0 | ... | 28000.000000 | 28000.000000 | 28000.000000 | 28000.000000 | 28000.000000 | 28000.0 | 28000.0 | 28000.0 | 28000.0 | 28000.0 |
mean | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.164607 | 0.073214 | 0.028036 | 0.011250 | 0.006536 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
std | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 5.473293 | 3.616811 | 1.813602 | 1.205211 | 0.807475 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
min | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
25% | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
50% | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
75% | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
max | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 253.000000 | 254.000000 | 193.000000 | 187.000000 | 119.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
8 rows × 784 columns
tense = torch.tensor(test_df.values)/255
show_image(tense[0])
# learn.predict(tense)
pixel_value_columns = torch.tensor(test_df.values)/255
pixel_value_columns_tensor = torch.tensor(pixel_value_columns).float()
dummy_label_value_column_tensor = torch.zeros(len(pixel_value_columns_tensor)).float()
test_list = list(zip(pixel_value_columns_tensor, dummy_label_value_column_tensor))
/var/folders/8z/yl3fjfvj4872y8z3xmr4dr2c0000gn/T/ipykernel_50580/2487244240.py:3: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
pixel_value_columns_tensor = torch.tensor(pixel_value_columns).float()
test_dset = DataLoader(test_list)
In my previous attempt, I was loading in the test set as a DataLoader using the following method on our Learner.
# test_dl = learn.dls.test_dl(test_dset, num_workers=0, shuffle=False)
test_dl = DataLoader(test_dset.dataset, batch_size=len(test_dset))
test_xb,test_yb = first(test_dl)
test_xb.shape,test_yb.shape
(torch.Size([28000, 784]), torch.Size([28000]))
show_image(test_xb[0]),test_yb[0]
(None, tensor(0.))
Let's now look at the output of our model.
preds = learn.get_preds(dl=test_dl)
x,y = preds
x[0],y[0]
(tensor([ 0.3201, 0.0867, 0.8190, 0.0424, -0.0170, -0.2212, 0.0148, -0.0009,
0.1567, -0.0841]),
tensor(0.))
Looks like we are getting tuples as our output. The first item in each tuple, x, is likely to be the activations that correspond to each possible class that we are classifying our inputs into (each possible digit). The second item in each tuple, y, I now realise, is the labels that we provided in the test dataset, which were all zeros. Let's print a few tuple pairs to confirm this.
x is our prediction activations and y is the labels we supplied, which are just 0s. Let's convert our prediction activations into predicted labels.
get_predicted_label(x[0]),y[0],get_predicted_label(x[1]),y[1],get_predicted_label(x[2]),y[2],get_predicted_label(x[3]),y[3]
(tensor(2),
tensor(0.),
tensor(0),
tensor(0.),
tensor(9),
tensor(0.),
tensor(7),
tensor(0.))
Yep, seems we were right. And since the index of the activations corresponds to the actual values of our classes (0-9), we can just use get_predicted_label to get our predicted labels. So let's get our submission data ready. First, we get a list of our single-value predictions, using a list comprehension.
predicted_labels = [get_predicted_label(pred).numpy() for pred in x]
predicted_labels_series = pd.Series(predicted_labels, name="Label")
predicted_labels_series
0 2
1 0
2 9
3 7
4 3
..
27995 9
27996 7
27997 3
27998 9
27999 2
Name: Label, Length: 28000, dtype: object
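(As an aside, the dtype: object above is because each element of the list is a 0-dimensional NumPy array. Casting to a plain int while building the list would give an ordinary integer Series; this isn't needed for the submission, just a possible tidy-up, shown here with a hypothetical alternative name:)
predicted_labels_int_series = pd.Series([int(get_predicted_label(pred)) for pred in x], name="Label")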
sample_submission = pd.read_csv(path/'sample_submission.csv')
sample_submission
ImageId | Label | |
---|---|---|
0 | 1 | 0 |
1 | 2 | 0 |
2 | 3 | 0 |
3 | 4 | 0 |
4 | 5 | 0 |
... | ... | ... |
27995 | 27996 | 0 |
27996 | 27997 | 0 |
27997 | 27998 | 0 |
27998 | 27999 | 0 |
27999 | 28000 | 0 |
28000 rows × 2 columns
sample_submission['Label'] = predicted_labels_series
sample_submission
ImageId | Label | |
---|---|---|
0 | 1 | 2 |
1 | 2 | 0 |
2 | 3 | 9 |
3 | 4 | 7 |
4 | 5 | 3 |
... | ... | ... |
27995 | 27996 | 9 |
27996 | 27997 | 7 |
27997 | 27998 | 3 |
27998 | 27999 | 9 |
27999 | 28000 | 2 |
28000 rows × 2 columns
sample_submission.to_csv('subm.csv', index=False)
!head subm.csv
ImageId,Label
1,2
2,0
3,9
4,7
5,3
6,7
7,0
8,3
9,0
!kaggle competitions submit -c digit-recognizer -f ./subm.csv -m "two linear layer model using PyTorch nn.Linear and nn.ReLu UPDATED"
Warning: Your Kaggle API key is readable by other users on this system! To fix this, you can run 'chmod 600 /Users/gaz/.kaggle/kaggle.json'
100%|█████████████████████████████████████████| 208k/208k [00:00<00:00, 281kB/s]
Successfully submitted to Digit Recognizer
This received a score of 0.80389. This is not bad, but also not our best. In fact, the 2 layer linear model without the use of PyTorch's nn modules did better. You may have noticed that we didn't actually add softmax to our model architecture. Let's try that now and see if we have any improvements.
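Softmax exponentiates each of the 10 activations and divides by their sum, so the outputs are all positive and add up to 1, which makes them behave like probabilities. A tiny standalone example with made-up numbers (not from our model):
import torch
raw = torch.tensor([2.0, 1.0, 0.1])             # made-up activations for 3 classes
manual = torch.exp(raw) / torch.exp(raw).sum()  # softmax computed by hand
print(manual, manual.sum())                     # the values sum to 1
print(torch.softmax(raw, dim=0))                # same result with PyTorch's built-in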
simple_net = nn.Sequential(
nn.Linear(784,30),
nn.ReLU(),
nn.Linear(30,10),
nn.Softmax()
)
learn = Learner(dls, simple_net, opt_func=SGD,
loss_func=rmse, metrics=batch_accuracy)
learn.lr_find()
/Users/gaz/mambaforge/envs/fastbook/lib/python3.10/site-packages/torch/nn/modules/container.py:217: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
input = module(input)
SuggestedLRs(valley=0.3019951581954956)
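The UserWarning above is just PyTorch asking us to state explicitly which dimension softmax should normalise over. The runs below keep the definition as it is, but passing dim=1 (the class dimension of our batch-by-10 outputs) would silence the warning; a possible alternative definition would be:
simple_net = nn.Sequential(
    nn.Linear(784,30),
    nn.ReLU(),
    nn.Linear(30,10),
    nn.Softmax(dim=1)   # normalise across the 10 class activations for each image in the batch
)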
learn.fit(10, lr=0.1)
epoch | train_loss | valid_loss | batch_accuracy | time |
---|---|---|---|---|
0 | 0.299001 | 0.298298 | 0.141042 | 00:01 |
1 | 0.296652 | 0.295476 | 0.219702 | 00:01 |
2 | 0.292797 | 0.290697 | 0.356131 | 00:01 |
3 | 0.285784 | 0.282168 | 0.512232 | 00:01 |
4 | 0.274211 | 0.268486 | 0.625684 | 00:01 |
5 | 0.257071 | 0.248983 | 0.715357 | 00:01 |
6 | 0.232951 | 0.222247 | 0.755595 | 00:01 |
7 | 0.207361 | 0.198186 | 0.776190 | 00:01 |
8 | 0.187238 | 0.180095 | 0.819732 | 00:01 |
9 | 0.171898 | 0.166387 | 0.846518 | 00:01 |
So with the same number of epochs we have now achieved ~85% accuracy, definitely an improvement upon not using nn.Softmax.
test_df = pd.read_csv(path/'test.csv')
test_df.describe()
pixel0 | pixel1 | pixel2 | pixel3 | pixel4 | pixel5 | pixel6 | pixel7 | pixel8 | pixel9 | ... | pixel774 | pixel775 | pixel776 | pixel777 | pixel778 | pixel779 | pixel780 | pixel781 | pixel782 | pixel783 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 28000.0 | 28000.0 | 28000.0 | 28000.0 | 28000.0 | 28000.0 | 28000.0 | 28000.0 | 28000.0 | 28000.0 | ... | 28000.000000 | 28000.000000 | 28000.000000 | 28000.000000 | 28000.000000 | 28000.0 | 28000.0 | 28000.0 | 28000.0 | 28000.0 |
mean | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.164607 | 0.073214 | 0.028036 | 0.011250 | 0.006536 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
std | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 5.473293 | 3.616811 | 1.813602 | 1.205211 | 0.807475 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
min | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
25% | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
50% | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
75% | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
max | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 253.000000 | 254.000000 | 193.000000 | 187.000000 | 119.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
8 rows × 784 columns
tense = torch.tensor(test_df.values)/255
show_image(tense[0])
pixel_value_columns = torch.tensor(test_df.values)/255
pixel_value_columns_tensor = torch.tensor(pixel_value_columns).float()
dummy_label_value_column_tensor = torch.zeros(len(pixel_value_columns_tensor)).float()
test_list = list(zip(pixel_value_columns_tensor, dummy_label_value_column_tensor))
/var/folders/8z/yl3fjfvj4872y8z3xmr4dr2c0000gn/T/ipykernel_50580/2487244240.py:3: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
pixel_value_columns_tensor = torch.tensor(pixel_value_columns).float()
test_dset = DataLoader(test_list)
test_dl = DataLoader(test_dset.dataset, batch_size=len(test_dset))
test_xb,test_yb = first(test_dl)
test_xb.shape,test_yb.shape
(torch.Size([28000, 784]), torch.Size([28000]))
show_image(test_xb[0]),test_yb[0]
(None, tensor(0.))
Let's now look at the output of our model.
preds = learn.get_preds(dl=test_dl)
x,y = preds
x[0],y[0]
/Users/gaz/mambaforge/envs/fastbook/lib/python3.10/site-packages/torch/nn/modules/container.py:217: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
input = module(input)
(tensor([1.1154e-02, 8.0940e-05, 9.7530e-01, 7.2095e-04, 6.1947e-04, 1.4843e-03,
4.3284e-03, 4.7975e-04, 5.3718e-03, 4.6172e-04]),
tensor(0.))
Let’s confirm that softmax is doing what it is supposed to do. Our 10 activations, per image, should now add up to 1.
sum(x[0])
tensor(1.0000)
Looks good to go, let’s continue with our submission.
predicted_labels = [get_predicted_label(pred).numpy() for pred in x]
predicted_labels_series = pd.Series(predicted_labels, name="Label")
predicted_labels_series
0 2
1 0
2 9
3 7
4 2
..
27995 9
27996 7
27997 3
27998 9
27999 2
Name: Label, Length: 28000, dtype: object
sample_submission = pd.read_csv(path/'sample_submission.csv')
sample_submission
ImageId | Label | |
---|---|---|
0 | 1 | 0 |
1 | 2 | 0 |
2 | 3 | 0 |
3 | 4 | 0 |
4 | 5 | 0 |
... | ... | ... |
27995 | 27996 | 0 |
27996 | 27997 | 0 |
27997 | 27998 | 0 |
27998 | 27999 | 0 |
27999 | 28000 | 0 |
28000 rows × 2 columns
sample_submission['Label'] = predicted_labels_series
sample_submission
ImageId | Label | |
---|---|---|
0 | 1 | 2 |
1 | 2 | 0 |
2 | 3 | 9 |
3 | 4 | 7 |
4 | 5 | 2 |
... | ... | ... |
27995 | 27996 | 9 |
27996 | 27997 | 7 |
27997 | 27998 | 3 |
27998 | 27999 | 9 |
27999 | 28000 | 2 |
28000 rows × 2 columns
sample_submission.to_csv('subm.csv', index=False)
!head subm.csv
ImageId,Label
1,2
2,0
3,9
4,7
5,2
6,7
7,0
8,3
9,0
!kaggle competitions submit -c digit-recognizer -f ./subm.csv -m "two linear layer model using PyTorch nn.Linear and nn.ReLu with nn.Softmax UPDATED"
Warning: Your Kaggle API key is readable by other users on this system! To fix this, you can run 'chmod 600 /Users/gaz/.kaggle/kaggle.json'
100%|█████████████████████████████████████████| 208k/208k [00:00<00:00, 262kB/s]
Successfully submitted to Digit Recognizer
This received a score of 0.8485, an improvement upon the previous model by including softmax, as expected. The reason these two models didn't do as well as the model that didn't use PyTorch's nn modules is likely because we used 500 epochs to train that one, whereas we only used 10 epochs to train the ones in this notebook. I didn't want to risk overfitting, but let's try a few more epochs.
learn.fit(10, lr=0.1)
epoch | train_loss | valid_loss | batch_accuracy | time |
---|---|---|---|---|
0 | 0.159544 | 0.156210 | 0.862946 | 00:01 |
1 | 0.151786 | 0.148674 | 0.872202 | 00:01 |
2 | 0.145487 | 0.142968 | 0.879107 | 00:00 |
3 | 0.140583 | 0.138521 | 0.884732 | 00:01 |
4 | 0.136697 | 0.134954 | 0.888958 | 00:00 |
5 | 0.133540 | 0.132022 | 0.893631 | 00:01 |
6 | 0.130916 | 0.129555 | 0.896518 | 00:01 |
7 | 0.128688 | 0.127443 | 0.899315 | 00:01 |
8 | 0.126764 | 0.125600 | 0.901488 | 00:01 |
9 | 0.125074 | 0.123970 | 0.903036 | 00:01 |
/Users/gaz/mambaforge/envs/fastbook/lib/python3.10/site-packages/torch/nn/modules/container.py:217: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
input = module(input)
So now, after 20 epochs, we have roughly the same accuracy as we achieved with the 2 layer linear model that didn't use PyTorch modules or fastai.
test_df = pd.read_csv(path/'test.csv')
tense = torch.tensor(test_df.values)/255
pixel_value_columns = torch.tensor(test_df.values)/255
pixel_value_columns_tensor = torch.tensor(pixel_value_columns).float()
dummy_label_value_column_tensor = torch.zeros(len(pixel_value_columns_tensor)).float()
test_list = list(zip(pixel_value_columns_tensor, dummy_label_value_column_tensor))
/var/folders/8z/yl3fjfvj4872y8z3xmr4dr2c0000gn/T/ipykernel_50580/2487244240.py:3: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
pixel_value_columns_tensor = torch.tensor(pixel_value_columns).float()
test_dset = DataLoader(test_list)
test_dl = DataLoader(test_dset.dataset, batch_size=len(test_dset))
test_xb,test_yb = first(test_dl)
test_xb.shape,test_yb.shape
(torch.Size([28000, 784]), torch.Size([28000]))
Let's now look at the output of our model.
preds = learn.get_preds(dl=test_dl)
x,y = preds
x[0],y[0]
/Users/gaz/mambaforge/envs/fastbook/lib/python3.10/site-packages/torch/nn/modules/container.py:217: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
input = module(input)
(tensor([2.0435e-04, 6.7201e-08, 9.9963e-01, 2.3687e-05, 1.7624e-07, 1.5268e-06,
1.7885e-05, 1.3268e-06, 1.1589e-04, 2.1377e-06]),
tensor(0.))
sum(x[0])
tensor(1.)
predicted_labels = [get_predicted_label(pred).numpy() for pred in x]
predicted_labels_series = pd.Series(predicted_labels, name="Label")
sample_submission = pd.read_csv(path/'sample_submission.csv')
sample_submission
ImageId | Label | |
---|---|---|
0 | 1 | 0 |
1 | 2 | 0 |
2 | 3 | 0 |
3 | 4 | 0 |
4 | 5 | 0 |
... | ... | ... |
27995 | 27996 | 0 |
27996 | 27997 | 0 |
27997 | 27998 | 0 |
27998 | 27999 | 0 |
27999 | 28000 | 0 |
28000 rows × 2 columns
sample_submission['Label'] = predicted_labels_series
sample_submission
ImageId | Label | |
---|---|---|
0 | 1 | 2 |
1 | 2 | 0 |
2 | 3 | 9 |
3 | 4 | 9 |
4 | 5 | 2 |
... | ... | ... |
27995 | 27996 | 9 |
27996 | 27997 | 7 |
27997 | 27998 | 3 |
27998 | 27999 | 9 |
27999 | 28000 | 2 |
28000 rows × 2 columns
sample_submission.to_csv('subm.csv', index=False)
!head subm.csv
ImageId,Label
1,2
2,0
3,9
4,9
5,2
6,7
7,0
8,3
9,0
!kaggle competitions submit -c digit-recognizer -f ./subm.csv -m "two linear layer model using PyTorch nn.Linear and nn.ReLu with nn.Softmax UPDATED"
Warning: Your Kaggle API key is readable by other users on this system! To fix this, you can run 'chmod 600 /Users/gaz/.kaggle/kaggle.json'
100%|█████████████████████████████████████████| 208k/208k [00:00<00:00, 276kB/s]
Successfully submitted to Digit Recognizer
This got a score of 0.90078, which is almost the same as the 0.90142 we got after 500 epochs of training our 2 layer linear model using more manual PyTorch methods. There is an even more effective training method, called 1cycle training, that we will investigate below.
Let's try the exact same model, but now, instead of using the fit method on our Learner object, we will use fit_one_cycle.
With fit_one_cycle, instead of using a static learning rate we have one that is dynamic over the course of training: it starts low, climbs to a maximum somewhere in the middle, and then comes back down again towards the end. By training with higher learning rates in between the start and the end, we train more quickly and tend to skip over sharp, poorly-generalising minima, while the low learning rates at the end let the model settle into a good spot. This type of training is called 1cycle training.
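To make the shape of that schedule concrete, here is a small illustrative sketch (not part of the original run, and only a rough approximation of the cosine-annealed schedule fastai actually uses) that plots a learning rate warming up to a peak and then decaying back down over one cycle:
import numpy as np
import matplotlib.pyplot as plt

max_lr, total_steps, warmup_frac = 0.1, 100, 0.25   # hypothetical values, for illustration only
warmup_steps = int(total_steps * warmup_frac)
steps = np.arange(total_steps)
# cosine ramp up to max_lr over the warmup portion, then cosine anneal back down towards zero
lrs = np.where(
    steps < warmup_steps,
    max_lr * (1 - np.cos(np.pi * steps / warmup_steps)) / 2,
    max_lr * (1 + np.cos(np.pi * (steps - warmup_steps) / (total_steps - warmup_steps))) / 2,
)
plt.plot(steps, lrs)
plt.xlabel('training step')
plt.ylabel('learning rate')
plt.title('approximate 1cycle learning-rate schedule')
plt.show()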
We will have to reinitialise our Learner.
learn = Learner(dls, simple_net, opt_func=SGD,
loss_func=rmse, metrics=batch_accuracy)
learn.fit_one_cycle(1)
epoch | train_loss | valid_loss | batch_accuracy | time |
---|---|---|---|---|
0 | 0.124147 | 0.123946 | 0.903095 | 00:01 |
/Users/gaz/mambaforge/envs/fastbook/lib/python3.10/site-packages/torch/nn/modules/container.py:217: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
input = module(input)
~90% accuracy in just one epoch! The efficiency of training has improved even further this time.
Let’s use this model to make predictions on the test data and make another submission to the competition.
x,y = learn.get_preds(dl=test_dl)
predicted_labels = [get_predicted_label(pred).numpy() for pred in x]
predicted_labels_series = pd.Series(predicted_labels, name="Label")
sample_submission = pd.read_csv(path/'sample_submission.csv')
sample_submission
ImageId | Label | |
---|---|---|
0 | 1 | 0 |
1 | 2 | 0 |
2 | 3 | 0 |
3 | 4 | 0 |
4 | 5 | 0 |
... | ... | ... |
27995 | 27996 | 0 |
27996 | 27997 | 0 |
27997 | 27998 | 0 |
27998 | 27999 | 0 |
27999 | 28000 | 0 |
28000 rows × 2 columns
sample_submission['Label'] = predicted_labels_series
sample_submission
ImageId | Label | |
---|---|---|
0 | 1 | 2 |
1 | 2 | 0 |
2 | 3 | 9 |
3 | 4 | 9 |
4 | 5 | 2 |
... | ... | ... |
27995 | 27996 | 9 |
27996 | 27997 | 7 |
27997 | 27998 | 3 |
27998 | 27999 | 9 |
27999 | 28000 | 2 |
28000 rows × 2 columns
sample_submission.to_csv('subm.csv', index=False)
!head subm.csv
ImageId,Label
1,2
2,0
3,9
4,9
5,2
6,7
7,0
8,3
9,0
!kaggle competitions submit -c digit-recognizer -f ./subm.csv -m "two linear layer model using PyTorch nn.Linear and nn.ReLu with nn.Softmax with fit_one_cycle instead of fit (1cycle training)"
Warning: Your Kaggle API key is readable by other users on this system! To fix this, you can run 'chmod 600 /Users/gaz/.kaggle/kaggle.json'
100%|█████████████████████████████████████████| 208k/208k [00:00<00:00, 320kB/s]
Successfully submitted to Digit Recognizer
A score of 0.90078 once again but this time with just one epoch of training. 1cycle training is clearly a very efficient way to train models.
Previously I have learnt that looking at how our data looks at each step of our processing is very important; this notebook has reinforced that, and has also stressed the fact that this is just as important for the test/submission data as it is for our training data. I've also learnt that there are multiple methods on Learner that we can use for making predictions, and that I don't quite yet understand which is better or if there are specific reasons to use each. My suspicion right now, due to some things I read whilst trying to debug my issues in prediction making, is that .predict() was the fastai version 1 way, which has now been superseded by .get_preds() in fastai version 2. I could be completely wrong about this and more reading needs to be done. But for now I am happy that I finally managed to get a result out of the above models.
Next I may try to do the same thing I have done in this notebook but with convolutional layers instead of the linear layers (nn.Conv2d instead of nn.Linear). This will be done in a separate notebook.