I had an image that I was using as my profile picture on certain social media. This image was actually a small crop from an original, full image. Since using this crop I had never thought to make a note of where the full image is. I did have a few old folders of images that mainly needed deleting (WhatsApp media), but I knew the full image had been sent to me and might well be somewhere in one of these folders.
That’s when it occurred to me that, rather than go through thousands of images one by one, I could surely just develop an image recognition model that would be able to detect any image that includes the crop I was using as a profile picture.
I had just finished going through chapter four of ‘Deep Learning for Coders with fastai and PyTorch’ by Jeremy Howard and Sylvain Gugger. In this chapter it is stated that a single nonlinearity, with two linear layers, is enough to approximate any function. With this in mind I thought: why not give such a model the problem I am trying to solve, finding my full image?
As you can see in my last blog post, I created an image classifier model with very minimal effort using fastai and PyTorch. When doing this I saw that there are many pre-trained models readily available for me to use, and it wouldn’t take much effort in terms of writing up the code at all. The same pre-trained model could be chosen, and then further trained on my specific data in order to fulfil my specific purpose.
Perhaps the hardest part of both methods is preparing the data for the model to be trained with. I decided to get a set of my own photos. The original copies would all be negatives. I would then make a copy of each one, and every copy would have the crop image that I am trying to detect added to it. These edited copies would be the positives.
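Something along these lines can be used to generate the positives with PIL (a rough sketch only; the crop file name and the paste position are placeholders, not the exact script I ended up with):

# Rough sketch: paste the profile-picture crop onto a copy of each original
# image to create the 'positive' set. 'crop.png' and the (0, 0) paste position
# are placeholders.
from pathlib import Path
from PIL import Image

crop = Image.open('crop.png')
for img_path in Path('training/negative').iterdir():
    img = Image.open(img_path).copy()
    img.paste(crop, (0, 0))
    img.save(Path('training/positive')/img_path.name)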
Both methods produced a model that was not able to fulfil the purpose I wanted to use it for. I concluded that I would need to study further in the fastai course and read further in the book.
I began by installing and importing the necessary packages.
#hide
# !mamba install -c fastchan fastai
# !mamba install -c fastchan nbdev
from fastai.vision.all import *
import pathlib
Then I proceeded to import my image data and get it into a format that can easily be used by my models. The format used is actually one that fastai uses (see DataLoaders further below).
Note: in the code block below, get_image_files(path/'training/negative') does the same thing as (path/'training/negative').ls().
path = pathlib.Path().resolve()
training_negatives = get_image_files(path/'training/negative')
training_positives = get_image_files(path/'training/positive')
All the images are of different sizes, which is not good for a model, so we must resize them to make them all uniform. I opted to resize them all to 128 pixels by 128 pixels.
im = Image.open(training_positives[2]).resize((128,128))
The shape of im_tens below shows that each image is three lots of a 128 by 128 matrix of pixel values: one matrix for each colour channel, R, G, and B.
im_tens = tensor(im)
im_tens.shape
torch.Size([128, 128, 3])
Let’s display, for each of the three colours, the pixel values for a subsection of the image, shaded by value.
df_red = pd.DataFrame(im_tens[87:112,99:119,0])
df_red.style.set_properties(**{'font-size':'6pt'}).background_gradient('Reds')
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 255 | 236 | 230 | 251 | 254 | 252 | 244 | 229 | 181 | 108 | 62 | 45 | 77 | 128 | 123 | 78 | 49 | 47 | 52 | 49 |
1 | 226 | 234 | 254 | 255 | 251 | 238 | 214 | 143 | 79 | 53 | 61 | 112 | 125 | 76 | 42 | 44 | 53 | 49 | 38 | 32 |
2 | 240 | 255 | 250 | 245 | 229 | 201 | 122 | 69 | 45 | 79 | 127 | 87 | 36 | 41 | 55 | 49 | 38 | 33 | 33 | 28 |
3 | 213 | 230 | 243 | 234 | 193 | 104 | 66 | 46 | 92 | 112 | 59 | 84 | 137 | 95 | 38 | 33 | 33 | 30 | 24 | 17 |
4 | 136 | 143 | 150 | 139 | 106 | 62 | 43 | 91 | 97 | 42 | 78 | 189 | 199 | 152 | 51 | 38 | 44 | 41 | 35 | 29 |
5 | 145 | 152 | 107 | 59 | 60 | 38 | 85 | 84 | 45 | 51 | 128 | 159 | 144 | 106 | 47 | 48 | 60 | 66 | 68 | 61 |
6 | 148 | 122 | 59 | 59 | 39 | 74 | 85 | 45 | 50 | 65 | 135 | 129 | 115 | 72 | 51 | 58 | 68 | 79 | 82 | 84 |
7 | 139 | 73 | 64 | 41 | 62 | 83 | 44 | 50 | 60 | 105 | 162 | 134 | 115 | 64 | 63 | 72 | 83 | 96 | 98 | 94 |
8 | 92 | 59 | 39 | 48 | 95 | 68 | 83 | 131 | 169 | 179 | 159 | 144 | 103 | 52 | 66 | 73 | 79 | 88 | 97 | 103 |
9 | 60 | 43 | 34 | 100 | 140 | 172 | 179 | 180 | 154 | 163 | 143 | 128 | 77 | 48 | 59 | 63 | 56 | 46 | 45 | 54 |
10 | 46 | 28 | 86 | 96 | 156 | 152 | 139 | 135 | 146 | 138 | 120 | 107 | 116 | 179 | 166 | 96 | 40 | 31 | 35 | 44 |
11 | 25 | 68 | 95 | 81 | 156 | 142 | 133 | 122 | 137 | 119 | 104 | 95 | 128 | 159 | 155 | 140 | 83 | 69 | 78 | 85 |
12 | 43 | 115 | 52 | 107 | 148 | 136 | 123 | 108 | 141 | 120 | 105 | 82 | 110 | 133 | 124 | 126 | 162 | 126 | 112 | 123 |
13 | 104 | 75 | 67 | 145 | 140 | 136 | 117 | 105 | 134 | 115 | 100 | 92 | 115 | 123 | 119 | 108 | 140 | 135 | 90 | 95 |
14 | 111 | 42 | 70 | 145 | 144 | 134 | 124 | 127 | 132 | 109 | 96 | 113 | 111 | 116 | 112 | 105 | 132 | 125 | 68 | 68 |
15 | 59 | 43 | 64 | 135 | 140 | 132 | 131 | 137 | 132 | 109 | 97 | 110 | 118 | 120 | 112 | 113 | 130 | 116 | 58 | 52 |
16 | 36 | 37 | 58 | 118 | 121 | 115 | 117 | 132 | 131 | 113 | 103 | 111 | 128 | 121 | 114 | 126 | 125 | 112 | 74 | 52 |
17 | 40 | 35 | 47 | 96 | 87 | 86 | 98 | 121 | 117 | 105 | 101 | 105 | 125 | 116 | 106 | 115 | 117 | 101 | 62 | 42 |
18 | 37 | 34 | 37 | 85 | 82 | 79 | 82 | 88 | 78 | 75 | 73 | 75 | 83 | 79 | 72 | 76 | 82 | 81 | 43 | 31 |
19 | 30 | 27 | 33 | 70 | 88 | 79 | 71 | 72 | 64 | 61 | 63 | 57 | 59 | 59 | 56 | 57 | 61 | 74 | 47 | 29 |
20 | 26 | 19 | 40 | 73 | 93 | 81 | 76 | 72 | 63 | 61 | 66 | 59 | 59 | 61 | 64 | 67 | 68 | 62 | 43 | 38 |
21 | 25 | 35 | 61 | 41 | 101 | 94 | 82 | 74 | 66 | 68 | 66 | 65 | 64 | 65 | 70 | 68 | 64 | 39 | 34 | 41 |
22 | 33 | 66 | 33 | 52 | 111 | 98 | 86 | 78 | 71 | 69 | 66 | 67 | 67 | 68 | 68 | 63 | 40 | 28 | 37 | 34 |
23 | 108 | 108 | 28 | 51 | 99 | 103 | 88 | 76 | 70 | 67 | 68 | 72 | 70 | 64 | 59 | 54 | 25 | 17 | 33 | 36 |
24 | 103 | 97 | 60 | 61 | 84 | 89 | 85 | 82 | 79 | 80 | 83 | 88 | 99 | 101 | 94 | 98 | 87 | 75 | 85 | 90 |
df_green = pd.DataFrame(im_tens[87:112,99:119,1])
df_green.style.set_properties(**{'font-size':'6pt'}).background_gradient('Greens')
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 255 | 230 | 221 | 241 | 242 | 240 | 232 | 219 | 171 | 97 | 55 | 41 | 72 | 117 | 114 | 70 | 40 | 39 | 44 | 43 |
1 | 220 | 229 | 249 | 244 | 239 | 227 | 204 | 133 | 70 | 41 | 51 | 106 | 115 | 67 | 36 | 41 | 46 | 40 | 31 | 26 |
2 | 229 | 245 | 238 | 233 | 217 | 193 | 112 | 59 | 39 | 70 | 116 | 75 | 31 | 35 | 45 | 40 | 30 | 25 | 25 | 23 |
3 | 204 | 219 | 232 | 222 | 181 | 97 | 58 | 37 | 82 | 103 | 48 | 58 | 99 | 70 | 31 | 27 | 25 | 20 | 12 | 10 |
4 | 132 | 134 | 141 | 130 | 101 | 59 | 34 | 83 | 86 | 36 | 51 | 133 | 137 | 101 | 34 | 22 | 23 | 21 | 19 | 15 |
5 | 143 | 148 | 101 | 48 | 54 | 33 | 75 | 76 | 34 | 42 | 85 | 108 | 92 | 63 | 26 | 25 | 32 | 36 | 38 | 35 |
6 | 145 | 118 | 53 | 53 | 31 | 63 | 73 | 34 | 42 | 49 | 86 | 81 | 72 | 41 | 23 | 28 | 35 | 38 | 43 | 44 |
7 | 136 | 65 | 56 | 34 | 51 | 74 | 35 | 39 | 47 | 69 | 111 | 88 | 72 | 35 | 33 | 40 | 46 | 51 | 54 | 52 |
8 | 83 | 54 | 33 | 40 | 82 | 54 | 59 | 91 | 113 | 127 | 115 | 100 | 67 | 30 | 34 | 40 | 46 | 51 | 60 | 65 |
9 | 52 | 37 | 27 | 85 | 98 | 125 | 126 | 115 | 90 | 114 | 96 | 83 | 46 | 25 | 28 | 33 | 29 | 22 | 23 | 31 |
10 | 40 | 21 | 79 | 72 | 105 | 101 | 82 | 72 | 94 | 88 | 73 | 63 | 75 | 141 | 126 | 63 | 16 | 15 | 20 | 17 |
11 | 17 | 61 | 87 | 53 | 107 | 93 | 78 | 65 | 88 | 71 | 56 | 52 | 78 | 105 | 102 | 95 | 44 | 29 | 37 | 46 |
12 | 34 | 105 | 40 | 71 | 95 | 85 | 71 | 59 | 87 | 74 | 60 | 46 | 65 | 76 | 69 | 79 | 120 | 78 | 60 | 72 |
13 | 96 | 69 | 49 | 100 | 90 | 84 | 66 | 58 | 85 | 65 | 56 | 55 | 66 | 65 | 67 | 62 | 92 | 87 | 45 | 51 |
14 | 102 | 36 | 50 | 93 | 90 | 84 | 76 | 81 | 84 | 63 | 54 | 69 | 59 | 62 | 63 | 60 | 83 | 77 | 38 | 33 |
15 | 50 | 36 | 44 | 82 | 88 | 83 | 81 | 88 | 83 | 62 | 53 | 66 | 66 | 68 | 64 | 66 | 81 | 71 | 32 | 24 |
16 | 33 | 35 | 41 | 71 | 71 | 66 | 68 | 83 | 82 | 64 | 57 | 66 | 78 | 70 | 66 | 79 | 74 | 67 | 46 | 29 |
17 | 34 | 33 | 34 | 62 | 46 | 45 | 52 | 72 | 67 | 56 | 55 | 61 | 73 | 63 | 60 | 68 | 69 | 60 | 33 | 20 |
18 | 32 | 28 | 23 | 56 | 48 | 43 | 43 | 48 | 36 | 36 | 37 | 38 | 40 | 38 | 37 | 39 | 43 | 48 | 22 | 16 |
19 | 29 | 19 | 13 | 45 | 55 | 42 | 36 | 36 | 28 | 27 | 28 | 28 | 28 | 28 | 25 | 28 | 32 | 41 | 28 | 13 |
20 | 22 | 10 | 20 | 47 | 59 | 46 | 39 | 35 | 29 | 29 | 29 | 29 | 29 | 29 | 30 | 33 | 36 | 36 | 25 | 19 |
21 | 12 | 17 | 40 | 19 | 63 | 55 | 45 | 37 | 29 | 34 | 29 | 29 | 31 | 30 | 32 | 32 | 34 | 18 | 15 | 21 |
22 | 18 | 43 | 17 | 23 | 67 | 59 | 51 | 39 | 33 | 35 | 29 | 31 | 32 | 32 | 33 | 34 | 19 | 9 | 14 | 14 |
23 | 77 | 82 | 14 | 23 | 54 | 57 | 51 | 38 | 33 | 32 | 31 | 35 | 34 | 30 | 31 | 30 | 11 | 1 | 11 | 13 |
24 | 77 | 75 | 43 | 41 | 54 | 59 | 57 | 54 | 53 | 53 | 55 | 59 | 67 | 76 | 77 | 78 | 69 | 62 | 67 | 68 |
df_blue = pd.DataFrame(im_tens[87:112,99:119,2])
df_blue.style.set_properties(**{'font-size':'6pt'}).background_gradient('Blues')
Here there are two options: keep all three colour channels, or average across them. Since I am simply looking for an occurrence of the same sub-image within each image, I opted for taking the average across all three colours. This would save on training time and the computing power needed.
Some operations in PyTorch, like taking the mean, require us to cast our integer types to float types, so we can do that here too.
Casting in PyTorch is as simple as typing the name of the type you wish to cast to and treating it as a method (in this case float).
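A quick illustration on an arbitrary tensor:

tensor([1, 2, 3]).float()   # tensor([1., 2., 3.])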
def resizeImageAndGetMeanAcrossAllColours(img):
    # resize to 128x128, cast to float, and average over the colour axis
    # to get a single 128x128 matrix per image
    resized = tensor(Image.open(img).resize((128,128)))
    return resized.float().mean(2)
negative_tensors = [resizeImageAndGetMeanAcrossAllColours(o) for o in training_negatives]
positive_tensors = [resizeImageAndGetMeanAcrossAllColours(o) for o in training_positives]
negative_tensors[0].shape,positive_tensors[0].shape,len(negative_tensors),len(positive_tensors)
(torch.Size([128, 128]), torch.Size([128, 128]), 10, 10)
negative_tensors and positive_tensors are currently just lists of tensors, made by using a list comprehension. We will now create a single rank-3 tensor out of each list by ‘stacking’ the items within each list.
Generally, when images are floats, the pixel values are expected to be between 0 and 1, so I divide by 255 here (the highest value that any individual pixel can have).
stacked_negatives = torch.stack(negative_tensors)/255
stacked_positives = torch.stack(positive_tensors)/255
stacked_positives.shape,stacked_negatives.shape
(torch.Size([10, 128, 128]), torch.Size([10, 128, 128]))
We can now begin to get our data ready to load.
We will concatenate the negative and positive tensors, then use view to change the shape of the tensor without changing its contents. We want a list of vectors (a rank-2 tensor) instead of a list of matrices (a rank-3 tensor). The -1 passed to view tells it to “make that axis as big as necessary” in order to fit all the data.
train_x = torch.cat([stacked_negatives, stacked_positives]).view(-1,128*128)
train_x.shape
torch.Size([20, 16384])
We will also need a label for each image; we will use 0 for negatives and 1 for positives.
Note that we use unsqueeze to insert a dimension of size 1 at the specified position. An example from the docs:
x = torch.tensor([1, 2, 3, 4])
torch.unsqueeze(x, 0)
tensor([[ 1, 2, 3, 4]])
torch.unsqueeze(x, 1)
tensor([[ 1],
[ 2],
[ 3],
[ 4]])
This is so that train_x and train_y have shapes that correspond to each other.
train_y = tensor([0]*len(stacked_negatives) + [1]*len(stacked_positives)).unsqueeze(1)
train_x.shape, train_y.shape, tensor([0]*len(stacked_negatives) + [1]*len(stacked_positives)).shape
(torch.Size([20, 16384]), torch.Size([20, 1]), torch.Size([20]))
For fastai, a Dataset needs to return a tuple of the independent and dependent variables (x, y) when indexed. Python’s zip combined with list provides a simple way to get this functionality.
training_dset = list(zip(train_x,train_y))
x,y = training_dset[0]
x.shape,y
(torch.Size([16384]), tensor([0]))
So now we have a training data set; let’s create a validation data set as well.
validation_negatives = get_image_files(path/'validation/negative')
validation_positives = get_image_files(path/'validation/positive')
valid_negative_tensors = [resizeImageAndGetMeanAcrossAllColours(o) for o in validation_negatives]
valid_positive_tensors = [resizeImageAndGetMeanAcrossAllColours(o) for o in validation_positives]
stacked_valid_negatives = torch.stack(valid_negative_tensors)/255
stacked_valid_positives = torch.stack(valid_positive_tensors)/255
valid_x = torch.cat([stacked_valid_negatives, stacked_valid_positives]).view(-1,128*128)
valid_y = tensor([0]*len(stacked_valid_negatives) + [1]*len(stacked_valid_positives)).unsqueeze(1)
valid_dset = list(zip(valid_x,valid_y))
x,y = valid_dset[-1]
x.shape,y
(torch.Size([16384]), tensor([1]))
Datasets are fed into a DataLoader in order to create a collection of mini-batches.
training_dl = DataLoader(training_dset, batch_size=2, shuffle=True)
valid_dl = DataLoader(valid_dset, batch_size=2, shuffle=True)
list(training_dl)[0]
(tensor([[0.4706, 0.5137, 0.6092, ..., 0.5804, 0.5804, 0.5856],
[0.5830, 0.5895, 0.5974, ..., 0.2902, 0.3007, 0.5137]]),
tensor([[1],
[0]]))
We can now use DataLoaders as a wrapper for our training and validation loaders. We do this because it is the format we need them in, in order to pass them to fastai’s Learner (see further below).
dls = DataLoaders(training_dl,valid_dl)
Let’s first see how we would make predictions if we were using a simple linear model.
Note that the params are initialised using torch.Tensor.requires_grad_(). This tells PyTorch that we will want gradients to be calculated with respect to these params.
The weights and bias will also be initialised as random values, to be altered during training.
def init_params(size, multiplier=1.0): return (torch.randn(size)*multiplier).requires_grad_()
weights = init_params(128*128,1)
bias = init_params(1)
We can use the @ operator to multiply each vector in the xb matrix by the weights, rather than doing a for loop over the matrix (as that would be very slow).
def linear_model(xb): return xb@weights + bias
preds = linear_model(train_x)
preds
tensor([-17.7085, -23.1593, 39.8399, 23.0836, -14.8054, -15.5319, -8.1738,
-17.4093, -28.8678, -3.7090, -28.3204, 16.7366, -4.1800, -21.9037,
-8.2536, -12.8513, -9.2549, -13.8841, 15.4846, -16.7070],
grad_fn=<AddBackward0>)
We know our model is likely to be more complex than a single linear function. We also know that a single nonlinearity with two linear layers is enough to approximate any function (it can be mathematically proven that such a setup can solve any computable problem to an arbitrarily high level of accuracy). For now we will do just the bare minimum and create a model as such.
Without using PyTorch’s nn module or fastai, we would perhaps create such a model like so.
Note that res.max(tensor(0.0)) takes the element-wise maximum of res and zero, so any negative values are converted to zeros. This is the nonlinearity that we are adding.
def simple_net(x):
res = x@w1 + b1
res = res.max(tensor(0.0))
res = res@w2 + b2
return res
w1 = init_params(128*128,5)
b1 = init_params(5)
w2 = init_params(5,1)
b2 = init_params(1)
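To see what the nonlinearity does on its own, here is a tiny illustration on an arbitrary tensor: every value below zero is replaced with zero.

t = tensor([-2.0, -0.5, 0.0, 1.5])
t.max(tensor(0.0))   # tensor([0.0000, 0.0000, 0.0000, 1.5000])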
In the linear model (the sidebar above), we were able to run our model on batches like so:
def linear_model(xb): return xb@weights + bias
but if we were to do the same with our simple_net we would soon run into an issue.
This is because the first line, res = x@w1 + b1, will do a matrix multiplication of each item in the batch against each 128x128 matrix in w1.
The third line, res = res@w2 + b2, would take all 5 of the outputs from the first two lines, but it would return just a single value.
So for all the images in a batch we receive one prediction?
No. This model is actually meant to be run on one image at a time. The number of 128x128 matrices in w1 is really just a way of adding ‘hidden layers’ within the first layer. So what happens is that we receive a prediction for the image for each matrix we put in w1. The second linear function (the third line) therefore, in a way, selects the strongest prediction produced by the first linear function. We can choose any number of matrices for w1, with varying levels of accuracy.
In order to run the model on batches, I created a function that applies it to each image in the batch and stacks the results into a single tensor.
NOTE: I deduced the above logic from three things:
1. How the linear model is applied.
2. The comments about simple_net made by BobMcDear on the first page of this thread on the fastai forum.
3. The comment made about simple_net by akashgshastri on the first page of this thread on the fastai forum.
As it stands currently, I am open to being corrected on this logic…
def get_batch_predictions(xb):
    # apply simple_net to each image in the batch and stack the results
    return torch.stack([simple_net(x) for x in xb])
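For comparison, the book initialises its weights as matrices by passing a tuple as the size, which lets simple_net run on a whole batch directly. A sketch of how that might look here (the hidden size of 30 is an arbitrary choice, and I have not trained this version):

# Sketch only, not used in the rest of this post: matrix-shaped weights, so
# each linear layer maps a whole batch in one matrix multiplication.
w1_alt = init_params((128*128, 30))   # [16384, 30]: 30 activations per image
b1_alt = init_params(30)
w2_alt = init_params((30, 1))         # [30, 1]: one score per image
b2_alt = init_params(1)

def simple_net_alt(xb):
    res = xb@w1_alt + b1_alt          # [batch, 30]
    res = res.max(tensor(0.0))        # the nonlinearity
    res = res@w2_alt + b2_alt         # [batch, 1]
    return res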
## Training the Model

Now we also need a loss function that will show how far our predictions are from the truth. Since all our labels/ground truths are values of either 0 or 1, we can normalise our predictions to values that also lie between 0 and 1. For this we can use the sigmoid function.
def sigmoid(x): return 1/(1+torch.exp(-x))
def loss_function(predictions, targets):
predictions = predictions.sigmoid()
return torch.where(targets==1, 1-predictions, predictions).mean()
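As a quick sanity check of the loss function, here is what it gives on some made-up scores and targets:

example_preds = tensor([2.0, -1.0, 0.5])   # made-up raw scores
example_targets = tensor([1, 0, 1])        # made-up labels
loss_function(example_preds, example_targets)
# sigmoid of the scores is roughly [0.88, 0.27, 0.62]; the distances from the
# targets are [0.12, 0.27, 0.38], so the loss is their mean, roughly 0.26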
It is the loss that we eventually call .backward() on, in order to get our gradients with respect to each of the params (w1, b1, w2, and b2).
Let’s prepare a batch from our training data to give this a go.
batch_x = train_x[:5]
batch_preds = get_batch_predictions(batch_x)
batch_targets = train_y[:5]
batch_x,batch_preds,batch_targets
(tensor([[0.4484, 0.4471, 0.4444, ..., 0.4719, 0.4784, 0.4758],
[0.2209, 0.2222, 0.2248, ..., 0.4000, 0.4157, 0.4405],
[0.7686, 0.7739, 0.7791, ..., 0.6261, 0.5869, 0.5516],
[0.5830, 0.5895, 0.5974, ..., 0.2902, 0.3007, 0.5137],
[0.8366, 0.8366, 0.8366, ..., 0.5412, 0.5255, 0.5098]]),
tensor([[-1149.1401],
[ -343.6340],
[-1202.1691],
[ -801.5582],
[-1316.0537]], grad_fn=<StackBackward0>),
tensor([[0],
[0],
[0],
[0],
[0]]))
loss = loss_function(batch_preds, batch_targets)
loss
tensor(0., grad_fn=<MeanBackward0>)
loss.backward()
w1.grad.shape,w1.grad.mean(),b1.grad,w2.grad.shape,w2.grad.mean(),b2.grad
(torch.Size([16384]),
tensor(0.),
tensor([-0., 0., 0., -0., -0.]),
torch.Size([5]),
tensor(0.),
tensor([0.]))
Now let’s do all of that in one function.
def calc_gradient(xb,yb,model):
preds = model(xb)
loss = loss_function(preds,yb)
loss.backward()
calc_gradient(train_x[:5],train_y[:5],get_batch_predictions)
w1.grad.mean(),b1.grad.mean(),w2.grad.mean(),b2.grad.mean()
(tensor(0.), tensor(0.), tensor(0.), tensor(0.))
w1.grad.zero_()
b1.grad.zero_()
w2.grad.zero_()
b2.grad.zero_()
tensor([0.])
Now we can write a train_epoch function that does all of this in one go.
def train_epoch(model, learning_rate, params):
for xb,yb in training_dl:
calc_gradient(xb ,yb, model)
for p in params:
p.data -= p.grad*learning_rate
p.grad.zero_()
Let’s also create a function to check the accuracy of each batch.
def batch_accuracy(preds_b, targets_b):
preds_normalised = preds_b.sigmoid()
correct = (preds_normalised>0.5) == targets_b
return correct.float().mean()
batch_accuracy(batch_preds, batch_targets)
tensor(1.)
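It returns 1 here because every prediction in this batch matches its target: the raw scores are all large negatives, so after the sigmoid they all fall below 0.5, matching targets of 0. With some made-up scores where one prediction is wrong:

batch_accuracy(tensor([[2.0], [-1.0], [0.5]]), tensor([[1], [0], [0]]))
# sigmoid gives roughly [0.88, 0.27, 0.62]; thresholding at 0.5 predicts [1, 0, 1],
# so only the first two match the targets and the accuracy is 2/3
# tensor(0.6667)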
We can also create a function to show how accurate our model is after each training epoch. This is done by testing it against our validation data.
### round() rounds the first argument to the number of decimal places given by the second argument
def validate_epoch(model):
accuracies = [batch_accuracy(model(xb), yb) for xb, yb in valid_dl]
return round(torch.stack(accuracies).mean().item(), 4)
validate_epoch(get_batch_predictions)
0.5
lr = 1.
params = w1,b1,w2,b2
# train_epoch(get_batch_predictions, lr, params)
# validate_epoch(get_batch_predictions)
Now let’s try it over a few epochs.
for i in range(10):
train_epoch(get_batch_predictions, lr, params)
print(validate_epoch(get_batch_predictions), end=' ')
0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
Clearly this is not indicative of a model that will improve with training. I tried various ways of tinkering with and debugging my simple_net, but to no avail. I could only conclude that simple_net is not sufficient for the task I have or, if it is to be sufficient, it requires a lot more computing power than I am throwing at it (which could be by way of more weight matrices and more training epochs).
So the next thing I tried was the exact same simple_net, but declared and trained purely the fastai and PyTorch way.
Using PyTorch instead, we can create it the following way:
simple_net = nn.Sequential(
nn.Linear(128*128,20),
nn.ReLU(),
nn.Linear(20,1)
)
fastai has a built-in Stochastic Gradient Descent optimiser, SGD.
And fastai also provides us with Learner, which we can use to put everything together:
learn = Learner(dls, simple_net, opt_func=SGD, loss_func=loss_function, metrics=batch_accuracy)
Now we can use fastai’s Learner.fit instead of our for loop of train_epoch.
learn.fit(10,lr=lr)
epoch | train_loss | valid_loss | batch_accuracy | time |
---|---|---|---|---|
0 | 0.543056 | 0.500000 | 0.500000 | 00:00 |
1 | 0.525924 | 0.500000 | 0.500000 | 00:00 |
2 | 0.515540 | 0.500000 | 0.500000 | 00:00 |
3 | 0.508081 | 0.500000 | 0.500000 | 00:00 |
4 | 0.506870 | 0.500000 | 0.500000 | 00:00 |
5 | 0.506186 | 0.500000 | 0.500000 | 00:00 |
6 | 0.503422 | 0.500000 | 0.500000 | 00:00 |
7 | 0.501898 | 0.500000 | 0.500000 | 00:00 |
8 | 0.501641 | 0.500000 | 0.500000 | 00:00 |
9 | 0.500227 | 0.500000 | 0.500000 | 00:00 |
So I received pretty much the same result as when I did it using a more manual methodology.
And now for the second of my proposed methods, in which I take full advantage of fastai. I will use the DataBlock API to create my DataLoaders, use resnet18 as my model, and use vision_learner to train it.
Note: I also used DataBlock as part of the image classifier that you can see in my previous blog post.
data_block = DataBlock(
blocks=(ImageBlock,CategoryBlock),
get_items=get_image_files,
splitter=RandomSplitter(valid_pct=0.2, seed=2),
get_y=parent_label,
item_tfms=Resize(128))
We can now use the dataloaders method on our data_block. As you may see, the data block has defined the following:

  * the items are ImageBlocks and the labels are CategoryBlocks
  * the items are gathered with get_image_files (the same one we used in our manual data preparation)
  * setting a seed means that we will always use the same set of data as our validation set, ensuring the model does not get trained on it

Note: in our manual data preparation we created our own validation data set and training data set. As you can see above, here DataBlock is instructed to extract a validation set automatically for us. For that reason we will now access a folder that has all the same data we used in our manual method, but this time the images are simply separated into ‘positive’ and ‘negative’ categories; they are not further split into training and validation.
dls=data_block.dataloaders(path/'all')
Lets take a look at some of the images in our validation set.
dls.valid.show_batch(max_n=4, nrows=1)
All looking good so far. At the moment the data_block just has each image cropped at a random position, at 128 pixels by 128 pixels. Now let’s take a look at some of the different ways we could transform our data before we use it to train our model.
Here’s how it looks with the images ‘squished’:
data_block = data_block.new(item_tfms=Resize(128, ResizeMethod.Squish))
dls = data_block.dataloaders(path/'all')
dls.valid.show_batch(max_n=4, nrows=1)
Here’s how it looks with the images ‘padded’, in order to fill any space that may be left when shrinking them to fit within the specified 128 by 128 size:
data_block = data_block.new(item_tfms=Resize(128, ResizeMethod.Pad, pad_mode='zeros'))
dls = data_block.dataloaders(path/'all',bs=5)
dls.valid.show_batch(max_n=4, nrows=1)
Since the actual image in my image library that I am looking for is not skewed or stretched in any way, I decided to go for the ‘padded’ mode with my data.
Note how this time I included bs=5 in the dataloaders method call. Without this I kept getting nan for my train_loss when training the model. The following three lines of code were what I used to help me detect what was wrong here:
x,y = learn.dls.one_batch()
out = learn.model(x)
learn.loss_func(out, y)
which gave the error message: ValueError: This DataLoader does not contain any batches.
The debugging code was suggested by KevinB on the first page of this thread on the fastai forum.
# x,y = learn.dls.one_batch()
# out = learn.model(x)
# learn.loss_func(out, y)
I then proceeded to train a resnet18 model on my data using vision_learner:
Note: before successfully doing so, it turned out I had to downgrade my version of torchvision to 0.12 so that the methods I was opting to use would work. I also had to upgrade my macOS version, as PyTorch only has GPU support on Apple silicon from macOS 12.3 onwards.
#!!!ACTIVATE CORRECT ENVIRONMENT BEFORE RUNNING THIS CELL!!!#
# !pip install --upgrade torchvision==0.12
learn = vision_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(10)
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 1.761049 | 2.855227 | 0.833333 | 00:03 |
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 1.013831 | 2.266986 | 0.666667 | 00:02 |
1 | 0.923140 | 1.505093 | 0.666667 | 00:02 |
2 | 1.193987 | 1.579342 | 0.500000 | 00:02 |
3 | 1.121834 | 1.969770 | 0.500000 | 00:02 |
4 | 1.127938 | 2.468466 | 0.666667 | 00:02 |
5 | 1.091505 | 2.685832 | 0.833333 | 00:02 |
6 | 1.096134 | 2.706148 | 0.833333 | 00:02 |
7 | 1.015126 | 2.648789 | 0.833333 | 00:02 |
8 | 0.935627 | 2.475007 | 0.833333 | 00:02 |
9 | 0.947918 | 2.624727 | 0.833333 | 00:02 |
A very underwhelming result, just like the one from the first method via my simple_net.
Nonetheless, I thought I would confirm just how confused this model is by plotting a confusion matrix:
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()
Why not take a look at the top losses whilst we’re at it!
interp.plot_top_losses(5, nrows=1)
I can only conclude that, thus far, I am using models that are not sufficient for the purpose I require them for (object detection). I clearly have a lot more to learn about different neural networks, and will endeavour to pay close attention to this matter in my continued study.