Gurtaj's blog!

Why fastai?

I wanted to get into machine learning, and I had heard that TensorFlow and PyTorch were the top dogs in the game, so I did some research on YouTube comparing the two in order to work out which would be better for me. After some time I decided that a fastai & PyTorch stack would be best as it would be:

Following the Course CNN Example

Some key things to note before starting

The course first takes us through the differences and similarities between a program and a neural network model; in short, once a model is trained, it can be treated just like a program.

https://raw.githubusercontent.com/gurtaj1/blog/master/post%20assets/Untitled.png

https://raw.githubusercontent.com/gurtaj1/blog/master/post%20assets/Untitled%201.png

It also tells us some key things to be aware of when training models:

Avoid overfitting:

This is where the model gets really good at predicting the dataset that you have provided for training but is no good for data outside that set:

https://raw.githubusercontent.com/gurtaj1/blog/master/post%20assets/Untitled%202.png

It has memorised the training set rather than trying to find generalisable underlying patterns in the data.

This is why, in our data, we must make sure that we have a validation set as well as a training set.

We should also ensure that our validation set is always the same (for example, by fixing the random seed used to split the data, as we do later in the DataBlock) so that our model never has a chance to memorise the labels on that set of data.

We can then see if our model is overfitting by comparing our chosen metric(s) on the training data to the same metric(s) on the validation data.

My First CNN

The first neural network architecture we dive into is the CNN (convolutional neural network), which is good for performing tasks on images, or any type of data that has meaning when converted to an image (like sounds, time series, or even malware binary files).

fastai has a GitHub repo that contains all the Jupyter notebooks needed to interact with the course examples and try things out.

We are walked through an example of making a bear classifier, and the methods described are exactly what I used for my icon classifier, so here goes…

First is the process of gathering data. The course recommends a Bing Image Search method, but this seemed like more effort than I was willing to put in just because it was free (creating an account, using the given key, etc.). With a quick bit of googling I was able to find jmd_imagescraper, which uses DuckDuckGo to scrape images:

!pip install jmd_imagescraper
from jmd_imagescraper.core import * # don't worry, it's designed to work with import *

I found this to be so satisfactory that now, instead of Googling things, I just Duck them!

Initially my icons of choice were Drake, Snoop Dogg, and Beyoncé. I figured they're all famous enough for the first 150 image results to actually be them, meaning the scraped data would be labelled correctly:

icon_types = 'beyonce','snoop dogg','drake'
# use the line below if the path directory already exists and you need to delete its contents
# shutil.rmtree(path)
path = Path('icons')
for o in icon_types:
    # downloads up to 150 images per icon into path/<label>/
    results = duckduckgo_search(path, f'{o}', f'{o}', max_results=150)

Not all of the results we get will be valid images (and some of the files may be corrupted). Luckily fastai has a couple of functions, get_image_files and verify_images, that we can use to detect these and then remove them from our data:

fns = get_image_files(path)
failed = verify_images(fns)
failed.map(Path.unlink);

Before we create our model we have to tell fastai what kind of data we have and how it's structured. We can do this by using the DataBlock API:

icons = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=RandomResizedCrop(224, min_scale=0.5),
    batch_tfms=aug_transforms())

Here, we are telling fastai:

- blocks: our inputs are images and our labels are categories
- get_items: how to gather the files (with get_image_files)
- splitter: to randomly hold out 20% of the data as our validation set (with a fixed seed so the split is always the same)
- get_y: that each image's label is the name of its parent folder
- item_tfms: to randomly crop each image and resize it to 224x224 pixels
- batch_tfms: to apply the standard set of data augmentations to each batch

The DataBlock is then used, like a template, for creating our DataLoaders. We also still need to tell fastai the source of our data, in this case path:

dls = icons.dataloaders(path)

Now that I had created my DataLoaders, it was time to train my icon classifier. We create our Learner and fine-tune it:
learn = cnn_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(4)

Here I have told fastai to create a convolutional neural network (CNN) based on the resnet18 architecture, to report error_rate as its metric, and to fine-tune its pretrained weights on my data for 4 epochs.

fastai also has a way of plotting the top losses of a trained network, interp.plot_top_losses, as well as an ImageClassifierCleaner() tool. The latter is a GUI that allows us to go through the data in order of highest to lowest loss. When I went through the data labelled as Drake, I saw a high number of images that were not the singer Drake but other well-known Drakes: sports players, random Drakes I had no awareness of, and even Nathan Drake, the video game character from Uncharted:

https://raw.githubusercontent.com/gurtaj1/blog/master/post%20assets/Screenshot_2021-08-10_at_08.13.16.png

One of these imposter Drakes was somehow even mistaken for Beyoncé:

https://raw.githubusercontent.com/gurtaj1/blog/master/post%20assets/Screenshot_2021-08-10_at_08.14.25.png
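
For reference, both of these tools hang off the trained Learner. A minimal sketch of how they're created, following the fastai API used in the course (the exact numbers are just examples):

# build an interpretation object from the trained learner
interp = ClassificationInterpretation.from_learner(learn)
# show the 5 images the model was most confidently wrong about
interp.plot_top_losses(5, nrows=1)

# GUI for reviewing, relabelling, or deleting images, ordered by loss
cleaner = ImageClassifierCleaner(learn)
cleaner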

I figured I should switch Drake for another well-known celebrity with a more unique name, so instead I went with Eminem. In retrospect, he probably should have been my first choice all along!

Another handy visual tool provided by fastai is interp.plot_confusion_matrix. It is a visual representation of what the model predicts vs the ground truth (calculated using the validation set):

https://raw.githubusercontent.com/gurtaj1/blog/master/post%20assets/Screenshot_2021-08-10_at_08.11.43.png

https://raw.githubusercontent.com/gurtaj1/blog/master/post%20assets/Screenshot_2021-08-12_at_08.26.16.png
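
With the same interp object from the sketch above, the confusion matrix is a one-liner:

interp.plot_confusion_matrix()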

Once I had trained this model with the new choice of famous icons, it was time to turn it into a demo-able online application. We use fastai's learn.export(), which saves both the architecture and the trained parameters of our model. We can then use this exported model wherever we want our app to live.
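
The export call itself is a one-liner; by default it writes a file named export.pkl into the learner's path:

learn.export()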

Since I do not expect this model to be used on much more than one image at a time, I figured I no longer needed a GPU and could therefore run the app on my local machine, doing it all on a CPU instead.

Running Jupyter Notebook Locally

Jupyter was still very new to me, and I had only ever used it on cloud services. After a quick bit of ducking I found that VS Code already has an extension for running Jupyter notebooks locally. This seemed like my most approachable entry into running things on my machine, since I was already a VS Code user anyway.

I simply installed Microsoft's Jupyter and Python extensions for VS Code and I was ready to go!

TURNS OUT there is a bug associated with using an upload button from ipywidgets, which is needed for this app. At the time of me trying to do this, the VS Code team was aware of the issue but had 'no plans to fix' it:

https://raw.githubusercontent.com/gurtaj1/blog/master/post%20assets/Screenshot_2021-08-08_at_11.52.48.png

For this reason I had to ditch VS Code and opt to run jupyter notebook from my CLI (command line interface) manually.

I downloaded and installed the latest Python version (Python 3.9). I then downloaded and installed pip, the package management system used to install and manage software packages/libraries written in Python.

My first code block in the notebook was used to install fastbook and ipywidgets; ipywidgets is what provides the GUI components that the app will use. I then went on to import all the necessary libraries.

!pip install ipywidgets
!conda install fastbook

pip is very useful and I was able to install ipywidgets with it just fine. I used anaconda (conda) to install fastbook. (Note: the ! prefix is only needed when writing lines of code within a notebook that you want to be executed in the terminal.)

Installing jupyter proved to be very troublesome. I went through many guided examples online that all used pip, so I know it is possible, but for whatever reason the setup I had on my machine just wasn't allowing it. I actually spent around 3 days trying different methods. Finally, I followed Jupyter's steps for using anaconda instead. This worked straight away with no hiccups, so in retrospect I should have just done it this way from the start.
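
For reference, the anaconda route boils down to something like this (a sketch, assuming the conda-forge channel that Jupyter's own install docs point to):

conda install -c conda-forge notebook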

With everything installed I imported all the libraries needed by my notebook:

import fastbook
fastbook.setup_book()
from fastbook import *
from pathlib import Path
from IPython.display import display
import ipywidgets as widgets

Now I could import my model and begin using it!

path = Path()
learn_inf = load_learner(path/'export.pkl')

First I downloaded the export.pkl file of the trained model from the cloud notebook, so that I could access it in my local repo.

It was then loaded using learn_inf = load_learner(path/'export.pkl').

I needed to create the GUI components: an upload button, an output area to display the uploaded image, a label to show the prediction, and a classify button:

btn_upload = widgets.FileUpload()
out_pl = widgets.Output()
lbl_pred = widgets.Label()
btn_run = widgets.Button(description='Classify Icon')

Now we define the actual function that will run when we click the classify button (the upload button already has the upload functionality built in):

def on_click_classify(change):
    # create a PILImage from the most recently uploaded file
    img = PILImage.create(btn_upload.data[-1])
    out_pl.clear_output()
    # display a thumbnail of the uploaded image
    with out_pl: display(img.to_thumb(128, 128))
    # run inference and show the predicted label with its probability
    pred, pred_idx, probs = learn_inf.predict(img)
    lbl_pred.value = f'Prediction: {pred}; Probability: {probs[pred_idx]:.04f}'

This function can now be attached to our classify button:

btn_run.on_click(on_click_classify)

So now we have made all our GUI components and attached functionality to them, but we haven’t actually told our notebook to display them yet. We can do this with VBox:

widgets.VBox([widgets.Label('Provide an image of your famous icon!'), btn_upload, btn_run, out_pl, lbl_pred])

Running this code displays all the GUI elements passed to it. Notice how the first thing passed is a label that provides a bit of instruction to the user: widgets.Label('Provide an image of your famous icon!').

I also added a markdown cell above the GUI components to give users a descriptive introduction to the model.

Running Demos on My Local Machine

Once we have jupyter notebook running it is actually very easy to access voila. All we need to do is go to the /voila route of the URL served by jupyter. So something like http://localhost:8888/tree for our jupyter notebook tree would become http://localhost:8888/voila for our voila tree, for example.
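
For the /voila route to exist, voila itself needs to be installed and enabled as a jupyter server extension; a minimal sketch of that setup (the same two commands the fastai book suggests, run from notebook cells):

!pip install voila
!jupyter serverextension enable --sys-prefix voila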

The notebook tree looks something like this:

https://raw.githubusercontent.com/gurtaj1/blog/master/post%20assets/Screenshot_2021-08-08_at_16.44.19.png

And the voila tree looks something like this:

https://raw.githubusercontent.com/gurtaj1/blog/master/post%20assets/Screenshot_2021-08-08_at_16.45.12.png

The voila instance of my notebook looked like this at first:

https://raw.githubusercontent.com/gurtaj1/blog/master/post%20assets/Screenshot_2021-08-09_at_07.40.13.png

Notice the installation code displayed at the top. Clearly this is of no importance to the user and should be hidden. I did a little ducking on how to hide certain outputs from appearing in voila, but I didn't see anything that looked very promising, so I decided to just try commenting out the installation code in my notebook:

# !pip install ipywidgets
# !conda install fastbook

Then when I ran the app it looked like this:

https://raw.githubusercontent.com/gurtaj1/blog/master/post%20assets/Screenshot_2021-08-09_at_07.43.18.png

All the things, and only the things, that I want to appear to a user.

I can only assume this worked without the installation code running because I already had the relevant packages installed locally, and therefore did not need to re-install them whenever running the notebook. This left me wondering how I could achieve this same clean output when deploying the notebook; I detail how this is done in the deployment section.

After getting voila to work locally, I was ready to deploy it somewhere remote so that people could access it from anywhere.

Deployment of my Model

The fastai course videos and book recommend a site called Binder for this, so I gave it a go but was unsuccessful after several attempts. I found other people online had similar issues to mine so, as the fastai book had instructed, I referred to the course website for more up-to-date information. There I found guidelines on how to use Heroku for deployment, and I followed the steps.

Just as with Binder, I had to ensure that I had my notebook in a GitHub repository.

I also learnt that I needed a requirements.txt file in this repo to list all the packages required for my notebook to run:

http://download.pytorch.org/whl/cpu/torch-1.7.1%2Bcpu-cp39-cp39-linux_x86_64.whl
http://download.pytorch.org/whl/cpu/torchvision-0.9.0%2Bcpu-cp39-cp39-linux_x86_64.whl
fastai
voila
ipywidgets
fastbook

Notice the PyTorch wheel URLs at the top. It took a bit of ducking to figure out which exact versions of torch and torchvision were required for the version of Python I was using (3.9), but eventually, after several deployment attempts and some interpretation of Heroku error messages, I found that these were the correct ones.

Another file I required in my repo was the Procfile, the process file. This tells Heroku what kind of app to create:

web: voila --port=$PORT --no-browser --enable_nbextensions=True icon_classifier.ipynb

And that was it!

I had officially made my first personally trained CNN model and deployed it online, ready to share with whomever I like.

EDIT (09/05/2023):

The model is no longer hosted on herokuapp (I assume due to my inactivity), but you can now find the same model, with a slightly different UI, here.