Building & Visualizing your first ConvNet

In the last few years, a special kind of neural network known as the Convolutional Neural Network (ConvNet) has gained a lot of popularity, and for good reason. ConvNets are largely insensitive to where a pattern appears in an image, a property known as translational invariance. More specifically, they exploit spatially local correlation by enforcing a local connectivity pattern between neurons of adjacent layers.
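To make the idea of local connectivity a little more concrete, here is a minimal pure-Python sketch (not how Keras implements it). The helper conv2d_valid, the toy image and the kernel weights are all made up for illustration. Each output value depends only on a small patch of the input, and the same filter produces the same response wherever the pattern appears.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of lists), stride 1, no padding.

    Like most deep learning libraries, this computes a cross-correlation
    (the kernel is not flipped).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Local connectivity: only a kh x kw patch feeds this output.
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out


# Hypothetical vertical-edge filter, chosen only for illustration.
kernel = [[1, 0, -1],
          [1, 0, -1],
          [1, 0, -1]]

# A 5x7 toy image with one bright column, and the same image moved
# one pixel to the right.
image = [[0, 0, 9, 0, 0, 0, 0] for _ in range(5)]
shifted = [[0, 0, 0, 9, 0, 0, 0] for _ in range(5)]

response = conv2d_valid(image, kernel)
response_shifted = conv2d_valid(shifted, kernel)

print(response[0])          # [-27, 0, 27, 0, 0]
print(response_shifted[0])  # [0, -27, 0, 27, 0]
```

The response to the shifted image is the original response moved one pixel to the right. Pooling layers later turn this into a degree of invariance to small shifts.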

To know more about ConvNets or convolutions in general, you can read about them on Christopher Olah’s blog.

My objective in writing this article is to get you up to speed on training deep learning models in the cloud without the hassle of setting up a VM, an AWS instance or anything of that sort. After following this article, you’ll be able to design your own classification task with lots of images & train your own deep learning models. All you’ll need to implement the entire thing is some knowledge of Python, GitHub & the basics of Keras, the quintessential DL starter. The data & implementation used here are inspired by this post on the official Keras blog.

Setup

In any binary classification task, our primary requirement is the data itself: a dataset segregated into 2 classes. For this purpose, I’m sticking to a simple dataset, DogsVsCats from Kaggle. It is one of the most rudimentary classification tasks, where we classify whether an image contains a dog or a cat. For simplicity, I’ve bundled 1000 images each of dogs & cats and created a directory with the following structure.

.
.
├── train
│   ├── cats
│   └── dogs
├── val
│   ├── cats
│   └── dogs
└── test
.
.

The dataset can be accessed on FloydHub at sominw/datasets/dogsvscats/1

If you haven’t already set up an account on FloydHub, you can do so by following the FloydHub QuickStart Documentation. It’s incredibly simple & if you’re stuck at any point, they provide Intercom chat support.

Cloning the Git Repo

I’ve already prepared starter code that uses the above-mentioned dataset. It lets you tinker with the model we’re building and helps you understand the basics of training models with large datasets on Floyd. Navigate to a directory of your choice and enter the following:

$ git clone https://github.com/sominwadhwa/DogsVsCats-Floyd.git

Creating a project on Floyd

Once you’ve cloned the repository, it’s time to initialise the project on Floyd.

Create & name your project under your account on FloydHub.com.
Locally, within the terminal, head over to the git repo (DogsVsCats-Floyd) managing the source code.

$ floyd init [project-name]

Code

Pre-requisites

from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Dropout, Activation, Flatten, Dense
from keras.callbacks import ModelCheckpoint, TensorBoard
import h5py
from keras import backend as K
import numpy as np

NumPy: NumPy is a scientific computing package for Python. In the context of machine learning, it is primarily used to manipulate N-dimensional arrays, and it also provides linear algebra & random number capabilities. (Documentation)
Keras: Keras is a high-level neural networks API used for rapid prototyping. We’ll be running it on top of TensorFlow, an open source library for numerical computation using data flow graphs. (Documentation)
h5py: Used alongside NumPy to store huge amounts of numerical data in the HDF5 binary data format. (Documentation)
TensorBoard: A tool used to visualize the static compute graph created by TensorFlow, plot quantitative metrics during execution, and show additional information about it. (Concise Tutorial)

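width, height = 150, 150   # target image size in pixels
training_path = "/input/train"
val_path = "/input/val"
n_train = 2000             # number of training images
n_val = 400                # number of validation images
epochs = 100
batch_size = 32

# Assumed definition (the model code uses input_shape but never defines it).
if K.image_data_format() == 'channels_first':
    input_shape = (3, width, height)
else:
    input_shape = (width, height, 3)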

The above snippet defines the training & validation paths. /input is the default mount point of any directory (root) uploaded as ‘data’ on Floyd. The dataset used here is publicly accessible.

Model Architecture

model = Sequential()
model.add(Conv2D(32,(3,3), input_shape= input_shape))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Conv2D(32,(3,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Conv2D(64, (3,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.4))
model.add(Dense(1))
model.add(Activation('sigmoid'))

model.compile(loss='binary_crossentropy',
        optimizer='rmsprop',
        metrics=['accuracy'])

Keras makes it incredibly simple to sequentially stack fully configurable modules of neural layers, cost functions, optimizers, activation functions & regularization schemes over one another. For this demonstration, we’ve stacked three 2D convolutional layers (1 input, 2 hidden) with ReLU activation. To control overfitting, there’s a 40% dropout before the final dense layer of the network, along with the MaxPooling layers. For the loss function, since this is a standard binary classification problem, binary_crossentropy is the usual choice. To read & learn more about cross-entropy loss, you can check out this article by Rob DiPietro.
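If you’d like to see how the image shrinks as it passes through this stack, the following sketch traces the output shapes and parameter counts by hand. It is plain arithmetic rather than a Keras call, and it assumes things the code above does not state explicitly: a 150x150 RGB input with channels last, and the Keras defaults for Conv2D (no padding, stride 1) and MaxPooling2D (stride equal to the pool size). The helper names are my own.

```python
def conv_out(size, kernel=3):
    # No padding, stride 1 (assumed Keras Conv2D defaults).
    return size - kernel + 1


def pool_out(size, pool=2):
    # Stride equals pool size; an incomplete window at the edge is dropped.
    return size // pool


def trace(width=150, height=150, channels=3,
          filters=(32, 32, 64), dense_units=64):
    """Return ([(layer, output shape, parameters)], total parameters)."""
    rows, total = [], 0
    w, h, c = width, height, channels
    for f in filters:
        w, h = conv_out(w), conv_out(h)
        params = 3 * 3 * c * f + f          # 3x3 weights per channel + biases
        rows.append(("Conv2D", (w, h, f), params))
        total += params
        w, h, c = pool_out(w), pool_out(h), f
        rows.append(("MaxPooling2D", (w, h, c), 0))
    flat = w * h * c
    rows.append(("Flatten", (flat,), 0))
    for units, inputs in ((dense_units, flat), (1, dense_units)):
        params = inputs * units + units     # weights + biases
        rows.append(("Dense", (units,), params))
        total += params
    return rows, total


rows, total = trace()
for name, shape, params in rows:
    print(f"{name:<13}{str(shape):<16}{params:>10,}")
print(f"total parameters: {total:,}")
```

Under these assumptions the spatial size goes 150, 148, 74, 72, 36, 34, 17, and almost all of the 1,212,513 parameters sit in the first dense layer. You can compare this against model.summary() in your own run.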

Pooling: One indispensable part of a ConvNet is the pooling layer. It serves two primary purposes. By progressively reducing the spatial size of the representation, it gives the network a degree of ‘translational invariance’, and it also reduces the number of parameters and the amount of computation in the layers that follow, which helps control overfitting. Pooling is often applied with filters of size 2x2 and a stride of 2 at every depth slice. A 2x2 pooling layer with a stride of 2 halves the width and height of its input, shrinking it to 1/4 of its original size.
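Here is a tiny pure-Python sketch of 2x2 max pooling with a stride of 2 on a single depth slice. The function name and the toy feature map are made up for illustration; Keras’ MaxPooling2D does the same thing on whole batches of tensors.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 over a list-of-lists feature map."""
    pooled = []
    for i in range(0, len(feature_map) - 1, 2):
        row = []
        for j in range(0, len(feature_map[0]) - 1, 2):
            # Keep only the strongest activation in each 2x2 window.
            row.append(max(feature_map[i][j], feature_map[i][j + 1],
                           feature_map[i + 1][j], feature_map[i + 1][j + 1]))
        pooled.append(row)
    return pooled


feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 0, 5, 6],
               [7, 1, 2, 8]]

pooled = max_pool_2x2(feature_map)
print(pooled)  # [[4, 2], [7, 8]]
```

The 4x4 input (16 values) becomes a 2x2 output (4 values), which is the 1/4 reduction described above.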

Data Preparation

Since we’re using very little data (1k training examples per class), we augment these examples with a number of different image transformations using the ImageDataGenerator class in Keras.

train_data = ImageDataGenerator(
        rescale= 1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)

train = train_data.flow_from_directory(
        training_path,
        class_mode='binary',
        batch_size=batch_size,
        target_size=(width,height))

So with a single image, we can generate a lot more belonging to the same class, containing the same object but in a slightly different form.
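To get a feel for what augmentation does, here is a toy pure-Python sketch covering only two of the transformations above, rescaling and random horizontal flips. It is not how ImageDataGenerator is implemented, the function names and the 2x2 “image” are hypothetical, and shear & zoom are left out to keep it short.

```python
import random


def rescale(image, factor=1.0 / 255):
    """Map pixel values from [0, 255] to [0, 1]."""
    return [[pixel * factor for pixel in row] for row in image]


def horizontal_flip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def augment(image, rng):
    """Return one randomly transformed copy of `image`."""
    out = rescale(image)
    if rng.random() < 0.5:  # flip roughly half of the time
        out = horizontal_flip(out)
    return out


image = [[0, 255],
         [51, 102]]
rng = random.Random(0)  # seeded so the run is repeatable

# Several training examples generated from a single source image.
variants = [augment(image, rng) for _ in range(4)]
for variant in variants:
    print(variant)
```

Every variant still shows the same object and keeps the same label. Only its presentation changes, which is what lets the network see more variety than the 2000 original images provide.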

TensorBoard

TensorBoard is a visualization tool provided with TensorFlow that allows us to visualize TensorFlow compute graphs, among other things.

tensorboard = TensorBoard(log_dir='/output/Graph', histogram_freq=0, write_graph=True, write_images=True)

Keras provides callbacks to implement TensorBoard, among other procedures, to keep a check on the internal states & statistics of the model during training. FloydHub additionally provides dedicated support for TensorBoard. For instance, the above snippet stores the TensorBoard logs in the directory /output/Graph & generates the graph in real time.

To know more about TensorBoard functionality & its usage head over to the official documentation.
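The snippets in this article don’t show the training call itself, so here is a hedged sketch of how the callback could be handed to Keras. The train helper is hypothetical, not the repo’s code. It assumes a Keras 2 model with fit_generator (newer releases accept generators directly in fit) and a validation generator built the same way as the training one.

```python
def train(model, train_gen, val_gen, callbacks,
          n_train=2000, n_val=400, epochs=100, batch_size=32):
    """Fit `model` from generators, reporting progress to `callbacks`."""
    return model.fit_generator(
        train_gen,
        steps_per_epoch=n_train // batch_size,   # 2000 // 32 = 62 batches
        epochs=epochs,
        validation_data=val_gen,
        validation_steps=n_val // batch_size,    # 400 // 32 = 12 batches
        callbacks=callbacks)
```

With the objects defined earlier, a call might look like train(model, train, val, [tensorboard]), where val is an assumed generator created from val_path with rescaling only. Passing a ModelCheckpoint in the same list would save weights during training as well.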

Training

So now that we’ve thoroughly dissected the code, it’s finally time to train this network on the cloud. To run this job on Floyd, simply run the following in your terminal (after navigating to the project directory):

floyd run --data sominw/datasets/dogsvscats/1:input --gpu --tensorboard "python very_little_data.py --logdir /output/Graph"

--logdir provides a directory for storing the TensorBoard logs.
--gpu (optional) indicates that you wish to use GPU compute.
--tensorboard indicates the usage of TensorBoard.

Once you indicate that you’re using TensorBoard (while executing the job), FloydHub provides a direct link to access TensorBoard.

To know more about TensorBoard support on Floyd, you can check out this article by Naren Thiagarajan.

Outputs

Keras lets you store a model’s weights, which are multi-dimensional numerical arrays, in the HDF5 binary data format.

model.save_weights('/output/very_little_weights.hdf5')

The snippet above stores your generated weight file, at the end of training, in the /output directory. And that’s it! You’ve finally trained & visualized your first scalable ConvNet.

I encourage you to try out your own variants of ConvNets by editing the source code. In fact, you can refer to this article & build an even more powerful ConvNet by using pre-trained VGG weights.

If you’d like to read more about running instances on Floyd, using Datasets & running jobs with external dependencies, read my previous article on FloydHub’s Blog: Getting Started with Deep Learning on FloydHub