Getting deeper with Keras

  • TensorFlow is a powerful and flexible tool, but coding large neural architectures with it is tedious.
  • There are plenty of deep learning toolkits that work on top of it, such as Slim, TFLearn, Sonnet, and Keras.
  • The choice is a matter of taste and of the particular task.
  • We'll be using Keras.
In [1]:
import sys
sys.path.append("..")
import grading
In [2]:
# use preloaded keras datasets and models
! mkdir -p ~/.keras/datasets
! mkdir -p ~/.keras/models
! ln -s $(realpath ../readonly/keras/datasets/*) ~/.keras/datasets/
! ln -s $(realpath ../readonly/keras/models/*) ~/.keras/models/
In [3]:
import numpy as np
from preprocessed_mnist import load_dataset
import keras
X_train, y_train, X_val, y_val, X_test, y_test = load_dataset()
y_train,y_val,y_test = map(keras.utils.np_utils.to_categorical,[y_train,y_val,y_test])
Using TensorFlow backend.
In [4]:
import matplotlib.pyplot as plt
%matplotlib inline
plt.imshow(X_train[0]);

The pretty Keras

In [5]:
import tensorflow as tf
s = tf.InteractiveSession()
In [31]:
import keras
from keras.models import Sequential
import keras.layers as ll

model = Sequential(name="mlp")

model.add(ll.InputLayer([28, 28]))

model.add(ll.Flatten())

# network body
model.add(ll.Dense(128))
model.add(ll.Activation('relu'))

model.add(ll.Dense(128))
model.add(ll.Activation('relu'))

# output layer: 10 neurons, one per class, with softmax
model.add(ll.Dense(10, activation='softmax'))

# categorical_crossentropy is your good old crossentropy,
# but applied to one-hot-encoded target vectors
model.compile("adam", "categorical_crossentropy", metrics=["accuracy"])
In [32]:
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_7 (InputLayer)         (None, 28, 28)            0         
_________________________________________________________________
flatten_7 (Flatten)          (None, 784)               0         
_________________________________________________________________
dense_20 (Dense)             (None, 128)               100480    
_________________________________________________________________
activation_14 (Activation)   (None, 128)               0         
_________________________________________________________________
dense_21 (Dense)             (None, 128)               16512     
_________________________________________________________________
activation_15 (Activation)   (None, 128)               0         
_________________________________________________________________
dense_22 (Dense)             (None, 10)                1290      
=================================================================
Total params: 118,282
Trainable params: 118,282
Non-trainable params: 0
_________________________________________________________________

Model interface

Keras models follow Scikit-learn's interface of fit/predict with some notable extensions. Let's take a tour.

In [33]:
# fit(X,y) comes with neat automatic logging.
#          Highly customizable under the hood.
model.fit(X_train, y_train,
          validation_data=(X_val, y_val), epochs=5);
Train on 50000 samples, validate on 10000 samples
Epoch 1/5
50000/50000 [==============================] - 18s - loss: 0.2522 - acc: 0.9258 - val_loss: 0.1204 - val_acc: 0.9647
Epoch 2/5
50000/50000 [==============================] - 17s - loss: 0.1047 - acc: 0.9677 - val_loss: 0.1039 - val_acc: 0.9703
Epoch 3/5
50000/50000 [==============================] - 17s - loss: 0.0716 - acc: 0.9774 - val_loss: 0.0953 - val_acc: 0.9719
Epoch 4/5
50000/50000 [==============================] - 17s - loss: 0.0539 - acc: 0.9827 - val_loss: 0.0845 - val_acc: 0.9771
Epoch 5/5
50000/50000 [==============================] - 19s - loss: 0.0429 - acc: 0.9860 - val_loss: 0.0716 - val_acc: 0.9800
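Side note: fit() also returns a History object whose .history dict stores the per-epoch metrics printed above. If you capture it (e.g. by re-running the cell as below), you can plot the learning curves yourself; the key names 'loss'/'val_loss' match the log above:

In [ ]:
# capture the History object returned by fit() and plot learning curves
history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val), epochs=5)
plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='val loss')
plt.xlabel('epoch')
plt.legend();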
In [34]:
# estimate probabilities P(y|x)
model.predict_proba(X_val[:2])
2/2 [==============================] - 0s
Out[34]:
array([[  3.40621086e-14,   1.71583245e-08,   3.21689953e-07,
          9.99998808e-01,   6.88375789e-13,   1.19103518e-07,
          4.41528494e-14,   1.29938102e-10,   7.46766887e-07,
          4.22223589e-09],
       [  4.74102357e-08,   6.38726902e-08,   6.47961815e-06,
          1.76842204e-05,   1.48537280e-08,   2.51152346e-06,
          3.54826454e-07,   1.78500219e-08,   9.99971271e-01,
          1.57341390e-06]], dtype=float32)
In [35]:
# Save the trained model (architecture + weights + optimizer state)
model.save("weights.h5")
In [36]:
print("\nLoss, Accuracy = ", model.evaluate(X_test, y_test))
 9920/10000 [============================>.] - ETA: 0s
Loss, Accuracy =  [0.080907465473318008, 0.97829999999999995]

Whoops!

So far our model is staggeringly inefficient. There is something wrong with it. Guess what?

In [37]:
# Test score...
test_predictions = model.predict_proba(X_test).argmax(axis=-1)
test_answers = y_test.argmax(axis=-1)

test_accuracy = np.mean(test_predictions==test_answers)

print("\nTest accuracy: {} %".format(test_accuracy*100))

assert test_accuracy>=0.92,"Logistic regression can do better!"
assert test_accuracy>=0.975,"Your network can do better!"
print("Great job!")
 9984/10000 [============================>.] - ETA: 0s
Test accuracy: 97.83 %
Great job!
In [38]:
answer_submitter = grading.Grader("0ybD9ZxxEeea8A6GzH-6CA")
answer_submitter.set_answer("N56DR", test_accuracy)
In [39]:
answer_submitter.submit(<your-email>, <your-assignment-token>)
Submitted to Coursera platform. See results on assignment page!

Keras + TensorBoard

Remember the interactive graphs from TensorBoard one notebook ago?

The thing is, Keras can use TensorBoard to show you a lot of useful information about the training progress. Just take a look!

In [ ]:
! rm -r /tmp/tboard/**
In [ ]:
from keras.callbacks import TensorBoard
model.fit(X_train, y_train, validation_data=(X_val, y_val), 
          epochs=10,
          callbacks=[TensorBoard("/tmp/tboard")])
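To actually view the dashboards, launch TensorBoard pointed at the same log directory (preferably from a regular terminal, since the command keeps running) and open the printed URL, typically http://localhost:6006, in your browser:

In [ ]:
! tensorboard --logdir=/tmp/tboard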

Tips & tricks

Here are some tips on what you could do. Don't worry, to reach the passing threshold you don't need to try all the ideas listed here, feel free to stop once you reach the 0.975 accuracy mark.

  • Network size

    • More neurons,
    • More layers (see the Keras layers docs)
    • Nonlinearities in the hidden layers
      • tanh, relu, leaky relu, etc.
    • Larger networks may take more epochs to train, so don't discard your net just because it didn't beat the baseline in 5 epochs.
  • Early Stopping
    • Training for 100 epochs regardless of anything is probably a bad idea.
    • Some networks converge after 5 epochs, others after 500.
    • Way to go: stop once the validation score hasn't improved for, say, 10 epochs past its maximum (see the EarlyStopping sketch after this list).
  • Faster optimization
    • rmsprop, nesterov momentum, adam, adagrad and so on.
      • These converge faster and sometimes reach better optima.
      • It might make sense to tweak the learning rate, momentum, other optimizer parameters, batch size and number of epochs.
  • Regularize to prevent overfitting, e.g. with dropout or weight decay (see the Dropout sketch after this list)
  • Data augmentation - getting a dataset 5x as large for free is a great deal
    • https://keras.io/preprocessing/image/ (see the ImageDataGenerator sketch after this list)
    • Zoom in + crop = shift
    • Rotate + zoom (to remove black borders)
    • any other perturbations
    • A simple way to do that (if you have PIL/Image):
      • from scipy.misc import imrotate, imresize
      • and a bit of slicing
    • Stay realistic. There's usually no point in flipping dogs upside down, as that is not the way you usually see them.
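As mentioned in the early-stopping item above, Keras ships an EarlyStopping callback. A minimal sketch, reusing the model defined earlier (the patience value is just an example):

In [ ]:
from keras.callbacks import EarlyStopping

# stop once validation loss hasn't improved for 10 consecutive epochs
early_stop = EarlyStopping(monitor='val_loss', patience=10)
model.fit(X_train, y_train,
          validation_data=(X_val, y_val),
          epochs=100,
          callbacks=[early_stop])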
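For regularization, one common option is dropout between the dense layers. A minimal sketch (the 0.5 rate is just an illustrative choice):

In [ ]:
# same MLP as before, with Dropout regularizing the hidden activations
dropout_model = Sequential([
    ll.InputLayer([28, 28]),
    ll.Flatten(),
    ll.Dense(128, activation='relu'),
    ll.Dropout(0.5),
    ll.Dense(128, activation='relu'),
    ll.Dropout(0.5),
    ll.Dense(10, activation='softmax'),
])
dropout_model.compile("adam", "categorical_crossentropy", metrics=["accuracy"])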
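And a sketch of data augmentation with keras.preprocessing.image.ImageDataGenerator. It expects a channel axis, so the 28x28 images are temporarily reshaped to 28x28x1; the particular rotation/shift/zoom ranges are just examples:

In [ ]:
from keras.preprocessing.image import ImageDataGenerator

# ImageDataGenerator works on 4D tensors (samples, height, width, channels)
X_train_4d = X_train[..., None]

augmenter = ImageDataGenerator(rotation_range=10,
                               width_shift_range=0.1,
                               height_shift_range=0.1,
                               zoom_range=0.1)

# draw one randomly perturbed batch; drop the channel axis again so it
# matches the (28, 28) input shape of the model defined earlier
X_batch, y_batch = next(augmenter.flow(X_train_4d, y_train, batch_size=32))
model.train_on_batch(X_batch[..., 0], y_batch)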