Single Layer NN using TensorFlow

The following code classifies the MNIST dataset using a single-layer NN with a softmax activation function, a cross-entropy loss function and the mini-batch technique.

Softmax Function

\sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}
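A minimal NumPy sketch of the formula (NumPy is an assumption here; the post itself works in TensorFlow). Subtracting the largest score before exponentiating does not change the result, because the common factor cancels between numerator and denominator, but it keeps `np.exp` from overflowing.

```python
import numpy as np

def softmax(z):
    """Softmax over the last axis: exp(z_j) / sum_k exp(z_k)."""
    shifted = z - np.max(z, axis=-1, keepdims=True)  # numerical stability only
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs)        # approximately [0.659, 0.242, 0.099]
print(probs.sum())  # 1.0, up to floating-point rounding
```

The outputs are positive and sum to 1, so they can be read as class probabilities.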

Cross Entropy

Cross Entropy = -\sum_{i=1}^{n} Y'_{i} \log(Y_{i})

where Y' is the one-hot vector of correct labels and Y is the vector of predicted probabilities.
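For a one-hot label, every term except the one for the correct class is multiplied by 0, so the cross-entropy reduces to minus the log of the probability assigned to the correct class. A small NumPy illustration with made-up probabilities:

```python
import numpy as np

def cross_entropy(y_true, y_pred):
    """-sum_i y'_i * log(y_i) for a single example."""
    return -np.sum(y_true * np.log(y_pred))

y_true = np.array([0.0, 1.0, 0.0])        # correct class is index 1 (one-hot)
confident = np.array([0.05, 0.90, 0.05])  # high probability on the correct class
unsure = np.array([0.40, 0.30, 0.30])     # low probability on the correct class

print(cross_entropy(y_true, confident))   # -log(0.90), about 0.105
print(cross_entropy(y_true, unsure))      # -log(0.30), about 1.204
```

The less probability the model puts on the correct class, the larger the loss.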

Mini-batch Technique

We take a batch of 100 images in each iteration. There are two reasons to do this:

  1. Computing the gradient from a single image gives a noisy estimate, so the descent zig-zags. Averaging over 100 images gives a more reliable estimate of the true gradient.
  2. Processing 100 images at once turns the computation into matrix multiplications, which GPUs run in parallel very efficiently.
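To make reason 1 concrete, here is a small NumPy experiment on a hypothetical linear least-squares problem (not MNIST, and not part of the original post). It compares gradients computed from single examples and from batches of 100 against the gradient over the full dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem: linear model with squared-error loss
n_samples, n_features = 10_000, 5
feats = rng.normal(size=(n_samples, n_features))
true_w = rng.normal(size=n_features)
targets = feats @ true_w + rng.normal(scale=0.5, size=n_samples)
w = np.zeros(n_features)  # current, untrained weights

def grad(Xb, yb, w):
    """Gradient of 0.5 * mean((Xb @ w - yb) ** 2) with respect to w."""
    return Xb.T @ (Xb @ w - yb) / len(yb)

full_grad = grad(feats, targets, w)  # gradient over the whole dataset

def mean_error(batch_size, trials=500):
    """Average distance between a mini-batch gradient and the full gradient."""
    errors = []
    for _ in range(trials):
        idx = rng.choice(n_samples, size=batch_size, replace=False)
        errors.append(np.linalg.norm(grad(feats[idx], targets[idx], w) - full_grad))
    return float(np.mean(errors))

err_1 = mean_error(1)
err_100 = mean_error(100)
print(f"single example: {err_1:.3f}")
print(f"batch of 100:   {err_100:.3f}")
```

Expect the batch-of-100 error to be roughly a tenth of the single-example error, in line with the noise of an average shrinking like 1/sqrt(batch size).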
In [1]:
import os
import tensorflow as tf
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
%matplotlib inline

Load Data

The MNIST dataset is a dataset of handwritten digits. We download it using the MNIST helper from the TensorFlow examples.

Make sure to change the path to suit your setup.

In [2]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("tutorials/data/MNIST/", one_hot=True)
print("Extraction of images is complete.")
Extracting tutorials/data/MNIST/train-images-idx3-ubyte.gz
Extracting tutorials/data/MNIST/train-labels-idx1-ubyte.gz
Extracting tutorials/data/MNIST/t10k-images-idx3-ubyte.gz
Extracting tutorials/data/MNIST/t10k-labels-idx1-ubyte.gz
Extraction of images is complete.

TensorFlow Placeholders

TensorFlow placeholders are like variables waiting for input. They are entry points into the computational graph: at run time we feed actual data into them.

In [3]:
X = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
init = tf.global_variables_initializer()


Our model applies a softmax function to the weighted sum of the image pixels plus a bias.
Y = softmax(X.W + b)

X = flattened image vector of shape (1, 784)

W = weight matrix of shape (784, 10): one row for each of the 784 pixels and one column for each class [0-9]

b = bias vector with 10 entries, one for each class

Role of Mini-batch

Since we are using the mini-batch technique, X is actually a matrix with 100 rows, each row being one flattened image. This is why the placeholder has shape [None, 784]: the first dimension can take any batch size.

In [4]:
Y = tf.nn.softmax(tf.matmul(X, W) + b)
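To see the shapes involved, here is a NumPy sketch of the same forward pass on a hypothetical batch of 100 "images" (random numbers standing in for pixel values, an assumption for illustration). Adding `b` to a (100, 10) matrix broadcasts the same 10 biases to every row.

```python
import numpy as np

rng = np.random.default_rng(0)

batch = rng.random((100, 784))  # stand-in for 100 flattened 28x28 images
W = np.zeros((784, 10))         # same zero initialisation as tf.zeros([784, 10])
b = np.zeros(10)                # one bias per class

logits = batch @ W + b          # (100, 784) @ (784, 10) -> (100, 10); b is broadcast to every row
exps = np.exp(logits - logits.max(axis=1, keepdims=True))
Y = exps / exps.sum(axis=1, keepdims=True)  # softmax applied row by row

print(logits.shape)  # (100, 10)
print(Y[0])          # every class gets 0.1 before training, since W and b are all zeros
```

Each of the 100 rows of `Y` is a probability distribution over the 10 digit classes.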

Placeholder for correct Label

We need a placeholder to hold the correct labels, which we use to compute the accuracy and cross-entropy of our model.

In [5]:
Y_ = tf.placeholder(tf.float32, [None, 10])
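Because the data was loaded with `one_hot=True`, each label is a length-10 vector with a single 1 at the index of the correct digit, which is why `Y_` has shape `[None, 10]`. A NumPy sketch of that encoding on a few hypothetical labels:

```python
import numpy as np

digits = np.array([3, 0, 7])   # hypothetical integer labels
one_hot = np.eye(10)[digits]   # row k of the identity matrix encodes class k

print(one_hot.shape)           # (3, 10)
print(one_hot[0])              # a 1 at index 3, zeros elsewhere
print(one_hot.argmax(axis=1))  # recovers the original labels 3, 0, 7
```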

Loss Function

As discussed above, we use the cross-entropy loss function:
Cross Entropy = -\sum_{i=1}^{n} Y'_{i} \log(Y_{i})

In [6]:
cross_entropy = -tf.reduce_sum(Y_ * tf.log(Y))
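One caveat with taking the log of the softmax output directly: if a predicted probability underflows to exactly 0, its log is `-inf` and the loss can become `nan`. The NumPy sketch below (an illustration, not the TensorFlow code itself) compares that recipe with a log-softmax computed via the log-sum-exp trick. TensorFlow's `tf.nn.softmax_cross_entropy_with_logits`, which takes the raw scores `tf.matmul(X, W) + b` instead of `Y`, is the usual way to avoid the problem. Note also that `tf.reduce_sum` adds the loss up over the whole batch rather than averaging it.

```python
import numpy as np

def naive_xent(logits, labels):
    """Softmax, then log: the same recipe as -reduce_sum(Y_ * log(Y))."""
    with np.errstate(over="ignore", divide="ignore", invalid="ignore"):
        exps = np.exp(logits)
        probs = exps / exps.sum(axis=1, keepdims=True)
        return -np.sum(labels * np.log(probs))

def stable_xent(logits, labels):
    """Log-softmax via the log-sum-exp trick, so no probability is rounded to 0."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.sum(labels * log_probs)

labels = np.array([[1.0, 0.0]])
moderate = np.array([[2.0, 1.0]])
extreme = np.array([[0.0, 1000.0]])  # the model is very confident and very wrong

print(naive_xent(moderate, labels), stable_xent(moderate, labels))  # both about 0.313
print(naive_xent(extreme, labels), stable_xent(extreme, labels))    # nan vs 1000.0
```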

Accuracy: % of correct answers in a batch

Graph nodes to compute the accuracy of our model

In [7]:
is_correct = tf.equal(tf.argmax(Y_, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))
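Taking the argmax along axis 1 picks the index of the largest value in each row, i.e. the predicted (or true) class, and the mean of the resulting 0/1 matches is the accuracy. The same computation in NumPy on four made-up predictions:

```python
import numpy as np

Y_true = np.array([[0, 0, 1],    # class 2
                   [1, 0, 0],    # class 0
                   [0, 1, 0],    # class 1
                   [0, 0, 1]])   # class 2
Y_pred = np.array([[0.1, 0.2, 0.7],   # predicts 2: correct
                   [0.6, 0.3, 0.1],   # predicts 0: correct
                   [0.5, 0.4, 0.1],   # predicts 0: wrong
                   [0.2, 0.2, 0.6]])  # predicts 2: correct

is_correct = np.argmax(Y_true, axis=1) == np.argmax(Y_pred, axis=1)
accuracy = is_correct.astype(np.float32).mean()  # cast booleans to 0/1, then average
print(is_correct)  # True, True, False, True
print(accuracy)    # 0.75
```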


We use plain gradient descent as the optimizer, with a learning rate of 0.003. At every step, each weight is updated by subtracting 0.003 times the gradient of the loss with respect to that weight: W ← W - 0.003 · ∂L/∂W.

Why Gradient?

The gradient points in the direction in which the loss increases fastest, so a small step in the opposite direction (along the negative gradient) decreases the loss. Repeating this step moves the weights towards a minimum of the loss function.

In [8]:
optimizer = tf.train.GradientDescentOptimizer(0.003)
train_step = optimizer.minimize(cross_entropy)
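A one-variable sketch of the update rule w ← w - η · f'(w), on a toy loss f(w) = (w - 3)^2 chosen purely for illustration, with the same learning rate of 0.003. Each step multiplies the distance to the minimum by (1 - 2η) = 0.994, so after 2000 steps w ends up within about 2·10^-5 of 3.

```python
def f(w):
    return (w - 3.0) ** 2       # toy loss with its minimum at w = 3

def grad_f(w):
    return 2.0 * (w - 3.0)      # derivative of f

learning_rate = 0.003           # same value as in the post
w = 0.0
for step in range(2000):
    w -= learning_rate * grad_f(w)  # step against the gradient

print(w, f(w))  # w is very close to 3 and the loss is close to 0
```

`GradientDescentOptimizer` applies the same rule, only to every entry of `W` and `b` at once, with gradients computed automatically from the graph.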


According to the definition in the TensorFlow documentation:

A Session object encapsulates the environment in which Operation objects are executed, and Tensor objects are evaluated.

Loosely speaking, the session is the main function of our computational graph: nothing is computed until we call sess.run, which evaluates the requested nodes along with every node they depend on.

In [9]:
sess = tf.Session()


Time to train the system. We run 2000 iterations of the training step. After each one, we record the accuracy and cross-entropy on the current training batch and on the full test set, to see how the model performs on unseen data.

In [10]:
# no. of iterations
n_iter = 2000

# test set
test_data = {X: mnist.test.images, Y_: mnist.test.labels}

# lists to hold train accuracy and cross-entropy
acc_train_li = []
cross_train_li = []

# lists to hold test accuracy and cross-entropy
acc_test_li = []
cross_test_li = []

# initialise W and b by running the init op defined earlier
sess.run(init)

for i in range(n_iter):
    # load a batch of 100 images and their correct labels
    batch_X, batch_Y = mnist.train.next_batch(100)
    train_data = {X: batch_X, Y_: batch_Y}
    # train
    sess.run(train_step, feed_dict=train_data)
    # find accuracy and cross-entropy on the current batch
    a, c = sess.run([accuracy, cross_entropy], feed_dict=train_data)
    acc_train_li.append(a)
    cross_train_li.append(c)
    # find accuracy and cross-entropy on the test data
    a, c = sess.run([accuracy, cross_entropy], feed_dict=test_data)
    acc_test_li.append(a)
    cross_test_li.append(c)
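For readers without a TensorFlow 1.x setup, here is a NumPy re-implementation of the same kind of loop on a small synthetic dataset (three 2-D Gaussian blobs, an assumption standing in for MNIST): softmax outputs, summed cross-entropy, plain gradient descent with learning rate 0.003 and batches of 100. It is a sketch of the technique, not the author's code. The gradients are written out by hand, using the fact that for softmax with cross-entropy the derivative of the loss with respect to the scores is Y - Y_.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-in for MNIST: three 2-D Gaussian blobs, one per class
n_classes, n_per_class = 3, 500
centers = np.array([[0.0, 2.0], [2.0, 0.0], [-2.0, -2.0]])
X_all = np.vstack([rng.normal(c, 0.5, size=(n_per_class, 2)) for c in centers])
labels = np.repeat(np.arange(n_classes), n_per_class)
Y_all = np.eye(n_classes)[labels]  # one-hot targets, like one_hot=True

W = np.zeros((2, n_classes))       # zero init, as in the post
b = np.zeros(n_classes)
learning_rate, batch_size, n_iter = 0.003, 100, 2000

def softmax(z):
    exps = np.exp(z - z.max(axis=1, keepdims=True))
    return exps / exps.sum(axis=1, keepdims=True)

acc_li, loss_li = [], []
for i in range(n_iter):
    # sample a mini-batch of 100 examples
    idx = rng.choice(len(X_all), size=batch_size, replace=False)
    batch_X, batch_Y = X_all[idx], Y_all[idx]

    Y = softmax(batch_X @ W + b)
    # summed cross-entropy over the batch; 1e-12 guards against log(0)
    loss_li.append(-np.sum(batch_Y * np.log(Y + 1e-12)))
    acc_li.append(np.mean(Y.argmax(axis=1) == batch_Y.argmax(axis=1)))

    # for softmax + summed cross-entropy, d(loss)/d(scores) = Y - Y_
    grad_scores = Y - batch_Y
    W -= learning_rate * batch_X.T @ grad_scores
    b -= learning_rate * grad_scores.sum(axis=0)

final_acc = np.mean(softmax(X_all @ W + b).argmax(axis=1) == labels)
print(f"first batch loss: {loss_li[0]:.2f}, last batch loss: {loss_li[-1]:.2f}")
print(f"accuracy on all {len(X_all)} points: {final_acc:.3f}")
```

The `acc_li` and `loss_li` lists play the same role as `acc_train_li` and `cross_train_li` in the TensorFlow loop. The first loss is 100 · ln 3, since zero weights give every class probability 1/3.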

Plot the graph

We plot 2 graphs:

  1. Accuracy graph – To display the accuracy on the train data and test data
  2. Cross-entropy graph – To display the cross-entropy loss on train and test data
In [11]:
x = list(range(n_iter))

blue_patch = mpatches.Patch(color='blue', label='Train Data')
red_patch = mpatches.Patch(color='red', label='Test Data')

plt.figure(0, figsize=(10, 12))

# top panel: accuracy on train and test data
plt.subplot(2, 1, 1)
plt.title("Accuracy")
plt.plot(x, acc_train_li, color='blue')
plt.plot(x, acc_test_li, color='red')
plt.legend(handles=[blue_patch, red_patch])

# bottom panel: cross-entropy loss on train and test data
plt.subplot(2, 1, 2)
plt.title("Cross-Entropy Loss")
plt.plot(x, cross_train_li, color='blue')
plt.plot(x, cross_test_li, color='red')
plt.legend(handles=[blue_patch, red_patch])

plt.show()

Final Loss and Accuracy

Let’s have a peek at the final loss and accuracy of the training and test sets

In [12]:
print('Train Set Accuracy: {} \t Train Set cross-entropy Loss: {}'.format(acc_train_li[-1], cross_train_li[-1]))
print('Test Set Accuracy: {} \t Test Set cross-entropy Loss: {}'.format(acc_test_li[-1], cross_test_li[-1]))
Train Set Accuracy: 0.9200000166893005 	 Train Set cross-entropy Loss: 25.540407180786133
Test Set Accuracy: 0.921000063419342 	 Test Set cross-entropy Loss: 2800.803466796875
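The test cross-entropy looks about 100 times larger than the training value, but that comes from `cross_entropy` being a sum rather than a mean: the training figure covers one batch of 100 images, while the test figure covers all 10,000 MNIST test images. A quick check using the printed numbers:

```python
# Values copied from the output above
train_loss, train_batch = 25.540407180786133, 100   # summed over one batch of 100 images
test_loss, test_size = 2800.803466796875, 10_000    # summed over the whole test set
test_acc = 0.921000063419342

print(train_loss / train_batch)           # about 0.255 per image
print(test_loss / test_size)              # about 0.280 per image
print(round((1 - test_acc) * test_size))  # 790 misclassified test images
```

Per image, the two losses are close, and 92.1% test accuracy corresponds to about 790 of the 10,000 test digits being misread.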


Using a single-layer neural network gave an accuracy of approximately 92%. If this system were used in a post office to read handwritten numbers, the results could be devastating.
Why? Because, according to our findings, it would misread about 8 out of every 100 images.

The code is available in this repository.

Found an issue? Or just want to say hello? You can contact me on my website

