# Single Layer NN using TensorFlow

The following code classifies the MNIST dataset using a single-layer neural network with a softmax activation function, a cross-entropy loss function, and the mini-batch technique.

## Softmax Function

$\sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}$
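
To make the formula concrete, here is a minimal pure-Python sketch of softmax: each score is exponentiated and divided by the sum of the exponentials of all K scores. The function name and the three example scores are made up for illustration and are not part of the notebook.

```python
import math

def softmax(z):
    """Return softmax probabilities for a list of scores z."""
    # Subtracting the largest score avoids overflow in exp()
    # and does not change the result.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)  # the same denominator is shared by every class
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]  # made-up scores for three classes
probs = softmax(scores)
print(probs)       # the largest score receives the largest probability
print(sum(probs))  # the probabilities add up to 1 (up to rounding)
```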

## Cross Entropy

$Cross Entropy = -\sum_{i=1}^{n} Y'_i \log(Y_i)$

Here $Y'_i$ is the correct label and $Y_i$ is the predicted probability for the $i$-th entry.
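
As a small worked example, the sketch below evaluates the cross-entropy for a single one-hot label against two made-up predictions. It assumes one-hot labels, so only the probability assigned to the correct class contributes; the function name and numbers are illustrative only.

```python
import math

def cross_entropy(y_true, y_pred):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    # Terms with a zero label contribute nothing, so they are skipped
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)

label = [0.0, 0.0, 1.0]          # the correct class is index 2
confident = [0.05, 0.05, 0.90]   # made-up prediction, mostly right
unsure = [0.40, 0.30, 0.30]      # made-up prediction, mostly wrong

loss_confident = cross_entropy(label, confident)
loss_unsure = cross_entropy(label, unsure)
print(loss_confident, loss_unsure)  # the confident, correct prediction has the lower loss
```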

## Mini-batch Technique

We take a batch of 100 images in each iteration. There are two reasons to use it:

1. Computing the gradient from a single image results in a noisy, zigzagging descent. Using 100 images at a time gives a more reliable consensus on the gradient.
2. Processing a whole batch turns the computation into large matrix multiplications, which GPUs are optimised for.
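
The first point can be illustrated with a toy simulation. It assumes each image yields a noisy estimate of some true gradient and that a batch estimate is the average of 100 such values; the noise level and the helper name are made up for the sketch.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def per_image_gradient():
    # Hypothetical noisy estimate of a true gradient of 1.0 (noise level is made up)
    return random.gauss(1.0, 2.0)

# 500 gradient estimates, each from a single image
single_estimates = [per_image_gradient() for _ in range(500)]

# 500 gradient estimates, each averaged over a batch of 100 images
batch_estimates = [
    statistics.mean([per_image_gradient() for _ in range(100)])
    for _ in range(500)
]

single_spread = statistics.stdev(single_estimates)
batch_spread = statistics.stdev(batch_estimates)
print(single_spread, batch_spread)  # the batch estimates scatter far less
```

With independent noise, averaging over 100 values shrinks the spread by roughly a factor of 10, which is why the batch-driven descent looks smoother.
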
In [1]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'
import tensorflow as tf
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
%matplotlib inline


MNIST is a dataset of handwritten digits. We download it using the TensorFlow examples module.

Make sure to change the data path to suit your setup.

In [2]:
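from tensorflow.examples.tutorials.mnist import input_data
# Download (if needed) and load MNIST with one-hot labels; adjust the path as needed
mnist = input_data.read_data_sets("tutorials/data/MNIST/", one_hot=True)
print("Extraction of images is complete.")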

Extracting tutorials/data/MNIST/train-images-idx3-ubyte.gz
Extracting tutorials/data/MNIST/train-labels-idx1-ubyte.gz
Extracting tutorials/data/MNIST/t10k-images-idx3-ubyte.gz
Extracting tutorials/data/MNIST/t10k-labels-idx1-ubyte.gz
Extraction of images is complete.


## TensorFlow Placeholders

TensorFlow placeholders are like variables waiting for input. They are the entry points of the computational graph, through which we feed data into it. The weights W and bias b, in contrast, are Variables that TensorFlow updates during training.

In [3]:
X = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
init = tf.global_variables_initializer()


## Model

Our model is a softmax function applied to the weighted sum of the image pixels plus a bias.
$Y = softmax(X \cdot W + b)$
where:

X = flattened image vector of shape (1, 784)

W = weight matrix of shape (784, 10): one row for each of the 784 pixels and one column for each class [0-9]

b = bias vector with 10 entries, one per class

## Role of Mini-batch

Since we are using the mini-batch technique, X becomes a matrix with 100 rows, each row holding one flattened image.
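
The shapes involved can be checked with a short NumPy sketch. NumPy is used here only as a stand-in for the TensorFlow graph, the random pixel values are made up, and the `_np` names are hypothetical (chosen so they do not clash with the notebook's variables).

```python
import numpy as np

batch_size, n_pixels, n_classes = 100, 784, 10

rng = np.random.default_rng(0)
X_np = rng.random((batch_size, n_pixels))  # stand-in for 100 flattened images
W_np = np.zeros((n_pixels, n_classes))     # zero-initialised, like W in the notebook
b_np = np.zeros(n_classes)                 # zero-initialised, like b in the notebook

logits = X_np @ W_np + b_np                # b_np is broadcast across all 100 rows

# Row-wise softmax: one probability distribution per image
exps = np.exp(logits - logits.max(axis=1, keepdims=True))
Y_np = exps / exps.sum(axis=1, keepdims=True)

print(logits.shape)  # (100, 10): one row per image, one column per class
print(Y_np[0])       # all-zero weights give a uniform 0.1 for every class
```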

In [4]:
Y = tf.nn.softmax(tf.matmul(X, W) + b)


## Placeholder for correct Label

We need a placeholder to hold the correct labels, which lets us compute the accuracy and cross-entropy of our model.

In [5]:
Y_ = tf.placeholder(tf.float32, [None, 10])


## Loss Function

As discussed above, we use the cross-entropy loss function:
$Cross Entropy = -\sum_{i=1}^{n} Y'_i \log(Y_i)$

In [6]:
cross_entropy = -tf.reduce_sum(Y_ * tf.log(Y))
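
One caveat with taking the log of the softmax output: if a probability underflows to exactly 0, its log is `-inf` and the loss can turn into NaN. A common workaround is to compute the log-probabilities directly from the pre-softmax scores with the log-sum-exp identity. The pure-Python sketch below shows the idea; the function name and the extreme scores are made up. TensorFlow 1.x also provides `tf.nn.softmax_cross_entropy_with_logits`, which takes the pre-softmax scores directly.

```python
import math

def cross_entropy_from_scores(y_true, z):
    """Cross-entropy computed from raw scores z, without forming softmax first."""
    m = max(z)
    # log(sum_k exp(z_k)), evaluated stably by factoring out the largest score
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in z))
    # log softmax(z)_j = z_j - log_sum_exp, which is always finite
    log_probs = [v - log_sum_exp for v in z]
    return -sum(t * lp for t, lp in zip(y_true, log_probs))

label = [0.0, 1.0, 0.0]
extreme_scores = [1000.0, 0.0, -1000.0]  # made-up scores, chosen to be extreme
loss_value = cross_entropy_from_scores(label, extreme_scores)
print(loss_value)  # large but finite
```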


## % of correct answers found in batch

Graph nodes to compute the accuracy of our model

In [7]:
# tf.argmax replaces the deprecated tf.arg_max
is_correct = tf.equal(tf.argmax(Y_, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))
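
The same computation can be followed by hand on a tiny example. The pure-Python sketch below uses four made-up images and three classes; the helper and variable names are hypothetical and deliberately differ from the graph nodes above.

```python
def argmax(row):
    """Index of the largest entry in a list."""
    return max(range(len(row)), key=lambda k: row[k])

# Made-up one-hot labels and predicted probabilities for four images
labels = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
preds = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.4, 0.3], [0.2, 0.5, 0.3]]

# 1.0 where the predicted class matches the label, 0.0 otherwise
matches = [float(argmax(t) == argmax(p)) for t, p in zip(labels, preds)]
acc = sum(matches) / len(matches)  # mean of the 0/1 values
print(matches, acc)  # [1.0, 1.0, 0.0, 1.0] 0.75
```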


## Optimizer

We use plain gradient descent as the optimizer, with a learning rate of 0.003. This means that at every step we subtract 0.003 times (0.3% of) the gradient from the weights.

The gradient points in the direction of steepest increase of the loss, so moving the weights in the opposite direction decreases the loss and brings them toward a minimum of the loss function.
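
The update rule can be sketched on a toy problem. The example below assumes a one-dimensional quadratic loss with its minimum at w = 3, which is made up for illustration and unrelated to the MNIST loss; only the learning rate and iteration count are taken from the notebook.

```python
learning_rate = 0.003  # same value as in the notebook

def toy_loss(w):
    # Hypothetical loss with its minimum at w = 3
    return (w - 3.0) ** 2

def toy_gradient(w):
    # Derivative of the toy loss
    return 2.0 * (w - 3.0)

w = 0.0
history = [toy_loss(w)]
for _ in range(2000):
    # Step against the gradient, scaled by the learning rate
    w = w - learning_rate * toy_gradient(w)
    history.append(toy_loss(w))

print(w, history[0], history[-1])  # w ends close to 3 and the loss shrinks
```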

In [8]:
optimizer = tf.train.GradientDescentOptimizer(0.003)
train_step = optimizer.minimize(cross_entropy)


## Session

According to the definition in the TensorFlow documentation:

A Session object encapsulates the environment in which Operation objects are executed, and Tensor objects are evaluated.

We can think of it as a kind of main function for our computational graph: nothing is computed until we ask the session to run a node, at which point it executes that node together with the nodes it depends on.

In [9]:
sess = tf.Session()
sess.run(init)


## Training

Time to train the system. We run 2000 iterations of the train step and compute the accuracy and cross-entropy on the current batch at each step. Alongside, we compute the accuracy and cross-entropy on the test set at each iteration to see how the model improves on unseen data.

In [10]:
# no. of iterations
n_iter = 2000

# test set
test_data = {X: mnist.test.images, Y_: mnist.test.labels}

# lists to hold train accuracy and cross-entropy
acc_train_li = []
cross_train_li = []

# lists to hold test accuracy and cross-entropy
acc_test_li = []
cross_test_li = []

for i in range(n_iter):
    # fetch the next mini-batch of 100 images and labels
    batch_X, batch_Y = mnist.train.next_batch(100)
    train_data = {X: batch_X, Y_: batch_Y}

    # train
    sess.run(train_step, feed_dict=train_data)

    # find accuracy and cross entropy on current data
    a, c = sess.run([accuracy, cross_entropy], feed_dict=train_data)
    acc_train_li.append(a)
    cross_train_li.append(c)

    # find accuracy and cross entropy on test data
    a, c = sess.run([accuracy, cross_entropy], feed_dict=test_data)
    acc_test_li.append(a)
    cross_test_li.append(c)
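
To see what the train step does under the hood, here is a rough NumPy sketch of the same mini-batch procedure. It is not the notebook's code: it runs on made-up synthetic data (three 2-D point clouds instead of MNIST), samples batches at random, and writes out the gradient of the summed cross-entropy by hand. All names in it are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for MNIST: three 2-D point clouds, one per class
centers = np.array([[2.0, 0.0], [-2.0, 0.0], [0.0, 2.0]])
class_ids = rng.integers(0, 3, size=900)
points = centers[class_ids] + 0.5 * rng.standard_normal((900, 2))
one_hot = np.eye(3)[class_ids]

def softmax_rows(z):
    # Row-wise softmax, shifted by the row maximum for numerical stability
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

weights = np.zeros((2, 3))
bias = np.zeros(3)
learning_rate = 0.003
losses = []

for step in range(300):
    # Draw a random mini-batch of 100 examples
    idx = rng.choice(900, size=100, replace=False)
    xb, yb = points[idx], one_hot[idx]

    probs = softmax_rows(xb @ weights + bias)
    losses.append(-np.sum(yb * np.log(probs)))  # summed cross-entropy

    # Gradient of the summed cross-entropy with respect to the scores
    grad_scores = probs - yb
    weights -= learning_rate * (xb.T @ grad_scores)
    bias -= learning_rate * grad_scores.sum(axis=0)

final_probs = softmax_rows(points @ weights + bias)
final_acc = np.mean(final_probs.argmax(axis=1) == class_ids)
print(losses[0], losses[-1], final_acc)
```

The first recorded loss equals 100 times log(3), because zero weights give every class the same probability; it drops as training proceeds.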


## Plot the graph

We plot 2 graphs:

1. Accuracy graph – To display the accuracy on the train data and test data
2. Cross-entropy graph – To display the cross-entropy loss on train and test data
In [11]:
x = list(range(n_iter))

blue_patch = mpatches.Patch(color='blue', label='Train Data')
red_patch = mpatches.Patch(color='red', label='Test Data')

plt.figure(0, figsize=(10, 12))

plt.subplot(211)
plt.title("Accuracy")
plt.legend(handles=[blue_patch, red_patch])
plt.plot(x, acc_train_li, color='blue')
plt.plot(x, acc_test_li, color='red')

plt.subplot(212)
plt.legend(handles=[blue_patch, red_patch])
plt.title("Cross-Entropy Loss")
plt.plot(x, cross_train_li, color='blue')
plt.plot(x, cross_test_li, color='red')

plt.show()


## Final Loss and Accuracy

Let’s have a peek at the final loss and accuracy of the training and test sets

In [12]:
print('Train Set Accuracy: {} \t Train Set cross-entropy Loss: {}'.format(acc_train_li[-1], cross_train_li[-1]))
print('Test Set Accuracy: {} \t Test Set cross-entropy Loss: {}'.format(acc_test_li[-1], cross_test_li[-1]))

Train Set Accuracy: 0.9200000166893005 	 Train Set cross-entropy Loss: 25.540407180786133
Test Set Accuracy: 0.921000063419342 	 Test Set cross-entropy Loss: 2800.803466796875
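
The two loss values are not directly comparable, because the loss is a sum over all images fed in: 100 for the last training batch, and the whole test set for the test figure. Dividing by the number of images puts them on the same scale. The arithmetic below assumes the standard MNIST test set of 10,000 images; the variable names are hypothetical.

```python
# Summed losses copied from the output above
train_loss_sum, train_batch_size = 25.540407180786133, 100
# Assumes the standard 10,000-image MNIST test set
test_loss_sum, test_set_size = 2800.803466796875, 10000

train_loss_per_image = train_loss_sum / train_batch_size
test_loss_per_image = test_loss_sum / test_set_size
print(train_loss_per_image, test_loss_per_image)  # roughly 0.26 and 0.28
```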


## Conclusion

Using a single-layer neural network resulted in an accuracy of approximately 92%. Using this system in a post office to read handwritten numbers could be devastating.
Why? Because, according to our findings, it will misread about 8 out of every 100 images.

Code available at this repository
