Handwritten Digit Recognition
using Multilayer Neural Network
V Sugamya Katta
3/4 IT, CBIT 3/4
Recognition of handwriting may seem a very easy task for humans, but it is a
very complex one for a machine. It is unproductive for humans to spend a lot of
time recognizing characters in order to analyze collected data; the main focus
should be on analyzing the data rather than on recognizing the characters.
Manual recognition may also give inconsistent results, since it varies from
person to person, so it is both inaccurate and costly in time and energy.
Algorithms using neural networks have made this task a lot easier and more
accurate. In this paper, we discuss the recognition of handwritten digits taken
from the MNIST data set and check the accuracy of our implementation. This is
done by training a neural network using stochastic gradient descent and
backpropagation.
Keywords: Handwriting recognition, Backpropagation, Mini batch Stochastic Gradient Descent
Handwriting is a form of writing peculiar to a person, with variations in the
size and shape of letters and in the spacing between them. There are different
styles of handwriting, including cursive, block letters, calligraphy and
signatures. This makes the task of recognizing handwritten characters complex
when using traditional rule-based programming. The task becomes more natural
when it is approached from a machine learning perspective using neural
networks. According to Tom Mitchell, “A computer program is said to learn from
experience E with respect to some class of tasks T and performance measure P,
if its performance at tasks in T, as measured by P, improves with experience
E.” [1]

A neural network consists of neurons, which are simple processing units, and
directed, weighted connections between these neurons. For a neuron j, the
propagation function receives the outputs of other neurons and transforms them,
in consideration of the connection weights, into the network input that can be
further processed by the activation function. [2]

Mini batch gradient descent, used in this paper, is a combination of the batch
gradient descent and stochastic gradient descent algorithms. It splits the data
set into small batches and calculates the model error, and from it the
parameter update, for each batch. The backpropagation algorithm is used in this
paper for adjusting the weights in the neural network. It compares the actual
output with the desired output for a given input and calculates an error value,
and the weights are adjusted based on this error value. The error is first
calculated at the output layer and then distributed backwards to the other
layers.
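
The mini batch update described above can be written compactly. The notation here is introduced only for illustration and does not come from the paper: η is the learning rate, m is the number of examples in the mini batch, and C_x is the cost for a single training example x. A standard form of the update for a weight w and a bias b is:

```latex
w \;\rightarrow\; w - \frac{\eta}{m} \sum_{x \in \text{batch}} \frac{\partial C_x}{\partial w},
\qquad
b \;\rightarrow\; b - \frac{\eta}{m} \sum_{x \in \text{batch}} \frac{\partial C_x}{\partial b}
```

With m = 1 this reduces to plain stochastic gradient descent, and with m equal to the size of the whole training set it reduces to batch gradient descent. Backpropagation is the procedure that supplies the partial derivatives on the right-hand side.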
Python is a dynamic, interpreted language. There
are no type declarations of variables, parameters, functions, or methods in
source code. This makes the code short and flexible, but you lose the
compile-time type checking of the source code. Python tracks the types of all
values at runtime and flags code that does not make sense as it runs.
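
A small illustration of this behaviour is sketched below. The function name `add` is hypothetical and chosen only for the example: the same untyped function works for integers and for strings, and a mismatched call is flagged only when it actually runs.

```python
def add(a, b):
    # No type declarations: any operands that support "+" are accepted.
    return a + b


print(add(2, 3))        # works for integers
print(add("ab", "cd"))  # works for strings as well

try:
    add(2, "cd")        # the mismatch is detected only when this line executes
except TypeError as exc:
    print("runtime type error:", exc)
```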
• Open source general-purpose language.
• Object Oriented, Procedural, Functional
• Easy to interface with
• Fairly easy to interface with C++ (via
• Great interactive environment
Digit recognition is done by training a multi-layer feedforward neural network
using mini batch stochastic gradient descent and the backpropagation algorithm.
The MNIST data set obtained from [3] contains a modified version of the
original training set of 60,000 images. The original training set is split into
a training set with 50,000 examples and a validation set with 10,000 examples.
The training set is then used to train the neural network. Each image is
represented as a 1-dimensional NumPy array of 784 float values between 0 and 1.
The labels are numbers between 0 and 9 indicating which digit the image
represents.
Figure 1 MNIST Data Set [4]
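
The data layout described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the loading code used for the paper: the helper `split_training_set` is hypothetical, and random arrays stand in for the real images at one hundredth of the real size (600 rows split 500/100 instead of 60,000 split 50,000/10,000).

```python
import numpy as np


def split_training_set(images, labels, n_train):
    """Split the data into a training part and a validation part at row n_train."""
    training = (images[:n_train], labels[:n_train])
    validation = (images[n_train:], labels[n_train:])
    return training, validation


rng = np.random.default_rng(0)

# Random stand-in data (assumption): 600 "images" of 784 floats in [0, 1).
images = rng.random((600, 784))
labels = rng.integers(0, 10, size=600)   # digit labels 0..9

(train_x, train_y), (valid_x, valid_y) = split_training_set(images, labels, 500)
print(train_x.shape, valid_x.shape)      # (500, 784) (100, 784)

# A flat array of 784 values can be viewed as a 28 x 28 pixel grid for display.
first_image = train_x[0].reshape(28, 28)
```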
An artificial neural network with sigmoid neurons is implemented. Therefore, the
output of each neuron is calculated using the sigmoid function.
The output of each neuron is given as
$$\sigma(w \cdot x + b) = \frac{1}{1 + e^{-(w \cdot x + b)}}$$
where w is the weight, b is the bias and x is the input.
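
As a concrete illustration, the output of a single sigmoid neuron can be computed as below. The function names `sigmoid` and `neuron_output` and the sample values are assumptions made for this sketch.

```python
import numpy as np


def sigmoid(z):
    """Sigmoid activation: squashes any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def neuron_output(w, x, b):
    """Output of one sigmoid neuron with weight vector w, input vector x and bias b."""
    return sigmoid(np.dot(w, x) + b)


w = np.array([0.5, -0.5])   # illustrative weights
x = np.array([1.0, 1.0])    # illustrative inputs
b = 0.0                     # illustrative bias

# The weighted input is 0.5 - 0.5 + 0.0 = 0, and sigmoid(0) = 0.5.
print(neuron_output(w, x, b))   # 0.5
```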
Initially, the weights and biases of the neural network are initialized
randomly using a Gaussian distribution. They are later adjusted by applying
mini batch stochastic gradient descent and backpropagation.
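
A minimal sketch of such an initialization is shown below. The layer sizes are assumptions: 784 inputs and 10 outputs follow from the data set, but the hidden layer size of 30 is chosen here only for illustration and is not stated in the paper. A standard normal distribution (mean 0, standard deviation 1) is also assumed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes: 784 input pixels, 30 hidden neurons (assumed), 10 output digits.
sizes = [784, 30, 10]

# One bias column vector per non-input layer.
biases = [rng.standard_normal((n, 1)) for n in sizes[1:]]

# One weight matrix per pair of adjacent layers, shaped (neurons_out, neurons_in).
weights = [rng.standard_normal((n_out, n_in))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]

for w, b in zip(weights, biases):
    print(w.shape, b.shape)
```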
In each epoch, the training data is shuffled and split into mini batches of a
fixed size, and gradient descent is applied to each mini batch. The neural
network is trained for a number of epochs. The labels generated for the
training data in each epoch are compared to the actual labels and the cost
function is calculated. The gradient of the cost function is calculated using
the backpropagation algorithm. This gradient is then used to update the weights
and biases of the neural network: starting from the output layer and moving
backwards, the biases and the weights of the connections are adjusted. The
digits are labelled based on which neuron has the highest activation out of the
output layer neurons.
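
The training procedure described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: a quadratic cost is assumed because the paper does not name its cost function, all function names are hypothetical, and a toy data set of four one-hot inputs stands in for MNIST.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_network(sizes, rng):
    """Gaussian-initialized weights and biases for the given layer sizes."""
    weights = [rng.standard_normal((n_out, n_in))
               for n_in, n_out in zip(sizes[:-1], sizes[1:])]
    biases = [rng.standard_normal((n_out, 1)) for n_out in sizes[1:]]
    return weights, biases


def feedforward(weights, biases, x):
    """x has shape (n_inputs, n_examples); returns the output layer activations."""
    a = x
    for w, b in zip(weights, biases):
        a = sigmoid(w @ a + b)
    return a


def backprop(weights, biases, x, y):
    """Gradients of the (assumed) quadratic cost, summed over the columns of x."""
    activations = [x]
    for w, b in zip(weights, biases):
        activations.append(sigmoid(w @ activations[-1] + b))
    # Error at the output layer: (a - y) * sigmoid'(z), with sigmoid'(z) = a (1 - a).
    out = activations[-1]
    delta = (out - y) * out * (1.0 - out)
    grad_w, grad_b = [], []
    # Move backwards through the layers, distributing the error.
    for layer in range(len(weights) - 1, -1, -1):
        grad_w.insert(0, delta @ activations[layer].T)
        grad_b.insert(0, delta.sum(axis=1, keepdims=True))
        if layer > 0:
            a_prev = activations[layer]
            delta = (weights[layer].T @ delta) * a_prev * (1.0 - a_prev)
    return grad_w, grad_b


def sgd(weights, biases, train_x, train_y, epochs, batch_size, eta, rng):
    """Mini batch stochastic gradient descent; updates weights and biases in place."""
    n = train_x.shape[1]
    for _ in range(epochs):
        order = rng.permutation(n)                 # reshuffle in every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]  # one mini batch
            gw, gb = backprop(weights, biases, train_x[:, idx], train_y[:, idx])
            for i in range(len(weights)):
                weights[i] -= (eta / len(idx)) * gw[i]
                biases[i] -= (eta / len(idx)) * gb[i]


def predict(weights, biases, x):
    """Label each input by the output neuron with the highest activation."""
    return np.argmax(feedforward(weights, biases, x), axis=0)


rng = np.random.default_rng(0)
toy_x = np.eye(4)   # toy stand-in for MNIST: four one-hot inputs (columns)
toy_y = np.eye(4)   # desired outputs: each input belongs to its own class
weights, biases = init_network([4, 8, 4], rng)
sgd(weights, biases, toy_x, toy_y, epochs=200, batch_size=2, eta=3.0, rng=rng)
print(predict(weights, biases, toy_x))
```

Examples are stored as columns so that a whole mini batch passes through the network in one matrix product. For MNIST, the same functions would be called with layer sizes starting at 784 and ending at 10, and with the one-hot encoded labels as `train_y`.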
After each epoch of training, the network is tested using the 10,000 test
images. The labels generated by the neural network are compared to the class
labels given in the MNIST test data, and the number of correctly generated
labels is counted.
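
The counting step can be sketched as below. The function name `count_correct` and the tiny hand-made example (three output neurons and four images instead of ten neurons and 10,000 images) are assumptions for illustration.

```python
import numpy as np


def count_correct(output_activations, true_labels):
    """Count images whose most active output neuron matches the true label.

    output_activations has shape (n_classes, n_images), one column per image.
    """
    predicted = np.argmax(output_activations, axis=0)
    return int(np.sum(predicted == true_labels))


# Hand-made output activations for four images.
outputs = np.array([
    [0.90, 0.10, 0.20, 0.30],   # activations of output neuron 0
    [0.05, 0.80, 0.10, 0.60],   # activations of output neuron 1
    [0.05, 0.10, 0.70, 0.10],   # activations of output neuron 2
])
true_labels = np.array([0, 1, 2, 2])

correct = count_correct(outputs, true_labels)
accuracy = 100.0 * correct / len(true_labels)
print(f"{correct} of {len(true_labels)} correct ({accuracy:.2f} %)")   # 3 of 4 correct (75.00 %)
```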
The reported results are obtained when the number of epochs is set to 30, the
mini batch size is 10 and the learning rate is 3.0. The accuracy is the number
of correctly identified images out of the 10,000 test images in the MNIST data
set. The given results are the best out of five trials.
The accuracy peaks at 95.00% at the 28th epoch. It increases rapidly over the
first epochs, then levels off and remains at approximately the same value for
the remaining epochs.
Figure 2 Results
Neural networks are an effective technique for the identification of
handwritten digits. The accuracy of a neural network in handwriting recognition
is quite high, and higher accuracy can still be achieved by optimizing certain
parameters. In the current implementation using mini batch stochastic gradient
descent and backpropagation, an accuracy of 95% was obtained in one of the
trial runs.
[1] Machine Learning: Hands-On for Developers and Technical Professionals by
Jason Bell, publisher John Wiley & Sons, 2015, pages 1-2
[2] A Brief Introduction to Neural Networks by David Kriesel:
http://www.dkriesel.com/_media/science/neuronalenetze-en-zeta2-2col-dkrieselcom.pdf