In spite of current technological advances, there are still no algorithms that
allow a computer to transcribe the content of any “difficult” handwritten document
(e.g. a historical document). The general handwriting recognition problem
presents many difficulties caused by interpersonal and intrapersonal
variations in writing, the cursive nature of handwriting, the use of
different pen types and the presence of paper with a noisy background. The
individuality of handwriting has been studied and established with scientific rigor.
Regarding the handwriting recognition problem, there are two variants: offline
and online recognition. The offline problem consists of recognizing
handwritten text that was previously written on paper and then digitized.
The online problem aims to recognize text written using some kind of
electronic device, whose sensors also record a set of dynamic measurements
of how the writing is produced (e.g. writing pressure, pen altitude and
azimuth, among others). In recent years, there has been more progress on the
online modality, but the offline one is still far from being solved in an
unrestricted manner.
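The difference between the two modalities can be made concrete with a minimal
Python sketch. The type names (PenSample, OnlineTrace, OfflineImage), the field
names and units, and the rasterize helper are illustrative assumptions, not part
of any dataset or system discussed here. The sketch only shows that an online
trace carries dynamic information that is lost once the text is reduced to an
image.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PenSample:
    """One sensor reading of an online trace (hypothetical fields and units)."""
    t_ms: int            # time stamp in milliseconds
    x: float             # pen position, horizontal
    y: float             # pen position, vertical
    pressure: float      # 0.0 means the pen is not touching the surface
    altitude_deg: float  # pen altitude angle
    azimuth_deg: float   # pen azimuth angle


@dataclass
class OnlineTrace:
    """Online sample: an ordered sequence of sensor readings."""
    samples: List[PenSample]


@dataclass
class OfflineImage:
    """Offline sample: only a pixel grid (rows of 0/1 values) is available."""
    pixels: List[List[int]]


def rasterize(trace: OnlineTrace, width: int, height: int) -> OfflineImage:
    """Reduce an online trace to a binary image, discarding all dynamics."""
    pixels = [[0] * width for _ in range(height)]
    for s in trace.samples:
        if s.pressure <= 0.0:
            continue  # pen-up readings leave no ink
        col = min(width - 1, max(0, int(s.x)))
        row = min(height - 1, max(0, int(s.y)))
        pixels[row][col] = 1
    return OfflineImage(pixels)


# Made-up readings, for illustration only.
trace = OnlineTrace([
    PenSample(0, 1.0, 2.0, 0.6, 55.0, 120.0),
    PenSample(10, 3.0, 0.0, 0.4, 54.0, 121.0),
    PenSample(20, 2.0, 1.0, 0.0, 54.0, 121.0),  # pen lifted
])
image = rasterize(trace, width=4, height=3)
```

Timing, pressure and pen angles are absent from the resulting image, which is
one way to see why the offline problem has less information to work with.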
Psychology can also benefit from research on handwriting style, since it
may be possible to identify correlations between handwriting and some
personality attributes of the writer. In the field of Human-Computer
Interaction, if the gender of a user can be predicted automatically,
computer applications could offer him/her a more personalized interaction
(e.g. gender-oriented advertising). Biometric Security can also benefit from
handwriting-based prediction, since its result can be combined with other
biometric modalities to improve security when accessing computer systems.
These handwriting-based demographic prediction problems include the
prediction of the gender, handedness, age range or even nationality of a person.
These supervised learning problems can be either binary or multi-class.
The most common binary problems are gender prediction (where
handwritten texts are classified as written by men or by women) and
handedness prediction (where handwritten texts are classified as produced by
right-handed or by left-handed writers). Among the multi-class problems, one
can discriminate among texts written by people belonging to different age
intervals, to specific human races or even to groups of nationalities. A
property of all these problems is that they can be either balanced (i.e.
approximately half of the population belongs to each class), as in the case of
gender classification, or unbalanced, as in the case of
handedness classification. In general, these demographic classification
problems are very complex, even for humans, since it is quite difficult to find
which handwriting features properly characterize each class involved. Gender
classification is an example of this. Although it is commonly accepted
that feminine writing is rounder and neater than masculine writing, there are
cases where masculine writing may have a “feminine” appearance and
vice versa. In this paper, we additionally aim to analyze the relationships
between the gender-related handwriting features.
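The practical consequence of class imbalance can be illustrated with a short
Python sketch. The 90/10 split between right-handed and left-handed writers is
a made-up figure chosen only for illustration, and balanced accuracy (the mean
of the per-class recalls) is a standard metric introduced here as an assumption;
it is not claimed to be the measure used in this work.

```python
from collections import Counter


def class_proportions(labels):
    """Fraction of samples that belongs to each class."""
    counts = Counter(labels)
    return {cls: n / len(labels) for cls, n in counts.items()}


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label is correct."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so every class weighs the same."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical unbalanced handedness labels (illustrative numbers only).
handedness = ["right"] * 90 + ["left"] * 10

# A trivial "classifier" that always answers with the majority class.
always_right = ["right"] * len(handedness)

proportions = class_proportions(handedness)
plain = accuracy(handedness, always_right)
balanced = balanced_accuracy(handedness, always_right)
```

Under these assumed numbers the trivial predictor reaches a high plain accuracy
although it never recognizes a left-handed writer, while its balanced accuracy
stays at chance level. In a balanced problem such as gender classification the
two measures roughly coincide.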
There are relatively few works in the literature on these problems, whose
automatic solution has only recently begun to be investigated.
One important difficulty is that few handwriting databases include
annotated demographic information about the writers. Other aspects that hinder
this problem are similar to those of the general handwriting
recognition problem (e.g. the cursive nature of handwriting).
Neural networks have been applied for many years to the
analysis of high-dimensional, nonlinear and complex classification problems,
such as automatic handwriting recognition. The handwriting problem
has been investigated for many years using different types of neural networks,
for both the online and offline cases, and also for alphabets other than Latin.
Two main situations can be distinguished in the
automatic offline recognition of handwritten text. First, the recognition of
isolated characters, which is currently solved with error rates lower than 1%.
Second, the recognition of groups of connected characters (e.g. words or text
patches), where the error rates are still far from this value. Traditionally,
continuous handwriting recognition from digitized documents has followed a
sequence of stages: preprocessing, segmentation, feature extraction and
classification. Handwritten character segmentation is a particularly complex
problem because it is sometimes impossible to determine where one letter ends
and where the next one begins. To overcome this difficulty, holistic methods,
which handle each word as a whole, have been proposed. These solutions
were usually based on Hidden Markov Models (HMM) or Neural Networks (NN). In
recent years, this has changed with the emergence of algorithms that allow
the training of deep networks with multiple hidden layers, which are able to
extract more complex and relevant features. Since each hidden layer computes a
non-linear transformation of the previous layer, a deep network can have
significantly greater representational capacity (i.e. it can learn more complex
functions) than a shallow network.
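The layer-by-layer composition described above can be sketched in a few lines
of Python. This is a minimal forward pass only, with no training. The layer
sizes, the tanh activation, the random initialization and all function names
are assumptions made for illustration and do not describe the architecture
used in this work.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense_layer(inputs, weights, biases):
    """Affine map followed by a tanh non-linearity."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Each layer transforms the output of the previous one."""
    activation = inputs
    for weights, biases in layers:
        activation = dense_layer(activation, weights, biases)
    return activation


rng = random.Random(0)

# Hypothetical sizes: 16 input features, hidden layers, 2 output units.
shallow_sizes = [16, 8, 2]       # one hidden layer
deep_sizes = [16, 8, 8, 8, 2]    # three hidden layers

shallow = [init_layer(a, b, rng) for a, b in zip(shallow_sizes, shallow_sizes[1:])]
deep = [init_layer(a, b, rng) for a, b in zip(deep_sizes, deep_sizes[1:])]

features = [rng.uniform(0.0, 1.0) for _ in range(16)]
shallow_out = forward(features, shallow)
deep_out = forward(features, deep)
```

Both networks map the same input to two output values. The deep one applies
more nested non-linear transformations, which is the source of the additional
representational capacity mentioned above.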