Handwritten digit recognition is a simple project that shows how a convolutional neural network (CNN) works. In this lesson, I build a model with TensorFlow and TFLearn.

What will you learn?

1. Collect the images

First, we have to collect images of the digits 0 to 9. You can use the pyscreenshot package to capture them. It is installable via pip, the Python package installer, which is typically installed along with Python. To install pyscreenshot, open a terminal and enter the following command:

pip install pyscreenshot

I also import time so that execution pauses for a few seconds between captures (five in the code below), which gives me time to draw the next digit. Most of the code is self-explanatory, but you may have difficulty finding the right bbox coordinates. They determine the region of your screen that is captured, given as (X1, Y1, X2, Y2): the top-left and bottom-right corners in pixels. I found mine through trial and error. You can try the same coordinates for your first run, but because screen sizes and window positions vary, you will probably need to adjust them by trial and error as well.

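def screen_capture():
    import os
    import time
    import pyscreenshot as ImageGrab  # pip install pyscreenshot

    images_folder = "captured_images/0/"  # one folder per digit; change "0" to the digit you are drawing
    #images_folder = "new_images/"
    os.makedirs(images_folder, exist_ok=True)  # im.save() fails if the folder is missing

    for i in range(5):
        time.sleep(5)  # time to clear the canvas and draw the next digit
        im = ImageGrab.grab(bbox=(60, 170, 400, 550))  # X1,Y1,X2,Y2
        im.save(images_folder + str(i) + '.png')
        print("saved....", i)
        print("clear screen now and redraw again...")
#screen_capture()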
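
If thinking in corner coordinates feels awkward, it can help to describe the capture region by its top-left corner plus a width and height, and then convert that to the (X1, Y1, X2, Y2) form. The helper below is only a sketch; make_bbox is a hypothetical name of my own and not part of pyscreenshot.

```python
def make_bbox(left, top, width, height):
    """Convert a top-left corner plus size into an (X1, Y1, X2, Y2) tuple."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return (left, top, left + width, top + height)


# The region used in this lesson starts at (60, 170) and is 340 x 380 pixels.
bbox = make_bbox(60, 170, 340, 380)
print(bbox)
```

The resulting tuple can be passed as the bbox argument of ImageGrab.grab, so you only have to adjust the corner and the size while experimenting.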

2. Create data with a label

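This part converts each image's folder name (the digit it shows) into a one-hot encoded NumPy array that serves as its label. We then use OpenCV to read every image in grayscale, resize it to 28x28 pixels, and pair it with its label to create the data for building the model.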

import numpy as np

def create_label(image_name):
    """ Create a one-hot encoded vector from the image name """
    if image_name == '0':  
        return np.array([1,0,0,0,0,0,0,0,0,0])
    elif image_name == '1':
        return np.array([0,1,0,0,0,0,0,0,0,0])
    elif image_name == '2':
        return np.array([0,0,1,0,0,0,0,0,0,0])
    elif image_name == '3':
        return np.array([0,0,0,1,0,0,0,0,0,0])
    elif image_name == '4':
        return np.array([0,0,0,0,1,0,0,0,0,0])
    elif image_name == '5':
        return np.array([0,0,0,0,0,1,0,0,0,0])
    elif image_name == '6':
        return np.array([0,0,0,0,0,0,1,0,0,0])
    elif image_name == '7':
        return np.array([0,0,0,0,0,0,0,1,0,0])
    elif image_name == '8':
        return np.array([0,0,0,0,0,0,0,0,1,0])
    elif image_name == '9':
        return np.array([0,0,0,0,0,0,0,0,0,1])
import os
import cv2
from random import shuffle
from tqdm import tqdm

def create_data():
    data = []
    for folder in tqdm(os.listdir("captured_images")):
        for img in os.listdir("captured_images/"+folder):
            path = os.path.join("captured_images",folder, img)
            img_data = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
            try:
                img_data = cv2.resize(img_data, (28,28))
            except cv2.error as e:
                continue
            data.append([np.array(img_data), create_label(folder)])
    shuffle(data)
    return data
data = create_data()
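
The ten branches of create_label can also be written more compactly. The sketch below is one possible alternative (one_hot_label is a hypothetical name of my own, not the function used in the rest of this lesson): it takes the row of a 10x10 identity matrix that corresponds to the digit. Unlike create_label, it raises an error for a folder name that is not a digit instead of silently returning None.

```python
import numpy as np


def one_hot_label(image_name, num_classes=10):
    """Return a one-hot vector for a digit given as a string such as '3'."""
    digit = int(image_name)  # raises ValueError for names like 'abc'
    if not 0 <= digit < num_classes:
        raise ValueError("expected a digit between 0 and %d" % (num_classes - 1))
    # Row `digit` of the identity matrix has a single 1 at index `digit`.
    return np.eye(num_classes, dtype=int)[digit]


print(one_hot_label('3'))
```

Failing loudly here can be useful, because a stray folder inside captured_images would otherwise end up in the data with a missing label.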
 
3. Dividing data into training and testing part

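We use the training data to fit the model and the testing data to verify it. The split below keeps the first 800 shuffled samples for training and the rest for testing, so it assumes you have collected more than 800 images.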

train = data[:800]
test = data[800:]
X_train = np.array([i[0] for i in train]).reshape(-1, 28,28, 1)
y_train = [i[1] for i in train]
X_test = np.array([i[0] for i in test]).reshape(-1, 28,28, 1)
y_test = [i[1] for i in test]
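
A hard-coded split point of 800 only works when you have more than 800 images. As a sketch, the split point can instead be derived from the size of the data set; split_data and the 80% default are my own assumptions and not part of the original code.

```python
def split_data(data, train_fraction=0.8):
    """Split an already shuffled list into (train, test) parts."""
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be between 0 and 1")
    split_at = int(len(data) * train_fraction)
    return data[:split_at], data[split_at:]


# Stand-in for the shuffled list returned by create_data().
samples = list(range(1000))
train_part, test_part = split_data(samples)
print(len(train_part), len(test_part))
```

Because create_data already shuffles the list, slicing it this way still gives a random split.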

4. Building the model

This is the most important step. The training data is used to fit the model, whereas the testing data is passed as the validation set to check how well the model generalizes.

import warnings
warnings.filterwarnings('ignore')

import tensorflow as tf
import tflearn
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.estimator import regression
## Steps of Convolutional neural network:
# 1. Convolution layers
# 2. Normalization
# 3. Pooling
# 4. Fully connected
# TensorFlow 1.x API; in TensorFlow 2.x it is available as tf.compat.v1.reset_default_graph()
tf.reset_default_graph()
convnet = input_data(shape=[28,28, 1], name='input') 
convnet = conv_2d(convnet, 32, 5, activation='relu')
convnet = max_pool_2d(convnet, 5)
convnet = conv_2d(convnet, 64, 5, activation='relu')
convnet = max_pool_2d(convnet, 5)
convnet = conv_2d(convnet, 128, 5, activation='relu')
convnet = max_pool_2d(convnet, 5)
convnet = conv_2d(convnet, 64, 5, activation='relu')
convnet = max_pool_2d(convnet, 5)
convnet = conv_2d(convnet, 32, 5, activation='relu')
convnet = max_pool_2d(convnet, 5)
convnet = fully_connected(convnet, 1024, activation='relu')
convnet = dropout(convnet, 0.5) #prevent a model from overfitting
convnet = fully_connected(convnet, 10, activation='softmax')
convnet = regression(convnet, optimizer='adam', learning_rate=0.001, loss='categorical_crossentropy', name='targets')  
model = tflearn.DNN(convnet, tensorboard_verbose=1)  
model.fit({'input': X_train}, {'targets': y_train}, n_epoch=12,
          validation_set=({'input': X_test}, {'targets': y_test}),
          show_metric=True)
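
It helps to trace how the 28x28 input shrinks through the five convolution and pooling blocks. The sketch below assumes TFLearn's defaults as I understand them: conv_2d uses stride 1 with 'same' padding, so it keeps the spatial size, and max_pool_2d uses 'same' padding with a stride equal to its kernel size, so the size becomes ceil(size / 5). Please check these against your TFLearn version; trace_sizes is a hypothetical helper of my own.

```python
import math


def trace_sizes(input_size=28, pool_size=5, num_blocks=5):
    """Return the spatial size after each conv + pool block."""
    sizes = []
    size = input_size
    for _ in range(num_blocks):
        # conv_2d ('same' padding, stride 1) leaves the size unchanged;
        # max_pool_2d ('same' padding, stride = pool_size) divides it, rounding up.
        size = math.ceil(size / pool_size)
        sizes.append(size)
    return sizes


print(trace_sizes())
```

Under these assumptions the feature map is already 1x1 after the third block, so the last two pooling layers no longer reduce anything. For images this small, a smaller pooling size such as 2 may preserve more spatial detail, but that is something to experiment with rather than a rule.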

5. Let's predict and display using matplotlib

Now it's time to make a prediction and see whether our model predicts correctly. You can visualize the output using the Matplotlib library.

def create_test_data():
    data = []
    for img in tqdm(os.listdir("new_images")):
        path = os.path.join("new_images", img)
        img_num = img.split('.')[0] 
        img_data = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        try:
            img_data = cv2.resize(img_data, (28,28))
        except cv2.error as e:
            continue
        data.append([np.array(img_data), img_num])

    shuffle(data)
    return data
test_data = create_test_data()
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(10,10))
for num, data in enumerate(test_data[:10]):
    img_data = data[0]
    y = fig.add_subplot(5,5, num + 1)
    orig = img_data
    data = img_data.reshape(28,28, 1)
    model_out = model.predict([data])
    str_label = "Prediction: " + str(np.argmax(model_out))

    y.imshow(orig, cmap='gray')
    plt.title(str_label)
    y.axes.get_xaxis().set_visible(False)
    y.axes.get_yaxis().set_visible(False)
plt.show()
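
model.predict returns one row of ten softmax probabilities per input image, and np.argmax picks the index of the largest one. The sketch below shows that decoding step on a hand-made probability row and also reports the probability itself as a rough confidence. decode_prediction is a hypothetical helper of my own, and the numbers are made up for illustration.

```python
def decode_prediction(probabilities):
    """Return (digit, confidence) for one row of class probabilities."""
    probabilities = list(probabilities)
    # Index of the largest probability, which is the predicted digit.
    digit = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return digit, probabilities[digit]


# Made-up softmax output for a single image (the values sum to 1).
example_row = [0.01, 0.02, 0.05, 0.80, 0.02, 0.02, 0.03, 0.02, 0.02, 0.01]
digit, confidence = decode_prediction(example_row)
print("Prediction:", digit, "confidence:", confidence)
```

With the trained model, this could be called as decode_prediction(model.predict([data])[0]) inside the loop above, for example to put the confidence into the subplot title.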

Final output:

Conclusion:

In this way, we can build handwritten digit recognition using a convolutional neural network. We just need to collect images of digits, create labeled data, use that data to train a model, and finally make predictions.

I hope this project is helpful to you. If you have any questions, don't hesitate to ask in the comment section. I will reply as soon as possible. Thanks.