OCR HANDWRITTEN TEXT RECOGNITION USING IMAGE PROCESSING

Abstract:

Image processing is a vital tool when one is dealing with many images and wishes to perform complex operations on them. With advances in technology, one can now compress, manipulate and extract required information from any image. One such application of image processing is detecting handwritten text and converting it to a digital text format. The main objective is to bridge the gap between the physical sheet of paper and the digital world, since digital data can be operated on much faster than its paper counterpart. Hence, in this paper, we aim to implement the detection of handwritten text via Optical Character Recognition (OCR). The entire system is implemented in TensorFlow. This research work has also analyzed various results and selected an appropriate dataset to train the model. The importance of this paper lies in the fact that it can open up various unexplored avenues. Its key novelty is that the dataset used is comprehensive, which helps to produce better results. In addition, the paper analyzes handwritten scripts and extracts their content in digital form. Analyzing the text can help combat forgery, understand certain temperaments of the person writing the text, and so on. Finally, this paper implements an improved version compared with pre-existing solutions by combining a Convolutional Neural Network (CNN) with a Recurrent Neural Network (RNN).

Introduction:

Humans have constantly been evolving and working towards making their lives better. Technology is one such area, in which people continuously innovate to improve the user experience and to perform complex tasks in a very short span of time. Coupled with this, internet penetration has increased by leaps and bounds. Since the inception of the World Wide Web (WWW), the number of internet users has been increasing at a striking rate. Commensurate with this increase, a great deal of data has been digitized. This digitization has enabled seamless transmission of data in various forms, which in turn lets us extract a large amount of information quickly and efficiently. With digital data, one can manipulate it as required and arrive in a matter of seconds at results that earlier took a long time. Converting handwriting to digital data falls into this category. This conversion opens a myriad of avenues and has a wide range of applications.

 

It is important for the converted data, in most cases digital text, to be in a tangible and understandable format so that the user can make full use of it. Hence, this paper converts the handwriting into digital text that is easy to comprehend, using optical character recognition (OCR). A Neural Network (NN) model is devised and trained on the dataset. This model consists of several layers, discussed in detail later. The image of a word acts as the input to the model and passes through these layers, eventually emerging as digital text. Since the chosen dataset is fairly exhaustive, the training should be sufficient to keep the accuracy of the model satisfactory. Although this is an assumption, we have strengthened it by backing the work in this paper with performance metrics. These help us determine the accuracy of the model and indicate areas for further research. The computation speed is also analyzed and kept in mind for comparison.

Objective:

The main aim of this project is to recognize handwritten text (OCR) using image processing techniques and neural network models such as CNN and RNN. It is written in Python, using libraries such as NumPy, TensorFlow, OpenCV, Tesseract and Keras. The text is extracted from the image uploaded by the user in the front end.

Problem Statement

As discussed above, the use of a neural network (NN) is the main means of addressing the limitations of existing solutions. Text is an arbitrary-length sequence of characters, so the recognizer has to handle whole sequences rather than isolated symbols while still achieving high accuracy. A Recurrent Neural Network (RNN) is well suited to this kind of sequence problem.
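One common way to turn a recurrent network's per-time-step outputs into a variable-length string is best-path decoding, as used with a CTC (connectionist temporal classification) output layer. The source does not state which decoding scheme the project uses, so the CTC-style blank symbol, the tiny alphabet and the score matrix below are all assumptions made purely for illustration.

```python
# Minimal sketch of best-path (greedy) decoding over per-time-step scores.
# ASSUMPTION: a CTC-style output with a blank symbol; the source does not
# confirm this is what the project uses.

BLANK = "-"                        # hypothetical blank symbol
ALPHABET = ["a", "c", "t", BLANK]  # hypothetical tiny alphabet


def best_path_decode(scores, alphabet=ALPHABET, blank=BLANK):
    """Pick the top-scoring symbol per time step, merge repeats, drop blanks."""
    decoded = []
    previous = None
    for step in scores:
        best_index = max(range(len(step)), key=step.__getitem__)
        symbol = alphabet[best_index]
        # Emit a symbol only when it differs from the previous step
        # and is not the blank.
        if symbol != previous and symbol != blank:
            decoded.append(symbol)
        previous = symbol
    return "".join(decoded)


# Made-up scores: one row per time step, one column per alphabet symbol.
example_scores = [
    [0.1, 0.7, 0.1, 0.1],  # c
    [0.1, 0.6, 0.1, 0.2],  # c again (merged with the previous step)
    [0.1, 0.1, 0.1, 0.7],  # blank
    [0.8, 0.1, 0.0, 0.1],  # a
    [0.1, 0.1, 0.7, 0.1],  # t
    [0.1, 0.1, 0.6, 0.2],  # t again (merged)
]

print(best_path_decode(example_scores))  # cat
```

The blank symbol is what lets such a scheme represent doubled letters: two identical symbols separated by a blank are kept as two characters, while identical symbols on adjacent steps are merged into one.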

Proposed System:

Currently, this end-to-end system is a prototype that is only able to detect a subset of the English language. Below is a list of improvements that we can implement when given more time:

·         Design and implement a custom detector that does not rely on Tesseract. This allows us to customize how we detect words and characters for the classifier.

·         Train the neural network so that it does not classify by words but by characters. This allows the system to recognize and translate entire documents without relying on a dictionary of words. This reduces the complexity and training time for the network since the total size of the English alphabet is much smaller than the dictionary of  English words.

·         Implement a neural network rule set that predicts words when given combinations of characters. This may increase the complexity of the network and may conflict with the previous point.
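The character-level idea in the second point can be illustrated with a small label-encoding sketch. The character set below (lowercase letters, digits and space) is a hypothetical choice, not one taken from the project. The point is only that the number of output classes stays small and fixed, whereas a word-level classifier needs one class per dictionary word.

```python
import string

# ASSUMPTION: a hypothetical character set; the project's real one is not given.
CHARSET = string.ascii_lowercase + string.digits + " "
CHAR_TO_ID = {ch: i for i, ch in enumerate(CHARSET)}
ID_TO_CHAR = {i: ch for ch, i in CHAR_TO_ID.items()}


def encode(text):
    """Map text to integer class labels, one per character.

    Characters outside CHARSET are skipped in this sketch.
    """
    return [CHAR_TO_ID[ch] for ch in text.lower() if ch in CHAR_TO_ID]


def decode(ids):
    """Map integer class labels back to text."""
    return "".join(ID_TO_CHAR[i] for i in ids)


labels = encode("Hello world")
print(len(CHARSET))    # 37 output classes
print(labels)
print(decode(labels))  # hello world
```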

System Requirements:


Hardware Requirements:

      System: Pentium IV 2.4 GHz

      Hard Disk: 500 GB

      RAM: 4 GB

      Any desktop or laptop system with the above configuration or higher.

 

Software Requirements:

      Operating System: Windows XP / 7

      Coding Language: Python, HTML

      Version: Python 3.6.8

      IDE: Python 3.6.8 IDLE

      ML Packages: NumPy, Pandas, scikit-learn, Matplotlib, Seaborn, Flask, PyMySQL

      Image Processing Algorithms: CNN and RNN

      Other Requirements: Notepad, XAMPP Control Panel

Methodology

 

·         An OCR handwritten text dataset is taken.

 

·         The dataset is loaded and preprocessed with various Image processing  techniques.

 

·         The preprocessed data is divided as training and testing data.

 

·         The prediction model is built using neural network architectures such as CNN and RNN.

 

·         The model is trained using the training dataset; once training has completed successfully, the model has to be tested.

 

·         The trained model is tested using testing dataset and accuracy is calculated.

 

·         The algorithm which gives the best accuracy is taken as our final prediction model.

 

·         The finalized model is converted into pickle model (binary format data) and saved.

 

·         A Front End is developed with the help of Flask and HTML.

 

·         The user uploads an image of handwritten text in the front end.

 

·         The uploaded image is given as input to the finalized model, which predicts the text it contains.

 

·         Finally the predicted output is displayed on the front end.
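The split, evaluation and saving steps in the list above can be sketched in a few lines of plain Python. The file names, the 80/20 split ratio, the random seed and the dictionary standing in for a trained model are all assumptions for illustration. A real TensorFlow/Keras model is often saved with the library's own `model.save` rather than with pickle.

```python
import pickle
import random


def split_dataset(samples, test_fraction=0.2, seed=42):
    """Shuffle reproducibly and split into (training, testing) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def word_accuracy(predictions, targets):
    """Fraction of words predicted exactly right."""
    if not targets:
        return 0.0
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


# Hypothetical (image file, transcription) pairs.
samples = [(f"word_{i}.png", f"label{i}") for i in range(10)]
train, test = split_dataset(samples)
print(len(train), len(test))  # 8 2

# Hypothetical predictions compared against the ground truth.
targets = ["the", "quick", "brown", "fox"]
predictions = ["the", "quick", "brawn", "fox"]
print(word_accuracy(predictions, targets))  # 0.75

# Stand-in for a trained model, serialized to bytes and restored.
model = {"name": "demo-model", "charset": "abc"}
blob = pickle.dumps(model)  # use pickle.dump(model, file) to write to disk
restored = pickle.loads(blob)
print(restored == model)  # True
```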

Build a Handwritten Text Recognition System using TensorFlow | by Harald  Scheidl | Towards Data Science 

Summary

 

Our project consists of a model trained to convert the handwriting in images to text. The model will be trained on a desktop GPU to speed up the training. The model will then be saved and migrated to a server for use in a Flask web application. The application will be hosted on one of our laptops, but will be capable of being hosted on a service such as Heroku. Client-side, the page will upload images to the server, then output the results of the server’s analysis of the image. By using Tesseract and OpenCV to detect distinct blocks of text, we can output the text for each block separately. Client-side functionality could potentially be expanded by capturing photos from the user’s webcam, or even by saving handwriting drawn on the page.
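The per-block output described above can be sketched as follows, assuming a detector has already produced one bounding box per text block. In the project those boxes would come from Tesseract and OpenCV. Here the block dictionaries, the `row_tolerance` value and the `recognize` function are hypothetical stand-ins, and the recognizer simply returns text stored with each block.

```python
def reading_order(blocks, row_tolerance=10):
    """Order blocks top to bottom, then left to right within a row.

    Blocks whose top edge lies within row_tolerance pixels of the first
    block in the current row are treated as being on the same row.
    """
    rows = []
    for block in sorted(blocks, key=lambda b: b["y"]):
        if rows and abs(block["y"] - rows[-1][0]["y"]) <= row_tolerance:
            rows[-1].append(block)
        else:
            rows.append([block])
    ordered = []
    for row in rows:
        ordered.extend(sorted(row, key=lambda b: b["x"]))
    return ordered


def recognize(block):
    """Hypothetical stand-in for the trained model: returns stored text."""
    return block["text"]


# Made-up detector output: top-left corner of each block plus its text.
blocks = [
    {"x": 200, "y": 12, "text": "world"},
    {"x": 10, "y": 50, "text": "second line"},
    {"x": 10, "y": 10, "text": "hello"},
]

for block in reading_order(blocks):
    print(recognize(block))
```

In a real deployment, `recognize` would crop the block from the uploaded image and pass it to the saved model, and the server would return the list of recognized strings to the page.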

 
