Bharat Ideology

Text Detection in OCR using CTPN

by bharatideology
January 10, 2025
in Science & Tech

Introduction

Text detection is a central problem in optical character recognition (OCR), and researchers have attempted many different solutions. The Connectionist Text Proposal Network (CTPN) has been among the most successful for scene text detection, that is, detecting text in natural images where the background is an ordinary street or billboard. It was later realized that the same model is also very useful for detecting text in scanned images, and that started the journey of CTPN implementations for text detection in OCR.

Why Does the Traditional Approach Need a Revamp?

Traditional approaches to text detection mostly employ a bottom-up pipeline: they start from low-level character or stroke detection, which is typically followed by a number of subsequent steps such as non-text component filtering, text line construction and text line verification. These multi-step bottom-up approaches are generally complicated, with limited robustness and reliability, and are thus not widely adopted for text detection.

In addition, their performance depends heavily on the results of character detection, for which connected-components methods or sliding-window methods have been proposed. These methods commonly explore low-level features to distinguish text candidates from background. However, they are not robust because they identify individual strokes or characters separately, without context information. This limitation produces a large number of non-text components in character detection, which are difficult to handle in the following steps. Furthermore, these false detections easily accumulate from one stage of the bottom-up pipeline to the next.

New Approach

The new approach defined in the CTPN paper localizes text sequences directly in the convolutional layers. This overcomes a number of the main limitations of previous bottom-up approaches that build on character detection.

Let’s now look at the key points of this approach:

  • Text detection is solved by localizing a sequence of fine-scale text proposals. An anchor regression mechanism jointly predicts the vertical location and the text/non-text score of each text proposal, resulting in excellent localization accuracy. This departs from the Region Proposal Network (RPN) approach of predicting a whole object, which struggles to provide satisfactory localization accuracy for text.
  • The approach proposes an in-network recurrence mechanism that elegantly connects sequential text proposals in the convolutional feature maps. This connection allows the detector to explore meaningful context information along the text line, making it powerful enough to detect extremely challenging text reliably.
  • The method handles multi-scale and multi-lingual text in a single process, avoiding further post-filtering or refinement.
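
The vertical anchor regression in the first point can be sketched in a few lines. The CTPN paper defines the relative vertical coordinates of a proposal with respect to an anchor as v_c = (c_y - c_y^a) / h^a and v_h = log(h / h^a), and uses k = 10 anchors whose heights grow from 11 to 273 pixels by dividing by 0.7 each time. The function names below are illustrative only and are not taken from the repository.

```python
import math


def encode_vertical(box_cy, box_h, anchor_cy, anchor_h):
    """Relative vertical targets (v_c, v_h) of a box with respect to an anchor."""
    v_c = (box_cy - anchor_cy) / anchor_h
    v_h = math.log(box_h / anchor_h)
    return v_c, v_h


def decode_vertical(v_c, v_h, anchor_cy, anchor_h):
    """Invert encode_vertical: recover the box center y and height."""
    box_cy = v_c * anchor_h + anchor_cy
    box_h = math.exp(v_h) * anchor_h
    return box_cy, box_h


def anchor_heights(k=10, smallest=11.0, ratio=0.7):
    """k anchor heights, each 1/ratio times the previous one."""
    return [smallest / ratio ** i for i in range(k)]
```

Because every proposal has a fixed width, only the center and height need to be regressed, which is why two values per anchor are enough here.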

In terms of accuracy, this approach achieved new state-of-the-art results on a number of benchmarks at the time of publication, significantly improving on earlier results (e.g., 0.88 F-measure over 0.83 on ICDAR 2013, and 0.61 F-measure over 0.54 on ICDAR 2015). Furthermore, it is computationally efficient, with a running time of 0.14 s/image (on ICDAR 2013) using the very deep VGG16 model.
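
For readers unfamiliar with the metric, the F-measure quoted above is the harmonic mean of precision and recall. The snippet below is a minimal sketch of that formula; the precision and recall values in the usage note are hypothetical and are not figures from the paper.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, a detector with a hypothetical precision of 0.9 and recall of 0.8 would score f_measure(0.9, 0.8), which is roughly 0.847.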

Connectionist Text Proposal Network (CTPN)

CTPN is essentially a fully convolutional network that allows an input image of arbitrary size. It detects a text line by densely sliding a small window in the convolutional feature maps, and outputs a sequence of fine-scale (e.g., fixed 16-pixel width) text proposals.
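
To make the idea of fixed-width proposals concrete, here is a simplified sketch that cuts a text-line box into 16-pixel-wide strips aligned to a 16-pixel grid. It is an illustration under that assumption only, not the code used in the repository, and the function name is hypothetical.

```python
def split_into_proposals(x_min, y_min, x_max, y_max, width=16):
    """Cut a text-line box into fixed-width strips aligned to a `width`-pixel grid.

    Coordinates are integers in pixels. Each strip is (x0, y_min, x0 + width, y_max).
    """
    first = (x_min // width) * width  # left edge of the grid column containing x_min
    return [(x, y_min, x + width, y_max) for x in range(first, x_max, width)]
```

A box spanning x = 20 to x = 70 touches the grid columns starting at 16, 32, 48 and 64, so it yields four strips. The network only has to decide, per column, whether text is present and where it sits vertically.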

The key highlights of the CTPN architecture are:

  • We densely slide a 3×3 spatial window through the last convolutional maps (conv5) of the VGG16 model.
  • The sequential windows in each row are recurrently connected by a Bi-directional LSTM (BLSTM), where the convolutional feature (3×3×C) of each window is used as input of the 256D BLSTM (including two 128D LSTMs).
  • The RNN layer is connected to a 512D fully-connected layer, followed by the output layer, which jointly predicts text/non-text scores, y-axis coordinates and side-refinement offsets of k anchors.
  • The CTPN outputs sequential fixed-width fine-scale text proposals. Color of each box indicates the text/non-text score. Only the boxes with positive scores are presented.
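
The tensor sizes implied by the points above can be tabulated with a small helper. This is a sketch under stated assumptions: 512 channels for the VGG16 conv5 maps and k = 10 anchors (as in the CTPN paper), with the paper's output layout of 2k scores, 2k vertical coordinates and k side-refinement offsets per location. The feature-map height and width passed in are hypothetical, and the function name is illustrative.

```python
def ctpn_head_shapes(feat_h, feat_w, channels=512, k=10, lstm_units=128, fc_units=512):
    """Per-layer output shapes (height, width, depth) of the CTPN head."""
    return {
        # each 3x3 window over C channels is flattened into one feature vector
        "window_features": (feat_h, feat_w, 3 * 3 * channels),
        # forward and backward LSTM outputs are concatenated: 2 * 128 = 256
        "blstm_output": (feat_h, feat_w, 2 * lstm_units),
        "fc_output": (feat_h, feat_w, fc_units),
        # text and non-text score for each of the k anchors
        "text_scores": (feat_h, feat_w, 2 * k),
        # relative center y and height for each of the k anchors
        "vertical_coords": (feat_h, feat_w, 2 * k),
        # one horizontal offset for each of the k anchors
        "side_refinement": (feat_h, feat_w, k),
    }
```

Note that the BLSTM runs along each row of the feature map, so a map of height H is processed as H independent sequences of length W.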

If you want to go deeper into CTPN and learn how to train it on the ICDAR SROIE dataset, you can do so with the Live Demo.

CTPN Implementation

This implementation performs scene text detection based on CTPN (Connectionist Text Proposal Network) and is written in TensorFlow. The original paper can be found here, and the code can be downloaded from the GitHub repository cloned in Step 1 below.

Steps for CTPN Implementation:

Step 1. Clone the repository using the following command in an Ubuntu environment

git clone https://github.com/indiantechwarrior/text-detection-ctpn.git

Step 2. Change into the repository directory and install the dependencies

cd text-detection-ctpn

pip install -r requirements.txt (or pip3 install -r requirements.txt)

Make sure tensorflow==1.15.0 is installed. The command for this is:

pip install tensorflow==1.15.0 (or pip3 install tensorflow==1.15.0)
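
After installing, it can help to confirm which TensorFlow version Python actually picks up. The following is a minimal sketch; both helper names are hypothetical, and only the standard tf.__version__ attribute is used.

```python
def installed_tensorflow_version():
    """Return the installed TensorFlow version string, or None if it cannot be imported."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return tf.__version__


def matches_required(installed, required="1.15.0"):
    """True only when a version was found and it equals the required one."""
    return installed is not None and installed == required
```

Calling matches_required(installed_tensorflow_version()) returns False both when TensorFlow is missing and when a different version (for example a 2.x release) is installed, which are the two cases worth catching before running the demo.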

Step 3. Build the bounding-box utilities

cd utils/bbox

chmod +x make.sh

./make.sh

This generates nms.so and bbox.so in the current folder.
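
A quick way to confirm that the build step produced both libraries is to look for them from Python. This is a sketch only: the helper name is hypothetical, and the wildcard pattern is an assumption that allows for platform suffixes in the file names in addition to the plain nms.so and bbox.so mentioned above.

```python
from pathlib import Path


def find_built_extensions(folder, names=("nms", "bbox")):
    """Map each expected module name to the sorted .so file names found in `folder`."""
    folder = Path(folder)
    return {
        name: sorted(p.name for p in folder.glob(name + "*.so"))
        for name in names
    }
```

Running find_built_extensions("utils/bbox") from the repository root should return a non-empty list for both "nms" and "bbox"; an empty list suggests that make.sh did not complete.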

Running CTPN with pretrained model

Step 1. Download the checkpoints file from Google Drive:

https://drive.google.com/file/d/1HcZuB_MHqsKhKEKpfF1pEU85CYy4OlWO/view

Step 2. Put checkpoints_mlt/ in text-detection-ctpn/ and your images in data/demo.

Step 3. Run the demo script:

python ./main/demo.py

Step 4. The results are saved in data/res.
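
If you want to post-process the detections, the sketch below parses one line of a text result file. The line format is an assumption rather than something stated in this article: it is taken to be eight comma-separated corner coordinates (x1,y1,...,x4,y4) optionally followed by a confidence score. Check a file in data/res and adjust if yours differs. The function name is hypothetical.

```python
def parse_detection_line(line):
    """Parse 'x1,y1,x2,y2,x3,y3,x4,y4[,score]' into (corners, score).

    ASSUMED format; score is None when the line has only eight values.
    """
    parts = [p for p in line.strip().split(",") if p != ""]
    if len(parts) < 8:
        raise ValueError("expected at least 8 comma-separated values")
    coords = [int(float(v)) for v in parts[:8]]
    corners = list(zip(coords[0::2], coords[1::2]))  # four (x, y) pairs
    score = float(parts[8]) if len(parts) > 8 else None
    return corners, score
```

Each parsed box can then be cropped from the image and passed to a text recognizer, which is the usual next stage after detection in an OCR pipeline.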

Training own CTPN model

Step 1. Download the pre-trained VGG16 model and put it in data/vgg_16.ckpt. You can download it from tensorflow/models.

Step 2. Download the prepared dataset from Google Drive. Put the downloaded data in data/dataset/mlt, then start the training.

Step 3. Alternatively, you can prepare your own dataset by following Steps 4 to 6. If you want to use the existing dataset, skip to Step 7.

Step 4. Modify DATA_FOLDER and OUTPUT in utils/prepare/split_label.py to match your dataset, then run split_label.py from the repository root:

python ./utils/prepare/split_label.py

This generates the prepared data in data/dataset/.

Step 5. A sample of the input file format for split_label.py can be found in gt_img_859.txt, and the corresponding output file of split_label.py is img_859.txt.

Step 6. Modify the path for DATA_FOLDER in utils/dataset/data_provider.py.

Step 7. To start from the pre-trained checkpoints and train on your dataset on top of them, increase max_steps (60000) in main/train.py to a higher number; adding 10000 is normally a good starting point.

Step 8. Run the training script:

python ./main/train.py

Step 9. After training completes, you will find updated checkpoints that can then be used in place of the downloaded ones. To use the new checkpoints, remember to update the filename in the 'checkpoint' file (originally it refers to the downloaded pre-trained checkpoints).

You can now run python ./main/demo.py to validate the results with the updated checkpoints.
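
The 'checkpoint' file mentioned in Step 9 is a small text index that TensorFlow 1.x reads (for example through tf.train.get_checkpoint_state) to find the latest checkpoint. A minimal sketch for pointing it at a new checkpoint is shown below. The helper name and the checkpoint file name in the usage note are hypothetical, and the function overwrites the existing index file, so keep a copy if you may want to go back.

```python
from pathlib import Path


def point_checkpoint_file_to(checkpoint_dir, checkpoint_name):
    """Rewrite the 'checkpoint' index file so that it names `checkpoint_name`."""
    text = (
        'model_checkpoint_path: "{0}"\n'
        'all_model_checkpoint_paths: "{0}"\n'
    ).format(checkpoint_name)
    path = Path(checkpoint_dir) / "checkpoint"
    path.write_text(text)  # overwrites any existing index file
    return path
```

For example, point_checkpoint_file_to("checkpoints_mlt", "ctpn_70000.ckpt") would make that (hypothetical) checkpoint the one picked up the next time the index is read. Editing the file by hand in a text editor achieves the same thing.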

Tags: CTPN, OCR, Text Detection
