
Text Detection in OCR using EAST

by bharatideology
January 10, 2025
in Science & Tech

Introduction

Text detection is a major problem in optical character recognition (OCR), and researchers have attempted various solutions. EAST is one such attempt, aimed at scene text detection: detecting text in natural images, such as street scenes or billboards, where the model has to locate the text against an everyday background. It was later realized that the same model can also be very useful for detecting text in scanned images.

EAST Model

EAST (Efficient and Accurate Scene Text Detector) is a deep learning-based algorithm that detects text with a single neural network, eliminating multi-stage approaches.


The key highlights of EAST (Efficient and Accurate Scene Text Detector) are as follows:

• The authors propose a scene text detection pipeline with only two stages: a fully convolutional network (FCN) and a non-maximum suppression (NMS) merging stage. The FCN directly produces text regions, excluding redundant and time-consuming intermediate steps.

• The pipeline is flexible to produce either word level or line level predictions, whose geometric shapes can be rotated boxes or quadrangles, depending on specific applications.

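• The paper reports that the algorithm significantly outperforms the state-of-the-art methods of its time in both accuracy and speed; on ICDAR 2015 it achieves an F-score of 0.7820 at 13.2 fps at 720p resolution.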


EAST Methodology

The key component of the algorithm is a neural network trained to directly predict the existence of text instances and their geometries from full images. Because the model is a fully convolutional network adapted for text detection, it outputs dense per-pixel predictions of words or text lines, eliminating the intermediate steps of traditional pipelines such as candidate proposal, text region formation and word partition. The post-processing consists only of thresholding and NMS on the predicted geometric shapes.
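As a minimal illustration of the thresholding step, the NumPy sketch below keeps only the pixels whose text score exceeds a cut-off. The function name `text_pixels` and the 0.8 threshold are illustrative assumptions, not values taken from this implementation; in a full pipeline, each surviving pixel's geometry would then be decoded into a box and passed to NMS.

```python
import numpy as np


def text_pixels(score_map: np.ndarray, threshold: float = 0.8) -> np.ndarray:
    """Return (row, col) coordinates of pixels whose text score exceeds threshold.

    score_map: 2-D array of per-pixel text probabilities (hypothetical input).
    The 0.8 default is an assumed value, not one taken from the EAST code.
    """
    mask = score_map > threshold  # boolean mask of confident text pixels
    return np.argwhere(mask)      # (N, 2) array of (row, col) indices


score_map = np.array([[0.1, 0.9, 0.95],
                      [0.2, 0.85, 0.3]])
print(text_pixels(score_map))  # [[0 1] [0 2] [1 1]]
```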

The EAST network architecture has three branches that are combined into a single neural network.

These branches are:

Feature Extractor Stem – Extracts features from different layers of the network. The stem can be a convolutional network pre-trained on the ImageNet dataset, with interleaved convolution and pooling layers. Four levels of feature maps, denoted f1, f2, f3 and f4, are extracted from the stem; their sizes are 1/32, 1/16, 1/8 and 1/4 of the input image, respectively.

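Feature Merging Branch – This branch merges the feature maps taken from different layers of the stem (VGG16 in this description). The input image is passed through VGG16, outputs from four different layers are taken, and the feature maps are merged progressively in a U-Net-style fashion: at each merging stage, the map from the previous stage is unpooled to double its size, concatenated with the current feature map, and passed through a conv1×1 bottleneck followed by a conv3×3. h1, h2, h3 and h4 denote the merged feature maps.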
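The merging scheme can be sketched with NumPy to show how the shapes evolve. This is a shape-level illustration only, under several assumptions: the four stem outputs use ResNet-50 channel counts (2048, 1024, 512, 256), since this implementation replaces the paper's stem with ResNet-50; unpooling is nearest-neighbour upsampling; the 1×1 convolution uses random weights in place of learned ones; and the conv3×3 after each bottleneck is omitted. The merging channel widths 128, 64 and 32 follow the paper's architecture figure.

```python
import numpy as np

rng = np.random.default_rng(0)


def unpool(x):
    # Nearest-neighbour 2x upsampling of an (H, W, C) feature map
    return x.repeat(2, axis=0).repeat(2, axis=1)


def conv1x1(x, out_channels):
    # A 1x1 convolution is a per-pixel linear map over channels;
    # random weights stand in for learned ones (illustration only)
    w = rng.standard_normal((x.shape[-1], out_channels)) * 0.01
    return x @ w


def merge_features(f, channels=(128, 64, 32)):
    """f: list [f1, f2, f3, f4] of (H, W, C) maps at 1/32, 1/16, 1/8, 1/4 scale."""
    h = f[0]  # h1 = f1
    for i, c in enumerate(channels, start=1):
        g = unpool(h)                           # double the previous stage's size
        x = np.concatenate([g, f[i]], axis=-1)  # concatenate along channels
        h = conv1x1(x, c)                       # bottleneck (paper adds conv3x3 here)
    return h


size = 512  # assumed input resolution
feats = [rng.standard_normal((size // s, size // s, ch))
         for s, ch in [(32, 2048), (16, 1024), (8, 512), (4, 256)]]
print(merge_features(feats).shape)  # (128, 128, 32)
```

The final merged map sits at 1/4 of the input resolution, which is why the score and geometry maps are predicted at that scale.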

Output Layer – The output layer produces a score map and a geometry map. The score map gives, for each pixel, the probability that it lies inside a text region, while the geometry map defines the boundary of the text box. The geometry can be either RBOX (rotated box) or QUAD (quadrangle). For RBOX, each pixel predicts four distances, to the top, right, bottom and left edges of the text rectangle, plus a rotation angle (5 channels). For QUAD, each pixel predicts the offsets to the four corner vertices of the quadrangle (8 channels), so the box does not have to be a rectangle.
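To make the RBOX geometry concrete, the sketch below turns one pixel's five predicted values into four corner points. It is a minimal illustration, not this repository's decoding code: the function name `rbox_to_corners` is hypothetical, and the rotation convention (offsets rotated by `theta` radians around the pixel, in image coordinates with y pointing down) is an assumption; the actual implementation may use a different sign convention.

```python
import math


def rbox_to_corners(x, y, d_top, d_right, d_bottom, d_left, theta):
    """Decode one pixel's RBOX prediction into four (x, y) corners.

    Order: top-left, top-right, bottom-right, bottom-left (before rotation).
    The rotation convention is an assumption made for this sketch.
    """
    # Corner offsets from the pixel in the box's own, unrotated frame
    offsets = [(-d_left, -d_top), (d_right, -d_top),
               (d_right, d_bottom), (-d_left, d_bottom)]
    c, s = math.cos(theta), math.sin(theta)
    # Rotate each offset by theta around the pixel, then translate
    return [(x + dx * c - dy * s, y + dx * s + dy * c) for dx, dy in offsets]


# Pixel at (10, 10), 2 px to the top edge, 3 right, 4 bottom, 5 left, no rotation
print(rbox_to_corners(10, 10, 2, 3, 4, 5, 0.0))
# [(5.0, 8.0), (13.0, 8.0), (13.0, 14.0), (5.0, 14.0)]
```

Because every pixel inside a text region predicts its own box this way, many overlapping boxes are produced, which is what the NMS stage then merges.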

EAST Implementation

This implementation has the following features:

  • Only the RBOX geometry is implemented
  • A fast Locality-Aware NMS in C++ provided by the paper’s author
  • The pre-trained model provided achieves 80.83 F1-score on ICDAR 2015 Incidental Scene Text Detection Challenge using only training images from ICDAR 2015 and 2013
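The idea behind locality-aware NMS can be sketched in pure Python. In the paper, geometries are first merged row by row: adjacent boxes that overlap are combined with coordinates weighted by their scores and the scores summed, and standard NMS is then run on the much smaller merged set. This sketch assumes axis-aligned boxes `(x1, y1, x2, y2, score)` for simplicity, whereas EAST merges rotated boxes or quadrangles; it is not the author's C++ code.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2, score)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def weighted_merge(a, b):
    """Average coordinates weighted by score; the merged score is the sum."""
    sa, sb = a[4], b[4]
    coords = [(sa * a[i] + sb * b[i]) / (sa + sb) for i in range(4)]
    return (*coords, sa + sb)


def standard_nms(boxes, thresh):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping a kept one."""
    keep = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) <= thresh for k in keep):
            keep.append(box)
    return keep


def locality_aware_nms(boxes, thresh=0.3):
    """boxes are assumed to arrive in row-major (scan) order, like per-pixel output."""
    merged, current = [], None
    for box in boxes:
        if current is not None and iou(current, box) > thresh:
            current = weighted_merge(current, box)  # merge with previous neighbour
        else:
            if current is not None:
                merged.append(current)
            current = box
    if current is not None:
        merged.append(current)
    return standard_nms(merged, thresh)


detections = [(0, 0, 10, 10, 0.9), (1, 0, 11, 10, 0.8), (50, 50, 60, 60, 0.7)]
result = locality_aware_nms(detections)
print(len(result))  # 2: the first two boxes are merged, the third is kept
```

Merging neighbours first is what makes the step fast: dense per-pixel predictions collapse into a few candidates before the quadratic NMS pass runs.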

The final implementation differs from the paper in a few ways:

  • Use dice loss (optimize IoU of segmentation) rather than balanced cross entropy
  • Use ResNet-50 rather than PVANET
  • Use linear learning rate decay rather than staged learning rate decay
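The difference between the two score-map losses can be illustrated with NumPy. The sketch below is an assumption-labelled illustration, not this repository's loss code: `dice_loss` computes 1 − 2|Y∩Ŷ| / (|Y| + |Ŷ|) on soft scores with an optional training mask, and `balanced_cross_entropy` follows the paper's formula with β = 1 − (fraction of text pixels), averaged over pixels (the averaging is a choice made here).

```python
import numpy as np


def dice_loss(y_true, y_pred, mask=None, eps=1e-5):
    """1 - 2*|Y intersect Y_hat| / (|Y| + |Y_hat|), on soft per-pixel scores."""
    if mask is None:
        mask = np.ones_like(y_true)  # optional mask to ignore some pixels
    intersection = np.sum(y_true * y_pred * mask)
    union = np.sum(y_true * mask) + np.sum(y_pred * mask) + eps
    return 1.0 - 2.0 * intersection / union


def balanced_cross_entropy(y_true, y_pred, eps=1e-7):
    """Class-balanced cross entropy as in the paper, averaged over pixels."""
    beta = 1.0 - y_true.mean()  # share of non-text pixels weights the text term
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    loss = (-beta * y_true * np.log(y_pred)
            - (1.0 - beta) * (1.0 - y_true) * np.log(1.0 - y_pred))
    return loss.mean()


y = np.array([[1.0, 0.0], [0.0, 1.0]])
print(dice_loss(y, y))        # close to 0 for a perfect prediction
print(dice_loss(y, 1.0 - y))  # 1.0 when prediction and ground truth do not overlap
```

Both losses address the imbalance between few text pixels and many background pixels; dice loss does so by optimizing the overlap directly rather than reweighting per-pixel terms.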

The steps to set up this EAST implementation are as follows:

Step 1. Clone the repository using the following command in an Ubuntu environment:

git clone https://github.com/indiantechwarrior/EAST

Step 2. Change into the repository directory and install the requirements:

cd EAST
pip install -r requirements.txt (or pip3 install -r requirements.txt)

Make sure tensorflow==1.15.0 is installed; the command is:

pip install tensorflow==1.15.0  (or pip3 install tensorflow==1.15.0)

Running EAST with pretrained model

Step 1. Download the pre-trained model

The pre-trained model can be downloaded from Google Drive:

https://drive.google.com/file/d/0B3APw5BZJ67ETHNPaU9xUkVoV0U/view

Step 2. Create a tmp folder inside the repository, unzip the pre-trained model into it, and create 'images' and 'output' folders inside tmp. Put the images you want to test in tmp/images.
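If you prefer to script this step, a small Python sketch using only the standard library could look like the following. It assumes the Google Drive download is a zip archive that unpacks to the east_icdar2015_resnet_v1_50_rbox folder used by the checkpoint path in the next step; the function name `prepare_workspace` and any zip filename you pass are hypothetical.

```python
import zipfile
from pathlib import Path


def prepare_workspace(repo_root, model_zip):
    """Create tmp/, unzip the model into it, and add images/ and output/ folders."""
    tmp = Path(repo_root) / "tmp"
    tmp.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(model_zip) as zf:
        zf.extractall(tmp)  # assumed to yield tmp/east_icdar2015_resnet_v1_50_rbox/
    for name in ("images", "output"):
        (tmp / name).mkdir(exist_ok=True)
    return tmp
```

For example, `prepare_workspace("/content/EAST", "/content/east_icdar2015_resnet_v1_50_rbox.zip")`, where the zip path is a placeholder for wherever you saved the download.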

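Step 3. Run the evaluation script. The paths below assume the repository was cloned to /content/EAST (as on Google Colab); adjust them to your setup.

python eval.py --test_data_path=/content/EAST/tmp/images/ --gpu_list=0 --checkpoint_path=/content/EAST/tmp/east_icdar2015_resnet_v1_50_rbox/ --output_dir=/content/EAST/tmp/output/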

Step 4. The results will be saved in tmp/output.
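To inspect the detections programmatically, a parser along these lines may help. It assumes, without confirmation from this article, that each .txt file in tmp/output holds one box per line as eight comma-separated corner coordinates x1,y1,x2,y2,x3,y3,x4,y4; check a result file before relying on it. The function names are hypothetical.

```python
from pathlib import Path


def parse_detections(text):
    """Parse lines of 'x1,y1,x2,y2,x3,y3,x4,y4' into lists of four (x, y) corners."""
    boxes = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        values = [int(float(v)) for v in line.split(",")[:8]]
        boxes.append([(values[i], values[i + 1]) for i in range(0, 8, 2)])
    return boxes


def load_results(output_dir):
    """Map each result file's stem (the image name) to its parsed boxes."""
    return {p.stem: parse_detections(p.read_text())
            for p in sorted(Path(output_dir).glob("*.txt"))}
```

For example, `load_results("/content/EAST/tmp/output")` would return a dictionary keyed by image name.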

Training your own EAST model

Download
  1. Models trained on ICDAR 2013 (training set) + ICDAR 2015 (training set): Google Drive
  2. ResNet V1 50 provided by TensorFlow-Slim: slim resnet v1 50

Train

To train the model, provide the dataset path; the dataset folder must contain a separate ground-truth (gt) text file for each image. Then run:

python multigpu_train.py --gpu_list=0 --input_size=512 --batch_size_per_gpu=8 --checkpoint_path=/content/EAST/tmp/east_icdar2015_resnet_v1_50_rbox/ --text_scale=512 --training_data_path=/content/EAST/data/sroie_train/ --geometry=RBOX --learning_rate=0.0001 --num_readers=4 --pretrained_model_path=/content/EAST/data/resnet_v1_50.ckpt

If you have more than one GPU, pass the GPU IDs to gpu_list (for example, --gpu_list=0,1,2,3).

Note: rename the ICDAR 2015 gt text files from gt_img_*.txt to img_*.txt (or change the code in icdar.py), and remove some extra characters from the files. See the examples in training_samples/.
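The renaming can be scripted, and a parser shows what each gt line contains. This sketch assumes the ICDAR 2015 line format of eight corner coordinates followed by a transcription, with ### marking unreadable text, and it assumes the "extra characters" are a UTF-8 byte-order mark at the start of each file; inspect your files to confirm. The function names are hypothetical.

```python
from pathlib import Path


def parse_gt_line(line):
    """Split a gt line into 8 coordinates, the transcription, and an ignore flag."""
    parts = line.strip().lstrip("\ufeff").split(",")
    coords = [int(v) for v in parts[:8]]
    text = ",".join(parts[8:])  # the transcription itself may contain commas
    return coords, text, text == "###"  # '###' marks unreadable ("don't care") text


def convert_gt_files(src_dir, dst_dir):
    """Copy gt_img_*.txt to img_*.txt, dropping a UTF-8 byte-order mark if present."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    converted = []
    for gt_file in sorted(src.glob("gt_img_*.txt")):
        content = gt_file.read_text(encoding="utf-8-sig")  # utf-8-sig strips the BOM
        target = dst / gt_file.name[len("gt_"):]
        target.write_text(content, encoding="utf-8")
        converted.append(target.name)
    return converted
```

Writing to a separate destination folder keeps the original ICDAR files untouched in case the cleanup needs to be redone.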

The multigpu_train.py command above uses reduced num_readers and batch_size_per_gpu values and resumes training from the pre-trained checkpoints. Before running it, update the checkpoint path in the 'checkpoint' file inside the east_icdar2015_resnet_v1_50_rbox folder.
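The 'checkpoint' file is a small text file that TensorFlow reads to locate the latest checkpoint; it contains lines such as `model_checkpoint_path: "..."`. A minimal plain-Python sketch for rewriting it is shown below. The function name and the prefix `model.ckpt-49491` are hypothetical; use the prefix shared by the .index and .data files actually present in your folder.

```python
from pathlib import Path


def point_checkpoint_at(model_dir, ckpt_prefix):
    """Rewrite the TensorFlow 'checkpoint' state file so it names ckpt_prefix."""
    content = (
        f'model_checkpoint_path: "{ckpt_prefix}"\n'
        f'all_model_checkpoint_paths: "{ckpt_prefix}"\n'
    )
    (Path(model_dir) / "checkpoint").write_text(content)
    return content
```

For example, `point_checkpoint_at("/content/EAST/tmp/east_icdar2015_resnet_v1_50_rbox", "/content/EAST/tmp/east_icdar2015_resnet_v1_50_rbox/model.ckpt-49491")`, with the checkpoint number replaced by your own.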

Step 1. Download the pre-trained VGG16 model and put it at data/vgg_16.ckpt. You can download it from tensorflow/models.

Step 2. Download the prepared dataset from Google Drive, put the downloaded data in data/dataset/mlt, and then start the training.

Step 3. Alternatively, you can prepare your own dataset with the following steps:

Step 4. Modify DATA_FOLDER and OUTPUT in utils/prepare/split_label.py to match your dataset, and run split_label.py from the repository root:

python ./utils/prepare/split_label.py

It will generate the prepared data in data/dataset/.

Step 5. An example input file for split_label.py is gt_img_859.txt, and the corresponding output file of split_label.py is img_859.txt.

Step 6. Modify the DATA_FOLDER path in utils/dataset/data_provider.py.

Step 7. To train on your dataset starting from the pre-trained checkpoints, increase max_steps (60000) in main/train.py to a higher number; adding 10000 is normally a good starting point.

Step 8. Start the training:

python ./main/train.py

Step 9. After training completes, you will find updated checkpoints, which can then be used for inference or further training. To use the new checkpoints, remember to update the filename in the 'checkpoint' file (the same update was needed earlier for the downloaded pre-trained checkpoints).

Now you can run python ./main/demo.py to validate the results with the updated checkpoints.

© Copyright Bharat Ideology 2023
