
Training Spacy Models on custom data using NER

by bharatideology
January 9, 2025
in Science & Tech

Introduction

In this article, we will learn how to use transfer learning to retrain an existing spaCy model so that it starts recognizing entities it currently fails to identify, building our own NER model in the process.


If you want to go deeper into how to train a spaCy model on custom data using NER, coding exercises with live examples can be accessed at Code Demo.

Prepare Training Data

First, import the following packages:

from tqdm import tqdm
import spacy
from spacy.tokens import DocBin
Let's load the pipeline structure for our custom training. Use exactly one of the following lines:

nlp = spacy.blank("en")             # to create a new model from scratch

nlp = spacy.load("en_core_web_lg")  # to fine-tune an existing pretrained model

Now let's create a DocBin object, which efficiently serializes the annotated Doc objects we will build from our training dataset.

db = DocBin() 

Now let's prepare our training data in the following format. Each entity is a tuple of (character start offset, exclusive character end offset, label):

training_data = [
    ("John Water ", {'entities': [(0, 4, "PERSON")]}),   # chars 0-4 cover "John"
    ("Taj Mahal ", {'entities': [(0, 3, "NAME")]}),      # chars 0-3 cover "Taj"
]
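Since the end offset is exclusive, a quick way to sanity-check the annotations is to slice each text with its offsets and confirm the slice is the entity you intended (a minimal check over the training data above):

```python
# Verify that each (start, end) offset pair slices out the intended entity.
training_data = [
    ("John Water ", {'entities': [(0, 4, "PERSON")]}),
    ("Taj Mahal ", {'entities': [(0, 3, "NAME")]}),
]

for text, annot in training_data:
    for start, end, label in annot["entities"]:
        print(label, "->", repr(text[start:end]))
# PERSON -> 'John'
# NAME -> 'Taj'
```

Note that (0, 4) covers only "John", not the full "John Water" string, which is why only the first token is tagged in the validation output later on.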

Next, we will write code that reads the training data above and stores it in the DocBin object with its entity labels attached:

for text, annot in tqdm(training_data):
    doc = nlp.make_doc(text)  # create a Doc object from the text
    ents = []
    for start, end, label in annot["entities"]:  # character offsets
        span = doc.char_span(start, end, label=label, alignment_mode="contract")
        if span is None:
            print(f"Skipping entity ({start}, {end}, {label}) in: {text!r}")
        else:
            ents.append(span)
    doc.ents = ents  # attach the entity spans to the Doc
    db.add(doc)
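The alignment_mode="contract" argument controls what happens when the character offsets do not line up with token boundaries: the strict default returns None, while "contract" shrinks the span to the tokens that fall fully inside it. A small sketch, assuming a blank English pipeline:

```python
import spacy

nlp = spacy.blank("en")
doc = nlp.make_doc("John Water ")  # tokens: "John" (chars 0-4), "Water" (chars 5-10)

# End offset 5 falls between tokens, so the strict default returns None...
print(doc.char_span(0, 5, label="PERSON"))  # None

# ...while "contract" snaps the span down to the tokens fully inside it.
print(doc.char_span(0, 5, label="PERSON", alignment_mode="contract"))  # John
```

This is why the loop above checks for None: any annotation that cannot be mapped onto tokens is skipped rather than crashing the data preparation.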

As the last step of data preparation, we will save the DocBin object to disk:

db.to_disk("./train.spacy") 
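To confirm the annotations survived serialization, the DocBin can be read back and its Docs inspected. A minimal sketch using an in-memory bytes round trip, which uses the same serialization format as to_disk/from_disk:

```python
import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("en")
doc = nlp.make_doc("John Water ")
doc.ents = [doc.char_span(0, 4, label="PERSON")]

db = DocBin()
db.add(doc)

# Round-trip through bytes (to_disk/from_disk use the same format).
restored = list(DocBin().from_bytes(db.to_bytes()).get_docs(nlp.vocab))[0]
print([(ent.text, ent.label_) for ent in restored.ents])
# [('John', 'PERSON')]
```

To check the file written above instead, replace the round trip with DocBin().from_disk("./train.spacy").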

Preparing the config file for training the spaCy model on custom data

Copy default_config.cfg from https://spacy.io/usage/training#config and save it as base_config.cfg.

Now open this file and edit the following parameters:

[nlp]
lang = "en"
pipeline = ["ner"]

[components.ner]
source = "en_core_web_sm"

Now go to cmd or terminal and execute the following command:

python3 -m spacy init fill-config base_config.cfg config.cfg

This will generate a config.cfg file in the same folder.

Training the Spacy Model

In order to train the existing spaCy model with the new data, we will execute the command below from cmd or terminal:

python -m spacy train config.cfg --output ./output --paths.train ./train.spacy --paths.dev ./train.spacy

This will train the model, and output similar to the following will be displayed:


/usr/lib/python3/dist-packages/requests/__init__.py:80: RequestsDependencyWarning: urllib3 (1.26.4) or chardet (3.0.4) doesn't match a supported version!
  RequestsDependencyWarning)
ℹ Using CPU

=========================== Initializing pipeline ===========================
Set up nlp object from config
Pipeline: ['ner']
Resuming training for: ['ner']
Created vocabulary
Finished initializing nlp object
Initialized pipeline components: []
✔ Initialized pipeline

============================= Training pipeline =============================
ℹ Pipeline: ['ner']
ℹ Initial learn rate: 0.001
E    #       LOSS NER  ENTS_F  ENTS_P  ENTS_R  SCORE 
---  ------  --------  ------  ------  ------  ------
  0       0      4.57    0.00    0.00    0.00    0.00
200     200     29.14  100.00  100.00  100.00    1.00
.....
19800   19800      0.00  100.00  100.00  100.00    1.00
20000   20000      0.00  100.00  100.00  100.00    1.00
✔ Saved pipeline to output directory
output/model-last

The trained pipeline is saved in the output folder: model-last is the final checkpoint, and model-best is the best-scoring one.


Re-validating our input with the trained model

Let's execute the code below to validate that our trained model now recognizes the new entity:

import spacy
nlp_new = spacy.load("./output/model-best")
tokens = nlp_new("Did you see John Water nearby ?") 
print([(X, X.ent_iob_, X.ent_type_) for X in tokens])

Output:
 
[(Did, 'O', ''), (you, 'O', ''), (see, 'O', ''), (John, 'B', 'PERSON'), (Water, 'O', ''), (nearby, 'O', ''),  (?, 'O', '')]
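The second element of each tuple is the IOB tag: B begins an entity, I continues it, and O marks tokens outside any entity. The tuples above can be grouped back into entity spans with plain Python (a small sketch over the sample output):

```python
# Sample token output in (text, iob, ent_type) form, as printed above.
tokens = [("Did", "O", ""), ("you", "O", ""), ("see", "O", ""),
          ("John", "B", "PERSON"), ("Water", "O", ""),
          ("nearby", "O", ""), ("?", "O", "")]

entities = []
for text, iob, ent_type in tokens:
    if iob == "B":                    # start a new entity
        entities.append(([text], ent_type))
    elif iob == "I":                  # continue the current entity
        entities[-1][0].append(text)

print([(" ".join(words), label) for words, label in entities])
# [('John', 'PERSON')]
```

Equivalently, spaCy exposes this grouping directly: iterating over doc.ents yields the same spans without manual IOB handling.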

© Copyright Bharat Ideology 2023