Bharat Ideology

An Insight into YOLO Deep Learning

by bharatideology
January 12, 2025
in Science & Tech

Object detection algorithms enable many advanced technologies and are a primary research focus for many industries, from transportation to healthcare. For example, autonomous cars run object detection on data from sensors such as Lidar to perceive their surroundings and enable self-driving.

There are several object detection algorithms with different capabilities. These algorithms are mostly split into two groups according to how they perform their tasks.

The first group consists of classification-based algorithms that work in two stages. First, they select regions of interest in the image, and then they classify the objects within those regions using Convolutional Neural Networks (CNNs). This group, which includes solutions such as R-CNN, is usually too slow for real-time use.

The algorithms in the second group are based on regression. They scan the whole image in a single pass and make predictions to localize, identify and classify the objects within it. Algorithms in this group, such as You Only Look Once (YOLO), are faster and can be used for real-time object detection. If you want to train a deep learning algorithm for object detection, you need to understand the different solutions available to you and know which one best suits your needs. Read this article to learn why YOLO is a better overall solution for real-time object detection.

What Is YOLO Object Detection?

You Only Look Once (YOLO) is a network that uses Deep Learning (DL) algorithms for object detection. YOLO performs object detection by classifying the objects within an image and determining where they are located in it.

For example, if you input an image of a herd of sheep into a YOLO network, it outputs a vector of bounding boxes, one for each individual sheep, and labels each box with the class "sheep".

How YOLO Improves over Previous Object Detection Methods

Previous object detection methods such as Region-based Convolutional Neural Networks (R-CNN), including variations like Fast R-CNN, performed object detection as a multi-step pipeline. R-CNN focuses on specific regions within the image and trains each individual component of the pipeline separately. This process requires R-CNN to classify about 2,000 region proposals per image, which makes it very time-consuming (47 seconds per individual test image). Thus, it cannot be used in real time. Additionally, R-CNN generates its regions with a fixed selective search algorithm, which means no learning occurs during this stage, so the network might work from inferior region proposals.

This makes object detection networks such as R-CNN harder to optimize and slower than YOLO. YOLO is much faster (45 frames per second) and easier to optimize than previous algorithms, because it uses only one neural network to run all components of the task. To gain a better understanding of what YOLO is, we first have to explore its architecture and algorithm.
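To put the two speed figures on the same scale, the short sketch below converts them into frames per second and milliseconds per frame. It only does arithmetic on the numbers quoted above; the constant and function names are illustrative, and the two figures were not necessarily measured on the same hardware, so the ratio is a rough indication rather than a benchmark.

```python
# Figures quoted in the text; everything else is plain arithmetic on them.
RCNN_SECONDS_PER_IMAGE = 47.0   # R-CNN test time per image
YOLO_FRAMES_PER_SECOND = 45.0   # YOLO throughput


def frames_per_second(seconds_per_image: float) -> float:
    """Convert a per-image latency into throughput."""
    return 1.0 / seconds_per_image


def milliseconds_per_frame(fps: float) -> float:
    """Convert throughput into a per-frame latency in milliseconds."""
    return 1000.0 / fps


rcnn_fps = frames_per_second(RCNN_SECONDS_PER_IMAGE)
yolo_ms = milliseconds_per_frame(YOLO_FRAMES_PER_SECOND)
speedup = YOLO_FRAMES_PER_SECOND / rcnn_fps

print(f"R-CNN: {rcnn_fps:.3f} frames per second")
print(f"YOLO:  {yolo_ms:.1f} ms per frame")
print(f"Ratio: about {speedup:.0f}x")
```

A video stream at 30 frames per second leaves roughly 33 ms per frame, which is why a detector that needs tens of seconds per image cannot keep up while one running at 45 frames per second can.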

YOLO Architecture – Structure Design and Algorithm Operation

A YOLO network consists of three main parts: the algorithm (also known as the predictions vector), the network, and the loss function.

The YOLO Algorithm

Once you input an image into a YOLO algorithm, it splits the image into an S×S grid. Each grid cell predicts whether its bounding boxes contain an object (or part of one) and then uses this information to predict a class for the object.
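The grid split can be sketched in a few lines. The function below is a minimal illustration, not YOLO's actual code: it finds which cell of an S×S grid contains an object's center and how far into that cell the center lies. The name `responsible_cell`, the 448×448 image size and S = 7 are assumptions chosen for the example.

```python
def responsible_cell(cx, cy, img_w, img_h, s):
    """Locate the S x S grid cell that contains the point (cx, cy).

    Returns (row, col, x_offset, y_offset), where the offsets are the
    position of the point inside its cell, each in the range [0, 1].
    """
    if not (0 <= cx <= img_w and 0 <= cy <= img_h):
        raise ValueError("center must lie inside the image")
    gx = cx / img_w * s          # center position measured in cell units
    gy = cy / img_h * s
    col = min(int(gx), s - 1)    # clamp so a point on the far edge stays in range
    row = min(int(gy), s - 1)
    return row, col, gx - col, gy - row


# Hypothetical object centered at pixel (300, 100) in a 448 x 448 image.
row, col, x_off, y_off = responsible_cell(300, 100, 448, 448, s=7)
print(row, col, round(x_off, 4), round(y_off, 4))
```

In this example the center falls in row 1, column 4 of the grid, so that cell is the one expected to detect the object.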

Before we can go into details and explain how the algorithm functions, we need to understand how the algorithm builds and specifies each bounding box. The YOLO algorithm uses four components and an additional value to predict an output.

  1. The center of a bounding box (bx, by)
  2. Width (bw)
  3. Height (bh)
  4. The Class of the object (c)

The final predicted value is the confidence (pc), which represents the probability that an object exists within the bounding box. The (bx, by) coordinates represent the center of the bounding box.
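As a rough illustration of these values, the sketch below stores one prediction in a small data class and converts the center-based description into corner coordinates, which are often easier to draw or compare. The class name `Prediction`, the `corners` method and the sample numbers are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    pc: float  # confidence that the box contains an object
    bx: float  # x coordinate of the box center
    by: float  # y coordinate of the box center
    bw: float  # box width
    bh: float  # box height
    c: int     # index of the predicted class

    def corners(self):
        """Return the box as (x_min, y_min, x_max, y_max)."""
        half_w = self.bw / 2
        half_h = self.bh / 2
        return (self.bx - half_w, self.by - half_h,
                self.bx + half_w, self.by + half_h)


# Hypothetical prediction: a 40 x 20 box centered at (100, 80), class 0.
box = Prediction(pc=0.9, bx=100, by=80, bw=40, bh=20, c=0)
print(box.corners())
```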

Typically, most of the bounding boxes will not contain an object, so we use the pc prediction to filter them. A process called non-max suppression removes boxes that have a low probability of containing an object, as well as boxes that overlap heavily with a higher-scoring box.
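A minimal sketch of non-max suppression is shown below. It is a simple greedy version written for clarity, not the code of any particular YOLO release. It assumes boxes are given as corner coordinates (x1, y1, x2, y2), and the two threshold values of 0.5 are illustrative defaults.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, score_threshold=0.5, iou_threshold=0.5):
    """Return the indices of the boxes to keep, highest score first."""
    # Step 1: drop boxes whose confidence is too low.
    candidates = [i for i, s in enumerate(scores) if s >= score_threshold]
    # Step 2: visit the rest from most to least confident.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    for i in candidates:
        # Keep a box only if it does not overlap too much with a kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept


boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.2]
print(non_max_suppression(boxes, scores))
```

Here the second box is dropped because it overlaps the first, higher-scoring box almost entirely, and the last box is dropped for low confidence, leaving indices 0 and 2.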

The Network

A YOLO network is structured like a regular CNN: it contains convolutional and max-pooling layers, followed by two fully connected layers.
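The size of the final layer's output follows from the grid and the values predicted per box. The sketch below works it out under the settings reported for the original YOLO on PASCAL VOC (a 7×7 grid, 2 boxes per cell, 20 classes, with class scores shared per cell); those settings and the function name `output_shape` are assumptions that do not come from the text above.

```python
VALUES_PER_BOX = 5  # bx, by, bw, bh and the confidence pc


def output_shape(s, boxes_per_cell, num_classes):
    """Shape and flat length of the prediction tensor for an S x S grid.

    Assumes each cell predicts `boxes_per_cell` boxes plus one set of
    class scores shared by the whole cell.
    """
    per_cell = boxes_per_cell * VALUES_PER_BOX + num_classes
    return (s, s, per_cell), s * s * per_cell


shape, length = output_shape(s=7, boxes_per_cell=2, num_classes=20)
print(shape, length)
```

Under these assumptions the last fully connected layer produces 1,470 numbers, which are reshaped into a 7×7×30 grid of predictions.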

The Loss Function

Because the YOLO algorithm predicts multiple bounding boxes for each grid cell, we want only one of them to be responsible for each object in the image. To achieve this, the loss function assigns each ground-truth object to the predicted box that has the highest Intersection over Union (IoU) with it, and computes the loss for that box as the true positive. Over the course of training, this makes each bounding box predictor specialize, which improves predictions for particular sizes and aspect ratios.
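The selection step can be sketched as follows. This is an illustration of choosing the responsible box only, not of the full loss function. It assumes boxes are given in the center format (bx, by, bw, bh), and the helper names and sample boxes are made up for the example.

```python
def center_to_corners(box):
    """Convert (bx, by, bw, bh) into (x1, y1, x2, y2)."""
    bx, by, bw, bh = box
    return (bx - bw / 2, by - bh / 2, bx + bw / 2, by + bh / 2)


def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def responsible_box(predicted, truth):
    """Return (index, IoU) of the predicted box that best matches `truth`."""
    truth_corners = center_to_corners(truth)
    ious = [iou(center_to_corners(p), truth_corners) for p in predicted]
    best = max(range(len(ious)), key=lambda i: ious[i])
    return best, ious[best]


truth = (50, 50, 40, 20)             # a wide ground-truth box
predicted = [(50, 50, 20, 40),       # a tall prediction from the same cell
             (52, 50, 40, 24)]       # a wide prediction from the same cell
index, best_iou = responsible_box(predicted, truth)
print(index, round(best_iou, 2))
```

The wide prediction wins for the wide object, which hints at how the two predictors of a cell drift toward different shapes during training.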

Comparing YOLO Versions – YOLO V1 vs YOLO V2 vs YOLO V3

This article compares the first three iterations of the YOLO object detection network; later versions have been released since and are not covered here. The creators of YOLO designed each new version to improve on the previous one, mostly focusing on detection accuracy.

YOLO V1

The first version of YOLO was introduced in 2015. It used a limited Darknet framework that was trained on the ImageNet-1000 dataset. This setup had many limitations that restricted the usability of YOLO V1. Namely, YOLO V1 struggled to identify small objects that appeared in clusters, and it generalized poorly to objects whose dimensions differed from those seen in the training images. This resulted in poor localization of objects within the input image.

YOLO V2

YOLO V2 was released in 2016 under the name YOLO9000. YOLO V2 uses Darknet-19, a 19-layer network, with 11 more layers added for object detection. It was designed to compete with Faster R-CNN and the Single Shot MultiBox Detector (SSD), both of which had shown better object detection scores than YOLO V1.

YOLO V2 upgrades over YOLO V1 include:

  • Improved mean average precision (mAP): the new higher-resolution classifier increased the input size from 224×224 in YOLO V1 to 448×448, which improved the mAP.
  • Better detection of smaller objects: divides the image into a finer 13×13 grid, which improves localization and identification of smaller objects in the image.
  • Improved detection across image sizes: the network is trained on images randomly resized to different dimensions, which improves its prediction accuracy on input images of varying dimensions.
  • Anchor boxes: provide a single framework for classification and the prediction of bounding boxes. The anchor box shapes are chosen for a specific dataset by running k-means clustering on its bounding boxes.
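The anchor box idea in the last item can be sketched with a small k-means routine over box widths and heights. This is a simplified illustration, not the procedure of any specific YOLO release. It assumes boxes are compared by IoU rather than Euclidean distance, as the YOLO V2 paper describes, and uses the cluster mean as the new anchor. The deterministic initialization, the function names and the sample boxes are choices made for this example.

```python
def wh_iou(a, b):
    """IoU of two boxes given as (width, height) and sharing one center."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(boxes, k, iterations=100):
    """Cluster (width, height) pairs into k anchor shapes."""
    # Deterministic start for the sketch: k boxes evenly spaced by area.
    ordered = sorted(boxes, key=lambda b: b[0] * b[1])
    step = len(ordered) / k
    anchors = [ordered[int((i + 0.5) * step)] for i in range(k)]
    for _ in range(iterations):
        # Assign every box to the anchor it overlaps most.
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda j: wh_iou(box, anchors[j]))
            clusters[best].append(box)
        # Move each anchor to the mean shape of its cluster.
        updated = []
        for j, members in enumerate(clusters):
            if members:
                mean_w = sum(w for w, _ in members) / len(members)
                mean_h = sum(h for _, h in members) / len(members)
                updated.append((mean_w, mean_h))
            else:
                updated.append(anchors[j])  # leave an empty cluster in place
        if updated == anchors:
            break
        anchors = updated
    return anchors


# Hypothetical dataset: three small tall boxes and three large wide boxes.
sample = [(10, 20), (12, 22), (11, 19), (50, 30), (52, 28), (48, 33)]
for w, h in kmeans_anchors(sample, k=2):
    print(round(w, 2), round(h, 2))
```

With this toy data the routine settles on one small tall anchor and one large wide anchor, mirroring the two groups of boxes in the sample.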

YOLO V3

YOLO V3 is an incremental upgrade over YOLO V2 that uses another variant of Darknet, Darknet-53. The YOLO V3 architecture consists of 53 layers trained on ImageNet and another 53 tasked with object detection, for a total of 106 layers. While this dramatically improved the accuracy of the network, it also reduced the speed from 45 fps to 30 fps.

YOLO V3 upgrades over YOLO V2 include:

  • Improved bounding box prediction: uses logistic regression to predict an objectness score for each bounding box.
  • More accurate class predictions: the softmax used in YOLO V2 is replaced with an independent logistic classifier for each class, which allows multi-label classification.
  • Improved detection at different scales: makes predictions at 3 scales for every location within the input image, using upsampled feature maps to combine fine-grained information with full semantic information and improve the quality of the output.
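The difference between softmax and per-class logistic classifiers is easy to see on a small example. The sketch below is illustrative only: the class names, the raw scores and the 0.5 threshold are assumptions, chosen so that one object plausibly carries two overlapping labels.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 across classes."""
    peak = max(logits)                      # subtract the max for stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Logistic function: an independent probability for a single class."""
    return 1.0 / (1.0 + math.exp(-x))


def multi_label(logits, names, threshold=0.5):
    """Return every class whose independent probability passes the threshold."""
    return [n for n, x in zip(names, logits) if sigmoid(x) >= threshold]


names = ["person", "woman", "dog"]
logits = [3.0, 2.5, -4.0]                   # hypothetical raw class scores

print([round(p, 3) for p in softmax(logits)])
print([round(sigmoid(x), 3) for x in logits])
print(multi_label(logits, names))
```

Softmax forces "person" and "woman" to compete for the same probability mass, so neither looks certain, while the independent logistic outputs let both labels score highly at once.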