- Computer Vision Before Deep Learning
- Anatomy of Convolutional Neural Networks
- Classic CNN Architecture
- Vanishing Gradients and The Degradation Problem
- Skip Connections
- ResNet in Action
In December of 2015, a paper was published that rocked the deep learning world.
This paper is widely regarded as one of the most influential papers in modern deep learning and has been cited over 110,000 times.
The name of this paper?
Deep Residual Learning for Image Recognition (aka, the ResNet paper).
The prevailing wisdom of the time suggested that adding more layers to a neural network would lead to better results.
But researchers observed that the accuracy of deep networks would increase up to a saturation point before levelling off.
On top of that, an unusual phenomenon was observed: when layers were added to an already deep network, the training error would actually increase.
This was primarily due to two problems:
1) Vanishing/exploding gradients
2) The degradation problem
The vanishing/exploding gradients problem is a by-product of the chain rule.
The chain rule multiplies error gradients together, layer by layer, as they flow backward through the network.
Multiplying lots of values that are less than one results in smaller and smaller values (0.9 multiplied by itself 100 times is roughly 0.00003).
So by the time those error gradients reach the earlier layers of a network, their values tend toward zero.
This results in smaller and smaller updates to the earlier layers (not much learning happening).
The inverse problem is the exploding gradient, which happens when large error gradients accumulate during training and cause massive updates to the weights in the earlier layers.
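To make the vanishing half of this concrete, here's a minimal PyTorch sketch (the depth, layer width, and sigmoid activation are illustrative choices on my part, not anything prescribed by the ResNet paper). It stacks 50 plain layers and compares gradient magnitudes at the two ends of the network:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A deep "plain" network: 50 linear layers with sigmoid activations.
# Sigmoid saturates easily, which makes the effect easy to see.
depth = 50
layers = []
for _ in range(depth):
    layers += [nn.Linear(32, 32), nn.Sigmoid()]
plain_net = nn.Sequential(*layers)

# One forward/backward pass with dummy data.
x = torch.randn(8, 32)
plain_net(x).sum().backward()

# Gradients shrink as they travel back toward the input.
first_linear = layers[0]   # closest to the input
last_linear = layers[-2]   # closest to the loss
print(f"gradient norm, last layer:  {last_linear.weight.grad.norm():.2e}")
print(f"gradient norm, first layer: {first_linear.weight.grad.norm():.2e}")
```

Run it and the first layer's gradient norm comes out many orders of magnitude smaller than the last layer's: the early layers are barely learning anything.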
The degradation problem is unexpected because it's not caused by overfitting.
Researchers found that as networks got deeper, training error would improve up to a point, then get worse as even more layers were added.
Which is counterintuitive…
Because you'd expect training error to decrease, converge, and plateau as the number of layers in your network increases.
Both of these issues threatened to halt the progress of deep neural networks until this paper came out...
The ResNet paper introduced a novel solution to these two pesky problems that plagued the architects of deep neural networks:
The Skip Connection.
Skip connections, which are housed in residual blocks, allow you to take the activation value from an earlier layer and pass it to a deeper layer in a network.
Skip connections enable deep networks to learn the identity function.
In a residual block, the output is y = F(x) + x, so if the extra layers have nothing useful to add, they only need to drive F(x) toward zero and the block simply passes its input through.
Learning the identity function allows a deeper layer to perform as well as an earlier layer, or at the very least, not perform any worse.
The result is smoother gradient flow: the identity path gives gradients a direct route back to earlier layers, so important features are preserved during training.
The invention of the skip connection has given us the ability to build deeper and deeper networks while avoiding the problem of vanishing/exploding gradients and degradation.
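If you want to see what that looks like in code, here's a minimal PyTorch sketch of a residual block (the blocks in the actual ResNet paper also handle downsampling and channel changes with projection shortcuts; this simplified version assumes the input and output shapes match):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Simplified residual block: output = F(x) + x, followed by ReLU."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x                            # the skip connection
        out = F.relu(self.bn1(self.conv1(x)))   # F(x): two conv layers...
        out = self.bn2(self.conv2(out))
        out = out + identity                    # ...plus the input itself
        return F.relu(out)

block = ResidualBlock(channels=64)
x = torch.randn(1, 64, 56, 56)
print(block(x).shape)  # torch.Size([1, 64, 56, 56])
```

Notice that if both convolutions learn weights near zero, the block collapses to (roughly) the identity function, which is exactly the fallback behavior that lets extra layers do no harm.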
Wanna learn more about ResNet? Check out this short course that I've prepared for you using the SuperGradients training library!
Hey, I'm Harpreet.
I'm a data science and deep learning practitioner working in the industry.
Throughout undergrad and grad school, I studied economics, actuarial science, statistics, and mathematics.
I've worked as an actuary, biostatistician, and data scientist (senior and lead) in a variety of industries.
I love working in:
• Python
• PyTorch
• SuperGradients
I'm currently working in developer relations. I've worked at companies like Comet and Pachyderm, and I'm now at Deci AI and absolutely loving it!
I love sharing my knowledge and experience with others, whether it's here, on LinkedIn, Twitter, or in my community, Deep Learning Daily.
I also host a podcast called The Artists of Data Science, where I've interviewed over 300 people on topics ranging from breaking into data science and becoming a leader in the field to self-improvement, philosophy, production machine learning, and so much more.