1. Abstract
    • We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously
    • Reformulate the layers → residual functions with reference to the layer inputs (instead of learning unreferenced functions)
    • Provide empirical evidence of the power of residual connections (ImageNet/ILSVRC 2015, CIFAR-10/100, PASCAL VOC, MS COCO); deeper than VGG nets, yet still lower complexity
    • Performance
      • 28% relative improvement on the COCO object detection dataset
      • 3.57% error on the ImageNet test set (1st place in the ILSVRC 2015 classification task)
  2. Introduction
    • Deep networks naturally integrate low/mid/high-level features and classifiers in an end-to-end multi-layer fashion
    • Driven by the significance of depth, a question arises: Is learning better networks as easy as stacking more layers? → Obstacle: vanishing/exploding gradients → Remedy: normalized initialization and intermediate normalization layers
    • Degradation problem: with the network depth increasing, accuracy gets saturated and then degrades rapidly
    • We address the degradation problem by introducing a deep residual learning framework
      • Instead of hoping each few stacked layers directly fit a desired underlying mapping, we explicitly let these layers fit a residual mapping
      • Hypothesis: it is easier to optimize the residual mapping than to optimize the original, unreferenced mapping (a tiny sketch below illustrates why an identity mapping is easy to represent in residual form)
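A tiny sketch of why this hypothesis is plausible (illustrative layer sizes and names, not from the paper): in the residual formulation, an identity mapping is obtained simply by driving the residual branch toward zero, whereas a plain nonlinear stack has to learn the identity explicitly.

```python
import torch
import torch.nn as nn

x = torch.randn(4, 16)

# Residual view: y = F(x) + x. If F's weights are zero, the block is exactly the identity.
residual_branch = nn.Linear(16, 16)
nn.init.zeros_(residual_branch.weight)
nn.init.zeros_(residual_branch.bias)
y_residual = residual_branch(x) + x
print(torch.allclose(y_residual, x))   # True: identity comes "for free"

# Plain view: y = H(x) from a nonlinear stack; the identity must be learned explicitly.
plain_branch = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 16))
y_plain = plain_branch(x)
print(torch.allclose(y_plain, x))      # False at random initialization
```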
    • Experiments in two points
      • Easier training: our deep residual nets are easy to optimize, but the counterpart plain nets are hard to train as depth increases
      • Higher performance: Our deep residual nets can easily enjoy accuracy gains from greatly increased depth
    • This strong evidence shows that the residual learning principle is generic, and we expect that it is applicable in other vision and non-vision problems
  3. Related Work
    • Residual Representations
      • Image recognition
        • VLAD: a representation that encodes an image by the residual vectors with respect to a dictionary (a small sketch of this residual encoding follows after this list)
        • Fisher Vector: a probabilistic version of VLAD
      • For vector quantization, encoding residual vectors is more effective than encoding original vectors
      • Low-level vision and computer graphics (solving PDEs, Partial Differential Equations)
        • Multigrid: reformulates the system as subproblems at multiple scales, where each subproblem is responsible for the residual solution between a coarser and a finer scale
        • Hierarchical basis preconditioning: relies on variables that represent residual vectors between two scales
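Below is a minimal NumPy sketch of VLAD-style residual encoding (the function name `vlad_encode` and all shapes are illustrative assumptions, not from the paper): each local descriptor is assigned to its nearest codeword, and the residual vector (descriptor minus codeword) is accumulated per codeword.

```python
import numpy as np

def vlad_encode(descriptors, codebook):
    # descriptors: (N, D) local features; codebook: (K, D) visual words
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    assignments = dists.argmin(axis=1)                     # nearest codeword per descriptor
    vlad = np.zeros_like(codebook)
    for k in range(codebook.shape[0]):
        members = descriptors[assignments == k]
        if len(members) > 0:
            vlad[k] = (members - codebook[k]).sum(axis=0)  # accumulate residual vectors
    vlad = vlad.ravel()
    return vlad / (np.linalg.norm(vlad) + 1e-12)           # L2-normalize the encoding

# e.g. 100 local descriptors of dim 8 against a codebook of 4 visual words
code = vlad_encode(np.random.randn(100, 8), np.random.randn(4, 8))
```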
    • Shortcut Connections
      • Several earlier techniques
        • MLPs: add a linear layer connected from the network input to the output
        • Intermediate layers directly connected to auxiliary classifiers
        • Methods (using shortcut connections) that consider centering layer responses, gradients, and propagated errors
        • Inception: composed of a shortcut branch and a few deeper branches
      • Concurrent work
        • Highway networks: shortcut connections with gating functions; the gates are data-dependent and have parameters, whereas ResNet's identity shortcuts are parameter-free (see the comparison sketch below)
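A simplified sketch of the contrast (fully-connected layers for brevity; the class names are assumptions, not code from either paper): the highway gate T(x) is data-dependent and parameterized, while the residual shortcut is a parameter-free identity addition.

```python
import torch
import torch.nn as nn

class HighwayLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.transform = nn.Linear(dim, dim)   # H(x)
        self.gate = nn.Linear(dim, dim)        # T(x), learned gate

    def forward(self, x):
        h = torch.relu(self.transform(x))
        t = torch.sigmoid(self.gate(x))        # data-dependent gate in [0, 1]
        return t * h + (1.0 - t) * x           # gated mix; the shortcut can be "closed"

class ResidualLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.transform = nn.Linear(dim, dim)   # F(x)

    def forward(self, x):
        return torch.relu(self.transform(x)) + x  # identity shortcut, no extra parameters
```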
  4. Deep Residual Learning
    • Residual Learning
      • x: the inputs to the first of these layers (input)
      • H(x): the desired underlying mapping (output)
      • F(x) := H(x) - x: the residual function fit by the stacked layers, so that F(x) + x = H(x) recovers the original mapping
      • If the added layers can be constructed as identity mappings, a deeper model should have training error no greater than its shallower counterpart
      • The degradation problem suggests that the solvers might have difficulties in approximating identity mappings by multiple nonlinear layers
    • Identity Mapping by Shortcuts
      • Building block: y = F(x, {W_i}) + x, where the identity shortcut adds neither extra parameters nor extra computation
      • When the dimensions of F(x) and x differ, a linear projection W_s is applied on the shortcut: y = F(x, {W_i}) + W_s x (a minimal PyTorch sketch of such a block follows below)
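A minimal PyTorch sketch of the basic two-layer residual block, assuming the common conv-BN-ReLU layout; it illustrates y = F(x, {W_i}) + x with an optional 1×1 projection W_s when shapes differ, and is not the authors' released code.

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        # F(x, {W_i}): two 3x3 convolutions with BN, ReLU after the first
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        # W_s: 1x1 projection shortcut, used only when dimensions change; otherwise identity
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )
        else:
            self.shortcut = nn.Identity()

    def forward(self, x):
        residual = self.relu(self.bn1(self.conv1(x)))
        residual = self.bn2(self.conv2(residual))
        # y = F(x) + x (or + W_s x), followed by ReLU
        return self.relu(residual + self.shortcut(x))
```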
    • Network Architectures
      • Plain Network
        • Our plain baselines are mainly inspired by the philosophy of VGG nets
        • (i) for the same output feature map size, the layers have the same number of filters
        • (ii) if the feature map size is halved, the number of filters is doubled so as to preserve the time complexity per layer
        • Downsampling is performed directly by convolutional layers that have a stride of 2
      • Residual Network
        • Based on the plain network, shortcut connections are inserted (every two layers in the 34-layer version)
        • Identity shortcuts are used when the input and output dimensions are equal; when dimensions increase, either (A) identity shortcuts with zero-padded extra channels or (B) 1×1 projection shortcuts are used (a small sketch of stage construction follows this list)
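An illustrative sketch (the helper name `make_stage` is an assumption; it reuses the `BasicResidualBlock` sketched earlier) of how stages follow the rules above: the filter count is constant within a stage, and when the feature map is halved by a stride-2 convolution the filter count is doubled. The stage counts shown correspond to the ResNet-34 configuration.

```python
import torch.nn as nn

def make_stage(in_channels, out_channels, num_blocks, downsample):
    # First block downsamples with stride 2 (and changes the channel count);
    # remaining blocks keep feature-map size and filter count unchanged.
    blocks = [BasicResidualBlock(in_channels, out_channels, stride=2 if downsample else 1)]
    blocks += [BasicResidualBlock(out_channels, out_channels, stride=1) for _ in range(num_blocks - 1)]
    return nn.Sequential(*blocks)

# ResNet-34 body: 64 -> 128 -> 256 -> 512 filters, feature map halved between stages
stages = nn.Sequential(
    make_stage(64, 64, 3, downsample=False),
    make_stage(64, 128, 4, downsample=True),
    make_stage(128, 256, 6, downsample=True),
    make_stage(256, 512, 3, downsample=True),
)
```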
  5. Experiments
    • ImageNet Classification
    • CIFAR-10 and Analysis
    • Object Detection on PASCAL and MS COCO
