Faster R-CNN

References: Selective Search (https://donghwa-kim.github.io/SelectiveSearch.html), EdgeBoxes (https://donghwa-kim.github.io/EdgeBoxes.html), video: https://www.youtube.com/watch?v=nDPWywWRIRo

  1. Abstract
    • SPPNet [1] and Fast R-CNN [2] have reduced the running time of detection networks, exposing region proposal computation as a bottleneck
    • We introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network (predicts object bounds and objectness scores at each position)
    • Fast R-CNN is used for detection; the region proposal algorithm is replaced with the RPN, merging RPN and Fast R-CNN into a single network (only 300 proposals vs. 2000 proposals in R-CNN)
  2. Introduction
    • Now, proposals are the test-time computational bottleneck in state-of-the-art detection systems
      • Selective Search [4], one of the most popular methods, greedily merges superpixels based on engineered low-level features
      • EdgeBoxes [6] currently provides the best tradeoff between proposal quality and speed
    • We introduce novel Region Proposal Networks that share convolutional layers with state-of-the-art object detection networks
    • Our observation is that the convolutional feature maps used by region-based detectors, like Fast R-CNN, can also be used for generating region proposals
    • RPNs are designed to efficiently predict region proposals with a wide range of scales and aspect ratios (Figure 1-a, 1-b, 1-c)
    • RPNs completely learn to propose regions from data, and thus can easily benefit from deeper and more expressive features
  3. Related Work
    • Object Proposals
      • Grouping super-pixels: Selective Search [4], CPMC [22], MCG [23]
      • Sliding windows: objectness in windows [24], EdgeBoxes [6]
    • Deep Networks for Object Detection
      • R-CNN [5]
      • Predicting object bounding boxes: [25], OverFeat method [9], MultiBox methods [26, 27], DeepMask method [28]
      • Shared Computation of convolutions: [9, 1, 29, 7, 2]
  4. Faster R-CNN
    • Region Proposal Networks (Reference: ZFNet, https://oi.readthedocs.io/en/latest/computer_vision/cnn/zfnet.html)
      • The network takes as input an n × n spatial window of the input convolutional feature map
      • An intermediate layer maps each window into a lower-dimensional feature
      • The feature is fed into a box-regression layer (reg) and a box-classification layer (cls)
      • An anchor is centered at the sliding window in question and is associated with a scale and aspect ratio (by default 3 scales [128, 256, 512] × 3 aspect ratios [1:1, 1:2, 2:1], so k = 9 anchors at each sliding position; see the sketch after this list)
      • Translation-Invariant Anchors: both the anchors and the functions that compute proposals relative to the anchors are translation invariant
      • Multi-Scale Anchors as Regression References
      • Loss Function (multi-task loss): L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_reg(t_i, t_i*)
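A minimal NumPy sketch of the anchor mechanism above, assuming a ZFNet/VGG-like feature stride of 16; the helper names and the corner-coordinate convention are illustrative assumptions, not the official implementation.

```python
import numpy as np

def generate_anchors(base_size=16, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """k = len(scales) * len(ratios) = 9 anchors (x1, y1, x2, y2) for one position."""
    cx = cy = base_size / 2.0
    anchors = []
    for s in scales:
        for r in ratios:  # r ~ h/w; area stays ~ s**2 while the shape varies
            w, h = s / np.sqrt(r), s * np.sqrt(r)
            anchors.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.array(anchors)

def shift_anchors(anchors, feat_h, feat_w, stride=16):
    """Tile the same k anchors over every sliding position (translation invariance)."""
    sx, sy = np.meshgrid(np.arange(feat_w) * stride, np.arange(feat_h) * stride)
    shifts = np.stack([sx, sy, sx, sy], axis=-1).reshape(-1, 1, 4)
    return (anchors[None] + shifts).reshape(-1, 4)  # (feat_h * feat_w * k, 4)

def encode_boxes(boxes, anchors):
    """Paper's regression targets for matched box/anchor pairs:
    tx=(x-xa)/wa, ty=(y-ya)/ha, tw=log(w/wa), th=log(h/ha)."""
    def to_cwh(b):  # corners -> center/size
        w, h = b[:, 2] - b[:, 0], b[:, 3] - b[:, 1]
        return b[:, 0] + w / 2, b[:, 1] + h / 2, w, h
    x, y, w, h = to_cwh(boxes)
    xa, ya, wa, ha = to_cwh(anchors)
    return np.stack([(x - xa) / wa, (y - ya) / ha,
                     np.log(w / wa), np.log(h / ha)], axis=-1)
```

For a typical ~60×40 conv feature map this yields roughly 60·40·9 ≈ 20k anchors per image before filtering, in line with the scale the paper describes.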
  5. Experiments
  6. Conclusion


BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation

  1. Abstract
    • Keys to semantic segmentation: rich spatial information and a sizable receptive field
    • For real-time inference speed, spatial resolution is often compromised
    • Design
      • Spatial Path with a small stride (to preserve spatial information and generate high-resolution features)
      • Context Path with fast downsampling (to obtain a sufficient receptive field)
      • Feature Fusion Module to combine two features
    • Experiments
      • Cityscapes, CamVid, COCO-Stuff
  2. Introduction
    • [34, 39] try to restrict the input size
    • [1, 8, 25] try to prune the channels of the network
    • [25] proposes to drop the last stage of the model
    • [1, 25, 35] use U-shape architecture to fuse the hierarchical features
    • Proposes a novel approach that decouples the functions of spatial information preservation and receptive field provision into two paths
      • Spatial Path for spatial information
      • Context Path for large receptive field
  3. Related Work
    • Spatial information
      • DUC [32], PSPNet [40], DeepLab v2, v3 [5, 6] use the dilated convolution to preserve the spatial size of the feature map
      • Global Convolution Network [26] uses large kernel to enlarge the receptive field
    • U-Shape method
      • FCN [22], U-Net [27] encode multi level features by skip connection
      • [1, 24] use deconvolution layers to construct U-shape structure
      • Laplacian Pyramid Reconstruction Network [10]
      • RefineNet [18] adds multi-path refinement structure
      • DFN [36] designs channel attention block
    • Context information
      • [5, 6, 32, 37] employ different dilation rates in conv layers
      • [5] uses ASPP module
      • PSPNet [40] applies “PSP” module (different scales of average pooling layers)
      • [6] uses ASPP with global average pooling
      • [38] uses scale adaptive convolution layer
      • DFN [36] adds global pooling on the top of the U-shape structure
    • Real time segmentation
      • Lightweight model: SegNet [1], E-Net [25]
      • Image Cascade or Cascade network: ICNet [39], [17]
      • [34] designs a novel two-column network and spatial sparsity
  4. Bilateral Segmentation Network
      • Spatial Path → Rich spatial information
        • Three layers (Conv with stride 2 + BN + ReLU), yielding features at 1/8 of the original image resolution
      • Context Path → Large receptive field
        • ASPP, pyramid pooling, and large kernels demand heavy computation and memory
        • Lightweight model + Global average pooling
      • Training
        • Loss: principal softmax cross-entropy loss, plus auxiliary cross-entropy losses supervising different stages of the Context Path
        • Xception39 is used as the backbone model for the Context Path (see the sketch below)
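A minimal PyTorch sketch of the Spatial Path and the Feature Fusion Module described above; the channel widths, class names, and the stand-in for the Context Path (the Xception39 backbone tail with global average pooling is omitted) are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

def conv_bn_relu(c_in, c_out, k=3, s=2, p=1):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, stride=s, padding=p, bias=False),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

class SpatialPath(nn.Module):
    """Three stride-2 conv+BN+ReLU layers -> a rich feature map at 1/8 resolution."""
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            conv_bn_relu(3, 64), conv_bn_relu(64, 128), conv_bn_relu(128, 256)
        )
    def forward(self, x):
        return self.layers(x)

class FeatureFusionModule(nn.Module):
    """Concatenate the two paths, then re-weight channels (attention-like)."""
    def __init__(self, c_sp, c_cp, c_out):
        super().__init__()
        self.fuse = conv_bn_relu(c_sp + c_cp, c_out, k=1, s=1, p=0)
        self.attend = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(c_out, c_out, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c_out, c_out, 1), nn.Sigmoid(),
        )
    def forward(self, sp, cp):
        feat = self.fuse(torch.cat([sp, cp], dim=1))
        return feat + feat * self.attend(feat)  # channel re-weighting + residual

sp = SpatialPath()(torch.randn(1, 3, 512, 512))   # (1, 256, 64, 64), 1/8 resolution
cp = torch.randn(1, 128, 64, 64)                  # stand-in for the Context Path output
out = FeatureFusionModule(256, 128, 256)(sp, cp)  # fused (1, 256, 64, 64)
```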
  5. Experimental Results
      • Accuracy & Speed Analysis (Tables 7–9 in the paper)
      • Ablation Studies…

ResNet: Deep Residual Learning for Image Recognition

  1. Abstract
    • We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously
    • Reformulate the layers → residual functions with reference to the layer inputs (instead of learning unreferenced functions)
    • Provide empirical evidence of the power of residual connections (ImageNet/ILSVRC 2015, CIFAR-10/100, PASCAL, MS COCO); deeper than VGGNet, yet still lower complexity
    • Performance
      • 28% relative improvement on the COCO object detection dataset
      • 3.57% error on the ImageNet test set (1st place in the ILSVRC 2015 classification task)
  2. Introduction
    • Deep networks naturally integrate low/mid/high-level features and classifiers in an end-to-end multi-layer fashion
    • Driven by the significance of depth, a question arises: Is learning better networks as easy as stacking more layers? → Obstacle: vanishing/exploding gradients → Remedy: normalized initialization, normalization layers
    • Degradation problem: with the network depth increasing, accuracy gets saturated and then degrades rapidly
    • We address the degradation problem by introducing a deep residual learning framework
      • Instead of hoping each few stacked layers directly fit a desired underlying mapping, we explicitly let these layers fit a residual mapping
      • Hypothesis: it is easier to optimize the residual mapping than to optimize the original, unreferenced mapping
    • Experiments in two points
      • Easier training: Our deep residual nets are easy to optimize, but the counterpart plain nets are hard to train when the depth is increasing
      • Higher performance: Our deep residual nets can easily enjoy accuracy gains from greatly increased depth
    • This strong evidence shows that the residual learning principle is generic, and we expect that it is applicable in other vision and non-vision problems
  3. Related Work
    • Residual Representations
      • Image recognition
        • VLAD: a representation that encodes by the residual vectors with respect to a dictionary
        • Fisher Vector: a probabilistic version of VLAD
      • For vector quantization, encoding residual vectors is more effective than encoding original vectors
      • Low-level vision and computer graphics (solving PDEs, partial differential equations)
        • Multigrid: reformulates the system as subproblems at multiple scales (each subproblem is responsible for the residual solution between a coarser and a finer scale)
        • Hierarchical basis preconditioning: relies on variables that represent residual vectors between two scales
    • Shortcut Connections
      • Several techniques
        • MLPs: add a linear layer connected from the network input to the output
        • A few intermediate layers directly connected to auxiliary classifiers
        • Methods (using shortcut connections) considering centering of layer responses, gradients, and propagated errors
        • Inception: composed of a shortcut branch and a few deeper branches
      • Concurrent work
        • Highway networks: shortcut connections with gating functions (data-dependent, trainable gates; ResNet shortcuts are parameter-free)
  4. Deep Residual Learning
    • Residual Learning
      • x: the input to the first of the stacked layers; H(x): the desired underlying mapping (output)
      • F(x) := H(x) - x: the residual mapping fit by the stacked layers, so the original function is recovered as F(x) + x = H(x)
      • If the added layers can be constructed as identity mappings, a deeper model should have training error no greater than its shallower counterpart
      • The degradation problem suggests that the solvers might have difficulties in approximating identity mappings by multiple nonlinear layers
    • Identity Mapping by Shortcuts: y = F(x, {W_i}) + x; when dimensions differ, a linear projection is used instead: y = F(x, {W_i}) + W_s x
    • Network Architectures
      • Plain Network
        • Our plain baselines are mainly inspired by the philosophy of VGGNets: (i) for the same output feature map size, the layers have the same number of filters; (ii) if the feature map size is halved, the number of filters is doubled so as to preserve the time complexity per layer
        • Downsampling is performed directly by convolutional layers that have a stride of 2
      • Residual Network: the plain baseline plus shortcut connections (see the sketch below)
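A minimal PyTorch sketch of a basic residual block matching the equations above: the stacked layers fit F(x, {W_i}), the shortcut restores y = F(x) + x, and a 1×1 projection plays the role of W_s when dimensions mismatch. Names and layer widths are illustrative, not the authors' implementation.

```python
import torch.nn as nn

class BasicBlock(nn.Module):
    def __init__(self, c_in, c_out, stride=1):
        super().__init__()
        self.f = nn.Sequential(                      # F(x, {W_i}): two 3x3 conv layers
            nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(c_out),
            nn.ReLU(inplace=True),
            nn.Conv2d(c_out, c_out, 3, padding=1, bias=False),
            nn.BatchNorm2d(c_out),
        )
        # Identity shortcut when shapes match; otherwise the projection W_s.
        self.shortcut = nn.Identity()
        if stride != 1 or c_in != c_out:
            self.shortcut = nn.Sequential(
                nn.Conv2d(c_in, c_out, 1, stride=stride, bias=False),
                nn.BatchNorm2d(c_out),
            )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.f(x) + self.shortcut(x))  # y = F(x) + x
```

Stacking such blocks, doubling the channels and striding whenever the feature map size halves, follows the plain-network design rules with the shortcuts added on top.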
  5. Experiments
    • ImageNet Classification
    • CIFAR-10 and Analysis
    • Object Detection on PASCAL and MS COCO
