Reference: Selective Search (https://donghwa-kim.github.io/SelectiveSearch.html)
Reference: EdgeBoxes (https://donghwa-kim.github.io/EdgeBoxes.html)
Reference: https://www.youtube.com/watch?v=nDPWywWRIRo

  1. Abstract
    • SPPNet [1] and Fast R-CNN [2] have reduced the running time of detection networks, exposing region proposal computation as a bottleneck
    • We introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network (predicts object bounds and objectness scores at each position)
    • Uses Fast R-CNN for detection, but replaces the external region proposal algorithm with the RPN, merging the RPN and Fast R-CNN into a single network (only ~300 proposals, versus ~2000 proposals in R-CNN)
  2. Introduction
    • Now, proposals are the test-time computational bottleneck in state-of-the-art detection systems
      • Selective Search [4], one of the most popular methods, greedily merges superpixels based on engineered low-level features
      • EdgeBoxes [6] currently provides the best tradeoff between proposal quality and speed
    • We introduce novel Region Proposal Networks that share convolutional layers with state-of-the-art object detection networks
    • Our observation is that the convolutional feature maps used by region-based detectors, like Fast R-CNN, can also be used for generating region proposals
    • RPNs are designed to efficiently predict region proposals with a wide range of scales and aspect ratios (Figure 1-a, 1-b, 1-c)
    • RPNs completely learn to propose regions from data, and thus can easily benefit from deeper and more expressive features
  3. Related Work
    • Object Proposals
      • Grouping super-pixels: Selective Search [4], CPMC [22], MCG [23]
      • Sliding windows: objectness in windows [24], EdgeBoxes [6]
    • Deep Networks for Object Detection
      • R-CNN [5]
      • Predicting object bounding boxes: [25], OverFeat method [9], MultiBox methods [26, 27], DeepMask method [28]
      • Shared Computation of convolutions: [9, 1, 29, 7, 2]
  4. Faster R-CNN
    • Region Proposal Networks (Reference: ZFNet, https://oi.readthedocs.io/en/latest/computer_vision/cnn/zfnet.html)
      • The network takes as input an n × n spatial window of the shared convolutional feature map
      • An intermediate layer maps each window into a lower-dimensional feature
      • This feature is fed into a box-regression layer (reg) and a box-classification layer (cls)
      • Each anchor is centered at the sliding window in question and is associated with a scale and aspect ratio (by default 3 scales [128, 256, 512] × 3 aspect ratios [1:1, 1:2, 2:1], i.e. k = 9 anchors at each sliding position)
      • Translation-Invariant Anchors: both the anchors and the functions that compute proposals relative to the anchors are translation invariant
      • Multi-Scale Anchors as Regression References
      • Loss Function (Multi-task Loss)
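The multi-task loss mentioned above combines a classification term and a regression term over the sampled anchors; from the paper it can be written as:

```latex
L\big(\{p_i\},\{t_i\}\big)
  = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*)
  + \lambda \, \frac{1}{N_{reg}} \sum_i p_i^* \, L_{reg}(t_i, t_i^*)
```

Here $p_i$ is the predicted objectness probability of anchor $i$, $p_i^*$ is its ground-truth label (1 for positive anchors, 0 otherwise), $t_i$ and $t_i^*$ are the predicted and ground-truth parameterized box coordinates, and $L_{reg}$ is the smooth L1 loss; the $p_i^*$ factor means the regression loss is activated only for positive anchors.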
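The anchor scheme above can be sketched in a few lines. This is a minimal illustration (not the paper's implementation), assuming the three scales are given as box side lengths and the aspect ratios as height:width, so that each anchor keeps an area of roughly scale²:

```python
import numpy as np

def generate_anchors(cx, cy, scales=(128, 256, 512), ratios=(1.0, 0.5, 2.0)):
    """Generate len(scales) * len(ratios) anchors (x1, y1, x2, y2)
    centered at (cx, cy).

    Each anchor keeps an area of roughly scale**2 while its
    height:width ratio varies, as in the Faster R-CNN anchor scheme.
    """
    anchors = []
    for scale in scales:
        for ratio in ratios:
            # Keep area ~= scale**2:  w * h = scale**2,  h / w = ratio
            w = scale / np.sqrt(ratio)
            h = scale * np.sqrt(ratio)
            anchors.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.array(anchors)

# k = 9 anchors at one sliding-window position
anchors = generate_anchors(cx=400, cy=300)
print(anchors.shape)  # (9, 4)
```

In the full RPN, this is repeated at every position of the sliding window over the feature map, so a W × H feature map yields W × H × k anchors.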
  5. Experiments
  6. Conclusion

