Hybrid of semantic segmentaiton and object detetion
pixelwise
(SoTa) Mask R-CNN
Input image → NN → use RPN to project onto feature map (RoIAlign) → CNN →
- Faster RCN style classification + coodrinates
- Semantic segmentation CNN (ie. mask prediction)
Kind of
- CNN → RPN to get ROI in feature map
- Then, semantic segmentation for each ROI