2 problems covered here

  • Predicting 3D shapes from single images
  • Processing 3D input data

Lots more topics in 3D vision

  • Multi-view sterio
  • Differntiable graphics
  • 3D Sensors
  • Simulataneous Localization and Mapping
  • Etc.

3D shape representations

  • Depth map
  • Voxel grid
  • Pointcloud
  • Mesh
  • Implicit Surfaces

Depth map

Predicting

  • F-CNN
  • L2 per-pixel loss

Issue: scale-depth ambiguity

  • Can use scale invariant loss

Surface normals

  • surface normal gives vector a normal vector to object in the world for that pixel

Voxel grid

Represent shape with V x V x V grid of occupancies (think- segmentation mask from Mask R-CNN, but 3D)

Pros: conceptually simple

Cons: Need high spatial resolution for fine structures + high resolution scaling is non trivial

Architecture

  • Input → 2 d features → 3D Features → Upscaling → 1 x V x V x V

Voxel tubes: Final convolutional layer. V filters. Interpret as a tube of voxel scores

Problems with Voxel: shit ton of memory usage

Scaling Voxels

Oct-Trees : use heterogenous resolutions to add where necessary (finer details)

Point Coud

Pros: don’t need ton of points for fine sructures

Cons: doesn’t explicitly represent surface- would need to post-process to get a mesh

PointNet:

Generating PointCloud outputs

  • Input → (2d CNN) image feautres → (FC + 2d CNN) points + points → pointcoud

Loss function: differentiable way to compare the pointclouds (as sets!)

  • Chamfer distance (sum of L2 distance to each point’s nearest neighbor in other set)

Triangle mesh

Represent 3dD shape with set of triangles

  • Vertices: set of V poitns in 3D space
  • Faces: set of trinangles over the vertices

Pros:

  • Standard representation for graphics
  • Explicitly represents 3D shapes
  • Adaptive
  • Can attach data to vetices

Pixel2Mesh

  • Input: RGB Image
  • Output: triangle mesh for object
  • Key ideas:
    • Itertive refinement: starts with initial ellipsoid mesha nd predicts osfsets for each vertex
    • Graph convolution:
      • Given vertex has feature , new feature depends on feature of neighboring vertices
      • use same weights and for all outputs
    • Vertex aligned-features
      • For each vertex of the msh
        • Use camera info to project onto image plane
        • Use bilinear interpolation to sample CNN feature
        • Similar to RoI-Align operation from OD
    • Chamfer loss function
      • Convert mesh to point cloud and then compute the loss (Chamfer)
      • Also sample points frmo teh surface of ground truth mesh (offline)

Mesh R-CNN

Goal

  • Input: single RGB Image
  • Output: set of detected objects and their
    • Bounding box (Mask-RCNN)
    • Category label (Mask-RCNN)
    • Instance segmentation (Mask-RCNN)
    • 3D triangle mesh (Mesh Head)

Issues with Mesh deformation- topology is fixed by the initial mesh

  • Solution: use voxel predictions to create initial mesh prediction

Pipeline

  • Input image → 2D object recognition → 3D object voxels → 3D object meshes

Implicit surface

Goal: function to classify arbitrary 3D points as inside / utisde the shape

  • Surface would be the level set
  • Signed distance function (SDF): Euclidean distance to surfaec of the shape

Extracting explicit shape outputs requires post-processing

NeRF for view synthesis

View synthesis

  • Input: many images of same scence (with known camera parameters)
  • Output: images from novel viewpoints

Volume rendering

Abstract away light soures, objects. For each pint in space, we need to know

  1. How much light does it emit?
  2. How opaque?

Each ray:

  • Volume density:
  • Color in direction d:

Volume rendering equation (color observed by camera

    • : near point
    • : current point
    • : far point
    • : transmittance- how much light from the current point will reach the camera?
    • : Opacity- how opaque is the current point?
    • : what color does current point emit along directio towards camera?

NeRF (neural radiance fields)

  • Input: and
  • Output: and

Archcitecture

  • Fully connected network

Very strong results, but very slow