Integrating RGBD-VSLAM with Object Detection and Tracking

Visual Simultaneous Localization and Mapping (VSLAM) is the process of building a map of the local environment from a camera feed while simultaneously estimating the camera's position within that map, using visual input alone. This project builds on that process by also tracking objects within each frame. This introduces two problems: detecting objects, and then mapping and tracking those objects within 3D space.

The process consists of the following steps:

  1. Taking a video source as input, run the VSLAM algorithm to iteratively build a world map
  2. Save the finalized, error-corrected world map, along with the camera pose for each key frame
  3. Run YOLOv4, a convolutional network that returns 2D bounding boxes around objects found in an image, on each frame (a detection sketch follows this list)
  4. Project the 3D map points collected by VSLAM onto each frame
  5. For each 2D bounding box, create a potential object containing all 3D points that project within its bounds (see the projection sketch below)
  6. Check whether this set of points overlaps with the point sets of past objects from other frames
  7. Given sufficient overlap, combine the objects by intersecting their point sets (see the merging sketch below)
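
For step 3, a minimal detection sketch using OpenCV's DNN module is shown below. It assumes the standard Darknet release files (`yolov4.cfg` / `yolov4.weights`); the file names, input size, and thresholds here are illustrative defaults, not values taken from this project.

```python
import cv2

# Load the YOLOv4 network from the standard Darknet config and weights files
net = cv2.dnn.readNetFromDarknet("yolov4.cfg", "yolov4.weights")
model = cv2.dnn_DetectionModel(net)
# YOLOv4 expects a square input scaled to [0, 1] with RGB channel order
model.setInputParams(size=(608, 608), scale=1 / 255.0, swapRB=True)

frame = cv2.imread("frame_0001.png")  # one frame from the video source
class_ids, scores, boxes = model.detect(frame, confThreshold=0.5, nmsThreshold=0.4)
for cid, score, box in zip(class_ids, scores, boxes):
    x, y, w, h = box                  # 2D bounding box in pixel coordinates
    print(f"class {int(cid)}: {float(score):.2f} at ({x}, {y}, {w}, {h})")
```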

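Steps 4 and 5 amount to projecting the VSLAM map points into each frame and gathering those that land inside a detection box. Here is a minimal NumPy sketch, assuming a pinhole camera model with known intrinsics `K` and a world-to-camera pose `(R, t)` recovered by VSLAM; all names are illustrative.

```python
import numpy as np

def points_in_box(points_w, R, t, K, box):
    """Return indices of 3D world points that project inside a 2D box.

    Assumes a pinhole model: (R, t) maps world -> camera coordinates and
    K is the 3x3 intrinsic matrix. box is (x, y, w, h) in pixels, as
    returned by the detector.
    """
    p_cam = points_w @ R.T + t                  # world -> camera frame, (N, 3)
    idx = np.where(p_cam[:, 2] > 0)[0]          # keep points in front of the camera
    uvw = p_cam[idx] @ K.T                      # homogeneous pixel coordinates
    uv = uvw[:, :2] / uvw[:, 2:3]               # perspective divide
    x, y, w, h = box
    inside = (uv[:, 0] >= x) & (uv[:, 0] < x + w) & \
             (uv[:, 1] >= y) & (uv[:, 1] < y + h)
    return idx[inside]                          # indices into the original map points
```

Calling this once per detection box yields one set of map-point indices per box, which is exactly the "potential object" of step 5.
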
Given a video of an environment, the output of this algorithm is a set of 3D point groupings, each corresponding to an object within that environment.
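
Steps 6 and 7 keep these groupings consistent across frames. A minimal sketch follows, assuming each potential object is stored as a Python `set` of map-point indices; measuring overlap against the smaller set and the 0.5 threshold are illustrative choices, not values taken from the project.

```python
def merge_into(objects, new_obj, overlap_thresh=0.5):
    """Fold a new potential object (a set of map-point indices) into the
    running list of objects, merging on sufficient overlap."""
    for i, obj in enumerate(objects):
        common = obj & new_obj
        if common and len(common) / min(len(obj), len(new_obj)) >= overlap_thresh:
            objects[i] = common      # step 7: combine by intersecting the point sets
            return
    objects.append(new_obj)          # no sufficient overlap: treat as a new object
```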
