Commit 7f41cc65 authored by Rohan Kumar

Update README.md

parent 706a5d16

### Features

- [10%] Object Recognition: using YOLOv4, we took the weights from the pretrained model and created a YoloNet, a convolutional neural network that detects objects within a 2D image.
- [30%] Performance Optimization: we developed the ObjectSet data structure, which iterates over the frames of a video and incorporates more information on each frame. The ObjectSet holds a list of PointSets, and each PointSet represents a collection of 3D points that constitute an object within the real world.
    - For optimization, we turned the list of points in each PointSet into a HashSet, implementing equals() and hashCode() methods for the Point class. This substantially improved performance.
- [15%] Integration with EuRoC MAV and other Datasets: we integrated the system with two datasets, the tum_rgbd and imperial_london subdirectories within the codebase. With the current system, the only addition necessary is to give the VSLAM algorithm any RGBD (RGB + depth) video; given that video, VSLAM builds all the data we need, and we can then perform object detection and tracking.
- [30%] Object Tracking: object tracking occurs as we iterate over each KeyFrame, placing objects into the ObjectSet. At each iteration, we find new candidate objects from the new frame and iteratively check whether these new objects are instances of previously found PointSets or previously undiscovered objects within the real world.
- [15%] Comprehensive Benchmark: see the Benchmarking section below.
- [10%] Server and Database: we implemented the server with Spring Boot and wrote a GUI in JavaScript/CSS/HTML that is served by a Java backend. This makes it easy to view the point-cloud view of the room and choose objects to be highlighted on the display.
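The HashSet optimization described above only works if the Point class defines value-based equality, since HashSet relies on equals() and hashCode() to detect duplicates in O(1) rather than scanning a list in O(n). A minimal sketch of the idea (field names and class shapes here are assumptions for illustration, not the project's actual code):

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical Point with value-based equality, so HashSet<Point>
// treats two points at the same coordinates as the same element.
final class Point {
    final double x, y, z;

    Point(double x, double y, double z) {
        this.x = x; this.y = y; this.z = z;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y && z == p.z;
    }

    @Override
    public int hashCode() {
        // Must be consistent with equals(): equal points hash equally.
        return Objects.hash(x, y, z);
    }
}

// A PointSet backed by a HashSet instead of a List: adding a
// duplicate point becomes a constant-time no-op.
final class PointSet {
    private final Set<Point> points = new HashSet<>();

    void add(Point p) { points.add(p); }
    boolean contains(Point p) { return points.contains(p); }
    int size() { return points.size(); }
}
```

With this shape, checking whether a candidate object's points overlap an existing PointSet is a series of constant-time contains() calls, which is where the speedup over the list-backed version comes from.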

# Benchmarking

Processing time for a single frame:

| Points per frame | Avg   | Max   | Min    |
|------------------|-------|-------|--------|
| 158,000          | 17 ms | 57 ms | 4 ms   |
| 83,000           | 9 ms  | 33 ms | 2 ms   |
| 49,000           | 6 ms  | 25 ms | 1 ms   |
| 22,000           | 2 ms  | 16 ms | < 1 ms |

It should be noted that before the optimization of the PointSet class (changing from a list to a HashSet), the system could take significantly longer (on the order of minutes). Now, the full system takes less than 2 seconds to process an 87-KeyFrame video.
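Per-frame timings like the averages, maxima, and minima above could be collected with a small helper along these lines (FrameTimer and its methods are a hypothetical sketch, not part of the codebase):

```java
import java.util.ArrayList;
import java.util.List;

// Accumulates per-frame durations (in nanoseconds) and reports
// summary statistics in milliseconds.
public class FrameTimer {
    private final List<Long> samplesNanos = new ArrayList<>();

    public void record(long nanos) { samplesNanos.add(nanos); }

    public double avgMs() {
        return samplesNanos.stream()
                .mapToLong(Long::longValue).average().orElse(0.0) / 1e6;
    }

    public double maxMs() {
        return samplesNanos.stream()
                .mapToLong(Long::longValue).max().orElse(0L) / 1e6;
    }

    public double minMs() {
        return samplesNanos.stream()
                .mapToLong(Long::longValue).min().orElse(0L) / 1e6;
    }
}
```

A call site would wrap each frame's work with System.nanoTime() before and after, then pass the difference to record().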

# Code

The following links: