For simplicity, the general system framework is shown below:

```mermaid
graph LR;
Z[Camera/Video] -->|Input| A
A[VSLAM]-->|KeyFrames| B[YOLOv4];
A-->|Features| C[Object Tracking];
A-->|Camera Pose| C;
A-->|Pointcloud| E
B-->|Object Detection|C;
C-->|Objects| D[Database];
D-->|Objects| E[GUI];
```
### Features
- [10%] Object Recognition: using YOLOv4 with its pretrained weights, we implemented YoloNet, a convolutional neural network that detects objects within a 2D image.
- [30%] Performance Optimization: we developed the ObjectSet data structure, which iterates over the frames of a video and incorporates more information from each frame. The ObjectSet holds a list of PointSets, and each PointSet represents a collection of 3D points that constitute an object in the real world.
  - For optimization, we turned the list of points in each PointSet into a HashSet, implementing equals() and hashCode() methods for the Point class. This substantially improved performance.
- [15%] Integration with EuRoC MAV and other Datasets:
  - We integrated the system with two datasets, found in the tum_rgbd and imperial_london subdirectories of the codebase. With the current system, the only addition necessary to support another dataset is to give the VSLAM algorithm an RGBD (RGB + depth) video. Given that video, VSLAM builds all the data we need, after which we can perform object detection and tracking.
- [30%] Object Tracking: object tracking occurs as we iterate over each KeyFrame, placing objects into the ObjectSet. At each iteration, we find new candidate objects in the new frame and check whether each one is an instance of a previously found PointSet or a previously undiscovered object in the real world.
- [15%] Comprehensive Benchmark: see the benchmarking section below.
- [10%] Server and Database: we implemented the server with Spring Boot and wrote a GUI in JavaScript/CSS/HTML served by a Java backend. This makes it easy to view the pointcloud of the room and choose objects to be highlighted on the display.
- [10%] The original Tello drone integration failed due to constraints of the drone itself. We were able to demonstrate VSLAM on footage collected from a phone instead.
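The tracking and optimization steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the Point, PointSet, and ObjectSet classes here are simplified stand-ins, and the shared-point `overlaps` rule is an assumption about how candidate objects are matched against existing PointSets.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Simplified stand-in for the Point class, with value-based
// equals()/hashCode() so a HashSet can look points up in O(1).
class Point {
    final double x, y, z;
    Point(double x, double y, double z) { this.x = x; this.y = y; this.z = z; }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y && z == p.z;
    }
    @Override public int hashCode() { return Objects.hash(x, y, z); }
}

// A PointSet is a collection of 3D points constituting one real-world object.
class PointSet {
    final Set<Point> points = new HashSet<>();
    PointSet(Set<Point> initial) { points.addAll(initial); }

    // Assumed matching rule: a candidate is the same object if it
    // shares at least one 3D point with this PointSet.
    boolean overlaps(Set<Point> candidate) {
        for (Point p : candidate) {
            if (points.contains(p)) return true;
        }
        return false;
    }
}

// The ObjectSet incorporates candidate objects from each new KeyFrame,
// merging them into known objects or recording them as new ones.
class ObjectSet {
    final List<PointSet> objects = new ArrayList<>();

    void incorporate(Set<Point> candidate) {
        for (PointSet ps : objects) {
            if (ps.overlaps(candidate)) {
                ps.points.addAll(candidate); // instance of a known object: merge
                return;
            }
        }
        objects.add(new PointSet(candidate)); // previously undiscovered object
    }
}
```

Because each PointSet stores its points in a HashSet, the containment checks inside `overlaps` are constant-time per point; with a plain list they would be linear, which is the slowdown the optimization above removed.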
# Benchmarking
Benchmarking Time Analysis:
Processing time for a single frame:

| Points per frame | Avg  | Max  | Min    |
|------------------|------|------|--------|
| 158,000          | 17ms | 57ms | 4ms    |
| 83,000           | 9ms  | 33ms | 2ms    |
| 49,000           | 6ms  | 25ms | 1ms    |
| 22,000           | 2ms  | 16ms | < 1ms  |
It should be noted that before the implementation optimization on the PointSet class (changing from a list to a HashSet), the system could take significantly longer (on the order of minutes). Now, the full system takes less than 2 seconds to process an 87-KeyFrame video.
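The avg/max/min figures above can be produced by recording one duration per processed frame (e.g. with System.nanoTime() around the per-KeyFrame step) and aggregating. A minimal sketch, assuming a hypothetical FrameStats helper that is not part of the codebase:

```java
import java.util.List;

// Hypothetical helper aggregating per-frame processing times (in ms)
// into the avg/max/min figures reported in the benchmark.
public class FrameStats {
    // Returns {avg, max, min} over the recorded per-frame durations.
    static long[] summarize(List<Long> durationsMs) {
        long total = 0, max = Long.MIN_VALUE, min = Long.MAX_VALUE;
        for (long d : durationsMs) {
            total += d;
            max = Math.max(max, d);
            min = Math.min(min, d);
        }
        return new long[] { total / durationsMs.size(), max, min };
    }

    public static void main(String[] args) {
        // Example durations, as collected around each frame's processing call.
        long[] s = summarize(List.of(4L, 17L, 57L, 12L, 10L));
        System.out.printf("avg %dms, max %dms, min %dms%n", s[0], s[1], s[2]);
    }
}
```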
The biggest issue raised was the GUI's lack of clarity. We have completely overhauled the frontend and improved the backend to provide better accuracy. The following list details how we did this:
1. Migrated from the THREE.js pointcloud plotting library to Plotly.js