Documentation
Nafis Abeer: nafis@bu.edu
Rohan Kumar: roku@bu.edu
Zane Mroue: zanem@bu.edu
Samuel Gulinello: samgul@bu.edu
Sanford Edelist: edelist@bu.edu
Description
Visual Simultaneous Localization and Mapping (VSLAM) is the process of taking a camera feed, estimating the camera's position, and building a map of the local world from visual input. This project uses that process and builds on it by also tracking the objects that appear in each frame. This raises two problems: detecting objects, and then mapping and tracking those objects in 3D space.
Implementation
For simplicity, the general system framework is shown below:
```mermaid
graph LR;
    Z[Camera/Video] -->|Input| A;
    A[VSLAM] -->|KeyFrames| B[YOLOv4];
    A -->|Features| C[Object Tracking];
    A -->|Features| E;
    B -->|Object Detection| C;
    C -->|Objects| D[Database];
    D -->|Objects| E[GUI];
```
- Camera/Video: as of right now, we use prerecorded video for our examples and tests, but this system can easily be extended to real-time camera systems or drone footage
  - the output is the set of frames that constitute the video
- VSLAM: using the MATLAB VSLAM algorithm, this step takes the raw frames and does two things: it finds "features", the important points used for tracking, and it finds "keyframes", a subset of all frames that captures the most information about the video (camera movement between keyframes is what gives the feature points their 3D positions); one way to represent this output is sketched below this item
  - the outputs are the keyframes and the features
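To make the later steps concrete, here is a minimal sketch of how the VSLAM output could be represented on the Java side. The record and field names are illustrative assumptions for this documentation, not the actual format produced by the MATLAB algorithm.

```java
import java.util.List;

// Hypothetical Java-side representation of VSLAM output; all names are assumptions.

/** A 3D map point triangulated by VSLAM, with the 2D pixel where it was observed. */
record Feature(int id, double x, double y, double z, double pixelU, double pixelV) {}

/** A keyframe selected by VSLAM, carrying the features visible in it. */
record KeyFrame(int frameIndex, List<Feature> features) {}
```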
- YOLOv4: using a Java library, we run the YOLOv4 model, a convolutional neural network that takes a keyframe and finds a bounding box around each object the model can discern (see the sketch below this item)
  - the output is the bounding boxes around each detected object for each keyframe
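Below is a hedged sketch of the detection step's data. The BoundingBox record and the ObjectDetector interface are assumptions made for illustration, not the types exposed by the actual YOLOv4 Java library; what matters for the next step is a labeled 2D box that can be tested against a feature's pixel location.

```java
import java.awt.image.BufferedImage;
import java.util.List;

/** Axis-aligned 2D bounding box in pixel coordinates, with the detected class label. */
record BoundingBox(String label, double confidence,
                   double xMin, double yMin, double xMax, double yMax) {

    /** True if a feature's 2D pixel observation falls inside this box. */
    boolean contains(double pixelU, double pixelV) {
        return pixelU >= xMin && pixelU <= xMax && pixelV >= yMin && pixelV <= yMax;
    }
}

/** Hypothetical wrapper around the YOLOv4 Java library used in this project. */
interface ObjectDetector {
    List<BoundingBox> detect(BufferedImage keyFrameImage);
}
```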
- Object Tracking: the project's main contribution in terms of data structures and algorithms. This step takes the bounding boxes of each object (in 2D, on a single keyframe) and the features (in 3D, on the same keyframe), finds the features that fall inside each bounding box, and then tries to reconcile the objects in the current frame with objects found in past frames. We solve this by implementing a data structure called an ObjectSet: each newly detected object is compared against every object already found, and if the two share some percentage of their features, they are combined into a single object and the database is updated accordingly (a minimal sketch follows this item)
  - further explanation and runtime analysis are given in Appendix A
  - the output is an iteratively more accurate set of objects that correspond to reality
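The following is a minimal sketch of the ObjectSet idea described above, assuming features are identified by integer IDs and similarity is measured as the fraction of the new object's features already present in an existing object. The class names and the threshold value are illustrative, not the exact implementation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch of an ObjectSet: merges new detections into previously seen objects. */
class ObjectSet {

    /** A tracked object: a class label plus the IDs of the 3D features seen inside its boxes. */
    static class TrackedObject {
        final String label;
        final Set<Integer> featureIds = new HashSet<>();

        TrackedObject(String label, Set<Integer> featureIds) {
            this.label = label;
            this.featureIds.addAll(featureIds);
        }
    }

    private final List<TrackedObject> objects = new ArrayList<>();
    private final double similarityThreshold; // e.g. 0.3 = 30% shared features (assumed value)

    ObjectSet(double similarityThreshold) {
        this.similarityThreshold = similarityThreshold;
    }

    /**
     * Add an object detected in the current keyframe. If it shares enough features
     * with an existing object of the same label, merge the two; otherwise record it
     * as a new object. Returns the object that now represents this detection.
     */
    TrackedObject addOrMerge(String label, Set<Integer> featureIds) {
        for (TrackedObject existing : objects) {
            if (existing.label.equals(label)
                    && overlapRatio(existing.featureIds, featureIds) >= similarityThreshold) {
                existing.featureIds.addAll(featureIds); // combine the two objects
                return existing;
            }
        }
        TrackedObject fresh = new TrackedObject(label, featureIds);
        objects.add(fresh);
        return fresh;
    }

    /** Fraction of the incoming object's features already present in the existing object. */
    private static double overlapRatio(Set<Integer> existing, Set<Integer> incoming) {
        if (incoming.isEmpty()) return 0.0;
        long shared = incoming.stream().filter(existing::contains).count();
        return (double) shared / incoming.size();
    }
}
```

Merging by shared 3D features, rather than by 2D box overlap, is what allows the same physical object to be recognized across keyframes taken from different viewpoints.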
- Database: for ease of retrieving, updating, and storing objects and their corresponding features, we use a MongoDB database (sketched below this item)
  - the output is persistent storage for the objects produced by object tracking
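As a sketch of the storage step, the snippet below uses the official MongoDB Java sync driver to upsert one document per tracked object. The connection URI, database name, collection name, and document schema are assumptions for illustration.

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.ReplaceOptions;
import org.bson.Document;

import java.util.List;

/** Sketch of a MongoDB-backed store for tracked objects; the schema is an assumption. */
public class ObjectStore {
    private final MongoCollection<Document> objects;

    public ObjectStore(String connectionUri) {
        MongoClient client = MongoClients.create(connectionUri);
        this.objects = client.getDatabase("vslam").getCollection("objects");
    }

    /** Insert or update a tracked object, keyed by its object ID. */
    public void upsert(int objectId, String label, List<Integer> featureIds) {
        Document doc = new Document("_id", objectId)
                .append("label", label)
                .append("featureIds", featureIds);
        objects.replaceOne(Filters.eq("_id", objectId), doc,
                new ReplaceOptions().upsert(true));
    }
}
```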
- GUI: as the outward-facing display of our work, we implemented a JavaScript UI that runs a server, so the system's output can be viewed in any browser
  - the output is a clean point-cloud view of the objects and features the camera has seen
Features
need to fill in this area
Code
The following links:
- The branch and directory containing Java 17 Code to be executed
- The data needed for the examples used for this system
- The testing code for this system
Work Breakdown
Nafis Abeer:
Rohan Kumar:
Zane Mroue:
Samuel Gulinello:
Sanford Edelist:
Appendix
A: Runtime and Space Analysis of ObjectSet
TODO
B: References and Material Used
need to fill in references and also ALL LIBRARIES USED (MATLAB, YOLO, Javascript stuff, etc)