@@ -103,7 +103,10 @@ In this structure, each integer key corresponds to an ArrayList containing molec
**How it was implemented:**
Our database uses a hashmap in which each key is an atom count. Molecules with the same number of atoms are grouped together in an ArrayList, which is stored as the value for that key. We take advantage of this grouping when searching for a molecule. We begin by counting the atoms in the target molecule. If no molecules are stored at the corresponding key, we immediately know that the molecule is not present. Otherwise, only the molecules stored under that key need to be examined.
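The grouping described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the class `BucketedMoleculeIndex`, the nested `Mol` type, and the method names `add` and `candidates` are hypothetical names introduced only for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a database bucketed by atom count.
class BucketedMoleculeIndex {
    // Stand-in molecule type: only the name and atom count are modeled.
    static final class Mol {
        final String name;
        final int atomCount;

        Mol(String name, int atomCount) {
            this.name = name;
            this.atomCount = atomCount;
        }
    }

    // Key: number of atoms. Value: all molecules with that many atoms.
    private final Map<Integer, List<Mol>> byAtomCount = new HashMap<>();

    void add(Mol m) {
        byAtomCount.computeIfAbsent(m.atomCount, k -> new ArrayList<>()).add(m);
    }

    // Returns the only molecules that can possibly match a target with this
    // atom count; an empty list means the target is certainly absent.
    List<Mol> candidates(int atomCount) {
        return byAtomCount.getOrDefault(atomCount, List.of());
    }

    public static void main(String[] args) {
        BucketedMoleculeIndex index = new BucketedMoleculeIndex();
        index.add(new Mol("water", 3));
        index.add(new Mol("methane", 5));
        System.out.println(index.candidates(3).size());
        System.out.println(index.candidates(4).isEmpty());
    }
}
```

A lookup for an atom count with no bucket returns immediately, so the more expensive comparison runs only on molecules that share the target's size.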
Next, we compare the target molecule against each stored molecule that has the same number of atoms. The function `areMoleculesEqual()` compares the target with each candidate individually. Each molecule has an array, `numElements`, of size 118, in which each index corresponds to the atomic number of an element and holds the number of occurrences of that element in the molecule. We first compare the `numElements` arrays of both molecules; any difference means the molecules are not the same. Afterward, we verify that the total number of edges in the two molecules matches. Following these comparisons, we proceed with a deeper examination of each atom in both molecules. For every atom in the first molecule, we ensure a corresponding atom exists in the second molecule.
We compare atoms by their atomic number, degree (number of edges), and bonds, and we require each bond of an atom in the first molecule to match exactly a bond of the corresponding atom in the second molecule. If an inconsistency arises at any point, we conclude that the molecules are not identical. The function reports that the molecule has been found in the database only if all of these tests pass.
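The two inexpensive rejection tests described above (element counts, then edge counts) might look like the sketch below. It covers only those early checks, not the atom-by-atom matching, and it is not the actual body of `areMoleculesEqual()`. The class `MoleculePrefilter`, its method names, and the mapping of atomic number `z` to index `z - 1` are assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical sketch of the cheap checks that run before atom-level matching.
class MoleculePrefilter {
    static final int NUM_ELEMENTS = 118;

    // Builds a count array from a list of atomic numbers.
    // Assumption: atomic number z (1..118) is stored at index z - 1.
    static int[] elementCounts(int[] atomicNumbers) {
        int[] counts = new int[NUM_ELEMENTS];
        for (int z : atomicNumbers) {
            counts[z - 1]++;
        }
        return counts;
    }

    // Returns false as soon as the element counts or the edge totals differ.
    // A true result means only that the deeper comparison is still required.
    static boolean passesCheapChecks(int[] countsA, int edgesA, int[] countsB, int edgesB) {
        return Arrays.equals(countsA, countsB) && edgesA == edgesB;
    }

    public static void main(String[] args) {
        int[] water = elementCounts(new int[] {1, 1, 8});
        int[] peroxide = elementCounts(new int[] {1, 1, 8, 8});
        System.out.println(passesCheapChecks(water, 2, water, 2));
        System.out.println(passesCheapChecks(water, 2, peroxide, 3));
    }
}
```

Both checks run in constant time per pair of molecules, so most non-matching candidates are discarded before any bonds are inspected.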
@@ -159,13 +162,13 @@ The graphical user interface (GUI), constructed using the built-in Java Swing JF
Choose File/Folder: Initiates a window allowing users to select a molecule file or folder for processing. Upon selection, the chosen path is displayed in the designated field.
Add Molecule: This button allows users to add a molecule to the database. The program reads the file specified by the user and calls the function `addMolecule()` to add the molecule. Upon successful addition, the message "Molecule added: [molecule name]" appears.
Delete Molecule: This button allows users to delete a molecule from the database. The program reads the file specified by the user and calls the `deleteMolecule()` function to remove the molecule. Upon successful deletion, the message "Successfully Deleted" appears. If the molecule is not in the database, the message "Molecule not in the database" appears.
Find Molecule: Users can select a molecule file and click this button to search for an isomorphic molecule in the database. Once the button is clicked, the `findMolecule()` method is called to handle the request. The GUI displays a "FOUND" message if an isomorphic molecule is found. Otherwise, it shows "NOT FOUND" and the program calls the `similarMolecule()` method to return the name of the most similar molecule in the database.
Find Subgraph: To find a subgraph, the user selects a file containing the desired subgraph and then clicks on this button to initiate the search for all molecules containing the provided subgraph. Upon clicking the button, the GUI activates the `findSubgraph()` method to execute the operation.
Display Molecule: Users can select a molecule file and click this button to view the 2D Lewis structure of the molecule in a separate pop-up window. The file must be in the correct format, and the molecule must be registered in the PubChem database for viewing. This button uses the following API URL, which returns an image of the molecule: `https://cactus.nci.nih.gov/chemical/structure/"molecule name"/"representation"`, where "molecule name" is the name of the molecule and "representation" is the desired output format [1]. Once the button is clicked, the GUI reads the file to extract the molecule's name and builds the URL that returns the Lewis structure image.
@@ -175,7 +178,7 @@ PubChem database. For example, type 14,16 in the input section and click this bu
Database Statistics: Clicking this button prints database statistics, including the total number of molecules, a list of molecules with their names and atom counts, and the names of the smallest and largest molecules in the database. This button calls the `printDb()` method of the `MoleculeDatabase` class, which handles the printing.
Add Multiple Molecules: Users can select a folder containing multiple molecule text files and click this button to add all molecules in the specified folder to the database. This button invokes the `addMultipleMolecules()` method.
Make Simple Molecules: Users can click this button to generate 10 million molecule files, each having between 52 and 136 atoms. Monitor the terminal output for progress updates. The molecules are saved in a folder named `simple`, located in the same directory as the project folder. This button calls the `manySimpleProteins()` function.
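As an illustration of how the Display Molecule button could assemble its request URL from a molecule name, here is a minimal sketch. The class `StructureUrlBuilder` and the method `build` are hypothetical, and passing `image` as the representation is an assumption; only the base URL pattern comes from the description above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that fills in the URL pattern used by Display Molecule.
class StructureUrlBuilder {
    static final String BASE = "https://cactus.nci.nih.gov/chemical/structure/";

    // Percent-encodes the name so that spaces and other reserved characters
    // are safe inside a URL path segment. URLEncoder emits "+" for spaces,
    // so those are rewritten to "%20".
    static String build(String moleculeName, String representation) {
        String encoded = URLEncoder.encode(moleculeName, StandardCharsets.UTF_8)
                .replace("+", "%20");
        return BASE + encoded + "/" + representation;
    }

    public static void main(String[] args) {
        // "image" as the representation is an assumption for this example.
        System.out.println(build("acetic acid", "image"));
    }
}
```

Encoding the name matters because many molecule names contain spaces, commas, or brackets that would otherwise produce an invalid URL.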
@@ -209,7 +212,7 @@ Two scripts are provided for generating input files, one that is meant to be cal
**How it was implemented:**
Ten million molecules are not available in the PubChem database or any other readily accessible database. Therefore, a `ProteinFactory` class was created to procedurally generate unique proteins from a set of amino acids. These proteins are saved to a designated location in the file system in the same format as the user input files, so they can then be added to a database with the `--addProteins` command.
Because saving 10 million files in a single directory is too demanding, 100 child directories are created, each containing 100,000 protein files. The command for generating 10 million protein files is `--makeManySimple`, and the default location is `../simple`. The files are written to the parent directory to hide them from the IDE in use, which may throw an error when it attempts to index them.
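The directory layout above can be expressed as a simple mapping from a file index to a path. The sketch below is illustrative only: the class `ShardedLayout`, the method `pathFor`, and the naming scheme (`dir<N>`, `protein<N>.txt`) are hypothetical and not taken from the project.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch: spread 10 million files over 100 child directories.
class ShardedLayout {
    static final int FILES_PER_DIR = 100_000;

    // Maps a zero-based file index to <root>/dir<k>/protein<index>.txt,
    // where k = index / 100,000, so indices 0..9,999,999 use dir0..dir99.
    static Path pathFor(Path root, int index) {
        int dir = index / FILES_PER_DIR;
        return root.resolve("dir" + dir).resolve("protein" + index + ".txt");
    }

    public static void main(String[] args) {
        Path root = Paths.get("..", "simple");
        System.out.println(pathFor(root, 0));
        System.out.println(pathFor(root, 9_999_999));
    }
}
```

Integer division keeps every child directory at exactly 100,000 entries, and the directory for any file can be recomputed from its index without scanning the file system.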