@@ -105,6 +105,7 @@ In this structure, each integer key corresponds to an ArrayList containing molec
**How it was implemented:**
Our database uses an array-based organization in which each index corresponds to the number of atoms in a molecule, so molecules with the same atom count are grouped together. We take advantage of this grouping when searching. We first look up the atom count of the molecule we are searching for. If no molecules are stored at the corresponding index, we know immediately that the molecule is not present. Otherwise, we only need to compare the target against the molecules stored at that index.
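As an illustration of this lookup, here is a minimal sketch in Java. It is not the project's actual code: `SimpleMolecule` and `AtomCountIndex` are hypothetical names, and the sketch assumes the outer structure is a list of lists indexed by atom count.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the project's molecule type.
class SimpleMolecule {
    final String name;
    final int atomCount;

    SimpleMolecule(String name, int atomCount) {
        this.name = name;
        this.atomCount = atomCount;
    }
}

// Hypothetical index: buckets.get(n) holds every stored molecule with exactly n atoms.
class AtomCountIndex {
    private final List<List<SimpleMolecule>> buckets = new ArrayList<>();

    void add(SimpleMolecule m) {
        // Grow the outer list until a bucket exists for this atom count.
        while (buckets.size() <= m.atomCount) {
            buckets.add(new ArrayList<>());
        }
        buckets.get(m.atomCount).add(m);
    }

    // Returns the only molecules that could match a target with this atom count.
    // An empty result means the target cannot be in the database.
    List<SimpleMolecule> candidates(int atomCount) {
        if (atomCount < 0 || atomCount >= buckets.size()) {
            return List.of();
        }
        return buckets.get(atomCount);
    }
}
```

With this layout, a search for a 4-atom molecule never touches the 3-atom or 5-atom buckets, which is the saving described above.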
Next, we compare the target molecule against each stored molecule that has the same atom count. We use the function areMoleculesEqual() to compare the target with these candidates one at a time. Each molecule has an array, numElements, of size 118, where each index corresponds to the atomic number of an element and holds the number of atoms of that element in the molecule. We first compare the numElements arrays of the two molecules; any difference means the molecules are not the same. We then verify that the two molecules have the same total number of edges. After these checks, we examine each atom in both molecules more closely. For every atom in the first molecule, we ensure that a corresponding atom exists in the second molecule. We compare atoms by their atomic number, their degree (number of edges), and their bonds, and we require each bond of an atom to match a bond of the corresponding atom in the second molecule exactly. If an inconsistency arises at any point, we conclude that the molecules are not identical. The function reports that a molecule has been found in the database only if it passes all of these tests.
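The two inexpensive checks that come before the atom-by-atom comparison can be sketched as follows. This is a hedged illustration, not the body of areMoleculesEqual(): `MoleculeSummary` and `QuickChecks` are hypothetical names, and the mapping of atomic number `z` to index `z - 1` is an assumption.

```java
import java.util.Arrays;

// Hypothetical summary of a molecule: per-element atom counts plus total edge count.
class MoleculeSummary {
    // Assumption: index z - 1 counts the atoms with atomic number z (1..118).
    final int[] numElements = new int[118];
    int edgeCount;

    void addAtom(int atomicNumber) {
        numElements[atomicNumber - 1]++;
    }
}

class QuickChecks {
    // Returns false as soon as a cheap check rules the pair out.
    // A result of true only means the pair survives to the atom-level comparison.
    static boolean couldBeEqual(MoleculeSummary a, MoleculeSummary b) {
        if (!Arrays.equals(a.numElements, b.numElements)) {
            return false; // different element composition
        }
        return a.edgeCount == b.edgeCount; // same composition, compare bond totals
    }
}
```

Ethanol and dimethyl ether show why the later atom-level step is still needed: both are C2H6O with eight bonds, so they pass both checks here even though they are different molecules.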
@@ -115,22 +116,38 @@ Next, we can analyze the specific characteristics of both the target molecule an
**How it was implemented:**
To run the program from the command-line interface, navigate to the directory that contains the `md` executable. From that directory, users can run any of the following commands:
./md --addMolecule [FILE PATH]: This command adds the molecule specified in the file path to the database. For example, to add biotin from the file biotin.txt to the database, run: ./md --addMolecule /path/to/directory/biotin.txt.
./md --findMolecule [FILE PATH]: This command searches the database for a molecule isomorphic to the one specified in the file path. For instance, to find a molecule isomorphic to biotin using the file biotin.txt, run: ./md --findMolecule /path/to/directory/biotin.txt. If the program finds an exact match, it will display “FOUND.” If not, it will show “NO EXACT MATCH FOUND” and return the molecule in the database that is most similar to the given molecule.
./md --addProteins [FILE PATH]: This command adds the protein molecules generated by the `--makeManySimple` and `--makeFewComplex` commands to the database. The default path for `--makeManySimple` is “../simple” and the default path for `--makeFewComplex` is “../complex”. For instance: ./md --addProteins ../simple and ./md --addProteins ../complex.
./md --findSubgraph [FILE PATH]: This command finds and outputs all molecules in the database that contain the subgraph provided in the file path. If no subgraph is found, the program outputs "No subgraph found."
./md --downloadPubChem start,end: This command downloads molecules from the PubChem database. "start" and "end" are indices or Compound ID (CID) numbers of the molecules in the PubChem database. For example, to download molecules 20-27, type: ./md --downloadPubChem 20,27
./md --printDb: This command prints the list of molecules inside the database and their number of atoms.
./md --quit: This command exits the program. Upon exiting, the molecule database is automatically saved in the project folder as molecule.db, and a confirmation message is displayed in the command interface. When the program is relaunched, the database is loaded, allowing the user to resume working with the previously saved data.
./md --printName: This command prints the name of the database.
./md --verbose: Upon entering this command, all subsequent commands will display additional information about the database (e.g., error messages). If the user runs this command again, subsequent commands will not output additional information.
./md --makeManySimple: This command generates 10 million molecule files with between 52 and 136 atoms.
./md --makeFewComplex: This command generates 10,000 million molecule files with over 10,000 atoms each.
./md --marco: This command pings the server to check if it is still alive.
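To make the `start,end` argument of `--downloadPubChem` concrete, here is a minimal sketch of how such an argument could be parsed and validated. It is an illustration only: `CidRange` is a hypothetical helper, and the sketch assumes that the range is inclusive and that CIDs start at 1.

```java
// Hypothetical helper for an argument such as "20,27".
class CidRange {
    final int start;
    final int end;

    private CidRange(int start, int end) {
        this.start = start;
        this.end = end;
    }

    // Parses "start,end"; throws IllegalArgumentException on malformed input.
    static CidRange parse(String arg) {
        String[] parts = arg.split(",");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected start,end but got: " + arg);
        }
        // NumberFormatException is a subclass of IllegalArgumentException.
        int start = Integer.parseInt(parts[0].trim());
        int end = Integer.parseInt(parts[1].trim());
        if (start < 1 || end < start) {
            throw new IllegalArgumentException("invalid range: " + arg);
        }
        return new CidRange(start, end);
    }

    // Number of molecules requested, assuming both endpoints are included.
    int count() {
        return end - start + 1;
    }
}
```

Under the inclusive assumption, `CidRange.parse("20,27").count()` is 8, matching the "molecules 20-27" example above.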
The Main.java class, which facilitates the command-line interface, also includes a client-server connection feature. When the program is executed, it first attempts to determine whether it can function as a client or server and establishes connections accordingly.
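One common way to make this kind of decision is to try to bind the server port: if the bind succeeds, the process acts as the server, and if the port is already taken, it assumes a server is running and connects as a client. The sketch below illustrates that pattern only. It is an assumption about the mechanism, not a description of what Main.java does, and `RoleProbe` and `tryBecomeServer` are hypothetical names.

```java
import java.io.IOException;
import java.net.ServerSocket;

class RoleProbe {
    // Tries to claim the port. Returns the listening socket if this process
    // can act as the server, or null if the bind fails (for example because
    // another instance already holds the port), in which case the caller
    // would fall back to the client role.
    static ServerSocket tryBecomeServer(int port) {
        try {
            return new ServerSocket(port);
        } catch (IOException e) {
            return null;
        }
    }
}
```

A caller would keep the returned socket open for as long as it serves requests; closing it releases the port so that another instance can take over the server role.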