Update INSTALL.txt (a64346eb) · Commits · EC504 Spring 2024 Group Projects / Group4

INSTALL.txt

+19 −12

Original line number	Diff line number	Diff line
		1. Pre-conditions
		- The command-line interface is only compatible with Linux (e.g., lab machines) and MacOS systems.
		- The GUI can be run on the lab machine using the command "source idea-IE.sh" to open IntelliJ and click the "Run" button on the GUI.java file. If none of the IDEs on the lab machine work for you due to licensing requirements, please try the GUI on your local machine.
		- The GUI can be run on any system.
		- To use the PubChem download molecule capabilities, Python 3.10, 3.11, or 3.12, with the latest version of pip, is required. If you are using the lab machine, please make sure to update Python and pip to the latest version. If you are using the lab machine, load an updated module of Python by running: module load anaconda/3
		- In addition, please make sure to update Java 17 on the lab computer by running the following terminal command: source /ad/eng/opt/java/add_jdk17.sh

		2. Supporting files
		a. A list of non-standard libraries needed in order to generate Molecule inputs using the Python script and the command --downloadPubChem:
		- NetworkX: $ pip install --user networkx[default] On the lab machines, NetworkX is installed by default (after loading anaconda/3), and this step can be ignored.
		- NetworkX: `$ pip install --user networkx[default]`. On the lab machines, NetworkX is installed by default (after loading anaconda/3), and this step can be ignored.

		b. Users can download and store all molecules for their work with the molecule database. Here are some examples of how the molecule database can benefit:
		- Users can utilize the GUI to visualize molecules from text files in 2D Lewis structures before adding them to the database.
		@@ -16,35 +16,42 @@ b. Users can download and store all molecules for their work with the molecule d
		c. Descriptions of testing patterns and instructions on how to exercise them:
		- We have prepared 25 molecule text files in the directory group4/testcases/molecules for testing the program.
		- Additionally, to test the finding isomorphic molecule feature, we have set up a separate directory called group4/testcases/isomorphic_test. This directory contains text files of isomorphic molecules. For example, the file named biotin_iso.txt contains the isomorphic molecule of biotin. Users can use this file and biotin.txt inside group4/testcases/molecules to test the isomorphic molecule search feature.
		- Finally, we have test cases for the subgraph finding feature located in the project folder, within the SubgraphTestcases directory.
		- To test with the provided text files, please read the Execution section.

		3. Execution
		a. Instructions on how to run the program on the terminal: To execute the program via the command-line interface, navigate to the directory where the md file is located. Once in the directory, users can run one of the following commands:
		a. Instructions on how to run the program on the terminal: To execute the program via the command-line interface, navigate to the root directory where the md file is located. Once in the directory, users can run one of the following commands. Please ensure that the file or folder path provided is the complete path to the selected file or folder.
		- ./md --addMolecule [FILE PATH]: This command adds the molecule specified in the file path to the database. For example, to add biotin from the file biotin.txt to the database, run: ./md --addMolecule /path/to/directory/biotin.txt.
		- ./md --delete [FILE PATH]: This command deletes a specified molecule in the file path to the database. For example, to delete biotin from the file biotin.txt, run: ./md --delete /path/to/directory/biotin.txt
		- ./md --addBulk [FOLDER PATH]: This command allows users to add multiple molecules in bulk from a specified folder. For example, to add all molecules inside the Molecules folder, simply run: ./md --addBulk path/to/Molecules.
		- ./md --findMolecule [FILE PATH]: This command searches for the isomorphic molecule specified in the file path within the database. For instance, to find an isomorphic molecule to biotin within the database using the file biotin.txt, run: ./md --findMolecule /path/to/directory/biotin.txt.
		- ./md --findSubgraph [FILE PATH]: This command finds and outputs all molecules containing a subgraph provided in the file path.
		- ./md --downloadPubChem start,end: This command downloads molecules from the PubChem database. "start" and "end" are indices or Compound ID (CID) numbers of the molecules in the PubChem database. For example, to download molecules 20-27, type: ./md --downloadPubChem 20,27. A verbose output will print the names of all the created molecule files, while a normal output will print Download Complete on finishing successfully.
		- ./md --printDb: This command prints the list of molecules inside the database and their number of atoms.
		- ./md --printName: This command prints the name of the database.
		- ./md --verbose: Upon entering this command, all subsequent commands will display additional information about the database (e.g., error messages). If the user runs this command again, subsequent commands will not output additional information.
		- ./md --makeManySimple: This command generates 10 million molecule files, each having between 52 and 136 atoms. Generated molecules are saved in a folder called `simple` that is located in the same directory as the project folder.
		- ./md --makeFewComplex: This command creates 10,000 million molecule files, each with over 10,000 atoms. Generated molecules are saved in a folder called `complex` that is located in the same directory as the project folder.
		- ./md --addProteins [FILE PATH]: This command adds proteins created by the `--makeManySimple` and `--makeFewComplex` commands. Please make sure to select a file path, which is either the `complex` or `simple` directory.
		- ./md --makeManySimple: This command generates 10 million molecule files, each having between 52 and 136 atoms. Generated molecules are saved in a folder called `simple` that is located in the same directory as the project folder. Users do not need to specify a file or folder path for this feature.
		- ./md --makeFewComplex: This command creates 10,000 million molecule files, each with over 10,000 atoms. Generated molecules are saved in a folder called `complex` that is located in the same directory as the project folder. Users do not need to specify a file or folder path for this feature.
		- ./md --addProteins [FILE PATH]: This command adds proteins created by the `--makeManySimple` and `--makeFewComplex` commands. Please make sure to select a file path, which is either the Complex or Simple directory.
		- ./md --marco: This command pings the server to check if it is still alive.
		- ./md --quit: This command saves the database and exits the program.
		- ./md --quit: This command exits the program. Upon exiting, the molecule database is automatically saved in the project folder as molecule.db. When the program is relaunched, the database is loaded, allowing the user to resume working with the previously saved data.

		b. If the prompt does not reappear after command execution, simply press enter or continue issuing commands. This is a minor visual bug and does not affect the program execution.

		b. Instructions on how to run the program using the GUI: To launch the GUI, simply click "Run" on the GUI.java file if using an Integrated Development Environment (IDE) like IntelliJ. Upon initialization, the GUI presents seven buttons for user interaction, accompanied by a sizable output area to display program results. Here are the instructions on how to use each button:
		- Choose File: Initiates a window allowing users to select a molecule file for processing. Upon selection, the chosen file's path is displayed in the designated field.
		c. Instructions on how to run the program using the GUI:
		- To launch the GUI on an Integrated Development Environment (IDE) like IntelliJ, simply click "Run" on the GUI.java file.
		- If users want to launch the GUI from the terminal, please go to the group4 directory and run the following command: ./gui.

		Upon initialization, the GUI presents seven buttons for user interaction, accompanied by a sizable output area to display program results. Here are the instructions on how to use each button:
		- Choose File/Folder: Initiates a window allowing users to select a molecule file for processing. Upon selection, the chosen file's path is displayed in the designated field.
		- Add Molecule: Users can select a molecule file and click this button to add the molecule to the database.
		- Delete Molecule: Users can select a molecule file and click this button to delete the molecule from the database. This button will only delete one instance of the molecule at a time, giving the user greater control over how they want to handle duplicate molecules. If the molecule isn’t in the database, it will be indicated in the GUI output.
		- Find Molecule: Users can select a molecule file and click this button to search for an isomorphic molecule in the database.
		- Find Subgraph: To find a subgraph, the user selects a file containing the desired subgraph and then clicks on this button to initiate the search for all molecules containing the provided subgraph.
		- Display Molecule: Similarly, users can select a molecule file and click this button to view the molecule's 2D Lewis structure in a separate pop-up window.
		- Download PubChem: Users can specify a range of CID indices (start, end) in the file path to download molecules from the PubChem database. For example, typing 14,16 in the file/folder path and clicking the button download molecules 14-16.
		- Download PubChem: Users can specify a range of CID indices (start, end) in the “Start,End CID Indices” input section and click this button to download molecules from the PubChem database. For example, type 10,20 in the input section and click the button to download molecules with CIDs 10-20 from PubChem to the molecule database. Please note that certain molecules without a title name or with non-integer bond orders, as indicated in the PubChem database, will be skipped during the download.
		- Database Statistics: Clicking this option prints database statistics, including the total number of molecules, a list of molecules with their names and the number of atoms, and the names of the smallest and largest molecules in the database.
		- Add Multiple Molecules: Users can choose a folder containing molecule text files and click this button to add all molecules in the specified folder to the database.
		- Make Simple Molecules: Users can click this button to generate 10 million molecule files, each having between 52 and 136 atoms. Please monitor the terminal output for progress updates. Molecules are saved in a folder called `Simple` that is located in the same directory as the project folder.
		- Make Complex Molecules: Users can click this button to create 10,000 million molecule files, each with over 10,000 atoms. Please monitor the terminal output for progress updates. Molecules are saved in a folder called `Complex` that is located in the same directory as the project folder.
		- Make Simple Molecules: Users can click this button to generate 10 million molecule files, each having between 52 and 136 atoms. Please monitor the terminal output for progress updates. Molecules are saved in a folder called `simple` that is located in the same directory as the project folder.
		- Make Complex Molecules: Users can click this button to create 10,000 million molecule files, each with over 10,000 atoms. Please monitor the terminal output for progress updates. Molecules are saved in a folder called `complex` that is located in the same directory as the project folder.
		- Add Proteins: After creating proteins from the `Make Simple Molecules` and `Make Complex Molecules` buttons, users can add these protein molecules to the database. Please make sure to choose a file path, either from Complex or Simple folders, as indicated above.