Merge branch '7-graphical-user-interface' into 'master' (5738ffbc) · Commits · EC504 Spring 2024 Group Projects / Group4

INSTALL.txt

+41 −51

		1. Pre-conditions
		* The command-line interface is only compatible with Linux (e.g., lab machines) and MacOS systems.
		* The GUI can be run on the lab machine using the command "source idea-IE.sh" to open IntelliJ and clicking "Run" button on the GUI.java file. If none of the IDEs on the lab machine work for you due to licensing requirements, please try the GUI on your local machine.
		* To use the download molecule capabilities, Python 3.10, 3.11, or 3.12, with the latest version of pip, is required.
		* In addition, please make sure to update Java 17 on the lab computer by running the following terminal command: source /ad/eng/opt/java/add_jdk17.sh
		- The command-line interface is only compatible with Linux (e.g., lab machines) and MacOS systems.
		- The GUI can be run on the lab machine using the command "source idea-IE.sh" to open IntelliJ and click the "Run" button on the GUI.java file. If none of the IDEs on the lab machine work for you due to licensing requirements, please try the GUI on your local machine.
		- To use the PubChem download molecule capabilities, Python 3.10, 3.11, or 3.12, with the latest version of pip, is required. If you are using the lab machine, please make sure to update Python and pip to the latest version.
		- In addition, please make sure to update Java 17 on the lab computer by running the following terminal command: source /ad/eng/opt/java/add_jdk17.sh

		2. Supporting files
		* A list of non-standard libraries needed in order to generate Molecule inputs using the Python script and the command --downloadPubChem:
		* NetworkX: $ pip install --user networkx[default]

		* Users can download and store all molecules for their work with the molecule database. Here are some examples of how the molecule database can benefit:
		* Users can utilize the GUI to visualize molecules from text files in 2D Lewis structures before adding them to the database.
		* If users have multiple molecule files, they can start with a clean database, add all molecules, and then search for isomorphic molecules they are interested in. The same applies when users want to find all molecules that contain a given subgraph.
		* Lastly, users can examine all molecule names, along with statistics like the number of atoms in each molecule, and identify the largest or smallest molecule among the files by printing the molecule database.

		* Descriptions of testing patterns and instructions on how to exercise them:
		* We have prepared 25 molecule text files in the directory group4/testcases/molecules for testing the program.
		* Additionally, to test the finding isomorphic molecule feature, we have set up a separate directory called group4/testcases/isomorphic_test. This directory contains text files of isomorphic molecules. For example, the file named biotin_iso.txt contains the isomorphic molecule of biotin. Users can use this file and biotin.txt inside group4/testcases/molecules to test the isomorphic molecule search feature.
		* To test with the provided text files, please read the Execution section.
		a. A list of non-standard libraries needed in order to generate Molecule inputs using the Python script and the command --downloadPubChem:
		- NetworkX: $ pip install --user networkx[default]
		b. Users can download and store all molecules for their work with the molecule database. Here are some examples of how the molecule database can benefit:
		- Users can utilize the GUI to visualize molecules from text files in 2D Lewis structures before adding them to the database.
		- If users have multiple molecule files, they can start with a clean database, add all molecules, and then search for isomorphic molecules they are interested in. The same applies when users want to find all molecules that contain a given subgraph.
		- Lastly, users can examine all molecule names, along with statistics like the number of atoms in each molecule, and identify the largest or smallest molecule among the files by printing the molecule database.
		c. Descriptions of testing patterns and instructions on how to exercise them:
		- We have prepared 25 molecule text files in the directory group4/testcases/molecules for testing the program.
		- Additionally, to test the finding isomorphic molecule feature, we have set up a separate directory called group4/testcases/isomorphic_test. This directory contains text files of isomorphic molecules. For example, the file named biotin_iso.txt contains the isomorphic molecule of biotin. Users can use this file and biotin.txt inside group4/testcases/molecules to test the isomorphic molecule search feature.
		- To test with the provided text files, please read the Execution section.

		3. Execution
		Instruction to run the program on the terminal: To execute the program via the command-line interface, navigate to the directory where the md file is located. Once in the directory, users can run one of the following commands:

		./md --addMolecule [FILE PATH]: This command adds the molecule specified in the file path to the database. For example, to add biotin from the file biotin.txt to the database, run: ./md --addMolecule /path/to/directory/biotin.txt.

		./md --addProteins [FILE PATH]: This command adds protein molecules generated by `--makeManySimple` command and `--makeFewComplex` command to the database. The default path for --makeManySimple is “../simple” and `--makeFewComplex` “../complex”. For instance, ./md --addproteins ../simple and ./md --addProteins ../complex.

		./md --findMolecule [FILE PATH]: This command searches for the isomorphic molecule specified in the file path within the database. For instance, to find an isomorphic molecule to biotin within the database using the file biotin.txt, run: ./md --findMolecule /path/to/directory/biotin.txt.

		./md --findSubgraph [FILE PATH]: This command finds and outputs all molecules containing a subgraph provided in the file path.

		./md --downloadPubChem start,end: This command downloads molecules from the PubChem database. "start" and "end" are indices or Compound ID (CID) numbers of the molecules in the PubChem database. For example, to download molecules 20-27, type: ./md --downloadPubChem 20,27

		./md --printDb: This command prints the list of molecules inside the database and their number of atoms.

		./md --printName: This command prints the name of the database.

		./md --verbose: Upon entering this command, all subsequent commands will display additional information about the database (e.g., error messages). If the user runs this command again, subsequent commands will not output additional information.

		./md --makeManySimple: This command generates 10 million molecule files with between 52 and 136 atoms.

		./md --makeFewComplex: This command generates 10,000 million molecule files with over 10,000 atoms each.

		./md --marco: This command pings the server to check if it is still alive.

		./md --quit: This command saves the database and exits the program.

		Instruction to run the program using the GUI: To launch the GUI, simply click "Run" on the GUI.java file if using an Integrated Development Environment (IDE) like IntelliJ. Upon initialization, the GUI presents seven buttons for user interaction, accompanied by a sizable output area to display program results. Here are the instructions on how to use each button:

		* Choose File: Initiates a window allowing users to select a molecule file for processing. Upon selection, the chosen file's path is displayed in the designated field.
		* Add Molecule: This button allows users to add a molecule to the database.
		* Find Molecule: Users can select a molecule file and click this button to search for an isomorphic molecule in the database.
		* Find Subgraph: To find a subgraph, the user selects a file containing the desired subgraph and then clicks on this button to initiate the search for all molecules containing the provided subgraph.
		* Display Molecule: Similarly, users can select a molecule file and click this button to view the molecule's 2D Lewis structure in a separate pop-up window.
		* Download PubChem: Users can specify a range of CID indices (start, end) in the file path to download molecules from the PubChem database. For example, typing 14,16 in the file path and clicking the button download molecules 14-16.
		* Database Statistics: Clicking this option prints database statistics, including the total number of molecules, a list of molecules with their names and the number of atoms, and the names of the smallest and largest molecules in the database.
		a. Instructions on how to run the program on the terminal: To execute the program via the command-line interface, navigate to the directory where the md file is located. Once in the directory, users can run one of the following commands:
		- ./md --addMolecule [FILE PATH]: This command adds the molecule specified in the file path to the database. For example, to add biotin from the file biotin.txt to the database, run: ./md --addMolecule /path/to/directory/biotin.txt.
		- ./md --findMolecule [FILE PATH]: This command searches for the isomorphic molecule specified in the file path within the database. For instance, to find an isomorphic molecule to biotin within the database using the file biotin.txt, run: ./md --findMolecule /path/to/directory/biotin.txt.
		- ./md --findSubgraph [FILE PATH]: This command finds and outputs all molecules containing a subgraph provided in the file path.
		- ./md --downloadPubChem start,end: This command downloads molecules from the PubChem database. "start" and "end" are indices or Compound ID (CID) numbers of the molecules in the PubChem database. For example, to download molecules 20-27, type: ./md --downloadPubChem 20,27
		- ./md --printDb: This command prints the list of molecules inside the database and their number of atoms.
		- ./md --printName: This command prints the name of the database.
		- ./md --verbose: Upon entering this command, all subsequent commands will display additional information about the database (e.g., error messages). If the user runs this command again, subsequent commands will not output additional information.
		- ./md --makeManySimple: This command generates 10 million molecule files, each having between 52 and 136 atoms. Generated molecules are saved in a folder called `simple` that is located in the same directory as the project folder.
		- ./md --makeFewComplex: This command creates 10,000 million molecule files, each with over 10,000 atoms. Generated molecules are saved in a folder called `complex` that is located in the same directory as the project folder.
		- ./md --addProteins [FILE PATH]: This command adds proteins created by the `--makeManySimple` and `--makeFewComplex` commands. Please make sure to select a file path, which is either the `complex` or `simple` directory.
		- ./md --marco: This command pings the server to check if it is still alive.
		- ./md --quit: This command saves the database and exits the program.


		b. Instructions on how to run the program using the GUI: To launch the GUI, simply click "Run" on the GUI.java file if using an Integrated Development Environment (IDE) like IntelliJ. Upon initialization, the GUI presents seven buttons for user interaction, accompanied by a sizable output area to display program results. Here are the instructions on how to use each button:
		- Choose File: Initiates a window allowing users to select a molecule file for processing. Upon selection, the chosen file's path is displayed in the designated field.
		- Add Molecule: Users can select a molecule file and click this button to add the molecule to the database.
		- Find Molecule: Users can select a molecule file and click this button to search for an isomorphic molecule in the database.
		- Find Subgraph: To find a subgraph, the user selects a file containing the desired subgraph and then clicks on this button to initiate the search for all molecules containing the provided subgraph.
		- Display Molecule: Similarly, users can select a molecule file and click this button to view the molecule's 2D Lewis structure in a separate pop-up window.
		- Download PubChem: Users can specify a range of CID indices (start, end) in the file path to download molecules from the PubChem database. For example, typing 14,16 in the file/folder path and clicking the button download molecules 14-16.
		- Database Statistics: Clicking this option prints database statistics, including the total number of molecules, a list of molecules with their names and the number of atoms, and the names of the smallest and largest molecules in the database.
		- Add Multiple Molecules: Users can choose a folder containing molecule text files and click this button to add all molecules in the specified folder to the database.
		- Make Simple Molecules: Users can click this button to generate 10 million molecule files, each having between 52 and 136 atoms. Please monitor the terminal output for progress updates. Molecules are saved in a folder called `Simple` that is located in the same directory as the project folder.
		- Make Complex Molecules: Users can click this button to create 10,000 million molecule files, each with over 10,000 atoms. Please monitor the terminal output for progress updates. Molecules are saved in a folder called `Complex` that is located in the same directory as the project folder.
		- Add Proteins: After creating proteins from the `Make Simple Molecules` and `Make Complex Molecules` buttons, users can add these protein molecules to the database. Please make sure to choose a file path, either from Complex or Simple folders, as indicated above.

src/GUI.java

+151 −25

File changed.

Preview size limit exceeded, changes collapsed.

src/MDB.java

+28 −4

		@@ -214,20 +214,44 @@ public class MDB {
		filenames.add(file);
		}

		int exitCode = process.waitFor();
		outputTextArea.append("Exited with code: " + exitCode + "\n\n");

		// add created files to the database
		for (String filename : filenames) {
		this.addMolecule(new Molecule(filename));
		}

		outputTextArea.append("Download complete!" + "\n\n");

		} catch (Exception e) {
		outputTextArea.append("Error downloading from PubChem" + "\n\n");
		e.printStackTrace();
		}
		}

		/**
		* Add all molecules from a specified folder
		*/
		public void addMultipleMolecules(String path) {
		File directory = new File(path);
		// Check if the directory exists
		if (!directory.exists() || !directory.isDirectory()) {
		outputTextArea.append("Invalid directory: " + path + "\n\n");
		return;
		}
		// Get list of files in the directory
		File[] files = directory.listFiles();
		if (files == null) {
		outputTextArea.append("No files found inside directory: " + path + "\n\n");
		return;
		}
		// Iterate over the files in the directory
		for (File file : files) {
		// Check if the file is a text file
		if (file.isFile() && file.getName().endsWith(".txt")) {
		addMolecule(new Molecule(file.getAbsolutePath()));
		}
		}
		outputTextArea.append("Complete adding all molecules from directory!" + "\n\n");
		}

		/**
		* Save database to file system
		*/
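
The folder scan in addMultipleMolecules above can be tried in isolation with the minimal sketch below. It is not the project's code: the class name MoleculeFolderScan and the method listMoleculeFiles are hypothetical, and the helper returns the matching paths instead of calling addMolecule. One detail worth noting is that File.listFiles() returns null when the path is not a directory or an I/O error occurs, while an empty folder yields an empty array, so the sketch reports the two cases separately.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the project: collects the ".txt" files
// that a bulk add would pass to addMolecule.
final class MoleculeFolderScan {

    /** Returns absolute paths of regular ".txt" files directly inside dirPath, sorted by path. */
    static List<String> listMoleculeFiles(String dirPath) {
        File directory = new File(dirPath);
        // isDirectory() is already false when the path does not exist.
        if (!directory.isDirectory()) {
            throw new IllegalArgumentException("Invalid directory: " + dirPath);
        }
        File[] files = directory.listFiles();
        // null signals an I/O error here; an empty folder gives an empty array instead.
        if (files == null) {
            throw new IllegalStateException("Could not list directory: " + dirPath);
        }
        // listFiles() guarantees no particular order, so sort for repeatable results.
        Arrays.sort(files);
        List<String> paths = new ArrayList<>();
        for (File file : files) {
            // Skip sub-directories and anything that is not a text file.
            if (file.isFile() && file.getName().endsWith(".txt")) {
                paths.add(file.getAbsolutePath());
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        String dir = args.length > 0 ? args[0] : ".";
        for (String path : listMoleculeFiles(dir)) {
            System.out.println(path);
        }
    }
}
```

Returning the list rather than adding molecules directly keeps the filtering testable without a database; a caller could loop over the result and call addMolecule for each path.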

src/Main.java

+4 −0

		@@ -118,10 +118,14 @@ public class Main {
		String start = indexes[0];
		String end = indexes[1];
		moleculeDb.downloadPubChem(start, end);
		System.out.println("Download Complete!");
		} else {
		printVerbose("invalid Input");
		}
		break;
		case "--addBulk":
		moleculeDb.addMultipleMolecules(moleculePath);
		break;
		case "--delete":
		boolean delete= moleculeDb.deleteMolecule(new Molecule(moleculePath));
		if(delete)
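
The --downloadPubChem branch above reads indexes[0] and indexes[1] from a "start,end" argument. The sketch below shows one way such an argument could be validated before the download starts. It is an illustration only: the class CidRange and its parse method are hypothetical, the comma split is inferred from the usage text in INSTALL.txt, and treating the range as inclusive with start <= end is an assumption. PubChem CIDs are positive integers, which is why values below 1 are rejected.

```java
// Hypothetical value type, not part of the project: a validated CID range.
final class CidRange {
    final int start;
    final int end;

    private CidRange(int start, int end) {
        this.start = start;
        this.end = end;
    }

    /** Parses "start,end" (for example "20,27") or throws IllegalArgumentException. */
    static CidRange parse(String arg) {
        if (arg == null) {
            throw new IllegalArgumentException("Missing range argument");
        }
        // limit -1 keeps trailing empty fields, so "20," is seen as two parts and rejected below.
        String[] indexes = arg.split(",", -1);
        if (indexes.length != 2) {
            throw new IllegalArgumentException("Expected start,end but got: " + arg);
        }
        int start;
        int end;
        try {
            start = Integer.parseInt(indexes[0].trim());
            end = Integer.parseInt(indexes[1].trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("CIDs must be integers: " + arg, e);
        }
        // Assumption: the range is inclusive and must run from low to high.
        if (start < 1 || end < start) {
            throw new IllegalArgumentException("Need 1 <= start <= end: " + arg);
        }
        return new CidRange(start, end);
    }

    public static void main(String[] args) {
        CidRange range = CidRange.parse(args.length > 0 ? args[0] : "20,27");
        System.out.println("start=" + range.start + " end=" + range.end);
    }
}
```

Failing early with a descriptive message avoids launching the Python download script with malformed input.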

src/MoleculeDatabase.java

+29 −52

		@@ -141,19 +141,44 @@ public class MoleculeDatabase {
		filenames.add(file);
		}

		int exitCode = process.waitFor();
		printVerbose("Exited with code: " + exitCode);

		// add created files to the database
		for (String filename : filenames) {
		this.addMolecule(new Molecule(filename));
		}

		System.out.println("Download complete!");

		} catch (Exception e) {
		e.printStackTrace();
		System.out.println("Error downloading from PubChem.");
		}
		}

		/**
		* Add all molecules from a specified folder
		*/
		public void addMultipleMolecules(String path) {
		File directory = new File(path);
		// Check if the directory exists
		if (!directory.exists() || !directory.isDirectory()) {
		System.out.println("Invalid directory: " + path);
		return;
		}
		// Get list of files in the directory
		File[] files = directory.listFiles();
		if (files == null) {
		System.out.println("No files found inside directory: " + path);
		return;
		}
		// Iterate over the files in the directory
		for (File file : files) {
		// Check if the file is a text file
		if (file.isFile() && file.getName().endsWith(".txt")) {
		addMolecule(new Molecule(file.getAbsolutePath()));
		}
		}
		System.out.println("Complete adding all molecules from directory!");
		}


		/**
		* Find all molecules that contain the @param subgraph
		@@ -231,52 +256,4 @@ public class MoleculeDatabase {
		fileInStream.close();
		}

		/**
		* Get database statistics as a string
		* Note: this method is designed to work with the webpage
		*/
		public String showDb() {
		StringBuilder stringBuilder = new StringBuilder();

		// Print number of molecules
		int size = 0;
		for (ArrayList<Molecule> molecules : db.values()) {
		size += molecules.size();
		}
		stringBuilder.append("# of molecules: ").append(size).append("\n\n");

		if (size == 0)
		return stringBuilder.toString(); // if database is empty, return early

		// Print the list of molecules
		stringBuilder.append("List of molecules: ").append("\n\n");
		for (Integer atomCount : this.db.keySet()) {
		ArrayList<Molecule> moleculesWithSameNumAtoms = this.db.get(atomCount);
		for (Molecule molecule : moleculesWithSameNumAtoms) {
		stringBuilder.append("Molecule name: ").append(molecule.moleculeName).append("\n");
		stringBuilder.append("# of atoms: ").append(atomCount.toString()).append("\n\n");
		}
		}

		// Print the largest and smallest molecules
		int maxAtoms = Integer.MIN_VALUE;
		int minAtoms = Integer.MAX_VALUE;
		Molecule largestMolecule = null;
		Molecule smallestMolecule = null;
		for (Map.Entry<Integer, ArrayList<Molecule>> entry : db.entrySet()) {
		int numAtoms = entry.getKey();
		if (numAtoms > maxAtoms) {
		maxAtoms = numAtoms;
		largestMolecule = entry.getValue().get(0); // only print 1 representative molecule
		}
		if (numAtoms < minAtoms) {
		minAtoms = numAtoms;
		smallestMolecule = entry.getValue().get(0); // only print 1 representative molecule
		}
		}
		stringBuilder.append("Smallest molecule: ").append(smallestMolecule.moleculeName).append("\n");
		stringBuilder.append("Largest molecule: ").append(largestMolecule.moleculeName).append("\n\n");

		return stringBuilder.toString();
		}
		}
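
The showDb method above walks every entry of a map keyed by atom count to find the smallest and largest molecules. As a rough sketch of an alternative, a sorted map gives both ends directly. Everything here is assumed for illustration: the class DbStatsSketch is hypothetical, molecules are reduced to plain name strings, the diff does not show which Map implementation the project uses, and every bucket is assumed to be non-empty (showDb makes the same assumption when it calls get(0)).

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch, not the project's class: database statistics over a
// sorted map from atom count to the names of molecules with that many atoms.
final class DbStatsSketch {

    static String summarize(TreeMap<Integer, List<String>> db) {
        int size = 0;
        for (List<String> names : db.values()) {
            size += names.size();
        }
        StringBuilder out = new StringBuilder();
        out.append("# of molecules: ").append(size).append("\n");
        if (size == 0) {
            return out.toString(); // empty database: nothing else to report
        }
        // TreeMap keeps keys sorted, so the first and last entries hold the
        // smallest and largest atom counts; one representative name is printed for each.
        out.append("Smallest molecule: ").append(db.firstEntry().getValue().get(0)).append("\n");
        out.append("Largest molecule: ").append(db.lastEntry().getValue().get(0)).append("\n");
        return out.toString();
    }

    public static void main(String[] args) {
        TreeMap<Integer, List<String>> db = new TreeMap<>();
        db.put(3, Arrays.asList("small_a", "small_b")); // placeholder names
        db.put(24, Arrays.asList("large_a"));
        System.out.print(summarize(db));
    }
}
```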