# pysmiles: The lightweight and pure-python SMILES reader and writer
This directory should be used to generate test cases for the Molecule Database. `downloadPubChem.py` is a script used by the Molecule Database class in order to directly download chemical compounds from PubChem and should not be run by the user. `molecule_input.py` is a user-operated script that allows the user to generate input files based on user needs.
This is a small project I started because I couldn't find any SMILES reader or
writer that was easy to install (read: Python only). Currently, the writer is
extremely basic: it should produce valid SMILES, but they won't be pretty
(see also issue #17). The reader is in a better state and should be usable.
In `molecule_input.py`, molecule input files can be generated from user-specified SMILES strings or from the PubChem database, and they can be either standard or isomorphic. PubChem compounds are requested from the PUG REST API by their Compound ID (CID). The response is a JSON document containing the compound's `Title` and its representative SMILES string. An example can be found [here](https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/1/property/Title,CanonicalSMILES/json). When the isomorphic option is selected, the generated input files have their atom list and edge list scrambled relative to the standard order, so that the find functions in the Molecule Database are tested against a differently ordered representation of the same molecule.
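As a rough illustration of the request described above, the sketch below builds the PUG REST URL for a CID and extracts the title and SMILES from a response body. It is not the code in `downloadPubChem.py`: the function names are hypothetical, the `PropertyTable`/`Properties` layout is an assumption about the response shape, and the sample payload is hand-written rather than downloaded. The actual HTTP call (for example with `urllib.request.urlopen`) is left out so the snippet runs offline.

```python
import json

# Base of the PUG REST endpoint used in the example link above.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid"


def property_url(cid):
    """Build the request URL for a compound's Title and SMILES (hypothetical helper)."""
    return f"{PUG_REST}/{cid}/property/Title,CanonicalSMILES/json"


def parse_properties(payload):
    """Return (title, smiles) from a JSON response body, or None if either is missing.

    Assumes the response nests its records under PropertyTable -> Properties.
    Returning None for a missing SMILES is an assumption of this sketch.
    """
    data = json.loads(payload)
    records = data.get("PropertyTable", {}).get("Properties", [])
    if not records:
        return None
    entry = records[0]
    title = entry.get("Title")
    smiles = entry.get("CanonicalSMILES")
    if title is None or smiles is None:
        return None
    return title, smiles


# Hand-written sample payload, NOT real PubChem data.
sample = json.dumps(
    {"PropertyTable": {"Properties": [
        {"CID": 0, "Title": "example", "CanonicalSMILES": "CCO"}
    ]}}
)
result = parse_properties(sample)
```

Returning `None` when `Title` is absent gives the caller a simple way to skip such compounds.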
SMILES strings are assumed to be as specified by the
[OpenSmiles standard][opensmiles].
## Instructions
1) In this directory, run `python molecule_input.py`
2) Follow the prompts in the CLI.
3) The generated molecules will be created in `./molecules` or `./isomorphic_test`, based on whether the selected file type is standard or isomorphic.
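For readers wondering what distinguishes the isomorphic file type in step 3, the following is a minimal sketch of scrambling a molecule's atom order and edge list while preserving its connectivity. The `scramble` function and the list-based representation are assumptions made for illustration; the script's actual file format and shuffling procedure may differ.

```python
import random


def scramble(atoms, edges, seed=None):
    """Return a relabelled copy of (atoms, edges) describing the same molecule.

    atoms: list of element symbols, indexed by atom number.
    edges: list of (i, j) index pairs into atoms.
    """
    rng = random.Random(seed)

    # order[new_position] = old_position
    order = list(range(len(atoms)))
    rng.shuffle(order)
    new_index = {old: new for new, old in enumerate(order)}

    # Reorder the atoms and rewrite every edge with the new indices.
    new_atoms = [atoms[old] for old in order]
    new_edges = [(new_index[a], new_index[b]) for a, b in edges]

    # Shuffle the order in which the edges are listed as well.
    rng.shuffle(new_edges)
    return new_atoms, new_edges


# Ethanol heavy atoms: C-C-O
atoms = ["C", "C", "O"]
edges = [(0, 1), (1, 2)]
new_atoms, new_edges = scramble(atoms, edges, seed=0)
```

The scrambled output is isomorphic to the input by construction, so a correct lookup should treat both as the same molecule.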
## Molecules
Molecules are depicted as [Networkx][networkx] graphs. Atoms are the nodes of
the graph, and bonds are the edges. Nodes can have the following attributes:
- element: str. This describes the element of the atom. Defaults to '\*'
meaning unknown.
- aromatic: bool. Whether the atom is part of an (anti-)aromatic system.
Defaults to False.
- isotope: float. The mass of the atom. Defaults to unknown.
- hcount: int. The number of implicit hydrogens attached to this atom.
Defaults to 0.
- charge: int. The charge of this atom. Defaults to 0.
- class: int. The "class" of this atom. Defaults to 0.
## Notes
- Any SMILES string containing aromatic bonds (bond order 1.5) is ignored by the script.
- If the PubChem entry for a given CID has no `Title` in its JSON response, that compound is ignored by the script.
Edges have the following attributes:
- order: Number. The bond order. 1.5 is used for aromatic bonds. Defaults to 1.
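To make the attribute layout concrete, here is a small hand-built sketch of ethanol as a Networkx graph that uses the node and edge attributes listed in this document. The graph is constructed manually for illustration and is not produced by the reader; the `isotope` attribute is left unset because its default is described as unknown.

```python
import networkx as nx

mol = nx.Graph()

# (element, implicit hydrogen count) for the heavy atoms of ethanol, CCO.
atoms = [("C", 3), ("C", 2), ("O", 1)]
for idx, (element, hcount) in enumerate(atoms):
    attrs = {
        "element": element,
        "aromatic": False,
        "hcount": hcount,
        "charge": 0,
        # "class" is a Python keyword, so it is passed through a dict
        # rather than written as a keyword argument.
        "class": 0,
    }
    mol.add_node(idx, **attrs)

# Single bonds carry order 1; an aromatic bond would carry 1.5.
mol.add_edge(0, 1, order=1)
mol.add_edge(1, 2, order=1)

elements = [mol.nodes[n]["element"] for n in mol.nodes]
total_h = sum(mol.nodes[n]["hcount"] for n in mol.nodes)
```

Attributes are then read back with ordinary Networkx lookups such as `mol.nodes[0]["element"]` or `mol.edges[0, 1]["order"]`.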