Commit c7825094 authored by Seyed Reza  Sajjadinasab's avatar Seyed Reza Sajjadinasab
Browse files

Update README.md

parent b8b63115
Loading
Loading
Loading
Loading
+1 −1
Original line number Diff line number Diff line
@@ -34,7 +34,7 @@ Our checker is using two different methods to assign confidence points. The firs
In order to have a good working State machine we need to first update the grammar and roles of each word manually. This step will be automatized to some extent in the next milestones. Tokens are provided in `CheckerCorrector/SQLite/mydatabase.db`, and also the basic graph provided by `CheckerCorrector/DirectedGraph/BasicGraph.java`. A sentence will first go through a typo checker and get updated if needed (it will also affect the confidence score). Then the sentence will be tokenized, and using the provided graph it will check whether the sentence is following the correct format or not, for each miss on any edge of the graph a penalty will be added to the confidence score.

#### n-Grams checker
This checker used the crawled data and gave a score by summing up all the n_grams probabilities of phrases in a sentence.
This checker used the crawled data and gave a score by summing up all the n_grams probabilities of phrases in a sentence. In order to store the data of the crawled data we used SHA-256 hash and store the result in SQLite/hash_database.db.

### Proof of Effictiveness
In order to show the effictiveness of our tool we used ChatGBT to write a script that is using a third-party Python library to make the exact same score for each sentence. These can be found in CheckerCorrector/samples/ directory. Both Json generated by our tool and the third party are available.