Update README.md (c7825094) · Commits · EC504 Spring 2024 Group Projects / Group6

README.md

+1 −1

		@@ -34,7 +34,7 @@ Our checker is using two different methods to assign confidence points. The firs
		For the state machine to work well, we first need to update the grammar and the role of each word manually. This step will be automated to some extent in the next milestones. Tokens are provided in `CheckerCorrector/SQLite/mydatabase.db`, and the basic graph is provided by `CheckerCorrector/DirectedGraph/BasicGraph.java`. A sentence first goes through a typo checker and is corrected if needed (this also affects the confidence score). The sentence is then tokenized and checked against the provided graph to see whether it follows the correct format; each miss on any edge of the graph adds a penalty to the confidence score.
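The edge-miss penalty described above can be sketched roughly as follows. This is only an illustration and not the contents of `BasicGraph.java`: the role names (`DET`, `NOUN`, ...), the `EDGES` table, the class `GraphPenaltySketch`, and the starting score and penalty size are all assumptions made for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: walk a sentence's word roles through a directed graph
// and count every transition that has no matching edge.
final class GraphPenaltySketch {
    // Assumed grammar graph: each role maps to the roles allowed to follow it.
    static final Map<String, Set<String>> EDGES = Map.of(
        "START", Set.of("DET", "NOUN"),
        "DET", Set.of("NOUN", "ADJ"),
        "ADJ", Set.of("NOUN"),
        "NOUN", Set.of("VERB"),
        "VERB", Set.of("DET", "NOUN", "END")
    );

    // Returns the number of transitions (including the final step to END)
    // that are missing from the graph.
    static int countMisses(List<String> roles) {
        int misses = 0;
        String previous = "START";
        for (String role : roles) {
            if (!EDGES.getOrDefault(previous, Set.of()).contains(role)) {
                misses++;
            }
            previous = role;
        }
        if (!EDGES.getOrDefault(previous, Set.of()).contains("END")) {
            misses++;
        }
        return misses;
    }

    // Subtracts a fixed penalty per miss from a starting score, floored at zero.
    static int confidence(List<String> roles, int startScore, int penaltyPerMiss) {
        return Math.max(0, startScore - penaltyPerMiss * countMisses(roles));
    }

    public static void main(String[] args) {
        System.out.println(confidence(List.of("DET", "NOUN", "VERB"), 100, 10));
        System.out.println(confidence(List.of("NOUN", "DET", "VERB"), 100, 10));
    }
}
```

In this sketch a sequence that follows the graph keeps the full starting score, while each out-of-graph transition lowers it by the chosen penalty. How the real checker weights misses, and how the typo step feeds into the score, is not specified here.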

		#### n-Grams checker
-		This checker uses the crawled data and scores a sentence by summing the n-gram probabilities of its phrases.
+		This checker uses the crawled data and scores a sentence by summing the n-gram probabilities of its phrases. The crawled data is hashed with SHA-256 and the result is stored in `SQLite/hash_database.db`.
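A minimal sketch of the scoring idea, under several assumptions: that the SHA-256 hash of each n-gram serves as the lookup key, that an in-memory `Map` can stand in for `SQLite/hash_database.db`, and that unseen n-grams contribute zero. The class `NGramScoreSketch` and its methods are hypothetical names, not the project's code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: sum the stored probabilities of every n-gram in a sentence,
// looking each n-gram up by its SHA-256 hash.
final class NGramScoreSketch {
    // Returns the lowercase hex SHA-256 digest of the given text.
    static String sha256Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    // Sums the probabilities of all n-word windows; unknown n-grams add 0 (an assumption).
    static double score(String sentence, int n, Map<String, Double> table) {
        String[] words = sentence.trim().toLowerCase(Locale.ROOT).split("\\s+");
        double total = 0.0;
        for (int i = 0; i + n <= words.length; i++) {
            String gram = String.join(" ", Arrays.copyOfRange(words, i, i + n));
            total += table.getOrDefault(sha256Hex(gram), 0.0);
        }
        return total;
    }

    public static void main(String[] args) {
        // Stand-in for the hash database: hash of the n-gram -> probability.
        Map<String, Double> table = new HashMap<>();
        table.put(sha256Hex("the cat"), 0.5);
        table.put(sha256Hex("cat sat"), 0.25);
        System.out.println(score("The cat sat", 2, table));
    }
}
```

Keying the table by a fixed-length hash keeps every key the same size regardless of n-gram length; whether the project hashes whole n-grams, single tokens, or crawled pages is not stated in the README.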

		### Proof of Effectiveness
		To show the effectiveness of our tool, we used ChatGPT to write a script that uses a third-party Python library to compute the same score for each sentence. These can be found in the `CheckerCorrector/samples/` directory. The JSON output of both our tool and the third-party script is available.