Update README.md (a78a109e) · Commits · EC504 Spring 2024 Group Projects / Group6

README.md

+2 −1

Original line number	Diff line number	Diff line
		@@ -32,9 +32,10 @@ Our checker is using two different methods to assign confidence points. The firs

		#### State Machine Checker
		In order to have a good working State machine we need to first update the grammar and roles of each word manually. This step will be automatized to some extent in the next milestones. Tokens are provided in `CheckerCorrector/SQLite/mydatabase.db`, and also the basic graph provided by `CheckerCorrector/DirectedGraph/BasicGraph.java`. A sentence will first go through a typo checker and get updated if needed (it will also affect the confidence score). Then the sentence will be tokenized, and using the provided graph it will check whether the sentence is following the correct format or not, for each miss on any edge of the graph a penalty will be added to the confidence score.
		The typo corrector is calculating the distance of a word to dictionary(supposly provided by crawler). Here we used the small dictionary of homewrok2, and replace the misspelled word with closest distant word with some condition.

		#### n-Grams checker
		This checker used the crawled data and gave a score by summing up all the n_grams probabilities of phrases in a sentence. In order to store the data of the crawled data we used SHA-256 hash and store the result in SQLite/hash_database.db.
		This checker used the crawled data and gave a score by summing up all the n_grams probabilities of phrases in a sentence. In order to store the data of the crawled data we used SHA-256 hash and store the result in CheckerCorrector/SQLite/hash_database.db.

		### Proof of Effictiveness
		In order to show the effictiveness of our tool we used ChatGBT to write a script that is using a third-party Python library to make the exact same score for each sentence. These can be found in CheckerCorrector/samples/ directory. Both Json generated by our tool and the third party are available.