readability module
- features.readability.classify_text_dalechall(score)
Classifies the Dale-Chall score into a category of “easy,” “medium,” or “difficult”:
A score is easy if it is below 4.9 (readable by a 4th grader or below)
A score is medium if it is between 4.9 and 5.9 (readable by a middle school student)
A score is difficult if it is above 5.9
- Parameters:
score – the Dale-Chall readability score.
- Returns:
The label/classification associated with the text.
- Return type:
str
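The thresholds above can be sketched as a simple chain of comparisons. This is an illustrative sketch, not the module's actual source: the name classify_dale_chall_sketch is hypothetical, and treating scores of exactly 4.9 and 5.9 as "medium" is an assumption, since the description does not say which category the boundary values belong to.

```python
def classify_dale_chall_sketch(score: float) -> str:
    """Map a Dale-Chall score to "easy", "medium", or "difficult".

    Assumption: the boundary values 4.9 and 5.9 are counted as "medium".
    """
    if score < 4.9:
        return "easy"
    if score <= 5.9:
        return "medium"
    return "difficult"


print(classify_dale_chall_sketch(4.0))  # below 4.9
print(classify_dale_chall_sketch(5.5))  # between 4.9 and 5.9
print(classify_dale_chall_sketch(7.2))  # above 5.9
```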
- features.readability.count_difficult_words(text, easy_words)
Count the number of difficult words in a text. Difficult words are those that do not appear in the “easy words” list, which is passed in from the ChatLevelFeaturesCalculator and originates from get_dale_chall_easy_words() in Utilities.
- Parameters:
text (str) – The message (utterance) being analyzed.
easy_words (list) – A list of “easy” words according to Dale-Chall. This comes from the Utilities.
- Returns:
The number of difficult words.
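A minimal sketch of the counting step follows. The tokenization here (whitespace splitting, stripping surrounding punctuation, lowercasing) is an assumption made for illustration and may differ from what the module actually does; count_difficult_words_sketch is a hypothetical name.

```python
import string


def count_difficult_words_sketch(text: str, easy_words) -> int:
    """Count the words in `text` that are not in `easy_words`.

    Assumption: words are whitespace-separated, compared case-insensitively,
    with leading and trailing punctuation removed.
    """
    easy = {w.lower() for w in easy_words}  # set gives O(1) membership checks
    count = 0
    for token in text.split():
        word = token.strip(string.punctuation).lower()
        if word and word not in easy:
            count += 1
    return count


# A tiny stand-in for the real Dale-Chall easy-word list.
toy_easy_words = ["the", "cat", "sat", "on", "mat"]
print(count_difficult_words_sketch("The cat sat on the ephemeral mat.", toy_easy_words))
```

Converting the list to a set up front keeps each lookup cheap, which matters because the real easy-word list contains thousands of entries.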
- features.readability.count_syllables(word)
- features.readability.dale_chall_helper(text, easy_words)
Calculate the Dale-Chall readability score of a text. The Dale-Chall score is defined as:
0.1579 * ((difficult_words / words) * 100) + 0.0496 * (words / sentences)
If the percentage of difficult words is above 5%, add 3.6365 to the raw score to get the adjusted score; otherwise, the adjusted score equals the raw score.
In general, lower scores mean that a text is easier to read, and higher scores indicate that a text is harder to read.
Because of the need to split up sentences, this function requires the version of pre-processed text WITH punctuation retained.
Source: https://en.wikipedia.org/wiki/Dale%E2%80%93Chall_readability_formula
Citation (of example usage) in Cao et al. (2020): https://dl.acm.org/doi/pdf/10.1145/3432929
- Parameters:
text (str) – The message (utterance) being analyzed.
easy_words (list) – A list of “easy” words according to Dale-Chall. This comes from the Utilities.
- Returns:
The Dale-Chall Readability Score.
- Return type:
float
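The formula and adjustment described above can be sketched end to end as follows. This is not the module's implementation: dale_chall_sketch is a hypothetical name, and the sentence splitting (on runs of ".", "!" or "?"), the whitespace tokenization, and the 0.0 returned for empty input are all assumptions made so the example is self-contained.

```python
import re
import string


def dale_chall_sketch(text: str, easy_words) -> float:
    """Compute an adjusted Dale-Chall score for `text`.

    Assumptions: sentences end at '.', '!' or '?'; words are
    whitespace-separated; empty input scores 0.0.
    """
    easy = {w.lower() for w in easy_words}
    # Punctuation must still be present in `text` for this split to work.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = [t.strip(string.punctuation).lower() for t in text.split()]
    words = [w for w in words if w]
    if not words or not sentences:
        return 0.0

    difficult = sum(1 for w in words if w not in easy)
    pct_difficult = difficult / len(words) * 100
    score = 0.1579 * pct_difficult + 0.0496 * (len(words) / len(sentences))
    if pct_difficult > 5:
        score += 3.6365  # adjustment when more than 5% of words are difficult
    return score


# A tiny stand-in for the real Dale-Chall easy-word list.
toy_easy_words = ["the", "cat", "sat", "on", "mat", "dog", "ran"]
print(dale_chall_sketch("The cat sat on the mat. The dog ran.", toy_easy_words))
print(dale_chall_sketch("The cat sat on the ephemeral mat.", toy_easy_words))
```

In the first call no words are difficult, so the score comes only from the average sentence length (9 words over 2 sentences). In the second call one of seven words is difficult, which exceeds the 5% threshold and triggers the 3.6365 adjustment.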