readability module

features.readability.classify_text_dalechall(score)

Classifies the Dale-Chall score into a category of “easy,” “medium,” or “difficult”:

  • A score is easy if it is below 4.9 (readable by a 4th grader or below)

  • A score is medium if it is between 4.9 and 5.9 (readable by a middle school student)

  • A score is difficult if it is above 5.9

Parameters:

score – the Dale-Chall readability score.

Returns:

The label/classification associated with the text.

Return type:

str
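The thresholds above can be sketched as a short function. This is a minimal illustration, not the library's source: the name classify_dale_chall_sketch is hypothetical, and the treatment of the exact boundary values (4.9 and 5.9 both count as "medium" here) is an assumption, since the description does not say which side they fall on.

```python
def classify_dale_chall_sketch(score: float) -> str:
    """Map a Dale-Chall score to a label using the documented cutoffs.

    Hypothetical helper for illustration only. Boundary handling is an
    assumption: exactly 4.9 and exactly 5.9 are both treated as "medium".
    """
    if score < 4.9:
        return "easy"
    if score <= 5.9:
        return "medium"
    return "difficult"


labels = [classify_dale_chall_sketch(s) for s in (4.0, 5.0, 7.0)]
```

Here labels holds one label per score, in the same order as the input scores.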

features.readability.count_difficult_words(text, easy_words)

Count the number of difficult words in a text. A word is difficult if it does not appear in the “easy words” list, which is passed in from the ChatLevelFeaturesCalculator and originates from get_dale_chall_easy_words() in Utilities.

Parameters:
  • text (str) – The message (utterance) being analyzed.

  • easy_words (list) – A list of “easy” words according to Dale-Chall. This comes from the Utilities.

Returns:

The number of difficult words.
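As a rough illustration of the counting rule, the sketch below tokenizes a message and counts the tokens missing from the easy-word list. The function name count_difficult_words_sketch, the regex tokenizer, and the case-insensitive matching are all assumptions made for this example; the module's actual tokenization may differ.

```python
import re


def count_difficult_words_sketch(text: str, easy_words: list) -> int:
    """Count tokens in `text` that are not in `easy_words`.

    Hypothetical helper: tokenization (runs of letters and apostrophes)
    and lowercasing are assumptions, not the library's behavior.
    """
    easy = {w.lower() for w in easy_words}  # set gives fast membership checks
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    return sum(1 for token in tokens if token not in easy)


n_difficult = count_difficult_words_sketch(
    "The cat sat on the ephemeral mat.",
    ["the", "cat", "sat", "on", "mat"],
)
```

In this example only "ephemeral" is absent from the toy easy-word list, so it is the single word counted as difficult.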

features.readability.count_syllables(word)

features.readability.dale_chall_helper(text, easy_words)

Calculate the Dale-Chall readability score of a text. The Dale-Chall score is defined as:

0.1579 * ((difficult_words / words) * 100) + 0.0496 * (words / sentences)

If the percentage of difficult words is above 5%, add 3.6365 to the raw score to get the adjusted score; otherwise, the adjusted score equals the raw score.

In general, lower scores mean that a text is easier to read, and higher scores indicate that a text is harder to read.

Because of the need to split up sentences, this function requires the version of pre-processed text WITH punctuation retained.

Source: https://en.wikipedia.org/wiki/Dale%E2%80%93Chall_readability_formula

Citation (of example usage) in Cao et al. (2020): https://dl.acm.org/doi/pdf/10.1145/3432929

Parameters:
  • text (str) – The message (utterance) being analyzed.

  • easy_words (list) – A list of “easy” words according to Dale-Chall. This comes from the Utilities.

Returns:

The Dale-Chall Readability Score.

Return type:

float
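The formula and adjustment described for dale_chall_helper can be sketched end to end as follows. This is an illustrative approximation under stated assumptions, not the module's implementation: the name dale_chall_sketch, the regex-based word and sentence splitting, and the choice to return 0.0 for empty input are all hypothetical.

```python
import re


def dale_chall_sketch(text: str, easy_words: list) -> float:
    """Compute an adjusted Dale-Chall score from punctuated text.

    Hypothetical helper. Assumptions: words are runs of letters and
    apostrophes, sentences end at '.', '!' or '?', and empty input
    yields 0.0.
    """
    easy = {w.lower() for w in easy_words}
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Punctuation must be retained in `text` for this split to work.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return 0.0  # assumption: avoid division by zero on empty input

    difficult = sum(1 for w in words if w not in easy)
    pct_difficult = 100 * difficult / len(words)
    raw = 0.1579 * pct_difficult + 0.0496 * (len(words) / len(sentences))
    # Apply the adjustment only when more than 5% of words are difficult.
    return raw + 3.6365 if pct_difficult > 5 else raw


toy_easy_words = ["the", "cat", "sat", "on", "mat", "dog", "ran"]
easy_score = dale_chall_sketch("The cat sat. The dog ran.", toy_easy_words)
hard_score = dale_chall_sketch("The cat sat on the ephemeral mat.", toy_easy_words)
```

With this toy word list, the first text has no difficult words, so its score is only the sentence-length term (0.0496 * 3). The second has one difficult word in seven (about 14%), which exceeds the 5% threshold and triggers the 3.6365 adjustment.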