other_lexical_features module

features.other_lexical_features.classify_NTRI(text)

Classify whether the message contains clarification questions, such as “what?” or “sorry?”.

Performs simple regex matching against a series of repair indicators from Ranganath et al. (2013). Source: https://sites.socsci.uci.edu/~lpearl/courses/readings/RanganathEtAl2013_DetectingFlirting.pdf

Parameters: text (str) – The message (utterance) being analyzed.
Returns: The number of matches for repair indicators.
Return type: int
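
The regex-counting approach described above can be sketched as follows. This is a minimal illustration, not the module's implementation: the name `count_repair_indicators` is hypothetical, and the pattern list is an assumed stand-in for the repair indicators actually used.

```python
import re

# Illustrative repair indicators only (an assumption); the module's actual
# pattern list, taken from Ranganath et al. (2013), may differ.
REPAIR_PATTERN = re.compile(
    r"\b(?:what|sorry|pardon|huh|excuse me)\?",
    re.IGNORECASE,
)


def count_repair_indicators(text: str) -> int:
    """Count non-overlapping matches of the illustrative repair patterns."""
    return len(REPAIR_PATTERN.findall(text))
```

Under these assumed patterns, a word only counts when it is immediately followed by a question mark, so "Sorry, what?" contributes one match rather than two.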

features.other_lexical_features.get_proportion_first_pronouns(df)

Get the proportion of first person pronouns: the total number of first person words divided by the total number of words.

Note that the function assumes the lexical features and basic features have already been run and have generated a raw count of first-person pronouns (stored in “first_person_raw”) and a total word count (stored in “num_words”).

Parameters: df (pd.DataFrame) – A pandas DataFrame of the chat-level features.
Returns: A column in the chat-level dataframe containing the number of first-person pronouns divided by the total number of words. Defaults to 0 when the word count is zero (division by zero).
Return type: pd.Series
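
A minimal sketch of this calculation is shown below, assuming only the two column names given above. The function name `proportion_first_pronouns` and the sample data are hypothetical, and the way the zero-word case is handled here is one possible approach rather than the module's own.

```python
import pandas as pd


def proportion_first_pronouns(df: pd.DataFrame) -> pd.Series:
    """Divide first-person pronoun counts by word counts, using 0 for empty messages."""
    # Mask zero word counts as NaN so the division cannot produce inf,
    # then replace the resulting NaN values with 0.
    words = df["num_words"].where(df["num_words"] != 0)
    return (df["first_person_raw"] / words).fillna(0.0)


# Hypothetical chat-level data: the last row has no words at all.
chats = pd.DataFrame({"first_person_raw": [2, 0, 0], "num_words": [10, 5, 0]})
result = proportion_first_pronouns(chats)
```

Masking the denominator before dividing keeps the operation vectorized and avoids both `inf` values and row-wise exception handling.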

features.other_lexical_features.get_word_TTR(text)

Get the word type-token ratio, calculated as follows:

Number of Unique Words / Number of Total Words.

The function expects the input text to retain its punctuation, and strips the punctuation internally before counting words.

Parameters: text (str) – The message (utterance) being analyzed.
Returns: The word type-token ratio.
Return type: float
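
The ratio above can be sketched as follows. This is an illustrative version under stated assumptions, not the module's code: the name `word_ttr` is hypothetical, and lowercasing the tokens, stripping only ASCII punctuation, and returning 0.0 for empty input are all choices made for this example.

```python
import string


def word_ttr(text: str) -> float:
    """Return unique words / total words after stripping ASCII punctuation."""
    # Assumption: punctuation is removed and case is ignored before counting.
    cleaned = text.translate(str.maketrans("", "", string.punctuation)).lower()
    tokens = cleaned.split()
    if not tokens:
        # Assumption: an empty message yields 0.0 rather than raising an error.
        return 0.0
    return len(set(tokens)) / len(tokens)
```

For example, "The cat saw the other cat." has six tokens and four unique words under these assumptions, giving a ratio of 4/6.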