Certainty

High-Level Intuition

This feature measures how “certain”, i.e. confident or sure, a given utterance is, using the Certainty Lexicon published in Rocklage et al. (2023).

Citation

Rocklage et al. (2023)

Implementation Basics

The Certainty Lexicon provided a dictionary of words and phrases and their associated certainty metric. This feature uses a regular expression to identify these words and phrases within a given utterance, and subsequently averages the corresponding certainty metrics to get the overall certainty score. If no matches are found, the default value is 4.5 (indicating that an utterance is neither certain nor uncertain).

Implementation Notes/Caveats

Several of the keys in the Certainty Lexicon are substrings of each other, i.e. “I know it” and “I know it is”. In these cases, we match the utterance with the longer substring to avoid double counting.

Interpreting the Feature

Each key in the Certainty Lexicon is associated with a certainty score between 0 (very uncertain) and 9 (very certain). The feature computes the average certainty score of all matched words/phrases in a given utterance, so the score remains within this 0-9 range. If there are no matches found, then the default value of 4.5 (no certainty indicators present, “neutral certainty”) is returned.

Output File

message

certainty

I’m confident that she is on her way.

8.4

I’m not too sure.

0.7

My name is Emily.

4.5