burstiness module
- features.burstiness.burstiness(df, timediff)
Computes the level of “burstiness” in a conversation: the extent to which messages arrive at regular intervals (e.g., every X seconds) versus in a “bursty” pattern (e.g., long pauses followed by many messages in rapid succession).
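The burstiness coefficient, B, is sourced from Riedl and Woolley (2016): https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2384068
B = (standard deviation of wait times - mean of wait times) / (standard deviation of wait times + mean of wait times)
For non-negative wait times, B ranges from -1 (perfectly periodic: no variation in wait times) toward 1 (highly bursty: standard deviation much larger than the mean).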
- Parameters:
df (pd.DataFrame) – The input dataframe, grouped by the conversation index, to which this function is being applied.
timediff (str) – The name of the column holding the time differences between messages in a conversation (computed in a pre-processing step).
- Returns:
The team burstiness score (B)
- Return type:
float
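The formula for B can be illustrated with a minimal sketch. This is not the module's implementation: the function name burstiness_sketch, the column name time_diff, the choice to skip missing wait times, the use of the population standard deviation, and the handling of empty or all-zero input are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def burstiness_sketch(df: pd.DataFrame, timediff: str) -> float:
    """Illustrative only: B = (sd - mean) / (sd + mean) over wait times."""
    # Assumption: missing wait times (e.g., a conversation's first message) are skipped.
    wait_times = df[timediff].dropna().to_numpy(dtype=float)
    if wait_times.size == 0:
        return float("nan")  # assumption: no wait times means B is undefined

    sd = np.std(wait_times)    # population standard deviation (ddof=0), an assumption
    mean = np.mean(wait_times)
    if sd + mean == 0:
        return 0.0  # assumption: avoid 0/0 when every wait time is zero
    return float((sd - mean) / (sd + mean))


# Hypothetical data: one evenly spaced conversation, one with a long pause.
periodic = pd.DataFrame({"time_diff": [5, 5, 5, 5]})
bursty = pd.DataFrame({"time_diff": [1, 1, 1, 97]})

print(burstiness_sketch(periodic, "time_diff"))  # -1.0: no variation in wait times
print(burstiness_sketch(bursty, "time_diff"))    # positive: sd exceeds the mean
```

The evenly spaced conversation hits the lower bound of -1 because its standard deviation is zero, while the conversation with one long pause scores above zero.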
- features.burstiness.get_team_burstiness(df, timediff, conversation_id_col)
Applies the burstiness coefficient to each conversation in the utterance (chat)-level dataframe and returns a conversation-level dataframe. The burstiness feature reuses the time difference between messages, which is already computed as one of the utterance (chat)-level features.
- Parameters:
df (pd.DataFrame) – The utterance (chat)-level dataframe.
timediff (str) – The name of the column holding the time differences between messages in a conversation (computed by the utterance-level feature get_temporal_features).
conversation_id_col (str) – The name of the column to use as the conversation ID.
- Returns:
A grouped dataframe keyed by the conversation identifier, with a new column (“team_burstiness”) holding each group’s burstiness coefficient.
- Return type:
pd.DataFrame
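The per-conversation aggregation described above can be sketched with a pandas groupby. This is an illustrative approximation, not the module's code: the names team_burstiness_sketch and burstiness_coefficient, the column names conversation_num and time_diff, and the choice to return the conversation ID as an ordinary column are assumptions.

```python
import numpy as np
import pandas as pd


def burstiness_coefficient(wait_times: pd.Series) -> float:
    """Hypothetical helper: B = (sd - mean) / (sd + mean) for one conversation."""
    values = wait_times.dropna().to_numpy(dtype=float)  # assumption: skip missing values
    if values.size == 0:
        return float("nan")
    sd, mean = np.std(values), np.mean(values)  # population sd (ddof=0), an assumption
    return 0.0 if sd + mean == 0 else float((sd - mean) / (sd + mean))


def team_burstiness_sketch(
    df: pd.DataFrame, timediff: str, conversation_id_col: str
) -> pd.DataFrame:
    """Collapse an utterance-level dataframe to one row per conversation."""
    return (
        df.groupby(conversation_id_col)[timediff]
        .apply(burstiness_coefficient)   # one scalar per conversation
        .rename("team_burstiness")
        .reset_index()                   # conversation ID becomes a regular column
    )


# Hypothetical utterance-level data for two conversations.
chats = pd.DataFrame(
    {
        "conversation_num": ["a"] * 4 + ["b"] * 4,
        "time_diff": [5, 5, 5, 5, 1, 1, 1, 97],
    }
)
result = team_burstiness_sketch(chats, "time_diff", "conversation_num")
print(result)
```

The result has one row per conversation, so it can be merged into other conversation-level features on the conversation ID column.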