burstiness module

features.burstiness.burstiness(df, timediff)

Computes the level of “burstiness” in a conversation: the extent to which messages occur periodically (e.g., every X seconds) versus in a “bursty” pattern (e.g., long pauses followed by many messages in rapid succession).

The burstiness coefficient, B, is based on the coefficient of variation of wait times and is sourced from Riedl and Woolley (2016): https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2384068

B = (standard deviation of wait times - mean of wait times) / (standard deviation of wait times + mean of wait times)

Parameters:
  • df (pd.DataFrame) – The input dataframe, grouped by the conversation index, to which this function is being applied.

  • timediff (str) – The name of the column containing the time differences between consecutive messages in a conversation (computed in a pre-processing step).

Returns:

The team burstiness score (B)

Return type:

float
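
As an illustration of the formula for B above, here is a minimal sketch, not the module's actual implementation. The helper name burstiness_coefficient is hypothetical, and two choices are assumptions: the population standard deviation (ddof=0) is used, and degenerate input (no wait times, or standard deviation plus mean equal to zero) returns 0. For non-negative wait times, B lies between -1 and 1: perfectly periodic messages give -1, because the standard deviation is 0, and values move toward 1 as the standard deviation grows relative to the mean.

```python
import numpy as np


def burstiness_coefficient(wait_times):
    """Hypothetical helper: B = (sd - mean) / (sd + mean) over wait times."""
    wait_times = np.asarray(wait_times, dtype=float)
    if wait_times.size == 0:
        return 0.0  # assumption: no wait times means no measurable burstiness
    sd = wait_times.std()  # assumption: population standard deviation (ddof=0)
    mean = wait_times.mean()
    if sd + mean == 0:
        return 0.0  # assumption: avoid dividing by zero when all waits are 0
    return float((sd - mean) / (sd + mean))


# Messages arriving exactly every 5 seconds: sd is 0, so B is -1.
periodic = burstiness_coefficient([5, 5, 5, 5])

# Three rapid messages followed by one long pause: sd exceeds the mean, so B > 0.
bursty = burstiness_coefficient([0.1, 0.1, 0.1, 100])
```

Here periodic evaluates to -1.0, while bursty is positive because the single long pause pushes the standard deviation above the mean.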

features.burstiness.get_team_burstiness(df, timediff, conversation_id_col)

Applies the burstiness coefficient to each conversation in the utterance (chat)-level dataframe and returns a conversation-level dataframe. This feature reuses the time difference between messages, which is already computed as one of the utterance (chat)-level features.

Parameters:
  • df (pd.DataFrame) – The utterance (chat)-level dataframe.

  • timediff (str) – The name of the column containing the time differences between consecutive messages in a conversation (computed by the utterance-level feature, get_temporal_features).

  • conversation_id_col (str) – A string representing the column name that should be selected as the conversation ID.

Returns:

A grouped dataframe keyed by the conversation identifier, with a new column (“team_burstiness”) containing each group’s burstiness coefficient.

Return type:

pd.DataFrame
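
The per-conversation aggregation described above could be sketched as follows. This is an illustrative sketch under stated assumptions, not the module's actual code: the function names team_burstiness_sketch and burstiness_sketch, the column names "conversation_num" and "time_diff", and the sample data are all hypothetical, and the result is assumed to be indexed by the conversation identifier.

```python
import numpy as np
import pandas as pd


def burstiness_sketch(group, timediff):
    """Hypothetical stand-in for burstiness(): B = (sd - mean) / (sd + mean)."""
    wait_times = group[timediff].dropna().to_numpy(dtype=float)
    if wait_times.size == 0:
        return 0.0  # assumption: empty input yields 0
    sd = wait_times.std()  # assumption: population standard deviation (ddof=0)
    mean = wait_times.mean()
    if sd + mean == 0:
        return 0.0  # assumption: guard against division by zero
    return float((sd - mean) / (sd + mean))


def team_burstiness_sketch(df, timediff, conversation_id_col):
    """Compute one burstiness value per conversation."""
    rows = [
        {
            conversation_id_col: conv_id,
            "team_burstiness": burstiness_sketch(group, timediff),
        }
        for conv_id, group in df.groupby(conversation_id_col)
    ]
    # Assumption: the conversation identifier serves as the index (the "key").
    return pd.DataFrame(rows).set_index(conversation_id_col)


# Hypothetical chat-level data: conversation "A" is periodic, "B" is bursty.
chats = pd.DataFrame(
    {
        "conversation_num": ["A"] * 4 + ["B"] * 4,
        "time_diff": [5, 5, 5, 5, 0.1, 0.1, 0.1, 100],
    }
)
result = team_burstiness_sketch(chats, "time_diff", "conversation_num")
```

In this sketch, result has one row per conversation. The periodic conversation "A" scores -1.0 and the bursty conversation "B" scores above 0.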