calculate_conversation_level_features module

class utils.calculate_conversation_level_features.ConversationLevelFeaturesCalculator(chat_data: DataFrame, user_data: DataFrame, conv_data: DataFrame, vect_data: DataFrame, vector_directory: str, conversation_id_col: str, speaker_id_col: str, message_col: str, timestamp_col: str, convo_aggregation: bool, convo_methods: list, convo_columns: list, user_aggregation: bool, user_methods: list, user_columns: list, chat_features: list, logger)

Bases: object

Initialize variables and objects used by the ConversationLevelFeaturesCalculator class.

This class uses various feature modules to define conversation-level features. It reads input data and initializes variables required to compute the features.

Parameters:
  • chat_data (pd.DataFrame) – Pandas dataframe of chat-level features read from the input dataset

  • user_data (pd.DataFrame) – Pandas dataframe of user-level features derived from the chat-level dataframe

  • conv_data (pd.DataFrame) – Pandas dataframe of conversation-level features derived from the chat-level dataframe

  • vect_data (pd.DataFrame) – Pandas dataframe of processed vectors derived from the chat-level dataframe

  • vector_directory (str) – Directory where vector files are stored

  • convo_aggregation (bool) – If True, aggregate features at the conversation level

  • convo_methods (list) – Specifies which functions users want to aggregate with (e.g., mean, stdev…) at the conversation level

  • convo_columns (list) – Specifies which columns (at the chat level) users want aggregated

  • user_aggregation (bool) – If True, aggregate features at the user level

  • user_methods (list) – Specifies which functions users want to aggregate with (e.g., mean, stdev…) at the user level

  • user_columns (list) – Specifies which columns (at the chat level) users want aggregated for the user level

  • chat_features (list) – Tracks all the chat-level features generated by the toolkit

calculate_conversation_level_features(feature_methods: list) → DataFrame

Main driver function for creating conversation-level features.

This function computes various conversation-level features by aggregating chat-level and user-level features, and appends them as new columns to the input conversation-level data.

Parameters:

feature_methods (list) – The list of methods to use to generate features

Returns:

The conversation-level dataset with new columns for each conversation-level feature

Return type:

pd.DataFrame

calculate_info_diversity() → None

Calculate an information diversity score for the team.

This function computes the information diversity score by looking at the cosine similarity between the mean topic vector of the team and each message’s topic vectors, and merges the results into the conversation-level data.

Returns:

None

Return type:

None
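The comparison described above can be sketched as follows. This is a minimal illustration, not the toolkit's implementation: the function name `info_diversity` is hypothetical, and turning the similarities into a single score by averaging `1 - similarity` is an assumption, since the description does not say how the similarities are combined.

```python
import numpy as np

def info_diversity(topic_vectors) -> float:
    """Hypothetical helper: average cosine dissimilarity from the team's mean topic vector."""
    vectors = np.asarray(topic_vectors, dtype=float)
    mean_vec = vectors.mean(axis=0)  # mean topic vector of the team

    # Cosine similarity between each message's topic vector and the mean vector.
    dots = vectors @ mean_vec
    norms = np.linalg.norm(vectors, axis=1) * np.linalg.norm(mean_vec)
    # Assumption: a zero-length vector is treated as having similarity 0.
    sims = np.divide(dots, norms, out=np.zeros_like(dots), where=norms > 0)

    # Assumption: diversity is the mean of (1 - similarity) over all messages.
    return float(np.mean(1.0 - sims))

# Two messages on the same topic versus two messages on different topics.
same_topic = info_diversity([[1.0, 0.0], [1.0, 0.0]])
mixed_topics = info_diversity([[1.0, 0.0], [0.0, 1.0]])
```

Under these assumptions, a team whose messages all share one topic distribution scores 0, and the score grows as messages spread across topics.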

calculate_team_burstiness() → None

Calculate the team burstiness coefficient.

This function computes the team burstiness coefficient by comparing the standard deviation and the mean of the time intervals between consecutive chats, and merges the results into the conversation-level data.

Returns:

None

Return type:

None
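As a rough illustration of the computation, the sketch below assumes the commonly used burstiness coefficient B = (σ − μ) / (σ + μ), where σ and μ are the standard deviation and mean of the intervals between chats. The function name `burstiness` and the handling of edge cases are assumptions for illustration, not taken from the toolkit.

```python
import numpy as np

def burstiness(timestamps) -> float:
    """Hypothetical helper: burstiness of the intervals between consecutive chats."""
    times = np.sort(np.asarray(timestamps, dtype=float))
    intervals = np.diff(times)  # time between consecutive chats

    # Assumption: with fewer than two chats there are no intervals; return 0.
    if intervals.size == 0:
        return 0.0

    sigma = intervals.std()
    mu = intervals.mean()

    # Assumption: if all chats share one timestamp, the ratio is undefined; return 0.
    if sigma + mu == 0:
        return 0.0
    return float((sigma - mu) / (sigma + mu))

regular = burstiness([0, 1, 2, 3])   # evenly spaced chats
bursty = burstiness([0, 0, 0, 10])   # a burst followed by a long gap
```

With this definition the value lies between -1 (perfectly regular spacing) and 1 (highly bursty activity).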

get_conversation_level_aggregates() → None

Aggregate summary statistics from chat-level features to conversation-level features.

This function calculates and merges the following aggregation functions for each summarizable feature:

  • Average/Mean

  • Standard Deviation

  • Minimum

  • Maximum

For countable features (e.g., num_words, num_chars, num_messages), it also calculates and merges the sum.

Returns:

None

Return type:

None
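The aggregation pattern described above can be sketched with plain pandas. The column names (`conversation_num`, `num_words`) and the output naming scheme are hypothetical and chosen only for illustration; the toolkit's actual column names may differ.

```python
import pandas as pd

# Hypothetical chat-level data: one row per message.
chats = pd.DataFrame({
    "conversation_num": ["a", "a", "b", "b", "b"],
    "num_words": [3, 5, 2, 4, 6],
})

# Summary statistics per conversation; "sum" is included because
# num_words is a countable feature. Note that pandas' "std" is the
# sample standard deviation (ddof=1).
aggregates = (
    chats.groupby("conversation_num")["num_words"]
    .agg(["mean", "std", "min", "max", "sum"])
)
aggregates.columns = [f"{stat}_num_words" for stat in aggregates.columns]

# Merge the new columns into a (hypothetical) conversation-level frame.
conv_data = pd.DataFrame({"conversation_num": ["a", "b"]})
conv_data = conv_data.merge(
    aggregates.reset_index(), on="conversation_num", how="left"
)
```

A left merge keeps every conversation in the conversation-level frame, even one with no chat rows to aggregate.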

get_discursive_diversity_features() → None

Calculate discursive diversity features for each conversation.

This function computes discursive diversity based on the sentence embeddings (SBERT) and chat-level information, and merges the features into the conversation-level data.

Returns:

None

Return type:

None
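The description does not spell out the formula, so the following is only one plausible reading: each speaker is represented by the centroid of their message embeddings, and diversity is the mean pairwise cosine distance between speaker centroids. The function name `discursive_diversity` and this formulation are assumptions, and the toolkit may compute additional or different quantities.

```python
from itertools import combinations

import numpy as np

def discursive_diversity(embeddings, speakers) -> float:
    """Hypothetical helper: mean pairwise cosine distance between speaker centroids."""
    embeddings = np.asarray(embeddings, dtype=float)
    speakers = np.asarray(speakers)

    # One centroid per speaker: the mean of that speaker's message embeddings.
    centroids = [
        embeddings[speakers == speaker].mean(axis=0)
        for speaker in np.unique(speakers)
    ]

    # Assumption: a single-speaker conversation has no pairs; return 0.
    if len(centroids) < 2:
        return 0.0

    # Assumption: centroids are non-zero vectors, so the norms are positive.
    distances = []
    for u, v in combinations(centroids, 2):
        cosine = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
        distances.append(1.0 - cosine)
    return float(np.mean(distances))

# Toy 2-dimensional "embeddings"; real SBERT vectors have hundreds of dimensions.
score = discursive_diversity(
    [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
    ["x", "x", "y"],
)
```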

get_gini_features() → None

Calculate the Gini index for relevant features in the conversation.

This function computes the Gini index for features involving counts, such as:

  • Word count

  • Character count

  • Message count

The Gini index is then merged into the conversation-level data.

Returns:

None

Return type:

None
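For reference, a Gini index over count data can be computed as in the sketch below. It uses the standard rank-based formula on sorted values and assumes the input is one total per speaker (for example, words contributed by each speaker in a conversation). The function name `gini` is hypothetical and not the toolkit's API.

```python
import numpy as np

def gini(values) -> float:
    """Hypothetical helper: Gini index of non-negative counts (0 means perfectly equal)."""
    x = np.sort(np.asarray(values, dtype=float))  # ascending order
    n = x.size
    total = x.sum()

    # Assumption: an empty or all-zero input is treated as perfectly equal.
    if n == 0 or total == 0:
        return 0.0

    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with ranks i = 1..n.
    ranks = np.arange(1, n + 1)
    return float(2.0 * np.sum(ranks * x) / (n * total) - (n + 1) / n)

equal_share = gini([10, 10, 10, 10])  # every speaker contributes equally
one_dominant = gini([0, 0, 0, 10])    # a single speaker contributes everything
```

With this formula, the maximum for n speakers is (n - 1) / n, reached when one speaker accounts for the entire count.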

get_turn_taking_features() → None

Calculate the turn-taking index in the conversation.

This function merges turn-taking features into the conversation-level data.

Returns:

None

Return type:

None
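The description above does not define the index, so the sketch below illustrates one plausible definition only: a turn is a run of consecutive messages by the same speaker, and the index is (turns − 1) / (messages − 1). Both the definition and the function name `turn_taking_index` are assumptions made for illustration.

```python
from itertools import groupby

def turn_taking_index(speaker_sequence) -> float:
    """Hypothetical helper: share of message boundaries at which the speaker changes."""
    n_messages = len(speaker_sequence)

    # Assumption: the index is undefined for fewer than two messages; return 0.
    if n_messages < 2:
        return 0.0

    # groupby collapses consecutive identical speakers into a single turn.
    n_turns = sum(1 for _ in groupby(speaker_sequence))
    return (n_turns - 1) / (n_messages - 1)

alternating = turn_taking_index(["a", "b", "a", "b"])  # speaker changes every message
monologue = turn_taking_index(["a", "a", "a"])         # one speaker throughout
```

Under this definition, 1 means the speaker changes after every message and 0 means one speaker sent every message.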

get_user_level_aggregates() → None

Aggregate summary statistics from user-level features to conversation-level features.

This function calculates and merges the following aggregation functions for each user-level feature:

  • Average/Mean of summed user-level features

  • Standard Deviation of summed user-level features

  • Minimum of summed user-level features

  • Maximum of summed user-level features

  • Average/Mean of averaged user-level features

  • Standard Deviation of averaged user-level features

  • Minimum of averaged user-level features

  • Maximum of averaged user-level features

Returns:

None

Return type:

None
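The two-stage aggregation described above (first per user, then across users within a conversation) can be sketched with pandas. The column names (`conversation_num`, `speaker_nickname`, `num_words`) and the output naming scheme are hypothetical and may not match the toolkit's.

```python
import pandas as pd

# Hypothetical chat-level data: one row per message.
chats = pd.DataFrame({
    "conversation_num": ["a", "a", "a", "a"],
    "speaker_nickname": ["u1", "u1", "u2", "u2"],
    "num_words": [2, 4, 10, 20],
})

# Stage 1: sum and average the feature for each user in each conversation.
per_user = (
    chats.groupby(["conversation_num", "speaker_nickname"])["num_words"]
    .agg(user_sum="sum", user_mean="mean")
    .reset_index()
)

# Stage 2: summarize the per-user sums and averages across users.
per_conv = (
    per_user.groupby("conversation_num")[["user_sum", "user_mean"]]
    .agg(["mean", "std", "min", "max"])
)

# Flatten the two-level column index, e.g. ("user_sum", "mean") -> "mean_user_sum_num_words".
per_conv.columns = [f"{stat}_{inner}_num_words" for inner, stat in per_conv.columns]
```

In this toy data, user `u1` contributes 6 words in total (3 on average) and `u2` contributes 30 (15 on average), so the conversation-level mean of the summed feature is 18.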