calculate_conversation_level_features module

class utils.calculate_conversation_level_features.ConversationLevelFeaturesCalculator(chat_data: DataFrame, user_data: DataFrame, conv_data: DataFrame, vect_data: DataFrame, vector_directory: str, conversation_id_col: str, speaker_id_col: str, message_col: str, timestamp_col: str, convo_aggregation: bool, convo_methods: list, convo_columns: list, user_aggregation: bool, user_methods: list, user_columns: list, chat_features: list, logger)

Bases: object

Initialize variables and objects used by the ConversationLevelFeaturesCalculator class.

This class uses various feature modules to define conversation-level features. It reads input data and initializes variables required to compute the features.

Parameters:

chat_data (pd.DataFrame) – Pandas dataframe of chat-level features read from the input dataset
user_data (pd.DataFrame) – Pandas dataframe of user-level features derived from the chat-level dataframe
conv_data (pd.DataFrame) – Pandas dataframe of conversation-level features derived from the chat-level dataframe
vect_data (pd.DataFrame) – Pandas dataframe of processed vectors derived from the chat-level dataframe
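vector_directory (str) – Directory where vector files are stored
conversation_id_col (str) – Name of the column that identifies each conversation
speaker_id_col (str) – Name of the column that identifies the speaker of each message
message_col (str) – Name of the column containing the message text
timestamp_col (str) – Name of the column containing message timestamps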
convo_aggregation (bool) – If true, will aggregate features at the conversational level
convo_methods (list) – Specifies which functions users want to aggregate with (e.g., mean, stdev…)
convo_columns (list) – Specifies which columns (at the chat level) users want aggregated
user_aggregation (bool) – If true, will aggregate features at the user level
user_methods (list) – Specifies which functions users want to aggregate with (e.g., mean, stdev…) at the user level
user_columns (list) – Specifies which columns (at the chat level) users want aggregated for the user level
chat_features (list) – Tracks all the chat-level features generated by the toolkit

calculate_conversation_level_features(feature_methods: list) → DataFrame

Main driver function for creating conversation-level features.

This function computes various conversation-level features by aggregating chat-level and user-level features, and appends them as new columns to the input conversation-level data.

Parameters: feature_methods (list) – The list of methods to call to generate features
Returns: The conversation-level dataset with new columns for each conversation-level feature
Return type: pd.DataFrame
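The driver pattern described above can be sketched as follows. This is a minimal, hypothetical illustration, not the toolkit's code: it assumes each entry in feature_methods is a bound method that returns None and writes its columns into the calculator's conversation-level data in place. The class ToyConversationCalculator and the dict used in place of a DataFrame are stand-ins invented for the example.

```python
from typing import Callable, Dict, List


class ToyConversationCalculator:
    """Hypothetical stand-in illustrating the driver pattern; not the toolkit's class."""

    def __init__(self, conv_data: Dict[str, Dict[str, float]]):
        # conv_data maps conversation_id -> {feature_name: value} (stands in for a DataFrame)
        self.conv_data = conv_data

    def add_constant_feature(self) -> None:
        # Like the real feature methods, this returns None and writes its results in place
        for features in self.conv_data.values():
            features["constant_feature"] = 1.0

    def calculate_conversation_level_features(
        self, feature_methods: List[Callable[[], None]]
    ) -> Dict[str, Dict[str, float]]:
        # Run each feature method in order; each one merges new columns into conv_data
        for method in feature_methods:
            method()
        return self.conv_data


calc = ToyConversationCalculator({"c1": {}, "c2": {}})
result = calc.calculate_conversation_level_features([calc.add_constant_feature])
print(result)  # {'c1': {'constant_feature': 1.0}, 'c2': {'constant_feature': 1.0}}
```

Because every feature method mutates shared state and returns None, the driver only needs to call them in sequence and return the accumulated data.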

calculate_info_diversity() → None

Calculate an information diversity score for the team.

This function computes the information diversity score from the cosine similarity between the team's mean topic vector and each message’s topic vector, and merges the results into the conversation-level data.

Returns: None
Return type: None
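A minimal sketch of the computation, assuming each message already has a topic vector (for example, from a topic model). The final step, reporting diversity as 1 minus the mean cosine similarity to the team's mean vector, is an assumption: the description above says only that the score is based on these similarities. The function names cosine_similarity and info_diversity are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: treat zero vectors as having no similarity
    return dot / (norm_a * norm_b)


def info_diversity(topic_vectors: List[Sequence[float]]) -> float:
    """Hypothetical score: 1 - mean cosine similarity of each message to the team's mean topic vector."""
    n = len(topic_vectors)
    dims = len(topic_vectors[0])
    # Mean topic vector of the team (element-wise average over messages)
    mean_vector = [sum(v[i] for v in topic_vectors) / n for i in range(dims)]
    similarities = [cosine_similarity(v, mean_vector) for v in topic_vectors]
    # Messages that all point in the same topic direction give a score near 0
    return 1.0 - sum(similarities) / n


print(round(info_diversity([[1.0, 0.0], [0.0, 1.0]]), 4))  # 0.2929
```

Two messages on entirely separate topics score about 0.29, while identical topic vectors score approximately 0.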

calculate_team_burstiness() → None

Calculate the team burstiness coefficient.

This function computes the team burstiness coefficient from the standard deviation and mean of the time intervals between consecutive chats, and merges the results into the conversation-level data.

Returns: None
Return type: None
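The sketch below uses the standard burstiness coefficient B = (σ - μ) / (σ + μ) (Goh and Barabási), where σ and μ are the standard deviation and mean of the inter-message intervals. It is an assumption that the toolkit uses exactly this form and the population standard deviation; the function name team_burstiness is hypothetical. B is -1 for perfectly regular messaging and approaches 1 as messages cluster into bursts.

```python
import statistics
from typing import List


def team_burstiness(timestamps: List[float]) -> float:
    """Burstiness coefficient B = (sigma - mu) / (sigma + mu) of the gaps between messages."""
    ordered = sorted(timestamps)
    # Time intervals between consecutive chats
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if not gaps:
        return float("nan")  # fewer than two messages: no intervals to measure
    mu = statistics.mean(gaps)
    sigma = statistics.pstdev(gaps)  # population std; the toolkit's choice is an assumption
    if sigma + mu == 0:
        return float("nan")  # every message shares the same timestamp
    return (sigma - mu) / (sigma + mu)


print(team_burstiness([0, 10, 20, 30]))     # -1.0 (perfectly regular)
print(team_burstiness([0, 1, 2, 100]) > 0)  # True (messages arrive in a burst)
```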

get_conversation_level_aggregates() → None

Aggregate summary statistics from chat-level features to conversation-level features.

This function calculates and merges the following aggregation functions for each summarizable feature:

- Average/Mean
- Standard Deviation
- Minimum
- Maximum

For countable features (e.g., num_words, num_chars, num_messages), it also calculates and merges the sum.

Returns: None
Return type: None
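A minimal pandas sketch of this aggregation, assuming the chat-level data has one row per message and a conversation identifier column. The column name conversation_num, the function name conversation_aggregates, and the output naming scheme (e.g., mean_num_words) are hypothetical; the toolkit's own column names may differ.

```python
import pandas as pd


def conversation_aggregates(chat_df, conversation_id_col, summary_cols, countable_cols):
    """Mean/std/min/max of chat-level columns per conversation, plus sums of countable ones."""
    grouped = chat_df.groupby(conversation_id_col)
    stats = grouped[summary_cols].agg(["mean", "std", "min", "max"])
    # Flatten ("num_words", "mean") -> "mean_num_words" (hypothetical naming scheme)
    stats.columns = [f"{stat}_{col}" for col, stat in stats.columns]
    sums = grouped[countable_cols].sum().add_prefix("sum_")
    return stats.join(sums).reset_index()


chat = pd.DataFrame({
    "conversation_num": ["a", "a", "b"],
    "num_words": [3, 5, 4],
    "num_chars": [10, 20, 15],
})
print(conversation_aggregates(chat, "conversation_num", ["num_words", "num_chars"], ["num_words", "num_chars"]))
```

Note that pandas computes the sample standard deviation (ddof=1), so a conversation with a single message gets NaN for the std columns.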

get_discursive_diversity_features() → None

Calculate discursive diversity features for each conversation.

This function computes discursive diversity from SBERT sentence embeddings of the messages and chat-level information, and merges the features into the conversation-level data.

Returns: None
Return type: None
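One common formulation of discursive diversity averages, for each speaker, the embeddings of that speaker's messages into a centroid, then takes the mean pairwise cosine distance between speaker centroids. The sketch below assumes that formulation; the exact definition used here is not stated above. Plain lists stand in for SBERT vectors, and the function names are hypothetical.

```python
import itertools
import math
from collections import defaultdict
from typing import List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; zero vectors are treated as dissimilar (distance 1.0), an assumption."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 if norms == 0.0 else 1.0 - dot / norms


def discursive_diversity(messages: List[Tuple[str, Sequence[float]]]) -> float:
    """Mean pairwise cosine distance between speakers' centroid embeddings (assumed definition)."""
    by_speaker = defaultdict(list)
    for speaker, embedding in messages:
        by_speaker[speaker].append(embedding)
    # One centroid per speaker: element-wise mean of that speaker's message embeddings
    centroids = [[sum(dim) / len(vecs) for dim in zip(*vecs)] for vecs in by_speaker.values()]
    pairs = list(itertools.combinations(centroids, 2))
    if not pairs:
        return float("nan")  # a single speaker has no one to differ from
    return sum(cosine_distance(a, b) for a, b in pairs) / len(pairs)


print(discursive_diversity([("A", [1.0, 0.0]), ("B", [0.0, 1.0])]))  # 1.0
```

Speakers whose centroids point in the same direction yield 0.0; orthogonal centroids yield 1.0.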

get_gini_features() → None

Calculate the Gini index for relevant features in the conversation.

This function computes the Gini index for features involving counts, such as:

- Word count
- Character count
- Message count

The Gini index is then merged into the conversation-level data.

Returns: None
Return type: None
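The Gini index measures how unequally a count is distributed across speakers. A minimal sketch using the mean-absolute-difference form G = Σᵢ Σⱼ |xᵢ - xⱼ| / (2n²x̄), applied to hypothetical per-speaker totals for one conversation. It is an assumption that the toolkit uses this form and no small-sample correction; the function name gini is hypothetical.

```python
from typing import Sequence


def gini(values: Sequence[float]) -> float:
    """Gini index via mean absolute difference: sum|x_i - x_j| / (2 * n^2 * mean)."""
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0  # no activity to distribute (assumption)
    abs_diffs = sum(abs(x - y) for x in values for y in values)
    # 2 * n^2 * mean simplifies to 2 * n * total
    return abs_diffs / (2 * n * total)


# Hypothetical per-speaker word counts within one conversation
print(gini([5, 5, 5]))     # 0.0  (perfectly equal participation)
print(gini([10, 0]))       # 0.5
print(gini([1, 0, 0, 0]))  # 0.75 (one speaker did all the talking)
```

With this normalization the maximum for n speakers is (n - 1)/n, reached when a single speaker produces everything.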

get_turn_taking_features() → None

Calculate the turn-taking index in the conversation.

This function merges turn-taking features into the conversation-level data.

Returns: None
Return type: None

get_user_level_aggregates() → None

Aggregate summary statistics from user-level features to conversation-level features.

This function calculates and merges the following aggregation functions for each user-level feature:

- Average/Mean of summed user-level features
- Standard Deviation of summed user-level features
- Minimum of summed user-level features
- Maximum of summed user-level features
- Average/Mean of averaged user-level features
- Standard Deviation of averaged user-level features
- Minimum of averaged user-level features
- Maximum of averaged user-level features

Returns: None
Return type: None
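The two-level structure above can be sketched in pandas. First, chats are collapsed to one row per speaker per conversation, giving each speaker's summed and averaged value. Then those per-user values are summarized across users within each conversation. This is a minimal sketch; the column names conversation_num and speaker_nickname, the function name user_level_aggregates, and the output naming scheme are hypothetical.

```python
import pandas as pd


def user_level_aggregates(chat_df, conversation_id_col, speaker_id_col, column):
    """Mean/std/min/max across users of each user's summed and averaged value of `column`."""
    # Step 1: one row per (conversation, speaker) with summed and averaged values
    per_user = (
        chat_df.groupby([conversation_id_col, speaker_id_col])[column]
        .agg(["sum", "mean"])
        .rename(columns={"sum": f"sum_{column}", "mean": f"average_{column}"})
    )
    # Step 2: summarize the per-user values across the users in each conversation
    per_convo = per_user.groupby(level=conversation_id_col).agg(["mean", "std", "min", "max"])
    # Flatten ("sum_num_words", "mean") -> "mean_user_sum_num_words" (hypothetical naming)
    per_convo.columns = [f"{stat}_user_{feat}" for feat, stat in per_convo.columns]
    return per_convo.reset_index()


chat = pd.DataFrame({
    "conversation_num": ["a", "a", "a", "b"],
    "speaker_nickname": ["x", "x", "y", "z"],
    "num_words": [2, 4, 6, 3],
})
print(user_level_aggregates(chat, "conversation_num", "speaker_nickname", "num_words"))
```

In conversation "a", both speakers wrote 6 words in total, but their per-message averages differ (3.0 vs. 6.0), which is why the summed and averaged user-level features are aggregated separately.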