calculate_user_level_features module
- class utils.calculate_user_level_features.UserLevelFeaturesCalculator(chat_data: DataFrame, user_data: DataFrame, vect_data: DataFrame, conversation_id_col: str, speaker_id_col: str, user_aggregation: bool, user_methods: list, user_columns: list, chat_features: list)
Bases: object
Initialize variables and objects used by the UserLevelFeaturesCalculator class.
This class uses various feature modules to define user- (speaker) level features. It reads input data and initializes variables required to compute the features.
- Parameters:
chat_data (pd.DataFrame) – Pandas dataframe of chat-level features read from the input dataset
user_data (pd.DataFrame) – Pandas dataframe of user-level features derived from the chat-level dataframe
vect_data (pd.DataFrame) – Pandas dataframe of message embeddings corresponding to each instance of the chat data
conversation_id_col (str) – A string representing the column name that should be selected as the conversation ID. Defaults to “conversation_num”.
speaker_id_col (str) – A string representing the column name that should be selected as the speaker ID. Defaults to “speaker_nickname”.
user_aggregation (bool) – If True, aggregate features at the user level
user_methods (list) – The aggregation functions (e.g., mean, stdev…) to apply at the user level
user_columns (list) – The chat-level columns to aggregate at the user level
chat_features (list) – Tracks all the chat-level features generated by the toolkit
- calculate_user_level_features() → DataFrame
Main driver function for creating user-level features.
This function computes various user-level features by aggregating chat-level data, and appends them as new columns to the input user-level data.
- Returns:
The user-level dataset with new columns for each user-level feature
- Return type:
pd.DataFrame
- get_centroids() → None
Calculate the centroid of each user’s chats in a given conversation for future discursive metric calculations.
This function computes and appends the mean embedding (centroid) of each user’s chats to the user-level data.
- Returns:
None
- Return type:
None
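As an illustration of the centroid idea described above, the following is a minimal sketch and not the toolkit's own implementation. It assumes the embeddings are stored as one numeric column per dimension; the column names dim_0 and dim_1 and the toy values are hypothetical, while the ID columns reuse the documented defaults.

```python
import pandas as pd

# Hypothetical toy data: one row per chat, one numeric column per
# embedding dimension (dim_0, dim_1 are assumed names, not toolkit names).
chats = pd.DataFrame({
    "conversation_num": [1, 1, 1, 2],
    "speaker_nickname": ["ana", "ana", "ben", "ana"],
    "dim_0": [1.0, 3.0, 0.0, 2.0],
    "dim_1": [0.0, 2.0, 4.0, 2.0],
})

# The centroid is the per-dimension mean of a speaker's message
# embeddings within a single conversation.
centroids = chats.groupby(
    ["conversation_num", "speaker_nickname"], as_index=False
)[["dim_0", "dim_1"]].mean()

print(centroids)
```

Grouping on both the conversation and the speaker keeps a speaker who appears in several conversations from being averaged across them.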
- get_user_level_summary_statistics_features() → None
Aggregate summary statistics from chat-level features into user-level features.
There are many possible ways to aggregate user-level features, for example: the mean of all chats by a given user; the max of all chats by a given user; a weighted mean (e.g., looking at different time points); and so on.
Which of these to use is still an open question and is marked as a TODO.
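Two of the aggregation options listed above (mean and max) can be sketched with a plain pandas group-by. This is only an illustration under assumptions: the num_words column and the output column names are hypothetical, and the toolkit may name or compute things differently.

```python
import pandas as pd

# Hypothetical chat-level data; "num_words" stands in for any
# numeric chat-level feature.
chats = pd.DataFrame({
    "conversation_num": [1, 1, 1],
    "speaker_nickname": ["ana", "ana", "ben"],
    "num_words": [4, 10, 6],
})

# Apply several aggregation functions per speaker per conversation,
# then prefix the resulting columns with the source column name.
summary = (
    chats.groupby(["conversation_num", "speaker_nickname"])["num_words"]
    .agg(["mean", "max"])
    .add_prefix("num_words_")
    .reset_index()
)

print(summary)
```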
- get_user_level_summed_features() → None
Aggregate summary statistics from chat-level features that need to be summed together.
Features for which summing makes sense include word count (total number of words), character count, and message count.
This function calculates and merges the summed features into the user-level data.
- Returns:
None
- Return type:
None
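The sum-then-merge step described above might look roughly like the sketch below. It is not taken from the toolkit: the feature columns (num_words, num_chars) and the sum_-prefixed output names are assumptions chosen for illustration.

```python
import pandas as pd

keys = ["conversation_num", "speaker_nickname"]

# Hypothetical chat-level features, one row per message.
chats = pd.DataFrame({
    "conversation_num": [1, 1, 1],
    "speaker_nickname": ["ana", "ana", "ben"],
    "num_words": [4, 10, 6],
    "num_chars": [20, 50, 30],
})

# Hypothetical user-level frame: one row per speaker per conversation.
user_data = chats[keys].drop_duplicates().reset_index(drop=True)

# Sum the countable features; the message count is the number of
# non-missing rows each speaker contributed.
sums = chats.groupby(keys, as_index=False).agg(
    sum_num_words=("num_words", "sum"),
    sum_num_chars=("num_chars", "sum"),
    sum_num_messages=("num_words", "count"),
)

# Merge the summed features back onto the user-level data.
user_data = user_data.merge(sums, on=keys, how="left")

print(user_data)
```

A left merge keeps every row of the user-level frame even if a speaker has no matching chat rows.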
- get_user_network() → None
Get, for each user in each conversation, the list of the other users in that conversation.
This function calculates and appends the list of other users in a given conversation to the user-level data.
- Returns:
None
- Return type:
None
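One way to build the per-user list of other participants is sketched below. It is an assumption-laden illustration rather than the toolkit's code: the user_list column name and the toy speakers are hypothetical.

```python
import pandas as pd

# Hypothetical user-level data: one row per speaker per conversation.
user_data = pd.DataFrame({
    "conversation_num": [1, 1, 1, 2, 2],
    "speaker_nickname": ["ana", "ben", "cy", "ana", "dee"],
})

# All distinct speakers in each conversation, in order of appearance.
speakers = user_data.groupby("conversation_num")["speaker_nickname"].unique()

# For each row, keep every speaker in the conversation except the row's own.
user_data["user_list"] = pd.Series(
    [
        [s for s in speakers.loc[conv] if s != speaker]
        for conv, speaker in zip(
            user_data["conversation_num"], user_data["speaker_nickname"]
        )
    ],
    index=user_data.index,
)

print(user_data)
```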