summarize_features module
- utils.summarize_features.get_max(input_data, column_to_summarize, new_column_name, conversation_id_col)
Generate a summary DataFrame with the maximum value of a specified column per conversation.
This function calculates the maximum value of a specified column for each conversation in the input data, and returns a DataFrame containing the conversation number and the calculated maximum value.
- Parameters:
input_data (pandas.DataFrame) – The DataFrame containing data at the chat or user level.
column_to_summarize (str) – The name of the column to be aggregated for maximum value.
new_column_name (str) – The desired name for the new summary column.
conversation_id_col (str) – A string representing the column name that should be selected as the conversation ID.
- Returns:
A DataFrame with the conversation number and the maximum value of the specified column.
- Return type:
pandas.DataFrame
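A minimal sketch of the per-conversation maximum described above, written with plain pandas rather than the library's own code. The helper name `sketch_get_max` and the sample column names (`conversation_num`, `num_words`) are assumptions made for illustration only.

```python
import pandas as pd

def sketch_get_max(input_data, column_to_summarize, new_column_name, conversation_id_col):
    """Hypothetical stand-in: maximum of one column per conversation."""
    return (
        input_data.groupby(conversation_id_col, as_index=False)[column_to_summarize]
        .max()
        # Give the aggregated column the caller's chosen name.
        .rename(columns={column_to_summarize: new_column_name})
    )

# Hypothetical chat-level data: one row per message.
chats = pd.DataFrame({
    "conversation_num": ["A", "A", "B"],
    "num_words": [3, 7, 5],
})

max_words = sketch_get_max(chats, "num_words", "max_num_words", "conversation_num")
print(max_words)
```

The result has one row per conversation and two columns: the conversation identifier and the renamed summary column.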
- utils.summarize_features.get_mean(input_data, column_to_summarize, new_column_name, conversation_id_col)
Generate a summary DataFrame with the mean of a specified column per conversation.
This function calculates the mean of a specified column for each conversation in the input data, and returns a DataFrame containing the conversation number and the calculated mean.
- Parameters:
input_data (pandas.DataFrame) – The DataFrame containing data at the chat or user level.
column_to_summarize (str) – The name of the column to be averaged.
new_column_name (str) – The desired name for the new summary column.
conversation_id_col (str) – A string representing the column name that should be selected as the conversation ID.
- Returns:
A DataFrame with the conversation number and the mean of the specified column.
- Return type:
pandas.DataFrame
- utils.summarize_features.get_median(input_data, column_to_summarize, new_column_name, conversation_id_col)
Generate a summary DataFrame with the median of a specified column per conversation.
This function calculates the median of a specified column for each conversation in the input data, and returns a DataFrame containing the conversation number and the calculated median.
- Parameters:
input_data (pandas.DataFrame) – The DataFrame containing data at the chat or user level.
column_to_summarize (str) – The name of the column to be aggregated for median.
new_column_name (str) – The desired name for the new summary column.
conversation_id_col (str) – A string representing the column name that should be selected as the conversation ID.
- Returns:
A DataFrame with the conversation number and the median of the specified column.
- Return type:
pandas.DataFrame
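The median is less sensitive to outliers than the mean, which can matter when one message in a conversation is unusually long. The sketch below compares the two aggregations with plain pandas. The sample data and column names are hypothetical, and the code does not call the library itself.

```python
import pandas as pd

# Hypothetical sample data: conversation "A" contains one unusually long message.
chats = pd.DataFrame({
    "conversation_num": ["A", "A", "A", "B", "B"],
    "num_words": [1, 2, 100, 4, 6],
})

# Named aggregation computes both statistics per conversation in one pass.
summary = (
    chats.groupby("conversation_num")["num_words"]
    .agg(median_num_words="median", mean_num_words="mean")
    .reset_index()
)
print(summary)
```

In conversation "A" the median stays at 2 while the mean is pulled above 34 by the single 100-word message.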
- utils.summarize_features.get_min(input_data, column_to_summarize, new_column_name, conversation_id_col)
Generate a summary DataFrame with the minimum value of a specified column per conversation.
This function calculates the minimum value of a specified column for each conversation in the input data, and returns a DataFrame containing the conversation number and the calculated minimum value.
- Parameters:
input_data (pandas.DataFrame) – The DataFrame containing data at the chat or user level.
column_to_summarize (str) – The name of the column to be aggregated for minimum value.
new_column_name (str) – The desired name for the new summary column.
conversation_id_col (str) – A string representing the column name that should be selected as the conversation ID.
- Returns:
A DataFrame with the conversation number and the minimum value of the specified column.
- Return type:
pandas.DataFrame
- utils.summarize_features.get_stdev(input_data, column_to_summarize, new_column_name, conversation_id_col)
Generate a summary DataFrame with the standard deviation of a specified column per conversation.
This function calculates the standard deviation of a specified column for each conversation in the input data, and returns a DataFrame containing the conversation number and the calculated standard deviation.
- Parameters:
input_data (pandas.DataFrame) – The DataFrame containing data at the chat or user level.
column_to_summarize (str) – The name of the column to be aggregated for standard deviation.
new_column_name (str) – The desired name for the new summary column.
conversation_id_col (str) – A string representing the column name that should be selected as the conversation ID.
- Returns:
A DataFrame with the conversation number and the standard deviation of the specified column.
- Return type:
pandas.DataFrame
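One detail worth checking when summarizing by standard deviation: pandas' `std` defaults to the sample standard deviation (`ddof=1`), so a conversation with a single row yields `NaN`. The sketch below shows this with plain pandas. Whether `get_stdev` keeps that default is not stated above, so treat it as an assumption. The column names are hypothetical.

```python
import pandas as pd

# Hypothetical data: conversation "B" has only one message.
chats = pd.DataFrame({
    "conversation_num": ["A", "A", "A", "B"],
    "num_words": [2, 4, 6, 5],
})

stdev_words = (
    chats.groupby("conversation_num", as_index=False)["num_words"]
    .std()  # pandas default is ddof=1, the sample standard deviation
    .rename(columns={"num_words": "stdev_num_words"})
)
print(stdev_words)
```

Downstream code that consumes a standard deviation summary may therefore need to handle missing values for very short conversations.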
- utils.summarize_features.get_sum(input_data, column_to_summarize, new_column_name, conversation_id_col)
Generate a summary DataFrame with the sum of a specified column per conversation.
This function calculates the sum of a specified column for each conversation in the input data, and returns a DataFrame containing the conversation number and the calculated sum.
- Parameters:
input_data (pandas.DataFrame) – The DataFrame containing data at the chat or user level.
column_to_summarize (str) – The name of the column to be aggregated for the sum.
new_column_name (str) – The desired name for the new summary column.
conversation_id_col (str) – A string representing the column name that should be selected as the conversation ID.
- Returns:
A DataFrame with the conversation number and the sum of the specified column.
- Return type:
pandas.DataFrame
- utils.summarize_features.get_user_max_dataframe(chat_level_data, on_column, conversation_id_col, speaker_id_col)
Generate a user-level summary DataFrame with the maximum of a specified column per individual.
This function groups chat-level data by user and conversation, calculates the maximum value of a specified numeric column for each user, and returns the resulting DataFrame.
- Parameters:
chat_level_data (pandas.DataFrame) – The DataFrame in which each row represents a single chat.
on_column (str) – The name of the numeric column whose maximum is taken for each user.
conversation_id_col (str) – A string representing the column name that should be selected as the conversation ID.
speaker_id_col (str) – The column name representing the user identifier.
- Returns:
A grouped DataFrame with the max of the specified column per individual.
- Return type:
pandas.DataFrame
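The user-level functions differ from the conversation-level ones in that they group on two keys, the conversation and the speaker. A minimal sketch with plain pandas follows. The helper name `sketch_user_max` and the sample column names are hypothetical, and the name the library gives the output column is not shown here.

```python
import pandas as pd

def sketch_user_max(chat_level_data, on_column, conversation_id_col, speaker_id_col):
    """Hypothetical stand-in: maximum of one column per user within each conversation."""
    return chat_level_data.groupby(
        [conversation_id_col, speaker_id_col], as_index=False
    )[on_column].max()

# Hypothetical chat-level data: speaker "x" takes part in both conversations.
chats = pd.DataFrame({
    "conversation_num": ["A", "A", "A", "B"],
    "speaker_nickname": ["x", "y", "x", "x"],
    "num_words": [3, 4, 7, 5],
})

user_max = sketch_user_max(chats, "num_words", "conversation_num", "speaker_nickname")
print(user_max)
```

Because the grouping includes the conversation, the same speaker appearing in two conversations produces two rows, one per conversation.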
- utils.summarize_features.get_user_mean_dataframe(chat_level_data, on_column, conversation_id_col, speaker_id_col)
Generate a user-level summary DataFrame by averaging a specified column per individual.
This function groups chat-level data by user and conversation, calculates the mean values of a specified numeric column for each user, and returns the resulting DataFrame.
- Parameters:
chat_level_data (pandas.DataFrame) – The DataFrame in which each row represents a single chat.
on_column (str) – The name of the numeric column to average for each user.
conversation_id_col (str) – A string representing the column name that should be selected as the conversation ID.
speaker_id_col (str) – The column name representing the user identifier.
- Returns:
A grouped DataFrame with the mean of the specified column per individual.
- Return type:
pandas.DataFrame
- utils.summarize_features.get_user_median_dataframe(chat_level_data, on_column, conversation_id_col, speaker_id_col)
Generate a user-level summary DataFrame with the median of a specified column per individual.
This function groups chat-level data by user and conversation, calculates the median values of a specified numeric column for each user, and returns the resulting DataFrame.
- Parameters:
chat_level_data (pandas.DataFrame) – The DataFrame in which each row represents a single chat.
on_column (str) – The name of the numeric column whose median is taken for each user.
conversation_id_col (str) – A string representing the column name that should be selected as the conversation ID.
speaker_id_col (str) – The column name representing the user identifier.
- Returns:
A grouped DataFrame with the median of the specified column per individual.
- Return type:
pandas.DataFrame
- utils.summarize_features.get_user_min_dataframe(chat_level_data, on_column, conversation_id_col, speaker_id_col)
Generate a user-level summary DataFrame with the minimum of a specified column per individual.
This function groups chat-level data by user and conversation, calculates the minimum value of a specified numeric column for each user, and returns the resulting DataFrame.
- Parameters:
chat_level_data (pandas.DataFrame) – The DataFrame in which each row represents a single chat.
on_column (str) – The name of the numeric column whose minimum is taken for each user.
conversation_id_col (str) – A string representing the column name that should be selected as the conversation ID.
speaker_id_col (str) – The column name representing the user identifier.
- Returns:
A grouped DataFrame with the min of the specified column per individual.
- Return type:
pandas.DataFrame
- utils.summarize_features.get_user_stdev_dataframe(chat_level_data, on_column, conversation_id_col, speaker_id_col)
Generate a user-level summary DataFrame with the standard deviation of a specified column per individual.
This function groups chat-level data by user and conversation, calculates the standard deviation of a specified numeric column for each user, and returns the resulting DataFrame.
- Parameters:
chat_level_data (pandas.DataFrame) – The DataFrame in which each row represents a single chat.
on_column (str) – The name of the numeric column whose standard deviation is computed for each user.
conversation_id_col (str) – A string representing the column name that should be selected as the conversation ID.
speaker_id_col (str) – The column name representing the user identifier.
- Returns:
A grouped DataFrame with the standard deviation of the specified column per individual.
- Return type:
pandas.DataFrame
- utils.summarize_features.get_user_sum_dataframe(chat_level_data, on_column, conversation_id_col, speaker_id_col)
Generate a user-level summary DataFrame by summing a specified column per individual.
This function groups chat-level data by user and conversation, sums the values of a specified numeric column for each user (speaker), and returns the resulting DataFrame.
- Parameters:
chat_level_data (pandas.DataFrame) – The DataFrame in which each row represents a single chat.
on_column (str) – The name of the numeric column to sum for each user.
conversation_id_col (str) – A string representing the column name that should be selected as the conversation ID.
speaker_id_col (str) – The column name representing the user identifier.
- Returns:
A grouped DataFrame with the total sum of the specified column per individual.
- Return type:
pandas.DataFrame
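Since the conversation-level functions accept data "at the chat or user level", a user-level summary can be fed into a conversation-level one. The sketch below chains the two steps with plain pandas: a per-user sum, then the mean of those sums per conversation. It does not call the library, and all column names are hypothetical.

```python
import pandas as pd

# Hypothetical chat-level data: one row per message.
chats = pd.DataFrame({
    "conversation_num": ["A", "A", "A", "B"],
    "speaker_nickname": ["x", "y", "x", "x"],
    "num_words": [3, 4, 7, 5],
})

# Step 1: user-level summary, total words per speaker within each conversation.
user_totals = chats.groupby(
    ["conversation_num", "speaker_nickname"], as_index=False
)["num_words"].sum()

# Step 2: conversation-level summary, mean of the per-speaker totals.
mean_user_total = (
    user_totals.groupby("conversation_num", as_index=False)["num_words"]
    .mean()
    .rename(columns={"num_words": "mean_user_sum_num_words"})
)
print(mean_user_total)
```

In conversation "A" the speakers contribute 10 and 4 words, so the mean per-speaker total is 7. Conversation "B" has a single speaker with 5 words.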