summarize_features module

utils.summarize_features.get_max(input_data, column_to_summarize, new_column_name, conversation_id_col)

Generate a summary DataFrame with the maximum value of a specified column per conversation.

This function calculates the maximum value of a specified column for each conversation in the input data, and returns a DataFrame containing the conversation number and the calculated maximum value.

Parameters:
  • input_data (pandas.DataFrame) – The DataFrame containing data at the chat or user level.

  • column_to_summarize (str) – The name of the column to be aggregated for maximum value.

  • new_column_name (str) – The desired name for the new summary column.

  • conversation_id_col (str) – A string representing the column name that should be selected as the conversation ID.

Returns:

A DataFrame with the conversation number and the maximum value of the specified column.

Return type:

pandas.DataFrame
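The behavior described above can be sketched with a plain pandas group-by. This is a minimal illustration, not the library's implementation: the function name `max_per_conversation` and the column names `conversation_num` and `num_words` are hypothetical.

```python
import pandas as pd

def max_per_conversation(input_data, column_to_summarize, new_column_name, conversation_id_col):
    # Hypothetical stand-in mirroring the documented signature of get_max:
    # one row per conversation, holding that conversation's maximum value.
    return (
        input_data.groupby(conversation_id_col)[column_to_summarize]
        .max()
        .reset_index(name=new_column_name)
    )

# Toy chat-level data: conversation A has two messages, B has one.
chats = pd.DataFrame({
    "conversation_num": ["A", "A", "B"],
    "num_words": [3, 7, 5],
})
max_words = max_per_conversation(chats, "num_words", "max_num_words", "conversation_num")
```

Under these assumptions, `max_words` has two columns, the conversation identifier and `max_num_words`, with one row per conversation.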

utils.summarize_features.get_mean(input_data, column_to_summarize, new_column_name, conversation_id_col)

Generate a summary DataFrame with the mean of a specified column per conversation.

This function calculates the mean of a specified column for each conversation in the input data, and returns a DataFrame containing the conversation number and the calculated mean.

Parameters:
  • input_data (pandas.DataFrame) – The DataFrame containing data at the chat or user level.

  • column_to_summarize (str) – The name of the column to be averaged.

  • new_column_name (str) – The desired name for the new summary column.

  • conversation_id_col (str) – A string representing the column name that should be selected as the conversation ID.

Returns:

A DataFrame with the conversation number and the mean of the specified column.

Return type:

pandas.DataFrame
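As a rough sketch of a per-conversation mean, the example below uses a pandas group-by on hypothetical columns (`conversation_num`, `num_words`). Note that pandas skips missing values when averaging; how the documented function treats missing values is not stated here, so that behavior is an assumption of the sketch.

```python
import numpy as np
import pandas as pd

# Toy data with one missing value in conversation A.
chats = pd.DataFrame({
    "conversation_num": ["A", "A", "A", "B"],
    "num_words": [2.0, 4.0, np.nan, 5.0],
})

# Group by conversation and average; pandas ignores NaN by default.
mean_words = (
    chats.groupby("conversation_num")["num_words"]
    .mean()
    .reset_index(name="mean_num_words")
)
```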

utils.summarize_features.get_median(input_data, column_to_summarize, new_column_name, conversation_id_col)

Generate a summary DataFrame with the median of a specified column per conversation.

This function calculates the median of a specified column for each conversation in the input data, and returns a DataFrame containing the conversation number and the calculated median.

Parameters:
  • input_data (pandas.DataFrame) – The DataFrame containing data at the chat or user level.

  • column_to_summarize (str) – The name of the column to be aggregated for median.

  • new_column_name (str) – The desired name for the new summary column.

  • conversation_id_col (str) – A string representing the column name that should be selected as the conversation ID.

Returns:

A DataFrame with the conversation number and the median of the specified column.

Return type:

pandas.DataFrame
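A per-conversation median can be sketched the same way. The column names below are hypothetical; the example shows that a conversation with an even number of rows takes the average of its two middle values.

```python
import pandas as pd

# Conversation A has four messages (even count); B has a single message.
chats = pd.DataFrame({
    "conversation_num": ["A", "A", "A", "A", "B"],
    "num_words": [1, 3, 10, 20, 4],
})

# For A, the median is the average of the middle values 3 and 10.
median_words = (
    chats.groupby("conversation_num")["num_words"]
    .median()
    .reset_index(name="median_num_words")
)
```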

utils.summarize_features.get_min(input_data, column_to_summarize, new_column_name, conversation_id_col)

Generate a summary DataFrame with the minimum value of a specified column per conversation.

This function calculates the minimum value of a specified column for each conversation in the input data, and returns a DataFrame containing the conversation number and the calculated minimum value.

Parameters:
  • input_data (pandas.DataFrame) – The DataFrame containing data at the chat or user level.

  • column_to_summarize (str) – The name of the column to be aggregated for minimum value.

  • new_column_name (str) – The desired name for the new summary column.

  • conversation_id_col (str) – A string representing the column name that should be selected as the conversation ID.

Returns:

A DataFrame with the conversation number and the minimum value of the specified column.

Return type:

pandas.DataFrame
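One possible way to obtain such a summary is to broadcast the group minimum onto every row and then keep one row per conversation. This is only a sketch of the technique with hypothetical column names (`conversation_num`, `num_chars`); it should give the same result as a direct group-by minimum.

```python
import pandas as pd

chats = pd.DataFrame({
    "conversation_num": ["A", "A", "B", "B"],
    "num_chars": [12, 30, 8, 25],
})

# Work on a copy of the ID column so the input frame is left unchanged.
summary = chats[["conversation_num"]].copy()
# transform("min") returns one value per input row: the minimum of that row's group.
summary["min_num_chars"] = chats.groupby("conversation_num")["num_chars"].transform("min")
# Collapse the repeated rows down to one row per conversation.
summary = summary.drop_duplicates().reset_index(drop=True)
```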

utils.summarize_features.get_stdev(input_data, column_to_summarize, new_column_name, conversation_id_col)

Generate a summary DataFrame with the standard deviation of a specified column per conversation.

This function calculates the standard deviation of a specified column for each conversation in the input data, and returns a DataFrame containing the conversation number and the calculated standard deviation.

Parameters:
  • input_data (pandas.DataFrame) – The DataFrame containing data at the chat or user level.

  • column_to_summarize (str) – The name of the column to be aggregated for standard deviation.

  • new_column_name (str) – The desired name for the new summary column.

  • conversation_id_col (str) – A string representing the column name that should be selected as the conversation ID.

Returns:

A DataFrame with the conversation number and the standard deviation of the specified column.

Return type:

pandas.DataFrame
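The description above does not say whether the sample or the population standard deviation is used. The sketch below, on hypothetical columns, shows both with pandas so the difference is visible: `ddof=1` (the pandas default) gives the sample estimate, `ddof=0` the population value.

```python
import pandas as pd

chats = pd.DataFrame({
    "conversation_num": ["A"] * 8 + ["B"] * 2,
    "num_words": [2, 4, 4, 4, 5, 5, 7, 9, 1, 3],
})
grouped = chats.groupby("conversation_num")["num_words"]

# Sample standard deviation (divides by n - 1); this is the pandas default.
sample_sd = grouped.std().reset_index(name="stdev_num_words")
# Population standard deviation (divides by n).
population_sd = grouped.std(ddof=0).reset_index(name="stdev_num_words")
```

When comparing results against the documented function, it is worth checking which convention applies, since the two differ noticeably for short conversations.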

utils.summarize_features.get_sum(input_data, column_to_summarize, new_column_name, conversation_id_col)

Generate a summary DataFrame with the sum of a specified column per conversation.

This function calculates the sum of a specified column for each conversation in the input data, and returns a DataFrame containing the conversation number and the calculated sum.

Parameters:
  • input_data (pandas.DataFrame) – The DataFrame containing data at the chat or user level.

  • column_to_summarize (str) – The name of the column to be aggregated for the sum.

  • new_column_name (str) – The desired name for the new summary column.

  • conversation_id_col (str) – A string representing the column name that should be selected as the conversation ID.

Returns:

A DataFrame with the conversation number and the sum of the specified column.

Return type:

pandas.DataFrame
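Because the column names arrive as strings, a sketch of this function can pass them straight into pandas named aggregation. The function name `sum_per_conversation` and the toy columns are hypothetical, and this is not necessarily how the library computes the sum.

```python
import pandas as pd

def sum_per_conversation(input_data, column_to_summarize, new_column_name, conversation_id_col):
    # Hypothetical stand-in mirroring the documented signature of get_sum.
    # Named aggregation: the keyword is the output column name, the tuple is
    # (input column, aggregation). Unpacking a dict allows a dynamic name.
    return (
        input_data.groupby(conversation_id_col, as_index=False)
        .agg(**{new_column_name: (column_to_summarize, "sum")})
    )

chats = pd.DataFrame({
    "conversation_num": ["A", "A", "B"],
    "num_words": [3, 7, 5],
})
sum_words = sum_per_conversation(chats, "num_words", "sum_num_words", "conversation_num")
```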

utils.summarize_features.get_user_max_dataframe(chat_level_data, on_column, conversation_id_col, speaker_id_col)

Generate a user-level summary DataFrame with the maximum of a specified column per individual.

This function groups chat-level data by user and conversation, calculates the max values of a specified numeric column for each user, and returns the resulting DataFrame.

Parameters:
  • chat_level_data (pandas.DataFrame) – The DataFrame in which each row represents a single chat.

  • on_column (str) – The name of the numeric column whose maximum is computed for each user.

  • conversation_id_col (str) – A string representing the column name that should be selected as the conversation ID.

  • speaker_id_col (str) – The column name representing the user identifier.

Returns:

A grouped DataFrame with the max of the specified column per individual.

Return type:

pandas.DataFrame
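Grouping "by user and conversation" can be sketched as a pandas group-by on two keys. The function name `user_max`, the output column name `max_<on_column>`, and the toy columns are all assumptions made for illustration; the actual layout of the returned DataFrame is not specified above.

```python
import pandas as pd

def user_max(chat_level_data, on_column, conversation_id_col, speaker_id_col):
    # Hypothetical stand-in: one row per (conversation, speaker) pair.
    return (
        chat_level_data.groupby([conversation_id_col, speaker_id_col])[on_column]
        .max()
        .reset_index(name="max_" + on_column)  # output name is an assumption
    )

chats = pd.DataFrame({
    "conversation_num": ["A", "A", "A", "B", "B"],
    "speaker_nickname": ["ann", "bob", "ann", "ann", "cy"],
    "num_words": [3, 7, 9, 2, 5],
})
user_max_words = user_max(chats, "num_words", "conversation_num", "speaker_nickname")
```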

utils.summarize_features.get_user_mean_dataframe(chat_level_data, on_column, conversation_id_col, speaker_id_col)

Generate a user-level summary DataFrame by averaging a specified column per individual.

This function groups chat-level data by user and conversation, calculates the mean values of a specified numeric column for each user, and returns the resulting DataFrame.

Parameters:
  • chat_level_data (pandas.DataFrame) – The DataFrame in which each row represents a single chat.

  • on_column (str) – The name of the numeric column to average for each user.

  • conversation_id_col (str) – A string representing the column name that should be selected as the conversation ID.

  • speaker_id_col (str) – The column name representing the user identifier.

Returns:

A grouped DataFrame with the mean of the specified column per individual.

Return type:

pandas.DataFrame
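One consequence of grouping by both user and conversation is that a speaker who appears in two conversations gets a separate average for each. The sketch below illustrates this with hypothetical column names; the output column name `mean_num_words` is an assumption.

```python
import pandas as pd

# "ann" speaks in both conversation A and conversation B.
chats = pd.DataFrame({
    "conversation_num": ["A", "A", "A", "B", "B"],
    "speaker_nickname": ["ann", "bob", "ann", "ann", "cy"],
    "num_words": [3, 7, 9, 2, 5],
})

user_means = (
    chats.groupby(["conversation_num", "speaker_nickname"], as_index=False)["num_words"]
    .mean()
    .rename(columns={"num_words": "mean_num_words"})
)

# ann has two rows: one average per conversation she took part in.
ann_rows = user_means[user_means["speaker_nickname"] == "ann"]
```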

utils.summarize_features.get_user_median_dataframe(chat_level_data, on_column, conversation_id_col, speaker_id_col)

Generate a user-level summary DataFrame with the median of a specified column per individual.

This function groups chat-level data by user and conversation, calculates the median values of a specified numeric column for each user, and returns the resulting DataFrame.

Parameters:
  • chat_level_data (pandas.DataFrame) – The DataFrame in which each row represents a single chat.

  • on_column (str) – The name of the numeric column whose median is computed for each user.

  • conversation_id_col (str) – A string representing the column name that should be selected as the conversation ID.

  • speaker_id_col (str) – The column name representing the user identifier.

Returns:

A grouped DataFrame with the median of the specified column per individual.

Return type:

pandas.DataFrame
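To show why a per-user median can be preferable to a mean, the sketch below computes both for hypothetical data in which one speaker sends a single very long message. Column names are assumptions, and this is an illustration rather than the library's code.

```python
import pandas as pd

# ann sends two short messages and one very long one.
chats = pd.DataFrame({
    "conversation_num": ["A", "A", "A", "A", "A"],
    "speaker_nickname": ["ann", "ann", "ann", "bob", "bob"],
    "num_words": [2, 3, 40, 5, 7],
})

# Compute median and mean side by side for each (conversation, speaker) pair.
user_stats = (
    chats.groupby(["conversation_num", "speaker_nickname"])["num_words"]
    .agg(["median", "mean"])
    .reset_index()
)
```

For ann, the median stays at a typical message length while the mean is pulled up by the outlier.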

utils.summarize_features.get_user_min_dataframe(chat_level_data, on_column, conversation_id_col, speaker_id_col)

Generate a user-level summary DataFrame with the minimum of a specified column per individual.

This function groups chat-level data by user and conversation, calculates the min values of a specified numeric column for each user, and returns the resulting DataFrame.

Parameters:
  • chat_level_data (pandas.DataFrame) – The DataFrame in which each row represents a single chat.

  • on_column (str) – The name of the numeric column whose minimum is computed for each user.

  • conversation_id_col (str) – A string representing the column name that should be selected as the conversation ID.

  • speaker_id_col (str) – The column name representing the user identifier.

Returns:

A grouped DataFrame with the min of the specified column per individual.

Return type:

pandas.DataFrame
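A minimal sketch of the per-user minimum, assuming hypothetical columns `conversation_num`, `speaker_nickname`, and `num_chars`, and an assumed output column name. A speaker with a single message simply gets that message's value.

```python
import pandas as pd

chats = pd.DataFrame({
    "conversation_num": ["A", "A", "A", "B"],
    "speaker_nickname": ["ann", "ann", "bob", "cy"],
    "num_chars": [30, 12, 18, 25],
})

# Shortest message per speaker within each conversation.
user_min_chars = (
    chats.groupby(["conversation_num", "speaker_nickname"])["num_chars"]
    .min()
    .reset_index(name="min_num_chars")  # output name is an assumption
)
```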

utils.summarize_features.get_user_stdev_dataframe(chat_level_data, on_column, conversation_id_col, speaker_id_col)

Generate a user-level summary DataFrame with the standard deviation of a specified column per individual.

This function groups chat-level data by user and conversation, calculates the standard deviation values of a specified numeric column for each user, and returns the resulting DataFrame.

Parameters:
  • chat_level_data (pandas.DataFrame) – The DataFrame in which each row represents a single chat.

  • on_column (str) – The name of the numeric column whose standard deviation is computed for each user.

  • conversation_id_col (str) – A string representing the column name that should be selected as the conversation ID.

  • speaker_id_col (str) – The column name representing the user identifier.

Returns:

A grouped DataFrame with the standard deviation of the specified column per individual.

Return type:

pandas.DataFrame
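At the user level, many speakers contribute only one message to a conversation, which makes the choice of standard deviation convention matter. The sketch below uses hypothetical columns and plain pandas; which convention the documented function follows is not stated above.

```python
import pandas as pd

# ann sends two messages, bob sends only one.
chats = pd.DataFrame({
    "conversation_num": ["A", "A", "A"],
    "speaker_nickname": ["ann", "ann", "bob"],
    "num_words": [2, 4, 5],
})
grouped = chats.groupby(["conversation_num", "speaker_nickname"])["num_words"]

# Sample convention (ddof=1): undefined (NaN) for a single observation.
sample_sd = grouped.std().reset_index(name="stdev_num_words")
# Population convention (ddof=0): zero for a single observation.
population_sd = grouped.std(ddof=0).reset_index(name="stdev_num_words")
```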

utils.summarize_features.get_user_sum_dataframe(chat_level_data, on_column, conversation_id_col, speaker_id_col)

Generate a user-level summary DataFrame by summing a specified column per individual.

This function groups chat-level data by user and conversation, sums the values of a specified numeric column for each user (speaker), and returns the resulting DataFrame.

Parameters:
  • chat_level_data (pandas.DataFrame) – The DataFrame in which each row represents a single chat.

  • on_column (str) – The name of the numeric column to sum for each user.

  • conversation_id_col (str) – A string representing the column name that should be selected as the conversation ID.

  • speaker_id_col (str) – The column name representing the user identifier.

Returns:

A grouped DataFrame with the total sum of the specified column per individual.

Return type:

pandas.DataFrame
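Since the conversation-level functions on this page accept data "at the chat or user level", a user-level total can feed a conversation-level summary. The sketch below shows that two-step pattern in plain pandas; all column names, including the output names, are hypothetical.

```python
import pandas as pd

chats = pd.DataFrame({
    "conversation_num": ["A", "A", "A", "B", "B"],
    "speaker_nickname": ["ann", "bob", "ann", "ann", "cy"],
    "num_words": [3, 7, 9, 2, 5],
})

# Step 1: total words per speaker within each conversation (user level).
user_sums = (
    chats.groupby(["conversation_num", "speaker_nickname"])["num_words"]
    .sum()
    .reset_index(name="sum_num_words")
)

# Step 2: average of those per-speaker totals within each conversation.
mean_user_sum = (
    user_sums.groupby("conversation_num")["sum_num_words"]
    .mean()
    .reset_index(name="mean_user_sum_num_words")
)
```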