get_all_DD_features module

features.get_all_DD_features.conv_to_float_arr(df)

Converts the message embeddings in a pd.DataFrame from string format to float arrays.

Parameters:

df (pd.DataFrame) – pd.DataFrame containing a ‘message_embedding’ column with string-encoded embeddings.

Returns:

pd.DataFrame with a ‘message_embedding’ column containing float arrays.

Return type:

pd.DataFrame
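
As an illustration of the kind of conversion described above, here is a minimal sketch. It is not this module's implementation: the helper name parse_embedding_column is hypothetical, and it assumes each embedding is stored as a Python-style list literal such as "[0.1, 0.2, 0.3]", which the documentation does not confirm.

```python
import ast

import numpy as np
import pandas as pd


def parse_embedding_column(df: pd.DataFrame, col: str = "message_embedding") -> pd.DataFrame:
    """Hypothetical helper: parse string-encoded embeddings into float arrays.

    Assumes each cell holds a list literal such as "[0.1, 0.2, 0.3]".
    """
    out = df.copy()  # leave the caller's frame untouched
    # literal_eval safely parses the list literal; np.array casts it to floats.
    out[col] = out[col].apply(lambda s: np.array(ast.literal_eval(s), dtype=float))
    return out


# Toy input with two string-encoded embeddings (made-up values).
raw = pd.DataFrame({"message_embedding": ["[0.1, 0.2, 0.3]", "[1.0, -2.0, 0.5]"]})
parsed = parse_embedding_column(raw)
first = parsed["message_embedding"].iloc[0]
```

If the stored strings use another encoding (for example space-separated numbers), the parsing step would need to change accordingly.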

features.get_all_DD_features.get_DD_features(chat_data, vect_data, conversation_id_col, speaker_id_col, timestamp_col)

This is an “umbrella” feature called at the conversation level. It returns four discursive metrics: discursive diversity, variance in discursive diversity, incongruent modulation, and within-person discursive range.

Parameters:
  • chat_data (pd.DataFrame) – pd.DataFrame containing conversation-level chat data.

  • vect_data (pd.DataFrame) – pd.DataFrame containing vectorized data.

  • conversation_id_col (str) – Column name for conversation identifiers.

  • speaker_id_col (str) – Column name for speaker identifiers.

  • timestamp_col (str) – Column name for message timestamps.

Returns:

pd.DataFrame containing merged discursive metrics for each conversation.

Return type:

pd.DataFrame
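
To give a feel for what a conversation-level discursive metric can look like, the sketch below computes one plausible reading of discursive diversity: the mean pairwise cosine distance between each speaker's average embedding. This definition is an assumption, not something stated in this documentation. The helper name mean_pairwise_cosine_distance, the toy column names, and the output column name are all hypothetical, and the code is not the module's implementation.

```python
import itertools

import numpy as np
import pandas as pd


def mean_pairwise_cosine_distance(
    chat: pd.DataFrame,
    conversation_id_col: str,
    speaker_id_col: str,
    emb_col: str = "message_embedding",
) -> pd.DataFrame:
    """Hypothetical sketch: one diversity score per conversation.

    Assumes emb_col already holds float arrays with non-zero norm.
    """
    rows = []
    for conv_id, conv in chat.groupby(conversation_id_col):
        # One centroid (mean embedding) per speaker in this conversation.
        centroids = [
            np.mean(np.stack(g[emb_col].to_list()), axis=0)
            for _, g in conv.groupby(speaker_id_col)
        ]
        # Cosine distance for every unordered pair of speaker centroids.
        dists = [
            1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
            for a, b in itertools.combinations(centroids, 2)
        ]
        score = float(np.mean(dists)) if dists else float("nan")
        rows.append({conversation_id_col: conv_id, "discursive_diversity": score})
    return pd.DataFrame(rows)


# Toy data: in c1 the two speakers are orthogonal, in c2 they are parallel.
toy = pd.DataFrame({
    "conversation_num": ["c1", "c1", "c1", "c2", "c2"],
    "speaker_nickname": ["a", "b", "a", "x", "y"],
    "message_embedding": [
        np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 0.0]),
        np.array([1.0, 1.0]), np.array([2.0, 2.0]),
    ],
})
result = mean_pairwise_cosine_distance(toy, "conversation_num", "speaker_nickname")
```

In this toy data, the conversation whose speakers use orthogonal embeddings scores close to 1, and the one whose speakers point in the same direction scores close to 0.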