get_all_DD_features module
- features.get_all_DD_features.conv_to_float_arr(df)
Converts the message embeddings in a pd.DataFrame from string format to float arrays.
- Parameters:
df (pd.DataFrame) – pd.DataFrame containing ‘message_embedding’ column with string-encoded embeddings.
- Returns:
pd.DataFrame with ‘message_embedding’ column containing float arrays.
- Return type:
pd.DataFrame
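The conversion described above can be sketched as follows. This is a minimal illustration and not the module's actual implementation: it assumes each embedding is stored as a bracketed, comma-separated string such as "[0.1, 0.2, 0.3]", and the helper name parse_embedding_column is hypothetical.

```python
import ast

import numpy as np
import pandas as pd


def parse_embedding_column(df: pd.DataFrame, col: str = "message_embedding") -> pd.DataFrame:
    """Hypothetical helper: turn string-encoded embeddings into float arrays.

    Assumes each cell looks like "[0.1, 0.2, 0.3]"; the real storage format
    used by conv_to_float_arr may differ.
    """
    out = df.copy()  # leave the caller's DataFrame untouched
    out[col] = out[col].apply(
        # literal_eval parses the list literal without executing arbitrary code
        lambda s: np.asarray(ast.literal_eval(s), dtype=float)
    )
    return out


# Toy input with two string-encoded embeddings
df = pd.DataFrame({"message_embedding": ["[0.1, 0.2, 0.3]", "[1.0, 0.0, -1.0]"]})
parsed = parse_embedding_column(df)
```

After the call, each cell of parsed["message_embedding"] holds a NumPy float array, while the original df still holds strings.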
- features.get_all_DD_features.get_DD_features(chat_data, vect_data, conversation_id_col, speaker_id_col, timestamp_col)
This is an “umbrella” feature called at the conversation level. It returns four discursive metrics: discursive diversity, variance in discursive diversity, incongruent modulation, and within-person discursive range.
- Parameters:
chat_data (pd.DataFrame) – pd.DataFrame containing conversation-level chat data.
vect_data (pd.DataFrame) – pd.DataFrame containing vectorized data.
conversation_id_col (str) – Column name for conversation identifiers.
speaker_id_col (str) – Column name for speaker identifiers.
timestamp_col (str) – Column name for message timestamps.
- Returns:
pd.DataFrame containing merged discursive metrics for each conversation.
- Return type:
pd.DataFrame
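A call to get_DD_features might be set up along the following lines. This is a sketch under stated assumptions: the column names (conversation_num, speaker_nickname, timestamp, message) and the toy data are hypothetical, and it assumes vect_data has one row per chat message with a string-encoded ‘message_embedding’ column. Only the import path and argument names come from the signature documented above.

```python
import pandas as pd

# Hypothetical chat data: one row per message (column names are assumptions)
chat_data = pd.DataFrame(
    {
        "conversation_num": ["c1", "c1", "c1", "c1"],
        "speaker_nickname": ["alice", "bob", "alice", "bob"],
        "timestamp": pd.to_datetime(
            ["2024-01-01 10:00", "2024-01-01 10:01",
             "2024-01-01 10:02", "2024-01-01 10:03"]
        ),
        "message": ["hi", "hello", "shall we start?", "yes"],
    }
)

# Assumed layout: one string-encoded embedding per row, aligned with chat_data
vect_data = pd.DataFrame(
    {
        "message_embedding": [
            "[0.1, 0.2]", "[0.3, 0.1]", "[0.0, 0.5]", "[0.4, 0.4]",
        ]
    }
)

# Map the documented keyword arguments to the hypothetical column names
column_args = {
    "conversation_id_col": "conversation_num",
    "speaker_id_col": "speaker_nickname",
    "timestamp_col": "timestamp",
}


def run_dd_features(chat_df: pd.DataFrame, vect_df: pd.DataFrame) -> pd.DataFrame:
    """Call get_DD_features with the column mapping above.

    The import is done lazily so this sketch can be loaded even where the
    package is not installed.
    """
    from features.get_all_DD_features import get_DD_features

    return get_DD_features(chat_df, vect_df, **column_args)
```

Per the description above, run_dd_features(chat_data, vect_data) would return a pd.DataFrame with the merged discursive metrics for each conversation.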