politeness_v2_helper module

features.politeness_v2_helper.Question(doc)

Counts the number of sentences containing question words and question marks.

features.politeness_v2_helper.adverb_limiter(keywords, doc)

Searches for adverb modifiers in the text that match a list of keywords.

Parameters:
  • keywords (dict) – A dictionary where the key ‘Adverb_Limiter’ contains a list of adverb modifiers to search for.

  • doc (spacy.tokens.Doc) – The spaCy Doc object containing the text to be analyzed.

Returns:

The count of adverb modifier matches in the text.

Return type:

int

features.politeness_v2_helper.bare_command(doc)

Checks if the first word of each sentence is a verb and not in a list of keywords.

Parameters:

doc (spacy.tokens.Doc) – The spaCy Doc object containing the text to be analyzed.

Returns:

The count of sentences that start with a verb not in the keyword list.

Return type:

int

features.politeness_v2_helper.clean_text(text)

Cleans and normalizes text by replacing certain patterns and characters.

Parameters:

text (str) – The input text to be cleaned.

Returns:

The cleaned and normalized text.

Return type:

str
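
The kind of normalization described above can be sketched as follows. This is a minimal illustration, not the module's implementation: the name clean_text_sketch is hypothetical, and the specific rules (straightening curly quotes, collapsing whitespace, lowercasing) are assumptions, since the patterns the real function replaces are not listed here.

```python
import re


def clean_text_sketch(text: str) -> str:
    """Hypothetical stand-in for clean_text; the real replacement rules may differ."""
    # Assumed rule: map curly quotes to their straight ASCII equivalents.
    text = text.replace("\u2018", "'").replace("\u2019", "'")
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    # Assumed rule: collapse any run of whitespace into a single space.
    text = re.sub(r"\s+", " ", text)
    # Assumed rule: trim the ends and lowercase the result.
    return text.strip().lower()
```

Normalizing before keyword matching means that a keyword list only needs one spelling of each phrase.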

features.politeness_v2_helper.commit_data(path, path_in, folders, words_in_line)

Loads data from .txt files, creates one dictionary per folder, and writes each dictionary to its own pickle file.

Parameters:
  • path (str) – The base directory path containing the folders with text files.

  • path_in (str) – The directory path to save the pickle files.

  • folders (list) – A list of folder names containing the text files.

  • words_in_line (list) – A list specifying whether each folder contains ‘single’ or ‘multiple’ words per line.

Returns:

None
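
A rough sketch of this folder-to-pickle workflow is shown below. The name commit_data_sketch, the "<folder>.pkl" output naming, and the interpretation of 'single' versus 'multiple' (first word of each line versus the whole line) are all assumptions made for illustration; the real function may differ on each point.

```python
import os
import pickle


def commit_data_sketch(path, path_in, folders, words_in_line):
    """Hypothetical sketch: build one dict per folder and pickle each one."""
    for folder, mode in zip(folders, words_in_line):
        folder_dict = {}
        folder_path = os.path.join(path, folder)
        for filename in sorted(os.listdir(folder_path)):
            if not filename.endswith(".txt"):
                continue
            with open(os.path.join(folder_path, filename), encoding="utf-8") as fh:
                lines = [line.strip() for line in fh if line.strip()]
            if mode == "single":
                # Assumption: 'single' keeps only the first word of each line.
                entries = [line.split()[0] for line in lines]
            else:
                # Assumption: 'multiple' keeps the whole line as one phrase.
                entries = lines
            # Key each list by the file name without its extension.
            folder_dict[os.path.splitext(filename)[0]] = entries
        # Assumption: the output file is named "<folder>.pkl".
        with open(os.path.join(path_in, folder + ".pkl"), "wb") as out:
            pickle.dump(folder_dict, out)
```

Like the documented function, the sketch returns None and communicates only through the files it writes.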

features.politeness_v2_helper.conjection_seperator(text)

Separates text into segments based on conjunctions.

Parameters:

text (str) – The input text to be separated by conjunctions.

Returns:

A list of text segments separated by conjunctions.

Return type:

list
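
One way to split on conjunctions is a word-boundary regular expression, sketched below. The function name split_on_conjunctions and the CONJUNCTIONS list are hypothetical; the conjunctions the module actually uses are not documented here.

```python
import re

# Assumed conjunction list, for illustration only.
CONJUNCTIONS = ("and", "but", "or", "so", "because")


def split_on_conjunctions(text):
    """Hypothetical sketch: split text wherever a whole-word conjunction occurs."""
    # \b keeps "so" from matching inside words such as "also".
    pattern = r"\b(?:" + "|".join(CONJUNCTIONS) + r")\b"
    parts = re.split(pattern, text, flags=re.IGNORECASE)
    # Drop empty segments and surrounding whitespace.
    return [part.strip() for part in parts if part.strip()]
```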

features.politeness_v2_helper.count_matches(keywords, doc)

Counts the occurrences of prespecified keywords in a text.

Parameters:
  • keywords (dict) – A dictionary where keys are feature names and values are lists of phrases to search for.

  • doc (spacy.tokens.Doc) – The spaCy Doc object containing the text to be analyzed.

Returns:

A DataFrame with the counts of keyword matches for each feature.

Return type:

pd.DataFrame
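
The counting step can be illustrated with a simplified sketch. To stay dependency-free it takes a plain string instead of a spaCy Doc and returns a dict instead of a DataFrame; the name count_matches_sketch and the whole-word, case-insensitive matching are assumptions rather than the module's documented behavior.

```python
import re


def count_matches_sketch(keywords, text):
    """Hypothetical sketch: count whole-phrase keyword hits per feature."""
    lowered = text.lower()
    counts = {}
    for feature, phrases in keywords.items():
        total = 0
        for phrase in phrases:
            # Escape the phrase and require word boundaries on both sides.
            pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
            total += len(re.findall(pattern, lowered))
        counts[feature] = total
    return counts
```

For example, with the hypothetical features {"Gratitude": ["thank you", "thanks"], "Apology": ["sorry"]} and the text "Thanks a lot, and thank you again.", the sketch reports two Gratitude matches and no Apology matches.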

features.politeness_v2_helper.count_spacy_matches(keywords, dep_pairs)

Counts occurrences of prespecified dependency pairs in a list of dependency pairs.

Parameters:
  • keywords (dict) – A dictionary where keys are feature names and values are lists of dependency pairs to search for.

  • dep_pairs (list) – A list of dependency pairs extracted from the text.

Returns:

A DataFrame with the counts of dependency pair matches for each feature.

Return type:

pd.DataFrame
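
A simplified view of the pair-counting logic is sketched below. It assumes dependency pairs are represented as hashable (head, child) tuples and returns a plain dict rather than a DataFrame; both choices, and the name count_pair_matches_sketch, are assumptions for illustration.

```python
from collections import Counter


def count_pair_matches_sketch(keywords, dep_pairs):
    """Hypothetical sketch: count how often each feature's pairs were observed."""
    # Tally every observed pair once, then look up each feature's pairs.
    observed = Counter(dep_pairs)
    return {
        feature: sum(observed[pair] for pair in pairs)
        for feature, pairs in keywords.items()
    }
```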

features.politeness_v2_helper.feat_counts(text, kw)

Extracts various linguistic features from a text using predefined keywords and dependency pairs.

Parameters:
  • text (str) – The text to be analyzed.

  • kw (dict) – A dictionary containing predefined keywords and dependency pairs.

Returns:

A DataFrame with counts of various linguistic features.

Return type:

pd.DataFrame

features.politeness_v2_helper.get_dep_pairs(doc)

Extracts dependency pairs from a spaCy Doc object and handles negations.

Parameters:

doc (spacy.tokens.Doc) – The spaCy Doc object containing the text to be analyzed.

Returns:

A tuple containing a list of dependency pairs and a list of negations.

Return type:

tuple

features.politeness_v2_helper.get_dep_pairs_noneg(doc)

Extracts dependency pairs from a spaCy Doc object without handling negations.

Parameters:

doc (spacy.tokens.Doc) – The spaCy Doc object containing the text to be analyzed.

Returns:

A list of dependency pairs from the input text.

Return type:

list

features.politeness_v2_helper.is_in_subordinate_clause(tok, sent)

Checks whether a token is inside a subordinate clause rather than the main clause.

features.politeness_v2_helper.load_saved_data(path_in, folders)

Loads predefined keywords and dependency pairs from pickle files.

Parameters:
  • path_in (str) – The directory path containing the pickle files.

  • folders (list) – A list of folder names to load the pickle files from.

Returns:

A dictionary where keys are folder names and values are dictionaries of keywords and dependency pairs.

Return type:

dict
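
Reading the pickles back could look roughly like the sketch below. The name load_saved_data_sketch and the "<folder>.pkl" file naming are assumptions; only the shape of the return value (folder name mapped to a dictionary) comes from the description above.

```python
import os
import pickle


def load_saved_data_sketch(path_in, folders):
    """Hypothetical sketch: load one pickled dict per folder name."""
    data = {}
    for folder in folders:
        # Assumption: each folder's dict was saved as "<folder>.pkl".
        with open(os.path.join(path_in, folder + ".pkl"), "rb") as fh:
            data[folder] = pickle.load(fh)
    return data
```

Note that pickle.load can execute arbitrary code, so files like these should only be loaded from a trusted location.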

features.politeness_v2_helper.load_to_dict(path, words)

Loads keywords from text files in a specified directory into a dictionary.

Parameters:
  • path (str) – The directory path containing the text files.

  • words (str) – Specifies whether to load ‘single’ or ‘multiple’ words per line.

Returns:

A dictionary where keys are filenames and values are lists of keywords.

Return type:

dict

features.politeness_v2_helper.load_to_lists(path, words)

Loads keywords from text files in a specified directory into lists.

Parameters:
  • path (str) – The directory path containing the text files.

  • words (str) – Specifies whether to load ‘single’ or ‘multiple’ words per line.

Returns:

A tuple containing a list of feature names and a list of keywords.

Return type:

tuple

features.politeness_v2_helper.phrase_split(text)

Splits text into phrases based on punctuation and conjunctions.

Parameters:

text (str) – The input text to be split into phrases.

Returns:

A list of phrases from the input text.

Return type:

list

features.politeness_v2_helper.prep_simple(text)

Preprocesses text by cleaning and removing certain characters.

Parameters:

text (str) – The input text to be preprocessed.

Returns:

The preprocessed text.

Return type:

str

features.politeness_v2_helper.prep_whole(text)

Preprocesses text by cleaning, removing certain characters, and filtering out stopwords.

Parameters:

text (str) – The input text to be preprocessed.

Returns:

The preprocessed text with stopwords removed.

Return type:

str
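
Stopword filtering of this kind can be sketched as follows. The tiny STOPWORDS set, the character filter, and the name prep_whole_sketch are all assumptions for illustration; the module's actual stopword list and cleaning rules are not documented here.

```python
import re

# Assumed, deliberately tiny stopword list.
STOPWORDS = {"the", "a", "an", "is", "to", "of"}


def prep_whole_sketch(text):
    """Hypothetical sketch: lowercase, strip punctuation, drop stopwords."""
    # Assumed rule: replace anything other than letters, digits,
    # apostrophes, and spaces with a space.
    text = re.sub(r"[^a-z0-9' ]+", " ", text.lower())
    return " ".join(word for word in text.split() if word not in STOPWORDS)
```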

features.politeness_v2_helper.punctuation_seperator(text)

Separates text into segments based on punctuation.

Parameters:

text (str) – The input text to be separated by punctuation.

Returns:

A list of text segments with punctuation removed.

Return type:

list
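
A punctuation-based split can be sketched with a single regular expression. The punctuation set and the name split_on_punctuation are assumptions; the characters the module treats as separators are not listed here.

```python
import re


def split_on_punctuation(text):
    """Hypothetical sketch: split on runs of punctuation and drop the marks."""
    # Assumed separator set: period, comma, semicolon, colon, !, and ?.
    parts = re.split(r"[.,;:!?]+", text)
    return [part.strip() for part in parts if part.strip()]
```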

features.politeness_v2_helper.sentence_pad(doc)

Pads the sentences of a spaCy Doc object by concatenating them with simple preprocessing.

Parameters:

doc (spacy.tokens.Doc) – The spaCy Doc object containing the text to be padded.

Returns:

A single string with all sentences concatenated and preprocessed.

Return type:

str

features.politeness_v2_helper.sentence_split(doc)

Splits a spaCy Doc object into a list of sentences, each with simple preprocessing.

Parameters:

doc (spacy.tokens.Doc) – The spaCy Doc object containing the text to be split into sentences.

Returns:

A list of preprocessed sentences from the input Doc object.

Return type:

list

features.politeness_v2_helper.sentenciser(text)

Splits text into sentences using spaCy.

Parameters:

text (str) – The input text to be split into sentences.

Returns:

A list of sentences from the input text.

Return type:

list

features.politeness_v2_helper.token_count(doc)

Counts the number of tokens (words) in a spaCy Doc object.

Parameters:

doc (spacy.tokens.Doc) – The spaCy Doc object containing the text to be analyzed.

Returns:

The number of tokens in the input text.

Return type:

int

features.politeness_v2_helper.wh_is_real_question(tok, sent, auxiliaries, ends_with_question_mark=False)

Returns True if the WH-word token is part of a real main-clause question.

features.politeness_v2_helper.word_start(keywords, doc)

Finds the first words in text that match a list of keywords.

Parameters:
  • keywords (dict) – A dictionary where keys are feature names and values are lists of first words to search for.

  • doc (spacy.tokens.Doc) – The spaCy Doc object containing the text to be analyzed.

Returns:

A DataFrame with the counts of first word matches for each feature.

Return type:

pd.DataFrame
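
The first-word matching can be illustrated with the simplified sketch below. It takes a list of sentence strings instead of a spaCy Doc and returns a dict instead of a DataFrame; those simplifications, the name word_start_sketch, and the feature names in the example are assumptions.

```python
def word_start_sketch(keywords, sentences):
    """Hypothetical sketch: count sentences whose first word is a keyword."""
    counts = {feature: 0 for feature in keywords}
    for sentence in sentences:
        words = sentence.lower().split()
        if not words:
            continue
        # Ignore punctuation attached to the first word, e.g. "So,".
        first = words[0].strip(".,!?;:")
        for feature, starters in keywords.items():
            if first in starters:
                counts[feature] += 1
    return counts
```

With the hypothetical features {"Conjunction_Start": ["so", "but", "and"], "Affirmation": ["yes", "ok"]} and the sentences "So, what now?", "Yes that works.", "But why?", and "It is fine.", the sketch counts two conjunction starts and one affirmation.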