politeness_v2_helper module
- features.politeness_v2_helper.Question(doc)
Counts the number of sentences containing question words and question marks.
- Parameters:
doc (spacy.tokens.Doc) – The spaCy Doc object containing the text to be analyzed.
- Returns:
A tuple containing the counts of Yes/No questions and WH-questions.
- Return type:
tuple
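This is a minimal plain-Python sketch of the Yes/No versus WH split described above, not the module's code. It assumes the text is already split into sentence strings and counts a sentence as a question only if it ends with "?". The `WH_WORDS` set and the function name `question_counts` are hypothetical.

```python
# Hypothetical WH-word list; the module's own list may differ.
WH_WORDS = {"what", "who", "whom", "whose", "which", "when", "where", "why", "how"}


def question_counts(sentences: list[str]) -> tuple[int, int]:
    """Return (yes_no_count, wh_count) for sentences that end with '?'."""
    yes_no, wh = 0, 0
    for sentence in sentences:
        text = sentence.strip()
        if not text.endswith("?"):
            continue  # only sentences ending in a question mark count as questions
        # Strip common punctuation from each word before the WH lookup.
        words = [w.strip(",.;:!?").lower() for w in text.split()]
        if any(w in WH_WORDS for w in words):
            wh += 1
        else:
            yes_no += 1
    return yes_no, wh
```

For example, `question_counts(["Can you help me?", "What time is it?"])` returns `(1, 1)`.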
- features.politeness_v2_helper.adverb_limiter(keywords, doc)
Searches for adverb modifiers in the text that match a list of keywords.
- Parameters:
keywords (dict) – A dictionary where the key ‘Adverb_Limiter’ contains a list of adverb modifiers to search for.
doc (spacy.tokens.Doc) – The spaCy Doc object containing the text to be analyzed.
- Returns:
The count of adverb modifier matches in the text.
- Return type:
int
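The sketch below illustrates the matching logic without requiring spaCy. Tokens are modelled as `(text, dependency_label)` tuples that stand in for spaCy's `token.text` and `token.dep_`. It assumes an adverb modifier is a token labelled `advmod`. The function name `adverb_limiter_sketch` is hypothetical, and the labels in the example are hand-assigned, not parser output.

```python
# Each token is modelled as (text, dependency_label), mirroring spaCy's token.text and token.dep_.
Token = tuple[str, str]


def adverb_limiter_sketch(keywords: dict[str, list[str]], tokens: list[Token]) -> int:
    """Count adverbial modifiers (dep 'advmod') whose text is in keywords['Adverb_Limiter']."""
    limiters = {word.lower() for word in keywords["Adverb_Limiter"]}
    return sum(
        1 for text, dep in tokens
        if dep == "advmod" and text.lower() in limiters
    )
```

With `{"Adverb_Limiter": ["just", "only"]}`, the tokens of "I just wanted to ask only once" yield 2 matches: "just" and "only" are both `advmod` and listed, while "once" is `advmod` but not listed.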
- features.politeness_v2_helper.bare_command(doc)
Checks whether the first word of each sentence is a verb that is not in a list of keywords.
- Parameters:
doc (spacy.tokens.Doc) – The spaCy Doc object containing the text to be analyzed.
- Returns:
The count of sentences that start with a verb not in the keyword list.
- Return type:
int
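Here is a hedged sketch of the check. Each sentence is given as a list of `(text, pos)` tuples that stand in for spaCy's `token.text` and `token.pos_`. The exclusion set `EXCLUDED_STARTS` and the name `bare_command_sketch` are hypothetical. The set shows why a keyword list is needed: "Thank you" starts with a verb but is not a command.

```python
# Hypothetical exclusion list; the module uses its own keyword list.
EXCLUDED_STARTS = {"thank", "let", "excuse"}


def bare_command_sketch(sentences: list[list[tuple[str, str]]]) -> int:
    """Count sentences whose first token is tagged VERB and is not an excluded word."""
    count = 0
    for sentence in sentences:
        if not sentence:
            continue  # skip empty sentences
        text, pos = sentence[0]
        if pos == "VERB" and text.lower() not in EXCLUDED_STARTS:
            count += 1
    return count
```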
- features.politeness_v2_helper.clean_text(text)
Cleans and normalizes text by replacing certain patterns and characters.
- Parameters:
text (str) – The input text to be cleaned.
- Returns:
The cleaned and normalized text.
- Return type:
str
- features.politeness_v2_helper.commit_data(path, path_in, folders, words_in_line)
Loads data from .txt files, builds one dictionary per folder, and saves each folder's dictionary to a pickle file.
- Parameters:
path (str) – The base directory path containing the folders with text files.
path_in (str) – The directory path to save the pickle files.
folders (list) – A list of folder names containing the text files.
words_in_line (list) – A list specifying whether each folder contains ‘single’ or ‘multiple’ words per line.
- Returns:
None
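The following is one way the load-and-pickle step could look, offered as a sketch only. It assumes each `.txt` file becomes a dictionary key (its filename stem) mapped to its non-empty lines. It also assumes each folder is written to `<path_in>/<folder>.pkl`. Both conventions and the name `commit_data_sketch` are assumptions. The `words_in_line` option is omitted because its exact effect is not described here.

```python
import pickle
from pathlib import Path


def commit_data_sketch(path: str, path_in: str, folders: list[str]) -> None:
    """Build {file stem: [non-empty lines]} per folder and pickle it as <folder>.pkl."""
    out_dir = Path(path_in)
    out_dir.mkdir(parents=True, exist_ok=True)
    for folder in folders:
        entries = {}
        for txt_file in sorted((Path(path) / folder).glob("*.txt")):
            lines = txt_file.read_text(encoding="utf-8").splitlines()
            entries[txt_file.stem] = [line.strip() for line in lines if line.strip()]
        # Assumed naming scheme: one pickle per folder, named after the folder.
        with open(out_dir / f"{folder}.pkl", "wb") as fh:
            pickle.dump(entries, fh)
```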
- features.politeness_v2_helper.conjection_seperator(text)
Separates text into segments based on conjunctions.
- Parameters:
text (str) – The input text to be separated by conjunctions.
- Returns:
A list of text segments separated by conjunctions.
- Return type:
list
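This is a regex-based sketch of conjunction splitting. The `CONJUNCTIONS` tuple and the name `split_on_conjunctions` are hypothetical. This version drops the conjunctions themselves from the output, while the module may keep them or use a different list. Word boundaries (`\b`) prevent splits inside words such as "Sandy" or "ordered".

```python
import re

# Hypothetical conjunction list; the module's own list may differ.
CONJUNCTIONS = ("and", "but", "or", "so", "because", "although")
_CONJ_PATTERN = re.compile(r"\b(?:" + "|".join(CONJUNCTIONS) + r")\b", re.IGNORECASE)


def split_on_conjunctions(text: str) -> list[str]:
    """Split text at whole-word conjunctions and drop empty segments."""
    segments = _CONJ_PATTERN.split(text)
    return [seg.strip() for seg in segments if seg.strip()]
```

For example, `split_on_conjunctions("I tried but it failed and I gave up")` returns `["I tried", "it failed", "I gave up"]`.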
- features.politeness_v2_helper.count_matches(keywords, doc)
Counts the occurrences of prespecified keywords in a text.
- Parameters:
keywords (dict) – A dictionary where keys are feature names and values are lists of phrases to search for.
doc (spacy.tokens.Doc) – The spaCy Doc object containing the text to be analyzed.
- Returns:
A DataFrame with the counts of keyword matches for each feature.
- Return type:
pd.DataFrame
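The sketch below simplifies phrase counting. It works on a plain string instead of a spaCy Doc and returns a `dict`, so it runs without pandas. Matching is case-insensitive and restricted to whole words, which is an assumption about the module's behavior. The name `count_matches_sketch` is hypothetical. To get a DataFrame as documented, the result can be wrapped as a one-row frame with `pd.DataFrame([counts])`, though the real function's column layout may differ.

```python
import re


def count_matches_sketch(keywords: dict[str, list[str]], text: str) -> dict[str, int]:
    """Count whole-phrase, case-insensitive occurrences of each feature's phrases."""
    lowered = text.lower()
    counts = {}
    for feature, phrases in keywords.items():
        # re.escape keeps phrase characters literal; \b anchors at word boundaries.
        counts[feature] = sum(
            len(re.findall(r"\b" + re.escape(phrase.lower()) + r"\b", lowered))
            for phrase in phrases
        )
    return counts
```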
- features.politeness_v2_helper.count_spacy_matches(keywords, dep_pairs)
Counts occurrences of prespecified dependency pairs in a list of dependency pairs.
- Parameters:
keywords (dict) – A dictionary where keys are feature names and values are lists of dependency pairs to search for.
dep_pairs (list) – A list of dependency pairs extracted from the text.
- Returns:
A DataFrame with the counts of dependency pair matches for each feature.
- Return type:
pd.DataFrame
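This sketch assumes each dependency pair is a `(head, child)` tuple of lowercase strings, though the real pair format may differ. A `Counter` over the extracted pairs makes each lookup constant-time. The function returns a `dict` rather than a DataFrame to stay dependency-free. The name `count_dep_matches` is hypothetical.

```python
from collections import Counter


def count_dep_matches(keywords: dict[str, list[tuple[str, str]]],
                      dep_pairs: list[tuple[str, str]]) -> dict[str, int]:
    """For each feature, count how many extracted pairs appear in its pair list."""
    pair_counts = Counter(dep_pairs)
    # set(pairs) avoids double counting if a feature lists the same pair twice.
    return {
        feature: sum(pair_counts[pair] for pair in set(pairs))
        for feature, pairs in keywords.items()
    }
```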
- features.politeness_v2_helper.feat_counts(text, kw)
Extracts various linguistic features from a text using predefined keywords and dependency pairs.
- Parameters:
text (str) – The text to be analyzed.
kw (dict) – A dictionary containing predefined keywords and dependency pairs.
- Returns:
A DataFrame with counts of various linguistic features.
- Return type:
pd.DataFrame
- features.politeness_v2_helper.get_dep_pairs(doc)
Extracts dependency pairs from a spaCy Doc object and handles negations.
- Parameters:
doc (spacy.tokens.Doc) – The spaCy Doc object containing the text to be analyzed.
- Returns:
A tuple containing a list of dependency pairs and a list of negations.
- Return type:
tuple
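The sketch below shows one plausible reading of "handles negations" and does not require spaCy. Tokens are `(text, dep, head_index)` tuples mirroring spaCy's `token.text`, `token.dep_` and `token.head.i`, where the root token is its own head. Turning each `neg` arc into an entry in the negation list, rather than a pair, is an assumption. So is the name `dep_pairs_with_negation`.

```python
# Each token: (text, dep_label, head_index), mirroring spaCy's token.text, token.dep_, token.head.i.
# As in spaCy, the root token is its own head.
ParsedToken = tuple[str, str, int]


def dep_pairs_with_negation(tokens: list[ParsedToken]) -> tuple[list[tuple[str, str]], list[str]]:
    """Return (head, child) pairs and the heads that carry a 'neg' dependent."""
    pairs, negations = [], []
    for i, (text, dep, head_i) in enumerate(tokens):
        if head_i == i:
            continue  # the root has no head to pair with
        head_text = tokens[head_i][0].lower()
        if dep == "neg":
            negations.append(head_text)  # record the negated head instead of a pair
        else:
            pairs.append((head_text, text.lower()))
    return pairs, negations
```

For "I do not agree", hand-parsed as `[("I", "nsubj", 3), ("do", "aux", 3), ("not", "neg", 3), ("agree", "ROOT", 3)]`, the sketch returns `([("agree", "i"), ("agree", "do")], ["agree"])`.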
- features.politeness_v2_helper.get_dep_pairs_noneg(doc)
Extracts dependency pairs from a spaCy Doc object without handling negations.
- Parameters:
doc (spacy.tokens.Doc) – The spaCy Doc object containing the text to be analyzed.
- Returns:
A list of dependency pairs from the input text.
- Return type:
list
- features.politeness_v2_helper.load_saved_data(path_in, folders)
Loads predefined keywords and dependency pairs from pickle files.
- Parameters:
path_in (str) – The directory path containing the pickle files.
folders (list) – A list of folder names to load the pickle files from.
- Returns:
A dictionary where keys are folder names and values are dictionaries of keywords and dependency pairs.
- Return type:
dict
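This is a sketch of the loading side. It assumes one pickle per folder, named `<path_in>/<folder>.pkl`, which is a hypothetical naming scheme, as is the name `load_saved_data_sketch`. Unpickling can execute arbitrary code, so only load pickle files from a trusted source.

```python
import pickle
from pathlib import Path


def load_saved_data_sketch(path_in: str, folders: list[str]) -> dict[str, dict]:
    """Load <path_in>/<folder>.pkl for each folder into {folder: contents}."""
    data = {}
    for folder in folders:
        # Assumed naming scheme: one pickle per folder, named after the folder.
        with open(Path(path_in) / f"{folder}.pkl", "rb") as fh:
            data[folder] = pickle.load(fh)
    return data
```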
- features.politeness_v2_helper.load_to_dict(path, words)
Loads keywords from text files in a specified directory into a dictionary.
- Parameters:
path (str) – The directory path containing the text files.
words (str) – Specifies whether to load ‘single’ or ‘multiple’ words per line.
- Returns:
A dictionary where keys are filenames and values are lists of keywords.
- Return type:
dict
- features.politeness_v2_helper.load_to_lists(path, words)
Loads keywords from text files in a specified directory into lists.
- Parameters:
path (str) – The directory path containing the text files.
words (str) – Specifies whether to load ‘single’ or ‘multiple’ words per line.
- Returns:
A tuple containing a list of feature names and a list of keywords.
- Return type:
tuple
- features.politeness_v2_helper.phrase_split(text)
Splits text into phrases based on punctuation and conjunctions.
- Parameters:
text (str) – The input text to be split into phrases.
- Returns:
A list of phrases from the input text.
- Return type:
list
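The sketch below composes the two splitting steps named above: punctuation first, then conjunctions. The delimiter patterns `_PUNCT` and `_CONJ` and the name `phrase_split_sketch` are hypothetical, and the patterns may not match the module's lists.

```python
import re

# Hypothetical delimiters; the module's own punctuation and conjunction lists may differ.
_PUNCT = re.compile(r"[.,;:!?]+")
_CONJ = re.compile(r"\b(?:and|but|or|so|because)\b", re.IGNORECASE)


def phrase_split_sketch(text: str) -> list[str]:
    """Split on punctuation, then split each piece on conjunctions; drop empties."""
    phrases = []
    for piece in _PUNCT.split(text):
        for phrase in _CONJ.split(piece):
            if phrase.strip():
                phrases.append(phrase.strip())
    return phrases
```

For example, `phrase_split_sketch("Thanks, but I can't. Maybe tomorrow and Friday?")` returns `["Thanks", "I can't", "Maybe tomorrow", "Friday"]`.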
- features.politeness_v2_helper.prep_simple(text)
Preprocesses text by cleaning and removing certain characters.
- Parameters:
text (str) – The input text to be preprocessed.
- Returns:
The preprocessed text.
- Return type:
str
- features.politeness_v2_helper.prep_whole(text)
Preprocesses text by cleaning, removing certain characters, and filtering out stopwords.
- Parameters:
text (str) – The input text to be preprocessed.
- Returns:
The preprocessed text with stopwords removed.
- Return type:
str
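This is a minimal sketch of the stopword-filtering step. The cleaning regex, the small `STOPWORDS` set, and the name `prep_whole_sketch` are illustrative assumptions. The module's cleaning rules and stopword list may differ.

```python
import re

# Hypothetical stopword list; the module's own list may differ.
STOPWORDS = {"a", "an", "the", "is", "to", "of", "and", "it"}


def prep_whole_sketch(text: str) -> str:
    """Lowercase, replace characters other than letters, digits, apostrophes and spaces, drop stopwords."""
    cleaned = re.sub(r"[^a-z0-9' ]+", " ", text.lower())
    return " ".join(word for word in cleaned.split() if word not in STOPWORDS)
```

For example, `prep_whole_sketch("The report is late, sorry!")` returns `"report late sorry"`.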
- features.politeness_v2_helper.punctuation_seperator(text)
Separates text into segments based on punctuation.
- Parameters:
text (str) – The input text to be separated by punctuation.
- Returns:
A list of text segments with punctuation removed.
- Return type:
list
- features.politeness_v2_helper.sentence_pad(doc)
Pads the sentences of a spaCy Doc object by concatenating them into a single string, applying simple preprocessing.
- Parameters:
doc (spacy.tokens.Doc) – The spaCy Doc object containing the text to be padded.
- Returns:
A single string with all sentences concatenated and preprocessed.
- Return type:
str
- features.politeness_v2_helper.sentence_split(doc)
Splits a spaCy Doc object into a list of sentences, each with simple preprocessing.
- Parameters:
doc (spacy.tokens.Doc) – The spaCy Doc object containing the text to be split into sentences.
- Returns:
A list of preprocessed sentences from the input Doc object.
- Return type:
list
- features.politeness_v2_helper.sentenciser(text)
Splits text into sentences using spaCy.
- Parameters:
text (str) – The input text to be split into sentences.
- Returns:
A list of sentences from the input text.
- Return type:
list
- features.politeness_v2_helper.token_count(doc)
Counts the number of tokens (words) in a spaCy Doc object.
- Parameters:
doc (spacy.tokens.Doc) – The spaCy Doc object containing the text to be analyzed.
- Returns:
The number of tokens in the input text.
- Return type:
int
- features.politeness_v2_helper.word_start(keywords, doc)
Counts first words in the text that match each feature's list of keywords.
- Parameters:
keywords (dict) – A dictionary where keys are feature names and values are lists of first words to search for.
doc (spacy.tokens.Doc) – The spaCy Doc object containing the text to be analyzed.
- Returns:
A DataFrame with the counts of first word matches for each feature.
- Return type:
pd.DataFrame
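This sketch shows first-word matching on plain sentence strings. It assumes "first word" means the first whitespace-separated word of each sentence, lowercased and stripped of surrounding punctuation. It returns a `dict` rather than a DataFrame. The name `word_start_sketch` and the feature names in the example are hypothetical.

```python
def word_start_sketch(keywords: dict[str, list[str]], sentences: list[str]) -> dict[str, int]:
    """Count, per feature, the sentences whose first word is one of that feature's keywords."""
    first_words = []
    for sentence in sentences:
        words = sentence.split()
        if words:
            # Normalise the first word: drop surrounding punctuation and quotes, lowercase.
            first_words.append(words[0].strip(".,;:!?\"'").lower())
    counts = {}
    for feature, starts in keywords.items():
        start_set = {s.lower() for s in starts}
        counts[feature] = sum(1 for word in first_words if word in start_set)
    return counts
```

For example, with `{"Please_Start": ["please"], "Conjunction_Start": ["so", "and"]}` and the sentences "Please review this.", "So, what now?", "And then?" and "I agree.", the sketch returns `{"Please_Start": 1, "Conjunction_Start": 2}`.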