Named Entity Recognition

High-Level Intuition

This feature detects whether a user is talking about (or to) someone else in a conversation.

Citation

N/A

Implementation Basics

In conversations, named entities often matter: an angry statement means something different when it is directed at no one in particular than when it is directed at a specific person (for example, someone who is being blamed for something). This feature uses spaCy's named entity recognizer (https://spacy.io/api/entityrecognizer) to identify whether someone is talking about (or to) someone else in a conversation.

Implementation Notes/Caveats

Users should pass training data and a confidence threshold to the FeatureBuilder constructor. The parameter names are as follows:

‘ner_training_df’: This parameter expects a pandas DataFrame that contains the training data for named entity recognition.

Named Entity Training Examples
sentence_to_train	name_to_train
Helena’s idea sounds great!	Helena
I agree with Emily, what does everyone else think?	Emily
I think we can also work with Shruti’s idea.	Shruti
Maybe we should also ask Amy about this	Amy

The feature will not run without training data. As a rule of thumb, provide about one quarter as many training examples as the number of named entities you expect to see. For example, for a dataset with 100 named entities, the training data should contain 25 examples.
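As an illustration, the training examples in the table above could be assembled into a pandas DataFrame as follows. This is a minimal sketch: the column names come from the example table, while recommended_training_size is a hypothetical helper that only encodes the one-quarter rule of thumb (rounding up is an assumption).

```python
import math

import pandas as pd

# Training examples taken from the table above: each row pairs a sentence
# with the name that should be recognized in it.
ner_training_df = pd.DataFrame(
    {
        "sentence_to_train": [
            "Helena's idea sounds great!",
            "I agree with Emily, what does everyone else think?",
            "I think we can also work with Shruti's idea.",
            "Maybe we should also ask Amy about this",
        ],
        "name_to_train": ["Helena", "Emily", "Shruti", "Amy"],
    }
)


def recommended_training_size(expected_entities: int) -> int:
    """Hypothetical helper: one quarter of the expected entities, rounded up."""
    return math.ceil(expected_entities / 4)


print(recommended_training_size(100))  # 25
```

A DataFrame of this shape is what the ‘ner_training_df’ parameter expects.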

‘ner_cutoff’: This numeric parameter specifies the confidence score threshold applied to each prediction.

Each predicted named entity is associated with a confidence score, which estimates the probability that the prediction is correct. Users can pass in a cutoff value for the confidence scores. If this value is not provided, the default value is 0.9.
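The effect of a cutoff can be sketched with plain Python. The function name apply_cutoff and the sample scores below are hypothetical, and the inclusive comparison (>=) is an assumption; the toolkit's own comparison may differ.

```python
from typing import List, Tuple

DEFAULT_NER_CUTOFF = 0.9  # default value stated in this documentation


def apply_cutoff(
    predictions: List[Tuple[str, float]], cutoff: float = DEFAULT_NER_CUTOFF
) -> List[Tuple[str, float]]:
    """Keep only predictions whose confidence reaches the cutoff.

    Assumption: the comparison is inclusive (>=).
    """
    return [(name, score) for name, score in predictions if score >= cutoff]


# Hypothetical (name, confidence) pairs, for illustration only.
predictions = [("Priya", 0.99954), ("Amy", 0.9123), ("May", 0.41)]
kept = apply_cutoff(predictions)
print(kept)  # [('Priya', 0.99954), ('Amy', 0.9123)]
```

Raising the cutoff trades recall for precision: with a cutoff of 0.95, only the first prediction in this sample would be kept.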

The model was tested on a dataset of 100 sentences containing 50 unique names, with the following evaluation metrics:

Precision: 0.9855072464
Recall: 0.68
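For readers less familiar with these metrics: precision is the share of predicted entities that are correct, and recall is the share of true entities that were found. The sketch below uses counts that are an inference rather than figures reported here: 68 true positives, 1 false positive, and 32 false negatives are one set of counts that reproduces the values above.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of predicted entities that are correct."""
    return true_positives / (true_positives + false_positives)


def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of true entities that were detected."""
    return true_positives / (true_positives + false_negatives)


# Assumed counts, chosen only because they match the reported metrics.
tp, fp, fn = 68, 1, 32

print(round(precision(tp, fp), 10))  # 0.9855072464
print(recall(tp, fn))  # 0.68
```

Read this way, the model rarely labels a non-name as a name, but it misses roughly a third of the names that are present.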

Interpreting the Feature

This feature outputs the number of named entities in a message, the named entities themselves, and their confidence scores. The following is an example of the output format:

Output File
message	expected_value	num_named_entity	named_entities
Helena’s idea sounds great!	Helena	1	((Helena, 1.0))
Sounds great, Emily	Emily	1	((Emily, 0.95))
See you next week, Shruti!	Shruti	1	((Shruti, 0.9992))
Priya, did you see Amy today?	Priya, Amy	2	((Priya, 0.99954), (Amy, 0.9123))
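
To make the relationship between the columns concrete, the sketch below models two rows of the table above as plain Python values. It assumes, without confirmation from the toolkit, that named_entities holds a tuple of (name, confidence) pairs; under that assumption, num_named_entity is simply the number of pairs.

```python
# Rows modeled on the example output table; the tuple-of-pairs representation
# of named_entities is an assumption made for illustration.
rows = [
    {"message": "Sounds great, Emily", "named_entities": (("Emily", 0.95),)},
    {
        "message": "Priya, did you see Amy today?",
        "named_entities": (("Priya", 0.99954), ("Amy", 0.9123)),
    },
]

for row in rows:
    entities = row["named_entities"]
    # The count column is the number of (name, confidence) pairs.
    row["num_named_entity"] = len(entities)
    names = [name for name, _score in entities]
    print(row["message"], "->", row["num_named_entity"], names)
```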