
Sentiment Analysis#

Sentiment analysis (or opinion mining) is a method used to determine whether a text is positive, negative, or neutral. Sentiment analysis is used, for example, by businesses to monitor brand and product sentiment in customer feedback, or in research to examine political biases.

Sentiment analysis approaches can be split into rule-based and neural ones. Rule-based approaches typically use a dictionary of words rated as positive or negative and employ a series of rules, such as handling negations, to estimate whether a text is positive or negative.

Rule-based approaches are typically notably faster, but perform worse than their neural counterparts, especially on more complex sentiment such as sarcasm, where it is hard to define clear rules. It is thus important to take this trade-off into consideration when choosing between the models.
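To make the rule-based idea concrete, here is a minimal, hypothetical sketch of a dictionary-based scorer with a single negation rule. The word ratings and the negation list are illustrative only, not taken from any real lexicon:

```python
# A toy dictionary-based sentiment scorer (illustrative ratings only).
RATINGS = {"god": 2.0, "glad": 3.0, "dårlig": -2.0, "træls": -2.5}
NEGATIONS = {"ikke", "aldrig"}


def score(text: str) -> float:
    """Sum word ratings, flipping the sign of a rated word
    if it is directly preceded by a negation."""
    tokens = text.lower().split()
    total = 0.0
    for i, tok in enumerate(tokens):
        valence = RATINGS.get(tok, 0.0)
        if valence and i > 0 and tokens[i - 1] in NEGATIONS:
            valence = -valence  # negation flips the polarity
        total += valence
    return total


print(score("det er godt ikke dårlig"))  # "ikke dårlig" flips -2.0 to +2.0
```

Even this toy version shows the limitation mentioned above: sarcasm, irony, and long-range negations cannot be captured by such local rules.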

Overview of Sentiment Models#

DaCy includes a variety of models for sentiment analysis. Depending on the use case, different models might be more suitable.

| Model | Reference | Domain | Output Type | Model Type |
|---|---|---|---|---|
| dacy/subjectivity | DaNLP | Europarl and Twitter | ["objective", "subjective"] | Neural (Danish BERT) |
| dacy/polarity | DaNLP | Europarl and Twitter | ["positive", "neutral", "negative"] | Neural (Danish BERT) |
| dacy/emotion | DaNLP | Social media | ["Emotional", "No emotion"] and ["Glæde/Sindsro", "Tillid/Accept", ...] | Neural (Danish BERT) |
| asent_da_v1 | Asent | Microblogs and social media | Polarity score (continuous) | Rule-based |

Subjectivity#

The subjectivity model is part of BertTone, a model trained by DaNLP. The model detects whether a text is subjective or objective in its phrasing.

To add the subjectivity model to your pipeline simply run:

import dacy
import spacy

nlp = spacy.blank("da")  # an empty spacy pipeline
# could also be a dacy pipeline, e.g. nlp = dacy.load("large")
nlp.add_pipe("dacy/subjectivity")
<spacy_wrap.pipeline_component_seq_clf.SequenceClassificationTransformer at 0x7f1e92b3e440>

This will add the dacy/subjectivity component to your pipeline, which adds two extensions to the Doc object: subjectivity_prob and subjectivity. These show the probability of a document being subjective and whether the document is classified as subjective or objective. Let’s look at an example using the model:

texts = [
    "Analysen viser, at økonomien bliver forfærdelig dårlig",
    "Jeg tror alligevel, det bliver godt",
]

docs = nlp.pipe(texts)

for doc in docs:
    print(doc._.subjectivity)
    print(doc._.subjectivity_prob)
objective
{'prob': array([1., 0.], dtype=float32), 'labels': ['objective', 'subjective']}
subjective
{'prob': array([0., 1.], dtype=float32), 'labels': ['objective', 'subjective']}
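As shown above, the *_prob extension returns a dict with parallel prob and labels entries. A small helper (hypothetical, not part of DaCy) can pair them up and pick the most likely label:

```python
import numpy as np


def label_probabilities(pred: dict) -> dict:
    """Map each label to its probability from a DaCy *_prob dict."""
    return dict(zip(pred["labels"], pred["prob"]))


# The structure matches the output shown above.
pred = {
    "prob": np.array([0.0, 1.0], dtype=np.float32),
    "labels": ["objective", "subjective"],
}

probs = label_probabilities(pred)
best = max(probs, key=probs.get)
print(best)  # subjective
```

The same helper works for the polarity and emotion models below, since they return dicts of the same shape.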

Polarity#

Similar to the subjectivity model, the polarity model is part of the BertTone model. This model classifies the polarity of a text, i.e. whether it is positive, negative, or neutral.

To add the polarity model to your pipeline simply run:

nlp = spacy.blank("da")  # an empty spacy pipeline
# could also be a dacy pipeline, e.g. nlp = dacy.load("large")
nlp.add_pipe("dacy/polarity")
<spacy_wrap.pipeline_component_seq_clf.SequenceClassificationTransformer at 0x7f1e681cbe80>

This will add the dacy/polarity component to your pipeline, which adds two extensions to the Doc object: polarity_prob and polarity. These show the probabilities of a document being positive/neutral/negative and the resulting classification. Let’s look at an example using the model:

# apply the pipeline
docs = nlp.pipe(texts)

for doc in docs:
    # print the model predictions
    print(doc._.polarity)
    print(doc._.polarity_prob)
negative
{'prob': array([0.002, 0.008, 0.99 ], dtype=float32), 'labels': ['positive', 'neutral', 'negative']}
positive
{'prob': array([0.981, 0.019, 0.   ], dtype=float32), 'labels': ['positive', 'neutral', 'negative']}

Emotion#

The emotion model used in DaCy is trained by DaNLP. It consists of two models: one for detecting whether a text is emotionally laden, and one for classifying which of the following emotions it expresses:

  • “Glæde/Sindsro” (happiness)

  • “Tillid/Accept” (trust/acceptance)

  • “Forventning/Interrese” (interest)

  • “Overasket/Målløs” (surprise)

  • “Vrede/Irritation” (Anger)

  • “Foragt/Modvilje” (Contempt)

  • “Sorg/trist” (Sadness)

  • “Frygt/Bekymret” (Fear)

To add the emotion models to your pipeline simply run:

nlp = spacy.blank("da")  # an empty spacy pipeline
# could also be a dacy pipeline, e.g. nlp = dacy.load("large")
nlp.add_pipe("dacy/emotionally_laden")  # for emotional/non-emotional
nlp.add_pipe("dacy/emotion")  # for type of emotion
<spacy_wrap.pipeline_component_seq_clf.SequenceClassificationTransformer at 0x7f1f847f52a0>

This will add two extensions to the Doc object: emotionally_laden and emotion. These show whether a text is emotionally laden and which emotion it contains. Both also come with a *_prob suffix if you want to examine the probabilities of the model.

Let’s look at an example using the model:

texts = [
    "Ej den bil er såå flot",
    "Fuck det er bare så FUCKING træls!",
    "Har i set at Tesla har landet en raket på månen? Det er vildt!!",
    "der er et træ i haven",
]

docs = nlp.pipe(texts)

for doc in docs:
    print(doc._.emotionally_laden)
    # if emotional print the emotion
    if doc._.emotionally_laden == "emotional":
        print("\t", doc._.emotion)
emotional
	 tillid/accept
emotional
	 sorg/trist
emotional
	 overasket/målløs
no emotion
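With these extensions you can, for instance, tally how often each emotion occurs across a corpus. A sketch using predicted labels in the format the extensions return (the sample predictions below are made up for illustration):

```python
from collections import Counter

# Hypothetical per-document predictions, in the format returned by the
# emotionally_laden and emotion extensions.
predictions = [
    ("emotional", "tillid/accept"),
    ("emotional", "sorg/trist"),
    ("emotional", "sorg/trist"),
    ("no emotion", None),
]

# Only count an emotion when the document is emotionally laden.
emotion_counts = Counter(
    emotion for laden, emotion in predictions if laden == "emotional"
)
print(emotion_counts.most_common(1))  # [('sorg/trist', 2)]
```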

Dictionary-Based Sentiment#

If you wish to perform rule-based sentiment analysis with DaCy, we recommend using Asent. Asent is a rule-based sentiment analysis library supporting multiple languages, including Danish.

To get started using Asent install it using:

pip install asent

First, we need to set up the spaCy pipeline, which only needs to include a component for sentence segmentation. You can use DaCy for this, as it performs dependency parsing, but it is notably faster to use a rule-based sentencizer.

import asent
import spacy

# load a spacy pipeline
# equivalent to a dacy.load()
# but notably faster
nlp = spacy.blank("da")
nlp.add_pipe("sentencizer")

# add the rule-based sentiment model from asent.
nlp.add_pipe("asent_da_v1")

# try an example
text = "jeg er ikke mega glad."
doc = nlp(text)

# print polarity of document, scaled to be between -1, and 1
print(doc._.polarity)
neg=0.413 neu=0.587 pos=0.0 compound=-0.5448 n_sentences=1
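The compound score squashes the (unbounded) sum of token polarities into the interval (-1, 1). A VADER-style normalization, x / sqrt(x² + α) with α = 15, reproduces the value above from the single rated span; note that the constant α is an assumption here, not a value verified against Asent's source:

```python
import math


def normalize(score_sum: float, alpha: float = 15.0) -> float:
    """Squash an unbounded valence sum into the interval (-1, 1)."""
    return score_sum / math.sqrt(score_sum**2 + alpha)


# The only rated span in the example sentence scored -2.516 (see below).
print(round(normalize(-2.516), 4))  # -0.5448
```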

Asent also allows us to obtain more information, such as the rated valence of a single token, whether a word is a negation, and the valence of a word accounting for its context (polarity):

for token in doc:
    print(
        f"{token._.polarity} | Valence: {token._.valence} | Negation: {token._.is_negation}"
    )
polarity=0.0 token=jeg span=jeg | Valence: 0.0 | Negation: False
polarity=0.0 token=er span=er | Valence: 0.0 | Negation: False
polarity=0.0 token=ikke span=ikke | Valence: 0.0 | Negation: True
polarity=0.0 token=mega span=mega | Valence: 0.0 | Negation: False
polarity=-2.516 token=glad span=ikke mega glad | Valence: 3.0 | Negation: False
polarity=0.0 token=. span=. | Valence: 0.0 | Negation: False

Here we see that a word such as “glad” (happy) is rated positively (valence), but accounting for the negation “ikke” (not), its polarity becomes negative. Furthermore, Asent also allows you to visualize the predictions:
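The contextual polarity of “glad” can be sketched with a VADER-style calculation: an intensifier adds a constant to the valence, and a preceding negation multiplies it by a negative damping factor. The constants below are assumptions chosen to reproduce the score shown above, not values verified against Asent's source:

```python
# VADER-style contextual polarity (constants are assumptions chosen to
# reproduce the -2.516 score shown above, not Asent's documented values).
NEGATION_SCALAR = -0.74   # a preceding negation flips and dampens valence
INTENSIFIER_BOOST = 0.4   # "mega" strengthens the following word

valence = 3.0                         # dictionary rating of "glad"
valence += INTENSIFIER_BOOST          # intensified by "mega"
polarity = valence * NEGATION_SCALAR  # negated by "ikke"
print(round(polarity, 3))  # -2.516
```

Note that the negation dampens as well as flips: a negated positive word is less negative than an inherently negative one of the same strength.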

Learn more

If you want to learn more about how Asent works, check out the excellent documentation.

# visualize model prediction
asent.visualize(doc, style="prediction")
[Visualization: the sentence is rendered with the span “ikke mega glad” highlighted and scored -2.5]
# visualize the analysis performed by the model:
asent.visualize(doc, style="analysis")
[Visualization: token-level valences are rendered (“glad” 3.0, polarity -2.5), with arrows marking that it is intensified by “mega” and negated by “ikke”]

Other resources

Asent uses a dictionary of words rated by humans, and it is possible to swap these ratings out. Notably, Asent includes word ratings from two other resources: AFINN, which does not implement rules such as negations, and Sentida, which uses rules similar to Asent's.