Hate Speech#


Hate speech in text is often defined as text that expresses hatred of, or encourages violence towards, a person or group based on characteristics such as race, religion, sex, or sexual orientation.

DaCy currently does not include its own tools for hate-speech analysis, but incorporates existing state-of-the-art models for Danish. The hate-speech models used in DaCy are trained by DaNLP and come as a pair: one for detecting whether a text contains hate speech, and one for classifying the type of hate speech.

| Name | Creator | Domain | Output Type | Model Type |
|------|---------|--------|-------------|------------|
| dacy/hatespeech_detection | DaNLP | Facebook | ["not offensive", "offensive"] | Ælæctra |
| dacy/hatespeech_classification | DaNLP | Facebook | ["særlig opmærksomhed", "personangreb", "sprogbrug", "spam & indhold"] | Danish BERT by BotXO |

Other models for Hate Speech detection

There exist other models for Danish hate-speech detection. We have chosen the BERT offensive model as it obtains a reasonable trade-off between performance and speed and comes with a companion model for classifying the type of hate speech. The other models include:


Usage#

To add the hate-speech models to your pipeline, simply run:

import dacy
import spacy

nlp = spacy.blank("da")  # create an empty pipeline

# add the hate speech models
nlp.add_pipe("dacy/hatespeech_detection")
nlp.add_pipe("dacy/hatespeech_classification")
<spacy_wrap.pipeline_component_seq_clf.SequenceClassificationTransformer at 0x7fb0f635fb20>

This will add two extensions to the Doc object, is_offensive and hate_speech_type. These show whether a text is offensive and, if so, what type of hate speech it contains.

Both of these also come with a *_prob suffix if you want to examine the probabilities of the models.
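The exact shape of the `*_prob` payload depends on the underlying spacy-wrap component, but it is typically a mapping of per-label probabilities. As a rough sketch of how you might recover the most likely label from such a payload (the `prob` and `labels` keys here are assumptions for illustration, not verified API, and the values are made up):

```python
# Hypothetical *_prob payload: probabilities aligned with their labels.
# The exact keys depend on the spacy-wrap version; treat this as a sketch.
prob_payload = {
    "prob": [0.93, 0.07],                      # model confidence per label
    "labels": ["not offensive", "offensive"],  # label order matches "prob"
}

# Pick the label with the highest probability
best_idx = max(range(len(prob_payload["prob"])), key=prob_payload["prob"].__getitem__)
print(prob_payload["labels"][best_idx])  # -> not offensive
```

In practice you would read the payload from e.g. `doc._.is_offensive_prob` instead of building the dictionary by hand.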

Let’s look at an example using the model:

texts = ["senile gamle idiot", "hej har du haft en god dag"]

# apply the pipeline
docs = nlp.pipe(texts)

for doc in docs:
    # print model predictions
    print(doc._.is_offensive)
    # print type of hate-speech if it is hate-speech
    if doc._.is_offensive == "offensive":
        print("\t", doc._.hate_speech_type)
offensive
	 sprogbrug
not offensive
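Building on the example above, you could filter a corpus down to just the offensive texts together with their hate-speech type. A minimal sketch, using plain tuples as stand-ins for real Doc objects so it runs without downloading the models:

```python
# Stand-in predictions (text, is_offensive label, hate-speech type);
# in real use these would come from doc.text, doc._.is_offensive and
# doc._.hate_speech_type after running nlp.pipe(texts).
predictions = [
    ("senile gamle idiot", "offensive", "sprogbrug"),
    ("hej har du haft en god dag", "not offensive", None),
]

# Keep only the offensive texts, paired with their hate-speech type
offensive = [
    (text, hs_type)
    for text, label, hs_type in predictions
    if label == "offensive"
]
print(offensive)  # -> [('senile gamle idiot', 'sprogbrug')]
```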