Hate Speech#
Hate speech in text is often defined as text that expresses hatred or encourages violence towards a person or group based on characteristics such as race, religion, sex, or sexual orientation.
DaCy currently does not include its own tools for hate-speech analysis, but incorporates existing state-of-the-art models for Danish. The hate-speech model used in DaCy is trained by DaNLP and consists of two models: one for detecting whether a text contains hate speech and one for classifying the type of hate speech.
| Name | Creator | Domain | Output Type | Model Type |
|---|---|---|---|---|
|  |  |  |  |  |
Other models for Hate Speech detection#
There exist other models for Danish hate-speech detection. We have chosen the BERT Offensive model as it obtains a reasonable trade-off between performance and speed and includes a classifier for the type of hate speech. The other models include:
Usage#
To add the hate-speech models to your pipeline, simply run:
import dacy
import spacy
nlp = spacy.blank("da") # create an empty pipeline
# add the hate speech models
nlp.add_pipe("dacy/hatespeech_detection")
nlp.add_pipe("dacy/hatespeech_classification")
<spacy_wrap.pipeline_component_seq_clf.SequenceClassificationTransformer at 0x7f2bdd32fa60>
This will add two extensions to the Doc object: is_offensive and hate_speech_type. These show whether a text is offensive and, if it is, the type of hate speech it contains. Both extensions also come with a *_prob suffix if you want to examine the probabilities of the models.
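As a small sketch of how such probabilities could be inspected, the snippet below assumes the *_prob values behave like a mapping from label to probability; both that shape and the numbers are stand-ins so it runs without the models:

```python
# Stand-in for doc._.is_offensive_prob; the label -> probability
# dict shape and the numbers are assumptions for illustration.
example_prob = {"offensive": 0.93, "not offensive": 0.07}

def most_confident(prob: dict) -> tuple:
    """Return the (label, probability) pair with the highest score."""
    label = max(prob, key=prob.get)
    return label, prob[label]

print(most_confident(example_prob))  # ('offensive', 0.93)
```

A pattern like this is useful when you want to keep only predictions above some confidence threshold rather than trusting every label.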
Let’s look at an example using the model:
texts = [
    "senile gamle idiot",
    "hej har du haft en god dag"
]

# apply the pipeline
docs = nlp.pipe(texts)

for doc in docs:
    # print model predictions
    print(doc._.is_offensive)
    # print the type of hate speech if the text is offensive
    if doc._.is_offensive == "offensive":
        print("\t", doc._.hate_speech_type)
offensive
sprogbrug
not offensive
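When processing a larger corpus, you may want to collect the predictions into plain Python structures for filtering or counting. A minimal sketch, using stand-in prediction strings (so it runs without the models) in place of reading doc._.is_offensive:

```python
# Stand-in predictions; with the pipeline above these would be
# read from doc._.is_offensive for each doc in nlp.pipe(texts).
texts = ["senile gamle idiot", "hej har du haft en god dag"]
preds = ["offensive", "not offensive"]

# Pair each text with a boolean flag for downstream use.
results = [
    {"text": text, "is_offensive": pred == "offensive"}
    for text, pred in zip(texts, preds)
]

# Keep only the texts flagged as offensive.
offensive_texts = [r["text"] for r in results if r["is_offensive"]]
print(offensive_texts)  # ['senile gamle idiot']
```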