Hate Speech#


Hate speech in text is often defined as text that expresses hate towards, or encourages violence against, a person or group based on something such as race, religion, sex, or sexual orientation.

DaCy currently does not include its own tools for hate-speech analysis, but incorporates existing state-of-the-art models for Danish. The hate-speech models used in DaCy are trained by DaNLP. They consist of two models: one for detecting whether a text is hate-speech laden and one for classifying the type of hate speech.

| Name                           | Creator | Domain   | Output Type                                                            | Model Type           |
|--------------------------------|---------|----------|------------------------------------------------------------------------|----------------------|
| dacy/hatespeech_detection      | DaNLP   | Facebook | ["not offensive", "offensive"]                                         | Ælæctra              |
| dacy/hatespeech_classification | DaNLP   | Facebook | ["særlig opmærksomhed", "personangreb", "sprogbrug", "spam & indhold"] | Danish BERT by BotXO |

Other models for Hate Speech detection

There exist other models for Danish hate-speech detection. We have chosen the BERT offensive model as it obtains a reasonable trade-off between performance and speed and additionally includes a classification model for the type of hate speech.


Usage#

To add the hate speech models to your pipeline simply run:

import dacy
import spacy

nlp = spacy.blank("da")  # create an empty pipeline

# add the hate speech models
nlp.add_pipe("dacy/hatespeech_detection")
nlp.add_pipe("dacy/hatespeech_classification")
<spacy_wrap.pipeline_component_seq_clf.SequenceClassificationTransformer at 0x7f79ee691b40>

This will add two extensions to the Doc object, is_offensive and hate_speech_type. These show whether a text is offensive and, if it is, which type of hate speech it contains.

Both of these also come with a *_prob suffix if you want to examine the probabilities of the models.
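For instance, here is a minimal sketch of inspecting those probabilities. The attribute names below simply follow the *_prob suffix convention mentioned above; the exact structure of the returned object depends on the underlying spacy-wrap version.

doc = nlp("senile gamle idiot")

# probabilities of the detection model
print(doc._.is_offensive_prob)
# probabilities of the classification model
print(doc._.hate_speech_type_prob)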

Let’s look at an example using the model:

texts = ["senile gamle idiot", "hej har du haft en god dag"]

# apply the pipeline
docs = nlp.pipe(texts)

for doc in docs:
    # print model predictions
    print(doc._.is_offensive)
    # print type of hate-speech if it is hate-speech
    if doc._.is_offensive == "offensive":
        print("\t", doc._.hate_speech_type)
offensive
	 sprogbrug
not offensive
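Since is_offensive is a plain string label, it can also be used to filter a collection of texts. Below is a minimal sketch that only relies on the pipeline and extensions shown above:

texts = ["senile gamle idiot", "hej har du haft en god dag"]

# keep only the documents classified as offensive
offensive_docs = [doc for doc in nlp.pipe(texts) if doc._.is_offensive == "offensive"]

for doc in offensive_docs:
    print(doc.text, "->", doc._.hate_speech_type)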