Performance#

odyCy achieves state-of-the-art performance on multiple tasks on unseen test data from the Universal Dependencies Perseus treebank, and performs second best on even more tasks on the PROIEL treebank’s test set. Its performance is also relatively stable across the two evaluation datasets compared with other NLP pipelines.

How did we evaluate performance?

To reproduce our measurements, check out greevaluation, our repository for evaluating Ancient Greek pipelines.

Individual Tasks#

Part-of-Speech Tagging#

odyCy achieves state-of-the-art performance on the UD Perseus Treebank and performs second best on PROIEL. Our pipeline scores highest on the weighted average of the two test sets.
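This page does not state how the two test sets are weighted. The sketch below assumes each score is weighted by the size of its test set (for example, its token count), which is one common choice. The function name, the scores and the sizes are illustrative placeholders, not measured results.

```python
def weighted_average(scores, weights):
    """Average per-dataset scores, weighting each by the given dataset weight."""
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same datasets")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Hypothetical accuracies and test set sizes, for illustration only.
scores = {"perseus": 90.0, "proiel": 80.0}
weights = {"perseus": 1000, "proiel": 3000}

average = weighted_average(scores, weights)
print(average)
```

With weights like these, the larger test set dominates the result, so a pipeline that is strong only on the smaller set gains little in the combined score.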

Morphological Analysis#

odyCy achieves state-of-the-art performance on the UD Perseus Treebank and performs second best on PROIEL.

Dependency Parsing#

odyCy achieves state-of-the-art performance on the UD Perseus Treebank and performs second best on PROIEL.


What are LAS and UAS?

Unlabelled attachment score (UAS) is the percentage of words that are assigned the correct head, while labelled attachment score (LAS) is the percentage of words that are assigned both the correct head and the correct dependency label. LAS can therefore never exceed UAS. For more information, read the following chapter by Jurafsky and Martin.
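As a minimal sketch of these two definitions, the function below scores one sentence whose gold and predicted analyses are given as aligned lists of (head index, dependency label) pairs. The function name and the toy sentence are illustrative assumptions. This is not the evaluation code used for the results on this page.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) as percentages for aligned lists of (head, label) pairs."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty sentence")
    # UAS counts tokens whose head is correct, regardless of label.
    correct_heads = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    # LAS counts tokens whose head and label are both correct.
    correct_heads_and_labels = sum(g == p for g, p in zip(gold, predicted))
    total = len(gold)
    return 100 * correct_heads / total, 100 * correct_heads_and_labels / total


# Toy four-token sentence; head 0 marks the root.
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "amod")]
predicted = [(2, "nsubj"), (0, "root"), (2, "obl"), (2, "amod")]

uas, las = attachment_scores(gold, predicted)
print(f"UAS = {uas:.1f}, LAS = {las:.1f}")
```

The third token has the right head but the wrong label, so it counts towards UAS only. The fourth token has the wrong head, so it counts towards neither score.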

Sentence Segmentation#

odyCy performs second best on PROIEL and has the highest weighted average score in sentence segmentation.

Lemmatization#

odyCy achieves the highest weighted average over the two test sets.

Our experiments show that the full lemmatization pipeline performs comparably to its neural subcomponent on its own, so it is unclear which of the two will give better predictions on a given text. See: Lemmatization

Lemmatizer comparison

Corpora#

Perseus#

odyCy achieves state-of-the-art performance on POS-tagging, Morphological Analysis and Dependency Parsing, and performs second best in Lemmatization.

Performance on the Perseus Treebank.

PROIEL#

odyCy performs second best in POS-tagging, Morphological Analysis, Dependency Parsing, Sentence Segmentation and Lemmatization.

Performance on the PROIEL Treebank.

Speed#

We measured throughput in words per second on a joint test set, using spaCy’s CLI on a single core of an Intel Xeon Gold 6130. As a result, only models with a spaCy wrapper could be tested.

Higher scores are better.
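To make the words-per-second metric concrete, here is a rough sketch of how such a figure can be computed by hand. It is not the spaCy CLI procedure used for the numbers above. The `process` callable stands in for any pipeline, and counting words by splitting on whitespace is a simplifying assumption.

```python
import time


def words_per_second(process, texts):
    """Time `process` over all texts and return words processed per second."""
    # Assumption: words are approximated by whitespace-separated tokens.
    n_words = sum(len(text.split()) for text in texts)
    start = time.perf_counter()
    for text in texts:
        process(text)
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        raise RuntimeError("elapsed time too small to measure")
    return n_words / elapsed


# Stand-in "pipeline" that only splits on whitespace, for illustration.
texts = ["μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος"] * 1000
wps = words_per_second(str.split, texts)
print(f"{wps:.0f} words per second")
```

Figures obtained this way depend heavily on hardware, batch size and the number of cores, so they are comparable only when those conditions are held fixed, as in the single-core setup described above.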