Status: In Progress
First online: 12-06-2020
Updated: 12-06-2020

Design Specification

Title: News-fluxus
Repository: NewsFluxus @ GitHub
Developers: Kristoffer L. Nielbo, Liu Bing 
Contact: chcaa@cas.au.dk
License: MIT License

Introduction

News-fluxus is developed for humanities and social science researchers who require formal modeling of the language-sensitive dynamics in large collections of digital or digitized newspapers. The tool combines bag-of-words representations of newspaper content with information divergence and adaptive smoothing in order to identify information states and model change. It is implemented in Python (with a Java backend) and can accommodate multiple input data structures (e.g. vanilla, csv, json). Importantly, news-fluxus can export derived data under several ‘copyright settings’, depending on the terms of use for the particular data set.
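As a rough illustration of how bag-of-words representations and an information divergence can be combined to signal change, the sketch below scores each document by its mean divergence from the documents that precede it. It is a minimal example under stated assumptions, not the news-fluxus implementation: the choice of Kullback-Leibler divergence, the additive smoothing constant `alpha`, the `window` parameter, and all function names are hypothetical, and the adaptive smoothing step is not shown.

```python
import math
from collections import Counter


def bow_distribution(tokens, vocabulary, alpha=1.0):
    """Bag-of-words relative frequencies over a fixed vocabulary.

    Additive smoothing (alpha, an assumed value) keeps every probability
    positive so that the divergence below is always finite.
    """
    counts = Counter(tokens)
    total = sum(counts[w] for w in vocabulary) + alpha * len(vocabulary)
    return [(counts[w] + alpha) / total for w in vocabulary]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def novelty(distributions, window=2):
    """Mean divergence of each document from the `window` preceding ones."""
    scores = []
    for i in range(window, len(distributions)):
        past = distributions[i - window:i]
        mean_div = sum(kl_divergence(distributions[i], d) for d in past) / window
        scores.append(mean_div)
    return scores


# Toy corpus in chronological order (illustrative data only).
docs = [
    "the council approved the budget".split(),
    "the council approved the budget".split(),
    "storm floods the harbour tonight".split(),
]
vocabulary = sorted({w for d in docs for w in d})
dists = [bow_distribution(d, vocabulary) for d in docs]
scores = novelty(dists, window=1)
print(scores)
```

In this toy corpus the second document repeats the first, so its score is zero, while the third document introduces new vocabulary and scores above zero.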

Goals and objectives

  1. Research: Identification of information states and modeling of change in the content of large collections of newspapers.
  2. Data sharing: Make derived data available for further exploration and hypothesis testing in scientific research.

Statement and scope

Software context

Data

Architecture and components

NEWSFLUX/
├── downloader.py
├── fig
├── main.sh
├── mdl
├── requirements.txt
├── res
│   ├── stopwords-da.txt
│   ├── stopwords-fi.txt
│   ├── stopwords-no.txt
│   └── stopwords-sv.txt
└── src
    ├── bow_mdl.py
    ├── news_uncertainty.py
    ├── saffine
    │   ├── detrending_coeff.py
    │   ├── detrending_method.py
    │   └── multi_detrending.py
    ├── signal_extraction.py
    └── tekisuto
        ├── datasets
        │   ├── datasetloader.py
        │   ├── datasetloadertable.py
        │   └── dsloaderndjson.py
        ├── metrics
        │   └── entropies.py
        ├── models
        │   ├── infodynamics.py
        │   └── latentsemantics.py
        ├── preprocessing
        │   ├── casefolder.py
        │   ├── lemmatizer.py
        │   ├── regxfilter.py
        │   ├── swfilter.py
        │   └── tokenizer.py
        └── tekiutil.py
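
The module names under src/tekisuto/preprocessing suggest a conventional text-cleaning chain. The sketch below shows what such a chain could look like. It is an assumption-laden illustration, not the actual tekisuto API: the function names, the order of the steps, the regular expression, and the in-memory stopword set (standing in for a file such as res/stopwords-da.txt) are all hypothetical, and lemmatization is left out because it needs an external language model.

```python
import re


def casefold(text):
    """Lower-case the text in a Unicode-aware way."""
    return text.casefold()


def regex_filter(text, pattern=r"[^\w\s]|\d"):
    """Replace punctuation and digits with spaces (assumed pattern)."""
    return re.sub(pattern, " ", text)


def tokenize(text):
    """Split on whitespace."""
    return text.split()


def stopword_filter(tokens, stopwords):
    """Drop tokens that appear in the stopword set."""
    return [t for t in tokens if t not in stopwords]


def preprocess(text, stopwords):
    """Chain the steps: casefold, regex filter, tokenize, stopword filter."""
    return stopword_filter(tokenize(regex_filter(casefold(text))), stopwords)


# Small illustrative stopword set; a real run would load a full list.
stopwords_da = {"og", "i", "det", "er", "en"}
tokens = preprocess("Det er en Storm i København, 2020!", stopwords_da)
print(tokens)
```

The resulting token list is the kind of input a bag-of-words model would consume.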

User interface

Command-line interface, with driver scripts in Python and a Makefile for build automation.

Restrictions, limitations, and constraints

Testing issues