Status: In Progress | First online: 12-06-2020 | Updated: 12-06-2020
Design Specification
Title: News-fluxus
Repository: NewsFluxus @ GitHub
Developers: Kristoffer L. Nielbo, Liu Bing
Contact: chcaa@cas.au.dk
License: MIT License
Introduction
News-fluxus is developed for humanities and social science researchers who require formal modeling of the language-sensitive dynamics in large collections of digital or digitized newspapers. The tool combines bag-of-words representations of newspaper content with information divergence and adaptive smoothing to identify information states and model change. The tool is implemented in Python (with a Java backend) and can accommodate multiple input data structures (e.g., vanilla, csv, json). Importantly, news-fluxus can export derived data under several ‘copyright settings’, depending on the terms of use for the particular data set.
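As a rough illustration of how bag-of-words representations and information divergence can be combined, the sketch below scores each document by its mean Kullback-Leibler divergence from the documents immediately preceding it. This is a minimal sketch under stated assumptions, not the news-fluxus implementation: the function names, the additive smoothing constant `alpha`, the `window` parameter, and the choice of Kullback-Leibler divergence are all illustrative.

```python
import math
from collections import Counter


def relative_frequencies(tokens, vocabulary, alpha=1.0):
    """Turn a token list into a probability distribution over `vocabulary`.

    `alpha` is an additive smoothing constant (an assumption of this sketch)
    that keeps every word's probability above zero.
    """
    counts = Counter(tokens)
    total = sum(counts[w] for w in vocabulary) + alpha * len(vocabulary)
    return [(counts[w] + alpha) / total for w in vocabulary]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def novelty(distributions, window=2):
    """Mean divergence of each document from the `window` documents before it.

    Positions without a full window of preceding documents get None.
    """
    scores = []
    for i, current in enumerate(distributions):
        if i < window:
            scores.append(None)
            continue
        previous = distributions[i - window:i]
        mean_div = sum(kl_divergence(current, p) for p in previous) / window
        scores.append(mean_div)
    return scores


# Toy, time-ordered "newspaper" documents (already tokenized).
docs = [
    ["tax", "budget", "tax"],
    ["tax", "budget", "budget"],
    ["storm", "flood", "storm"],
]
vocabulary = sorted({w for d in docs for w in d})
dists = [relative_frequencies(d, vocabulary) for d in docs]
scores = novelty(dists, window=1)
```

In this toy series the third document, which switches topic, receives a higher score than the second. A rise of this kind is what a divergence-based signal would flag as a possible change of information state.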
Goals and objectives
- Research: Identification of information states and modeling of change in the content of large collections of newspapers.
- Data sharing: Make available derived data for further exploration and hypothesis-testing in scientific research.
Statement and scope
Software context
Data
Architecture and components
NEWSFLUX/
├── downloader.py
├── fig
├── main.sh
├── mdl
├── requirements.txt
├── res
│   ├── stopwords-da.txt
│   ├── stopwords-fi.txt
│   ├── stopwords-no.txt
│   └── stopwords-sv.txt
└── src
    ├── bow_mdl.py
    ├── news_uncertainty.py
    ├── saffine
    │   ├── detrending_coeff.py
    │   ├── detrending_method.py
    │   └── multi_detrending.py
    ├── signal_extraction.py
    └── tekisuto
        ├── datasets
        │   ├── datasetloader.py
        │   ├── datasetloadertable.py
        │   └── dsloaderndjson.py
        ├── metrics
        │   └── entropies.py
        ├── models
        │   ├── infodynamics.py
        │   └── latentsemantics.py
        ├── preprocessing
        │   ├── casefolder.py
        │   ├── lemmatizer.py
        │   ├── regxfilter.py
        │   ├── swfilter.py
        │   └── tokenizer.py
        └── tekiutil.py
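The module names under tekisuto/preprocessing suggest a pipeline of case folding, tokenization, regular-expression filtering, and stop word removal before the bag-of-words model is built. The sketch below shows how such steps could be chained. It is an assumption-laden illustration, not the project's code: the function names and signatures are hypothetical, the stop word set is a tiny stand-in for res/stopwords-da.txt, and lemmatization is left out because it needs a language-specific model.

```python
import re

# Tiny illustrative stop word set (assumption); the project ships full
# lists per language in res/stopwords-*.txt.
STOPWORDS_DA = {"og", "i", "det", "at", "en"}

# \w matches Unicode word characters in Python 3, so letters such as
# "æ", "ø", and "å" stay inside their tokens.
TOKEN_PATTERN = re.compile(r"\w+")


def casefold(text):
    """Normalize case so that 'Regeringen' and 'regeringen' match."""
    return text.casefold()


def tokenize(text):
    """Split text into word-like tokens, dropping punctuation."""
    return TOKEN_PATTERN.findall(text)


def regex_filter(tokens):
    """Keep purely alphabetic tokens (drops numbers and mixed tokens)."""
    return [t for t in tokens if t.isalpha()]


def remove_stopwords(tokens, stopwords):
    """Drop high-frequency function words."""
    return [t for t in tokens if t not in stopwords]


def preprocess(text, stopwords=STOPWORDS_DA):
    """Hypothetical end-to-end pipeline: fold, tokenize, filter, remove."""
    tokens = tokenize(casefold(text))
    tokens = regex_filter(tokens)
    return remove_stopwords(tokens, stopwords)


tokens = preprocess(
    "Regeringen fremlægger i dag en ny finanslov, og det giver debat i 2020."
)
```

Keeping each step as a separate function mirrors the one-module-per-step layout of the directory tree and makes it easy to swap the stop word list for another language.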
User interface
Command line (CMD), with driver scripts in Python and a Makefile for build automation.