BTM#

class tweetopic.btm.BTM(n_components: int, n_iterations: int = 100, alpha: float = 6.0, beta: float = 0.1)#

Implementation of the Biterm Topic Model with Gibbs Sampling solver.

Parameters:

n_components (int) – Number of topics in the model.
n_iterations (int, default 100) – Number of iterations during fitting.
alpha (float, default 6.0) – Dirichlet prior for topic distribution.
beta (float, default 0.1) – Dirichlet prior for topic-word distribution.
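
The model operates on biterms, which Yan et al. (2013) define as unordered pairs of words co-occurring in the same short document. As a minimal sketch of that idea, independent of how tweetopic extracts biterms internally, the hypothetical helper `extract_biterms` below lists every biterm in one tokenized document.

```python
from itertools import combinations


def extract_biterms(tokens):
    """Return all unordered word pairs (biterms) in one tokenized document.

    Hypothetical helper for illustration; not part of tweetopic's API.
    """
    # Sort each pair so that ("b", "a") and ("a", "b") count as the same biterm.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


doc = ["apple", "banana", "cherry"]
biterms = extract_biterms(doc)
print(biterms)
```

A document with n tokens yields n * (n - 1) / 2 biterms, which is why the approach suits short texts such as tweets.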

components_#

Conditional probabilities of all terms given a topic.

topic_distribution#

Prior probability of each topic.

n_features_in_#

Total number of vocabulary items seen during fitting.
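
In the formulation of Yan et al. (2013), the topic prior and the topic-word probabilities are point estimates computed from the final Gibbs sampling counts, smoothed by alpha and beta. The sketch below shows that calculation on made-up counts. The function name and the count layout are assumptions for illustration, and tweetopic's internals may differ.

```python
def estimate_distributions(topic_counts, topic_word_counts, alpha, beta):
    """Smoothed estimates of P(topic) and P(word | topic) from biterm counts.

    topic_counts[z]: number of biterms assigned to topic z
    topic_word_counts[z][w]: number of times word w was assigned to topic z
    """
    n_topics = len(topic_counts)
    n_words = len(topic_word_counts[0])
    n_biterms = sum(topic_counts)

    # theta_z = (n_z + alpha) / (|B| + K * alpha)
    topic_distribution = [
        (n_z + alpha) / (n_biterms + n_topics * alpha) for n_z in topic_counts
    ]
    # phi_{w|z} = (n_{w|z} + beta) / (sum_w n_{w|z} + M * beta)
    components = [
        [(n_wz + beta) / (sum(row) + n_words * beta) for n_wz in row]
        for row in topic_word_counts
    ]
    return topic_distribution, components


# Toy counts: 2 topics, 3 vocabulary items, 5 biterms (10 word assignments).
topic_counts = [3, 2]
topic_word_counts = [[4, 2, 0], [0, 1, 3]]
theta, phi = estimate_distributions(
    topic_counts, topic_word_counts, alpha=6.0, beta=0.1
)
```

Larger alpha pulls the topic prior toward uniform, and larger beta does the same for each topic's word distribution; every row of the result sums to one.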

get_params(deep: bool = False) → dict#

Get parameters for this estimator.

Parameters: deep (bool, default False) – Ignored, exists for sklearn compatibility.
Returns: Parameter names mapped to their values.
Return type: dict

Note

Exists for sklearn compatibility.

set_params(**params) → BTM#

Set parameters for this estimator.

Note

Exists for sklearn compatibility.

fit(X: spmatrix | ArrayLike, y: None = None)#

Fits the model using Gibbs sampling. The algorithm is described in detail in Yan et al. (2013).

Parameters:

X (array-like or sparse matrix of shape (n_samples, n_features)) – BOW matrix of corpus.
y (None) – Ignored, exists for sklearn compatibility.

Returns:

The fitted model.

Return type:

BTM

Note

fit() modifies the model in place; the fitted model is also returned for convenience.
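
`X` is a bag-of-words (BOW) matrix: one row per document, one column per vocabulary item, each cell holding a word count. The pure-Python sketch below builds such a matrix for a toy corpus to make the expected shape concrete. `build_bow` is a hypothetical helper; in practice a vectorizer such as scikit-learn's `CountVectorizer` would typically produce a sparse matrix for this purpose.

```python
def build_bow(documents):
    """Build a dense bag-of-words matrix from whitespace-tokenized documents.

    Hypothetical helper for illustration; returns (matrix, vocabulary).
    """
    tokenized = [doc.lower().split() for doc in documents]
    # Assign each distinct word a column index, in sorted order.
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    column = {word: index for index, word in enumerate(vocabulary)}

    matrix = [[0] * len(vocabulary) for _ in tokenized]
    for row, tokens in zip(matrix, tokenized):
        for word in tokens:
            row[column[word]] += 1
    return matrix, vocabulary


corpus = ["cats chase mice", "dogs chase cats", "mice eat cheese"]
X, vocabulary = build_bow(corpus)
# X has one row per document and one column per vocabulary item.
print(len(X), len(X[0]))
```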

transform(X: spmatrix | ArrayLike) → ndarray#

Predicts probabilities for each document belonging to each topic.

Parameters: X (array-like or sparse matrix of shape (n_samples, n_features)) – Document-term matrix.
Returns: Probabilities for each document belonging to each topic.
Return type: array of shape (n_samples, n_components)
Raises: NotFittedException – If the model has not been fitted.
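
Yan et al. (2013) derive document-topic probabilities by averaging topic posteriors over the biterms of a document: P(z|d) = Σ_b P(z|b) P(b|d), with P(z|b) proportional to P(z) P(w_i|z) P(w_j|z). The sketch below follows that formula, with toy values standing in for `topic_distribution` and `components_`. It illustrates the published method and is not necessarily how tweetopic computes the result; `document_topic_probs` is a hypothetical name.

```python
from itertools import combinations


def document_topic_probs(word_ids, topic_distribution, components):
    """Estimate P(topic | document) by averaging P(topic | biterm).

    word_ids: vocabulary indices of the words in one document
    topic_distribution: P(z) for each topic
    components: components[z][w] = P(w | z)
    """
    n_topics = len(topic_distribution)
    biterms = list(combinations(word_ids, 2))
    if not biterms:
        # A document with fewer than two words has no biterms; falling back
        # to the topic prior is an assumption of this sketch.
        return list(topic_distribution)

    totals = [0.0] * n_topics
    for w_i, w_j in biterms:
        # Unnormalized P(z | b) = P(z) * P(w_i | z) * P(w_j | z)
        scores = [
            topic_distribution[z] * components[z][w_i] * components[z][w_j]
            for z in range(n_topics)
        ]
        norm = sum(scores)
        for z in range(n_topics):
            # Each biterm occurrence carries equal weight: P(b | d) = 1 / len(biterms).
            totals[z] += scores[z] / norm / len(biterms)
    return totals


topic_distribution = [0.5, 0.5]
components = [
    [0.6, 0.3, 0.1],  # topic 0 favors word 0
    [0.1, 0.3, 0.6],  # topic 1 favors word 2
]
probs = document_topic_probs([0, 0, 1], topic_distribution, components)
print(probs)
```

Because each biterm posterior is normalized before averaging, the returned values sum to one for every document.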

fit_transform(X: spmatrix | ArrayLike, y: None = None) → ndarray#

Fits the model, then transforms the given data.

Parameters:

X (array-like or sparse matrix of shape (n_samples, n_features)) – Document-term matrix.
y (None) – Ignored, exists for sklearn compatibility.

Returns:

Probabilities for each document belonging to each topic.

Return type:

array of shape (n_samples, n_components)