BTM
- class tweetopic.btm.BTM(n_components: int, n_iterations: int = 100, alpha: float = 6.0, beta: float = 0.1)
Implementation of the Biterm Topic Model with a Gibbs sampling solver.
- Parameters:
n_components (int) – Number of topics in the model.
n_iterations (int, default 100) – Number of iterations during fitting.
alpha (float, default 6.0) – Dirichlet prior for topic distribution.
beta (float, default 0.1) – Dirichlet prior for topic-word distribution.
- components_
Conditional probabilities of all terms given a topic.
- Type:
array of shape (n_components, n_vocab)
- topic_distribution
Prior probability of each topic.
- Type:
array of shape (n_components,)
- n_features_in_
Total number of vocabulary items seen during fitting.
- Type:
int
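One common way to read these attributes is to list the highest-probability terms of each topic next to the topic's prior. The sketch below assumes `components` and `topic_distribution` hold values shaped like `components_` and `topic_distribution` above, and that `vocab` lists the terms in column order (for example, taken from the vectorizer that produced the BOW matrix). The helper name `top_terms` and the toy numbers are illustrative and not part of tweetopic.

```python
def top_terms(components, topic_distribution, vocab, k=3):
    """Return, per topic, its prior and the k most probable terms.

    components: rows of P(term | topic), shape (n_components, n_vocab)
    topic_distribution: P(topic), shape (n_components,)
    vocab: term for each column of components (assumed, not a BTM attribute)
    """
    summary = []
    for prior, row in zip(topic_distribution, components):
        # Column indices sorted by descending term probability.
        order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        summary.append((float(prior), [vocab[j] for j in order[:k]]))
    return summary


# Toy values standing in for a fitted model's attributes.
components = [
    [0.5, 0.3, 0.1, 0.1],
    [0.1, 0.1, 0.2, 0.6],
]
topic_distribution = [0.7, 0.3]
vocab = ["cat", "dog", "stock", "market"]

for prior, terms in top_terms(components, topic_distribution, vocab, k=2):
    print(f"{prior:.2f}: {', '.join(terms)}")
```

The helper only indexes and iterates over rows, so it should work equally on nested lists and on arrays.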
- get_params(deep: bool = False) → dict
Get parameters for this estimator.
- Parameters:
deep (bool, default False) – Ignored, exists for sklearn compatibility.
- Returns:
Parameter names mapped to their values.
- Return type:
dict
Note
Exists for sklearn compatibility.
- set_params(**params) → BTM
Set parameters for this estimator.
- Returns:
The estimator instance.
- Return type:
BTM
Note
Exists for sklearn compatibility.
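The contract behind `get_params` and `set_params` can be sketched with a stand-in class. `ToyEstimator` below is hypothetical and is not tweetopic's implementation; it only mirrors the constructor arguments documented above to show the convention: `get_params` maps constructor argument names to their current values, and `set_params` updates them and returns the same instance. Rejecting unknown names is an assumption of this sketch, not documented behaviour of `BTM`.

```python
class ToyEstimator:
    """Hypothetical stand-in that mimics the get_params/set_params contract."""

    def __init__(self, n_components, n_iterations=100, alpha=6.0, beta=0.1):
        self.n_components = n_components
        self.n_iterations = n_iterations
        self.alpha = alpha
        self.beta = beta

    def get_params(self, deep=False):
        # Constructor arguments mapped to their current values.
        return {
            "n_components": self.n_components,
            "n_iterations": self.n_iterations,
            "alpha": self.alpha,
            "beta": self.beta,
        }

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                # Assumption of this sketch: unknown names are an error.
                raise ValueError(f"Unknown parameter: {name}")
            setattr(self, name, value)
        # Returning self allows chained calls.
        return self


est = ToyEstimator(n_components=10)
same = est.set_params(alpha=2.0, n_iterations=50)
print(same is est, est.get_params())
```

scikit-learn utilities that clone or tune estimators rely on this pair of methods, which is why they exist on an estimator that otherwise does not need them.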
- fit(X: spmatrix | ArrayLike, y: None = None)
Fits the model using Gibbs sampling. The algorithm is described in detail in Yan et al. (2013).
- Parameters:
X (array-like or sparse matrix of shape (n_samples, n_features)) – Bag-of-words (BOW) matrix of the corpus.
y (None) – Ignored, exists for sklearn compatibility.
- Returns:
The fitted model.
- Return type:
BTM
Note
fit() modifies the model in place; the fitted model is returned for convenience.
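For readers unfamiliar with the model, the Gibbs sampler in Yan et al. (2013) works on biterms: unordered pairs of words that co-occur in the same short document. The sketch below shows how biterms could be counted from a single row of a BOW matrix. The function name `extract_biterms` is hypothetical, and whether tweetopic builds biterms this way internally (in particular, whether repeated occurrences of the same term form a biterm) is an assumption; this sketch keeps such pairs.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(doc_term_counts):
    """Count unordered term-index pairs (biterms) in one BOW row.

    doc_term_counts: sequence of integer term counts for a single document.
    """
    # Expand the count vector back into a list of term indices (tokens).
    tokens = [j for j, count in enumerate(doc_term_counts) for _ in range(count)]
    # Every unordered pair of token positions forms one biterm.
    return Counter(tuple(sorted(pair)) for pair in combinations(tokens, 2))


# A document containing term 0 once, term 2 twice and term 3 once.
row = [1, 0, 2, 1]
print(extract_biterms(row))
```

A document with four tokens yields six biterms, which illustrates why the model suits short texts: the number of biterms grows quadratically with document length.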
- transform(X: spmatrix | ArrayLike) → ndarray
Predicts the probability of each document belonging to each topic.
- Parameters:
X (array-like or sparse matrix of shape (n_samples, n_features)) – Document-term matrix.
- Returns:
Probabilities for each document belonging to each topic.
- Return type:
array of shape (n_samples, n_components)
- Raises:
NotFittedException – If the model has not been fitted.
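To show how the fitted attributes relate to document-level predictions, the sketch below implements the inference rule given in Yan et al. (2013): P(topic | doc) = Σ_b P(topic | b) · P(b | doc), where b ranges over the biterms of the document. Whether `transform` follows this rule exactly is an assumption; the function name `infer_doc_topics` and the toy values are hypothetical.

```python
def infer_doc_topics(biterm_counts, components, topic_distribution):
    """Sketch of P(topic | doc) = sum_b P(topic | b) * P(b | doc).

    biterm_counts: mapping (i, j) -> count of that biterm in the document
    components: P(term | topic), shape (n_components, n_vocab)
    topic_distribution: P(topic), shape (n_components,)
    """
    n_topics = len(topic_distribution)
    total = sum(biterm_counts.values())
    # A document without biterms (fewer than two tokens) stays all zeros.
    doc_topics = [0.0] * n_topics
    for (i, j), count in biterm_counts.items():
        # Unnormalised P(topic | biterm) = P(topic) * P(w_i | topic) * P(w_j | topic).
        weights = [
            topic_distribution[z] * components[z][i] * components[z][j]
            for z in range(n_topics)
        ]
        norm = sum(weights)
        for z in range(n_topics):
            # P(biterm | doc) is the biterm's relative frequency in the document.
            doc_topics[z] += (weights[z] / norm) * (count / total)
    return doc_topics


# Toy values standing in for a fitted model's attributes.
components = [
    [0.5, 0.3, 0.1, 0.1],
    [0.1, 0.1, 0.2, 0.6],
]
topic_distribution = [0.7, 0.3]

# One document whose only biterm pairs term 0 with term 1.
print(infer_doc_topics({(0, 1): 1}, components, topic_distribution))
```

The sketch assumes every term has non-zero probability under at least one topic, which holds when the topic-word prior `beta` is positive.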
- fit_transform(X: spmatrix | ArrayLike, y: None = None) → ndarray
Fits the model, then transforms the given data.
- Parameters:
X (array-like or sparse matrix of shape (n_samples, n_features)) – Document-term matrix.
y (None) – Ignored, exists for sklearn compatibility.
- Returns:
Probabilities for each document belonging to each topic.
- Return type:
array of shape (n_samples, n_components)
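A typical next step with the returned matrix is to assign each document to its most probable topic. The sketch below assumes `doc_topic` has the shape (n_samples, n_components) described above; the helper name `dominant_topics` and the toy values are illustrative and not part of tweetopic.

```python
def dominant_topics(doc_topic):
    """Return the index of the most probable topic for each document."""
    return [max(range(len(row)), key=lambda z: row[z]) for row in doc_topic]


# Toy values standing in for the output of fit_transform().
doc_topic = [
    [0.8, 0.1, 0.1],
    [0.2, 0.3, 0.5],
]
print(dominant_topics(doc_topic))  # prints [0, 2]
```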