sklearn_api.ldaseqmodel – Scikit learn wrapper for LdaSeq model

Scikit-learn interface for LdaSeqModel.

Follows scikit-learn API conventions to facilitate using gensim along with scikit-learn.


>>> from gensim.test.utils import common_corpus, common_dictionary
>>> from gensim.sklearn_api.ldaseqmodel import LdaSeqTransformer
>>> # Create a sequential LDA transformer to extract 2 topics from the common corpus.
>>> # Divide the work into 3 unequal time slices.
>>> model = LdaSeqTransformer(id2word=common_dictionary, num_topics=2, time_slice=[3, 4, 2], initialize='gensim')
>>> # Each document almost entirely belongs to one of the two topics.
>>> transformed_corpus = model.fit_transform(common_corpus)
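
The corpus passed to `fit_transform` above is in bag-of-words (BOW) format: each document is a list of `(token_id, count)` pairs. As a rough illustration of that shape, the sketch below builds a small BOW corpus with the standard library only. The helper name `to_bow` and the toy documents are assumptions made for this example; in practice gensim's `Dictionary.doc2bow` does this conversion.

```python
from collections import Counter


def to_bow(tokens, token2id):
    """Convert a list of tokens into sorted (token_id, count) pairs.

    Hypothetical helper for illustration; tokens missing from
    token2id are skipped.
    """
    counts = Counter(token2id[t] for t in tokens if t in token2id)
    return sorted(counts.items())


# Toy tokenized documents (assumed data, not the gensim test corpus).
docs = [
    ["human", "interface", "computer"],
    ["graph", "trees", "graph"],
]

# Assign integer ids in order of first appearance.
token2id = {}
for doc in docs:
    for token in doc:
        token2id.setdefault(token, len(token2id))

# Each document becomes a list of (token_id, count) pairs.
bow_corpus = [to_bow(doc, token2id) for doc in docs]
```

A list like `bow_corpus` has the same structure as the `X` argument expected by `fit` and the `docs` argument expected by `transform`.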
class gensim.sklearn_api.ldaseqmodel.LdaSeqTransformer(time_slice=None, id2word=None, alphas=0.01, num_topics=10, initialize='gensim', sstats=None, lda_model=None, obs_variance=0.5, chain_variance=0.005, passes=10, random_state=None, lda_inference_max_iter=25, em_min_iter=6, em_max_iter=20, chunksize=100)

Bases: sklearn.base.TransformerMixin, sklearn.base.BaseEstimator

Base sequential LDA module; wraps LdaSeqModel.

For more information take a look at David M. Blei, John D. Lafferty: “Dynamic Topic Models”.

Parameters

  • time_slice (list of int, optional) – Number of documents in each time-slice.

  • id2word (Dictionary, optional) – Mapping from an ID to the word it represents in the vocabulary.

  • alphas (float, optional) – The prior probability of each topic.

  • num_topics (int, optional) – Number of latent topics to be discovered in the corpus.

  • initialize ({'gensim', 'own', 'ldamodel'}, optional) –

    Controls the initialization of the DTM model. Supports three different modes:
    • ’gensim’: Uses gensim’s own LDA initialization.

    • ’own’: Uses your own initialization matrix of an LDA model that has been previously trained.

    • ’ldamodel’: Uses a previously trained LDA model, passed through the lda_model argument.

  • sstats (np.ndarray of shape [vocab_len, num_topics], optional) – If initialize is set to ‘own’, this matrix is used to initialize the DTM model.

  • lda_model (LdaModel, optional) – If initialize is set to ‘ldamodel’, this object is used to create the sstats initialization matrix.

  • obs_variance (float, optional) –

    Observed variance used to approximate the true and forward variance as shown in David M. Blei, John D. Lafferty: “Dynamic Topic Models”.

  • chain_variance (float, optional) – Gaussian parameter defined in the beta distribution to dictate how the beta values evolve.

  • passes (int, optional) – Number of passes over the corpus for the initial LdaModel.

  • random_state ({numpy.random.RandomState, int}, optional) – Can be a np.random.RandomState object, or the seed to generate one. Used for reproducibility of results.

  • lda_inference_max_iter (int, optional) – Maximum number of iterations in the inference step of the LDA training.

  • em_min_iter (int, optional) – Minimum number of iterations until convergence of the Expectation-Maximization algorithm.

  • em_max_iter (int, optional) – Maximum number of iterations until convergence of the Expectation-Maximization algorithm.

  • chunksize (int, optional) – Number of documents in the corpus to be processed in a chunk.
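
Since `time_slice` gives the number of documents in each slice, the corpus is expected to be ordered by time, and the slice sizes are presumably meant to add up to the number of documents. The sketch below shows one way to check this and to see which documents fall into which slice. `split_by_time_slice` is a hypothetical helper written for this example, not part of gensim, and the placeholder documents are made up.

```python
def split_by_time_slice(corpus, time_slice):
    """Split a time-ordered corpus into consecutive slices.

    Hypothetical helper, not part of gensim. Raises ValueError when
    the slice sizes do not add up to the corpus length.
    """
    corpus = list(corpus)
    if sum(time_slice) != len(corpus):
        raise ValueError(
            "time_slice sums to %d but the corpus has %d documents"
            % (sum(time_slice), len(corpus))
        )
    slices, start = [], 0
    for size in time_slice:
        slices.append(corpus[start:start + size])
        start += size
    return slices


# Nine placeholder documents split into slices of 3, 4 and 2 documents.
documents = ["doc%d" % i for i in range(9)]
slices = split_by_time_slice(documents, [3, 4, 2])
```

Running a check like this before fitting can catch a mismatched `time_slice` early, instead of relying on the model to handle it.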

fit(X, y=None)

Fit the model according to the given training data.


Parameters

X ({iterable of list of (int, number), scipy.sparse matrix}) – A collection of documents in BOW format used for training the model.

Returns

The trained model.

Return type

LdaSeqTransformer


fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters

  • X (numpy array of shape [n_samples, n_features]) – Training set.

  • y (numpy array of shape [n_samples]) – Target values.

Returns

X_new – Transformed array.

Return type

numpy array of shape [n_samples, n_features_new]


get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any


set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
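
To make the `<component>__<parameter>` naming concrete, the sketch below groups a flat parameter mapping by component. It only illustrates the naming convention and is not scikit-learn's implementation; `split_nested_params` is a hypothetical helper and `ldaseq` is an assumed pipeline step name.

```python
def split_nested_params(params):
    """Group `<component>__<parameter>` keys by component.

    Hypothetical helper illustrating the naming convention only; keys
    without a double underscore are stored under the key None.
    """
    grouped = {}
    for key, value in params.items():
        # Split on the first double underscore only.
        component, sep, name = key.partition("__")
        if not sep:
            component, name = None, key
        grouped.setdefault(component, {})[name] = value
    return grouped


# "ldaseq" is an assumed step name; "verbose" stands for a top-level parameter.
grouped = split_nested_params(
    {"ldaseq__num_topics": 3, "ldaseq__passes": 5, "verbose": True}
)
```

Here the two `ldaseq__` keys end up together under `"ldaseq"`, which mirrors how a nested object would route them to that component.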


Return type

self



transform(docs)

Infer the topic distribution for docs.

Parameters

docs ({iterable of list of (int, number), scipy.sparse matrix}) – A collection of documents in BOW format to be transformed.

Returns

The topic representation of each document.

Return type

numpy.ndarray of shape [len(docs), num_topics]
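
Because the result has one row per document and one column per topic, a common follow-up step is to pick the highest-weight topic for each document. The sketch below does this on plain Python lists with made-up values; `dominant_topics` is a hypothetical helper, and on an actual `numpy.ndarray` the equivalent would be `argmax(axis=1)`.

```python
def dominant_topics(doc_topics):
    """Return the index of the highest-weight topic for each document row."""
    return [max(range(len(row)), key=row.__getitem__) for row in doc_topics]


# Made-up topic distributions for three documents and two topics.
doc_topics = [
    [0.98, 0.02],
    [0.03, 0.97],
    [0.60, 0.40],
]
labels = dominant_topics(doc_topics)
```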