
sklearn_integration.sklearn_wrapper_gensim_ldamodel – Scikit learn wrapper for Latent Dirichlet Allocation


Scikit-learn interface for gensim, for easy use of gensim models with scikit-learn. Follows scikit-learn API conventions.

class gensim.sklearn_integration.sklearn_wrapper_gensim_ldamodel.SklLdaModel(num_topics=100, id2word=None, chunksize=2000, passes=1, update_every=1, alpha='symmetric', eta=None, decay=0.5, offset=1.0, eval_every=10, iterations=50, gamma_threshold=0.001, minimum_probability=0.01, random_state=None)

Bases: gensim.sklearn_integration.base_sklearn_wrapper.BaseSklearnWrapper, sklearn.base.TransformerMixin, sklearn.base.BaseEstimator

Base LDA module.

Sklearn wrapper for the LDA model; a class derived from gensim.models.LdaModel.

fit(X, y=None)

Fit the model according to the given training data. Internally calls gensim.models.LdaModel.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits the transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:

  • X (numpy array of shape [n_samples, n_features]) – Training set.
  • y (numpy array of shape [n_samples]) – Target values.

Returns:

X_new – Transformed array.

Return type:

numpy array of shape [n_samples, n_features_new]


get_params(deep=True)

Returns all parameters as a dictionary.


partial_fit(X)

Train the model on the data in X. By default, ‘online (single-pass)’ mode is used for training the LDA model. Configure the passes and update_every params at init to choose the mode among:

  • online (single-pass): update_every != None and passes == 1
  • online (multi-pass): update_every != None and passes > 1
  • batch: update_every == None
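The mode selection above can be sketched as a small helper. The function is hypothetical, for illustration only; the wrapper applies these rules internally via gensim.models.LdaModel:

```python
def lda_training_mode(update_every, passes):
    """Name the LDA training mode implied by the init params
    (mirrors the rules listed above; illustrative helper only)."""
    if update_every is None:
        return "batch"
    if passes == 1:
        return "online (single-pass)"
    return "online (multi-pass)"

print(lda_training_mode(update_every=1, passes=1))     # online (single-pass)
print(lda_training_mode(update_every=1, passes=5))     # online (multi-pass)
print(lda_training_mode(update_every=None, passes=1))  # batch
```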

set_params(**parameters)

Set all parameters.


transform(docs)

Takes a list of documents (docs) as input. Returns a matrix of topic distributions for the given documents, where entry a_ij is the probability of topic j in document i. The input documents should be in BOW format and can be a list of documents, e.g. [ [(4, 1), (7, 1)], [(9, 1), (13, 1)], [(2, 1), (6, 1)] ], or a single document, e.g. [(4, 1), (7, 1)].
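The sparse per-document topic distributions described above can be laid out as the dense a_ij matrix with plain Python. A sketch under stated assumptions: the helper name and the probability values are illustrative, not part of the wrapper's API:

```python
def to_dense_matrix(doc_topics, num_topics):
    """Convert per-document sparse (topic_id, probability) lists into a
    dense matrix where entry [i][j] is topic j's probability in doc i."""
    matrix = []
    for topics in doc_topics:
        row = [0.0] * num_topics
        for topic_id, prob in topics:
            row[topic_id] = prob
        matrix.append(row)
    return matrix

# Sparse topic output for two documents over 3 topics (illustrative values)
sparse = [[(0, 0.9), (2, 0.1)], [(1, 1.0)]]
dense = to_dense_matrix(sparse, num_topics=3)
print(dense)  # [[0.9, 0.0, 0.1], [0.0, 1.0, 0.0]]
```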