sklearn_api.ldaseqmodel – Scikit learn wrapper for LdaSeq model


Scikit-learn interface for LdaSeqModel.

Follows scikit-learn API conventions to facilitate using gensim along with scikit-learn.

Examples

>>> from gensim.test.utils import common_corpus, common_dictionary
>>> from gensim.sklearn_api.ldaseqmodel import LdaSeqTransformer
>>>
>>> # Create a sequential LDA transformer to extract 2 topics from the common corpus.
>>> # Divide the work into 3 unequal time slices.
>>> model = LdaSeqTransformer(id2word=common_dictionary, num_topics=2, time_slice=[3, 4, 2], initialize='gensim')
>>>
>>> # Each document almost entirely belongs to one of the two topics.
>>> transformed_corpus = model.fit_transform(common_corpus)

class gensim.sklearn_api.ldaseqmodel.LdaSeqTransformer(time_slice=None, id2word=None, alphas=0.01, num_topics=10, initialize='gensim', sstats=None, lda_model=None, obs_variance=0.5, chain_variance=0.005, passes=10, random_state=None, lda_inference_max_iter=25, em_min_iter=6, em_max_iter=20, chunksize=100)

Bases: sklearn.base.TransformerMixin, sklearn.base.BaseEstimator

Base Sequential LDA module; wraps LdaSeqModel.

For more information take a look at David M. Blei, John D. Lafferty: “Dynamic Topic Models”.

Parameters

time_slice (list of int, optional) – Number of documents in each time-slice.
id2word (Dictionary, optional) – Mapping from an ID to the word it represents in the vocabulary.
alphas (float, optional) – The prior probability of each topic.
num_topics (int, optional) – Number of latent topics to be discovered in the corpus.
initialize ({'gensim', 'own', 'ldamodel'}, optional) –
Controls the initialization of the DTM model. Supports three different modes:
- 'gensim': Uses gensim's own LDA initialization.
- 'own': Uses your own initialization matrix of an LDA model that has been previously trained.
- 'ldamodel': Uses a previously trained LDA model, passed through the lda_model argument.
sstats (np.ndarray of shape [vocab_len, num_topics], optional) – If initialize is set to 'own', this will be used to initialize the DTM model.
lda_model (LdaModel, optional) – If initialize is set to 'ldamodel', this object will be used to create the sstats initialization matrix.
obs_variance (float, optional) –
Observed variance used to approximate the true and forward variance as shown in David M. Blei, John D. Lafferty: “Dynamic Topic Models”.
chain_variance (float, optional) – Gaussian parameter defined in the beta distribution to dictate how the beta values evolve.
passes (int, optional) – Number of passes over the corpus for the initial LdaModel.
random_state ({numpy.random.RandomState, int}, optional) – Can be a np.random.RandomState object, or the seed to generate one. Used for reproducibility of results.
lda_inference_max_iter (int, optional) – Maximum number of iterations in the inference step of the LDA training.
em_min_iter (int, optional) – Minimum number of iterations until convergence of the Expectation-Maximization algorithm.
em_max_iter (int, optional) – Maximum number of iterations until convergence of the Expectation-Maximization algorithm.
chunksize (int, optional) – Number of documents in the corpus to be processed in a chunk.
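
Because time_slice lists the number of documents in each slice, the counts are expected to add up to the size of the corpus, with the documents ordered by time. The following is a minimal sketch of that check in plain Python; slice_bounds is a hypothetical helper for illustration and is not part of gensim.

```python
from itertools import accumulate


def slice_bounds(time_slice, num_docs):
    """Return the (start, stop) document index range of each time slice.

    Hypothetical helper (not a gensim function). Assumes the corpus is
    ordered by time, so slice i covers documents start..stop-1.
    """
    total = sum(time_slice)
    if total != num_docs:
        raise ValueError(
            "time_slice sums to %d but the corpus has %d documents"
            % (total, num_docs)
        )
    stops = list(accumulate(time_slice))  # running totals: slice end indices
    starts = [0] + stops[:-1]             # each slice starts where the last ended
    return list(zip(starts, stops))


# Same slicing as the example above: 9 documents split into 3, 4 and 2.
bounds = slice_bounds([3, 4, 2], num_docs=9)
```

Running such a check before fitting catches a mismatch between time_slice and the corpus early.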

fit(X, y=None)

Fit the model according to the given training data.

Parameters: X ({iterable of list of (int, number), scipy.sparse matrix}) – A collection of documents in BOW format used for training the model.
Returns: The trained model.
Return type: LdaSeqTransformer
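
The BOW format expected for X represents each document as a list of (token id, count) pairs. As a rough illustration, the sketch below builds such documents in plain Python; the token2id mapping and the to_bow helper are hypothetical stand-ins, since in practice a gensim Dictionary (the id2word argument) provides this mapping.

```python
from collections import Counter

# Hypothetical toy vocabulary, for illustration only.
token2id = {"human": 0, "computer": 1, "interface": 2}


def to_bow(tokens, token2id):
    """Convert a token list into sorted (token_id, count) pairs.

    Tokens missing from the vocabulary are skipped.
    """
    counts = Counter(token2id[t] for t in tokens if t in token2id)
    return sorted(counts.items())


docs = [
    ["human", "computer", "interface", "computer"],
    ["interface", "unknown"],
]
# X is an iterable of list of (int, number), the shape fit() expects.
X = [to_bow(doc, token2id) for doc in docs]
```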

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters

X (numpy array of shape [n_samples, n_features]) – Training set.
y (numpy array of shape [n_samples]) – Target values.

Returns

X_new – Transformed array.

Return type

numpy array of shape [n_samples, n_features_new]

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: mapping of string to any

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns: This estimator instance.
Return type: self
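
The <component>__<parameter> naming convention can be illustrated with a small sketch. The split_param helper below is hypothetical and only mimics how such a name breaks into its parts; it is not scikit-learn's implementation, and the step name "ldaseq" is an assumed pipeline step name.

```python
def split_param(name):
    """Split '<component>__<parameter>' into (component, parameter).

    Returns (None, name) for a plain, non-nested parameter name.
    Hypothetical helper for illustration only.
    """
    component, sep, parameter = name.partition("__")
    if not sep:
        # No double underscore: the parameter belongs to the estimator itself.
        return None, name
    return component, parameter


nested = split_param("ldaseq__num_topics")  # parameter of a nested component
plain = split_param("num_topics")           # parameter of the estimator itself
```

Under this convention, a call such as set_params(ldaseq__num_topics=5) on a pipeline would target the num_topics parameter of the step named "ldaseq".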

transform(docs)

Infer the topic distribution for docs.

Parameters: docs ({iterable of list of (int, number), scipy.sparse matrix}) – A collection of documents in BOW format to be transformed.
Returns: The topic representation of each document.
Return type: numpy.ndarray of shape [len(docs), num_topics]
