gensim

sklearn_api.hdp – Scikit learn wrapper for Hierarchical Dirichlet Process model

Scikit learn interface for HdpModel.

Follows scikit-learn API conventions to facilitate using gensim along with scikit-learn.

Examples

>>> from gensim.test.utils import common_dictionary, common_corpus
>>> from gensim.sklearn_api import HdpTransformer
>>>
>>> # Let's extract the topic distribution of each document.
>>> model = HdpTransformer(id2word=common_dictionary)
>>> distr = model.fit_transform(common_corpus)
class gensim.sklearn_api.hdp.HdpTransformer(id2word, max_chunks=None, max_time=None, chunksize=256, kappa=1.0, tau=64.0, K=15, T=150, alpha=1, gamma=1, eta=0.01, scale=1.0, var_converge=0.0001, outputdir=None, random_state=None)

Bases: sklearn.base.TransformerMixin, sklearn.base.BaseEstimator

Base HDP module, wraps HdpModel.

The inner workings of this class depend heavily on Wang, Paisley, Blei: “Online Variational Inference for the Hierarchical Dirichlet Process”, JMLR (2011).

Parameters:
fit(X, y=None)

Fit the model according to the given training data.

Parameters: X ({iterable of list of (int, number), scipy.sparse matrix}) – A collection of documents in BOW format used for training the model.
Returns: The trained model.
Return type: HdpTransformer
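
The BOW input described above can be pictured with a minimal sketch in plain Python. This is not gensim's implementation: `to_bow` and the toy `token2id` vocabulary are hypothetical names introduced only to show the expected layout, an iterable of documents where each document is a list of (token_id, count) pairs.

```python
from collections import Counter


def to_bow(tokens, token2id):
    """Convert a tokenized document into a sorted list of (token_id, count) pairs.

    Hypothetical helper for illustration; tokens missing from token2id are skipped.
    """
    counts = Counter(token2id[t] for t in tokens if t in token2id)
    return sorted(counts.items())


# Toy vocabulary (an assumption made for this example).
token2id = {"human": 0, "interface": 1, "computer": 2}

docs = [["human", "computer", "human"], ["interface", "unknown"]]
X = [to_bow(doc, token2id) for doc in docs]
print(X)  # [[(0, 2), (2, 1)], [(1, 1)]]
```

In practice, gensim's `Dictionary.doc2bow` produces documents in this format, so a hand-written helper like the one above is normally unnecessary.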
fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (numpy array of shape [n_samples, n_features]) – Training set.
  • y (numpy array of shape [n_samples]) – Target values.
Returns:

X_new – Transformed array.

Return type:

numpy array of shape [n_samples, n_features_new]

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: mapping of string to any
partial_fit(X)

Train model over a potentially incomplete set of documents.

Uses the parameters set in the constructor. This method can be used in two ways:
  • On an unfitted model, in which case the model is initialized and trained on X.
  • On an already fitted model, in which case the model is updated by X.

Parameters: X ({iterable of list of (int, number), scipy.sparse matrix}) – A collection of documents in BOW format used for training the model.
Returns: The trained model.
Return type: HdpTransformer
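
One way to use `partial_fit` is to feed a large corpus in batches. The sketch below assumes only that the model exposes `partial_fit(X)`; `iter_batches`, `train_incrementally`, and the `RecordingModel` stand-in are hypothetical names, included so the example runs without gensim installed.

```python
from itertools import islice


def iter_batches(corpus, batch_size):
    """Yield successive lists of at most batch_size documents from corpus."""
    iterator = iter(corpus)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def train_incrementally(model, corpus, batch_size=256):
    """Call model.partial_fit once per batch and return the model."""
    for batch in iter_batches(corpus, batch_size):
        model.partial_fit(batch)
    return model


class RecordingModel:
    """Stand-in that only records batch sizes (not a real topic model)."""

    def __init__(self):
        self.batch_sizes = []

    def partial_fit(self, X):
        self.batch_sizes.append(len(X))
        return self


# Five toy BOW documents, processed two at a time.
corpus = [[(0, 1)], [(1, 2)], [(0, 1), (2, 1)], [(2, 3)], [(1, 1)]]
stub = train_incrementally(RecordingModel(), corpus, batch_size=2)
print(stub.batch_sizes)  # [2, 2, 1]
```

With gensim available, an `HdpTransformer(id2word=...)` instance could presumably be passed in place of the stand-in: the first call would initialize and train the model, and later calls would update it.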
set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:
Return type:self
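
The `<component>__<parameter>` naming can be illustrated with a small sketch. It only mimics how a nested key is split at the first double underscore and is not scikit-learn's actual code; `split_nested_params`, the step name `hdp`, and the `verbose` key are hypothetical.

```python
def split_nested_params(params):
    """Group parameters by component using the '<component>__<parameter>' convention.

    Keys without '__' are collected under the empty-string component,
    meaning they belong to the outer estimator itself.
    """
    grouped = {}
    for key, value in params.items():
        component, delim, sub_key = key.partition("__")
        if delim:
            grouped.setdefault(component, {})[sub_key] = value
        else:
            grouped.setdefault("", {})[key] = value
    return grouped


# "hdp" is a made-up pipeline step name; kappa and T are HdpTransformer parameters.
grouped = split_nested_params({"hdp__kappa": 0.8, "hdp__T": 100, "verbose": True})
print(grouped)  # {'hdp': {'kappa': 0.8, 'T': 100}, '': {'verbose': True}}
```

Under this convention, a pipeline step named `hdp` would receive `kappa` and `T`, while `verbose` would be set on the outer object.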
transform(docs)

Infer the topic distribution for the given documents in BOW format. In the returned matrix, entry a_ij is the probability of topic j in document i.

Parameters: docs ({iterable of list of (int, number), list of (int, number)}) – Document or sequence of documents in BOW format.
Returns: Topic distribution for docs.
Return type: numpy.ndarray of shape [len(docs), num_topics]
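
The layout of the returned matrix can be sketched in plain Python: one row per document, one column per topic, with zeros for topics that do not appear in a document. This illustrates the shape only and is not gensim's implementation; `to_dense_rows` and the probabilities are made up for the example.

```python
def to_dense_rows(sparse_docs, num_topics):
    """Expand per-document (topic_id, probability) lists into dense rows."""
    rows = []
    for doc_topics in sparse_docs:
        row = [0.0] * num_topics
        for topic_id, prob in doc_topics:
            row[topic_id] = prob
        rows.append(row)
    return rows


# Two toy documents with invented topic probabilities.
sparse_docs = [[(0, 0.7), (2, 0.3)], [(1, 1.0)]]
dense = to_dense_rows(sparse_docs, num_topics=3)
print(dense)  # [[0.7, 0.0, 0.3], [0.0, 1.0, 0.0]]
```

Here `dense` has len(docs) rows and num_topics columns, matching the shape documented above.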