sklearn_api.rpmodel – Scikit learn wrapper for Random Projection model

Scikit learn interface for RpModel.

Follows scikit-learn API conventions to facilitate using gensim along with scikit-learn.

Examples

>>> from gensim.sklearn_api.rpmodel import RpTransformer
>>> from gensim.test.utils import common_dictionary, common_corpus
>>>
>>> # Initialize and fit the model.
>>> model = RpTransformer(id2word=common_dictionary).fit(common_corpus)
>>>
>>> # Use the trained model to transform a document.
>>> result = model.transform(common_corpus[3])
class gensim.sklearn_api.rpmodel.RpTransformer(id2word=None, num_topics=300)

Bases: sklearn.base.TransformerMixin, sklearn.base.BaseEstimator

Base Random Projection module, wraps RpModel.

For more information, see Random projection.

Parameters:
  • id2word (Dictionary, optional) – Mapping token_id -> token. If id2word is None, it is determined from the corpus.
  • num_topics (int, optional) – Number of dimensions of the projected (output) space.
fit(X, y=None)

Fit the model according to the given training data.

Parameters: X (iterable of list of (int, number)) – Input corpus in BOW format.
Returns: The trained model.
Return type: RpTransformer
fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (numpy array of shape [n_samples, n_features]) – Training set.
  • y (numpy array of shape [n_samples]) – Target values.
Returns: X_new – Transformed array.
Return type: numpy array of shape [n_samples, n_features_new]

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: mapping of string to any
set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns: The estimator instance, so calls can be chained.
Return type: self
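
The parameter handling described above comes from sklearn.base.BaseEstimator. The following is a minimal sketch of how get_params and set_params behave, including the <component>__<parameter> form inside a pipeline. DemoTransformer is a hypothetical stand-in that only mirrors the constructor signature of RpTransformer; it does not perform any projection and is not part of gensim.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class DemoTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in with the same constructor arguments as RpTransformer."""

    def __init__(self, id2word=None, num_topics=300):
        # BaseEstimator discovers parameters from the __init__ signature,
        # so each argument is stored under an attribute of the same name.
        self.id2word = id2word
        self.num_topics = num_topics

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X  # identity; a real wrapper would project X here


est = DemoTransformer()
params = est.get_params()

# set_params returns the estimator itself, so the call can be chained.
same = est.set_params(num_topics=50)

# Inside a pipeline, the step name and the parameter name are joined by "__".
pipe = Pipeline([("rp", DemoTransformer())])
pipe.set_params(rp__num_topics=100)
nested_value = pipe.get_params()["rp__num_topics"]
```

Because the parameters are read from the constructor signature, the same calls should apply to any estimator that follows this convention.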
transform(docs)

Find the Random Projection factors for docs.

Parameters: docs ({iterable of iterable of (int, int), list of (int, number)}) – A corpus (iterable of documents) or a single document, in BOW format.
Returns: RP representation for each input document.
Return type: numpy.ndarray of shape [len(docs), num_topics]
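
To illustrate the shape of the result, here is a minimal NumPy sketch of the general random projection idea: each BOW document is expanded to a dense term vector and multiplied by a random sign matrix. This is an illustration under stated assumptions, not gensim's implementation; the helper names (bow_to_dense, random_projection), the ±1 matrix, and the 1/sqrt(num_topics) scaling are choices made for this example.

```python
import numpy as np


def bow_to_dense(doc, num_terms):
    """Expand one BOW document [(token_id, weight), ...] into a dense vector."""
    vec = np.zeros(num_terms, dtype=np.float64)
    for token_id, weight in doc:
        vec[token_id] += weight
    return vec


def random_projection(docs, num_terms, num_topics, seed=0):
    """Project BOW documents to num_topics dimensions (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Assumed projection matrix: entries drawn uniformly from {-1, +1}.
    projection = rng.choice([-1.0, 1.0], size=(num_topics, num_terms))
    dense = np.stack([bow_to_dense(doc, num_terms) for doc in docs])
    # (n_docs, num_terms) @ (num_terms, num_topics) -> (n_docs, num_topics)
    return dense @ projection.T / np.sqrt(num_topics)


docs = [[(0, 1), (1, 1), (2, 1)], [(0, 1), (3, 2)]]
reduced = random_projection(docs, num_terms=12, num_topics=5)
print(reduced.shape)  # (2, 5)
```

The output has one row per input document and num_topics columns, matching the return shape documented for transform.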