models.rpmodel – Random Projections

Random Projections (also known as Random Indexing).

For theoretical background on Random Projections, see [1].
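
The general idea can be sketched in pure Python. This is only an illustration under stated assumptions (a dense matrix of random +1/-1 entries and a 1/sqrt(k) scaling factor); it is not gensim's implementation, and the helper names make_projection and project are hypothetical.

```python
import math
import random


def make_projection(num_terms, num_topics, seed=0):
    """Build a num_topics x num_terms matrix of random +1/-1 entries."""
    rng = random.Random(seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(num_terms)]
            for _ in range(num_topics)]


def project(bow, projection):
    """Project a sparse BoW document into the low-dimensional space."""
    num_topics = len(projection)
    scale = 1.0 / math.sqrt(num_topics)
    dense = []
    for row in projection:
        # Only the terms present in the document contribute to the sum.
        value = sum(row[term_id] * count for term_id, count in bow) * scale
        dense.append(value)
    # Return the sparse (index, value) form, dropping zero entries.
    return [(i, v) for i, v in enumerate(dense) if v != 0.0]


projection = make_projection(num_terms=12, num_topics=4)
doc = [(0, 1), (3, 2), (7, 1)]  # made-up (token_id, count) pairs
vector = project(doc, projection)
```

The output has at most num_topics entries regardless of vocabulary size, which is the point of the technique: the random matrix is fixed once and then reused for every document.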

Examples

>>> from gensim.models import RpModel
>>> from gensim.corpora import Dictionary
>>> from gensim.test.utils import common_texts, temporary_file
>>>
>>> dictionary = Dictionary(common_texts)  # fit dictionary
>>> corpus = [dictionary.doc2bow(text) for text in common_texts]  # convert texts to BoW format
>>>
>>> model = RpModel(corpus, id2word=dictionary)  # fit model
>>> result = model[corpus[3]]  # apply model to document, result is vector in BoW format
>>>
>>> with temporary_file("model_file") as fname:
...     model.save(fname)  # save model to file
...     loaded_model = RpModel.load(fname)  # load model

References

[1] Kanerva et al., 2000, Random indexing of text samples for Latent Semantic Analysis, https://cloudfront.escholarship.org/dist/prd/content/qt5644k0w6/qt5644k0w6.pdf

class gensim.models.rpmodel.RpModel(corpus, id2word=None, num_topics=300)

Bases: gensim.interfaces.TransformationABC

Parameters:
  • corpus (iterable of iterable of (int, int)) – Input corpus.
  • id2word ({dict of (int, str), Dictionary}, optional) – Mapping token_id -> token. If id2word is None, it is determined from corpus.
  • num_topics (int, optional) – Number of topics, i.e. the dimensionality of the projected (output) space.
__getitem__(bow)

Get random-projection representation of the input vector or corpus.

Parameters: bow ({list of (int, int), iterable of list of (int, int)}) – Input document or corpus.
Returns:
  • list of (int, float) – if bow is document OR
  • TransformedCorpus – if bow is corpus.

Examples

>>> from gensim.models import RpModel
>>> from gensim.corpora import Dictionary
>>> from gensim.test.utils import common_texts
>>>
>>> dictionary = Dictionary(common_texts)  # fit dictionary
>>> corpus = [dictionary.doc2bow(text) for text in common_texts]  # convert texts to BoW format
>>>
>>> model = RpModel(corpus, id2word=dictionary)  # fit model
>>> result = model[corpus[0]]  # apply model to document, result is vector in BoW format, i.e. [(1, 0.3), ... ]
initialize(corpus)

Initialize the random projection matrix.

Parameters: corpus (iterable of iterable of (int, int)) – Input corpus.
classmethod load(fname, mmap=None)

Load a previously saved object (using save()) from file.

Parameters:
  • fname (str) – Path to file that contains needed object.
  • mmap (str, optional) – Memory-map option. If the object was saved with large arrays stored separately, you can load these arrays via mmap (shared memory) using mmap='r'. If the file being loaded is compressed (either '.gz' or '.bz2'), then mmap=None must be set.

See also

save()

Returns: Object loaded from fname.
Return type: object
Raises: IOError – When the method is called on an instance (it should be called on the class).
save(fname_or_handle, separately=None, sep_limit=10485760, ignore=frozenset([]), pickle_protocol=2)

Save the object to file.

Parameters:
  • fname_or_handle (str or file-like) – Path to output file or already opened file-like object. If the object is a file handle, no special array handling will be performed; all attributes will be saved to the same file.
  • separately (list of str or None, optional) – If None, automatically detect large numpy/scipy.sparse arrays in the object being stored and store them in separate files. This avoids pickle memory errors and allows large arrays to be mmap'ed back efficiently on load. If a list of str, these attributes will be stored in separate files; the automatic check is not performed in this case.
  • sep_limit (int) – Limit for automatic separation.
  • ignore (frozenset of str) – Attributes that shouldn't be serialized/stored.
  • pickle_protocol (int) – Protocol number for pickle.

See also

load()