gensim logo

gensim tagline

Get Expert Help

• machine learning, NLP, data mining

• custom SW design, development, optimizations

• corporate trainings & IT consulting

models.rpmodel – Random Projections

models.rpmodel – Random Projections

class gensim.models.rpmodel.RpModel(corpus, id2word=None, num_topics=300)

Objects of this class allow building and maintaining a model for Random Projections (also known as Random Indexing). For theoretical background on RP, see:

Kanerva et al.: “Random indexing of text samples for Latent Semantic Analysis.”

The main methods are:

  1. constructor, which creates the random projection matrix
  2. the [] method, which transforms a simple count representation into the TfIdf space.
>>> rp = RpModel(corpus)
>>> print(rp[some_doc])

Model persistency is achieved via its load/save methods.

id2word is a mapping from word ids (integers) to words (strings). It is used to determine the vocabulary size, as well as for debugging and topic printing. If not set, it will be determined from the corpus.


Initialize the random projection matrix.

load(fname, mmap=None)

Load a previously saved object from file (also see save).

If the object was saved with large arrays stored separately, you can load these arrays via mmap (shared memory) using mmap=’r’. Default: don’t use mmap, load large arrays as normal objects.

If the file being loaded is compressed (either ‘.gz’ or ‘.bz2’), then mmap=None must be set. Load will raise an IOError if this condition is encountered.

save(fname_or_handle, separately=None, sep_limit=10485760, ignore=frozenset([]), pickle_protocol=2)

Save the object to file (also see load).

fname_or_handle is either a string specifying the file name to save to, or an open file-like object which can be written to. If the object is a file handle, no special array handling will be performed; all attributes will be saved to the same file.

If separately is None, automatically detect large numpy/scipy.sparse arrays in the object being stored, and store them into separate files. This avoids pickle memory errors and allows mmap’ing large arrays back on load efficiently.

You can also set separately manually, in which case it must be a list of attribute names to be stored in separate files. The automatic check is not performed in this case.

ignore is a set of attribute names to not serialize (file handles, caches etc). On subsequent load() these attributes will be set to None.

pickle_protocol defaults to 2 so the pickled object can be imported in both Python 2 and 3.