models.rpmodel – Random Projections

Random Projections (also known as Random Indexing).

For theoretical background on Random Projections, see [1].
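
As a rough illustration of the idea, the sketch below projects a sparse bag-of-words vector through a random matrix with entries in {-1, +1}, using only the standard library. The function names, the seed, and the 1/sqrt(num_topics) scaling are assumptions made for this example; it is not a description of how RpModel is implemented internally.

```python
import math
import random


def random_projection_matrix(num_topics, num_terms, seed=0):
    """Build a num_topics x num_terms matrix with entries drawn from {-1, +1}."""
    rng = random.Random(seed)
    return [
        [rng.choice((-1.0, 1.0)) for _ in range(num_terms)]
        for _ in range(num_topics)
    ]


def project(bow, matrix):
    """Project a sparse BoW vector [(term_id, weight), ...] to len(matrix) dimensions."""
    scale = 1.0 / math.sqrt(len(matrix))  # assumed normalisation, for illustration only
    return [
        (topic_id, scale * sum(row[term_id] * weight for term_id, weight in bow))
        for topic_id, row in enumerate(matrix)
    ]


matrix = random_projection_matrix(num_topics=3, num_terms=5)
doc = [(0, 1), (2, 2), (4, 1)]  # hypothetical document over a 5-term vocabulary
projected = project(doc, matrix)
print(projected)
```

Each output coordinate is a signed sum of the input weights, so a 5-term document is reduced to 3 dense values here.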

Examples

>>> from gensim.models import RpModel
>>> from gensim.corpora import Dictionary
>>> from gensim.test.utils import common_texts, temporary_file
>>>
>>> dictionary = Dictionary(common_texts)  # fit dictionary
>>> corpus = [dictionary.doc2bow(text) for text in common_texts]  # convert texts to BoW format
>>>
>>> model = RpModel(corpus, id2word=dictionary)  # fit model
>>> result = model[corpus[3]]  # apply model to document, result is vector in BoW format
>>>
>>> with temporary_file("model_file") as fname:
...     model.save(fname)  # save model to file
...     loaded_model = RpModel.load(fname)  # load model

References

[1] Kanerva et al., 2000, Random indexing of text samples for Latent Semantic Analysis, https://cloudfront.escholarship.org/dist/prd/content/qt5644k0w6/qt5644k0w6.pdf

class gensim.models.rpmodel.RpModel(corpus, id2word=None, num_topics=300)

Bases: TransformationABC

Parameters
  • corpus (iterable of iterable of (int, int)) – Input corpus.

  • id2word ({dict of (int, str), Dictionary}, optional) – Mapping token_id -> token. Determined from corpus if id2word is None.

  • num_topics (int, optional) – Number of topics, i.e. the dimensionality of the space the input vectors are projected into.

__getitem__(bow)

Get random-projection representation of the input vector or corpus.

Parameters

bow ({list of (int, int), iterable of list of (int, int)}) – Input document or corpus.

Returns

  • list of (int, float) – if bow is a single document, or

  • TransformedCorpus – if bow is a corpus.

Examples

>>> from gensim.models import RpModel
>>> from gensim.corpora import Dictionary
>>> from gensim.test.utils import common_texts
>>>
>>> dictionary = Dictionary(common_texts)  # fit dictionary
>>> corpus = [dictionary.doc2bow(text) for text in common_texts]  # convert texts to BoW format
>>>
>>> model = RpModel(corpus, id2word=dictionary)  # fit model
>>>
>>> # apply model to document, result is vector in BoW format, i.e. [(1, 0.3), ... ]
>>> result = model[corpus[0]]

add_lifecycle_event(event_name, log_level=20, **event)

Append an event into the lifecycle_events attribute of this object, and also optionally log the event at log_level.

Events are important moments during the object’s life, such as “model created”, “model saved”, “model loaded”, etc.

The lifecycle_events attribute is persisted across the object's save() and load() operations. It has no impact on the use of the model, but is useful during debugging and support.

Set self.lifecycle_events = None to disable this behaviour. Calls to add_lifecycle_event() will not record events into self.lifecycle_events then.

Parameters
  • event_name (str) – Name of the event. Can be any label, e.g. “created”, “stored” etc.

  • event (dict) –

    Key-value mapping to append to self.lifecycle_events. Should be JSON-serializable, so keep it simple. Can be empty.

    This method will automatically add the following key-values to event, so you don’t have to specify them:

    • datetime: the current date & time

    • gensim: the current Gensim version

    • python: the current Python version

    • platform: the current platform

    • event: the name of this event

  • log_level (int) – Also log the complete event dict, at the specified log level. Set to False to not log at all.

initialize(corpus)

Initialize the random projection matrix.

Parameters

corpus (iterable of iterable of (int, int)) – Input corpus.

classmethod load(fname, mmap=None)

Load an object previously saved using save() from a file.

Parameters
  • fname (str) – Path to file that contains needed object.

  • mmap (str, optional) – Memory-map option. If the object was saved with large arrays stored separately, you can load these arrays via mmap (shared memory) using mmap='r'. If the file being loaded is compressed (either '.gz' or '.bz2'), then mmap=None must be set.

See also

save()

Save object to file.

Returns

Object loaded from fname.

Return type

object

Raises

AttributeError – When called on an object instance instead of class (this is a class method).

save(fname_or_handle, separately=None, sep_limit=10485760, ignore=frozenset({}), pickle_protocol=4)

Save the object to a file.

Parameters
  • fname_or_handle (str or file-like) – Path to output file or already opened file-like object. If the object is a file handle, no special array handling will be performed; all attributes will be saved to the same file.

  • separately (list of str or None, optional) –

    If None, automatically detect large numpy/scipy.sparse arrays in the object being stored, and store them into separate files. This prevents memory errors for large objects, and also allows memory-mapping the large arrays for efficient loading and for sharing them in RAM between multiple processes.

    If list of str: store these attributes into separate files. The automated size check is not performed in this case.

  • sep_limit (int, optional) – Don’t store arrays smaller than this separately. In bytes.

  • ignore (frozenset of str, optional) – Attributes that shouldn’t be stored at all.

  • pickle_protocol (int, optional) – Protocol number for pickle.

See also

load()

Load object from file.