`models.rpmodel` – Random Projections

Random Projections (also known as Random Indexing).

For theoretical background on Random Projections, see [1].
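To make the idea concrete, here is a minimal standard-library sketch of a random projection. It is an illustration under stated assumptions, not Gensim's implementation: it assumes a dense matrix with entries drawn uniformly from {-1, +1} and a 1/sqrt(k) scaling, and the helper names `make_projection` and `project` are hypothetical.

```python
import math
import random


def make_projection(num_terms, num_dims, seed=0):
    """Build a num_dims x num_terms matrix with entries drawn from {-1, +1} (assumed scheme)."""
    rng = random.Random(seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(num_terms)] for _ in range(num_dims)]


def project(bow, projection):
    """Project a sparse BoW vector [(term_id, weight), ...] into the low-dimensional space."""
    # Scaling by 1/sqrt(k) keeps the expected squared length of the vector unchanged.
    scale = 1.0 / math.sqrt(len(projection))
    return [scale * sum(row[term_id] * weight for term_id, weight in bow) for row in projection]


projection = make_projection(num_terms=1000, num_dims=200)
doc = [(3, 2.0), (17, 1.0), (256, 4.0)]  # hypothetical document in BoW format

projected = project(doc, projection)
original_sq_norm = sum(weight * weight for _, weight in doc)
projected_sq_norm = sum(value * value for value in projected)
ratio = projected_sq_norm / original_sq_norm
```

With enough target dimensions, `ratio` should stay close to 1: vector lengths (and hence distances) are approximately preserved even though the vector went from 1000 to 200 dimensions.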

Examples

>>> from gensim.models import RpModel
>>> from gensim.corpora import Dictionary
>>> from gensim.test.utils import common_texts, temporary_file
>>>
>>> dictionary = Dictionary(common_texts)  # fit dictionary
>>> corpus = [dictionary.doc2bow(text) for text in common_texts]  # convert texts to BoW format
>>>
>>> model = RpModel(corpus, id2word=dictionary)  # fit model
>>> result = model[corpus[3]]  # apply model to document, result is vector in BoW format
>>>
>>> with temporary_file("model_file") as fname:
...     model.save(fname)  # save model to file
...     loaded_model = RpModel.load(fname)  # load model

References

[1] Kanerva et al., 2000, Random indexing of text samples for Latent Semantic Analysis, https://cloudfront.escholarship.org/dist/prd/content/qt5644k0w6/qt5644k0w6.pdf

class gensim.models.rpmodel.RpModel(corpus, id2word=None, num_topics=300)

Bases: TransformationABC

Parameters

corpus (iterable of iterable of (int, int)) – Input corpus.
id2word ({dict of (int, str), Dictionary}, optional) – Mapping token_id -> token. Determined from the corpus if id2word is None.
num_topics (int, optional) – Number of topics, i.e. the dimensionality of the projected (output) vectors.

__getitem__(bow)

Get random-projection representation of the input vector or corpus.

Parameters

bow ({list of (int, int), iterable of list of (int, int)}) – Input document or corpus.

Returns

list of (int, float) – if bow is a single document, or
TransformedCorpus – if bow is a corpus.

Examples

>>> from gensim.models import RpModel
>>> from gensim.corpora import Dictionary
>>> from gensim.test.utils import common_texts
>>>
>>> dictionary = Dictionary(common_texts)  # fit dictionary
>>> corpus = [dictionary.doc2bow(text) for text in common_texts]  # convert texts to BoW format
>>>
>>> model = RpModel(corpus, id2word=dictionary)  # fit model
>>>
>>> # apply model to document, result is vector in BoW format, i.e. [(1, 0.3), ... ]
>>> result = model[corpus[0]]

add_lifecycle_event(event_name, log_level=20, **event)

Append an event into the lifecycle_events attribute of this object, and also optionally log the event at log_level.

Events are important moments during the object’s life, such as “model created”, “model saved”, “model loaded”, etc.

The lifecycle_events attribute is persisted across the object’s save() and load() operations. It has no impact on the use of the model, but is useful during debugging and support.

Set self.lifecycle_events = None to disable this behaviour. Calls to add_lifecycle_event() will then no longer record events in self.lifecycle_events.

Parameters

event_name (str) – Name of the event. Can be any label, e.g. “created”, “stored” etc.
event (dict) – Key-value mapping to append to self.lifecycle_events. Should be JSON-serializable, so keep it simple. Can be empty.

This method will automatically add the following key-values to event, so you don’t have to specify them:
- datetime: the current date & time
- gensim: the current Gensim version
- python: the current Python version
- platform: the current platform
- event: the name of this event
log_level (int) – Also log the complete event dict, at the specified log level. Set to False to not log at all.

initialize(corpus)

Initialize the random projection matrix.

Parameters

corpus (iterable of iterable of (int, int)) – Input corpus.

classmethod load(fname, mmap=None)

Load an object previously saved using save() from a file.

Parameters

fname (str) – Path to file that contains needed object.
mmap (str, optional) – Memory-map option. If the object was saved with large arrays stored separately, you can load these arrays via mmap (shared memory) using mmap='r'. If the file being loaded is compressed (either '.gz' or '.bz2'), then mmap=None must be set.
