models.lsi_worker – Worker for distributed LSI
Worker (“slave”) process used in computing distributed Latent Semantic Indexing (LSI, LsiModel) models.
Run this script on every node in your cluster. You may also run it several times on a single machine to make better use of multiple cores, but beware that the memory footprint increases linearly with the number of worker processes.
Install needed dependencies (Pyro4)
pip install gensim[distributed]
Set up serialization (on each machine)
export PYRO_SERIALIZERS_ACCEPTED=pickle
export PYRO_SERIALIZER=pickle
Run nameserver
python -m Pyro4.naming -n 0.0.0.0 &
Run workers (on each machine)
python -m gensim.models.lsi_worker &
Run dispatcher
python -m gensim.models.lsi_dispatcher &
Run LsiModel in distributed mode:
>>> from gensim.test.utils import common_corpus, common_dictionary
>>> from gensim.models import LsiModel
>>>
>>> model = LsiModel(common_corpus, id2word=common_dictionary, distributed=True)
...
optional arguments:
-h, --help show this help message and exit
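The setup steps above can also be scripted. The following is a minimal sketch, assuming Python's standard subprocess module is used to start several workers on one machine. The helpers build_worker_env, build_worker_command and launch_workers are hypothetical names for illustration and are not part of gensim; they only mirror the two PYRO_* exports and the `python -m gensim.models.lsi_worker` command listed above.

```python
import os
import subprocess
import sys


def build_worker_env(base_env=None):
    """Return a copy of the environment with the Pyro pickle settings applied.

    Hypothetical helper: mirrors the two `export` lines from the setup steps.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["PYRO_SERIALIZERS_ACCEPTED"] = "pickle"
    env["PYRO_SERIALIZER"] = "pickle"
    return env


def build_worker_command(python=sys.executable):
    """Return the argument list equivalent to `python -m gensim.models.lsi_worker`."""
    return [python, "-m", "gensim.models.lsi_worker"]


def launch_workers(count):
    """Start `count` worker processes on this machine (hypothetical helper).

    Assumes the nameserver is already running and gensim[distributed] is installed.
    """
    env = build_worker_env()
    return [subprocess.Popen(build_worker_command(), env=env) for _ in range(count)]
```

Under these assumptions, `launch_workers(4)` would start four separate worker processes on the current machine. Each one is a full Python process, which is consistent with the note that the memory footprint grows linearly.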
gensim.models.lsi_worker.Worker
Bases: object
Partly initialize the model.
A full initialization requires a call to initialize().
exit()
Terminate the worker.
getstate()
Log and get the LSI model’s current projection.
Returns: The current projection.
initialize(myid, dispatcher, **model_params)
Fully initialize the worker.
Parameters:
myid (int) – An ID number used to identify this worker in the dispatcher object.
dispatcher (Dispatcher) – The dispatcher responsible for scheduling this worker.
**model_params – Keyword parameters to initialize the inner LSI model, see LsiModel.
processjob(job)
Incrementally process the job and potentially log progress.
Parameters:
job (iterable of list of (int, float)) – Corpus in BoW format.
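To make the expected shape of job concrete, here is a small sketch in plain Python. It assumes a toy vocabulary; to_bow and token2id are hypothetical names used only for illustration (gensim dictionaries provide doc2bow for the same purpose). Each document becomes a list of (token id, weight) pairs, and a job is an iterable of such documents.

```python
from collections import Counter


def to_bow(tokens, token2id):
    """Convert a token list into a sorted list of (token_id, weight) pairs.

    Hypothetical helper: tokens missing from `token2id` are ignored.
    """
    counts = Counter(token2id[t] for t in tokens if t in token2id)
    return sorted((token_id, float(count)) for token_id, count in counts.items())


# Toy vocabulary, assumed for this example only.
token2id = {"human": 0, "interface": 1, "computer": 2}

# A job is a chunk of the corpus: an iterable of BoW documents.
job = [
    to_bow(["human", "computer", "human"], token2id),
    to_bow(["interface", "unknown"], token2id),
]
# job == [[(0, 2.0), (2, 1.0)], [(1, 1.0)]]
```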
requestjob()
Request jobs from the dispatcher, in a perpetual loop until getstate() is called.
Raises:
RuntimeError – If self.model is None (i.e. worker not initialized).
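The request loop described above can be illustrated with a toy sketch. This is not gensim's implementation: ToyDispatcher and ToyWorker are hypothetical stand-ins built on the standard library, the "model" is just a list that collects documents, and the toy loop stops when the queue is empty so that the example terminates. The worker described above keeps asking for jobs until getstate() is called.

```python
import queue


class ToyDispatcher:
    """Hypothetical stand-in for the dispatcher: hands out jobs from a queue."""

    def __init__(self, jobs):
        self._jobs = queue.Queue()
        for job in jobs:
            self._jobs.put(job)
        self.done = 0

    def getjob(self, worker_id):
        # Raises queue.Empty when no job is currently available.
        return self._jobs.get(block=False)

    def jobdone(self, worker_id):
        self.done += 1


class ToyWorker:
    """Hypothetical worker showing the initialize / requestjob / getstate cycle."""

    def __init__(self):
        self.model = None  # only partly initialized until initialize() is called
        self.finished = False

    def initialize(self, myid, dispatcher):
        self.myid = myid
        self.dispatcher = dispatcher
        self.model = []  # stand-in for the inner LSI model
        self.finished = False

    def requestjob(self):
        if self.model is None:
            raise RuntimeError("worker must be initialized before receiving jobs")
        while not self.finished:
            try:
                job = self.dispatcher.getjob(self.myid)
            except queue.Empty:
                # Simplification: stop instead of polling forever.
                self.finished = True
                continue
            self.model.extend(job)  # "process" the job
            self.dispatcher.jobdone(self.myid)

    def getstate(self):
        # Asking for the state also tells the loop to stop.
        self.finished = True
        return self.model
```

Calling requestjob() on a ToyWorker before initialize() raises RuntimeError, matching the documented precondition that the model must not be None.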
reset()
Reset the worker by deleting its current projection.