models.lsi_worker – Worker for distributed LSI
Worker (“slave”) process used in computing distributed Latent Semantic Indexing (LSI, LsiModel) models.
Run this script on every node in your cluster. You may also run it several times on a single machine to make better use of multiple cores, but note that the memory footprint grows linearly with the number of workers.
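Starting several workers on one machine can be scripted. The following is a minimal sketch, assuming the nameserver is already running and serialization is configured as described in the usage steps; `worker_command` and `launch_workers` are hypothetical helper names, not part of gensim.

```python
import subprocess
import sys


def worker_command(python=sys.executable):
    """Build the command line for one worker process.

    Mirrors `python -m gensim.models.lsi_worker` from the usage steps.
    """
    return [python, "-m", "gensim.models.lsi_worker"]


def launch_workers(n_workers):
    """Start n_workers background worker processes and return their handles.

    Hypothetical helper: each process holds its own model state,
    so memory use grows with n_workers.
    """
    return [subprocess.Popen(worker_command()) for _ in range(n_workers)]
```

Calling `launch_workers(2)` would start two workers on the current machine; keeping the returned handles makes it possible to wait for or terminate the processes later.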
How to use distributed LSI
Install needed dependencies (Pyro4)
pip install gensim[distributed]
Set up serialization (on each machine)
export PYRO_SERIALIZERS_ACCEPTED=pickle
export PYRO_SERIALIZER=pickle
Run nameserver
python -m Pyro4.naming -n 0.0.0.0 &
Run workers (on each machine)
python -m gensim.models.lsi_worker &
Run dispatcher
python -m gensim.models.lsi_dispatcher &
Run LsiModel in distributed mode:
>>> from gensim.test.utils import common_corpus, common_dictionary
>>> from gensim.models import LsiModel
>>>
>>> model = LsiModel(common_corpus, id2word=common_dictionary, distributed=True)
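Because the serialization step has to be repeated on each machine, a small check before starting a worker or dispatcher can catch a missing setting early. This is a minimal sketch; `REQUIRED_SETTINGS` and `missing_pyro_settings` are hypothetical names, and requiring an exact match with `pickle` is a simplifying assumption taken from the step above.

```python
import os

# Values taken from the "Set up serialization" step above.
REQUIRED_SETTINGS = {
    "PYRO_SERIALIZERS_ACCEPTED": "pickle",
    "PYRO_SERIALIZER": "pickle",
}


def missing_pyro_settings(environ=None):
    """Return the required settings that are absent or different in environ.

    Hypothetical helper; an empty result means both variables are set
    exactly as in the step above.
    """
    environ = os.environ if environ is None else environ
    return {
        name: value
        for name, value in REQUIRED_SETTINGS.items()
        if environ.get(name) != value
    }
```

An empty dictionary from `missing_pyro_settings()` suggests the current shell is configured; any returned entries name the variables that still need to be exported.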
Command line arguments
...
options:
-h, --help show this help message and exit
- class gensim.models.lsi_worker.Worker
Bases: object
Partly initialize the model.
A full initialization requires a call to initialize().
- exit()
Terminate the worker.
- getstate()
Log and get the LSI model’s current projection.
- Returns
The current projection.
- Return type
- initialize(myid, dispatcher, **model_params)
Fully initialize the worker.
- Parameters
myid (int) – An ID number used to identify this worker in the dispatcher object.
dispatcher (Dispatcher) – The dispatcher responsible for scheduling this worker.
**model_params – Keyword parameters to initialize the inner LSI model, see LsiModel.
- processjob(job)
Incrementally process the job and potentially log progress.
- Parameters
job (iterable of list of (int, float)) – Corpus in BoW format.
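To make the job format concrete: each document is a list of (token id, weight) pairs, and a job is a batch of such documents. The sketch below splits a toy corpus into jobs; `chunk_corpus` is a hypothetical helper used only for illustration, and this section does not specify how the dispatcher actually divides the corpus.

```python
from itertools import islice


def chunk_corpus(corpus, chunksize):
    """Yield successive lists of at most chunksize BoW documents."""
    iterator = iter(corpus)
    while True:
        job = list(islice(iterator, chunksize))
        if not job:
            return
        yield job


# Three toy documents in BoW format: lists of (token_id, weight) pairs.
toy_corpus = [
    [(0, 1.0), (1, 1.0)],
    [(1, 2.0), (2, 1.0)],
    [(0, 1.0), (2, 3.0)],
]

# With chunksize=2, the corpus splits into one job of two documents
# and one job holding the remaining document.
jobs = list(chunk_corpus(toy_corpus, chunksize=2))
```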
- requestjob()
Request jobs from the dispatcher, in a perpetual loop until getstate() is called.
- Raises
RuntimeError – If self.model is None (i.e. worker not initialized).
- reset()
Reset the worker by deleting its current projection.
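The method descriptions above imply a lifecycle: partial construction, initialize(), a job loop, then getstate(). The following single-process toy illustrates that order only. `ToyWorker` and `ToyDispatcher` are hypothetical classes, not gensim's implementation: there is no Pyro4 or LSI here, a document counter stands in for the projection, and the loop also stops when the queue is empty so the example terminates.

```python
import queue


class ToyDispatcher:
    """Hypothetical stand-in for the dispatcher: hands out jobs from a queue."""

    def __init__(self, jobs):
        self._jobs = queue.Queue()
        for job in jobs:
            self._jobs.put(job)

    def getjob(self, worker_id):
        """Return the next job, or None when no work is left."""
        try:
            return self._jobs.get_nowait()
        except queue.Empty:
            return None


class ToyWorker:
    """Hypothetical worker that mimics the documented method order."""

    def __init__(self):
        # Partial initialization: no model until initialize() is called.
        self.model = None
        self.finished = False

    def initialize(self, myid, dispatcher, **model_params):
        self.myid = myid
        self.dispatcher = dispatcher
        # A plain dict stands in for the inner LSI model.
        self.model = {"params": model_params, "docs_seen": 0}
        self.finished = False

    def requestjob(self):
        if self.model is None:
            raise RuntimeError("worker must be initialized before requesting jobs")
        while not self.finished:
            job = self.dispatcher.getjob(self.myid)
            if job is None:
                break  # toy simplification: stop once the queue is empty
            self.processjob(job)

    def processjob(self, job):
        # Count documents instead of updating a real projection.
        self.model["docs_seen"] += sum(1 for _ in job)

    def getstate(self):
        self.finished = True  # ends the job loop, as described above
        return self.model["docs_seen"]

    def reset(self):
        self.model["docs_seen"] = 0


# Two jobs: the first with two documents, the second with one.
dispatcher = ToyDispatcher([[[(0, 1.0)], [(1, 2.0)]], [[(0, 3.0)]]])
worker = ToyWorker()
worker.initialize(0, dispatcher, num_topics=2)
worker.requestjob()
state = worker.getstate()
```

In this toy run `state` is the number of documents processed across both jobs; in the real worker, getstate() returns the model's current projection instead.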