models.lsi_worker – Worker for distributed LSI

Worker (“slave”) process used to compute a distributed LsiModel.

Run this script on every node in your cluster. If you wish, you may even run it multiple times on a single machine, to make better use of multiple cores (just beware that memory footprint increases accordingly).

Warning

Requires Pyro4 to be installed. The distributed version works only within a local network.

How to use distributed LsiModel

  1. Install the required dependencies (Pyro4)

    pip install gensim[distributed]
    
  2. Set up serialization (on each machine)

    export PYRO_SERIALIZERS_ACCEPTED=pickle
    export PYRO_SERIALIZER=pickle
    
  3. Run nameserver

    python -m Pyro4.naming -n 0.0.0.0 &
    
  4. Run workers (on each machine)

    python -m gensim.models.lsi_worker &
    
  5. Run dispatcher

    python -m gensim.models.lsi_dispatcher &
    
  6. Run LsiModel in distributed mode

    >>> from gensim.test.utils import common_corpus, common_dictionary
    >>> from gensim.models import LsiModel
    >>>
    >>> model = LsiModel(common_corpus, id2word=common_dictionary, distributed=True)
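
Steps 2 to 5 can also be scripted. The following is a minimal sketch, not part of gensim: the helper names `pyro_env` and `cluster_commands` and the `workers` parameter are hypothetical, and the code only reuses the environment variables and commands listed above. It builds the command lines without starting any process.

```python
import os
import sys


def pyro_env(base=None):
    """Return a copy of the environment with pickle serialization enabled (step 2)."""
    env = dict(os.environ if base is None else base)
    env["PYRO_SERIALIZERS_ACCEPTED"] = "pickle"
    env["PYRO_SERIALIZER"] = "pickle"
    return env


def cluster_commands(workers=1, python=sys.executable):
    """Build the command lines for steps 3 to 5 on a single machine.

    `workers` is a hypothetical parameter: the number of worker
    processes to start on this machine.
    """
    # Step 3: the nameserver.
    commands = [[python, "-m", "Pyro4.naming", "-n", "0.0.0.0"]]
    # Step 4: one command per worker process.
    commands += [[python, "-m", "gensim.models.lsi_worker"] for _ in range(workers)]
    # Step 5: the dispatcher.
    commands.append([python, "-m", "gensim.models.lsi_dispatcher"])
    return commands


if __name__ == "__main__":
    for command in cluster_commands(workers=2):
        print(" ".join(command))
```

To launch the processes, each command could be passed to `subprocess.Popen(command, env=pyro_env())` in the order listed. On a multi-machine cluster, only the worker command would be repeated on the other nodes.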
    

Command line arguments

...

optional arguments:
  -h, --help  show this help message and exit
class gensim.models.lsi_worker.Worker

Bases: object

Partly initializes the model.

A full initialization requires a call to initialize().

exit()

Terminates the worker.

getstate(*args, **kwargs)

Logs and returns the LSI model’s current projection.

Returns: The current projection.
Return type: Projection
initialize(myid, dispatcher, **model_params)

Fully initializes the worker.

Parameters:
  • myid (int) – An ID number used to identify this worker in the dispatcher object.
  • dispatcher (Dispatcher) – The dispatcher responsible for scheduling this worker.
  • **model_params – Keyword parameters to initialize the inner LSI model, see LsiModel.
processjob(*args, **kwargs)

Incrementally processes the job and potentially logs progress.

Parameters: job (iterable of list of (int, float)) – Corpus in BoW format.
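
For illustration, a job in BoW format is a sequence of documents, each one a list of `(token_id, weight)` pairs. The sketch below builds such a job with the standard library only. The helper `to_bow` and the `token2id` vocabulary are hypothetical stand-ins; in gensim, a `Dictionary` and its `doc2bow` method normally do this conversion.

```python
from collections import Counter


def to_bow(tokens, token2id):
    """Convert a list of tokens into a sorted list of (token_id, weight) pairs.

    Tokens missing from `token2id` are ignored.
    """
    counts = Counter(token2id[t] for t in tokens if t in token2id)
    return sorted((token_id, float(n)) for token_id, n in counts.items())


token2id = {"human": 0, "computer": 1, "interface": 2}  # hypothetical vocabulary

# A job is an iterable of BoW documents.
job = [
    to_bow(["human", "computer", "human"], token2id),
    to_bow(["interface", "unknown"], token2id),
]
print(job)
```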
requestjob()

Requests jobs from the dispatcher in a perpetual loop, until getstate() is called.

reset(*args, **kwargs)

Resets the worker by deleting its current projection.
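
The call order implied by the methods above can be sketched without Pyro4 or gensim. Everything in this block is a hypothetical stand-in: `ToyDispatcher` and `ToyWorker` only mimic the documented method names, and the "projection" is a plain dictionary of summed term weights, not an LSI projection. The documented requestjob() loops until getstate() is called; this toy loop simply stops when the job queue is empty.

```python
from collections import defaultdict, deque


class ToyDispatcher:
    """Hypothetical stand-in for the dispatcher: hands out jobs from a queue."""

    def __init__(self, jobs):
        self.jobs = deque(jobs)

    def getjob(self, worker_id):
        # Return the next job, or None when nothing is left.
        return self.jobs.popleft() if self.jobs else None


class ToyWorker:
    """Hypothetical stand-in that mirrors the Worker method names."""

    def __init__(self):
        # Partly initialized: no ID, dispatcher, or state yet.
        self.myid = None
        self.dispatcher = None
        self.projection = None

    def initialize(self, myid, dispatcher, **model_params):
        # Full initialization: remember who we are and start from a clean state.
        self.myid = myid
        self.dispatcher = dispatcher
        self.model_params = model_params
        self.reset()

    def requestjob(self):
        # Toy version: pull jobs until the dispatcher has none left.
        while True:
            job = self.dispatcher.getjob(self.myid)
            if job is None:
                break
            self.processjob(job)

    def processjob(self, job):
        # Accumulate term weights over every BoW document in the job.
        for document in job:
            for token_id, weight in document:
                self.projection[token_id] += weight

    def getstate(self):
        return dict(self.projection)

    def reset(self):
        self.projection = defaultdict(float)


jobs = [
    [[(0, 1.0), (1, 2.0)]],    # job 1: one document
    [[(1, 1.0)], [(2, 3.0)]],  # job 2: two documents
]
worker = ToyWorker()
worker.initialize(0, ToyDispatcher(jobs), num_topics=2)
worker.requestjob()
state = worker.getstate()  # collect the accumulated state
worker.reset()             # discard it before the next run
print(state)
```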