gensim logo

gensim
gensim tagline

Get Expert Help From The Gensim Authors

Consulting in Machine Learning & NLP

• Commercial document similarity engine: ScaleText.ai

Corporate trainings in Python Data Science and Deep Learning

models.lda_worker – Worker for distributed LDA

models.lda_worker – Worker for distributed LDA

Worker (“slave”) process used in computing distributed Latent Dirichlet Allocation (LDA, LdaModel).

Run this script on every node in your cluster. If you wish, you may even run it multiple times on a single machine, to make better use of multiple cores (just beware that memory footprint increases accordingly).

How to use distributed LdaModel

  1. Install needed dependencies (Pyro4)

    pip install gensim[distributed]
    
  2. Setup serialization (on each machine)

    export PYRO_SERIALIZERS_ACCEPTED=pickle
    export PYRO_SERIALIZER=pickle
    
  3. Run nameserver

    python -m Pyro4.naming -n 0.0.0.0 &
    
  4. Run workers (on each machine)

    python -m gensim.models.lda_worker &
    
  5. Run dispatcher

    python -m gensim.models.lda_dispatcher &
    
  6. Run LdaModel in distributed mode

    >>> from gensim.test.utils import common_corpus,common_dictionary
    >>> from gensim.models import LdaModel
    >>>
    >>> model = LdaModel(common_corpus, id2word=common_dictionary, distributed=True)
    

Command line arguments

...
optional arguments:
  -h, --help      show this help message and exit
  --host HOST     Nameserver hostname (default: None)
  --port PORT     Nameserver port (default: None)
  --no-broadcast  Disable broadcast (default: True)
  --hmac HMAC     Nameserver hmac key (default: None)
  -v, --verbose   Verbose flag
class gensim.models.lda_worker.Worker

Used as a Pyro4 class with exposed methods.

Exposes every non-private method and property of the class automatically to be available for remote access.

Partly initialize the model.

exit()

Terminate the worker.

getstate(*args, **kwargs)

Log and get the LDA model’s current state.

Returns:result – The current state.
Return type:LdaState
initialize(myid, dispatcher, **model_params)

Fully initialize the worker.

Parameters:
  • myid (int) – An ID number used to identify this worker in the dispatcher object.
  • dispatcher (Dispatcher) – The dispatcher responsible for scheduling this worker.
  • **model_params – Keyword parameters to initialize the inner LDA model,see LdaModel.
ping()

Test the connectivity with Worker.

processjob(*args, **kwargs)

Incrementally process the job and potentially logs progress.

Parameters:job (iterable of list of (int, float)) – Corpus in BoW format.
requestjob()

Request jobs from the dispatcher, in a perpetual loop until gensim.models.lda_worker.Worker.getstate() is called.

Raises:RuntimeError – If self.model is None (i.e. worker non initialized).
reset(*args, **kwargs)

Reset the worker by setting sufficient stats to 0.

Parameters:state (LdaState) – Encapsulates information for distributed computation of LdaModel objects.