
models.callbacks – Callbacks for tracking and visualizing LDA training

class gensim.models.callbacks.Callback(metrics)

Bases: object

Used to log/visualize the evaluation metrics during training. The values are stored at the end of each epoch.

Parameters:metrics – a list of metric instances to evaluate during training. Possible values: “CoherenceMetric”, “PerplexityMetric”, “DiffMetric”, “ConvergenceMetric”.
on_epoch_end(epoch, topics=None)

Log or visualize the current epoch’s metric values.

Parameters:
  • epoch – number of the current epoch.
  • topics – topic distribution from the current epoch (required to compute coherence for topic models that are not directly supported).
set_model(model)

Save the model instance and initialize any required variables that will be updated throughout training.

class gensim.models.callbacks.CallbackAny2Vec

Bases: object

Base class to build callbacks. Callbacks are used to apply custom functions over the model at specific points during training (epoch start, batch end etc.). To implement a callback, subclass CallbackAny2Vec and override the methods you need. The example below creates two callbacks: one that saves the model after each epoch, and one that logs epoch boundaries:

>>> from gensim.test.utils import common_texts as sentences
>>> from gensim.models.callbacks import CallbackAny2Vec
>>> from gensim.models import Word2Vec
>>> from gensim.test.utils import get_tmpfile
>>>
>>> class EpochSaver(CallbackAny2Vec):
...     "Callback to save model after every epoch"
...     def __init__(self, path_prefix):
...         self.path_prefix = path_prefix
...         self.epoch = 0
...     def on_epoch_end(self, model):
...         output_path = '{}_epoch{}.model'.format(self.path_prefix, self.epoch)
...         print("Save model to {}".format(output_path))
...         model.save(output_path)
...         self.epoch += 1
...
>>>
>>> class EpochLogger(CallbackAny2Vec):
...     "Callback to log information about training"
...     def __init__(self):
...         self.epoch = 0
...     def on_epoch_begin(self, model):
...         print("Epoch #{} start".format(self.epoch))
...     def on_epoch_end(self, model):
...         print("Epoch #{} end".format(self.epoch))
...         self.epoch += 1
...
>>> epoch_saver = EpochSaver(get_tmpfile("temporary_model"))
>>> epoch_logger = EpochLogger()
>>> w2v_model = Word2Vec(sentences, iter=5, size=10, min_count=0, seed=42, callbacks=[epoch_saver, epoch_logger])
on_batch_begin(model)

Method called at the start of each batch.

Parameters:model (gensim.models.base_any2vec.BaseWordEmbeddingsModel) – Current model.
on_batch_end(model)

Method called at the end of each batch.

Parameters:model (gensim.models.base_any2vec.BaseWordEmbeddingsModel) – Current model.
on_epoch_begin(model)

Method called at the start of each epoch.

Parameters:model (gensim.models.base_any2vec.BaseWordEmbeddingsModel) – Current model.
on_epoch_end(model)

Method called at the end of each epoch.

Parameters:model (gensim.models.base_any2vec.BaseWordEmbeddingsModel) – Current model.
on_train_begin(model)

Method called at the start of the training process.

Parameters:model (gensim.models.base_any2vec.BaseWordEmbeddingsModel) – Current model.
on_train_end(model)

Method called at the end of the training process.

Parameters:model (gensim.models.base_any2vec.BaseWordEmbeddingsModel) – Current model.
class gensim.models.callbacks.CoherenceMetric(corpus=None, texts=None, dictionary=None, coherence=None, window_size=None, topn=10, logger=None, viz_env=None, title=None)

Bases: gensim.models.callbacks.Metric

Metric class for coherence evaluation

Parameters:
  • corpus – Gensim document corpus.
  • texts – Tokenized texts. Needed for coherence models that use a sliding-window-based probability estimator.
  • dictionary – Gensim dictionary mapping ids to words, used to create the corpus. Not needed if model.id2word is present; if both are provided, dictionary is used.
  • window_size – Size of the window to be used for coherence measures that use a boolean sliding window as their probability estimator. For ‘u_mass’ this doesn’t matter. If None, the default window sizes are used: ‘c_v’: 110, ‘c_uci’: 10, ‘c_npmi’: 10.
  • coherence – Coherence measure to be used. Supported values are ‘u_mass’, ‘c_v’, ‘c_uci’ (also popularly known as c_pmi) and ‘c_npmi’. For ‘u_mass’, corpus should be provided; if texts is provided, it will be converted to a corpus using the dictionary. For ‘c_v’, ‘c_uci’ and ‘c_npmi’, texts should be provided (corpus is not needed).
  • topn – Number of top words to be extracted from each topic.
  • logger – Monitor the training process using “shell” (print the coherence value in the shell) or “visdom” (plot the coherence value over epochs in the Visdom visualization framework).
  • viz_env – Visdom environment to use for plotting the graph.
  • title – Title of the graph plot.
get_value(**kwargs)
Parameters:
  • model – Pre-trained topic model. Should be provided if topics is not provided. Currently supports LdaModel, the LdaMallet wrapper and the LdaVowpalWabbit wrapper. Use the ‘topics’ parameter to plug in an as yet unsupported model.
  • topics – List of tokenized topics.
set_parameters(**parameters)

Set the parameters

class gensim.models.callbacks.ConvergenceMetric(distance='jaccard', num_words=100, n_ann_terms=10, diagonal=True, annotation=False, normed=True, logger=None, viz_env=None, title=None)

Bases: gensim.models.callbacks.Metric

Metric class for convergence evaluation

Parameters:
  • distance – Measure used to calculate the difference between any topic pair. Available values: ‘kullback_leibler’, ‘hellinger’, ‘jaccard’.
  • num_words – Number of most relevant words used if distance == ‘jaccard’ (also used for annotation).
  • n_ann_terms – Maximum number of words in the intersection/symmetric difference between topics (used for annotation).
  • diagonal – Whether to return only the difference between each topic and its counterpart, i.e. the diagonal of the difference matrix.
  • annotation – Whether to annotate each topic pair with the intersection or symmetric difference of their words.
  • normed (bool) – If True, the difference matrix/array Z will be normalized.
  • logger – Monitor the training process using “shell” (print the metric value in the shell) or “visdom” (plot the metric value over epochs in the Visdom visualization framework).
  • viz_env – Visdom environment to use for plotting the graph.
  • title – Title of the graph plot.
get_value(**kwargs)
Parameters:
  • model – Trained topic model.
  • other_model – Second topic model instance to calculate the difference from.
set_parameters(**parameters)

Set the parameters

class gensim.models.callbacks.DiffMetric(distance='jaccard', num_words=100, n_ann_terms=10, diagonal=True, annotation=False, normed=True, logger=None, viz_env=None, title=None)

Bases: gensim.models.callbacks.Metric

Metric class for topic difference evaluation

Parameters:
  • distance – Measure used to calculate the difference between any topic pair. Available values: ‘kullback_leibler’, ‘hellinger’, ‘jaccard’.
  • num_words – Number of most relevant words used if distance == ‘jaccard’ (also used for annotation).
  • n_ann_terms – Maximum number of words in the intersection/symmetric difference between topics (used for annotation).
  • diagonal – Whether to return only the difference between each topic and its counterpart, i.e. the diagonal of the difference matrix.
  • annotation – Whether to annotate each topic pair with the intersection or symmetric difference of their words.
  • normed (bool) – If True, the difference matrix/array Z will be normalized.
  • logger – Monitor the training process using “shell” (print the metric value in the shell) or “visdom” (plot the metric value over epochs in the Visdom visualization framework).
  • viz_env – Visdom environment to use for plotting the graph.
  • title – Title of the graph plot.
get_value(**kwargs)
Parameters:
  • model – Trained topic model.
  • other_model – Second topic model instance to calculate the difference from.
set_parameters(**parameters)

Set the parameters

class gensim.models.callbacks.Metric

Bases: object

Base Metric class for topic model evaluation metrics

get_value()
set_parameters(**parameters)

Set the parameters

class gensim.models.callbacks.PerplexityMetric(corpus=None, logger=None, viz_env=None, title=None)

Bases: gensim.models.callbacks.Metric

Metric class for perplexity evaluation

Parameters:
  • corpus – Gensim document corpus.
  • logger – Monitor the training process using “shell” (print the perplexity value in the shell) or “visdom” (plot the perplexity value over epochs in the Visdom visualization framework).
  • viz_env – Visdom environment to use for plotting the graph.
  • title – Title of the graph plot.
get_value(**kwargs)
Parameters:model – Trained topic model
set_parameters(**parameters)

Set the parameters