models.callbacks – Callbacks to track and visualize LDA training

class gensim.models.callbacks.Callback(metrics)

Bases: object

Used to log/visualize the evaluation metrics during training. The values are stored at the end of each epoch.

Parameters:
  • metrics – a list of metric callbacks to evaluate. Possible values: “CoherenceMetric”, “PerplexityMetric”, “DiffMetric”, “ConvergenceMetric”
on_epoch_end(epoch, topics=None)

Log or visualize current epoch’s metric value

Parameters:
  • epoch – current epoch no.
  • topics – topic distribution from current epoch (required for coherence of unsupported topic models)
set_model(model)

Save the model instance and initialize any required variables which would be updated throughout training
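
A minimal usage sketch, assuming gensim’s LdaModel with its callbacks argument (LdaModel wraps the listed metric instances in a Callback and records their values after each pass; the toy texts below are purely illustrative):

>>> from gensim.corpora import Dictionary
>>> from gensim.models import LdaModel
>>> from gensim.models.callbacks import ConvergenceMetric, PerplexityMetric
>>>
>>> # toy data, for illustration only
>>> texts = [["human", "computer", "interface"], ["graph", "trees", "minors"]]
>>> dictionary = Dictionary(texts)
>>> corpus = [dictionary.doc2bow(text) for text in texts]
>>>
>>> # metrics are logged to the shell at the end of every pass
>>> perplexity_logger = PerplexityMetric(corpus=corpus, logger="shell")
>>> convergence_logger = ConvergenceMetric(distance="jaccard", logger="shell")
>>>
>>> model = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=5,
...                  callbacks=[perplexity_logger, convergence_logger])
>>>
>>> # per-epoch metric values, as recorded by Callback.on_epoch_end
>>> model.metrics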

class gensim.models.callbacks.CoherenceMetric(corpus=None, texts=None, dictionary=None, coherence=None, window_size=None, topn=10, logger=None, viz_env=None, title=None)

Bases: gensim.models.callbacks.Metric

Metric class for coherence evaluation

Parameters:
  • corpus – Gensim document corpus.
  • texts – Tokenized texts. Needed for coherence models that use a sliding window based probability estimator.
  • dictionary – Gensim dictionary mapping of id to word, used to create the corpus. Not needed if model.id2word is present. If both are provided, dictionary will be used.
  • window_size – Size of the window to be used for coherence measures using a boolean sliding window as their probability estimator. For ‘u_mass’ this doesn’t matter. If left None, the default window sizes are used: ‘c_v’: 110, ‘c_uci’: 10, ‘c_npmi’: 10.
  • coherence – Coherence measure to be used. Supported values are: ‘u_mass’, ‘c_v’, ‘c_uci’ (also popularly known as c_pmi) and ‘c_npmi’. For ‘u_mass’ a corpus should be provided; if texts is provided, it will be converted to a corpus using the dictionary. For ‘c_v’, ‘c_uci’ and ‘c_npmi’ texts should be provided; a corpus is not needed.
  • topn – Integer corresponding to the number of top words to be extracted from each topic.
  • logger – Monitor the training process using: “shell” (print the coherence value in the shell) or “visdom” (visualize the coherence value over epochs in the Visdom visualization framework)
  • viz_env – Visdom environment to use for plotting the graph
  • title – title of the graph plot
get_value(**kwargs)
Parameters:
  • model – Pre-trained topic model. Should be provided if topics is not provided. Currently supports LdaModel, LdaMallet wrapper and LdaVowpalWabbit wrapper. Use ‘topics’ parameter to plug in an as yet unsupported model.
  • topics – List of tokenized topics.
set_parameters(**parameters)

Set the parameters
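
A hedged usage sketch (toy texts are illustrative; for ‘c_v’ coherence the tokenized texts are needed rather than the corpus, as described above):

>>> from gensim.corpora import Dictionary
>>> from gensim.models import LdaModel
>>> from gensim.models.callbacks import CoherenceMetric
>>>
>>> texts = [["human", "computer", "interface"], ["graph", "trees", "minors"]]
>>> dictionary = Dictionary(texts)
>>> corpus = [dictionary.doc2bow(text) for text in texts]
>>>
>>> # 'c_v' coherence uses a boolean sliding window over the tokenized texts
>>> coherence_logger = CoherenceMetric(texts=texts, dictionary=dictionary,
...                                    coherence="c_v", topn=5, logger="shell")
>>> model = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=5,
...                  callbacks=[coherence_logger])
>>>
>>> # the metric can also be queried directly against a trained model
>>> coherence_logger.get_value(model=model)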

class gensim.models.callbacks.ConvergenceMetric(distance='jaccard', num_words=100, n_ann_terms=10, diagonal=True, annotation=False, normed=True, logger=None, viz_env=None, title=None)

Bases: gensim.models.callbacks.Metric

Metric class for convergence evaluation

Parameters:
  • distance – measure used to calculate the difference between any topic pair. Available values: ‘kullback_leibler’, ‘hellinger’, ‘jaccard’
  • num_words – number of most relevant words used when distance == ‘jaccard’ (also used for annotation)
  • n_ann_terms – maximum number of words in the intersection/symmetric difference between topics (used for annotation)
  • diagonal – whether to compute the difference only between identical topic numbers (the diagonal of the difference matrix)
  • annotation – whether to annotate with the intersection or symmetric difference of words between topics
  • normed (bool) – If True, the matrix/array Z will be normalized
  • logger – Monitor the training process using: “shell” (print the convergence value in the shell) or “visdom” (visualize the convergence value over epochs in the Visdom visualization framework)
  • viz_env – Visdom environment to use for plotting the graph
  • title – title of the graph plot
get_value(**kwargs)
Parameters:
  • model – Trained topic model
  • other_model – second topic model instance to calculate the difference from
set_parameters(**parameters)

Set the parameters
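
A short sketch of querying the metric directly (lda_current and lda_previous are assumed to be two trained LdaModel instances, e.g. snapshots of the same model from consecutive passes):

>>> from gensim.models.callbacks import ConvergenceMetric
>>>
>>> convergence = ConvergenceMetric(distance="jaccard", num_words=50)
>>> # returns a single value summarizing how much the topics of lda_current
>>> # differ from those of lda_previous
>>> convergence.get_value(model=lda_current, other_model=lda_previous)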

class gensim.models.callbacks.DiffMetric(distance='jaccard', num_words=100, n_ann_terms=10, diagonal=True, annotation=False, normed=True, logger=None, viz_env=None, title=None)

Bases: gensim.models.callbacks.Metric

Metric class for topic difference evaluation

Parameters:
  • distance – measure used to calculate the difference between any topic pair. Available values: ‘kullback_leibler’, ‘hellinger’, ‘jaccard’
  • num_words – number of most relevant words used when distance == ‘jaccard’ (also used for annotation)
  • n_ann_terms – maximum number of words in the intersection/symmetric difference between topics (used for annotation)
  • diagonal – whether to compute the difference only between identical topic numbers (the diagonal of the difference matrix)
  • annotation – whether to annotate with the intersection or symmetric difference of words between topics
  • normed (bool) – If True, the matrix/array Z will be normalized
  • logger – Monitor the training process using: “shell” (print the diff value in the shell) or “visdom” (visualize the diff value over epochs in the Visdom visualization framework)
  • viz_env – Visdom environment to use for plotting the graph
  • title – title of the graph plot
get_value(**kwargs)
Parameters:
  • model – Trained topic model
  • other_model – second topic model instance to calculate the difference from
set_parameters(**parameters)

Set the parameters
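
A short sketch of querying the metric directly (again assuming lda_current and lda_previous are trained LdaModel instances; here the returned value is a per-topic difference rather than the single summed value of ConvergenceMetric):

>>> from gensim.models.callbacks import DiffMetric
>>>
>>> diff = DiffMetric(distance="hellinger", num_words=50)
>>> # per-topic distances between corresponding topics of the two models
>>> diff.get_value(model=lda_current, other_model=lda_previous)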

class gensim.models.callbacks.Metric

Bases: object

Base Metric class for topic model evaluation metrics

get_value()
set_parameters(**parameters)

Set the parameters
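
Since Metric only fixes the get_value/set_parameters interface, a custom metric might be sketched by subclassing it (the class name and the statistic below are hypothetical, for illustration only):

>>> from gensim.models.callbacks import Metric
>>>
>>> class TopicCountMetric(Metric):  # hypothetical custom metric
...     """Trivial metric that just reports the number of topics."""
...     def __init__(self, logger=None, viz_env=None, title=None):
...         self.logger = logger
...         self.viz_env = viz_env
...         self.title = title
...
...     def get_value(self, **kwargs):
...         # set_parameters stores keyword arguments (e.g. model=...) as attributes
...         self.set_parameters(**kwargs)
...         return self.model.num_topics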

class gensim.models.callbacks.PerplexityMetric(corpus=None, logger=None, viz_env=None, title=None)

Bases: gensim.models.callbacks.Metric

Metric class for perplexity evaluation

Parameters:
  • corpus – Gensim document corpus
  • logger – Monitor the training process using: “shell” (print the perplexity value in the shell) or “visdom” (visualize the perplexity value over epochs in the Visdom visualization framework)
  • viz_env – Visdom environment to use for plotting the graph
  • title – title of the graph plot
get_value(**kwargs)
Parameters:
  • model – Trained topic model
set_parameters(**parameters)

Set the parameters
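
A minimal usage sketch (toy texts are illustrative; perplexity only needs the bag-of-words corpus):

>>> from gensim.corpora import Dictionary
>>> from gensim.models import LdaModel
>>> from gensim.models.callbacks import PerplexityMetric
>>>
>>> texts = [["human", "computer", "interface"], ["graph", "trees", "minors"]]
>>> dictionary = Dictionary(texts)
>>> corpus = [dictionary.doc2bow(text) for text in texts]
>>>
>>> # log perplexity to the shell after each training pass
>>> perplexity_logger = PerplexityMetric(corpus=corpus, logger="shell")
>>> model = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=5,
...                  callbacks=[perplexity_logger])
>>>
>>> # or compute it directly for an already trained model
>>> perplexity_logger.get_value(model=model)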