topic_coherence.direct_confirmation_measure – Direct confirmation measure module

This module contains functions to compute direct confirmation on a pair of words or word subsets.

gensim.topic_coherence.direct_confirmation_measure.log_conditional_probability(segmented_topics, accumulator)

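This function calculates the log-conditional-probability measure, which is used by coherence measures such as U_mass. It is defined as:

    m_lc(S_i) = log[(P(W’, W*) + e) / P(W*)]

where S_i = (W’, W*) is one segmented pair of words or word subsets, and e is a small constant that keeps the logarithm finite when W’ and W* never co-occur.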

Parameters:
  • segmented_topics – The segmented topics output by the segmentation module, as a list of lists of tuples.
  • accumulator – word occurrence accumulator from probability_estimation.
Returns:

List of log-conditional-probability measures for each topic.

Return type:

list (m_lc)
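The formula above can be sketched in plain Python. This is a minimal illustration, not gensim's implementation: the function name, the two probability dictionaries (standing in for the accumulator), the toy probabilities, and the value of EPSILON are all assumptions made for the example. It returns the raw per-pair values grouped by topic and does not show how the library aggregates them.

```python
import math

EPSILON = 1e-12  # assumed small constant; plays the role of "e" in the formula


def log_conditional_probability_sketch(segmented_topics, p_single, p_joint):
    """Compute m_lc for every (w_prime, w_star) pair, grouped by topic.

    segmented_topics: list of topics, each a list of (w_prime, w_star) tuples.
    p_single: hypothetical dict mapping a word to P(word).
    p_joint: hypothetical dict mapping frozenset({w_prime, w_star}) to P(w_prime, w_star).
    """
    per_topic = []
    for topic_segments in segmented_topics:
        values = []
        for w_prime, w_star in topic_segments:
            # Pairs that never co-occur get a joint probability of 0;
            # EPSILON keeps the logarithm finite in that case.
            joint = p_joint.get(frozenset((w_prime, w_star)), 0.0)
            values.append(math.log((joint + EPSILON) / p_single[w_star]))
        per_topic.append(values)
    return per_topic


# Toy probabilities, made up for illustration only.
p_single = {"cat": 0.5, "dog": 0.25}
p_joint = {frozenset(("cat", "dog")): 0.25}
segmented = [[("dog", "cat")]]  # one topic with one (W', W*) pair

result = log_conditional_probability_sketch(segmented, p_single, p_joint)
print(result)  # a single value close to log(0.25 / 0.5) = log(0.5)
```

In this toy case "dog" appears only together with "cat", so the measure reduces to log P(dog | cat), which is negative because "cat" also occurs without "dog".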

gensim.topic_coherence.direct_confirmation_measure.log_ratio_measure(segmented_topics, accumulator, normalize=False)
If normalize=False:
This function calculates the log-ratio measure, popularly known as PMI (pointwise mutual information), which is used by coherence measures such as c_uci. It is defined as: m_lr(S_i) = log[(P(W’, W*) + e) / (P(W’) * P(W*))]
If normalize=True:
This function calculates the normalized log-ratio measure, popularly known as NPMI, which is used by coherence measures such as c_npmi and, indirectly, c_v. It is defined as: m_nlr(S_i) = m_lr(S_i) / -log[P(W’, W*) + e]
In both cases, e is a small constant that keeps the logarithm finite when W’ and W* never co-occur.
Parameters:
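  • segmented_topics – The segmented topics output by the segmentation module, as a list of lists of tuples.
  • accumulator – word occurrence accumulator from probability_estimation.
  • normalize – If True, compute the normalized log-ratio measure (NPMI) instead of the log-ratio measure (PMI). Defaults to False.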
Returns:

List of log ratio measures for each topic.

Return type:

list (m_lr)
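The two definitions above can be checked with a short worked example. This is a sketch under stated assumptions rather than the library's code: the helper name, its probability arguments, the toy values, and the value of EPSILON are hypothetical, and the helper handles a single (W’, W*) pair instead of a full list of segmented topics.

```python
import math

EPSILON = 1e-12  # assumed small constant; plays the role of "e" in the formulas


def log_ratio_sketch(p_joint, p_w_prime, p_w_star, normalize=False):
    """Return PMI, or NPMI when normalize=True, for one (W', W*) pair."""
    m_lr = math.log((p_joint + EPSILON) / (p_w_prime * p_w_star))
    if normalize:
        # NPMI rescales PMI by -log of the joint probability.
        return m_lr / -math.log(p_joint + EPSILON)
    return m_lr


# Toy probabilities, made up for illustration only.
pmi = log_ratio_sketch(0.25, 0.5, 0.25)                   # close to log(2)
npmi = log_ratio_sketch(0.25, 0.5, 0.25, normalize=True)  # close to 0.5
independent = log_ratio_sketch(0.125, 0.5, 0.25)          # close to 0

print(pmi, npmi, independent)
```

When the two words are independent, P(W’, W*) equals P(W’) * P(W*) and PMI is close to zero. Normalization maps the score into the range from -1 to 1, which makes values easier to compare across word pairs with different frequencies.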