topic_coherence.direct_confirmation_measure – Direct confirmation measure module

This module contains functions to compute direct confirmation on a pair of words or word subsets.

gensim.topic_coherence.direct_confirmation_measure.aggregate_segment_sims(segment_sims, with_std, with_support)

Compute summary statistics from the segment similarities produced by pairwise comparisons of word subsets drawn from a single topic's top-N word list.

Parameters:
  • segment_sims (iterable) – floating point similarity values to aggregate.
  • with_std (bool) – Set to True to include standard deviation.
  • with_support (bool) – Set to True to include the number of elements in segment_sims (the support) in the returned statistics.
Returns:

tuple of (mean[, std[, support]]); std and support are present only when with_std and with_support are True.

Return type:

tuple
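A minimal sketch of the aggregation described above, written with the standard library only rather than gensim's own code. Two choices here are assumptions: the population standard deviation (ddof=0) is used for std, and the function always returns a tuple, following the documented return type.

```python
from statistics import mean, pstdev


def aggregate_segment_sims(segment_sims, with_std, with_support):
    """Sketch: summarize one topic's segment similarities as (mean[, std[, support]])."""
    sims = list(segment_sims)  # materialize so a generator can be read more than once
    stats = [mean(sims)]
    if with_std:
        # Population standard deviation (ddof=0); this choice is an assumption.
        stats.append(pstdev(sims))
    if with_support:
        # Support: how many similarity values went into the mean.
        stats.append(len(sims))
    return tuple(stats)


print(aggregate_segment_sims([1.0, 2.0, 3.0], with_std=True, with_support=True))
```

For the three values above the mean is 2.0, the support is 3, and the population standard deviation is sqrt(2/3).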

gensim.topic_coherence.direct_confirmation_measure.log_conditional_probability(segmented_topics, accumulator, with_std=False, with_support=False)

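This function calculates the log-conditional-probability measure, which is used by coherence measures such as u_mass. For a segment S_i = (W', W*) it is defined as: m_lc(S_i) = log[(P(W', W*) + e) / P(W*)], where e is a small constant that keeps the logarithm finite when W' and W* never co-occur.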

Parameters:
  • segmented_topics (list) – Output of the segmentation module: a list (one entry per topic) of lists of (W', W*) tuples.
  • accumulator – Word occurrence accumulator from the probability_estimation module.
  • with_std (bool) – True to also include the standard deviation across topic segment sets in addition to the mean coherence for each topic; default is False.
  • with_support (bool) – True to also include the support across topic segments, defined as the number of pairwise similarity comparisons that were used to compute the overall topic coherence; default is False.
Returns:

List of log conditional probability measures, one per topic.

Return type:

list
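The formula above can be illustrated with a small self-contained sketch. It is not gensim's implementation: instead of an accumulator it counts document frequencies directly from a list of token sets (a boolean-document probability estimate), and the helpers one_preceding_pairs and doc_freq, as well as the EPSILON value, are hypothetical names and values chosen for this example. one_preceding_pairs builds the list of (W', W*) tuples by pairing each word with every word ranked above it.

```python
import math
from statistics import mean

EPSILON = 1e-12  # small smoothing constant; the exact value is an assumption


def one_preceding_pairs(topic):
    """Hypothetical helper: pair each word W' with every word W* ranked above it."""
    return [(topic[i], topic[j]) for i in range(1, len(topic)) for j in range(i)]


def doc_freq(docs, *words):
    """Number of documents (token sets) that contain all of the given words."""
    return sum(1 for doc in docs if all(w in doc for w in words))


def log_conditional_probability(segmented_topics, docs):
    """Mean m_lc(S_i) = log[(P(W', W*) + e) / P(W*)] per topic.

    Assumes every W* occurs in at least one document, otherwise P(W*) is zero.
    """
    num_docs = len(docs)
    results = []
    for segments in segmented_topics:
        sims = []
        for w_prime, w_star in segments:
            p_joint = doc_freq(docs, w_prime, w_star) / num_docs
            p_star = doc_freq(docs, w_star) / num_docs
            sims.append(math.log((p_joint + EPSILON) / p_star))
        results.append(mean(sims))
    return results


docs = [{"cat", "dog"}, {"cat", "dog", "fish"}, {"cat"}, {"fish"}]
segmented = [one_preceding_pairs(["cat", "dog"])]  # [[("dog", "cat")]]
print(log_conditional_probability(segmented, docs))
```

Here P(dog, cat) = 2/4 and P(cat) = 3/4, so the single segment scores log(2/3), roughly -0.405.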

gensim.topic_coherence.direct_confirmation_measure.log_ratio_measure(segmented_topics, accumulator, normalize=False, with_std=False, with_support=False)
If normalize=False:
This function calculates the log-ratio measure, popularly known as PMI (pointwise mutual information), which is used by coherence measures such as c_uci. It is defined as: m_lr(S_i) = log[(P(W', W*) + e) / (P(W') * P(W*))]
If normalize=True:
This function calculates the normalized log-ratio measure, popularly known as NPMI, which is used by coherence measures such as c_v. It is defined as: m_nlr(S_i) = m_lr(S_i) / -log[P(W', W*) + e]
Parameters:
  • segmented_topics (list) – Output of the segmentation module: a list (one entry per topic) of lists of (W', W*) tuples.
  • accumulator – Word occurrence accumulator from the probability_estimation module.
  • normalize (bool) – True to compute NPMI instead of PMI; default is False.
  • with_std (bool) – True to also include the standard deviation across topic segment sets in addition to the mean coherence for each topic; default is False.
  • with_support (bool) – True to also include the support across topic segments, defined as the number of pairwise similarity comparisons that were used to compute the overall topic coherence; default is False.
Returns:

List of log ratio measures (PMI or NPMI, depending on normalize), one per topic.

Return type:

list
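A minimal sketch of both variants, again not gensim's implementation. It assumes documents are given as token sets and estimates each probability as a document frequency divided by the number of documents; doc_freq, log_ratio and the EPSILON value are hypothetical names and values chosen for this example.

```python
import math
from statistics import mean

EPSILON = 1e-12  # small smoothing constant; the exact value is an assumption


def doc_freq(docs, *words):
    """Number of documents (token sets) that contain all of the given words."""
    return sum(1 for doc in docs if all(w in doc for w in words))


def log_ratio(w_prime, w_star, docs, normalize=False):
    """PMI of one segment (W', W*), or NPMI when normalize is True."""
    n = len(docs)
    p_joint = doc_freq(docs, w_prime, w_star) / n
    p_prime = doc_freq(docs, w_prime) / n
    p_star = doc_freq(docs, w_star) / n
    m_lr = math.log((p_joint + EPSILON) / (p_prime * p_star))
    if not normalize:
        return m_lr
    # NPMI: divide PMI by -log[P(W', W*) + e].
    return m_lr / -math.log(p_joint + EPSILON)


def log_ratio_measure(segmented_topics, docs, normalize=False):
    """Mean (N)PMI over the segments of each topic."""
    return [
        mean(log_ratio(w_prime, w_star, docs, normalize) for w_prime, w_star in segments)
        for segments in segmented_topics
    ]


docs = [{"cat", "dog"}, {"cat", "dog", "fish"}, {"cat"}, {"fish"}]
segmented = [[("dog", "cat")]]
print(log_ratio_measure(segmented, docs))                  # PMI
print(log_ratio_measure(segmented, docs, normalize=True))  # NPMI
```

With P(dog) = 2/4, P(cat) = 3/4 and P(dog, cat) = 2/4, PMI is log(4/3), about 0.288, and NPMI is log(4/3) / log 2, about 0.415.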