topic_coherence.direct_confirmation_measure – Direct confirmation measure module¶
This module contains functions to compute direct confirmation on a pair of words or word subsets.

gensim.topic_coherence.direct_confirmation_measure.aggregate_segment_sims(segment_sims, with_std, with_support)¶
Compute various statistics from the segment similarities generated via set pairwise comparisons of top-N word lists for a single topic.
 Parameters
segment_sims (iterable of float) – Similarity values to aggregate.
with_std (bool) – Set to True to include standard deviation.
with_support (bool) – Set to True to include number of elements in segment_sims as a statistic in the results returned.
 Returns
Tuple with (mean[, std[, support]]).
 Return type
(float[, float[, int]])
Examples
>>> from gensim.topic_coherence import direct_confirmation_measure
>>>
>>> segment_sims = [0.2, 0.5, 1., 0.05]
>>> direct_confirmation_measure.aggregate_segment_sims(segment_sims, True, True)
(0.4375, 0.36293077852394939, 4)
>>> direct_confirmation_measure.aggregate_segment_sims(segment_sims, False, False)
0.4375
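The aggregation itself is plain descriptive statistics. A minimal sketch of equivalent logic in standard-library Python (not gensim's implementation; the function name `aggregate` is illustrative) could look like:

```python
import statistics

def aggregate(segment_sims, with_std=False, with_support=False):
    """Return mean[, std[, support]] of the similarity values."""
    sims = list(segment_sims)
    mean = statistics.fmean(sims)
    stats = [mean]
    if with_std:
        # The example output above matches the population standard
        # deviation (ddof=0), which pstdev computes.
        stats.append(statistics.pstdev(sims))
    if with_support:
        # Support = number of pairwise similarity values aggregated.
        stats.append(len(sims))
    return stats[0] if len(stats) == 1 else tuple(stats)

print(aggregate([0.2, 0.5, 1., 0.05], True, True))
# -> (0.4375, 0.3629307785239489, 4)
```

Reproducing the values from the doctest above: the mean of the four similarities is 0.4375, their population standard deviation is ~0.3629, and the support is 4.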

gensim.topic_coherence.direct_confirmation_measure.log_conditional_probability(segmented_topics, accumulator, with_std=False, with_support=False)¶
Calculate the log-conditional-probability measure, which is used by coherence measures such as U_mass. This is defined as: m_lc(S_i) = log[(P(W', W*) + ε) / P(W*)].
 Parameters
segmented_topics (list of lists of (int, int)) – Output from the segmentation functions s_one_pre() or s_one_one().
accumulator (InvertedIndexAccumulator) – Word occurrence accumulator from gensim.topic_coherence.probability_estimation.
with_std (bool, optional) – True to also include standard deviation across topic segment sets, in addition to the mean coherence for each topic.
with_support (bool, optional) – True to also include support across topic segments. The support is defined as the number of pairwise similarity comparisons that were used to compute the overall topic coherence.
 Returns
Log conditional probabilities measurement for each topic.
 Return type
list of float
Examples
>>> from gensim.topic_coherence import direct_confirmation_measure, text_analysis
>>> from collections import namedtuple
>>>
>>> # Create dictionary
>>> id2token = {1: 'test', 2: 'doc'}
>>> token2id = {v: k for k, v in id2token.items()}
>>> dictionary = namedtuple('Dictionary', 'token2id, id2token')(token2id, id2token)
>>>
>>> # Initialize segmented topics and accumulator
>>> segmentation = [[(1, 2)]]
>>>
>>> accumulator = text_analysis.InvertedIndexAccumulator({1, 2}, dictionary)
>>> accumulator._inverted_index = {0: {2, 3, 4}, 1: {3, 5}}
>>> accumulator._num_docs = 5
>>>
>>> # result should be ~ ln(1 / 2) = -0.693147181
>>> result = direct_confirmation_measure.log_conditional_probability(segmentation, accumulator)[0]
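The expected value in the example can be checked by plugging the counts from the inverted index into the definition directly. A hand computation (with a small assumed ε to avoid log of zero; gensim uses its own EPSILON constant):

```python
import math

# Counts from the example: the two words co-occur in 1 of 5 documents,
# and the conditioned-on word W* appears in 2 of 5 documents.
num_docs = 5
co_occurrence = 1   # documents containing both W' and W*
w_star_count = 2    # documents containing W*

# Log conditional probability: log[(P(W', W*) + eps) / P(W*)]
eps = 1e-12  # illustrative small constant, not gensim's exact value
m_lc = math.log((co_occurrence / num_docs + eps) / (w_star_count / num_docs))
print(m_lc)  # ~ ln(1/2) = -0.6931...
```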

gensim.topic_coherence.direct_confirmation_measure.log_ratio_measure(segmented_topics, accumulator, normalize=False, with_std=False, with_support=False)¶
Compute the log ratio measure for segmented_topics.
 Parameters
segmented_topics (list of lists of (int, int)) – Output from the segmentation functions s_one_pre() or s_one_one().
accumulator (InvertedIndexAccumulator) – Word occurrence accumulator from gensim.topic_coherence.probability_estimation.
normalize (bool, optional) – Details in the “Notes” section.
with_std (bool, optional) – True to also include standard deviation across topic segment sets, in addition to the mean coherence for each topic.
with_support (bool, optional) – True to also include support across topic segments. The support is defined as the number of pairwise similarity comparisons that were used to compute the overall topic coherence.
Notes
 If normalize=False:
Calculate the log-ratio-measure, popularly known as PMI, which is used by coherence measures such as c_v. This is defined as: m_lr(S_i) = log[(P(W', W*) + ε) / (P(W') * P(W*))].
 If normalize=True:
Calculate the normalized-log-ratio-measure, popularly known as NPMI, which is used by coherence measures such as c_v. This is defined as: m_nlr(S_i) = m_lr(S_i) / -log(P(W', W*) + ε).
 Returns
Log ratio measurements for each topic.
 Return type
list of float
Examples
>>> from gensim.topic_coherence import direct_confirmation_measure, text_analysis
>>> from collections import namedtuple
>>>
>>> # Create dictionary
>>> id2token = {1: 'test', 2: 'doc'}
>>> token2id = {v: k for k, v in id2token.items()}
>>> dictionary = namedtuple('Dictionary', 'token2id, id2token')(token2id, id2token)
>>>
>>> # Initialize segmented topics and accumulator
>>> segmentation = [[(1, 2)]]
>>>
>>> accumulator = text_analysis.InvertedIndexAccumulator({1, 2}, dictionary)
>>> accumulator._inverted_index = {0: {2, 3, 4}, 1: {3, 5}}
>>> accumulator._num_docs = 5
>>>
>>> # result should be ~ ln{(1 / 5) / [(3 / 5) * (2 / 5)]} = -0.182321557
>>> result = direct_confirmation_measure.log_ratio_measure(segmentation, accumulator)[0]
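The PMI value in this example, and its NPMI normalization, can likewise be checked directly against the definitions in the Notes. A hand computation (again with a small assumed ε, not gensim's exact constant):

```python
import math

# Probabilities from the example's inverted index:
# P(W', W*) = 1/5, P(W') = 3/5, P(W*) = 2/5
num_docs = 5
p_joint = 1 / num_docs
p_w1 = 3 / num_docs
p_w2 = 2 / num_docs

eps = 1e-12  # illustrative small constant

# PMI: m_lr = log[(P(W', W*) + eps) / (P(W') * P(W*))]
m_lr = math.log((p_joint + eps) / (p_w1 * p_w2))
print(m_lr)   # ~ ln(5/6) = -0.1823...

# NPMI: m_nlr = m_lr / -log(P(W', W*) + eps); normalization bounds
# the measure to [-1, 1], with 1 meaning perfect co-occurrence.
m_nlr = m_lr / -math.log(p_joint + eps)
print(m_nlr)  # ~ -0.1133
```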