summarization.mz_entropy – Keywords for the Montemurro and Zanette entropy algorithm

gensim.summarization.mz_entropy.mz_keywords(text, blocksize=1024, scores=False, split=False, weighted=True, threshold=0.0)

Extract keywords from text using the Montemurro and Zanette entropy algorithm. [1]

Parameters:
  • text (str) – Document for summarization.
  • blocksize (int, optional) – Size of blocks to use in analysis.
  • scores (bool, optional) – Whether to return score with keywords.
  • split (bool, optional) – Whether to return results as list.
  • weighted (bool, optional) – Whether to weight scores by word frequency. False can be useful for shorter texts, and allows automatic thresholding.
  • threshold (float or 'auto', optional) – Minimum score for returned keywords. 'auto' calculates the threshold as n_blocks / (n_blocks + 1.0) + 1e-8 and is intended for use with weighted=False.
Returns:
  • results (str) – newline-separated keywords if scores == False and split == False, OR
  • results (list(str)) – list of keywords if scores == False and split == True, OR
  • results (list(tuple(str, float))) – list of (keyword, score) tuples if scores == True.
  • Results are returned in descending order of score regardless of the format.
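As a rough illustration of the 'auto' threshold formula given under Parameters, the sketch below computes the value by hand and wraps a call to mz_keywords. It makes two assumptions that the text above does not state: that n_blocks is the number of blocksize-word blocks with a partial final block counted, and that a gensim release which still ships gensim.summarization is installed (the wrapper imports it lazily for that reason). The helper names are hypothetical.

```python
import math


def auto_threshold(n_words, blocksize=1024):
    """Value that threshold='auto' would correspond to, per the documented formula.

    Assumption: n_blocks is the number of blocksize-word blocks in the text,
    counting a partial final block.
    """
    n_blocks = math.ceil(n_words / blocksize)
    return n_blocks / (n_blocks + 1.0) + 1e-8


def keywords_with_auto_threshold(text, blocksize=1024):
    """Hypothetical wrapper: unweighted scores with automatic thresholding."""
    # Imported lazily; assumes a gensim version that includes gensim.summarization.
    from gensim.summarization.mz_entropy import mz_keywords

    # scores=True returns (keyword, score) tuples, sorted by descending score.
    return mz_keywords(
        text,
        blocksize=blocksize,
        scores=True,
        weighted=False,
        threshold='auto',
    )


# A text of 4096 words with the default blocksize gives 4 blocks.
example_threshold = auto_threshold(4096)
```

Under these assumptions the threshold approaches 1.0 from below as the text grows, so longer texts impose a stricter cut-off on unweighted scores.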

Note

This algorithm looks for keywords that contribute to the structure of the text on scales of blocksize words or larger. It is suitable for extracting keywords representing the major themes of long texts.
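To give a feel for what "structure on scales of blocksize words" means, here is a deliberately simplified sketch, not gensim's implementation. It splits a word list into consecutive blocks and measures the entropy of each word's distribution across those blocks. The function name and the toy data are hypothetical, and the comparison against the entropy expected for a randomly shuffled text, which the cited paper uses to turn this into a score, is omitted.

```python
import math
from collections import Counter


def block_entropy(words, blocksize):
    """Entropy in bits of each word's distribution across consecutive blocks.

    Simplified illustration only: a word concentrated in few blocks gets low
    entropy, a word spread evenly over all blocks gets high entropy.
    """
    blocks = [words[i:i + blocksize] for i in range(0, len(words), blocksize)]
    per_block = [Counter(block) for block in blocks]
    totals = Counter(words)

    entropy = {}
    for word, total in totals.items():
        h = 0.0
        for counts in per_block:
            c = counts.get(word, 0)
            if c:
                p = c / total  # share of this word's occurrences in the block
                h -= p * math.log2(p)
        entropy[word] = h
    return entropy


# Toy text: "the" is spread evenly over both blocks, while "whale" and "ship"
# each occur in a single block.
toy_words = ["the", "whale", "the", "whale", "the", "ship", "the", "ship"]
toy_entropy = block_entropy(toy_words, blocksize=4)
```

In this toy case "the" has an entropy of 1 bit while "whale" and "ship" have 0, reflecting that the latter two are tied to particular parts of the text. With realistic block sizes, such uneven distributions only become measurable in long documents.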

References

[1] Marcello A. Montemurro, Damian Zanette, "Towards the quantification of the semantic information encoded in written language", Advances in Complex Systems, Volume 13, Issue 2 (2010), pp. 135-153, DOI: 10.1142/S0219525910002530, https://arxiv.org/abs/0907.1558