summarization.summarizer – TextRank Summariser

This module provides functions for summarizing texts. Summaries are built by ranking the sentences of the text with a variation of the TextRank algorithm [1].

[1] Federico Barrios, Federico López, Luis Argerich, Rosita Wachenchauzer (2016). Variations of the Similarity Function of TextRank for Automated Summarization, https://arxiv.org/abs/1602.03606
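
To make the ranking idea concrete, here is a minimal pure-Python sketch of TextRank-style sentence ranking. It is not gensim's implementation: it assumes a plain word-overlap similarity (the cited paper evaluates other similarity functions), treats every line as one sentence, and uses hypothetical helper names (split_lines, similarity, rank_sentences, summarize_sketch) together with the conventional PageRank damping factor of 0.85.

```python
import math
import re


def split_lines(text):
    # Assumption for this sketch: every non-empty line is one sentence.
    return [line.strip() for line in text.splitlines() if line.strip()]


def similarity(a, b):
    """Word overlap between two sentences, normalised by their log lengths."""
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    if not words_a or not words_b:
        return 0.0
    denominator = math.log(len(words_a)) + math.log(len(words_b))
    if denominator <= 0:
        return 0.0
    return len(words_a & words_b) / denominator


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score sentences with a weighted PageRank computed by power iteration."""
    n = len(sentences)
    weights = [
        [similarity(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sums = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on a share of its score proportional to the edge weight.
            incoming = sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(n)
                if out_sums[j] > 0
            )
            new_scores.append((1 - damping) + damping * incoming)
        scores = new_scores
    return scores


def summarize_sketch(text, ratio=0.5):
    """Keep the top-scoring fraction of sentences, in their original order."""
    sentences = split_lines(text)
    scores = rank_sentences(sentences)
    keep = max(1, int(len(sentences) * ratio))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:keep]
    return [sentences[i] for i in sorted(top)]


example = (
    "the cat sat on the mat\n"
    "the dog sat on the log\n"
    "birds fly south in winter\n"
    "the cat and the dog sat together"
)
print(summarize_sketch(example, ratio=0.5))
```

In this sketch a sentence that shares no words with any other sentence receives only the baseline score of 1 - damping, so it is the first to be dropped from the summary.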

Data

INPUT_MIN_LENGTH - Minimum number of sentences expected in the input text.
WEIGHT_THRESHOLD - Minimum weight of an edge between graph nodes. Smaller weights are set to zero.
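
As a rough illustration of what a weight threshold does, the sketch below zeroes out weak edges in a small similarity graph. The cutoff value, the edge representation, and the helper name prune_weak_edges are assumptions made for this example; they are not taken from the module.

```python
# Hypothetical cutoff for illustration only; not the module's actual WEIGHT_THRESHOLD.
EXAMPLE_WEIGHT_THRESHOLD = 0.001


def prune_weak_edges(edges, threshold=EXAMPLE_WEIGHT_THRESHOLD):
    """Return a copy of the edge map with weights below the threshold set to zero."""
    return {pair: (weight if weight >= threshold else 0.0) for pair, weight in edges.items()}


# Edges are keyed by (sentence index, sentence index) pairs.
edges = {(0, 1): 0.42, (0, 2): 0.0004, (1, 2): 0.07}
pruned = prune_weak_edges(edges)
print(pruned)  # {(0, 1): 0.42, (0, 2): 0.0, (1, 2): 0.07}
```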

Example

>>> from gensim.summarization.summarizer import summarize
>>> text = '''Rice Pudding - Poem by Alan Alexander Milne
... What is the matter with Mary Jane?
... She's crying with all her might and main,
... And she won't eat her dinner - rice pudding again -
... What is the matter with Mary Jane?
... What is the matter with Mary Jane?
... I've promised her dolls and a daisy-chain,
... And a book about animals - all in vain -
... What is the matter with Mary Jane?
... What is the matter with Mary Jane?
... She's perfectly well, and she hasn't a pain;
... But, look at her, now she's beginning again! -
... What is the matter with Mary Jane?
... What is the matter with Mary Jane?
... I've promised her sweets and a ride in the train,
... And I've begged her to stop for a bit and explain -
... What is the matter with Mary Jane?
... What is the matter with Mary Jane?
... She's perfectly well and she hasn't a pain,
... And it's lovely rice pudding for dinner again!
... What is the matter with Mary Jane?'''
>>> print(summarize(text))
And she won't eat her dinner - rice pudding again -
I've promised her dolls and a daisy-chain,
I've promised her sweets and a ride in the train,
And it's lovely rice pudding for dinner again!
gensim.summarization.summarizer.summarize(text, ratio=0.2, word_count=None, split=False)

Get a summarized version of the given text.

The output summary will consist of the most representative sentences and will be returned as a string, divided by newlines.

Note

The input should be a string, and must be longer than INPUT_MIN_LENGTH sentences for the summary to make sense. The text will be split into sentences using the split_sentences method in the gensim.summarization.textcleaner module. Note that newlines also divide sentences.

Parameters:
  • text (str) – Given text.
  • ratio (float, optional) – Number between 0 and 1 that determines the proportion of sentences from the original text to include in the summary.
  • word_count (int or None, optional) – Determines how many words the output will contain. If both ratio and word_count are provided, ratio is ignored.
  • split (bool, optional) – If True, a list of sentences is returned. Otherwise a single string, joined by newlines, is returned.
Returns:

  • list of str – If split is True, the most representative sentences of the given text, OR
  • str – Otherwise, the most representative sentences of the given text, joined by newlines.
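
The interaction between ratio, word_count and split can be sketched in plain Python. The function below is a simplified illustration, not gensim's code: the name select_summary, the (sentence, score) input format, and the rule of stopping before the word budget would be exceeded are all assumptions made for this example.

```python
def select_summary(scored, ratio=0.2, word_count=None, split=False):
    """Pick summary sentences from (sentence, score) pairs given in original text order."""
    by_score = sorted(range(len(scored)), key=lambda i: scored[i][1], reverse=True)
    if word_count is None:
        # Only ratio is in effect: keep the top fraction of sentences.
        chosen = by_score[:int(len(scored) * ratio)]
    else:
        # word_count takes precedence and ratio is ignored.
        chosen, words = [], 0
        for i in by_score:
            n = len(scored[i][0].split())
            if words + n > word_count:
                break
            chosen.append(i)
            words += n
    # Restore the original sentence order before returning.
    summary = [scored[i][0] for i in sorted(chosen)]
    return summary if split else "\n".join(summary)


scored = [
    ("Alpha beta gamma.", 0.9),
    ("Delta.", 0.1),
    ("Epsilon zeta.", 0.5),
    ("Eta theta iota kappa.", 0.7),
]
print(select_summary(scored, ratio=0.5))
# Alpha beta gamma.
# Eta theta iota kappa.
print(select_summary(scored, ratio=0.5, word_count=5, split=True))
# ['Alpha beta gamma.']
```

The second call shows that ratio has no effect once word_count is given, and that split=True changes the return value from a newline-joined string to a list.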

gensim.summarization.summarizer.summarize_corpus(corpus, ratio=0.2)
Get a list of the most important documents of a corpus using a variation of the TextRank algorithm [1].
Used as a helper for summarize().

Note

The input must have at least INPUT_MIN_LENGTH documents for the summary to make sense.

Parameters:
  • corpus (list of list of (int, int)) – Given corpus.
  • ratio (float, optional) – Number between 0 and 1 that determines the proportion of documents from the original corpus to include in the summary.
Returns:

Most important documents of given corpus sorted by the document score, highest first.

Return type:

list of str
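
The corpus parameter type, list of list of (int, int), is a bag-of-words encoding: each document is a list of (token id, count) pairs. The sketch below builds such a structure by hand to show its shape. The helper name build_corpus and the whitespace tokenisation are assumptions for this example; in practice gensim's corpora.Dictionary and its doc2bow method are the usual way to produce this format.

```python
from collections import Counter


def build_corpus(sentences):
    """Encode each sentence as a sorted list of (token_id, count) pairs."""
    token_ids = {}
    corpus = []
    for sentence in sentences:
        counts = Counter(sentence.lower().split())
        for token in counts:
            # Assign the next free integer id to every token seen for the first time.
            token_ids.setdefault(token, len(token_ids))
        corpus.append(sorted((token_ids[token], count) for token, count in counts.items()))
    return corpus, token_ids


corpus, token_ids = build_corpus(["the cat sat", "the cat saw the dog"])
print(token_ids)  # {'the': 0, 'cat': 1, 'sat': 2, 'saw': 3, 'dog': 4}
print(corpus)     # [[(0, 1), (1, 1), (2, 1)], [(0, 2), (1, 1), (3, 1), (4, 1)]]
```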