summarization.summarizer – TextRank Summariser
This module provides functions for summarizing texts. Summaries are built by ranking the sentences of a text with a variation of the TextRank algorithm [1].
[1] Federico Barrios, Federico López, Luis Argerich, Rosita Wachenchauzer (2016). Variations of the Similarity Function of TextRank for Automated Summarization, https://arxiv.org/abs/1602.03606
INPUT_MIN_LENGTH - Minimum number of sentences in the text.
WEIGHT_THRESHOLD - Minimum weight of an edge between graph nodes. Smaller weights are set to zero.
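As a rough illustration of what the WEIGHT_THRESHOLD description implies, the sketch below zeroes every edge weight that falls under a threshold. The threshold value, the `prune_weak_edges` helper, and the dictionary-based edge layout are assumptions made for this example; they are not taken from the module.

```python
# Hypothetical value chosen for illustration; it may differ from the module's constant.
WEIGHT_THRESHOLD = 1e-3


def prune_weak_edges(edges, threshold=WEIGHT_THRESHOLD):
    """Return a copy of `edges` in which weights below `threshold` are set to zero."""
    return {pair: (weight if weight >= threshold else 0.0)
            for pair, weight in edges.items()}


# Edges between sentence nodes, keyed by (node, node) pairs (illustrative layout).
edges = {("s0", "s1"): 0.42, ("s0", "s2"): 0.0004, ("s1", "s2"): 0.001}
pruned = prune_weak_edges(edges)
```

Only the ("s0", "s2") edge is below the threshold here, so it is the only weight that becomes zero.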
Example
>>> from gensim.summarization.summarizer import summarize
>>> text = '''Rice Pudding - Poem by Alan Alexander Milne
... What is the matter with Mary Jane?
... She's crying with all her might and main,
... And she won't eat her dinner - rice pudding again -
... What is the matter with Mary Jane?
... What is the matter with Mary Jane?
... I've promised her dolls and a daisy-chain,
... And a book about animals - all in vain -
... What is the matter with Mary Jane?
... What is the matter with Mary Jane?
... She's perfectly well, and she hasn't a pain;
... But, look at her, now she's beginning again! -
... What is the matter with Mary Jane?
... What is the matter with Mary Jane?
... I've promised her sweets and a ride in the train,
... And I've begged her to stop for a bit and explain -
... What is the matter with Mary Jane?
... What is the matter with Mary Jane?
... She's perfectly well and she hasn't a pain,
... And it's lovely rice pudding for dinner again!
... What is the matter with Mary Jane?'''
>>> print(summarize(text))
And she won't eat her dinner - rice pudding again -
I've promised her dolls and a daisy-chain,
I've promised her sweets and a ride in the train,
And it's lovely rice pudding for dinner again!
gensim.summarization.summarizer.summarize(text, ratio=0.2, word_count=None, split=False)
Get a summarized version of the given text.
The output summary will consist of the most representative sentences and will be returned as a string, divided by newlines.
Note
The input should be a string, and must be longer than INPUT_MIN_LENGTH sentences for the summary to make sense.
The text will be split into sentences using the split_sentences method in the gensim.summarization.textcleaner module. Note that newlines divide sentences.
text (str) – Given text.
ratio (float, optional) – Number between 0 and 1 that determines the proportion of sentences from the original text to be chosen for the summary.
word_count (int or None, optional) – Determines how many words the output will contain. If both ratio and word_count are provided, ratio will be ignored.
split (bool, optional) – If True, a list of sentences will be returned. Otherwise, a joined string will be returned.
list of str – If split is True, the most representative sentences of the given text.
str – Otherwise, the most representative sentences of the given text, joined by newlines.
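The interplay of ratio and split can be sketched without the library. The function below is a hypothetical stand-in, not the module's implementation: it assumes sentence scores are already available, assumes the sentence count is `len(sentences) * ratio` truncated to an integer, and assumes selected sentences keep their original order (as the example output above suggests). The word_count parameter is left out.

```python
def pick_summary(sentences, scores, ratio=0.2, split=False):
    """Keep the top `ratio` share of sentences by score, in original order."""
    # Assumption: the number of kept sentences is truncated, not rounded.
    keep = int(len(sentences) * ratio)
    # Indices of sentences, best score first.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    # Restore document order among the selected sentences.
    chosen = sorted(ranked[:keep])
    selected = [sentences[i] for i in chosen]
    # split=True returns a list; otherwise the sentences are joined by newlines.
    return selected if split else "\n".join(selected)


sentences = ["a.", "b.", "c.", "d.", "e."]
scores = [0.1, 0.9, 0.3, 0.8, 0.2]  # made-up scores for illustration

as_list = pick_summary(sentences, scores, ratio=0.4, split=True)
as_text = pick_summary(sentences, scores, ratio=0.4)
```

Under the same truncation assumption, the 21-line poem above with the default ratio of 0.2 would give int(21 * 0.2) = 4 sentences, which matches the four lines printed in the example.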
gensim.summarization.summarizer.summarize_corpus(corpus, ratio=0.2)
Used as a helper for summarize().
Note
The input must have at least INPUT_MIN_LENGTH documents for the summary to make sense.
corpus (list of list of (int, int)) – Given corpus.
ratio (float, optional) – Number between 0 and 1 that determines the proportion of sentences from the original text to be chosen for the summary.
list of str – Most important documents of the given corpus, sorted by document score, highest first.
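The corpus type, list of list of (int, int), can be read as a bag-of-words layout in which each document is a list of (token id, count) pairs. That reading is an assumption; the text above gives only the type. The sketch below builds such a structure in plain Python, and the `to_bow_corpus` helper is hypothetical.

```python
def to_bow_corpus(tokenized_docs):
    """Map each tokenized document to a sorted list of (token_id, count) pairs."""
    token_ids = {}  # token -> integer id, assigned in order of first appearance
    corpus = []
    for tokens in tokenized_docs:
        counts = {}
        for token in tokens:
            tid = token_ids.setdefault(token, len(token_ids))
            counts[tid] = counts.get(tid, 0) + 1
        corpus.append(sorted(counts.items()))
    return corpus, token_ids


# Each inner list stands for one already-tokenized sentence.
docs = [["rice", "pudding", "again"], ["mary", "jane"], ["rice", "pudding", "rice"]]
corpus, token_ids = to_bow_corpus(docs)
```

Here the third document becomes [(0, 2), (1, 1)]: token 0 ("rice") appears twice and token 1 ("pudding") once.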