summarization.summarizer – TextRank Summariser

gensim.summarization.summarizer.summarize(text, ratio=0.2, word_count=None, split=False)

Returns a summarized version of the given text using a variation of the TextRank algorithm (see https://arxiv.org/abs/1602.03606).

The output summary will consist of the most representative sentences and will be returned as a string, divided by newlines. If the split parameter is set to True, a list of sentences will be returned instead.
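To make the ranking idea concrete, here is a minimal pure-Python sketch of a TextRank-style extractive summary. It is not gensim's implementation. The function name `sketch_summarize`, the regex sentence splitter, the word-overlap similarity (the one from the original TextRank formulation, not the variation described in the linked paper), the damping factor and the iteration count are all assumptions made for illustration.

```python
import math
import re


def sketch_summarize(text, ratio=0.2, split=False):
    """Simplified TextRank-style summary (illustrative, not gensim's code)."""
    # Assumption: sentences end at ., ! or ? followed by whitespace, or at a newline.
    pieces = re.split(r"(?<=[.!?])\s+|\n+", text)
    sentences = [p.strip() for p in pieces if p.strip()]
    tokens = [set(re.findall(r"\w+", s.lower())) for s in sentences]
    n = len(sentences)
    if n == 0:
        return [] if split else ""

    def similarity(a, b):
        # Word overlap normalised by sentence length (classic TextRank weight).
        if len(a) < 2 or len(b) < 2:
            return 0.0
        return len(a & b) / (math.log(len(a)) + math.log(len(b)))

    # Dense weighted graph: one node per sentence, no self-loops.
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]

    # Weighted PageRank by power iteration (damping 0.85 is an assumption).
    scores = [1.0] * n
    for _ in range(50):
        scores = [
            0.15 + 0.85 * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]

    # Keep the top-scoring sentences, then restore document order.
    keep = max(1, int(n * ratio))
    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:keep]
    chosen = [sentences[i] for i in sorted(best)]
    return chosen if split else "\n".join(chosen)
```

As with the documented function, the sketch returns one newline-separated string by default and a list of sentences when `split=True`.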

The input should be a string, and must be longer than INPUT_MIN_LENGTH sentences for the summary to make sense. The text will be split into sentences using the split_sentences function in the summarization.textcleaner module. Note that newlines divide sentences.
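The input rules above can be sketched as follows. This is a hedged illustration, not the library's split_sentences function: `split_into_sentences`, `check_input`, the regex, the default threshold of 10 and the choice to warn rather than raise are all assumptions.

```python
import re
import warnings


def split_into_sentences(text):
    """Split on newlines first, then on sentence-ending punctuation."""
    sentences = []
    for line in text.splitlines():
        # Assumption: ., ! or ? followed by whitespace ends a sentence.
        for part in re.split(r"(?<=[.!?])\s+", line):
            part = part.strip()
            if part:
                sentences.append(part)
    return sentences


def check_input(text, min_sentences=10):
    """Validate the input and return its sentences.

    min_sentences stands in for INPUT_MIN_LENGTH; the value 10 and the
    boundary handling are assumptions made for this sketch.
    """
    if not isinstance(text, str):
        raise TypeError("text must be a string")
    sentences = split_into_sentences(text)
    if len(sentences) < min_sentences:
        warnings.warn(
            "input has only %d sentences; the summary may not be meaningful"
            % len(sentences)
        )
    return sentences
```

Because newlines divide sentences, a heading or a line without closing punctuation is treated as a sentence of its own.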

The length of the output can be specified using the ratio and word_count parameters:

ratio should be a number between 0 and 1 that sets the fraction of the original text's sentences to be chosen for the summary (defaults to 0.2).

word_count determines how many words the output will contain. If both parameters are provided, ratio is ignored.
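The interplay of the two length parameters can be sketched like this. The helper `select_sentences` is hypothetical, and its word-budget rule (stop before the budget would be exceeded) is a simplifying assumption, so the real function may land closer to the requested word count.

```python
def select_sentences(ranked, ratio=0.2, word_count=None):
    """Pick sentences from a list ordered from most to least important.

    Hypothetical helper: ratio is used only when word_count is None.
    """
    if word_count is None:
        # Keep a fixed fraction of the sentences.
        keep = int(len(ranked) * ratio)
        return ranked[:keep]

    # word_count takes precedence: add sentences while the budget allows.
    selected = []
    words = 0
    for sentence in ranked:
        length = len(sentence.split())
        if words + length > word_count:
            break
        selected.append(sentence)
        words += length
    return selected
```

For example, with five ranked sentences, `ratio=0.2` alone keeps one sentence, while passing `word_count=5` as well ignores the ratio and keeps as many top sentences as fit in five words.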
gensim.summarization.summarizer.summarize_corpus(corpus, ratio=0.2)