summarization.keywords – Keywords for TextRank summarization algorithm

This module contains functions for extracting keywords from a text and for building a graph over the text's tokens.

Examples

Extract keywords from text

>>> from gensim.summarization import keywords
>>> text = '''Challenges in natural language processing frequently involve
... speech recognition, natural language understanding, natural language
... generation (frequently from formal, machine-readable logical forms),
... connecting language and machine perception, dialog systems, or some
... combination thereof.'''
>>> keywords(text).split('\n')
[u'natural language', u'machine', u'frequently']

Notes

See the tag list at http://www.clips.ua.ac.be/pages/mbsp-tags and use only the first two letters of each tag in INCLUDING_FILTER and EXCLUDING_FILTER.
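
As a rough illustration of the two-letter convention, the sketch below reduces each full tag to its first two letters before comparing it with the filters. The helper `tag_passes`, its default filter values, and the sample tagged words are hypothetical; this is not gensim's implementation.

```python
def tag_passes(tag, including=("NN", "JJ"), excluding=()):
    """Return True if a POS tag survives the include/exclude filters.

    Hypothetical helper: only the first two letters of the tag are compared.
    """
    prefix = tag[:2]
    if excluding and prefix in excluding:
        return False
    # An empty include filter is treated here as "accept everything".
    return not including or prefix in including


# Sample (word, tag) pairs, made up for illustration.
tagged = [("languages", "NNS"), ("quickly", "RB"), ("formal", "JJ"), ("involve", "VBP")]
kept = [word for word, tag in tagged if tag_passes(tag)]
print(kept)
```

With this convention a single entry such as `NN` covers `NN`, `NNS`, `NNP`, and `NNPS`.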

Data:

WINDOW_SIZE - Window size: the number of consecutive tokens processed together.

INCLUDING_FILTER - Part-of-speech tags to include.

EXCLUDING_FILTER - Part-of-speech tags to exclude.

gensim.summarization.keywords.get_graph(text)

Cleans and tokenizes the given text, then builds and returns a graph from it.

Parameters: text (str) – Input text.
Returns: Created graph.
Return type: Graph
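
To make the windowing idea concrete, here is a minimal sketch of a token co-occurrence graph. It assumes that any two tokens falling inside the same window of consecutive tokens are linked. The function `cooccurrence_edges`, the window size of 2, and the regex tokenizer are illustrative assumptions; gensim's own cleaning, part-of-speech filtering, and `Graph` type are not reproduced here.

```python
import re


def cooccurrence_edges(text, window_size=2):
    """Return the set of undirected edges between tokens sharing a window."""
    # Crude cleaning step: lowercase and keep alphabetic runs only.
    tokens = re.findall(r"[a-z]+", text.lower())
    edges = set()
    for start in range(len(tokens)):
        window = tokens[start:start + window_size]
        for i in range(len(window)):
            for j in range(i + 1, len(window)):
                if window[i] != window[j]:
                    # Sort the pair so (a, b) and (b, a) count as one edge.
                    edges.add(tuple(sorted((window[i], window[j]))))
    return edges


edges = cooccurrence_edges("natural language processing and natural language generation")
print(sorted(edges))
```

A larger window links tokens that are further apart, which makes the graph denser.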

gensim.summarization.keywords.keywords(text, ratio=0.2, words=None, split=False, scores=False, pos_filter=('NN', 'JJ'), lemmatize=False, deacc=True)

Get the highest-ranked words of the provided text and/or their combinations.

Parameters

text (str) – Input text.
ratio (float, optional) – Fraction of the candidate words to return as keywords. Used only when “words” is not given; otherwise the ratio is ignored.
words (int, optional) – Number of words to return.
split (bool, optional) – If True, return the keywords as a list instead of a single string.
scores (bool, optional) – If True, return each keyword together with its score.
pos_filter (tuple, optional) – Part-of-speech filters.
lemmatize (bool, optional) – If True, lemmatize words.
deacc (bool, optional) – If True, remove accentuation.

Returns

result (list of (str, float)) – If scores is True, keywords with their scores, OR
result (list of str) – If split is True, keywords only, OR
result (str) – Keywords joined by newlines.
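
The three return shapes can be illustrated with a small sketch. `format_keywords` is a hypothetical helper, and the scores are made-up values used only to show the shapes. It assumes that `scores` takes precedence over `split`, as the order of the cases above suggests.

```python
def format_keywords(ranked, split=False, scores=False):
    """Shape (keyword, score) pairs into one of three result forms."""
    if scores:
        return list(ranked)                   # list of (str, float)
    words = [word for word, _ in ranked]
    if split:
        return words                          # list of str
    return "\n".join(words)                   # single newline-joined str


# Made-up scores, purely for illustration.
ranked = [("natural language", 0.42), ("machine", 0.31), ("frequently", 0.27)]

print(format_keywords(ranked, scores=True))
print(format_keywords(ranked, split=True))
print(repr(format_keywords(ranked)))
```

The default string form is why the example at the top of this page calls `.split('\n')` on the result.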