summarization.textcleaner – Summarization pre-processing

gensim.summarization.textcleaner.clean_text_by_sentences(text)

Tokenizes the given text into sentences, applies filters, and lemmatizes them. Returns a list of SyntacticUnit objects.
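
The following is a minimal, self-contained sketch of the general idea (split into sentences, filter, normalize, keep the original text next to the processed form). It does not call gensim and is not gensim's implementation: `Unit`, `STOPWORDS`, `naive_stem`, `filter_sentence`, and `sketch_clean_by_sentences` are hypothetical names, the stopword set is a toy, and the suffix stripper is only a crude stand-in for real lemmatization.

```python
import re
from dataclasses import dataclass


@dataclass
class Unit:
    """Hypothetical stand-in for SyntacticUnit (not gensim's class)."""
    text: str   # sentence exactly as it appeared in the input
    token: str  # filtered, normalized form of that sentence


# Tiny illustrative stopword set; a real filter would use a much larger list.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in"}


def naive_stem(word):
    # Crude suffix stripping, used here only as a placeholder for lemmatization.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def filter_sentence(sentence):
    # Lowercase, keep alphabetic runs only, drop stopwords, then stem.
    words = re.findall(r"[a-z]+", sentence.lower())
    return " ".join(naive_stem(w) for w in words if w not in STOPWORDS)


def sketch_clean_by_sentences(text):
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    units = [Unit(text=s, token=filter_sentence(s)) for s in sentences]
    # Drop sentences that end up empty after filtering.
    return [u for u in units if u.token]


units = sketch_clean_by_sentences("The cats are sleeping. Dogs barked loudly!")
for unit in units:
    print(repr(unit.text), "->", repr(unit.token))
```

Keeping both the original sentence and its processed form in one object lets later steps score sentences on the normalized tokens while still returning the untouched text to the reader.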

gensim.summarization.textcleaner.clean_text_by_word(text, deacc=True)

Tokenizes the given text into words, applies filters, and lemmatizes them. Returns a dict mapping each word to its SyntacticUnit.
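
A rough sketch of a word-to-unit mapping is shown below. It is self-contained and does not use gensim. Two assumptions are made that the text above does not confirm: that `deacc` controls accent removal, and that the dict is keyed by the word as it appears in the text. `WordUnit`, `strip_accents`, and `sketch_clean_by_word` are hypothetical names, and lowercasing stands in for the real filtering and lemmatization.

```python
import re
import unicodedata
from collections import namedtuple

# Hypothetical stand-in for SyntacticUnit: original word plus normalized token.
WordUnit = namedtuple("WordUnit", ["text", "token"])


def strip_accents(text):
    # Decompose characters, drop combining marks, then recompose.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", stripped)


def sketch_clean_by_word(text, deacc=True):
    # Assumption: deacc=True means accents are removed before tokenizing.
    if deacc:
        text = strip_accents(text)
    result = {}
    for word in re.findall(r"\w+", text):
        # Lowercasing is a placeholder for real filtering and lemmatization.
        result.setdefault(word, WordUnit(text=word, token=word.lower()))
    return result


words = sketch_clean_by_word("Caf\u00e9 menu, caf\u00e9 MENU")
words_with_accents = sketch_clean_by_word("Caf\u00e9 menu", deacc=False)
for key, unit in words.items():
    print(key, "->", unit.token)
```

Because distinct surface forms can share one token, a caller can group keys by `token` to find words that the cleaning step treats as equivalent.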

gensim.summarization.textcleaner.get_sentences(text)
gensim.summarization.textcleaner.join_words(words, separator=' ')
gensim.summarization.textcleaner.merge_syntactic_units(original_units, filtered_units, tags=None)
gensim.summarization.textcleaner.replace_abbreviations(text)
gensim.summarization.textcleaner.replace_with_separator(text, separator, regexs)
gensim.summarization.textcleaner.split_sentences(text)
gensim.summarization.textcleaner.tokenize_by_word(text)
gensim.summarization.textcleaner.undo_replacement(sentence)