parsing.preprocessing – Functions to preprocess raw text

gensim.parsing.preprocessing.preprocess_documents(docs)
gensim.parsing.preprocessing.preprocess_string(s, filters=[lambda x: x.lower(), strip_tags, strip_punctuation, strip_multiple_whitespaces, strip_numeric, remove_stopwords, strip_short, stem_text])
gensim.parsing.preprocessing.read_file(path)
gensim.parsing.preprocessing.read_files(pattern)
gensim.parsing.preprocessing.remove_stopwords(s)
gensim.parsing.preprocessing.split_alphanum(s)
gensim.parsing.preprocessing.stem(text)

Return a lowercased, Porter-stemmed version of the string text.

gensim.parsing.preprocessing.stem_text(text)

Return a lowercased, Porter-stemmed version of the string text.

gensim.parsing.preprocessing.strip_multiple_whitespaces(s)
gensim.parsing.preprocessing.strip_non_alphanum(s)
gensim.parsing.preprocessing.strip_numeric(s)
gensim.parsing.preprocessing.strip_punctuation(s)
gensim.parsing.preprocessing.strip_punctuation2(s)
gensim.parsing.preprocessing.strip_short(s, minsize=3)
gensim.parsing.preprocessing.strip_tags(s)
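
The signatures above suggest a filter-chain design: each filter takes a string and returns a string, and preprocess_string runs a list of them over its input. The sketch below illustrates that idea using only the standard library. The filter bodies are simplified stand-ins written for this example, not gensim's implementations. SKETCH_FILTERS is a hypothetical name, and the chain leaves out strip_numeric, remove_stopwords and stem_text. It also assumes the filters are applied in list order and the result is split on whitespace.

```python
import re
import string

# Simplified stand-ins that mirror the names listed above.
# These are illustrative only and are NOT gensim's implementations.

def strip_tags(s):
    """Remove anything that looks like an <...> markup tag."""
    return re.sub(r"<[^>]*>", "", s)

def strip_punctuation(s):
    """Replace each ASCII punctuation character with a space."""
    return re.sub("[%s]" % re.escape(string.punctuation), " ", s)

def strip_multiple_whitespaces(s):
    """Collapse every run of whitespace into a single space."""
    return re.sub(r"\s+", " ", s)

def strip_short(s, minsize=3):
    """Drop whitespace-separated tokens shorter than minsize characters."""
    return " ".join(tok for tok in s.split() if len(tok) >= minsize)

# Hypothetical, shortened filter list (no numeric, stopword or stemming step).
SKETCH_FILTERS = [
    str.lower,
    strip_tags,
    strip_punctuation,
    strip_multiple_whitespaces,
    strip_short,
]

def preprocess_string(s, filters=SKETCH_FILTERS):
    """Apply each filter in order, then split the result into tokens."""
    for f in filters:
        s = f(s)
    return s.split()

def preprocess_documents(docs):
    """Run preprocess_string over every document in an iterable."""
    return [preprocess_string(d) for d in docs]

tokens = preprocess_string("<b>Hello</b>,   World! It is a test.")
print(tokens)  # ['hello', 'world', 'test']
```

Because every filter shares the same string-to-string shape, a custom pipeline is just a different list passed as filters, for example preprocess_string(s, filters=[str.lower, strip_tags]).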