parsing.porter – Porter Stemming Algorithm

This is the Porter stemming algorithm, ported to Python from the version coded up in ANSI C by the author. It may be regarded as canonical, in that it follows the algorithm presented in [1]; see also [2].

Author: Vivake Gupta (v@nano.com); optimizations and code cleanup by Lars Buitinck.

Examples:

>>> from gensim.parsing.porter import PorterStemmer
>>>
>>> p = PorterStemmer()
>>> p.stem("apple")
'appl'
>>>
>>> p.stem_sentence("Cats and ponies have meeting")
'cat and poni have meet'
>>>
>>> p.stem_documents(["Cats and ponies", "have meeting"])
['cat and poni', 'have meet']

[1] Porter, 1980, An algorithm for suffix stripping, http://www.cs.odu.edu/~jbollen/IR04/readings/readings5.pdf
[2] http://www.tartarus.org/~martin/PorterStemmer

class gensim.parsing.porter.PorterStemmer

Bases: object

Implementation of the Porter stemming algorithm.

b

str – Buffer holding a word to be stemmed. The letters are in b[0], b[1] … ending at b[k].

k

int – Index of the last letter of the word in b; readjusted downwards as the stemming progresses.

j

int – Word length.
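
The buffer-and-index layout described above can be pictured with a small toy model. This is only a sketch: the class `WordBuffer` and its methods `current`, `ends`, and `shorten` are hypothetical names chosen for illustration, not gensim's internals. It assumes that removing a suffix amounts to moving `k` down while leaving `b` untouched.

```python
class WordBuffer:
    """Toy model of a word buffer with a movable end index (illustrative only)."""

    def __init__(self, word):
        self.b = word            # buffer holding the word
        self.k = len(word) - 1   # index of the last letter still in the word

    def current(self):
        """Return the letters b[0] .. b[k]."""
        return self.b[: self.k + 1]

    def ends(self, suffix):
        """Check whether b[0] .. b[k] ends with the given suffix."""
        return self.current().endswith(suffix)

    def shorten(self, n):
        """Drop the last n letters by moving k down; b itself is untouched."""
        self.k -= n


buf = WordBuffer("caresses")
if buf.ends("sses"):   # one of the plural rules in [1]: SSES -> SS
    buf.shorten(2)
print(buf.current())   # caress
```

Moving an index instead of slicing the string on every rule is a design carried over from the C original, where the word lives in a fixed character array.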

stem(w)

Stem the word w.

Parameters:w (str) – Input word.
Returns:Stemmed version of w.
Return type:str

Examples

>>> from gensim.parsing.porter import PorterStemmer
>>> p = PorterStemmer()
>>> p.stem("ponies")
'poni'
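
The 'poni' result comes from the plural rules in step 1a of the algorithm in [1]. The function below is a minimal sketch of that single step only, written from the rules in the paper; `step_1a` is a hypothetical helper, not part of gensim, and the full stemmer applies several further steps after it.

```python
def step_1a(word):
    """Apply the plural rules of step 1a from Porter (1980) to a lowercase word."""
    if word.endswith("sses"):
        return word[:-2]      # SSES -> SS
    if word.endswith("ies"):
        return word[:-2]      # IES -> I
    if word.endswith("ss"):
        return word           # SS -> SS (unchanged)
    if word.endswith("s"):
        return word[:-1]      # S -> (removed)
    return word


for w in ["caresses", "ponies", "caress", "cats"]:
    print(w, "->", step_1a(w))
```

The rules are tried in order and only the first match applies, which is why "caress" keeps its final "ss" instead of losing an "s".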
stem_documents(docs)

Stem documents.

Parameters:docs (list of str) – Input documents.
Returns:Stemmed documents.
Return type:list of str

Examples

>>> from gensim.parsing.porter import PorterStemmer
>>> p = PorterStemmer()
>>> p.stem_documents(["Have a very nice weekend", "Have a very nice weekend"])
['have a veri nice weekend', 'have a veri nice weekend']
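
The example output is consistent with each document being stemmed independently and returned in input order. A minimal sketch under that assumption; `stem_documents_sketch` is a hypothetical helper that accepts any sentence-level callable, and `str.lower` is used here only as a trivial stand-in so the block runs without gensim.

```python
def stem_documents_sketch(docs, stem_sentence):
    """Apply a sentence-level stemmer to each document, preserving order."""
    return [stem_sentence(doc) for doc in docs]


# str.lower is a trivial stand-in for a real sentence stemmer.
result = stem_documents_sketch(["Have A", "Nice Weekend"], str.lower)
print(result)  # ['have a', 'nice weekend']
```

With gensim installed, passing `PorterStemmer().stem_sentence` as the second argument should behave like `stem_documents`, assuming the method is implemented this way.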
stem_sentence(txt)

Stem the sentence txt.

Parameters:txt (str) – Input sentence.
Returns:Stemmed sentence.
Return type:str

Examples

>>> from gensim.parsing.porter import PorterStemmer
>>> p = PorterStemmer()
>>> p.stem_sentence("Wow very nice woman with apple")
'wow veri nice woman with appl'
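
The output above suggests that the sentence is split on whitespace, each token is stemmed, and the stems are rejoined with single spaces. The following sketch assumes exactly that split-and-join strategy. Both `toy_stem` and `stem_sentence_sketch` are hypothetical names, and `toy_stem` is a deliberately crude stand-in, not the Porter algorithm.

```python
def toy_stem(word):
    """Hypothetical stand-in for a word stemmer: lowercase, drop one trailing 's'."""
    word = word.lower()
    if len(word) > 2 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def stem_sentence_sketch(txt, stem=toy_stem):
    """Stem each whitespace-separated token and rejoin with single spaces."""
    return " ".join(stem(token) for token in txt.split())


print(stem_sentence_sketch("Cats  and   dogs"))  # cat and dog
```

In this sketch, runs of whitespace collapse to a single space and punctuation stays attached to its token, so input with punctuation would need to be cleaned before stemming.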