parsing.porter – Porter Stemming Algorithm

This is the Porter stemming algorithm, ported to Python from the version coded up in ANSI C by the author. It may be regarded as canonical, in that it follows the algorithm presented in [1]; see also [2].

Author: Vivake Gupta (v@nano.com). Optimizations and cleanup of the code by Lars Buitinck.
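Most rules in [1] depend on the measure m of a stem. The stem is written as [C](VC)^m[V], where C is a run of consonants and V is a run of vowels. A letter counts as a consonant unless it is a, e, i, o or u, or a y that follows a consonant. The plain-Python sketch below computes m under those definitions. The helpers `is_consonant` and `measure` are hypothetical names chosen for illustration. They are not part of gensim's API, and gensim's own code may organize this logic differently.

```python
# Hypothetical helpers (not gensim API) illustrating Porter's measure m.
# Words are assumed to be lowercase.

def is_consonant(word: str, i: int) -> bool:
    """Porter's consonant test for word[i]."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        # 'y' is a consonant at the start of a word or after a vowel,
        # and a vowel when it follows a consonant.
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(word: str) -> int:
    """Count the VC sequences m in the form [C](VC)^m[V]."""
    m = 0
    prev_vowel = False
    for i in range(len(word)):
        cons = is_consonant(word, i)
        if cons and prev_vowel:
            m += 1  # a vowel run just ended in a consonant: one more VC pair
        prev_vowel = not cons
    return m
```

For example, measure("tree") is 0, measure("trouble") is 1 and measure("troubles") is 2. These values match the examples given in [1].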

Examples

>>> from gensim.parsing.porter import PorterStemmer
>>>
>>> p = PorterStemmer()
>>> p.stem("apple")
'appl'
>>>
>>> p.stem_sentence("Cats and ponies have meeting")
'cat and poni have meet'
>>>
>>> p.stem_documents(["Cats and ponies", "have meeting"])
['cat and poni', 'have meet']

[1] Porter, 1980, An algorithm for suffix stripping, http://www.cs.odu.edu/~jbollen/IR04/readings/readings5.pdf

[2] http://www.tartarus.org/~martin/PorterStemmer

class gensim.parsing.porter.PorterStemmer

Bases: object

Implementation of the Porter stemming algorithm.

b

Buffer holding a word to be stemmed. The letters are in b[0], b[1] … ending at b[k].

Type

str

k

Index of the last letter of the word in b. It is readjusted downwards as suffixes are stripped during stemming.

Type

int

j

General offset into b, used to mark where the stem ends before a matched suffix.

Type

int
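A small, hypothetical helper can illustrate how b, k and j work together. It is loosely modeled on the ends/setto routines of the reference ANSI C implementation. The suffix test records in j where the stem stops, and the replacement rewrites b[j+1..k] and then readjusts k. The class name `SuffixBuffer` and its methods are assumptions made for illustration. gensim's PorterStemmer keeps this state internally and does not document such helpers on this page.

```python
# Hypothetical illustration (not gensim API) of the b / k / j bookkeeping.

class SuffixBuffer:
    def __init__(self, word: str) -> None:
        self.b = word            # buffer holding the word being stemmed
        self.k = len(word) - 1   # index of the last letter still in the word
        self.j = 0               # offset: last letter of the stem before a suffix

    def ends(self, s: str) -> bool:
        """True if b[0..k] ends with s; if so, set j to the index just before s."""
        length = len(s)
        if length > self.k + 1:
            return False
        if self.b[self.k - length + 1:self.k + 1] != s:
            return False
        self.j = self.k - length
        return True

    def setto(self, s: str) -> None:
        """Replace b[j+1..k] with s and readjust k to the new last letter."""
        self.b = self.b[:self.j + 1] + s
        self.k = self.j + len(s)

    def word(self) -> str:
        """Return the current word, b[0..k]."""
        return self.b[:self.k + 1]
```

For example, SuffixBuffer("ponies") starts with k = 5. Calling ends("ies") succeeds and sets j = 2, and calling setto("i") then leaves the word "poni" with k = 3.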

stem(w)

Stem the word w.

Parameters

w (str) – Word to stem.

Returns

Stemmed version of w.

Return type

str

Examples

>>> from gensim.parsing.porter import PorterStemmer
>>> p = PorterStemmer()
>>> p.stem("ponies")
'poni'
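The "ponies" -> "poni" result comes from step 1a of [1]. This step rewrites plural endings as follows: SSES -> SS, IES -> I, SS -> SS, and S -> (nothing). The sketch below isolates that single step in plain Python. `step1a` is a hypothetical name. The full stemmer goes on to apply the later steps (1b through 5), so step 1a alone does not always produce the final stem.

```python
# Hypothetical sketch (not gensim API) of Porter step 1a only.

def step1a(word: str) -> str:
    """Apply SSES -> SS, IES -> I, SS -> SS, S -> '' (first matching rule wins)."""
    if word.endswith("sses"):
        return word[:-2]   # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]   # ponies -> poni
    if word.endswith("ss"):
        return word        # caress -> caress (unchanged)
    if word.endswith("s"):
        return word[:-1]   # cats -> cat
    return word            # no plural ending: unchanged
```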

stem_documents(docs)

Stem documents.

Parameters

docs (list of str) – Input documents.

Returns

Stemmed documents.

Return type

list of str

Examples

>>> from gensim.parsing.porter import PorterStemmer
>>> p = PorterStemmer()
>>> p.stem_documents(["Have a very nice weekend", "Have a very nice weekend"])
['have a veri nice weekend', 'have a veri nice weekend']
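Judging from the example, stem_documents behaves like stem_sentence applied to each document in order. The sketch below assumes exactly that behavior. It takes the sentence-level stemmer as a parameter so that it runs without gensim. The lazy `iter_stem_documents` variant is a hypothetical addition and not a gensim method. It is intended for corpora that are streamed from disk rather than held in a list.

```python
from typing import Callable, Iterable, Iterator, List


def stem_documents(docs: Iterable[str], stem_sentence: Callable[[str], str]) -> List[str]:
    """Apply a sentence-level stemmer to every document, preserving order."""
    return [stem_sentence(doc) for doc in docs]


def iter_stem_documents(docs: Iterable[str], stem_sentence: Callable[[str], str]) -> Iterator[str]:
    """Hypothetical lazy variant: yield stemmed documents one at a time."""
    for doc in docs:
        yield stem_sentence(doc)
```

With gensim installed, passing PorterStemmer().stem_sentence as the second argument should reproduce the example output above.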

stem_sentence(txt)

Stem the sentence txt.

Parameters

txt (str) – Input sentence.

Returns

Stemmed sentence.

Return type

str

Examples

>>> from gensim.parsing.porter import PorterStemmer
>>> p = PorterStemmer()
>>> p.stem_sentence("Wow very nice woman with apple")
'wow veri nice woman with appl'
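The example outputs are consistent with splitting txt on whitespace, stemming each token independently, and rejoining the tokens with single spaces. A minimal sketch under that assumption follows. The `stem` parameter stands in for PorterStemmer.stem. This is an illustration, not gensim's source.

```python
from typing import Callable


def stem_sentence(txt: str, stem: Callable[[str], str]) -> str:
    """Stem each whitespace-separated token and rejoin with single spaces."""
    # str.split() with no argument splits on any run of whitespace,
    # so tabs, newlines and repeated spaces all collapse to one space.
    return " ".join(stem(token) for token in txt.split())
```

Because tokens come straight from str.split(), punctuation stays attached. For example, "apple," reaches the stemmer as a single token, so strip punctuation beforehand if that matters.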