`parsing.porter` – Porter Stemming Algorithm

This is the Porter stemming algorithm, ported to Python from the version coded up in ANSI C by the author. It may be regarded as canonical, in that it follows the algorithm presented in [1]; see also [2].

Author: Vivake Gupta (v@nano.com). Optimizations and code cleanup by Lars Buitinck.

Examples

>>> from gensim.parsing.porter import PorterStemmer
>>>
>>> p = PorterStemmer()
>>> p.stem("apple")
'appl'
>>>
>>> p.stem_sentence("Cats and ponies have meeting")
'cat and poni have meet'
>>>
>>> p.stem_documents(["Cats and ponies", "have meeting"])
['cat and poni', 'have meet']

[1] Porter, 1980, An algorithm for suffix stripping, http://www.cs.odu.edu/~jbollen/IR04/readings/readings5.pdf
[2] http://www.tartarus.org/~martin/PorterStemmer

class gensim.parsing.porter.PorterStemmer

Bases: object

Implementation of the Porter stemming algorithm.

b

Buffer holding a word to be stemmed. The letters are in b[0], b[1] … ending at b[k].

Type: str

k

Index of the last letter of the word in b. It is readjusted downwards as the stemming progresses.

Type: int

j

General offset into the buffer b, used while matching suffixes.

Type: int
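
The buffer convention above (letters in b[0] … b[k], with k moving downwards) can be illustrated with a minimal sketch. The class `BufferSketch` and its methods are hypothetical names made up for this illustration and are not gensim's code; the sketch also ignores the conditions the real algorithm checks before removing a suffix.

```python
class BufferSketch:
    """Hypothetical illustration of the b/k convention; not gensim's code."""

    def __init__(self, word):
        self.b = word           # buffer holding the word to be stemmed
        self.k = len(word) - 1  # index of the last letter still in the word

    def ends(self, suffix):
        """Return True if b[0] ... b[k] ends with the given suffix."""
        start = self.k - len(suffix) + 1
        return start >= 0 and self.b[start:self.k + 1] == suffix

    def drop(self, suffix):
        """Drop the suffix by moving k downwards; b itself is left untouched."""
        if self.ends(suffix):
            self.k -= len(suffix)

    def current(self):
        """Return the letters b[0] ... b[k] as a string."""
        return self.b[:self.k + 1]


buf = BufferSketch("meeting")
buf.drop("ing")
print(buf.current())  # meet
```

In this sketch the suffix is never deleted from the buffer. Only the end index shrinks, which is one plausible reading of "readjusted downwards".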

stem(w)

Stem the word w.

Parameters: w (str) – Input word.
Returns: Stemmed version of w.
Return type: str

Examples

>>> from gensim.parsing.porter import PorterStemmer
>>> p = PorterStemmer()
>>> p.stem("ponies")
'poni'
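
To give a rough idea of the kind of rule that turns "ponies" into "poni", here is a sketch of step 1a only, as described in Porter's paper [1]: SSES becomes SS, IES becomes I, SS stays SS, and a lone S is removed. The function name `step_1a` is hypothetical, lowercasing the input is an assumption of this sketch, and the code is not gensim's implementation. It leaves out every later step and any guards a full stemmer applies.

```python
def step_1a(word):
    """Apply only step 1a of Porter's algorithm (plural suffixes).

    Illustrative sketch; a complete stemmer runs several further steps.
    """
    word = word.lower()   # assumption of this sketch: work on lowercase input
    if word.endswith("sses"):
        return word[:-2]  # sses -> ss
    if word.endswith("ies"):
        return word[:-2]  # ies -> i
    if word.endswith("ss"):
        return word       # ss -> ss (unchanged)
    if word.endswith("s"):
        return word[:-1]  # s -> (removed)
    return word


for w in ["caresses", "ponies", "caress", "cats"]:
    print(w, "->", step_1a(w))
# caresses -> caress
# ponies -> poni
# caress -> caress
# cats -> cat
```

The longest suffix is tested first, so "caresses" matches the SSES rule rather than the plain S rule.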

stem_documents(docs)

Stem documents.

Parameters: docs (list of str) – Input documents.
Returns: Stemmed documents.
Return type: list of str

Examples

>>> from gensim.parsing.porter import PorterStemmer
>>> p = PorterStemmer()
>>> p.stem_documents(["Have a very nice weekend", "Have a very nice weekend"])
['have a veri nice weekend', 'have a veri nice weekend']

stem_sentence(txt)

Stem the sentence txt.

Parameters: txt (str) – Input sentence.
Returns: Stemmed sentence.
Return type: str

Examples

>>> from gensim.parsing.porter import PorterStemmer
>>> p = PorterStemmer()
>>> p.stem_sentence("Wow very nice woman with apple")
'wow veri nice woman with appl'
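
The examples suggest that the sentence-level and document-level methods are thin wrappers around word-level stemming. The sketch below shows one way that relationship could look, assuming tokens are separated by whitespace and rejoined with single spaces. The functions `stem_sentence_sketch`, `stem_documents_sketch` and `toy_stem` are hypothetical; `toy_stem` is a lookup table standing in for a real stemmer so that the block runs without gensim.

```python
def stem_sentence_sketch(txt, stem):
    """Stem each whitespace-separated token of txt and rejoin with spaces."""
    return " ".join(stem(token) for token in txt.split())


def stem_documents_sketch(docs, stem):
    """Apply the sentence-level sketch to every document in a list."""
    return [stem_sentence_sketch(doc, stem) for doc in docs]


# Toy stand-in for a word-level stemmer, filled with stems taken from the
# examples on this page. Unknown words are only lowercased.
KNOWN_STEMS = {"cats": "cat", "ponies": "poni", "meeting": "meet"}


def toy_stem(word):
    word = word.lower()
    return KNOWN_STEMS.get(word, word)


print(stem_sentence_sketch("Cats and ponies have meeting", toy_stem))
# cat and poni have meet
print(stem_documents_sketch(["Cats and ponies", "have meeting"], toy_stem))
# ['cat and poni', 'have meet']
```

With gensim installed, `PorterStemmer().stem` can be passed as the `stem` argument in place of `toy_stem`.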
