parsing.porter
– Porter Stemming Algorithm¶Porter Stemming Algorithm This is the Porter stemming algorithm, ported to Python from the version coded up in ANSI C by the author. It may be be regarded as canonical, in that it follows the algorithm presented in 1, see also 2
Author - Vivake Gupta (v@nano.com), optimizations and cleanup of the code by Lars Buitinck.
>>> from gensim.parsing.porter import PorterStemmer
>>>
>>> p = PorterStemmer()
>>> p.stem("apple")
'appl'
>>>
>>> p.stem_sentence("Cats and ponies have meeting")
'cat and poni have meet'
>>>
>>> p.stem_documents(["Cats and ponies", "have meeting"])
['cat and poni', 'have meet']
Porter, 1980, An algorithm for suffix stripping, http://www.cs.odu.edu/~jbollen/IR04/readings/readings5.pdf
gensim.parsing.porter.
PorterStemmer
¶Bases: object
Class contains implementation of Porter stemming algorithm.
b
¶Buffer holding a word to be stemmed. The letters are in b[0], b[1] … ending at b[k].
str
k
¶Readjusted downwards as the stemming progresses.
int
j
¶Word length.
int
stem
(w)¶Stem the word w.
w (str) –
Stemmed version of w.
str
Examples
>>> from gensim.parsing.porter import PorterStemmer
>>> p = PorterStemmer()
>>> p.stem("ponies")
'poni'
stem_documents
(docs)¶Stem documents.
docs (list of str) – Input documents
Stemmed documents.
list of str
Examples
>>> from gensim.parsing.porter import PorterStemmer
>>> p = PorterStemmer()
>>> p.stem_documents(["Have a very nice weekend", "Have a very nice weekend"])
['have a veri nice weekend', 'have a veri nice weekend']
stem_sentence
(txt)¶Stem the sentence txt.
txt (str) – Input sentence.
Stemmed sentence.
str
Examples
>>> from gensim.parsing.porter import PorterStemmer
>>> p = PorterStemmer()
>>> p.stem_sentence("Wow very nice woman with apple")
'wow veri nice woman with appl'