parsing.porter – Porter Stemming Algorithm
This is the Porter stemming algorithm, ported to Python from the ANSI C version written by the algorithm's author. It may be regarded as canonical, in that it follows the algorithm presented in [1]; see also [2].
Author: Vivake Gupta (v@nano.com); optimizations and code cleanup by Lars Buitinck.
Examples
>>> from gensim.parsing.porter import PorterStemmer
>>>
>>> p = PorterStemmer()
>>> p.stem("apple")
'appl'
>>>
>>> p.stem_sentence("Cats and ponies have meeting")
'cat and poni have meet'
>>>
>>> p.stem_documents(["Cats and ponies", "have meeting"])
['cat and poni', 'have meet']
- 1
Porter, 1980, An algorithm for suffix stripping, http://www.cs.odu.edu/~jbollen/IR04/readings/readings5.pdf
- 2
- class gensim.parsing.porter.PorterStemmer
Bases: object
Implementation of the Porter stemming algorithm.
- b
Buffer holding a word to be stemmed. The letters are in b[0], b[1] … ending at b[k].
- Type
str
- k
Index of the last character of the word in b; readjusted downwards as the stemming progresses.
- Type
int
- j
Word length.
- Type
int
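As a minimal sketch of the buffer convention described by these attributes, the toy class below keeps a word in b and marks its last live character with k, then applies only the plural rules of step 1a from Porter's paper (SSES → SS, IES → I, SS → SS, S → removed). The class name TinyBuffer and its methods are hypothetical and are not part of gensim; the real PorterStemmer applies many more steps and guards that are omitted here.

```python
class TinyBuffer:
    """Toy illustration of the b/k convention; not gensim's implementation."""

    def __init__(self, word):
        self.b = word.lower()      # buffer holding the word to be stemmed
        self.k = len(self.b) - 1   # index of the last character still in use

    def ends(self, suffix):
        """Return True if the live part of the buffer, b[0..k], ends with `suffix`."""
        return self.b[: self.k + 1].endswith(suffix)

    def strip_plural(self):
        """Apply Porter step 1a only, shortening the word by lowering k."""
        if self.ends("sses"):
            self.k -= 2            # caresses -> caress
        elif self.ends("ies"):
            self.k -= 2            # ponies -> poni
        elif self.ends("ss"):
            pass                   # caress -> caress (unchanged)
        elif self.ends("s"):
            self.k -= 1            # cats -> cat
        return self.b[: self.k + 1]


for word in ["caresses", "ponies", "caress", "cats"]:
    print(word, "->", TinyBuffer(word).strip_plural())
```

Note that the buffer itself is never rewritten in this sketch: the word gets shorter only because k moves down, which is the sense in which k is "readjusted downwards".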
- stem(w)
Stem the word w.
- Parameters
w (str) – Input word.
- Returns
Stemmed version of w.
- Return type
str
Examples
>>> from gensim.parsing.porter import PorterStemmer
>>> p = PorterStemmer()
>>> p.stem("ponies")
'poni'
- stem_documents(docs)
Stem documents.
- Parameters
docs (list of str) – Input documents.
- Returns
Stemmed documents.
- Return type
list of str
Examples
>>> from gensim.parsing.porter import PorterStemmer
>>> p = PorterStemmer()
>>> p.stem_documents(["Have a very nice weekend", "Have a very nice weekend"])
['have a veri nice weekend', 'have a veri nice weekend']
- stem_sentence(txt)
Stem the sentence txt.
- Parameters
txt (str) – Input sentence.
- Returns
Stemmed sentence.
- Return type
str
Examples
>>> from gensim.parsing.porter import PorterStemmer
>>> p = PorterStemmer()
>>> p.stem_sentence("Wow very nice woman with apple")
'wow veri nice woman with appl'
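The examples on this page are consistent with sentence-level and document-level stemming being built on top of word-level stemming. The sketch below shows one way such a composition could look, assuming whitespace tokenization. The helper names stem_sentence_with and stem_documents_with and the stand-in toy_stem are hypothetical and not part of gensim; toy_stem only lowercases a word and drops a single trailing "s", so it is not the Porter algorithm.

```python
def toy_stem(word):
    """Hypothetical stand-in for a word stemmer: lowercase, drop one trailing 's'."""
    word = word.lower()
    if len(word) > 2 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def stem_sentence_with(stem, txt):
    """Stem each whitespace-separated token of `txt` and rejoin with single spaces."""
    return " ".join(stem(token) for token in txt.split())


def stem_documents_with(stem, docs):
    """Apply sentence-level stemming to every document in `docs`."""
    return [stem_sentence_with(stem, doc) for doc in docs]


print(stem_sentence_with(toy_stem, "Cats and dogs"))
print(stem_documents_with(toy_stem, ["Cats and dogs", "have meetings"]))
```

Because the word-level stemmer is passed in as a callable, the same helpers should also work with the stem method of a PorterStemmer instance when gensim is installed.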