Núcleo Interinstitucional de Lingüística Computacional
An Interinstitutional Center for Research and Development in Computational Linguistics


Porter Stemmer for Brazilian Portuguese


This stemmer was developed at LABIC and follows Porter's algorithm. It works on Brazilian Portuguese text, identifying the stem of each word by incrementally removing its suffixes (terminations).
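
To make the idea of incremental suffix removal concrete, here is a minimal Python sketch. It is only an illustration: the rule steps, the MIN_STEM limit and the function name strip_suffixes are assumptions invented for this example, and they are not the rules implemented by the LABIC stemmer.

```python
# Toy illustration of incremental suffix removal.
# HYPOTHETICAL rules: these are NOT the rules used by the LABIC stemmer.
STEPS = [
    ["s"],            # step 1: plural ending
    ["mente"],        # step 2: adverbial ending
    ["a", "e", "o"],  # step 3: final vowel
]
MIN_STEM = 3  # assumed lower bound on the length of what remains


def strip_suffixes(word):
    """Apply each step in order, removing at most one suffix per step."""
    stem = word.lower()
    for suffixes in STEPS:
        for suffix in suffixes:
            # Strip only if the suffix matches and enough of the word is left.
            if stem.endswith(suffix) and len(stem) - len(suffix) >= MIN_STEM:
                stem = stem[: -len(suffix)]
                break  # move on to the next step
    return stem
```

With these toy rules, strip_suffixes("meninas") and strip_suffixes("menino") both return "menin", while a short word such as "mes" is left unchanged because stripping would leave fewer than MIN_STEM characters.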


Caldas Junior, J.; Imamura, C.Y.M.; Rezende, S.O. (2001). Avaliação de um Algoritmo de Stemming para a Língua Portuguesa. In Proceedings of the 2nd Congress of Logic Applied to Technology, Vol. 2, pp. 267–274.


Stemmer for words

Run the stemmer from the command line, passing the word to be stemmed as its argument. For instance:

stemmer.exe word

The stem of the word is printed on the screen. To save it to a file instead, redirect the output:

stemmer.exe word > myfile.txt
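
If the stemmer is to be called from a program rather than typed by hand, the same invocation can be wrapped in a small Python function. This is a sketch under assumptions: the name stem_word and the command parameter are hypothetical, stemmer.exe is assumed to be in the current directory or on the PATH, and the output is decoded with the system's default encoding, which may not match what the stemmer emits for accented characters.

```python
import subprocess


def stem_word(word, command=("stemmer.exe",)):
    """Run the stemmer on one word and return what it prints, stripped.

    `command` is a hypothetical parameter holding the executable (plus any
    leading arguments); the word is appended as the last argument.
    """
    result = subprocess.run(
        [*command, word],
        capture_output=True,  # collect stdout instead of showing it
        text=True,            # decode using the default system encoding
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout.strip()
```

Passing the arguments as a list avoids shell quoting problems when the word contains special characters.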

Stemmer for files

Run the stemmer from the command line, passing the name of the file to be stemmed as its argument. For instance:

stemmer.exe myfile.txt

The stemmed output is stored in a file with the same name plus the suffix '.stemmed' (for instance, myfile.txt.stemmed).
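
The naming rule above can be captured in a short Python helper, which is handy when stemming many files in a script. This is a sketch, not part of the distributed tool: the names stemmed_path and stem_file are hypothetical, and it assumes the '.stemmed' file is written next to the input file, which the description above does not state explicitly.

```python
import subprocess
from pathlib import Path


def stemmed_path(input_file):
    """Return the expected output name: the input name plus '.stemmed'."""
    path = Path(input_file)
    return path.with_name(path.name + ".stemmed")


def stem_file(input_file, command=("stemmer.exe",)):
    """Run the stemmer on a file and return the path of its output.

    ASSUMPTION: the output file is created in the same directory as the input.
    """
    subprocess.run([*command, str(input_file)], check=True)
    output = stemmed_path(input_file)
    if not output.exists():
        raise FileNotFoundError(f"expected stemmer output at {output}")
    return output
```

For example, stemmed_path("myfile.txt") gives myfile.txt.stemmed, matching the example above.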