Erick Rocha Fonseca

I am a doctoral student in Computer Science at ICMC/University of São Paulo, where I also received my Master's degree.

My research is centered on Natural Language Processing, a field I greatly enjoy. I explore machine learning methods and have a growing interest in Deep Learning.

I'm a member of NILC.

Google Scholar Profile
Currículo Lattes (in Portuguese)
Short Bio

I graduated in Computer Science from Fluminense Federal University in 2009. I did some research on metaheuristics combined with data mining, but I have not worked on that topic since.

In my Master's research, I worked on automatic semantic role labeling (SRL) for Portuguese. I implemented a model based on SENNA for that task, which became the starting point of the nlpnet library. I also used the library for part-of-speech (POS) tagging in parallel research.

In my doctoral research, I explore textual entailment and paraphrase detection methods to support automatic question answering systems.


My best-known project is nlpnet. It is a Python library (together with standalone scripts) for training and running NLP tagging tools based on neural networks and distributed word representations, also known as word embeddings. Documentation can be found here, along with pre-trained models.

Currently, my version performs POS tagging and SRL, and I plan to add named entity recognition (NER), which at least one fork has already implemented, and parsing in the coming months.


In 2013 and 2014, I revised the Mac-Morpho corpus, a collection of newswire texts in Brazilian Portuguese manually annotated with POS tags. Duplicate sentences and sentences with missing words were removed, and a few other corrections were made. Contractions were joined into a single token, as they appear in actual text, instead of being left split apart. Both the original and the revised versions are available at the link.

