Erick Rocha Fonseca

I am a doctoral student in Computer Science at ICMC/University of São Paulo. I also got my Master's degree there.

My research is centered on Natural Language Processing, a field that I greatly enjoy. I explore machine learning methods and have a growing interest in Deep Learning.

I'm a member of NILC.

Google Scholar Profile
Currículo Lattes (in Portuguese)
Short Bio

I graduated in Computer Science from Fluminense Federal University in 2009. I did some research on metaheuristics combined with data mining, but I have not worked on that topic since.

In my Master's research, I worked on automatic semantic role labeling in Portuguese. I implemented a model based on SENNA for that task, which was the starting point of the nlpnet library. I also used it for POS tagging in parallel research.

In my doctoral research, I explore textual entailment and paraphrase detection methods to help automatic question answering systems.


My best known project is nlpnet. It is a Python library (together with standalone scripts) for training and running NLP tagging tools based on neural networks and distributed word representations, also known as word embeddings. Documentation can be found here, along with pre-trained models.

Currently, my version performs POS tagging and semantic role labeling (SRL), and I plan to add named entity recognition (at least one fork has implemented it) and parsing in the coming months.


In 2013 and 2014, I revised the Mac-Morpho corpus, a collection of newswire texts in Brazilian Portuguese manually annotated with POS tags. Duplicated sentences and sentences with missing words were removed, and a few other corrections were made. Word contractions were joined into a single token, as they appear in actual text, instead of being left separated. Both the original and the revised versions can be found at the link.
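To give a rough idea of the two cleanup steps described above (dropping duplicated sentences and joining split contractions), here is a minimal Python sketch. It is not the script used for the actual revision. It assumes each sentence is a list of (word, tag) pairs and that the first half of a split contraction is flagged by a `|+` suffix on its tag; the function names and the small contraction table are hypothetical and only illustrative.

```python
# Illustrative table only: (first word, second word) -> contracted form.
# The real revision covered many more contractions than these.
CONTRACTIONS = {
    ("de", "o"): "do",
    ("de", "a"): "da",
    ("em", "o"): "no",
    ("em", "a"): "na",
    ("a", "o"): "ao",
}

SPLIT_MARK = "|+"  # assumed marker on the tag of a contraction's first half


def remove_duplicates(sentences):
    """Keep only the first occurrence of each sentence, preserving order."""
    seen = set()
    unique = []
    for sentence in sentences:
        key = tuple(sentence)  # lists are not hashable, tuples are
        if key not in seen:
            seen.add(key)
            unique.append(sentence)
    return unique


def join_contractions(sentence):
    """Merge marked token pairs such as de + o into a single token (do)."""
    merged = []
    i = 0
    while i < len(sentence):
        word, tag = sentence[i]
        if tag.endswith(SPLIT_MARK) and i + 1 < len(sentence):
            next_word, next_tag = sentence[i + 1]
            joined = CONTRACTIONS.get((word.lower(), next_word.lower()))
            if joined is not None:
                if word[0].isupper():
                    joined = joined.capitalize()
                # Combine both tags, dropping the split marker.
                new_tag = tag[: -len(SPLIT_MARK)] + "+" + next_tag
                merged.append((joined, new_tag))
                i += 2
                continue
        # Unmarked tokens, and marked pairs missing from the table, stay as-is.
        merged.append((word, tag))
        i += 1
    return merged


def revise(sentences):
    """Apply both steps: deduplicate, then join contractions."""
    return [join_contractions(s) for s in remove_duplicates(sentences)]
```

Requiring the marker before merging avoids joining words that merely happen to be adjacent, which a plain lookup on word pairs would do.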

E. R. Fonseca, S. M. Aluísio and J. L. G. Rosa. Evaluating word embeddings and a revised corpus for part-of-speech tagging in Portuguese. Journal of the Brazilian Computer Society, 21:2. 2015.
E. R. Fonseca and J. L. G. Rosa. Mac-Morpho Revisited: Towards Robust Part-of-Speech Tagging. Proceedings of the 9th Brazilian Symposium in Information and Human Language Technology, 2013. p. 98-107.
E. R. Fonseca and J. L. G. Rosa. A Two-Step Convolutional Neural Network Approach for Semantic Role Labeling. Proceedings of the 2013 International Joint Conference on Neural Networks, 2013. p. 2955-2961.
E. R. Fonseca and J. L. G. Rosa. An architecture for semantic role labeling on Portuguese. PROPOR 2012 - International Conference on Computational Processing of Portuguese, 2012. p. 204-209.
A. Plastino, E. R. Fonseca, R. Fuchshuber, S. L. Martins, A. A. Freitas, L. Martino, S. Salhi. A Hybrid Data Mining Metaheuristic for the p-Median Problem. SIAM International Conference on Data Mining, 2009. p. 305-316.
E. R. Fonseca, R. Fuchshuber, A. Plastino, S. L. Martins. MDM-GRASP: Uma Metaheurística Híbrida e Adaptativa. Anais do XLI Simpósio Brasileiro de Pesquisa Operacional, 2009. p. 3391-3398.
E. R. Fonseca, R. Fuchshuber, L. F. Santos, A. Plastino, S. L. Martins. Explorando a Metaheurística Híbrida DM-GRASP para o Problema de Multicast Confiável. Anais do XL Simpósio Brasileiro de Pesquisa Operacional, 2008. p. 1284-1295.
E. R. Fonseca, R. Fuchshuber, L. F. Santos, A. Plastino, S. L. Martins. Exploring the Hybrid Metaheuristic DM-GRASP for Efficient Server Replication for Reliable Multicast. Proceedings of the 2nd International Conference on Metaheuristics and Nature Inspired Computing, 2008.