Brazilis

Supervisor

Sandra Maria Aluísio

Master's candidate

Nathan Siegle Hartmann

A very important set of information for NLP, composed of the semantic relations between a verb and its arguments, is called the set of semantic roles. The task of identifying which words act as arguments of the action expressed by a verb is called semantic role labeling (SRL) (Shamsfard and Mousavi, 2007).
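To make the definition concrete, the following minimal sketch shows one way an SRL annotation could be represented. The sentence, the `Argument` class, and the hand-assigned PropBank-style labels are illustrative assumptions, not data or code from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Argument:
    role: str   # PropBank-style label, e.g. "Arg0" (agent-like), "Arg1" (patient-like)
    start: int  # index of the first token of the argument
    end: int    # index one past the last token of the argument


# Illustrative sentence: "João vendeu o carro para Maria"
# ("João sold the car to Maria"); labels are hand-assigned for this sketch.
tokens = ["João", "vendeu", "o", "carro", "para", "Maria"]
predicate_index = 1  # "vendeu" is the verb whose arguments are labeled

arguments = [
    Argument("Arg0", 0, 1),  # the seller
    Argument("Arg1", 2, 4),  # the thing sold
    Argument("Arg2", 4, 6),  # the buyer
]


def argument_text(tokens, argument):
    """Return the surface string covered by an argument span."""
    return " ".join(tokens[argument.start:argument.end])


# Map each role label to the words that fill it.
labeled = {a.role: argument_text(tokens, a) for a in arguments}
```

An SRL system has to produce such spans and labels automatically for every verb in a sentence.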

In NLP, the annotation of semantic roles with the help of annotated corpora was initially conceived by Gildea and Jurafsky (2001, 2002), who employed FrameNet (Baker et al., 1998) as a training corpus. Since then, various SRL projects have been created for several languages, among which we cite the following works for English: Gildea and Palmer (2002); Gildea and Hockenmaier (2003); Surdeanu et al. (2003); Palmer et al. (2005); Yi et al. (2007); Toutanova et al. (2008); Pradhan et al. (2008).

For Brazilian Portuguese, the Alva-Manchego (2013) system obtained an F1 of 79.6 on revised syntactic trees (gold standard data), and Fonseca (2013) obtained an F1 of 68.0 using a deep learning approach. Both works used the PropBank.Br corpus.
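F1 is the harmonic mean of precision and recall. As a rough sketch of how such a score can be computed for SRL, the function below compares predicted arguments with gold arguments, each represented as a (role, start, end) tuple. The exact matching criterion used in the works cited here is not stated in this text, so exact span-and-label matching is an assumption, and the example data is invented.

```python
def precision_recall_f1(gold, predicted):
    """Compute precision, recall and F1 over sets of labeled argument spans.

    An argument counts as correct only if its role and both span
    boundaries match a gold argument exactly (an assumption of this sketch).
    """
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented example: two of three predicted arguments match the gold annotation.
gold = {("Arg0", 0, 1), ("Arg1", 2, 4), ("Arg2", 4, 6)}
predicted = {("Arg0", 0, 1), ("Arg1", 2, 4), ("ArgM-LOC", 4, 6)}

precision, recall, f1 = precision_recall_f1(gold, predicted)
```

Scores such as 79.6 are the same quantity multiplied by 100.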

The objectives of this master's project are:

Advance the state of the art of SRL for Brazilian Portuguese in the journalistic genre;
Evaluate a semantic role labeler for Brazilian Portuguese on the online product review genre.

This master's thesis developed a semantic role labeling system for Brazilian Portuguese that was trained on unrevised syntactic trees (a slice of the PLN-Br corpus). The system obtained 72.62 F1 when annotating PropBank.Br (gold standard) and 69.12 F1 when annotating its own training corpus (automatic parsing). The Alva-Manchego system, by comparison, obtained only 54.76 F1 when annotating our corpus. Thus, this work shows that, when annotating unrevised syntactic trees (the real application scenario), our system performs SRL better, with a statistically significant difference.

We also showed that, when annotating unrevised syntactic trees of product reviews collected from the Buscapé website, our system achieves 65.34 F1 against 57.72 for the Alva-Manchego system (a statistically significant difference).

In addition, we developed an omitted subject insertion system (first person singular and plural) with 87.8% selection precision on PLN-Br and 94.5% precision on Buscapé. Making subjects explicit in the text allows them to be annotated and thus improves the overall SRL system.
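The idea of subject insertion can be illustrated with a small rule-based sketch. It assumes that each token already carries a part-of-speech tag and, for finite verbs, person and number features. The `Token` type, the tag names, and the crude subject test are hypothetical choices for illustration only and do not describe how the thesis system selects its insertion points.

```python
from typing import NamedTuple, Optional


class Token(NamedTuple):
    form: str
    pos: str                      # coarse tag, e.g. "V", "N", "PRON" (hypothetical tagset)
    person: Optional[str] = None  # "1", "2" or "3" for finite verbs
    number: Optional[str] = None  # "S" or "P" for finite verbs


# Pronouns to insert for first person singular and plural verbs.
PRONOUN_BY_FEATURES = {
    ("1", "S"): "eu",
    ("1", "P"): "nós",
}


def insert_omitted_subject(clause):
    """Insert "eu" or "nós" before a first-person verb that has no subject.

    Crude assumption: any noun or pronoun before the verb counts as an
    explicit subject. Only the first verb of the clause is considered.
    """
    for i, tok in enumerate(clause):
        if tok.pos != "V":
            continue
        pronoun = PRONOUN_BY_FEATURES.get((tok.person, tok.number))
        has_subject = any(t.pos in {"N", "PRON"} for t in clause[:i])
        if pronoun and not has_subject:
            return clause[:i] + [Token(pronoun, "PRON")] + clause[i:]
        return clause
    return clause


# "Comprei o celular ontem" ("I bought the phone yesterday"), subject omitted.
clause = [
    Token("Comprei", "V", "1", "S"),
    Token("o", "DET"),
    Token("celular", "N"),
    Token("ontem", "ADV"),
]
explicit = [t.form for t in insert_omitted_subject(clause)]
```

Once the pronoun is present in the text, it becomes a span that an SRL system can label like any other argument.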

Finally, we developed a rule-based semantic role labeler for auxiliary verbs. The system annotates the way in which the auxiliary verb modifies the main verb. This system reaches 96.76% confidence when applied to our corpus selection, PLN-Br.
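A rule-based labeler of this kind can be pictured as a lookup from the auxiliary's lemma and the form of the main verb to the kind of contribution the auxiliary makes. The rule table and the label names below ("tense", "aspect", "voice", "modality") are hypothetical placeholders chosen for this sketch and are not the label set or rules of the thesis.

```python
# Hypothetical rules: (auxiliary lemma, form of the main verb) -> contribution.
AUX_RULES = {
    ("ter", "participle"): "tense",       # "tinha comprado" (had bought)
    ("haver", "participle"): "tense",     # "havia comprado" (had bought)
    ("ir", "infinitive"): "tense",        # "vou comprar" (am going to buy)
    ("estar", "gerund"): "aspect",        # "estava comprando" (was buying)
    ("ser", "participle"): "voice",       # "foi vendido" (was sold)
    ("poder", "infinitive"): "modality",  # "pode comprar" (can buy)
    ("dever", "infinitive"): "modality",  # "deve comprar" (must buy)
}


def label_auxiliary(aux_lemma, main_verb_form):
    """Return the label for an auxiliary construction, or None if no rule applies."""
    return AUX_RULES.get((aux_lemma, main_verb_form))


progressive = label_auxiliary("estar", "gerund")
passive = label_auxiliary("ser", "participle")
unknown = label_auxiliary("comprar", "infinitive")
```

Keeping the rules in a table makes each decision easy to inspect, which suits a construction with a small, closed set of auxiliaries.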

Alva-Manchego, F. (2013). Anotação automática semissupervisionada de papéis semânticos para o português do Brasil. Master’s thesis, University of São Paulo, Brazil.

Baker, C. F., C. J. Fillmore, and J. B. Lowe (1998). The Berkeley FrameNet project. In Proceedings of the 17th International Conference on Computational Linguistics, pp. 86–90.

Fonseca, E. R. (2013). Uma abordagem conexionista para anotação de papéis semânticos. Master’s thesis, University of São Paulo, Brazil.

Gildea, D. and J. Hockenmaier (2003). Identifying semantic roles using combinatory categorial grammar. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-2003), pp. 57–64. Association for Computational Linguistics.

Gildea, D. and D. Jurafsky (2001). Identifying semantic roles in text. In 17th International Joint Conference on Artificial Intelligence (IJCAI-01).

Gildea, D. and D. Jurafsky (2002). Automatic labeling of semantic roles. Computational Linguistics 28(3), 245–288.

Gildea, D. and M. Palmer (2002). The necessity of syntactic parsing for predicate argument recognition. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL-2002), pp. 239–246.

Palmer, M., D. Gildea, and P. Kingsbury (2005). The proposition bank: An annotated corpus of semantic roles. Computational Linguistics 31(1), 71–106.

Palmer, M., D. Gildea, and N. Xue (2010). Semantic role labeling. Synthesis Lectures on Human Language Technologies, 1–103.

Pradhan, S. S., W. Ward, and J. H. Martin (2008). Towards robust semantic role labeling. Computational Linguistics 34(2), 289–310.

Shamsfard, M. and M. S. Mousavi (2007). Thematic role extraction using shallow parsing. International Journal of Computational Intelligence, 126–132.

Surdeanu, M., S. Harabagiu, J. Williams, and P. Aarseth (2003). Using predicate-argument structures for information extraction. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics (ACL-2003), pp. 8–15. Association for Computational Linguistics.
