Fapesp Corpora - README

REVISTA PESQUISA FAPESP PARALLEL CORPORA

 


Content

These corpora are the Portuguese-English and Portuguese-Spanish bilingual collections of the online issues of the Brazilian scientific news magazine REVISTA PESQUISA FAPESP.

Citing

When using these data, please cite Aziz and Specia (2011) and add a link to the magazine's webpage:


http://revistapesquisa.fapesp.br


@INPROCEEDINGS{aziz:2011:newfapesp,
AUTHOR={Wilker Aziz and Lucia Specia},
TITLE={Fully Automatic Compilation of a {Portuguese-English} Parallel Corpus for Statistical Machine Translation},
BOOKTITLE={STIL 2011},
ADDRESS={Cuiab\'a, MT},
DAYS={24-26},
MONTH={October},
YEAR={2011},
} 

Documentation

Please refer to the paper (Aziz and Specia, 2011).
For additional information, email Wilker Aziz.

Format

The data is sentence-aligned (alignment is given by the line number).
Under bitexts you will find the document pairs (one document per file).
You will also find the data split into the training, development and test sets proposed by Aziz and Specia (2011).
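
Because alignment is given by the line number, a sentence pair can be recovered by reading the two sides of a bitext in lockstep. The following is a minimal sketch in Python, not part of the corpus distribution. The function name read_bitext and the UTF-8 encoding are assumptions, so check the actual files before relying on them.

```python
from itertools import zip_longest


def read_bitext(src_path, tgt_path, encoding="utf-8"):
    """Yield (source, target) sentence pairs from two line-aligned files.

    Line n of src_path is paired with line n of tgt_path.
    The encoding default is an assumption, not stated in the README.
    """
    with open(src_path, encoding=encoding) as src, \
            open(tgt_path, encoding=encoding) as tgt:
        # zip_longest pads the shorter file with None, so a length
        # mismatch (which would break the alignment) is detected.
        for line_no, (s, t) in enumerate(zip_longest(src, tgt), start=1):
            if s is None or t is None:
                raise ValueError(
                    f"files differ in length at line {line_no}"
                )
            yield s.rstrip("\n"), t.rstrip("\n")
```

For example, iterating over read_bitext("train.pt", "train.en") would yield one (Portuguese, English) tuple per line. Both file names are hypothetical placeholders for whichever pair of files you are reading.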

License

This corpus is distributed under Creative Commons 2.0 (Attribution-NonCommercial 2.0):

You may distribute it and use it for non-commercial purposes, such as academic research, but any commercial use of the corpus must be agreed with REVISTA PESQUISA FAPESP.

Contact

Email: w.aziz@wlv.ac.uk or wilker.aziz@gmail.com
News: http://pers-www.wlv.ac.uk/~in1676/