These corpora are the Portuguese-English and Portuguese-Spanish bilingual collections of the online issues of REVISTA PESQUISA FAPESP, a Brazilian scientific news magazine.
When using this data, please cite Aziz and Specia (2011) and include a link to the magazine's webpage:
http://revistapesquisa.fapesp.br
@INPROCEEDINGS{aziz:2011:newfapesp,
AUTHOR={Wilker Aziz and Lucia Specia},
TITLE={Fully Automatic Compilation of a {Portuguese-English} Parallel Corpus for Statistical Machine Translation},
BOOKTITLE={STIL 2011},
ADDRESS={Cuiab\'a, MT},
DAYS={24-26},
MONTH={October},
YEAR={2011},
}
Please refer to the paper (Aziz and Specia, 2011) for details.
For additional information, email Wilker Aziz.
The data is sentence-aligned: alignment is given by the line number, so line N of one side of a pair is the translation of line N of the other side.
The bitexts directory contains the document pairs (one document per file).
The data is also split into training, development and test sets, as proposed by Aziz and Specia (2011).
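Because the alignment is purely positional, a file pair can be read by pairing lines with the same index. The following is a minimal Python sketch, assuming both sides are plain-text UTF-8 files with one sentence per line; the function name read_aligned and any file paths passed to it are illustrative, not part of the distribution.

```python
def read_aligned(src_path, tgt_path, encoding="utf-8"):
    """Pair line i of the source file with line i of the target file.

    Hypothetical helper: the UTF-8 default is an assumption about the corpus.
    """
    # Text mode uses universal newlines, so each line ends in "\n";
    # strip only that terminator and keep all other whitespace.
    with open(src_path, encoding=encoding) as f:
        src = [line.rstrip("\n") for line in f]
    with open(tgt_path, encoding=encoding) as f:
        tgt = [line.rstrip("\n") for line in f]
    # Positional alignment only makes sense if both sides have the same length.
    if len(src) != len(tgt):
        raise ValueError(
            f"line count mismatch: {len(src)} source vs {len(tgt)} target lines"
        )
    return list(zip(src, tgt))
```

Checking the line counts up front catches a truncated or mismatched file before it silently shifts every subsequent sentence pair.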
This corpus is distributed under the Creative Commons Attribution-NonCommercial 2.0 license:
You may distribute it and use it for non-commercial purposes, such as academic research, but any commercial use of the corpus must be agreed with REVISTA PESQUISA FAPESP.
Email: w.aziz@wlv.ac.uk or wilker.aziz@gmail.com
News:
http://pers-www.wlv.ac.uk/~in1676/