Select a Portuguese corpus (350 million words in all):
Floresta-Público (115,000 words, corrected)
Floresta-Folha (24,000 words, corrected)
Europarl (ca. 27.2 mill. words, no password)
Folha de São Paulo (ca. 24.2 mill. words, no password)
Folha with semantics (ca. 20.2 mill. words,
Wikipedia (ca. 11.3 mill. words, no password)
Público (ca. 176.4 mill. words, no password)
COLONIA (ca. 4.9 mill. words, no password)
Público-98 (ca. 38.4 mill. words, no password)
Público-91 (ca. 18.0 mill. words, no password)
Público-92 (ca. 38.4 mill. words, no password)
Público-93 (ca. 38.4 mill. words, no password)
Público-94 (ca. 38.4 mill. words, no password)
Público-95 (ca. 38.4 mill. words, no password)
Público-96 (ca. 38.4 mill. words, no password)
Público-97 (ca. 38.4 mill. words, no password)
C-ORAL (ca. 200,000 words)
NURC (ca. 117,000 words)
Children (ca. 45,600 words, password)
Ad corpus (ca. 46,000 words)
Netlang (Hate speech) (ca. 6.3 mill. words, password)
Case insensitive
Diacritics insensitive
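
The two search options above can be illustrated with a minimal sketch. This is not the search engine's actual implementation, which this page does not describe; it only shows one common way to compare Portuguese word forms while ignoring case and accents, using Python's standard unicodedata module. The function names fold and matches and their parameters are hypothetical, chosen for illustration.

```python
import unicodedata


def fold(text: str, ignore_case: bool = True, ignore_diacritics: bool = True) -> str:
    """Return a comparison key for text (hypothetical helper, not the site's code)."""
    if ignore_diacritics:
        # NFD splits an accented letter into base letter + combining mark,
        # e.g. "ã" becomes "a" followed by U+0303 (combining tilde).
        decomposed = unicodedata.normalize("NFD", text)
        # Drop the combining marks, keeping only the base characters.
        text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    if ignore_case:
        # casefold() is a more aggressive lower() intended for caseless matching.
        text = text.casefold()
    return text


def matches(query: str, token: str, **options: bool) -> bool:
    """Compare a query with a corpus token under the chosen options."""
    return fold(query, **options) == fold(token, **options)


if __name__ == "__main__":
    print(matches("publico", "Público"))
    print(matches("publico", "Público", ignore_diacritics=False))
```

With both options enabled, a query typed without accents or capitals would match the accented, capitalized form; disabling the diacritics option makes the comparison accent-sensitive again.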