Corpus Eye

Online corpora collections

  • IDS-Korpora: German corpus archive, searchable via COSMAS (1.9 billion words!)
  • AC/DC Corpora: Very large collection of Portuguese corpora (by Diana Santos, Linguateca), much of it PALAVRAS-annotated (Eckhard Bick)
  • Corpus del Español: Fast and unabridged comparative corpus of historical and modern Spanish (by Mark Davies)
  • BNC's Corpus Page: Overview of English corpora, online access to the BNC (100 million words)
  • Cobuild Corpus: Mixed British/American English corpus (50 million words), includes transcribed speech (by Collins)

Internet corpus tools

  • webcorp: Searching the internet as a corpus, slow but nice. Keeps going.
  • web-conc: Concordancing with the whole internet as a corpus. Fast AND nice.

Other corpora link overviews

  • Corpora Links: Link collection of corpora in many languages at the University of Tübingen (maintained by Laura Kallmeyer)
  • Corpus Linguistics: Link collection of corpora in many languages (maintained by Michael Barlow)
  • Statistical NLP and computational corpus linguistics: Link collection of corpora and NLP resources in many languages at the Stanford InfoLab (maintained by Christopher Manning)