Corpus page

Danish:
dfk (ca. 10.000.000 words, mixed text, no password required), CG-tagged (part only)
korpus 90 by DSL (ca. 26.000.000 words, mixed text, password-free), CG-tagged by E.Bick/VISL
korpus 2000 by DSL (ca. 26.000.000 words, mixed text, password-free), CG-tagged by E.Bick/VISL

English:
bnc, the British National Corpus (ca. 100.000.000 words, mixed corpus)

German:
bzk (ca. 4.000.000 words, newspaper corpus)
mak (ca. 2.500.000 words, mixed corpus)

Portuguese: (tagged)
speech data (50.000 words, no password)
historical texts (50.000 words)
modern texts (100.000 words)
CETEMPúblico (1.000.000 words, no password required)

Spanish:
camtie (ca. 1.200.000 words, newspaper text)

Search conventions are explained in the manual (opens in a separate window). When searching tagged text, put word forms in double quotes and base forms in single quotes (base forms are not available for Danish). Tags are separated by blank spaces, words by underscores. Use '_._' for one dummy word, '_.?_' for one optional dummy word, '_.*_' for zero or more dummy words, and '_.+_' for one or more obligatory dummy words. Sentence start is '- - -' in untagged corpora and ">>>" (inside word-form quotes) in tagged corpora. For notational details, have a look at the CG-tags used for Danish or Portuguese at the VISL project site.
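As an illustration of the conventions above, here is a minimal sketch of how a search string for a tagged corpus might be assembled. It is not part of the search system: the helper names (word_form, base_form, build_query) are hypothetical, the tag V is only a placeholder, and the way the pieces are concatenated is an assumption based on the description above. Check the manual before relying on it.

```python
# Dummy-word markers as listed in the search conventions.
# Each marker already carries the underscores that separate it from its neighbours.
ONE = "_._"        # one dummy word
OPTIONAL = "_.?_"  # one optional dummy word
ANY = "_.*_"       # any number of optional dummy words
SOME = "_.+_"      # one or more obligatory dummy words
GAP_MARKERS = {ONE, OPTIONAL, ANY, SOME}


def word_form(form, *tags):
    """Word form in double quotes, followed by blank-separated tags."""
    return " ".join(['"%s"' % form, *tags])


def base_form(lemma, *tags):
    """Base form in single quotes, followed by blank-separated tags."""
    return " ".join(["'%s'" % lemma, *tags])


def build_query(*parts):
    """Join word specifications with underscores (assumed concatenation).

    A gap marker takes the place of the plain underscore between two words,
    so it must stand between two word specifications.
    """
    pieces = []
    previous_was_word = False
    for part in parts:
        if part in GAP_MARKERS:
            if not previous_was_word:
                raise ValueError("a gap marker must directly follow a word")
            pieces.append(part)
            previous_was_word = False
        else:
            if previous_was_word:
                pieces.append("_")  # plain separator between adjacent words
            pieces.append(part)
            previous_was_word = True
    if not previous_was_word:
        raise ValueError("a query must end with a word specification")
    return "".join(pieces)


# Word form "the", exactly one dummy word, word form "house".
print(build_query(word_form("the"), ONE, word_form("house")))

# Sentence start in a tagged corpus, then base form 'be' with a placeholder tag V.
print(build_query(word_form(">>>"), base_form("be", "V")))
```

Under these assumptions the two calls produce "the"_._"house" and ">>>"_'be' V respectively. Base-form searches would not apply to the Danish corpora, since base forms are not given there.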
Enter search string:
Enter password:
Please note that corpus search engines are meant to provide researchers with language data and statistics, not running text. Ordinary copyright therefore still applies: for instance, you must not try to extract large, contiguous portions of text from the corpus.
The search system was designed by Eckhard Bick for VISL. More information on the project, as well as live grammatical analysis and a number of grammar teaching tools, is available at the VISL main site.
Please mail any questions or suggestions you might have ... the site is evolving. Also, let us know if you have any texts yourself which you would like us to make accessible for searching at this site.

Webmaster: mps@mip.sdu.dk