Patterns and Sentences is part of the Hjal-project. It contains a list of n-grams (especially bigrams and trigrams) that are rare in Icelandic, together with sentences containing words in which these patterns occur. The sentences were extracted from novels published around the year 2000.
The sentences that participants in the Hjal-project were to read were extracted automatically from a large corpus of around 100 novels acquired from local publishing houses. They were selected according to a list of n-grams compiled by Eiríkur Rögnvaldsson, most of them drawn from Íslensk rímorðabók [Icelandic Rhyming Dictionary]. Care was taken to select sentences that include rare patterns, so that these patterns would be amply exemplified in the reading text. The file pattens_sentences contains 182 rare n-grams and the 1433 sentences selected on the basis of these n-grams.
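The selection step described above can be sketched roughly as follows. This is only a minimal illustration, not the procedure actually used in the Hjal-project: it assumes that the n-grams are character sequences, that matching is case-insensitive and done word by word, and that a sentence is kept if at least one of its words contains at least one listed n-gram. The function name select_sentences and the toy data are hypothetical placeholders, not taken from the resource.

```python
import string


def select_sentences(sentences, ngrams):
    """Keep sentences in which some word contains a listed n-gram.

    Assumption: n-grams are plain character sequences and matching is
    case-insensitive. Returns (sentence, matched_ngrams) pairs.
    """
    lowered = [g.lower() for g in ngrams]
    selected = []
    for sentence in sentences:
        # Split on whitespace and strip surrounding punctuation from each word.
        words = [w.strip(string.punctuation).lower() for w in sentence.split()]
        # Collect every n-gram that occurs inside at least one word.
        hits = sorted({g for g in lowered if any(g in w for w in words)})
        if hits:
            selected.append((sentence, hits))
    return selected


# Toy placeholder data (invented for illustration, not real Icelandic n-grams).
toy_ngrams = ["xq", "zzt"]
toy_sentences = ["Toy word foxqa here.", "Nothing to match."]
result = select_sentences(toy_sentences, toy_ngrams)
```

Returning the matched n-grams alongside each sentence makes it easy to check afterwards that every pattern on the list is exemplified by at least one sentence.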
Patterns and Sentences is available for download. Prospective users must register and accept the terms and conditions. The texts are made available under a CC BY 3.0 licence.
Eiríkur Rögnvaldsson
professor, Icelandic linguistics
Faculty of Icelandic and Comparative Cultural Studies
School of Humanities, University of Iceland
Office: Árnagarði, 415
Work phone: +354-525-4403
Fax: +354-525-4242
e-mail: eirikur@hi.is; eirikur.rognvaldsson@gmail.com