Icelandic Wordweb (Íslenskt orðanet) describes the semantic relations of Icelandic words and phrasemes, which are treated as semantically unambiguous units. The basis of the project is a collection of phrasemes and compounds in a standardised representation, comprising more than 200,000 phrasemes of various kinds and about 100,000 compounds. In later stages, semantic relations have mainly been traced and analysed from coordinated words and phrases as they appear in running text.
Icelandic Wordweb (Íslenskt orðanet) is a research project which analyses and describes the semantic relations of Icelandic words and phrasemes. The methodology rests on the premise that semantic relations are indicated by syntagmatic relations as they appear in collocations and other word combinations. The basis of the project is a collection of phrasemes and compounds in a standardised representation, comprising more than 200,000 phrasemes of various kinds and about 100,000 compounds. The collection combines material from Stóra orðabókin um íslenska málnotkun (‘The Big Dictionary of Icelandic Usage’, Jón Hilmar Jónsson 2005) and the phraseological database of The Árni Magnússon Institute for Icelandic Studies. In later stages, semantic relations have mainly been traced and analysed from coordinated words and phrases (joined by 'and') as they appear in running text, the source being the large text collection of the website Tímarit.is. All this material is linked to a lemma list that combines about 250,000 single-word and multi-word lemmas.
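As a rough illustration of how coordinated pairs might be harvested from running text, the sketch below looks for two single words joined by the Icelandic conjunction og ('and'). It is a minimal sketch under stated assumptions, not the project's actual procedure: the function name, the regular expression and the sample sentence are all hypothetical, and real material would additionally require lemmatisation and the handling of multi-word phrases.

```python
import re

# Hypothetical pattern: two single words joined by "og" ('and').
# In Python 3, \w matches Unicode letters such as ð, þ and á by default.
COORDINATION = re.compile(r"\b(\w+) og (\w+)\b")


def coordinated_pairs(text):
    """Return (left, right) tuples for every 'X og Y' sequence in the text."""
    return COORDINATION.findall(text)


# Invented sample sentence, not taken from Tímarit.is.
sample = "Hann var glaður og kátur en hún var þreytt og syfjuð."
pairs = coordinated_pairs(sample)
print(pairs)
```

The pairs are returned as surface forms; mapping them to entries in a lemma list is a separate step that this sketch leaves out.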
The semantic relations in question are of various kinds. The clearest and closest relations are synonymy and antonymy, but synonym relations vary in closeness, a difference partially captured by distinguishing between synonyms and near-synonyms. In estimating the relations, emphasis is placed on the evidence of the material, with the goal of obtaining numeric evidence of the semantic proximity and relatedness of the words compared. The analysis also yields semantically homologous vocabulary, which is further sorted and placed under particular concepts and semantic fields.
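One way such numeric evidence could be derived is to compare the sets of words that two lemmas are coordinated with: the more partners they share, the closer they are likely to be. The sketch below uses Jaccard overlap for this purpose. The choice of measure is an assumption made for illustration only, since the text does not say how proximity is computed, and the function names and toy pairs are hypothetical.

```python
from collections import defaultdict


def partner_sets(pairs):
    """Map each lemma to the set of lemmas it is coordinated with."""
    partners = defaultdict(set)
    for left, right in pairs:
        partners[left].add(right)
        partners[right].add(left)
    return partners


def jaccard(a, b, partners):
    """Jaccard overlap of the coordination partners of lemmas a and b."""
    # Each lemma is removed from the other's partner set so that a direct
    # pairing of a and b does not distort the overlap.
    pa = partners.get(a, set()) - {b}
    pb = partners.get(b, set()) - {a}
    union = pa | pb
    return len(pa & pb) / len(union) if union else 0.0


# Invented toy data, not drawn from the Wordweb material.
toy_pairs = [
    ("glaður", "kátur"),
    ("glaður", "ánægður"),
    ("kátur", "ánægður"),
    ("kátur", "hress"),
    ("glaður", "hress"),
    ("dapur", "hryggur"),
]
toy_partners = partner_sets(toy_pairs)
close = jaccard("glaður", "kátur", toy_partners)
distant = jaccard("glaður", "dapur", toy_partners)
```

On this toy data, glaður and kátur share all their remaining partners and score 1.0, while glaður and dapur share none and score 0.0. A graded scale between such extremes is the kind of signal that could help separate synonyms from near-synonyms.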
The lemmas are semantically unambiguous, which has a profound impact on the description of the semantic relations. For example, the arguments of verbs are taken to be part of the lemma, and verbal combinations of various kinds have independent status within the lemma list.
In most general dictionaries, individual lemmas appear as form-based units whose entries can be divided into different senses and numbered subdivisions, as appropriate. In Icelandic Wordweb, however, the focus is on the lemma as a monosemous lexical unit. This widens the scope of the lemma list compared with traditional semasiological dictionaries, and inclusion in the list depends on whether the potential lemma shows clear relations to other lemmas.
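The contrast between a form-based entry and monosemous lemmas can be pictured with a small data structure. The sketch below is purely illustrative: the class, its field names, the angle-bracket notation for argument slots and the example lemmas are assumptions rather than the Wordweb's actual data model or content.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Lemma:
    """A monosemous unit; verb arguments are part of the lemma string."""
    text: str      # full lemma string, including argument slots
    headword: str  # form a traditional dictionary would file it under


# Illustrative lemmas only (assumed, not taken from the Wordweb).
respect_phrase = Lemma("bera virðingu fyrir <e-m>", "bera")  # 'respect sb'
compare = Lemma("bera <e-ð> saman", "bera")                  # 'compare sth'
respect_verb = Lemma("virða <e-n>", "virða")                 # 'respect sb'

# Relations hold between whole lemmas, not between bare word forms.
relations = {frozenset({respect_phrase, respect_verb}): "synonym"}

# A form-based dictionary would gather both "bera" lemmas in one entry.
by_headword = defaultdict(list)
for lemma in (respect_phrase, compare, respect_verb):
    by_headword[lemma.headword].append(lemma)
```

Here the single form bera corresponds to two separate lemmas, and only one of them stands in a synonym relation to virða <e-n>, so the relation can be stated without sense numbers.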
Multi-word lemmas are prominent in the lemma list of the Icelandic Wordweb. Their coordinated representation makes it possible to mark the lemma strings syntactically and thereby achieve active interaction between syntactic and semantic classification.