List of word forms

The list contains all word forms in the DMII, without tags or lemmas. Each word form is a unique character string.

Note that word forms are highly ambiguous. The 5.8 million tagged inflectional forms in the DMII contain 2.8 million unique word forms. Of these, 1.8 million correspond to a single inflectional form and are thus unambiguous, while 1 million are ambiguous (May 2013).

Tagged inflectional forms in the DMII             5,881,374
Unambiguous word forms                            1,850,090   31.5%
Ambiguous word forms within 1 lemma               3,619,482   61.5%
Ambiguous word forms in more than 1 lemma            63,641    1.1%
Ambiguous word forms within and between lemmas      348,161    5.9%
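
The percentages in the table above can be rechecked from the counts. The following is a minimal sketch in Python; the counts are copied from the table, while the dictionary keys and the function name `shares` are illustrative choices that are not part of the DMII. It also confirms that the four categories add up to the total number of tagged inflectional forms.

```python
# Counts copied from the table above (May 2013).
TOTAL_TAGGED_FORMS = 5_881_374

# The category names used as keys are shortened for illustration.
counts = {
    "unambiguous": 1_850_090,
    "ambiguous within 1 lemma": 3_619_482,
    "ambiguous in more than 1 lemma": 63_641,
    "ambiguous within and between lemmas": 348_161,
}


def shares(counts, total):
    """Return each category's share of the total in percent, to one decimal."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# The four categories partition the tagged inflectional forms.
categories_sum = sum(counts.values())

category_shares = shares(counts, TOTAL_TAGGED_FORMS)

if __name__ == "__main__":
    for name, pct in category_shares.items():
        print(f"{name}: {pct}%")
```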


Ambiguity poses a substantial problem in many language technology projects. For example, the word form minni has 30 tags and appears in 4 paradigms, under the headwords minni (noun, neut.), lítill (adj.), minna (verb), and the possessive pronoun minn. Two further examples of ambiguous word forms, with tags in Sigrún's format, are listed below.

Word form   Inflectional form & tag   Lemma
hana        hana ÞFET                 hani kk ‘cock’ n.masc.
            hana ÞGFET                hani kk
            hana EFET                 hani kk
            hana ÞFFT                 hani kk
            hana EFFT                 hani kk
            hana ÞFET                 hún pfn.kvk. ‘she’ pers.pron.
öndum       öndum ÞGFFT               andi, no.kk. ‘spirit’, n.masc.
            öndum ÞGFFT               önd, no.kvk. ‘duck’, n.fem.
            öndum GM-FH-NT-1P-FT      anda, so. ‘breathe’ v. active
            öndum GM-VH-NT-1P-FT      anda, so. active
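
To illustrate how the ambiguity of a word form might be determined, the following is a minimal sketch in Python. It indexes the example rows above by word form and lemma and then reports whether a form is ambiguous within a lemma, between lemmas, or both. The triples are transcribed from the table; the names `ANALYSES`, `build_index` and `classify`, and the category labels, are assumptions made for this example and are not part of the DMII. The sketch also identifies a lemma by its headword alone, which is a simplification.

```python
from collections import defaultdict

# (word form, tag, lemma) triples transcribed from the table above.
ANALYSES = [
    ("hana", "ÞFET", "hani"),
    ("hana", "ÞGFET", "hani"),
    ("hana", "EFET", "hani"),
    ("hana", "ÞFFT", "hani"),
    ("hana", "EFFT", "hani"),
    ("hana", "ÞFET", "hún"),
    ("öndum", "ÞGFFT", "andi"),
    ("öndum", "ÞGFFT", "önd"),
    ("öndum", "GM-FH-NT-1P-FT", "anda"),
    ("öndum", "GM-VH-NT-1P-FT", "anda"),
]


def build_index(analyses):
    """Map each word form to {lemma: [tags]}."""
    index = defaultdict(lambda: defaultdict(list))
    for form, tag, lemma in analyses:
        index[form][lemma].append(tag)
    return index


def classify(index, form):
    """Label a word form by the kind of ambiguity it shows (hypothetical labels)."""
    by_lemma = index.get(form)
    if not by_lemma:
        return "unknown"
    between = len(by_lemma) > 1                              # several lemmas
    within = any(len(tags) > 1 for tags in by_lemma.values())  # several tags in one lemma
    if within and between:
        return "within and between lemmas"
    if within:
        return "within one lemma"
    if between:
        return "between lemmas"
    return "unambiguous"


index = build_index(ANALYSES)

if __name__ == "__main__":
    for form in ("hana", "öndum"):
        print(form, "->", classify(index, form))
```

In this sketch both example word forms fall into the last category: hana has five tags under hani and one under hún, and öndum has two tags under anda and one each under andi and önd.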


KB 1.10.2013