IceNLP
IceNLP is open-source software for analyzing and processing Icelandic text. It is implemented in Java and consists of the following components: a tokeniser, an unknown word guesser, a part-of-speech tagger, a lemmatiser, a parser and a named-entity recogniser.
The software was originally developed as part of Hrafn Loftsson's Ph.D. studies during the years 2004-2007. Since then, students at Reykjavík University and the University of Iceland have helped develop individual components.
About IceNLP
IceNLP can be used for various tasks, such as breaking text up into individual tokens, tagging each token with its morphosyntactic tag, finding the lemma of a given word, and returning a shallow phrase structure with labels indicating syntactic functions.
Individual components of IceNLP can be run independently, or the relevant Java classes can be integrated directly into software that is being developed.
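As a rough illustration of what running a component on its own versus chaining components inside another Java program might look like, here is a minimal sketch. It does not use IceNLP itself: the names PipelineSketch, tokenise, tag and TaggedToken, and the WORD/PUNCT labels, are invented for this example and are not IceNLP's API or tagset. The toy logic only stands in for the real tokeniser and tagger.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from IceNLP.
class PipelineSketch {

    // A token paired with a placeholder tag (not a real morphosyntactic tag).
    static final class TaggedToken {
        final String form;
        final String tag;

        TaggedToken(String form, String tag) {
            this.form = form;
            this.tag = tag;
        }

        @Override
        public String toString() {
            return form + "/" + tag;
        }
    }

    // Toy tokeniser: splits on whitespace and detaches one trailing
    // punctuation mark from each chunk.
    static List<String> tokenise(String text) {
        List<String> tokens = new ArrayList<>();
        for (String chunk : text.trim().split("\\s+")) {
            if (chunk.isEmpty()) {
                continue;
            }
            char last = chunk.charAt(chunk.length() - 1);
            boolean endsWithPunct = ".,;:!?".indexOf(last) >= 0;
            if (endsWithPunct && chunk.length() > 1) {
                tokens.add(chunk.substring(0, chunk.length() - 1));
                tokens.add(String.valueOf(last));
            } else {
                tokens.add(chunk);
            }
        }
        return tokens;
    }

    // Stand-in tagger: labels each token as PUNCT or WORD only.
    static List<TaggedToken> tag(List<String> tokens) {
        List<TaggedToken> tagged = new ArrayList<>();
        for (String token : tokens) {
            boolean punct = token.length() == 1
                    && ".,;:!?".indexOf(token.charAt(0)) >= 0;
            tagged.add(new TaggedToken(token, punct ? "PUNCT" : "WORD"));
        }
        return tagged;
    }

    public static void main(String[] args) {
        String text = "Hann kom heim.";
        // Run one component on its own ...
        List<String> tokens = tokenise(text);
        System.out.println(tokens);
        // ... or feed its output into the next component.
        System.out.println(tag(tokens));
    }
}
```

In this sketch each stage takes plain Java collections as input and output, so a calling program can stop after any stage or pass the result on to the next one.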
Contact
- Hrafn Loftsson, Ph.D.
- Associate Professor
- Reykjavík University, School of Computer Science
- Menntavegi 1, 105 Reykjavík
- Work phone: +354-5996227
- E-mail: hrafn@ru.is
- Web Page: http://www.ru.is/kennarar/hrafn/
References
- Ingason, Anton K., Sigrún Helgadóttir, Hrafn Loftsson and Eiríkur Rögnvaldsson. 2008. A Mixed Method Lemmatization Algorithm Using a Hierarchy of Linguistic Identities (HOLI). In B. Nordström and A. Ranta (eds.), Advances in Natural Language Processing, 6th International Conference on NLP, GoTAL 2008, Proceedings. Gothenburg, Sweden.
- Loftsson, Hrafn. 2008. Tagging Icelandic text: A linguistic rule-based approach. Nordic Journal of Linguistics, 31(1), 47-72.
- Loftsson, Hrafn, Sigrún Helgadóttir and Eiríkur Rögnvaldsson. 2011. Using a morphological database to increase the accuracy in PoS tagging. In Proceedings of Recent Advances in Natural Language Processing (RANLP 2011). Hissar, Bulgaria.
- Loftsson, Hrafn, Ida Kramarczyk, Sigrún Helgadóttir and Eiríkur Rögnvaldsson. 2009. Improving the PoS tagging accuracy of Icelandic text. In Proceedings of the 17th Nordic Conference of Computational Linguistics (NODALIDA-2009). Odense, Denmark.
- Loftsson, Hrafn, and Eiríkur Rögnvaldsson. 2008. Linguistic richness and technical aspects of an incremental finite-state parser. In Proceedings of "Partial Parsing 2008", workshop at the 6th International Conference on Language Resources and Evaluation, LREC 2008. Marrakech, Morocco.
- Loftsson, Hrafn and Eiríkur Rögnvaldsson. 2007. IceNLP: A Natural Language Processing Toolkit for Icelandic. In Proceedings of InterSpeech 2007, Special session: "Speech and language technology for less-resourced languages". Antwerp, Belgium.
- Loftsson, Hrafn, and Eiríkur Rögnvaldsson. 2007. IceParser: An Incremental Finite-State Parser for Icelandic. In J. Nivre, H-J. Kaalep, K. Muischnek and M. Koit (eds.), Proceedings of the 16th Nordic Conference of Computational Linguistics (NODALIDA-2007). Tartu, Estonia.
- Loftsson, Hrafn. 2007. Tagging Icelandic Text using a Linguistic and a Statistical Tagger. In Proceedings of Human Language Technologies 2007: The Conference of the North American Chapter of the ACL. Rochester, NY, USA.
- Loftsson, Hrafn. 2006. Tagging a morphologically complex language using heuristics. In T. Salakoski, F. Ginter, S. Pyysalo and T. Pahikkala (eds.), Advances in Natural Language Processing, 5th International Conference on NLP, FinTAL 2006, Proceedings. Turku, Finland.
- Loftsson, Hrafn. 2006. Tagging Icelandic text: An experiment with integrations and combinations of taggers. Language Resources and Evaluation, 40(2), 175-181.