The final milestone of the Language Technology Program for Icelandic 2018-2022 was reached at the beginning of October. All the data and tools created in connection with the Language Technology Program can now be found in the CLARIN-IS repository. A good overview of everything in the repository is available here.
At the end of last month, the eighth and penultimate milestone of the language technology program was completed. A total of 18 new entries were added in May and June to the CLARIN-IS repository (http://repository.clarin.is). As an example, Miðeind sent in the data that forms the basis of the website Yfirlestur.is, where it is possible to have an Icelandic text reviewed and corrected. TÍRÓ has created a web portal for speech synthesizers (https://tts.tiro.is) and the source code can be found in the CLARIN-IS repository. Reykjavík University used the web portal to develop WebRice, a web reader that can be added to web pages so that users can select text and listen to it instead of reading it.
A number of resources have recently been added to the CLARIN-IS repository which now includes around 150 entries.
Below are some recent posts that illustrate the diverse data and tools that can be found in the repository:
The first year of the Icelandic National Language Technology Programme ended on September 30th. Several deliverables of the programme have recently been submitted to the CLARIN-IS repository - language descriptions, language resources, and tools. The repository now contains 80 items, most of them deliverables of the Programme, but also resources and tools that have been developed by LT students at the University of Iceland and Reykjavik University.
Two research groups from the Icelandic CLARIN consortium presented their papers at the virtual ParlaCLARIN workshop today. Steinþór Steingrímsson, Starkaður Barkarson and Gunnar Thor Örnólfsson had a paper entitled "IGC-Parl: Icelandic Corpus of Parliamentary Proceedings", and Kristján Rúnarsson and Einar Freyr Sigurðsson had a paper entitled "Parsing Icelandic Alþingi Transcripts: Parliamentary Speeches as a Genre". Both papers have been published in the Proceedings which are available online.
A number of resources have recently been added to the CLARIN-IS repository:
All of these resources are available for download under the CC BY 4.0 License, except the IceNeuralParsingPipeline, which is available under the MIT License.
The deadline for submitting extended abstracts for the CLARIN ERIC annual conference which will be held in Madrid, Spain, October 5-7, 2020, has been extended. The main conference topic this year is "Language resources, tools and services for interdisciplinary research" but submissions on various other topics are also accepted. Submissions must be made no later than April 28 through EasyChair. See the full call for abstracts on the CLARIN ERIC website.
It is customary to interview a number of people at CLARIN Annual conferences - invited speakers, prize winners, newcomers, and others. At the Annual Conference in Leipzig last fall, Eiríkur Rögnvaldsson, National Coordinator for Iceland, was interviewed on the occasion of Iceland becoming an observer (now a full member) in CLARIN ERIC. The interview has now been published on the CLARIN ERIC YouTube channel.
Iceland's application for full membership in CLARIN ERIC, which was submitted by the Ministry of Education, Science and Culture on February 5, has now been electronically approved by the CLARIN ERIC General Assembly. Iceland is thus a full member of CLARIN ERIC as of February 1, 2020. Iceland has had ties with CLARIN for a decade but can now finally participate fully in its activities. The CLARIN Centre at the Árni Magnússon Institute is being established and is already accepting data submissions, among them deliverables of the National Language Technology Programme. These resources are accessible in the centre's repository, and their metadata are harvested by the CLARIN Virtual Language Observatory. Thus, anyone connected to CLARIN can find information about the resources and access them, subject to license conditions.
The Icelandic CLARIN office has now moved to Þingholtsstræti 29, where the Árni Magnússon Institute's Language Technology Group is based. The office has a new phone number - +354-525-4037.
All talks given at the CLARIN Annual Conference in Leipzig last fall were recorded and are now available on the Internet. Among them is a talk by Lilja Björk Stefánsdóttir and Anton Karl Ingason, "Lifespan Change and Style Shift in the Icelandic Gigaword Corpus".
A new resource has been added to the CLARIN-IS repository: the Icelandic Confusion Set Corpus (ICoSC). It was compiled by Steinunn Rut Friðriksdóttir and Anton Karl Ingason of the language technology department at the University of Iceland. The ICoSC contains a large number of homophonous word pairs that differ orthographically by a single letter and are commonly confused by Icelandic speakers. It is available under a CC BY 4.0 license.
Last summer, the Icelandic Parliament (Alþingi) passed new laws on European Research Infrastructure Consortia (ERICs), which made it possible for Iceland to become a full member of ERICs. The Minister of Education, Science and Culture has now decided that Iceland will apply for full membership in CLARIN ERIC. A formal request for membership will be sent in the near future.
According to a contract between Almannarómur, Centre for Language Technology, and SÍM, Consortium for Language Technology, all deliverables of the Language Technology Program will be stored in the CLARIN-IS repository where they will be accessible to all, through the Virtual Language Observatory and search engines. Today, the first deliverable of the Program was uploaded to the repository. It is a tokenizer developed by Miðeind Ltd. More deliverables, both resources and tools, are forthcoming in the near future.
CLARIN ERIC calls for the submission of extended abstracts for the annual conference which will be held in Madrid, Spain, October 5-7, 2020. The main conference topic this year is "Language resources, tools and services for interdisciplinary research" but submissions on various other topics are also accepted. Submissions must be made no later than April 14 through EasyChair. See the full call for abstracts on the CLARIN ERIC website.
The November issue of the CLARIN Newsflash is out, including a short report from the launching event of the National Language Technology Project.
CLARIN-IS has now been registered as a C-centre which means that metadata can be harvested from it. Thus, searches in the Virtual Language Observatory now return metadata from CLARIN-IS. A link to the resources in question accompanies the metadata.
CLARIN-IS was one of the organizers of a conference celebrating the launch of the National Language Technology Program on October 16. The President of Iceland and the Minister of Education, Science and Culture addressed the event, and a number of people from academia, IT companies, mass media and the finance sector gave short talks. Bente Maegaard, who is a member of the Program's Expert Panel along with Kadri Vider and Steven Krauwer, gave an introductory talk on CLARIN. Around 120 people attended the conference.
The annual CLARIN conference was held in Leipzig, Germany, from September 30 to October 2. Seven participants from Iceland attended the conference. Three of them presented regular papers, and one student took part in the student poster slam session.
We are in the process of building a CLARIN Centre and have already applied for recognition as a C Centre, with the aim of becoming a B Centre in the near future. Samúel, our technician, has been delving into all the details of the CLARIN technical infrastructure since he started working for CLARIN-IS in April. It is a tremendous task to catch up with countries that have been members of CLARIN for several years, building up knowledge and skills step by step, but we hope to have gained solid ground by the end of the year.
On September 4, a contract was signed between Almannarómur, a self-owned organization that has been commissioned to implement the five-year Language Technology Project Plan funded by the Government, and the SÍM Consortium, which consists of nine members: three academic institutions, the State Broadcasting Service, the Association of the Visually Impaired, and four private IT companies. Under this contract, the SÍM Consortium will carry out the research and development work necessary for the project. Within the project, a large number of resources and tools will be developed. These will be described according to standards adopted within CLARIN and will be stored and distributed by the CLARIN-IS Centre, which is explicitly mentioned in the contract.
The Icelandic Parliament (Alþingi) has passed a new law on research infrastructures which enables Iceland to apply for full membership in CLARIN ERIC. The National Coordinator has written to the Ministry suggesting that an application for full membership be submitted.
Samúel Þórisson started working as a programmer for CLARIN-IS today. In the beginning, his main task will be to get acquainted with the CLARIN technology and then to start building a CLARIN B-centre at the Árni Magnússon Institute.
The CLARIN-IS website has now been officially launched. All pages are available in both Icelandic and English. The site is still under construction, and more material will be added soon. Trausti Dagsson, project manager at the Árni Magnússon Institute, designed the website, which for the most part adheres to the CLARIN style sheet.
An Icelandic national CLARIN consortium is currently being set up. The National Coordinator has had meetings with eight prospective participants and presented a draft of a memorandum on consortium partnership. Five institutions have already confirmed that they will sign the memorandum and the remaining three are positive.
CLARIN ERIC calls for the submission of extended abstracts for the annual conference which will be held in Leipzig, Germany, from September 30 to October 2, 2019. The main conference topic this year is "Humanities and Social Science research enabled by language resources and technology" but submissions on various other topics are also accepted. Submissions must be made no later than April 15 through EasyChair. See the full call for abstracts on the CLARIN ERIC website.