The RÚV Corpus

The RÚV Corpus is an Icelandic speech corpus based on read, bi-phonetically balanced text. It is 46 minutes long and contains 400 utterances (WAV, 44.1 kHz, 16-bit) from 20 speakers (10 male, 10 female).

About the RÚV Corpus

The corpus consists of read news items that together cover a large vocabulary. No two speakers read the same text.

1. Speaker information

SpeakerID Gender Files (.wav)
f1 F 1-20
f2 F 21-38
f3 F 39-58
m1 M 59-78
m2 M 79-98
m3 M 99-118
m4 M 119-138
m5 M 139-158
m6 M 159-178
m7 M 179-198
m8 M 199-218
f4 F 219-240
f5 F 241-260
f6 F 261-280
f7 F 281-300
f8 F 301-320
f9 F 321-340
m9 M 341-360
m10 M 361-380
f10 F 381-400
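
The table above can be turned into a small lookup, which may be handy when splitting the data by speaker. The following is a minimal sketch, not part of the corpus distribution: the names SPEAKER_RANGES, speaker_for_file and files_per_speaker are hypothetical, and it assumes each utterance is identified by its file number (1-400) as listed in the table.

```python
# Inclusive file-number ranges per speaker, copied from the table above.
# The names used here are illustrative and not part of the corpus itself.
SPEAKER_RANGES = [
    ("f1", 1, 20), ("f2", 21, 38), ("f3", 39, 58),
    ("m1", 59, 78), ("m2", 79, 98), ("m3", 99, 118),
    ("m4", 119, 138), ("m5", 139, 158), ("m6", 159, 178),
    ("m7", 179, 198), ("m8", 199, 218),
    ("f4", 219, 240), ("f5", 241, 260), ("f6", 261, 280),
    ("f7", 281, 300), ("f8", 301, 320), ("f9", 321, 340),
    ("m9", 341, 360), ("m10", 361, 380), ("f10", 381, 400),
]


def speaker_for_file(number):
    """Return the SpeakerID whose range contains the given file number."""
    for speaker_id, first, last in SPEAKER_RANGES:
        if first <= number <= last:
            return speaker_id
    raise ValueError(f"file number {number} is outside 1-400")


def files_per_speaker():
    """Return a dict mapping each SpeakerID to its number of files."""
    return {sid: last - first + 1 for sid, first, last in SPEAKER_RANGES}
```

Note that the ranges are not all the same size: according to the table, f2 has 18 files and f4 has 22, while every other speaker has 20.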

None of the speakers in the RÚV Corpus participated in the Jensson corpus or the Thor corpus.

2. Data structure

The files "The_Broadcast_News_RUV-1_Corpus/*.wav" are segmented wave files.
The file "The_Broadcast_News_RUV-1_Corpus/transcription.rtf" contains a transcript of the spoken utterances in Icelandic.
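
To confirm that a downloaded copy matches the description above (44.1 kHz, 16-bit, about 46 minutes in total), the wave files can be inspected with Python's standard wave module. This is only a sketch under assumptions: the function names describe_wav and total_minutes are hypothetical, the files are assumed to be plain PCM WAV (the only kind the wave module reads), and no particular file naming inside the directory is assumed beyond the ".wav" extension.

```python
import wave
from pathlib import Path

# Values stated in this README; used only for comparison.
EXPECTED_RATE_HZ = 44100
EXPECTED_BITS = 16


def describe_wav(path):
    """Read the header of one PCM WAV file and report its basic properties."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        bits = wav.getsampwidth() * 8  # sample width is given in bytes
        frames = wav.getnframes()
    return {
        "rate_hz": rate,
        "bits": bits,
        "seconds": frames / rate,
        "matches_readme": rate == EXPECTED_RATE_HZ and bits == EXPECTED_BITS,
    }


def total_minutes(directory):
    """Sum the durations of all *.wav files in a directory, in minutes."""
    paths = sorted(Path(directory).glob("*.wav"))
    return sum(describe_wav(p)["seconds"] for p in paths) / 60.0
```

For example, total_minutes("The_Broadcast_News_RUV-1_Corpus") would be expected to come out close to 46 if the copy is complete.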


Arnar Þór Jensson