The RÚV Corpus is an Icelandic speech corpus based on a read, bi-phonetically balanced text. It is 46 minutes long and contains 400 utterances (WAV, 44.1 kHz, 16-bit) from 20 speakers (10M/10F).
The corpus contains read news items that include a large vocabulary. No two speakers read the same text.
SpeakerID | Gender | Files (.wav) |
f1 | F | 1-20 |
f2 | F | 21-38 |
f3 | F | 39-58 |
m1 | M | 59-78 |
m2 | M | 79-98 |
m3 | M | 99-118 |
m4 | M | 119-138 |
m5 | M | 139-158 |
m6 | M | 159-178 |
m7 | M | 179-198 |
m8 | M | 199-218 |
f4 | F | 219-240 |
f5 | F | 241-260 |
f6 | F | 261-280 |
f7 | F | 281-300 |
f8 | F | 301-320 |
f9 | F | 321-340 |
m9 | M | 341-360 |
m10 | M | 361-380 |
f10 | F | 381-400 |
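
The table above can be turned into a small lookup that maps a file number to its speaker. This is a minimal sketch: the ranges are copied from the table, while the names SPEAKER_RANGES and speaker_for_file are hypothetical and not part of the corpus distribution. It assumes the numbers in the "Files (.wav)" column correspond to the numbering of the wave files.

```python
# File ranges per speaker, copied from the table above: (speaker_id, first, last).
SPEAKER_RANGES = [
    ("f1", 1, 20), ("f2", 21, 38), ("f3", 39, 58),
    ("m1", 59, 78), ("m2", 79, 98), ("m3", 99, 118),
    ("m4", 119, 138), ("m5", 139, 158), ("m6", 159, 178),
    ("m7", 179, 198), ("m8", 199, 218),
    ("f4", 219, 240), ("f5", 241, 260), ("f6", 261, 280),
    ("f7", 281, 300), ("f8", 301, 320), ("f9", 321, 340),
    ("m9", 341, 360), ("m10", 361, 380), ("f10", 381, 400),
]


def speaker_for_file(file_number):
    """Return the speaker ID whose range contains file_number."""
    for speaker_id, first, last in SPEAKER_RANGES:
        if first <= file_number <= last:
            return speaker_id
    raise ValueError(f"file number {file_number} is outside 1-400")


def files_per_speaker():
    """Return a dict mapping each speaker ID to its number of files."""
    return {sid: last - first + 1 for sid, first, last in SPEAKER_RANGES}
```

For example, speaker_for_file(100) returns "m3". Most speakers have 20 files, but f2 has 18 and f4 has 22; the total is still 400.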
None of the speakers in the RÚV corpus participated in the Jensson corpus or the Thor corpus.
The files "The_Broadcast_News_RUV-1_Corpus/*.wav" are segmented wave files.
The file "The_Broadcast_News_RUV-1_Corpus/transcription.rtf" contains a transcript of the spoken utterances in Icelandic.
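
The stated audio format (44.1 kHz, 16 bit) can be checked on any of the wave files with Python's standard wave module. This is only a sketch: the function name wav_format is hypothetical, and the path passed to it is whatever file you pick from the corpus directory.

```python
import wave


def wav_format(path):
    """Read the header of a PCM wave file and return its basic format."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        return {
            "sample_rate_hz": rate,
            "bits_per_sample": wav.getsampwidth() * 8,  # bytes -> bits
            "channels": wav.getnchannels(),
            "duration_s": frames / rate,
        }
```

For a file from this corpus, sample_rate_hz should be 44100 and bits_per_sample should be 16 if the description above holds. Summing duration_s over all files gives a check against the stated 46 minutes.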
Arnar Þór Jensson
e-mail: arnarjensson@gmail.com