The RÚV Corpus is an Icelandic speech corpus based on read, bi-phonetically balanced text. It is 46 minutes long and contains 400 utterances (WAV, 44.1 kHz, 16-bit) from 20 speakers (10 male, 10 female).
The corpus contains read news items that include a large vocabulary. No two speakers read the same text.
| SpeakerID | Gender | Files (.wav) |
| --- | --- | --- |
| f1 | F | 1-20 |
| f2 | F | 21-38 |
| f3 | F | 39-58 |
| m1 | M | 59-78 |
| m2 | M | 79-98 |
| m3 | M | 99-118 |
| m4 | M | 119-138 |
| m5 | M | 139-158 |
| m6 | M | 159-178 |
| m7 | M | 179-198 |
| m8 | M | 199-218 |
| f4 | F | 219-240 |
| f5 | F | 241-260 |
| f6 | F | 261-280 |
| f7 | F | 281-300 |
| f8 | F | 301-320 |
| f9 | F | 321-340 |
| m9 | M | 341-360 |
| m10 | M | 361-380 |
| f10 | F | 381-400 |
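
The table above can be turned into a small lookup for tooling that needs to know which speaker read a given file. The following is a minimal Python sketch, not part of the corpus distribution: the names `SPEAKER_FILES`, `speaker_for_file`, and `files_per_speaker` are hypothetical, and the code works on file numbers only because the exact file naming scheme is not stated here.

```python
# Inclusive file-number ranges per speaker, copied from the table above.
# The identifiers below are illustrative and not shipped with the corpus.
SPEAKER_FILES = {
    "f1": (1, 20), "f2": (21, 38), "f3": (39, 58),
    "m1": (59, 78), "m2": (79, 98), "m3": (99, 118),
    "m4": (119, 138), "m5": (139, 158), "m6": (159, 178),
    "m7": (179, 198), "m8": (199, 218),
    "f4": (219, 240), "f5": (241, 260), "f6": (261, 280),
    "f7": (281, 300), "f8": (301, 320), "f9": (321, 340),
    "m9": (341, 360), "m10": (361, 380), "f10": (381, 400),
}


def speaker_for_file(file_number):
    """Return the SpeakerID whose range contains the given file number."""
    for speaker, (first, last) in SPEAKER_FILES.items():
        if first <= file_number <= last:
            return speaker
    raise ValueError(f"file number {file_number} is outside 1-400")


def files_per_speaker():
    """Return the number of files listed for each speaker."""
    return {s: last - first + 1 for s, (first, last) in SPEAKER_FILES.items()}
```

As listed, the ranges cover files 1-400 without gaps. Most speakers have 20 files, while f2 has 18 and f4 has 22.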
None of the speakers in the RÚV corpus participated in the Jensson corpus or the Thor corpus.
The files "The_Broadcast_News_RUV-1_Corpus/*.wav" are segmented wave files.
The file "The_Broadcast_News_RUV-1_Corpus/transcription.rtf" contains a transcript of the spoken utterances in Icelandic.
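
To check that a downloaded file matches the audio format stated above (44.1 kHz, 16-bit), the WAV header can be read with Python's standard `wave` module. This is a sketch under assumptions: the helper names `describe_wav` and `matches_stated_format` are hypothetical, and the path passed in is whatever a segmented file is called on your system.

```python
import wave

# Values taken from the corpus description: 44.1 kHz sampling rate, 16-bit samples.
EXPECTED_RATE_HZ = 44100
EXPECTED_SAMPLE_BITS = 16


def describe_wav(path):
    """Return (sample rate in Hz, bits per sample, duration in seconds)."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        bits = wav.getsampwidth() * 8  # getsampwidth() is in bytes
        seconds = wav.getnframes() / rate
    return rate, bits, seconds


def matches_stated_format(path):
    """True if the file header agrees with the stated rate and sample width."""
    rate, bits, _ = describe_wav(path)
    return rate == EXPECTED_RATE_HZ and bits == EXPECTED_SAMPLE_BITS
```

Summing the durations returned by `describe_wav` over all segmented files gives a quick comparison against the stated total of 46 minutes.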
Arnar Þór Jensson
e-mail: arnarjensson@gmail.com