The RÚV Corpus

The RÚV Corpus is an Icelandic speech corpus based on read, bi-phonetically balanced text. It is 46 minutes long and contains 400 utterances (WAV, 44.1 kHz, 16-bit) from 20 speakers (10M/10F).

About the RÚV Corpus

The corpus consists of read news items that cover a large vocabulary. No two speakers read the same text.

1. Speaker information

SpeakerID	Gender	Files (.wav)
f1	F	1-20
f2	F	21-38
f3	F	39-58
m1	M	59-78
m2	M	79-98
m3	M	99-118
m4	M	119-138
m5	M	139-158
m6	M	159-178
m7	M	179-198
m8	M	199-218
f4	F	219-240
f5	F	241-260
f6	F	261-280
f7	F	281-300
f8	F	301-320
f9	F	321-340
m9	M	341-360
m10	M	361-380
f10	F	381-400

None of the speakers in the RÚV Corpus participated in the Jensson corpus or the Thor corpus.
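The speaker table can be turned into a small lookup for scripts that need to group utterances by speaker. The following is a minimal sketch in Python; the names `SPEAKER_RANGES`, `speaker_for_file`, and `files_per_speaker` are hypothetical, and it assumes the numbers in the "Files (.wav)" column correspond directly to the numbering of the wave files.

```python
# Inclusive file-number ranges per speaker, copied from the table above.
# Assumption: the numbers refer to the numbering of the .wav files.
SPEAKER_RANGES = [
    ("f1", 1, 20), ("f2", 21, 38), ("f3", 39, 58),
    ("m1", 59, 78), ("m2", 79, 98), ("m3", 99, 118),
    ("m4", 119, 138), ("m5", 139, 158), ("m6", 159, 178),
    ("m7", 179, 198), ("m8", 199, 218),
    ("f4", 219, 240), ("f5", 241, 260), ("f6", 261, 280),
    ("f7", 281, 300), ("f8", 301, 320), ("f9", 321, 340),
    ("m9", 341, 360), ("m10", 361, 380), ("f10", 381, 400),
]


def speaker_for_file(number):
    """Return the speaker ID whose range contains the given file number."""
    for speaker_id, first, last in SPEAKER_RANGES:
        if first <= number <= last:
            return speaker_id
    raise ValueError(f"file number {number} is outside the range 1-400")


def files_per_speaker():
    """Return a dict mapping each speaker ID to its number of files."""
    return {sid: last - first + 1 for sid, first, last in SPEAKER_RANGES}
```

Going by the ranges in the table, most speakers have 20 files each, while f2 has 18 and f4 has 22, which adds up to the 400 utterances in the corpus.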

2. Data structure

The files "The_Broadcast_News_RUV-1_Corpus/*.wav" are segmented wave files.
The file "The_Broadcast_News_RUV-1_Corpus/transcription.rtf" contains a transcript of the spoken utterances in Icelandic.
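After downloading, it may be useful to confirm that the wave files match the stated format (44.1 kHz, 16-bit). The following is a rough sketch using Python's standard `wave` module; the function names `check_wav` and `check_corpus` are hypothetical, and the directory argument is assumed to point at the unpacked "The_Broadcast_News_RUV-1_Corpus" folder.

```python
import wave
from pathlib import Path

EXPECTED_RATE = 44100      # 44.1 kHz, as stated in the corpus description
EXPECTED_SAMPLE_WIDTH = 2  # 16 bit = 2 bytes per sample


def check_wav(path):
    """Read a WAV header and report whether it matches the stated format."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        frames = wav.getnframes()
    return {
        "ok": rate == EXPECTED_RATE and width == EXPECTED_SAMPLE_WIDTH,
        "duration_s": frames / rate,
    }


def check_corpus(corpus_dir):
    """Check every .wav file in a directory.

    Returns (file count, total duration in minutes, names of files
    that do not match the expected format).
    """
    count, total_s, mismatched = 0, 0.0, []
    for path in sorted(Path(corpus_dir).glob("*.wav")):
        info = check_wav(path)
        count += 1
        total_s += info["duration_s"]
        if not info["ok"]:
            mismatched.append(path.name)
    return count, total_s / 60.0, mismatched
```

If the download is complete, `check_corpus` would be expected to report 400 files and a total of roughly 46 minutes, in line with the description above.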

Contact

Arnar Þór Jensson
e-mail: arnarjensson@gmail.com

CLARIN á Íslandi
