LFSAG - Laboratorio di Fonetica Sperimentale 'Arturo Genre'

Human and Machine Dialect Identification from Natural Speech and Artificial Stimuli (HMDI)

[Last Update 04/04/2014]

"Task: Human and Machine Dialect Identification from Natural Speech and Artificial Stimuli (HMDI)" - Evaluation of NLP and Speech Tools for Italian. Pisa, 11 December 2014

The Event

EVALITA 2014 will be held on December 11 2014 in Pisa, as a Workshop of the XIII Symposium of the Italian Association for Artificial Intelligence (AI*IA 2014).

Results

Paper from the Proceedings of the Workshop of the XIII Symposium of the Italian Association for Artificial Intelligence (AI*IA 2014).

Framework

EVALITA 2014 and AI*IA 2014 will be co-located with the first Italian Conference on Computational Linguistics (CLiC-it 2014), 9-10 December 2014

Tasks

The aim of the HMDI task is to test human and machine performances in detecting dialectal variation and identifying accents by acoustic cues characterising short samples of natural and artificial speech.

The EVALITA task "Human and Machine Dialect Identification from Natural Speech and Artificial Stimuli" is manifold.

Tests for human listeners

Three tasks are intended to test human abilities (listening tests) and a fourth experimental setting is designed for automatic systems of language identification.

Before running the listening tests, install the PRAAT software and create an HMDI folder under C:\ on the PC you are working on. Put the HMDI.txt file and the wave files extracted from the HMDI.zip file in this folder.
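The folder preparation can also be scripted. The following is only a minimal Python sketch, not part of the official guidelines: the function names prepare_hmdi_folder and missing_files are hypothetical, and the download location in the comment is an assumption to be adapted to your PC.

```python
import zipfile
from pathlib import Path


def prepare_hmdi_folder(zip_path, target_dir):
    """Extract the archive into target_dir and return the sorted names
    of the .wav files found there (sub-folders included)."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return sorted(p.name for p in target.rglob("*") if p.suffix.lower() == ".wav")


def missing_files(target_dir, required=("HMDI.txt",)):
    """Return the required file names that are not yet in target_dir."""
    target = Path(target_dir)
    return [name for name in required if not (target / name).is_file()]


# Example call (the download path is an assumption; adjust it):
# wavs = prepare_hmdi_folder(r"C:\Downloads\HMDI.zip", r"C:\HMDI")
# print(len(wavs), "wave files;", "still missing:", missing_files(r"C:\HMDI"))
```

If the archive unpacks into a sub-folder, the wave files may have to be moved directly into C:\HMDI by hand so that they sit next to HMDI.txt.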

HMDI is a task aimed at testing human abilities to identify languages from short speech samples. Download and run the pps file HMDI1-EVALITA2014_ENG.pps before starting the listening test. Launch PRAAT, load the HMDI.txt file into the object list and run it*.

HMDI_DIA is a task mainly intended for listeners living in Italy and is aimed at testing their abilities to identify dialectal varieties. Download and run the pps file HMDI_DIA.pps before starting the listening test. Launch PRAAT, load the HMDI_DIA.txt file into the object list and run it*.

HMDI_TON aims at testing the ability of Italian listeners to identify dialectal varieties relying only on prosodic values extracted from real sentences. Download and run the pps file HMDI_TON.pps before starting the listening test. Launch PRAAT, load the HMDI_TON.txt file into the object list and run it*.

*Do not forget to "Extract Results", save the ResultsMFC object as a text file (named with your initials) and send it to: antonio.romano[*]unito.it (Thank you for your help!)
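Before sending the file, listeners may want to check that it contains their answers. The sketch below is only an illustration and rests on an assumption about the saved file: that each trial is written as a line of the form stimulus = "..." immediately followed by a line of the form response = "...". If your file looks different, the pattern must be adapted. The function names are hypothetical.

```python
import re
from collections import Counter

# Assumed layout: stimulus = "..." directly followed by response = "..."
_PAIR = re.compile(r'stimulus\s*=\s*"([^"]*)"\s*response\s*=\s*"([^"]*)"')


def read_responses(text):
    """Return the (stimulus, response) pairs found in the saved text."""
    return _PAIR.findall(text)


def count_responses(text):
    """Count how often each response label was given."""
    return Counter(response for _, response in read_responses(text))


# Example call (the file name is hypothetical):
# with open("results_AR.txt", encoding="utf-8") as handle:
#     print(count_responses(handle.read()))
```

An empty result most likely means that the file does not follow the assumed layout, not that the answers are missing.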

Tests for LID/AID systems

The same samples in the folder C:\HMDI may be used for testing machine performances in identifying languages and dialects, after training your LID/AID system on longer samples and/or samples uttered by other speakers. Download HMDI_TRAINING.zip and copy the training wave files into an HMDI_TRAINING folder.
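Teams may find it convenient to summarise the output of their system before reporting it. The following Python sketch shows one possible way to compute accuracy and confusion counts. It is not an official scoring script: the function name, the sample names and the labels are hypothetical, and it assumes that both the reference labels and the system guesses are available as mappings from sample name to label.

```python
from collections import Counter


def score_identification(reference, predicted):
    """Return (accuracy, confusion) for two dicts mapping sample -> label.

    confusion counts (true_label, guessed_label) pairs; samples without
    a guess are counted under the label "<no answer>".
    """
    confusion = Counter()
    correct = 0
    for sample, true_label in reference.items():
        guess = predicted.get(sample, "<no answer>")
        confusion[(true_label, guess)] += 1
        if guess == true_label:
            correct += 1
    accuracy = correct / len(reference) if reference else 0.0
    return accuracy, confusion


# Example with made-up sample names and labels:
# reference = {"s1.wav": "A", "s2.wav": "B"}
# predicted = {"s1.wav": "A", "s2.wav": "A"}
# accuracy, confusion = score_identification(reference, predicted)
```

Keeping the confusion counts alongside the accuracy shows which varieties are mistaken for which, which is the kind of comparison with human listeners the task is designed for.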

Research teams interested in testing their LID/AID system may also run it on telephonic or noisy samples available in the HMDI_NOISY.zip file.

Deadlines

Please report results to: antonio.romano[*]unito.it before the end of June

Guidelines and data-sets

Samples and guidelines are available here:

README_HMDI.txt
HMDI.zip (soundfiles for testing)
HMDI Presentation
HMDI.txt
HMDI_DIA Presentation (in Italian)
HMDI_DIA.txt
HMDI_TON Presentation (in Italian)
HMDI_TON.txt
HMDI_TRAINING.zip (soundfiles for training - machine only)
HMDI_NOISY.zip (soundfiles for testing - machine only)

For further information, please contact Antonio Romano (antonio.romano[*]unito.it)