EMMA JSALT25 Benchmark – Multi-Talker ASR Evaluation
Welcome to the official leaderboard for benchmarking multi-talker ASR systems, hosted by the EMMA JSALT25 team.
Leaderboard
⚠️ Status: Under Maintenance
The leaderboard is currently being updated. New results will appear soon.
Below you’ll find the latest results submitted to the benchmark. Models are evaluated using meeteval with TCP-WER [%] (collar=5s).
For AISHELL-4 and AliMeeting, conversion to simplified Mandarin is applied and tcpCER [%] is used instead.
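If you want to reproduce the scoring locally, the snippet below is a minimal sketch using meeteval's Python API with the leaderboard's 5-second collar. The file names `ref.json` and `hyp.json` are placeholders for your own SegLST files, and the exact call signature may vary between meeteval versions.

```python
# Minimal sketch: score a SegLST hypothesis against a SegLST reference with tcpWER.
# File names are placeholders for your own reference/hypothesis files.
import meeteval

reference = meeteval.io.load("ref.json")    # SegLST reference segments
hypothesis = meeteval.io.load("hyp.json")   # SegLST hypothesis segments

# TCP-WER with a 5-second collar, matching the leaderboard setting.
per_session = meeteval.wer.tcpwer(reference=reference, hypothesis=hypothesis, collar=5)
print(per_session)
```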
Alexander Polok | 17.53 | 15.87 | 6.91 | 49.11 | 39.46 | 17.01 | 2.04 | 4.34 | 17.19 | 10.99 |
Submit Your Model
⚠️ Status: Under Maintenance
The leaderboard is currently being updated. Submission will be reactivated soon.
To submit your MT-ASR hypothesis to the benchmark, complete the form below:
- Submitted by: Your name or team identifier.
- Model ID: A unique identifier for your submission (used to track models on the leaderboard).
- Hypothesis File: Upload a SegLST `.json` file that includes all segments across datasets in a single list (a merging sketch follows this list).
- Task: Choose the evaluation task (e.g., single-channel ground-truth diarization).
- Datasets: Select one or more datasets you wish to evaluate on.
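Since the hypothesis file must be a single SegLST list covering every selected dataset, the sketch below merges hypothetical per-dataset outputs into one submission file. The input file names and the example field values are placeholders; the field set follows the commonly used SegLST keys.

```python
# Sketch: merge per-dataset SegLST outputs into one hypothesis .json for submission.
# Input file names are hypothetical; each is assumed to contain a JSON list of segments.
import json

per_dataset_files = ["aishell4_hyp.json", "alimeeting_hyp.json", "ami_hyp.json"]  # placeholders

combined = []
for path in per_dataset_files:
    with open(path, "r", encoding="utf-8") as f:
        combined.extend(json.load(f))

# Each segment is expected to look roughly like this (values are illustrative):
# {
#     "session_id": "ES2004a",
#     "speaker": "spk_0",
#     "start_time": 12.34,   # seconds
#     "end_time": 15.67,     # seconds
#     "words": "hello everyone"
# }

with open("hypothesis_seglst.json", "w", encoding="utf-8") as f:
    json.dump(combined, f, ensure_ascii=False, indent=2)
```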
📩 To enable submission, please email the EMMA team to receive a submission token.
After clicking Submit, your model will be evaluated and the results will appear on the leaderboard.
Reference/Hypothesis File Format
🛠️ Reference annotations were constructed via the prepare_gt.sh script. To add a new dataset, please create a pull request modifying prepare_gt.sh.
📚 For details about SegLST format, please see the SegLST documentation in MeetEval.
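As a quick pre-submission check, the sketch below verifies that every segment in a hypothesis file carries the usual SegLST fields. The required key set reflects the commonly used SegLST layout; consult the MeetEval documentation for the authoritative definition.

```python
# Sketch: verify that a SegLST .json file has the expected per-segment fields.
import json

REQUIRED_KEYS = {"session_id", "speaker", "start_time", "end_time", "words"}

with open("hypothesis_seglst.json", "r", encoding="utf-8") as f:
    segments = json.load(f)

for i, seg in enumerate(segments):
    missing = REQUIRED_KEYS - seg.keys()
    if missing:
        raise ValueError(f"Segment {i} is missing fields: {sorted(missing)}")

print(f"OK: {len(segments)} segments with the expected fields.")
```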
🔄 By default, CHiME-8 normalization is applied during evaluation to both references and hypotheses.
You can choose to disable this using the checkbox above.