Add MediBeng: Synthetic Bengali-English Code-Switched Healthcare Dataset #34

pr0mila · 2025-05-16T07:45:04Z

Description:

We propose adding the MediBeng dataset to the repository. MediBeng is a synthetic Bengali-English code-switched dataset designed for ASR, TTS, and machine translation tasks in healthcare. It targets the bilingual code-switching that is common in healthcare settings, and it is freely available.

Key Features:

Language: Bengali and English (Code-Switched)
Primary Use Cases: ASR, TTS, Machine Translation for Healthcare
Free to Use

Links:

This dataset could help improve models for bilingual speech recognition and language processing in healthcare contexts.

Request: Please review and add MediBeng to the dataset collection for use by researchers and developers working on multilingual and healthcare-specific models.
