Add MediBeng: Synthetic Bengali-English Code-Switched Healthcare Dataset #34

Open
pr0mila opened this issue May 16, 2025 · 0 comments

Description:

We propose adding the MediBeng dataset to the repository. MediBeng is a synthetic Bengali-English code-switched dataset designed for automatic speech recognition (ASR), text-to-speech (TTS), and machine translation tasks in healthcare. It focuses on bilingual code-switching, which is common in healthcare settings, and is freely available.

Key Features:

  • Language: Bengali and English (Code-Switched)
  • Primary Use Cases: ASR, TTS, Machine Translation for Healthcare
  • Free to Use

Links:

This dataset could help improve models for bilingual speech recognition and language processing in healthcare contexts.

Request: Please review and add MediBeng to the dataset collection for use by researchers and developers working on multilingual and healthcare-specific models.
