Skip to content

GSM-7 as input #32

Open
Open
@willhardy

Description

@willhardy

Some languages can't be translated if the text entered is from an SMS (eg GSM-7 encoded), this problem is visible in the deep-L web interface (eg try to translate "KAΛHΣΠEPA!").

The problem is that to save space, SMS encoding reuses the latin characters where it can (eg KAHEP) and mixes in greek letters where necessary (ΛΣΠ)

If you know the language the text is in, the solution is straightforward. For each language, convert the relevant latin characters to their native counterparts in unicode. When you do this, the translation works, eg translate ΚΑΛΗΣΠΕΡΑ!. For Greek, this might be:

original_greek_sms = "KAΛHΣΠEPA!"
latin = "EPTYIOAHKZXBNM"
greek = "ΕΡΤΥΙΟΑΗΚΖΧΒΝΜ"

table = str.maketrans(latin, greek)
original_greek_unicode = original_greek_sms.translate(table)
print(original_greek_unicode)  # ΚΑΛΗΣΠΕΡΑ!

Not all languages need this, but every language that has latin-like characters that need separate unicode symbols would need this conversion. Until then, text that comes from an SMS in those languages will not work.

Metadata

Metadata

Assignees

No one assigned

    Labels

    api changeRequires changes to the DeepL APIenhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions