Add Chinese support and AI translation script #508

Open: connermo wants to merge 1 commit into base: master

@connermo connermo commented Jun 6, 2025

I've added Chinese translation support to Futurecoder and created a translation script that uses AI models to automate the translation process.

The translation script (translate_futurecoder.py) is designed to handle the translation workflow efficiently. Here's a typical usage scenario:

  1. First, ensure you have the English PO file (english.po) in your translations directory
  2. Run the translation script:
# Basic command to translate to Chinese
python translate_futurecoder.py -l zh -k "your-api-key" -m "gpt-4o-mini" --base-url "https://<API_BASE_URL>/v1/chat/completions"

# The script will:
# - Split english.po into manageable chunks
# - Translate each chunk using the specified model
# - Merge the translated chunks into a single PO file
# - Generate the required .mo files and place them in the correct locales directory
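
The split, translate, and merge steps listed above can be sketched roughly as follows. This is a minimal illustration and not the actual translate_futurecoder.py: it assumes PO entries are separated by blank lines, and `split_po_entries`, `chunk_entries`, `translate_chunk`, and `merge_chunks` are hypothetical names. The model call is replaced by a placeholder that returns its input unchanged, and compiling the .mo files is left out.

```python
import re

# Tiny stand-in for english.po (hypothetical content, for illustration only).
SAMPLE_PO = '''msgid "Hello"
msgstr ""

msgid "List"
msgstr ""

msgid "Loop"
msgstr ""
'''


def split_po_entries(po_text: str) -> list[str]:
    """Split PO text into entries, assuming blank lines separate entries."""
    return [e.strip() for e in re.split(r"\n\s*\n", po_text.strip()) if e.strip()]


def chunk_entries(entries: list[str], chunk_size: int) -> list[list[str]]:
    """Group entries into chunks of at most chunk_size entries."""
    return [entries[i:i + chunk_size] for i in range(0, len(entries), chunk_size)]


def translate_chunk(chunk: list[str]) -> list[str]:
    """Placeholder for the model call; a real script would fill in msgstr here."""
    return chunk


def merge_chunks(chunks: list[list[str]]) -> str:
    """Join translated chunks back into a single PO document."""
    return "\n\n".join(entry for chunk in chunks for entry in chunk) + "\n"


entries = split_po_entries(SAMPLE_PO)
chunks = chunk_entries(entries, chunk_size=2)
merged = merge_chunks([translate_chunk(chunk) for chunk in chunks])
print(len(entries), [len(chunk) for chunk in chunks])
```

Keeping whole entries together in each chunk means a msgid is never separated from its msgstr, so the merged file has the same entries in the same order as the input.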

The script is designed to be easily extended for other languages.

Would appreciate your review and feedback. Thanks very much!

- Refactor translation system for better language support

- Add dynamic language support through add_language method

- Update translation rules to be more teaching-focused

- Add Chinese translation support

- Update test files and compiled translations
@alexmojaki alexmojaki requested a review from Copilot June 8, 2025 21:55

@Copilot Copilot AI left a comment

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

@alexmojaki
Owner

Thank you so much for this! I haven't had much time to look but so far it seems amazing. Here's a preview deploy: https://futurecoder-io--zh-xd7pqtbh.web.app/course/

If you haven't already, please check that the output of CHECK_INLINE_CODES=1 FUTURECODER_LANGUAGES=zh ./scripts/generate.sh seems OK. I can see for example that the first warning is a false positive.

What have you done to proofread and verify the translation?

Please also share a .po file so that I can upload it to https://poeditor.com/projects/view?id=490053

@connermo
Author

> Thank you so much for this! I haven't had much time to look, but so far it seems amazing. Here's a preview deploy: https://futurecoder-io--zh-xd7pqtbh.web.app/course/
>
> If you haven't already, please check that the output of CHECK_INLINE_CODES=1 FUTURECODER_LANGUAGES=zh ./scripts/generate.sh seems OK. I can see for example that the first warning is a false positive.
>
> What have you done to proofread and verify the translation?
>
> Please also share a .po file so that I can upload it to https://poeditor.com/projects/view?id=490053

Yes, you are right. I didn't add CHECK_INLINE_CODES=1. There are several strings that should not be translated. I've fixed them manually. Please have a look at the updated po file. I went through the whole course quickly. The translations looked fine.

The AI code is untouched since I found it is correct most of the time (GPT-4o-mini). I've also tried other models like Claude 4, but it produced more errors. I think it is because of its worse instruction-following capability. Please try different languages if you have time. Thanks very much.

zh.po.zip

@alexmojaki
Owner

Thanks for the PO file. These are the variants of 'Chinese' available on POEditor. Which do you think would be most appropriate?

[Screenshot: the 'Chinese' language variants available on POEditor]

I've uploaded the file to zh-CN for now but I can easily change it. There's an existing human translation under zh-Hans that's 46% done and has barely been touched recently, it probably won't get finished manually.

What would be an appropriate URL? https://zh.futurecoder.io/ ?

@alexmojaki
Owner

> There are several strings that should not be translated. I've fixed them manually.

I don't know about the others, but I think you misunderstood me when I said:

> I can see for example that the first warning is a false positive.

Specifically this output:

Inline codes don't match auto-translation in pages.FunctionsAndMethodsForLists.steps.subscript_assignment_predict.text
original: {'some_list', 'append', 'index', 'nums'}
expected (auto-translated): ['append', 'index', 'nums', 'some_list']
actual: ['append', 'index', 'nums', 'some_list', '下标赋值']
expected - actual: set()
actual - expected: {'下标赋值'}

was a warning that could be ignored. You've changed it so it now says "subscript assignment:..." which doesn't mean anything to a non-English speaker. Perhaps formatting it as code is confusing, I just did that for some consistency.
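
For readers unfamiliar with this kind of check, here is a rough sketch of how such a comparison can work. It is not futurecoder's implementation: it assumes inline codes are backtick-delimited spans and compares the translation directly against the English text, whereas the real check compares against an auto-translated expectation. `inline_codes` and `compare_inline_codes` are hypothetical names, and the two sample sentences are invented for illustration.

```python
import re

# Backtick-delimited spans, e.g. `append`.
INLINE_CODE = re.compile(r"`([^`]+)`")


def inline_codes(text: str) -> set[str]:
    """Return the set of inline code spans found in a piece of markdown."""
    return set(INLINE_CODE.findall(text))


def compare_inline_codes(expected_text: str, actual_text: str) -> tuple[set[str], set[str]]:
    """Return (expected - actual, actual - expected) as in the warning above."""
    expected = inline_codes(expected_text)
    actual = inline_codes(actual_text)
    return expected - actual, actual - expected


# Invented sample strings, not taken from the course.
english = "Use `append` and `index` on `nums` and `some_list`."
translated = "`下标赋值`: 对 `nums` 和 `some_list` 使用 `append` 和 `index`。"

missing, extra = compare_inline_codes(english, translated)
print("expected - actual:", missing)
print("actual - expected:", extra)
```

An extra span such as `下标赋值` shows up in "actual - expected" even when the translation is fine, which is why a warning like this one can be a false positive.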

@alexmojaki
Owner

> I went through the whole course quickly. The translations looked fine.

Translating futurecoder correctly is tricky because of the way the terms work. Humans make plenty of mistakes when they do it. I always ask translators to proofread the translated course at the end, going through the whole course like an actual student, checking all the hints and solutions, making sure the instructions actually match the automated checks. Seeing the text properly rendered in context is very different from seeing the terms in isolation.

Here's an example:

[Screenshot: a step from the translated course]

Imagine what it's like for a user to get stuck on this step, perfectly printing the text in the instructions and wondering why the checker isn't accepting it. Then they look at the solution and see something completely different.

It'd be great to have more automated checks that can capture this kind of problem, e.g. asserting that the output of the solution appears in the instructions. But only so much is possible.
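
A check of that kind might look roughly like the sketch below. It assumes the instructions and the solution source are available as plain strings, which is not how futurecoder stores its steps; `run_solution` and `missing_output_lines` are hypothetical names, and the sample strings are invented.

```python
import contextlib
import io


def run_solution(source: str) -> str:
    """Run solution source in a fresh namespace and capture what it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


def missing_output_lines(instructions: str, solution_source: str) -> list[str]:
    """Return non-empty output lines of the solution that the instructions never mention."""
    output = run_solution(solution_source)
    return [
        line.strip()
        for line in output.splitlines()
        if line.strip() and line.strip() not in instructions
    ]


# Invented example: the instructions were translated but the solution was not.
instructions_zh = "请打印 你好世界"
print(missing_output_lines(instructions_zh, "print('Hello world')"))
print(missing_output_lines(instructions_zh, "print('你好世界')"))
```

A non-empty result flags a step where a student who follows the instructions literally would not produce what the solution prints. It only works for steps whose expected output is quoted in the instructions, so it would catch some mismatches but not all.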

There's a complete Polish translation at https://pl.futurecoder.io/course/ . But it hasn't been proofread, so it's not fully ready, which is why it isn't publicly linked from the other sites. I can do the same for this translation, e.g. deploy it to https://zh.futurecoder.io/ soon, and add links to it once it's been proofread.

> The AI code is untouched since I found it is correct most of the time

The "most of the time" makes me even more worried. AI makes weird mistakes.

The good news is that I would accept proofreading automated by AI. If for each step, an AI was shown:

  • The text of the whole page until that step, as seen by the user, i.e. with things like __code__ replaced but still in markdown
  • All the hints and the solution
  • The output of running the solution
  • All the above in both English and the translated language, to compare

and the AI confirmed that the translation was accurate, I would trust that.
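
One way to assemble that per-step bundle into a prompt is sketched below. Everything here is an assumption made for illustration: `StepView`, `render`, and `build_proofreading_prompt` are hypothetical names, the sample step is invented, and the sketch does not call any model or read futurecoder's real course data.

```python
from dataclasses import dataclass


@dataclass
class StepView:
    """What a student sees for one step, in one language (hypothetical structure)."""
    page_text: str        # markdown of the page up to and including this step
    hints: list[str]
    solution: str
    solution_output: str  # output of running the solution


def render(label: str, view: StepView) -> str:
    """Format one language's view of the step as a markdown section."""
    hints = "\n".join(f"- {hint}" for hint in view.hints) or "(none)"
    return (
        f"## {label}\n"
        f"### Page text\n{view.page_text}\n"
        f"### Hints\n{hints}\n"
        f"### Solution\n{view.solution}\n"
        f"### Solution output\n{view.solution_output}\n\n"
    )


def build_proofreading_prompt(english: StepView, translated: StepView, language: str) -> str:
    """Put both versions side by side so the model can compare them."""
    return (
        f"Compare the English step with its {language} translation. "
        "Report any place where the instructions, hints, solution, "
        "or solution output disagree.\n\n"
        + render("English", english)
        + render(language, translated)
    )


# Invented sample step.
english = StepView("Print `Hello`.", ["Use print."], "print('Hello')", "Hello")
chinese = StepView("打印 `你好`。", ["使用 print。"], "print('你好')", "你好")
prompt = build_proofreading_prompt(english, chinese, "Chinese")
print(prompt)
```

Including the solution output in both languages lets the reviewer, human or model, spot cases where the translated instructions ask for text that the solution never prints.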

> Please try different languages if you have time

I wouldn't be able to check if the output was accurate. But if the proofreading was automated like above, it'd be OK. Then I could translate futurecoder into every language that AI knows well enough, which would be very exciting.

@oskarissimus

Proofreading with AI seems like a very interesting idea! I'm thinking of using Playwright to go through all the pages and an agent to verify the translations. I think I'll be able to implement that next week :)

@connermo
Author

> I've uploaded the file to zh-CN for now but I can easily change it. There's an existing human translation under zh-Hans that's 46% done and has barely been touched recently, it probably won't get finished manually.
>
> What would be an appropriate URL? https://zh.futurecoder.io/ ?

Yes, "zh" is fine. zh-CN is the one submitted.

> actual - expected: {'下标赋值'}
>
> was a warning that could be ignored. You've changed it so it now says "subscript assignment:..." which doesn't mean anything to a non-English speaker. Perhaps formatting it as code is confusing, I just did that for some consistency.

I see. I've translated it to the more common term "索引赋值".

> The "most of the time" makes me even more worried. AI makes weird mistakes.
>
> The good news is that I would accept proofreading automated by AI. If for each step, an AI was shown:
>
>   • The text of the whole page until that step, as seen by the user, i.e. with things like __code__ replaced but still in markdown
>   • All the hints and the solution
>   • The output of running the solution
>   • All the above in both English and the translated language, to compare
>
> and the AI confirmed that the translation was accurate, I would trust that.
>
> > Please try different languages if you have time
>
> I wouldn't be able to check if the output was accurate. But if the proofreading was automated like above, it'd be OK. Then I could translate futurecoder into every language that AI knows well enough, which would be very exciting.

100% agree!

I've carefully reviewed all courses again and corrected some mistranslations. Attached is the updated version. Please take a look.
zh.po.zip

@connermo
Author

> Proofreading with AI seems like a very interesting idea! I'm thinking of using Playwright to go through all the pages and an agent to verify the translations. I think I'll be able to implement that next week :)

That sounds great! I'd like to help, too. Please let me know if you need any help.
