
Fixes #22598: Add Russian Language Support for Elasticsearch Search #22599


Open. Wants to merge 4 commits into base: main


dimstunt
Contributor

Describe your changes:

Fixes #22598

Added comprehensive Russian language support for Elasticsearch search with enhanced analyzers and complete UI localization.

What changes did I make?

  • Created 43 Russian-specific Elasticsearch mapping files in the openmetadata-spec/src/main/resources/elasticsearch/ru/ directory
  • Implemented Russian text analyzers that combine the russian_stop, russian_snowball, and icu_folding filters for better Cyrillic text processing
  • Added Russian to the supported languages array in IndexMappingVersionTracker.java (line 116)
  • Updated the configuration system to support the ELASTICSEARCH_INDEX_MAPPING_LANG=RU environment variable in openmetadata.yaml (line 313)
  • Integrated Russian localization into the UI via i18nextUtil.ts (line 54) and LocalUtil.interface.ts (line 24)
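
For reviewers, here is a rough sketch of how a language setting like ELASTICSEARCH_INDEX_MAPPING_LANG=RU could select the ru/ mapping directory. It is written in Python for brevity and is not the code in IndexMappingVersionTracker.java. The language list, the fallback to "en", and the file name table_index_mapping.json are assumptions made for illustration.

```python
import os
from pathlib import PurePosixPath

# Assumed for illustration; the real array lives in IndexMappingVersionTracker.java.
SUPPORTED_LANGUAGES = ("en", "jp", "zh", "ru")

# Directory named in this PR description.
MAPPING_ROOT = PurePosixPath("openmetadata-spec/src/main/resources/elasticsearch")


def resolve_mapping_language(env=None, default="en"):
    """Return the lower-cased language code from ELASTICSEARCH_INDEX_MAPPING_LANG.

    Falls back to `default` when the variable is unset or names an
    unsupported language (this fallback behavior is an assumption).
    """
    env = os.environ if env is None else env
    raw = env.get("ELASTICSEARCH_INDEX_MAPPING_LANG", default)
    lang = raw.strip().lower()
    return lang if lang in SUPPORTED_LANGUAGES else default


def mapping_path(entity_file, env=None):
    """Build the path of a mapping file for the resolved language."""
    return str(MAPPING_ROOT / resolve_mapping_language(env) / entity_file)


if __name__ == "__main__":
    # "table_index_mapping.json" is a hypothetical file name.
    print(mapping_path("table_index_mapping.json",
                       {"ELASTICSEARCH_INDEX_MAPPING_LANG": "RU"}))
```

Passing the environment as a dictionary keeps the lookup easy to exercise without touching the real process environment.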

Why did I make them?
Russian-speaking users were experiencing poor search quality because Russian text was being processed with generic English analyzers. This resulted in:

  • Improper tokenization of Cyrillic characters
  • Lack of Russian morphological analysis and stemming
  • Poor relevance for Russian search queries
  • Missing localization for Elasticsearch-related UI elements

The new Russian implementation provides more advanced text analysis than the English baseline, with enhanced Unicode support and multi-language stop word filtering.
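
To make the filter chain concrete, here is a minimal sketch of index analysis settings that wire together the three filters named above. It is not copied from the mapping files in this PR. The analyzer name om_russian, the standard tokenizer, and the filter order are assumptions; the stop and snowball filter types are standard Elasticsearch token filters, and icu_folding requires the analysis-icu plugin.

```python
import json


def russian_analysis_settings():
    """Return an Elasticsearch `settings.analysis` body for Russian text."""
    return {
        "analysis": {
            "filter": {
                # Built-in Russian stop word list.
                "russian_stop": {"type": "stop", "stopwords": "_russian_"},
                # Snowball stemmer configured for Russian.
                "russian_snowball": {"type": "snowball", "language": "Russian"},
            },
            "analyzer": {
                # Hypothetical analyzer name, chosen for this sketch only.
                "om_russian": {
                    "type": "custom",
                    "tokenizer": "standard",
                    # Assumed order: fold last so stop words and stemming
                    # see the original Cyrillic letters.
                    "filter": [
                        "lowercase",
                        "russian_stop",
                        "russian_snowball",
                        "icu_folding",  # provided by the analysis-icu plugin
                    ],
                }
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps({"settings": russian_analysis_settings()},
                     ensure_ascii=False, indent=2))
```

The printed JSON can be used as the body of an index creation request when trying the chain on a scratch index.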

How did I test the changes?

  • Set up a local Elasticsearch instance and manually tested the Russian mapping configurations
  • Created test indices with the Russian analyzers and verified proper tokenization of Cyrillic text
  • Manually applied the Russian mapping files to a running Elasticsearch cluster and tested search functionality
  • Tested Russian morphological analysis by indexing sample Russian text and verifying that stemming works correctly
  • Validated that the russian_stop, russian_snowball, and icu_folding filters process Russian queries as expected
  • Confirmed search quality improvements for Russian text compared with the English analyzer
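
The manual tokenization checks above could be repeated with something like the following sketch, which builds a request for the Elasticsearch _analyze API and extracts the token strings from a response. The index name ru_test, the analyzer name om_russian, and the localhost URL are placeholders, not values taken from this PR.

```python
import json
from urllib.request import Request


def build_analyze_request(base_url, index, analyzer, text):
    """Build a POST request for `<base_url>/<index>/_analyze` (nothing is sent)."""
    body = json.dumps({"analyzer": analyzer, "text": text},
                      ensure_ascii=False).encode("utf-8")
    return Request(
        url=f"{base_url.rstrip('/')}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def token_texts(analyze_response):
    """Extract the token strings from a parsed _analyze response."""
    return [t["token"] for t in analyze_response.get("tokens", [])]


if __name__ == "__main__":
    # Placeholder index and analyzer names; adjust to the local setup.
    req = build_analyze_request("http://localhost:9200", "ru_test",
                                "om_russian", "Таблицы пользователей")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Sending the request with urllib.request.urlopen against a running cluster and passing the parsed JSON to token_texts shows which tokens the analyzer actually emits for a given Russian phrase.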

Type of change:

  • New feature

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • The issue properly describes why the new feature is needed, what's the goal, and how we are building it. Any discussion or decision-making process is reflected in the issue.
  • I have updated the documentation.
  • I have added tests around the new logic.

…pings

- Add 44 Russian (ru) Elasticsearch mapping files for all entity types
- Update IndexMappingVersionTracker to handle language-specific mappings
- Update generated UI configuration files
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@harshach added the "safe to test" label Jul 27, 2025
Contributor

github-actions bot commented Jul 27, 2025

Jest test Coverage

UI tests summary

Lines: 63% (coverage badge)
Statements: 63.76% (48528/76110)
Branches: 39.85% (20355/51076)
Functions: 43.63% (5846/13399)

Development

Successfully merging this pull request may close these issues.

Add Russian Language Support for Elasticsearch Search
2 participants