
Vendor wcwidth using python's unicodedata #199


Open: deathaxe wants to merge 1 commit into master from feat/vendor-wcwidth

@deathaxe (Contributor) commented Apr 25, 2025

This commit replaces the wcwidth dependency with a small vendored module that leverages Python's built-in unicodedata.

Notes:

  1. The wcwidth() function provided by the wcwidth library is already decorated with lru_cache(100). Hence the following line wraps an lru-cached function in a second, duplicate lru_cache layer, which may add significant overhead.

     wcwidth: Callable[[str], int] = lru_cache(maxsize=4096)(_wcwidth)
    
  2. Performance of the vendored wcwidth() function is more or less equal to that of the one provided by the wcwidth package.

  3. This change turns pyte into a self-contained library.

  4. The only possible downside is that the supported Unicode version is bound to that of the Python interpreter in use. That is probably rather minor, as the interpreter wouldn't be able to decode more recent Unicode characters anyway.
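
The cache layering described in note 1 can be illustrated with a small, self-contained sketch. The `inner_width` function is a hypothetical stand-in for an already cached width function, not the wcwidth library itself, and the cache sizes are only illustrative.

```python
from functools import lru_cache


@lru_cache(maxsize=100)
def inner_width(ch: str) -> int:
    # Hypothetical stand-in for an already cached width function.
    return 2 if ord(ch) >= 0x1100 else 1


# Wrapping the cached function again creates a second, independent cache.
outer_width = lru_cache(maxsize=4096)(inner_width)

outer_width("a")  # miss in the outer cache, then a miss in the inner cache
outer_width("a")  # hit in the outer cache, the inner cache is not consulted

outer_info = outer_width.cache_info()
inner_info = inner_width.cache_info()
print(outer_info)
print(inner_info)
```

Each distinct character ends up stored in both caches, and every outer miss pays for two cache lookups before the real work starts.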

Benchmarks:

>>> from timeit import timeit
>>> from wcwidth import wcswidth as wcswidth1
>>> from pyte.wcwidth import wcswidth as wcswidth2
>>> s = "开源的计算机代数系统 Maxima 是用于操纵符号和数值表达式的系统"
>>> timeit(lambda: wcswidth1(s))
7.851543699999999
>>> timeit(lambda: wcswidth2(s))
3.857342599999999

Credits:

The implementation is borrowed from pytest and slightly tweaked.
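
For readers unfamiliar with the approach, here is a minimal sketch of how a character width function can be built on `unicodedata`. It is not the code from this PR or from pytest; the names `char_width` and `string_width` are hypothetical, and the handling of edge cases (for example zero-width format characters) is deliberately simplified.

```python
import unicodedata
from functools import lru_cache


@lru_cache(maxsize=4096)
def char_width(ch: str) -> int:
    """Return the number of terminal cells for one character (sketch)."""
    code = ord(ch)
    # Fast path for printable ASCII.
    if 0x20 <= code < 0x7F:
        return 1
    category = unicodedata.category(ch)
    # Control characters have no printable width.
    if category == "Cc":
        return -1
    # Enclosing and non-spacing combining marks occupy no cell of their own.
    if category in ("Me", "Mn"):
        return 0
    # Fullwidth and wide East Asian characters occupy two cells.
    if unicodedata.east_asian_width(ch) in ("F", "W"):
        return 2
    return 1


def string_width(text: str) -> int:
    """Sum the cell widths of a string, or return -1 on a control character."""
    total = 0
    # Normalizing to NFC composes base characters with combining marks first.
    for ch in unicodedata.normalize("NFC", text):
        width = char_width(ch)
        if width < 0:
            return -1
        total += width
    return total
```

Because the category and East Asian width data come from the interpreter's bundled Unicode database, results for newly assigned code points depend on the Python version in use.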

@deathaxe force-pushed the feat/vendor-wcwidth branch from 93ee5c1 to 78b998e on April 25, 2025 09:37

@superbobry (Collaborator) commented

I suspect you might need to keep the MIT license for that file, since the implementation follows the one in pytest.

@deathaxe force-pushed the feat/vendor-wcwidth branch 2 times, most recently from bed5f46 to 21b28a9, on April 25, 2025 21:05

@superbobry (Collaborator) commented

Can you rebase, please?

@deathaxe force-pushed the feat/vendor-wcwidth branch from 21b28a9 to f959f8c on April 28, 2025 09:07