Add a glossary page on UTF-16 #40007

Open
wants to merge 3 commits into
base: main

Conversation

@wbamberg (Collaborator) commented Jun 20, 2025

I didn't add/update links from the JS docs because they also have https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String#utf-16_characters_unicode_code_points_and_grapheme_clusters, which they point to sometimes, and I didn't know if they want to point there instead.

@github-actions bot added the `Content:HTML`, `Content:WebAPI`, `Content:Glossary`, and `size/m` labels Jun 20, 2025
Contributor

Preview URLs (31 pages)
External URLs (2)

URL: /en-US/docs/Glossary/UTF-16
Title: UTF-16

@wbamberg wbamberg marked this pull request as ready for review June 20, 2025 22:23
@wbamberg wbamberg requested review from a team as code owners June 20, 2025 22:23
@wbamberg wbamberg requested review from chrisdavidmills and sideshowbarker and removed request for a team June 20, 2025 22:23
Comment on lines +19 to +39
## UTF-16 in JavaScript

Strings in JavaScript are represented using UTF-16, and many {{jsxref("String")}} APIs operate on code units, not code points. For example, {{jsxref("String.length")}} returns `2` for a string containing a single Unicode character which is not in the BMP:

```js
const string = "🦊"; // U+1F98A
console.log(string.length); // 2
```
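
As a minimal sketch of the difference between code units and code points, the following compares `length` with the number of elements produced by the string iterator, which yields one element per code point. The variable names are illustrative only.

```js
const fox = "\u{1F98A}"; // 🦊, a single code point outside the BMP

// length counts UTF-16 code units.
const codeUnitCount = fox.length;

// The string iterator steps through code points, not code units,
// so spreading the string into an array counts code points instead.
const codePointCount = [...fox].length;

console.log(codeUnitCount); // 2
console.log(codePointCount); // 1
```

Note that counting code points is still not the same as counting user-perceived characters (grapheme clusters), which may span several code points.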

The {{jsxref("String.charCodeAt()")}} method returns the code unit at the given index, and the {{jsxref("String.codePointAt()")}} method returns the code point at the given index:

```js
const string = "🦊"; // U+1F98A

console.log(string.charCodeAt(0).toString(16)); // d83e
console.log(string.charCodeAt(1).toString(16)); // dd8a

console.log(string.codePointAt(0).toString(16)); // 1f98a
```
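
The two code units above can be derived from the code point by hand. The sketch below assumes a hypothetical helper, `toSurrogatePair()`, which is not a built-in API. It applies the standard UTF-16 arithmetic: subtract `0x10000`, then split the remaining 20 bits into two 10-bit halves.

```js
// Hypothetical helper for illustration; not part of any built-in API.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError("Expected a code point in U+10000..U+10FFFF");
  }
  const offset = codePoint - 0x10000; // a 20-bit value
  const high = 0xd800 + (offset >> 10); // top 10 bits -> high surrogate
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits -> low surrogate
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1f98a);
console.log(high.toString(16), low.toString(16)); // d83e dd8a

// Round trip: the two code units together form the original character.
const roundTrip = String.fromCharCode(high, low);
console.log(roundTrip === "\u{1F98A}"); // true
```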

See [UTF-16 characters, Unicode code points, and grapheme clusters](/en-US/docs/Web/JavaScript/Reference/Global_Objects/String#utf-16_characters_unicode_code_points_and_grapheme_clusters) to learn more about working with UTF-16 strings in JavaScript.
Member

I think you can reduce this whole section to the last paragraph :)

Collaborator Author

Well, maybe? It's pretty short, and I think it is worth introducing this here before sending someone straight to the JS docs. The section in the JS docs also doesn't mention `charCodeAt` or `codePointAt`, which to me are quite interesting APIs for someone trying to understand this.

And equally I could say that https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String#utf-16_characters_unicode_code_points_and_grapheme_clusters includes a lot of generic stuff about UTF-16 that doesn't really belong in the JS docs, specifically.

Member

> And equally I could say that https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String#utf-16_characters_unicode_code_points_and_grapheme_clusters includes a lot of generic stuff about UTF-16 that doesn't really belong in the JS docs, specifically.

Well, that I can agree with. When I wrote all this, there wasn't a lot of shared content discussing these underlying mechanisms, and I was trying to make the JS docs self-contained so readers wouldn't have to go to other areas. I'm okay with moving or merging some of this into the glossary and updating the links, if you think that's worthwhile.

Labels
Content:Glossary Glossary entries Content:HTML Hypertext Markup Language docs Content:WebAPI Web API docs size/m [PR only] 51-500 LoC changed
Development

Successfully merging this pull request may close these issues.

URLSearchParams sort is by code units, not code points
2 participants