Add a glossary page on UTF-16 #40007
## UTF-16 in JavaScript

Strings in JavaScript are represented using UTF-16, and many {{jsxref("String")}} APIs operate on code units, not code points. For example, {{jsxref("String.length")}} returns `2` for a string containing a single Unicode character which is not in the BMP:

```js
const string = "🦊"; // U+1F98A
console.log(string.length); // 2
```
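
To contrast code units with code points, here is a minimal sketch. It relies on the fact that the string iterator (used by spread syntax and `for...of`) steps through a string by code point rather than by code unit. The helper name `countCodePoints` is hypothetical, chosen for illustration, and is not a built-in.

```js
// Hypothetical helper: count code points instead of UTF-16 code units.
// Spreading a string uses the string iterator, which yields one
// element per code point, so a surrogate pair becomes one element.
function countCodePoints(str) {
  return [...str].length;
}

console.log("🦊".length); // 2 (code units)
console.log(countCodePoints("🦊")); // 1 (code points)
console.log(countCodePoints("a🦊b")); // 3
```

Note that this counts code points, not user-perceived characters: a grapheme cluster made of several code points is still counted once per code point.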

The {{jsxref("String.charCodeAt()")}} method returns the code unit at the given index, and the {{jsxref("String.codePointAt()")}} method returns the code point at the given index:

```js
const string = "🦊"; // U+1F98A

console.log(string.charCodeAt(0).toString(16)); // d83e
console.log(string.charCodeAt(1).toString(16)); // dd8a

console.log(string.codePointAt(0).toString(16)); // 1f98a
```
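
The relationship between the two code units and the single code point can be sketched with the standard UTF-16 surrogate arithmetic. This is an illustrative sketch only: `toSurrogatePair` is a hypothetical helper name, and it assumes its input is a code point outside the BMP.

```js
// Hypothetical helper: split a code point in U+10000..U+10FFFF
// into its UTF-16 high (leading) and low (trailing) surrogates.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError("Expected a code point in U+10000..U+10FFFF");
  }
  const offset = codePoint - 0x10000; // 20-bit value
  const high = 0xd800 + (offset >> 10); // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1f98a);
console.log(high.toString(16), low.toString(16)); // d83e dd8a
console.log(String.fromCharCode(high, low) === "🦊"); // true

// At an index that falls on a trailing surrogate, codePointAt()
// returns just that code unit rather than the full code point.
console.log("🦊".codePointAt(1).toString(16)); // dd8a
```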

See [UTF-16 characters, Unicode code points, and grapheme clusters](/en-US/docs/Web/JavaScript/Reference/Global_Objects/String#utf-16_characters_unicode_code_points_and_grapheme_clusters) to learn more about working with UTF-16 strings in JavaScript.
I think you can reduce this whole section to the last paragraph :)
Well, maybe? It's pretty short, and I think it is worth introducing this here before sending someone straight to the JS docs. The section in the JS docs also doesn't mention `charCodeAt` or `codePointAt`, which to me are quite interesting APIs for someone trying to understand this.
And equally I could say that https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String#utf-16_characters_unicode_code_points_and_grapheme_clusters includes a lot of generic stuff about UTF-16 that doesn't really belong in the JS docs, specifically.
> And equally I could say that https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String#utf-16_characters_unicode_code_points_and_grapheme_clusters includes a lot of generic stuff about UTF-16 that doesn't really belong in the JS docs, specifically.

Well, that I can agree with. When I wrote all this, there wasn't a lot of shared content discussing these underlying mechanisms, and I was trying to make the JS docs self-contained so readers don't have to go to other areas. I'm okay with moving or merging some of this into the glossary and updating the links, if you think that's worthwhile.
I didn't add/update links from the JS docs because they also have https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String#utf-16_characters_unicode_code_points_and_grapheme_clusters, which they point to sometimes, and I didn't know if they want to point there instead.