[Term Entry] JavaScript String: normalize() #7380

Open
wants to merge 4 commits into
base: main
65 changes: 65 additions & 0 deletions content/javascript/concepts/strings/terms/normalize/normalize.md
@@ -0,0 +1,65 @@
---
Title: '.normalize()'
Description: 'Returns the Unicode Normalization Form of a string.'
Subjects:
- 'Code Foundations'
- 'Computer Science'
Tags:
- 'JavaScript'
- 'Methods'
- 'String'
- 'Unicode'
CatalogContent:
- 'introduction-to-javascript'
- 'paths/front-end-engineer-career-path'
---

The **`.normalize()`** method returns a copy of a [string](https://www.codecademy.com/resources/docs/javascript/strings) converted to a Unicode Normalization Form. This is especially useful when comparing strings that look identical but are composed of different sequences of Unicode code points.

## Syntax

```pseudo
string.normalize([form])
```

**Parameters:**

- `form` (optional): A string specifying the Unicode normalization form. Any value other than the following throws a `RangeError`:
  - `'NFC'` (default): Canonical Decomposition, followed by Canonical Composition
  - `'NFD'`: Canonical Decomposition
  - `'NFKC'`: Compatibility Decomposition, followed by Canonical Composition
  - `'NFKD'`: Compatibility Decomposition
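
To make the difference between the four forms concrete, the following sketch normalizes two sample strings under each form and records the resulting lengths. The sample strings and the `lengths` object are illustrative choices, not part of the original entry.

```js
// Sample strings chosen for illustration only.
const accented = '\u00e9'; // é as one precomposed code point
const ligature = '\ufb01'; // ﬁ ligature, a compatibility character

const forms = ['NFC', 'NFD', 'NFKC', 'NFKD'];

// Record the length (in UTF-16 code units) of each string under every form.
const lengths = {};
for (const form of forms) {
  lengths[form] = {
    accented: accented.normalize(form).length,
    ligature: ligature.normalize(form).length,
  };
}

console.log(lengths);
```

The canonical forms (`'NFC'`, `'NFD'`) leave the ligature as a single code point, while the compatibility forms (`'NFKC'`, `'NFKD'`) expand it to the two letters `fi`. The decomposed forms (`'NFD'`, `'NFKD'`) split `é` into a base letter plus a combining mark.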

**Return value:**

A new string in the specified normalization form.
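
As a small sketch of this behavior, the snippet below shows that the original string is left unchanged and that an unsupported form name throws a `RangeError`. The variable names and the invalid form `'NFX'` are made up for the demonstration.

```js
const decomposed = '\u0065\u0301'; // e + combining acute accent
const composed = decomposed.normalize('NFC');

console.log(decomposed.length); // 2 (the original string is untouched)
console.log(composed.length); // 1

// 'NFX' is a deliberately invalid form name used only for this demonstration.
let errorName = null;
try {
  decomposed.normalize('NFX');
} catch (error) {
  errorName = error.name;
}

console.log(errorName); // RangeError
```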

## Example: Comparing Unicode Representations

In this example, two visually identical strings are built from different sequences of code points, so a strict comparison fails until both are normalized to the same form with `.normalize()`.

```js
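const word1 = '\u00e9'; // é as a single precomposed code point (U+00E9)
const word2 = '\u0065\u0301'; // e (U+0065) followed by a combining acute accent (U+0301)

// The code point sequences differ, so strict equality fails
console.log(word1 === word2);
// Both strings are converted to NFC (the default) before comparing
console.log(word1.normalize() === word2.normalize());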
```

The output of this code is:

```shell
false
true
```
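
The comparison above could be wrapped in a small reusable function. The sketch below assumes a hypothetical helper named `sameText()`, which is not part of the String API, and lets the caller pick the normalization form.

```js
// Hypothetical helper: compares two strings after normalizing both to the same form.
function sameText(a, b, form = 'NFC') {
  return a.normalize(form) === b.normalize(form);
}

console.log(sameText('\u00e9', '\u0065\u0301')); // true
console.log(sameText('\ufb01', 'fi')); // false: NFC keeps the ﬁ ligature intact
console.log(sameText('\ufb01', 'fi', 'NFKC')); // true: NFKC expands the ligature
```

Whether a compatibility form is appropriate depends on the use case, since `'NFKC'` and `'NFKD'` discard formatting distinctions such as ligatures.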

## Codebyte Example: Stripping Accents Using Normalize + Regex

In this example, `.normalize('NFD')` splits each accented character into a base letter followed by combining marks, and a regular expression then removes the marks in the Combining Diacritical Marks block (U+0300 to U+036F):

```codebyte/javascript
const word = 'café';
const stripped = word.normalize('NFD').replace(/[\u0300-\u036f]/g, '');

console.log(stripped);
```
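
As a further sketch, the same technique can be wrapped in a function and applied to longer text. The helper name `stripAccents()` is hypothetical. Note that this approach only affects characters that decompose into a base letter plus a combining mark; letters such as `ø` have no such decomposition and pass through unchanged.

```js
// Hypothetical helper wrapping the NFD + regex approach.
function stripAccents(text) {
  return text.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

console.log(stripAccents('Crème Brûlée')); // Creme Brulee
console.log(stripAccents('Søren')); // Søren (ø has no canonical decomposition)
```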