| 1 | + |
| 2 | +Contributors: Lexing and Parsing String Templates: |
| 3 | +=================================================== |
| 4 | +Supporting string templates requires coordination between the lexer, parser and |
| 5 | +printer. The lexer (as always) creates a token stream, but when it encounters a |
| 6 | +backtick, it enters a special lexing mode that collects the (mostly) raw text |
| 7 | +until it hits either a closing backtick or a `${`. The `${` opens an |
| 8 | +"interpolation region", in which the lexer temporarily resumes "regular" |
| 9 | +tokenizing instead of collecting raw text, until it reaches the balancing `}`. |
| 10 | +It then re-enters "raw text" mode until it hits either another `${` or the |
| 11 | +closing backtick. |
| 12 | + |
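To make the "balancing `}`" rule concrete, here is a minimal OCaml sketch of how the end of an interpolation region could be located. The function name `find_balancing_rbrace` is hypothetical, and the sketch assumes every `{` and `}` inside the region is structural. The real lexer tokenizes the region with its regular rules, so braces inside string literals or comments presumably do not count there.

```ocaml
(* Sketch only: assumes no "{" or "}" appear inside nested string literals,
   comments or templates within the interpolated expression. *)

(* [find_balancing_rbrace s start] scans [s] from index [start], the first
   character after a "${", and returns the index of the "}" that closes the
   interpolation region, or [None] if the region is never closed. *)
let find_balancing_rbrace (s : string) (start : int) : int option =
  let len = String.length s in
  (* [depth] counts the "{" seen inside the region that are still open. *)
  let rec scan i depth =
    if i >= len then None
    else
      match s.[i] with
      | '{' -> scan (i + 1) (depth + 1)
      | '}' when depth = 0 -> Some i
      | '}' -> scan (i + 1) (depth - 1)
      | _ -> scan (i + 1) depth
  in
  scan start 0

let () =
  (* The "${" starts at index 2, so the region contents start at index 4. *)
  match find_balancing_rbrace "a ${f({x: 1})} b " 4 with
  | Some i -> Printf.printf "region closes at index %d\n" i
  | None -> print_endline "unterminated"
(* prints: region closes at index 13 *)
```

The inner `{`/`}` pair raises and lowers the depth, so only the final `}` ends the region.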
| 13 | +- Lexing of raw text regions and regular tokenizing: Handled by |
| 14 | + `reason_declarative_lexer.ml`. |
| 15 | +- Token balancing: Handled by `reason_lexer.ml`. |
| 16 | + |
| 17 | +The output of lexing becomes tokens streamed into the parser, and the parser |
| 18 | +`reason_parser.mly` turns those tokens into AST expressions. |
| 19 | + |
| 20 | +## Lexing: |
| 21 | + |
| 22 | +String templates are delimited as follows: |
| 23 | +- Opened by a backtick, followed by any whitespace character (newline, space, |
| 24 | + or tab). |
| 25 | +- Closed by any whitespace character (newline, space, or tab), followed by |
| 26 | + a backtick. |
| 27 | + |
| 28 | +For example: |
| 29 | +```reason |
| 30 | +let x = ` hi this is my string template `; |
| 31 | +let x = ` |
| 32 | +  The newline counts as a whitespace character both for opening and closing. |
| 33 | +`; |
| 34 | + |
| 35 | +``` |
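The opening and closing rules can be written as two small predicates. This is a sketch only: `opens_at`, `closes_at`, and `is_template_ws` are hypothetical names, not functions from the Reason lexer, and "whitespace" is limited to the newline, space, and tab characters listed above.

```ocaml
(* Whitespace that may follow an opening backtick or precede a closing one. *)
let is_template_ws = function
  | ' ' | '\t' | '\n' -> true
  | _ -> false

(* [opens_at s i]: a backtick at index [i] followed by template whitespace. *)
let opens_at (s : string) (i : int) : bool =
  i >= 0 && i + 1 < String.length s && s.[i] = '`' && is_template_ws s.[i + 1]

(* [closes_at s i]: a backtick at index [i] preceded by template whitespace. *)
let closes_at (s : string) (i : int) : bool =
  i >= 1 && i < String.length s && s.[i] = '`' && is_template_ws s.[i - 1]

let () =
  let src = "` hi this is my string template `" in
  Printf.printf "opens: %b, closes: %b\n"
    (opens_at src 0) (closes_at src (String.length src - 1))
(* prints: opens: true, closes: true *)
```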
| 36 | + |
| 37 | +Within the string template literal, there may be regions of non-string |
| 38 | +"interpolation" where expressions are lexed/parsed. |
| 39 | + |
| 40 | +```reason |
| 41 | +let x = ` hi this is my ${expressionHere() ++ "!"} template `; |
| 42 | +``` |
| 43 | + |
| 44 | +Template strings are lexed into tokens, some of which carry a string |
| 45 | +"payload" holding a portion of the string content. |
| 46 | +The opening backtick, the closing backtick, and the `${` characters do not |
| 47 | +become tokens fed to the parser, and are not included in the text payload of |
| 48 | +any token. The right brace `}` closing an interpolation region that started |
| 49 | +with `${` _does_ become a token fed to the parser. Lexing a string template |
| 50 | +produces three kinds of tokens: |
| 51 | + |
| 52 | +- `STRING_TEMPLATE_TERMINATED(string)`: A string region that is terminated by |
| 53 | + the closing backtick. It may be the entire string template contents if there |
| 54 | + are no interpolation regions `${}`, or the final string segment after an |
| 55 | + interpolation region `${}`, as long as that segment closes the entire |
| 56 | + template. |
| 57 | +- `STRING_TEMPLATE_SEGMENT_LBRACE(string)`: A string region occurring _before_ |
| 58 | + an interpolation region `${`. The `string` payload of this token is the |
| 59 | + contents up to (but not including) the next `${`. |
| 60 | +- `RBRACE`: A `}` character that terminates an interpolation region that |
| 61 | + started with `${`. |
| 62 | + |
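A simplified OCaml model of the token stream described above. It assumes the template body (everything after the opening backtick and its first whitespace character, up to but excluding the closing backtick) has already been isolated, and it collapses the interpolated expression into a single hypothetical `EXPR` token, whereas the real lexer tokenizes that region with its regular rules. Braces inside the expression are assumed to be structural.

```ocaml
type token =
  | STRING_TEMPLATE_SEGMENT_LBRACE of string
  | STRING_TEMPLATE_TERMINATED of string
  | EXPR of string (* hypothetical stand-in for the interpolated expression *)
  | RBRACE

let tokenize (body : string) : token list =
  let len = String.length body in
  let buf = Buffer.create 16 in
  (* Raw-text mode: collect characters until "${" or the end of the body. *)
  let rec raw i acc =
    if i >= len then
      List.rev (STRING_TEMPLATE_TERMINATED (Buffer.contents buf) :: acc)
    else if body.[i] = '$' && i + 1 < len && body.[i + 1] = '{' then begin
      (* The "${" itself produces no token and is not part of any payload. *)
      let seg = STRING_TEMPLATE_SEGMENT_LBRACE (Buffer.contents buf) in
      Buffer.clear buf;
      interp (i + 2) (i + 2) 0 (seg :: acc)
    end else begin
      Buffer.add_char buf body.[i];
      raw (i + 1) acc
    end
  (* Interpolation mode: look for the balancing "}", tracking nested braces. *)
  and interp start i depth acc =
    if i >= len then failwith "unterminated interpolation region"
    else
      match body.[i] with
      | '{' -> interp start (i + 1) (depth + 1) acc
      | '}' when depth = 0 ->
          (* Emit the expression, then RBRACE, and resume raw-text mode. *)
          let expr = EXPR (String.sub body start (i - start)) in
          raw (i + 1) (RBRACE :: expr :: acc)
      | '}' -> interp start (i + 1) (depth - 1) acc
      | _ -> interp start (i + 1) depth acc
  in
  raw 0 []
```

On the body of the interpolation example above, this yields a `STRING_TEMPLATE_SEGMENT_LBRACE` holding `hi this is my `, the expression, an `RBRACE`, and a final `STRING_TEMPLATE_TERMINATED` that still carries the trailing whitespace.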
| 63 | +Simple example: |
| 64 | + |
| 65 | +      STRING_TEMPLATE_TERMINATED |
| 66 | +      |                          | |
| 67 | +    ` lorem ipsum lorem ipsum bla ` |
| 68 | +    ^                             ^ |
| 69 | +    |   +-------------------------+ |
| 70 | +    |   The closing backtick also does not show up in the token |
| 71 | +    |   stream, but the last white space character is part of the lexed |
| 72 | +    |   STRING_TEMPLATE_TERMINATED token (it is used to compute |
| 73 | +    |   indentation, but is stripped from the string constant, or |
| 74 | +    |   re-inserted when refmting if not present). |
| 75 | +    | |
| 76 | +    The opening backtick does not show up anywhere in the token stream. The first |
| 77 | +    single white space character after it is also not part of the lexed tokens. |
| 78 | + |
| 79 | +Multiline example: |
| 80 | + |
| 81 | +    All of this leading line whitespace remains part of the tokens' payloads, |
| 82 | +    but it is normalized and stripped when the parser converts the tokens |
| 83 | +    into string expressions. |
| 84 | +    | |
| 85 | +    |       This newline is not part of any token |
| 86 | +    |       | |
| 87 | +    |       v |
| 88 | +    |      ` |
| 89 | +    +-->       lorem ipsum lorem |
| 90 | +               ipsum bla |
| 91 | +             ` |
| 92 | +    ^^^^^^^^^ |
| 93 | +    | |
| 94 | +    All of this white space on the final line is part of the token as well. |
| 95 | + |
| 96 | + |
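Both diagrams reduce to a simple rule for a template without interpolation: the token payload is the source minus the opening backtick, the single whitespace character after it, and the closing backtick. A minimal sketch, assuming `src` is exactly one well-formed template; `token_payload` is a hypothetical helper, not part of the lexer.

```ocaml
let token_payload (src : string) : string =
  let len = String.length src in
  if len < 3 || src.[0] <> '`' || src.[len - 1] <> '`' then
    invalid_arg "token_payload: expected a backtick-delimited template"
  else
    (* Drop the opening backtick (index 0), the one whitespace character
       after it (index 1), and the closing backtick (index len - 1). The
       whitespace just before the closing backtick is kept. *)
    String.sub src 2 (len - 3)

let () = Printf.printf "%S\n" (token_payload "` lorem ipsum lorem ipsum bla `")
(* prints: "lorem ipsum lorem ipsum bla " *)
```

For the multiline example, the newline after the opening backtick is dropped, while the indentation of every line and the whitespace on the final line all stay in the payload.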
| 97 | +For interpolation, the `STRING_TEMPLATE_SEGMENT_LBRACE` token holds the string |
| 98 | +contents up to the `${` (minus the single white space character after the |
| 99 | +opening backtick, if it is the first segment). The same rules apply as for |
| 100 | +non-interpolated templates: the opening and closing backticks do not show up |
| 101 | +in the token stream, and the first white space character after the opening |
| 102 | +backtick is not part of the lexed string contents. The final white space |
| 103 | +character before the closing backtick *is* part of the lexed string token (it |
| 104 | +is used to compute indentation), but it is stripped, along with leading line |
| 105 | +whitespace, when the parser converts the lexed tokens to AST string expressions. |
| 106 | + |
| 107 | +    ` lorem ipsum lorem ipsum bla${expression}lorem ipsum lorem ip lorem ` |
| 108 | +      |                         |            ||                         | |
| 109 | +      STRING_TEMPLATE_SEGMENT_LBRACE         |STRING_TEMPLATE_TERMINATED |
| 110 | +                                          RBRACE |
| 111 | +## Parsing: |
| 112 | + |
| 113 | +The string template tokens are turned into normal AST expressions. |
| 114 | +`STRING_TEMPLATE_SEGMENT_LBRACE` and `STRING_TEMPLATE_TERMINATED` lexed tokens |
| 115 | +contain all of the string contents, plus the leading whitespace of each line, |
| 116 | +including the final whitespace before the closing backtick. The parser |
| 117 | +normalizes these by stripping that leading whitespace (including two |
| 118 | +additional spaces, for nice indentation) before turning them into some |
| 119 | +combination of string constants with a special attribute on the AST, or string |
| 120 | +concatenations with a special attribute on the concat AST node. |
| 121 | + |
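The normalization step can be sketched as follows for a template without interpolation. This is an interpretation of the description above, not code from `reason_parser.mly`: it assumes the whitespace after the last newline (the final line, before the closing backtick) gives the base indentation, that each content line loses at most that many spaces or tabs plus two, and that a single-line payload simply loses its final whitespace character. `drop_indent` and `normalize` are hypothetical names.

```ocaml
(* Drop at most [n] leading spaces or tabs from [line]. *)
let drop_indent (n : int) (line : string) : string =
  let len = String.length line in
  let rec count i =
    if i < n && i < len && (line.[i] = ' ' || line.[i] = '\t') then count (i + 1)
    else i
  in
  let k = count 0 in
  String.sub line k (len - k)

let normalize (payload : string) : string =
  match String.rindex_opt payload '\n' with
  | None ->
      (* Single-line template: drop the final whitespace character that the
         lexer kept in the token. *)
      let len = String.length payload in
      if len = 0 then payload else String.sub payload 0 (len - 1)
  | Some last_nl ->
      (* The text after the last newline is the whitespace before the
         closing backtick; its length is the base indentation. *)
      let base = String.length payload - last_nl - 1 in
      (* Remove that final line, then strip base + 2 from each content line. *)
      String.sub payload 0 last_nl
      |> String.split_on_char '\n'
      |> List.map (drop_indent (base + 2))
      |> String.concat "\n"

let () = Printf.printf "%S\n" (normalize "    lorem ipsum lorem\n    ipsum bla\n  ")
(* prints: "lorem ipsum lorem\nipsum bla" *)
```

Tying the base indentation to the closing backtick's line lets the template body be indented along with the surrounding code without that indentation leaking into the string.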
| 122 | +```reason |
| 123 | + |
| 124 | +// This: |
| 125 | +let x = ` |
| 126 | +  Hello there |
| 127 | +`; |
| 128 | +// Becomes: |
| 129 | +let x = [@reason.template] "Hello there"; |
| 130 | + |
| 131 | +// This: |
| 132 | +let x = ` |
| 133 | +  ${expr} Hello there |
| 134 | +`; |
| 135 | +// Becomes: |
| 136 | +let x = [@reason.template] (expr ++ [@reason.template] " Hello there"); |
| 137 | + |
| 138 | +``` |
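The desugaring into constants and concatenations can be modelled with a toy AST. The `expr` and `piece` types, the `template` flag (standing in for the `[@reason.template]` attribute), and the left-to-right grouping used when there are more than two pieces are all assumptions for illustration; the real parser builds OCaml Parsetree nodes. Because string concatenation is associative, the grouping does not change the resulting string.

```ocaml
type expr =
  | Ident of string
  | Const of { s : string; template : bool }
  | Concat of { l : expr; r : expr; template : bool }

(* A template piece: already-normalized text, or an interpolated expression. *)
type piece = Text of string | Interp of expr

(* Fold the pieces left to right into concatenations, marking every node the
   template produced. Empty text pieces are skipped. *)
let desugar (pieces : piece list) : expr =
  let to_expr = function
    | Text s -> Const { s; template = true }
    | Interp e -> e
  in
  match List.filter (function Text "" -> false | _ -> true) pieces with
  | [] -> Const { s = ""; template = true }
  | first :: rest ->
      List.fold_left
        (fun acc p -> Concat { l = acc; r = to_expr p; template = true })
        (to_expr first) rest

(* Render in the notation used above, e.g. [@reason.template] "Hi". *)
let rec show = function
  | Ident x -> x
  | Const { s; template } ->
      (if template then "[@reason.template] " else "") ^ Printf.sprintf "%S" s
  | Concat { l; r; template } ->
      (if template then "[@reason.template] " else "")
      ^ "(" ^ show l ^ " ++ " ^ show r ^ ")"

let () = print_endline (show (desugar [Interp (Ident "expr"); Text " Hello there"]))
(* prints: [@reason.template] (expr ++ [@reason.template] " Hello there") *)
```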
| 139 | + |
| 140 | +User Documentation: |
| 141 | +=================== |
| 142 | +> This section is the user documentation for string template literals, which |
| 143 | +> will be published to the [official Reason Syntax |
| 144 | +> documentation](https://reasonml.github.io/) when |
| 145 | + |
| 146 | +TODO |