Description parsing behaviour for Markdown changed #379

davidjgoss · 2025-03-20T09:25:58Z

From v32, Gherkin is getting different results when parsing Markdown, specifically for the description.

This surfaced in the compatibility kit when trying to upgrade to v32:

The critical parts are still fine - the same pickles come out compared to v31 - but there is now a description being picked up:

For reference the sample file being parsed is:
https://github.com/cucumber/compatibility-kit/blob/main/devkit/samples/markdown/markdown.feature.md

So it looks like we are picking up the first row of that table as a description, where before there was none.

mpkorstanje · 2025-03-20T11:16:58Z

Not entirely unexpected; the Markdown parser is most favorably described as a proof of concept bolted onto the Gherkin parser. The problem seems to come from this:

The GherkinInMarkdownTokenMatcher treats every line that isn't recognised as a Gherkin token as Empty. The reason is that Markdown documents typically contain lines that have nothing to do with Gherkin; they are just prose.

This is done in the check for the Empty token, which succeeds when none of the other matchers apply:

gherkin/javascript/src/GherkinInMarkdownTokenMatcher.ts

Lines 97 to 110 in 2143371

```typescript
if (
  !this.match_TagLine(token) &&
  !this.match_FeatureLine(token) &&
  !this.match_ScenarioLine(token) &&
  !this.match_BackgroundLine(token) &&
  !this.match_ExamplesLine(token) &&
  !this.match_RuleLine(token) &&
  !this.match_TableRow(token) &&
  !this.match_Comment(token) &&
  !this.match_Language(token) &&
  !this.match_DocStringSeparator(token) &&
  !this.match_EOF(token) &&
  !this.match_StepLine(token)
) {
```
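To make the fallback concrete, here is a minimal sketch using a toy classifier. `classify`, `matchers` and the regexes are hypothetical and only loosely mimic a few of the real matchers (the Markdown shapes they accept are assumptions). The point is the ordering: every specific matcher gets a chance first, and whatever is left over is treated as Empty.

```typescript
// Hypothetical, simplified model of the "anything unrecognised is Empty" rule.
// The token names mirror the real matcher methods, but these regexes are
// illustrative stand-ins, not Gherkin's actual matching logic.
type TokenType = 'FeatureLine' | 'ScenarioLine' | 'StepLine' | 'TableRow' | 'Empty'

// Assumed Markdown shapes: headings for keywords, list items for steps,
// pipe-prefixed lines for table rows.
const matchers: Array<[TokenType, (line: string) => boolean]> = [
  ['FeatureLine', (line) => /^#\s+Feature:/.test(line)],
  ['ScenarioLine', (line) => /^##\s+Scenario:/.test(line)],
  ['StepLine', (line) => /^\s*[*-]\s+(Given|When|Then|And|But)\s/.test(line)],
  ['TableRow', (line) => /^\s*\|/.test(line)],
]

function classify(line: string): TokenType {
  for (const [type, matches] of matchers) {
    if (matches(line)) return type
  }
  // No Gherkin-specific matcher applied: treat the line as prose, i.e. Empty.
  return 'Empty'
}
```

In the Gherkin grammar a description is built from `Other` tokens, so classifying prose as Empty is what keeps ordinary Markdown text out of descriptions in this model.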

And this is what matches a line as a Comment, namely a GFM table separator row:

gherkin/javascript/src/GherkinInMarkdownTokenMatcher.ts

Lines 129 to 136 in 2143371

```typescript
match_Comment(token: Token): boolean {
  let result = false
  if (token.line.startsWith('|')) {
    const tableCells = token.line.getTableCells()
    if (this.isGfmTableSeparator(tableCells)) result = true
  }
  return this.setTokenMatched(token, null, result)
}
```
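The `isGfmTableSeparator` helper itself isn't part of the excerpt. A minimal sketch of what such a check could look like, assuming the GFM delimiter-row rule (each cell is one or more hyphens with an optional leading and/or trailing colon) and a naive split on `|`. `splitCells` and `isSeparatorRow` are hypothetical names, and the real code works on `getTableCells()` results rather than raw strings.

```typescript
// Hypothetical standalone version of a GFM delimiter-row check.
// Rows are split naively on '|'; escaped pipes are not handled.
function splitCells(line: string): string[] {
  const inner = line.trim().replace(/^\|/, '').replace(/\|$/, '')
  return inner.split('|').map((cell) => cell.trim())
}

// A GFM delimiter cell is one or more '-' with an optional leading/trailing ':'.
function isGfmTableSeparator(cells: string[]): boolean {
  return cells.length > 0 && cells.every((cell) => /^:?-+:?$/.test(cell))
}

// Mirrors the shape of match_Comment above: only pipe-prefixed lines qualify.
function isSeparatorRow(line: string): boolean {
  return line.trim().startsWith('|') && isGfmTableSeparator(splitCells(line))
}
```

With this check, `| --- | :---: |` counts as a separator while a header row such as `| a | b |` does not, so only the delimiter row would be swallowed as a Comment.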

Why this leads to the new description, I have no clue though.

davidjgoss added the 🐛 bug Defect / Bug label Mar 20, 2025
