Description parsing behaviour for Markdown changed #379

Open
davidjgoss opened this issue Mar 20, 2025 · 1 comment
Labels
🐛 bug Defect / Bug

Comments

@davidjgoss
Contributor

From v32, Gherkin is getting different results when parsing Markdown, specifically for the description.

This surfaced in the compatibility kit when trying to upgrade to v32:

The critical parts are still fine - the same pickles come out compared to v31 - but there is now a description being picked up:

Image

For reference the sample file being parsed is:
https://github.com/cucumber/compatibility-kit/blob/main/devkit/samples/markdown/markdown.feature.md

So it looks like we are picking up the first row of that table as a description, where before there was none.

davidjgoss added the 🐛 bug Defect / Bug label Mar 20, 2025
@mpkorstanje
Contributor

Not entirely unexpected: the Markdown parser is most favorably described as a proof of concept bolted onto the Gherkin parser. The problem seems to come from this:

The GherkinInMarkdownTokenMatcher will consider all lines that aren't recognised as a special token as Empty. The reason for this is that Markdown documents will typically have lines that have nothing to do with Gherkin - they are just prose.

Which is achieved by testing for the empty token:

```typescript
if (
  !this.match_TagLine(token) &&
  !this.match_FeatureLine(token) &&
  !this.match_ScenarioLine(token) &&
  !this.match_BackgroundLine(token) &&
  !this.match_ExamplesLine(token) &&
  !this.match_RuleLine(token) &&
  !this.match_TableRow(token) &&
  !this.match_Comment(token) &&
  !this.match_Language(token) &&
  !this.match_DocStringSeparator(token) &&
  !this.match_EOF(token) &&
  !this.match_StepLine(token)
) {
```

And then that matches the comment line:

```typescript
match_Comment(token: Token): boolean {
  let result = false
  if (token.line.startsWith('|')) {
    const tableCells = token.line.getTableCells()
    if (this.isGfmTableSeparator(tableCells)) result = true
  }
  return this.setTokenMatched(token, null, result)
}
```

Why, not a clue though.
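
To make the two rules above easier to poke at in isolation, here is a minimal standalone sketch. It is not the actual `GherkinInMarkdownTokenMatcher`: `LineKind`, `splitCells`, `classify` and this version of `isGfmTableSeparator` are hypothetical stand-ins, and the separator rule (every cell looks like `---`, `:---` or `:---:`) is an assumption rather than something confirmed from the Gherkin source. It only shows the shape of the logic: a table separator row is classified as a comment, and anything that no other rule recognises falls through to Empty.

```typescript
// Hypothetical, simplified stand-ins; not the real Gherkin parser classes.
type LineKind = 'TableRow' | 'Comment' | 'Empty' | 'Other'

// Split a pipe-delimited line into trimmed cell texts (no escape handling).
function splitCells(line: string): string[] {
  const inner = line.trim().replace(/^\|/, '').replace(/\|$/, '')
  return inner.split('|').map((cell) => cell.trim())
}

// Assumed rule: every cell is a GFM delimiter such as ---, :--- or :---:
function isGfmTableSeparator(cells: string[]): boolean {
  return cells.length > 0 && cells.every((cell) => /^:?-+:?$/.test(cell))
}

function classify(line: string): LineKind {
  const trimmed = line.trim()
  if (trimmed.startsWith('|')) {
    // Separator rows are swallowed as comments, other rows are table rows.
    return isGfmTableSeparator(splitCells(trimmed)) ? 'Comment' : 'TableRow'
  }
  // Stand-in for the header and step matchers (headings and bullet items).
  if (/^#+\s|^[*-]\s/.test(trimmed)) return 'Other'
  // Nothing else matched: treat the line as prose, i.e. Empty.
  return 'Empty'
}
```

With this sketch, `classify('| --- | --- |')` gives `'Comment'`, a data row such as `classify('| a | b |')` gives `'TableRow'`, and a line of plain prose gives `'Empty'`. It does not reproduce the bug described in this issue, which may depend on parser state that the sketch leaves out.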
