[Bug]: Documentation Inconsistency: Datastream to BigQuery Template Metadata Fields (_metadata_dataset vs. _metadata_schema) #2400

Closed
gnswp21 opened this issue May 29, 2025 · 1 comment · Fixed by #2403
Labels: bug, needs triage, p2

gnswp21 commented May 29, 2025

Related Template(s)

Datastream to BigQuery Dataflow template

Template Version

main (latest as of 2025-05-29)

What happened?

Hello Dataflow Templates Team,

I'm reporting an inconsistency between the documentation and the behavior of the Datastream to BigQuery Dataflow template. The official documentation for the outputStagingDatasetTemplate option (and potentially outputDatasetTemplate) suggests using {_metadata_dataset} to name datasets dynamically from source metadata.

However, after extensive debugging with a MySQL Datastream source (where read_method is mysql-cdc-binlog), I found that the source_metadata.database value is not written to _metadata_dataset. It is written to the _metadata_schema field of the resulting JSON output instead.

Specifically, in the FormatDatastreamRecordToJson.java class (within com.google.cloud.teleport.v2.datastream.transforms), the following logic is present:

private String getSourceType(GenericRecord record) {
  String sourceType = record.get("read_method").toString().split("-")[0];
  return sourceType;
}

// ... later in the apply method:
if (sourceType.equals("mysql")) {
  outputObject.put("_metadata_schema", getMetadataDatabase(record));
  // ... other mysql-specific metadata
}

This code snippet clearly shows that for MySQL sources, the source_metadata.database value is explicitly assigned to the _metadata_schema field in the transformed JSON. Consequently, using {_metadata_schema} resolves the issue and allows for correct dynamic dataset creation.
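
For anyone who wants to see the effect in isolation, here is a minimal sketch of the behavior described above. It is not the template's actual code: the class name DatasetTemplateSketch, the helpers metadataFor and resolve, and the sample database name "sales" are all hypothetical, and leaving an unknown placeholder untouched is an assumption made only for illustration. Only the split on "-" and the _metadata_schema assignment come from the snippet quoted above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; NOT the Dataflow template's implementation.
class DatasetTemplateSketch {
  // Matches placeholders of the form {name}.
  private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");

  // Same idea as the quoted getSourceType: "mysql-cdc-binlog" -> "mysql".
  static String sourceType(String readMethod) {
    return readMethod.split("-")[0];
  }

  // Follows the quoted MySQL branch: the database name is stored under
  // _metadata_schema, and no _metadata_dataset key is written.
  static Map<String, String> metadataFor(String readMethod, String database) {
    Map<String, String> out = new LinkedHashMap<>();
    if (sourceType(readMethod).equals("mysql")) {
      out.put("_metadata_schema", database);
    }
    return out;
  }

  // Replaces each {key} with the record's value for that key.
  // Assumption: keys missing from the record are left as-is.
  static String resolve(String template, Map<String, String> record) {
    Matcher m = PLACEHOLDER.matcher(template);
    StringBuffer sb = new StringBuffer();
    while (m.find()) {
      String value = record.get(m.group(1));
      String replacement = (value != null) ? value : m.group();
      m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
    }
    m.appendTail(sb);
    return sb.toString();
  }

  public static void main(String[] args) {
    Map<String, String> record = metadataFor("mysql-cdc-binlog", "sales");
    // The placeholder that matches the emitted field resolves.
    System.out.println(resolve("{_metadata_schema}_staging", record));
    // The documented placeholder has no matching field in this sketch.
    System.out.println(resolve("{_metadata_dataset}_staging", record));
  }
}
```

In this sketch only the {_metadata_schema} form picks up the database name, which matches what I observed with the real template.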

The current documentation is misleading for MySQL users, leading to unnecessary debugging and frustration. I kindly request that the documentation for this template be updated to reflect the correct field name, _metadata_schema, when dealing with MySQL Datastream sources, or to clarify the conditions under which _metadata_dataset is applicable.

Thank you for your time and consideration.

Relevant log output

gnswp21 added the bug, p2, and needs triage labels on May 29, 2025
liferoad self-assigned this on May 29, 2025
liferoad (Contributor) commented

It is safe to just update the doc with _metadata_schema.
