I'm reporting a significant inconsistency I encountered while using the Datastream to BigQuery Dataflow template. The official documentation for the outputStagingDatasetTemplate (and potentially outputDatasetTemplate) option suggests using {_metadata_dataset} for dynamic dataset naming based on source metadata.
However, after extensive debugging with a MySQL Datastream source (where read_method is mysql-cdc-binlog), I found that the source_metadata.database field is not mapped to _metadata_dataset. Instead, it is written to the _metadata_schema field in the resulting JSON output.
Specifically, in the FormatDatastreamRecordToJson.java class (within com.google.cloud.teleport.v2.datastream.transforms), the following logic is present:
private String getSourceType(GenericRecord record) {
  String sourceType = record.get("read_method").toString().split("-")[0];
  return sourceType;
}
// ... later in the apply method:
if (sourceType.equals("mysql")) {
  outputObject.put("_metadata_schema", getMetadataDatabase(record));
  // ... other mysql-specific metadata
}
This code snippet clearly shows that for MySQL sources, the source_metadata.database value is explicitly assigned to the _metadata_schema field in the transformed JSON. Consequently, using {_metadata_schema} resolves the issue and allows for correct dynamic dataset creation.
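To make the effect concrete, here is a minimal, self-contained sketch of the behavior described above. It is not the template's actual code: the class DatasetTemplateSketch, the helper names, the sample database name "sales", and the choice to leave unknown placeholders untouched are all assumptions made for illustration. Only the split on "-" and the _metadata_schema key come from the snippet quoted earlier.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; none of these names exist in the template.
public class DatasetTemplateSketch {
    // Matches placeholders such as {_metadata_schema}.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{([A-Za-z0-9_]+)\\}");

    // Mirrors the quoted getSourceType logic: "mysql-cdc-binlog" -> "mysql".
    public static String sourceType(String readMethod) {
        return readMethod.split("-")[0];
    }

    // Mimics the quoted MySQL branch: the database name is stored under
    // "_metadata_schema", and no "_metadata_dataset" key is written.
    public static Map<String, String> metadataFor(String readMethod, String database) {
        Map<String, String> row = new LinkedHashMap<>();
        if (sourceType(readMethod).equals("mysql")) {
            row.put("_metadata_schema", database);
        }
        return row;
    }

    // Substitutes {key} with the row's value. Assumption: placeholders with
    // no matching key are left as-is; the real template may behave differently.
    public static String resolve(String template, Map<String, String> row) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = row.get(m.group(1));
            String replacement = (value != null) ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> row = metadataFor("mysql-cdc-binlog", "sales");
        System.out.println(resolve("{_metadata_schema}_staging", row));  // sales_staging
        System.out.println(resolve("{_metadata_dataset}_staging", row)); // {_metadata_dataset}_staging
    }
}
```

In this sketch, only the {_metadata_schema} placeholder picks up the MySQL database name, which matches what I observed with the real template.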
The current documentation is misleading for MySQL users, leading to unnecessary debugging and frustration. I kindly request that the documentation for this template be updated to reflect the correct field name, _metadata_schema, when dealing with MySQL Datastream sources, or to clarify the conditions under which _metadata_dataset is applicable.
Thank you for your time and consideration.
Relevant log output
Related Template(s)
Datastream to BigQuery Dataflow template
Template Version
main (latest at 250529)
What happened?
Hello Dataflow Templates Team,