Feature Request - Preview Results / Query #41

Open
benjaminwestern opened this issue Nov 7, 2024 · 5 comments
Labels: enhancement (New feature or request)

Comments

@benjaminwestern
Collaborator

Firstly, thank you for this tool!

It has saved countless hours of going back and forth to the GCP Console to validate columns, queries, run costs, etc.

I was wondering if you would be able to update your 'Dataform Tools' feature slightly.
[image]

If possible, I would love to be able to set a 'sample' size as well as a custom 'limit' size. See the documentation here.

In addition, being able to see the columns and types of a given SQLX table definition would be awesome, for example by querying INFORMATION_SCHEMA.COLUMNS (see the documentation) like this:

SELECT
  column_name, data_type
FROM
  `<your-gcp-project-id>`.<your-bigquery-dataset-name>.INFORMATION_SCHEMA.COLUMNS
WHERE
  table_name = "<your-bigquery-table-name>"

I will attempt to write a PR for these later this week but I also just wanted to share my appreciation for your extension!

Cheers!

@ashish10alex ashish10alex self-assigned this Nov 7, 2024
@ashish10alex ashish10alex added the enhancement New feature or request label Nov 7, 2024
@ashish10alex ashish10alex removed their assignment Nov 7, 2024
@ashish10alex
Owner

Hi @benjaminwestern, thanks for your kind words. Of the features requested in the issue, I have already implemented one, as I could not help myself.

  • Display the schema of the compiled query - I used the dry run API rather than querying the INFORMATION_SCHEMA table, as that makes it possible to look at the potential schema of a table/view that does not exist yet, and also of a query that has been modified but whose table has not been materialised.

[image: CleanShot 2024-11-07 at 14 52 31@2x]

  • Custom LIMIT - The main reason I set the limit to 1000 is that if a table has, say, around 100k rows, the query is processed in BigQuery relatively quickly, but the rate at which the API returns the data is fairly slow. The way around this, to my knowledge, is to use the Storage API, which might not be enabled for every user and also incurs additional cost. I am happy for you to create an MR for a custom LIMIT while keeping the default at 1000. Also, from what I understood from reading the docs, using sample as opposed to limit only makes the results non-deterministic. So, would you say that making just LIMIT customisable would suffice?
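As a rough sketch of the dry-run approach described in the first bullet above: the BigQuery REST API reports the result schema of a successful dry-run query job under `statistics.query.schema`, so column names and types can be read without the table existing. The interfaces and the helpers `flattenSchema` and `columnsFromDryRun` below are hypothetical names for illustration, not the extension's actual code, and only the response fields used here are modelled.

```typescript
// Assumed slice of a BigQuery dry-run job response (statistics.query.schema.fields).
// Only the properties this sketch reads are declared.
interface SchemaField {
  name: string;
  type: string;
  fields?: SchemaField[]; // nested columns for RECORD/STRUCT types
}

interface DryRunJobMetadata {
  statistics?: {
    query?: { schema?: { fields?: SchemaField[] } };
  };
}

// One row per column, mirroring the column_name/data_type pair from the issue.
interface ColumnInfo {
  columnName: string;
  dataType: string;
}

// Hypothetical helper: flattens nested fields into dotted paths (e.g. "contact.mobile").
function flattenSchema(fields: SchemaField[], prefix: string = ""): ColumnInfo[] {
  const rows: ColumnInfo[] = [];
  for (const field of fields) {
    const columnName = prefix ? `${prefix}.${field.name}` : field.name;
    rows.push({ columnName, dataType: field.type });
    if (field.fields && field.fields.length > 0) {
      rows.push(...flattenSchema(field.fields, columnName));
    }
  }
  return rows;
}

// Hypothetical helper: returns an empty list when the response carries no schema.
function columnsFromDryRun(job: DryRunJobMetadata): ColumnInfo[] {
  const fields = job.statistics?.query?.schema?.fields ?? [];
  return flattenSchema(fields);
}
```

The job metadata itself would come from whichever BigQuery client the extension uses, with the dry-run option enabled; that call is deliberately left out here.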

@ashish10alex ashish10alex self-assigned this Nov 7, 2024
@HampB
Collaborator

HampB commented Nov 7, 2024

I would also like to see the option to specify a sample size. Unlike LIMIT, TABLESAMPLE reduces the actual cost of a query. This feature would be beneficial when working with large datasets.
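For illustration, a preview query using TABLESAMPLE might be assembled as below. This is a minimal sketch: `buildSampleQuery` and its parameters are hypothetical, not part of the extension. It assumes the preview targets a single table reference, since BigQuery's `TABLESAMPLE SYSTEM (n PERCENT)` clause attaches to a table in the FROM clause rather than to an arbitrary query. The default limit of 1000 follows the value mentioned earlier in the thread.

```typescript
// Hypothetical helper: builds a sampled preview query for one table.
// `table` is a fully qualified name such as "project.dataset.table".
function buildSampleQuery(table: string, percent: number, limit: number = 1000): string {
  // Reject backticks so the name cannot break out of the quoted identifier.
  if (table.length === 0 || table.indexOf("`") !== -1) {
    throw new Error("invalid table name");
  }
  // TABLESAMPLE takes a percentage between 0 and 100.
  if (!isFinite(percent) || percent < 0 || percent > 100) {
    throw new RangeError("percent must be between 0 and 100");
  }
  // LIMIT must be a positive whole number.
  if (!isFinite(limit) || Math.floor(limit) !== limit || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  return "SELECT * FROM `" + table + "` TABLESAMPLE SYSTEM (" + percent + " PERCENT) LIMIT " + limit;
}

// Example: sample roughly 10% of the table, capped at the default 1000 rows.
const sampledPreview = buildSampleQuery("my-project.my_dataset.my_table", 10);
```

Sampling in BigQuery works on data blocks rather than individual rows, so the fraction of rows returned is approximate and a small table may come back in full.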

@ashish10alex
Owner

Cool, I'll wait for @benjaminwestern to submit a PR, but it might be good to agree on a UI for having both LIMIT & TABLESAMPLE so that it is intuitive for users.

@benjaminwestern
Collaborator Author

Thanks, legend! Haha, I love the idea of seeing the values pre-creation so that we can validate what the query will generate.

The reason I am interested in table sampling is to validate my utilities across a random sample of data from the table before fully releasing the code. Say, for example, I have a mobile number validation Dataform utility: I would want to pre-validate it across a random sample of the data available in the destination before committing to a potentially expensive and incorrect operation.

In relation to LIMIT, I 100% agree that large tables need to have a limit forced. I will look at pushing a PR that also increments an OFFSET so users can paginate through their data in a deterministic way (see the documentation).
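The LIMIT plus OFFSET pagination idea could look roughly like the sketch below. `buildPageQuery`, `DEFAULT_PAGE_SIZE`, and the `orderBy` parameter are hypothetical names for illustration. The explicit ORDER BY is an assumption added here because BigQuery leaves the rows returned by LIMIT and OFFSET unspecified unless they follow an ORDER BY, so deterministic paging needs a stable sort key.

```typescript
// Default page size, matching the 1000-row limit discussed in the thread.
const DEFAULT_PAGE_SIZE = 1000;

// Hypothetical helper: wraps a compiled query so one page of rows is returned.
// `page` is zero-based; `orderBy` is a column (or expression) that gives a stable order.
function buildPageQuery(
  compiledQuery: string,
  page: number,
  orderBy: string,
  pageSize: number = DEFAULT_PAGE_SIZE
): string {
  if (!isFinite(page) || Math.floor(page) !== page || page < 0) {
    throw new RangeError("page must be a non-negative integer");
  }
  if (!isFinite(pageSize) || Math.floor(pageSize) !== pageSize || pageSize < 1) {
    throw new RangeError("pageSize must be a positive integer");
  }
  // Drop a trailing semicolon so the query can be nested as a subquery.
  const inner = compiledQuery.trim().replace(/;\s*$/, "").trim();
  const offset = page * pageSize;
  return "SELECT * FROM (\n" + inner + "\n) ORDER BY " + orderBy +
    " LIMIT " + pageSize + " OFFSET " + offset;
}

// Example: third page (zero-based index 2) of 100 rows, ordered by id.
const thirdPage = buildPageQuery("SELECT id FROM t;", 2, "id", 100);
```

Note that each page is a separate query over the same data, so OFFSET changes which rows are returned but would not by itself be expected to reduce the bytes scanned.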

@ashish10alex
Owner

Thanks @benjaminwestern, sounds great, looking forward to the PR :)

3 participants