Conversation

siddarthpatel
Contributor

  • add a playground folder
  • explore pandas dataframe methods

@siddarthpatel
Contributor Author

@mtsadler-branch can this PR be merged?

Contributor

@mtsadler-branch mtsadler-branch left a comment

High-level flow looks good.

Nit-picked the PR for the sake of Python exploration.

Comment on lines +84 to +99
if (sys.argv[1] == 'df-read'):
    df = read_data(filename=source_file)
elif (sys.argv[1] == 'df-iterrows'):
    df = read_data(filename=source_file)
    df_iter_rows(df)
elif (sys.argv[1] == 'df-itertuples'):
    df = read_data(filename=source_file)
    df_iter_tuples(df)
elif (sys.argv[1] == 'df-itercolumns'):
    df = read_data(filename=source_file)
    df_iter_columns(df)
elif (sys.argv[1] == 'db-load'):
    load_file_into_db(source_file, table_name, conn)
elif (sys.argv[1] == 'db-load-read'):
    load_file_into_db(source_file, table_name, conn)
    preview_data_in_db(table_name, conn)
Contributor

It may help keep the code DRY to leverage click here.

The @click.command and @click.option lines are Python decorators.

For example, you could have a flag called --command that you pass to a function:

import click

@click.command(help="""Example: python3 playground/my_playground.py --command 'df-read'

Choose one of the following options:
- df-read
- df-iterrows
- df-itertuples
- ...
"""
)
@click.option("--command", default="db-load-read")
def run(command):
    if (command == 'df-read'):
        df = read_data(filename=source_file)
    # ... (remaining elif branches elided)
    elif (command == 'db-load-read'):
        load_file_into_db(source_file, table_name, conn)
        preview_data_in_db(table_name, conn)
    else:
        print("Invalid input detected!!!!")
        # `command` is the option string here; the click command object is `run`.
        with click.Context(run) as ctx:
            click.echo(run.get_help(ctx))

(help message)
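
One way to cut the repetition in the if/elif chain, whichever argument parser ends up being used, is a dispatch table: a dict that maps each command name to the function that handles it. The sketch below is only illustrative. The handlers are stand-ins that return a string instead of calling the PR's real read_data or load_file_into_db, and COMMANDS and dispatch are hypothetical names.

```python
# Minimal dispatch-table sketch. The handlers are stand-ins (assumptions) for
# the PR's real functions; they only report what they would do.

def df_read():
    return "read_data"


def db_load():
    return "load_file_into_db"


def db_load_read():
    return "load_file_into_db, then preview_data_in_db"


# Hypothetical mapping from command name to handler.
COMMANDS = {
    "df-read": df_read,
    "db-load": db_load,
    "db-load-read": db_load_read,
}


def dispatch(command):
    """Runs the handler registered for `command`, or raises ValueError."""
    try:
        handler = COMMANDS[command]
    except KeyError:
        valid = ", ".join(sorted(COMMANDS))
        raise ValueError(f"Invalid command {command!r}; choose one of: {valid}") from None
    return handler()
```

With click, the same keys could probably feed click.Choice(list(COMMANDS)) as the option's type, so that unknown values are rejected before run is called.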

Contributor

click should give you the ability to do everything from run.sh in the same part of the Python code where the command is currently passed via sys.argv.

Try using this line of code in your if __name__ == "__main__": block.
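
For reference, a minimal sketch of that guard, assuming a hypothetical main function that takes the argument list and falls back to the db-load-read default used in the click example above:

```python
import sys


def main(argv):
    """Returns the command named in argv, or the default when none is given."""
    # argv[0] is the script name; the command, if any, follows it.
    command = argv[1] if len(argv) > 1 else "db-load-read"
    print(f"Running command: {command}")
    return command


if __name__ == "__main__":
    # Runs only when the file is executed directly, not when it is imported.
    main(sys.argv)
```

With click, the body of the guard would simply be run(), since the decorated function parses the command-line arguments itself.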

Comment on lines +12 to +32
if [[ "${varname}" == "1" ]]; then
    echo "(pd.read_csv) uses pandas to read a csv file and convert it to a dataframe"
    python3 my_playground.py df-read
elif [[ "${varname}" == "2" ]]; then
    echo "(dataframe.iterrows) uses dataframe.iterrows to print first 3 rows in df"
    python3 my_playground.py df-iterrows
elif [[ "${varname}" == "3" ]]; then
    echo "(dataframe.itertuples) uses dataframe.itertuples to print first 3 row tuples in df"
    python3 my_playground.py df-itertuples
elif [[ "${varname}" == "4" ]]; then
    python3 my_playground.py df-itercolumns
elif [[ "${varname}" == "5" ]]; then
    echo "(dataframe.to_sql) load a table into a database connection istance"
    python3 my_playground.py db-load
elif [[ "${varname}" == "6" ]]; then
    echo "(dataframe.to_sql) load a table into a database connection istance"
    echo "(pd.read_sql) read table contents in a database connection istance to print first 10 rows"
    python3 my_playground.py db-load-read
else
    echo "Invalid input"
fi
Contributor

I think it'd be easier to maintain this logic with click.
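
As a rough illustration of why this could be easier to maintain, the numbered menu in run.sh could live next to the Python code as data. This sketch is stdlib-only and does not use click itself; MENU, describe, and render_menu are hypothetical names, and the descriptions are copied from the echo lines above.

```python
# Hypothetical table replacing the run.sh if/elif chain:
# menu number -> (command name, description shown to the user).
MENU = {
    "1": ("df-read", "(pd.read_csv) uses pandas to read a csv file and convert it to a dataframe"),
    "2": ("df-iterrows", "(dataframe.iterrows) uses dataframe.iterrows to print first 3 rows in df"),
    "3": ("df-itertuples", "(dataframe.itertuples) uses dataframe.itertuples to print first 3 row tuples in df"),
}


def describe(choice):
    """Returns (command, description) for a menu choice, or None if invalid."""
    return MENU.get(choice)


def render_menu():
    """Builds the menu text, one numbered line per command."""
    return "\n".join(f"{number}) {command}" for number, (command, _) in MENU.items())
```

Adding a seventh command then means adding one entry to the table rather than another elif branch in two files.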

Comment on lines +17 to +25
def df_iter_rows(df):
    """(dataframe.iterrows) uses dataframe.iterrows to print first 3 rows in df

    Args:
        df (dataframe): pandas dataframe
    """
    print('(ITER_ROWS) Printing first 3 rows in the dataframe...')
    for index, row in df.iterrows():
        if index == 3: break
Contributor

Suggested change
-def df_iter_rows(df):
-    """(dataframe.iterrows) uses dataframe.iterrows to print first 3 rows in df
-    Args:
-        df (dataframe): pandas dataframe
-    """
-    print('(ITER_ROWS) Printing first 3 rows in the dataframe...')
-    for index, row in df.iterrows():
-        if index == 3: break
+def df_iter_rows(df, limit=3):
+    """
+    Prints the first `limit` rows of a dataframe, leveraging `df.iterrows()`.
+    Args:
+        df (dataframe): pandas dataframe
+        limit (int): number of rows to show [default: 3]
+    """
+    print(f'Printing first {limit} rows in the dataframe...')
+    for index, row in df.iterrows():
+        if index == limit: break

Contributor

You could do something like the above in your functions.

limit is the number of rows to print, with a default of 3.

print(f"my_var is set to {my_var}") uses f-string notation, which looks nice in code and lets you build strings easily.
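
One caveat with the suggested change: comparing index to limit only stops at the right row when the dataframe has the default 0, 1, 2, ... index. Below is a sketch of one possible variant, not the PR's code, that stops after limit rows regardless of the index labels by using itertools.islice. It returns the rows it printed purely to make the behaviour easy to check.

```python
from itertools import islice


def df_iter_rows(df, limit=3):
    """Prints the first `limit` rows of a dataframe, leveraging `df.iterrows()`.

    Args:
        df (dataframe): pandas dataframe
        limit (int): number of rows to show [default: 3]

    Returns:
        list: the (index, row) pairs that were printed
    """
    print(f"Printing first {limit} rows in the dataframe...")
    shown = []
    # islice stops after `limit` items, whatever the index labels are.
    for index, row in islice(df.iterrows(), limit):
        print(f"row {index}: {row}")
        shown.append((index, row))
    return shown
```

Calling df_iter_rows(df) prints three rows, and df_iter_rows(df, limit=10) prints ten, without any other change to the function.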

        index = index + 1

def load_file_into_db(filename, table, conn):
    """(dataframe.to_sql) load a table into a database connection istance
Contributor

Nit: load -> loads

The examples in Google's Python style guide use the descriptive mood ("Loads ...") rather than the imperative mood ("Load ...") for function descriptions.
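
For illustration, the two moods side by side on stand-in functions. The names and empty bodies are hypothetical; only the first docstring line matters.

```python
def load_imperative(filename, table, conn):
    """Load a file into a database table."""  # imperative mood: reads as a command


def load_descriptive(filename, table, conn):
    """Loads a file into a database table."""  # descriptive mood: says what it does
```

Either style reads fine on its own; the main thing is to pick one and keep it consistent within a file.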

@@ -0,0 +1,1001 @@
First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
Contributor

Cool data

2 participants