ENH Load CSV files from different dialects #274
base: dev
Conversation
Expose parameters of pyspark's CSV reader: sep, escape and multiLine.
Thank you for your input! Our policy for PRs is to merge into dev, not into master. Please change the base branch of this PR to dev.
@gmichalo thanks, I've changed the base branch and resolved the conflicts.
holoclean/utils/reader.py (Outdated)

        :return: dataframe
        """
        try:
            if schema is None:
-               df = spark_session.read.csv(file_path, header=True)
+               df = spark_session.read.csv(file_path, header=True, sep=sep,
+                                           escape=escape, multiLine=multiLine))
One of the two parentheses needs to be removed.
Oops. Done.
holoclean/utils/reader.py (Outdated)

-               df = spark_session.read.csv(file_path, header=True, schema=schema)
+               df = spark_session.read.csv(file_path, header=True, schema=schema,
+                                           sep=sep, escape=escape,
+                                           multiLine=multiLine))
One of the two parentheses needs to be removed.
Fixed too.
This PR exposes some parameters of pyspark's CSV reader (sep, escape, and multiLine) up to session.load_data. This makes it possible to load files written in various dialects of CSV.
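For context, the three exposed options correspond to standard CSV dialect features. A minimal sketch using Python's built-in csv module (not pyspark; the sample data and dialect values here are illustrative, not from the PR) shows what each option controls:

```python
import csv
import io

# A CSV "dialect" exercising the three features the PR exposes:
# a semicolon separator, backslash escaping, and a quoted field
# that spans multiple lines.
raw = 'name;note\nalice;"line one\nline two"\nbob;a\\;b\n'

reader = csv.reader(
    io.StringIO(raw),
    delimiter=';',    # roughly pyspark's sep=';'
    escapechar='\\',  # roughly pyspark's escape='\\'
    # the csv module accepts embedded newlines in quoted fields by
    # default; pyspark needs multiLine=True to parse the same input
)
rows = list(reader)
# rows[1][1] contains an embedded newline; rows[2][1] is 'a;b'
```

With pyspark, the equivalent call after this PR would pass the same values through session.load_data down to spark_session.read.csv.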