ENH Load CSV files from different dialects #274
base: dev
Conversation
Expose parameters of pyspark's CSV reader: sep, escape and multiLine.
Thank you for your input! Our policy for PRs is to merge into dev, not into master. Please change the base branch of this PR to dev.
@gmichalo thanks, I've changed the base branch and resolved the conflicts.
holoclean/utils/reader.py (Outdated)

        :return: dataframe
        """
        try:
            if schema is None:
-               df = spark_session.read.csv(file_path, header=True)
+               df = spark_session.read.csv(file_path, header=True, sep=sep,
+                                           escape=escape, multiLine=multiLine))
One of the two parentheses needs to be removed.
Oops. Done.
holoclean/utils/reader.py (Outdated)

-               df = spark_session.read.csv(file_path, header=True, schema=schema)
+               df = spark_session.read.csv(file_path, header=True, schema=schema,
+                                           sep=sep, escape=escape,
+                                           multiLine=multiLine))
One of the two parentheses needs to be removed.
Fixed too.
This PR exposes some parameters of pyspark's CSV reader (sep, escape, and multiLine) up to session.load_data. This makes it possible to load files written in various dialects of CSV.
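For context, the three exposed options correspond to standard CSV dialect features. A minimal sketch using Python's built-in csv module (not pyspark; the sample data and dialect values here are illustrative, not from the PR) shows what each option controls:

```python
import csv
import io

# A CSV "dialect" exercising the three features the PR exposes:
# a semicolon separator, backslash escaping, and a quoted field
# that spans multiple lines.
raw = 'name;note\nalice;"line one\nline two"\nbob;a\\;b\n'

reader = csv.reader(
    io.StringIO(raw),
    delimiter=';',    # roughly pyspark's sep=';'
    escapechar='\\',  # roughly pyspark's escape='\\'
    # the csv module accepts embedded newlines in quoted fields by
    # default; pyspark needs multiLine=True to parse the same input
)
rows = list(reader)
# rows[1][1] contains an embedded newline; rows[2][1] is 'a;b'
```

With pyspark, the equivalent call after this PR would pass the same values through session.load_data down to spark_session.read.csv.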