In the file "meta_Amazon_Fashion.jsonl.gz", the nested JSON under the "details" field contains an attribute that is sometimes labelled "Assembly Required" (upper-case R) and sometimes "Assembly required" (lower-case r).
This mix of capitalizations breaks loading the file into Apache Spark. Spark resolves column names case-insensitively by default, so the two spellings are treated as the same column.
For example, loading the file in Apache Spark with:
spark.read.json("meta_Amazon_Fashion.jsonl.gz")
It fails with:
AnalysisException: [COLUMN_ALREADY_EXISTS] The column `assembly required` already exists. Consider to choose another name or rename the existing column.
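To see which "details" keys trigger this, the file can be scanned outside Spark. The sketch below uses only the Python standard library and assumes the file is gzip-compressed JSON Lines with an optional top-level "details" object per record. It groups every "details" key by its lower-cased form, which approximates Spark's default case-insensitive matching, and reports each group that has more than one spelling, along with the first line where each spelling appears. The function name find_case_collisions is hypothetical.

```python
import gzip
import json
from collections import defaultdict


def find_case_collisions(path):
    """Return {lowercased_key: {spelling: first_line_number}} for every
    "details" key that appears with more than one capitalization."""
    # lower-cased key -> {exact spelling -> 1-based line where first seen}
    spellings = defaultdict(dict)
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            details = json.loads(line).get("details") or {}
            for key in details:
                spellings[key.lower()].setdefault(key, line_no)
    # Keep only keys seen with two or more distinct spellings
    return {k: v for k, v in spellings.items() if len(v) > 1}
```

Calling find_case_collisions("meta_Amazon_Fashion.jsonl.gz") should list "assembly required" among the results, together with any other keys that differ only by case.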
Examples of records in meta_Amazon_Fashion.jsonl.gz that cause the problem:
(line: 3364) .... "details": {"Brand": "MD Sports", "Sport": "Air Hockey", "Assembly Required": "Yes", "Is Discontinued By Manufacturer": "No", "Date First Available": "December 15, 2017"}
(line: 206170) .... "details": {"Product Dimensions": "0.5 x 0.5 x 0.5 inches", "Item Weight": "4 ounces", "Manufacturer": "Healing Crystals India", "Item model number": "LABCAB", "Is Discontinued By Manufacturer": "No", "Assembly required": "No", "Number of pieces": "7", "Batteries required": "No"}
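Until the data is fixed upstream, one possible workaround is to rewrite the file so that keys differing only by case share one spelling before Spark reads it. The sketch below is an assumption-laden illustration, not a fix to the dataset: the helper name normalize_details_keys and the output filename are hypothetical. It treats the first spelling encountered in file order as canonical and, if a single record happens to contain both spellings, keeps the first value.

```python
import gzip
import json


def normalize_details_keys(src, dst):
    """Copy a gzipped JSON Lines file, rewriting each record's "details"
    keys so that keys differing only by case share one spelling.

    The first spelling encountered in file order becomes canonical.
    Returns the number of keys that were renamed."""
    canonical = {}  # lower-cased key -> spelling chosen as canonical
    renamed = 0
    with gzip.open(src, "rt", encoding="utf-8") as fin, \
            gzip.open(dst, "wt", encoding="utf-8") as fout:
        for line in fin:
            if not line.strip():
                continue  # drop blank lines
            record = json.loads(line)
            details = record.get("details")
            if isinstance(details, dict):
                fixed = {}
                for key, value in details.items():
                    name = canonical.setdefault(key.lower(), key)
                    if name != key:
                        renamed += 1
                    # If one record holds both spellings, keep the first value
                    fixed.setdefault(name, value)
                record["details"] = fixed
            fout.write(json.dumps(record, ensure_ascii=False) + "\n")
    return renamed
```

After normalize_details_keys("meta_Amazon_Fashion.jsonl.gz", "meta_Amazon_Fashion_normalized.jsonl.gz"), reading the new file with spark.read.json should no longer hit the duplicate "details" field. An alternative that avoids rewriting the data is spark.conf.set("spark.sql.caseSensitive", "true") before the read, which should keep both spellings as separate fields, at the cost of having to query two columns.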