Home
Spiro Michaylov edited this page Sep 19, 2015
·
10 revisions
Whenever these examples are updated for a new Spark version, changes tend to be needed, and some are interesting and important. Starting with Spark 1.5.0, here are the details.
- The Hive examples (hive.*) are failing with memory problems. Watch this space for a solution when I find one.
- sql.OutputJSON needed to be extended because it seems that JSON integers that were being interpreted as ints are now interpreted as longs.
- sql.Types had to change quite a lot because the type conversions seem to have become a lot more stringent.
- In dealing with a deprecation in sql.JSONTypes I can't find a supported way to provide a schema when reading a JSON file. This is not, strictly speaking, a 1.5.0 problem. I'll keep looking for a solution.
- The last example in dataframe.UDF now fails. I'll probably file a JIRA ticket for it, and put the link here.
- With the introduction of more systematic reading and writing for dataframes, I took this opportunity to replace all uses of the old deprecated techniques.
- Again, not really a 1.5.0 problem, but there are two more deprecations I couldn't find a good way to deal with:
  - In hiveql.UDAF the deprecated approach is the only one I've been able to figure out so far.
  - In queue-based streaming, scala.collection.mutable.SynchronizedQueue has been deprecated for some time, but there doesn't seem to be a non-deprecated replacement that StreamingContext.queueStream() will accept as an input.
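
On the sql.JSONTypes item above, one candidate worth trying is the `schema()` method on `DataFrameReader`, assuming a Spark version (1.4 or later) where `sqlContext.read` is available. The following is only a minimal sketch and not the code from the examples repository: the object name `JSONSchemaSketch`, the field names `id` and `name`, and the inline JSON records are all hypothetical, and it reads from an in-memory RDD of strings rather than a file so that it does not depend on any path.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

// Hypothetical standalone sketch: supply an explicit schema when reading JSON.
object JSONSchemaSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("JSONSchemaSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Hypothetical schema; the field names and types are illustrative only.
    val schema = StructType(Seq(
      StructField("id", LongType, nullable = true),
      StructField("name", StringType, nullable = true)
    ))

    // Hypothetical input: one JSON document per RDD element.
    val lines = sc.parallelize(Seq(
      """{"id": 1, "name": "first"}""",
      """{"id": 2, "name": "second"}"""
    ))

    // Attach the schema to the reader instead of letting Spark infer it.
    val df = sqlContext.read.schema(schema).json(lines)

    df.printSchema()
    df.show()

    sc.stop()
  }
}
```

For a file on disk, the same reader should accept a path in place of the RDD, for example `sqlContext.read.schema(schema).json(path)`. Whether this fully covers what sql.JSONTypes needs has not been checked against that example.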