Spiro Michaylov edited this page Sep 19, 2015 · 10 revisions

Spark Compatibility Notes

Whenever these examples are updated for a new Spark version, changes tend to be needed, and some are interesting and important. Starting with Spark 1.5.0, here are the details.

Spark 1.5.0

  1. The Hive examples (hive.*) are failing with memory problems. Watch this space for a solution when I find one.
  2. sql.OutputJSON needed to be extended because it seems that JSON integers that were being interpreted as ints are now interpreted as longs.
  3. sql.Types had to change quite a lot because the type conversions seem to have become a lot more stringent.
  4. In dealing with a deprecation in sql.JSONTypes, I can't find a supported way to provide a schema when reading a JSON file. This is not, strictly speaking, a 1.5.0 problem. I'll keep looking for a solution.
  5. The last example in dataframe.UDF now fails. I'll probably file a JIRA ticket for it, and put the link here.
  6. With the introduction of more systematic reading and writing for dataframes, I took this opportunity to replace all uses of the old deprecated techniques.
  7. Again not really a 1.5.0 problem, but there are two more deprecations I couldn't find a good way to deal with:
    1. In hiveql.UDAF the deprecated approach is the only one I've been able to figure out so far.
    2. In queue-based streaming, scala.collection.mutable.SynchronizedQueue has been deprecated for some time, but there doesn't seem to be a non-deprecated replacement that StreamingContext.queueStream() will accept as an input.
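
To illustrate the int-versus-long issue from item 2 of the Spark 1.5.0 list, here is a minimal sketch in plain Scala with no Spark dependency. It assumes a value arrives typed as `Any` (as it would when pulled out of a row generically) and may be either an `Int` or a `Long` depending on the Spark version. The names `IntOrLong` and `asLong` are hypothetical and are not taken from sql.OutputJSON.

```scala
// Hypothetical helper, not part of the examples project.
object IntOrLong {
  // Widen a value that may be an Int (older behavior) or a Long
  // (what Spark 1.5.0 appears to produce) to a Long.
  // Any other type yields None instead of throwing.
  def asLong(value: Any): Option[Long] = value match {
    case i: Int  => Some(i.toLong)
    case l: Long => Some(l)
    case _       => None
  }
}
```

Matching on both types keeps the calling code working whichever width the JSON reader infers, at the cost of always handling the value as a `Long` afterwards.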