Spark-DNS

@yurkao yurkao released this 02 Mar 20:29
· 46 commits to master since this release

Introduction

A Spark data source for retrieving DNS A records from a DNS server.
The Spark DNS data source uses zone transfers to retrieve data from the DNS server.
It tries to use IXFR for every zone transfer, though some DNS server implementations may return an AXFR response instead.
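
A client that asks for IXFR therefore has to check what kind of response it actually received. The following is a minimal Python sketch of the usual check from RFC 1995 (an incremental response has an SOA as its second record, a full-zone response does not). The function name `classify_response` and the `(record_type, data)` tuple layout are assumptions made for illustration, not part of this data source.

```python
def classify_response(records):
    """Classify an IXFR reply given as a list of (record_type, data) tuples in wire order.

    Hypothetical helper for illustration only.
    """
    if not records or records[0][0] != "SOA":
        raise ValueError("a zone transfer response must start with an SOA record")
    if len(records) == 1:
        # A lone SOA means the client's serial is already current.
        return "up-to-date"
    # Incremental replies continue with the SOA of the old version;
    # full-zone (AXFR-style) replies continue with ordinary zone records.
    return "ixfr" if records[1][0] == "SOA" else "axfr"


full_zone = [("SOA", 102), ("A", "192.0.2.10"), ("SOA", 102)]
incremental = [("SOA", 102), ("SOA", 101), ("SOA", 102), ("A", "192.0.2.12"), ("SOA", 102)]
```

Here `classify_response(full_zone)` yields `"axfr"` and `classify_response(incremental)` yields `"ixfr"`.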

The Spark DNS data source can operate on multiple DNS zones in a single DataFrame.
Due to the nature of DNS zone transfers, data retrieval for a single zone cannot be parallelized,
though data from multiple zones is retrieved in parallel (each DNS zone is handled in a separate Spark partition of the RDD).
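
The one-zone-per-partition model can be illustrated outside Spark with a small Python sketch. It is not the data source's implementation: `transfer_zone` is a hypothetical stand-in that returns canned records instead of talking to a DNS server, and a thread pool plays the role of Spark partitions. Each zone is read sequentially by exactly one worker, while different zones are read concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical canned data standing in for the zones on a DNS server.
FAKE_ZONES = {
    "example.com.": [("www.example.com.", "192.0.2.10"), ("api.example.com.", "192.0.2.11")],
    "example.org.": [("www.example.org.", "198.51.100.5")],
}


def transfer_zone(zone):
    """Hypothetical stand-in for one zone transfer; returns (zone, name, address) rows."""
    # A real transfer is a single ordered stream, so it cannot be split further.
    return [(zone, name, address) for name, address in FAKE_ZONES[zone]]


def read_zones(zones):
    """Run one task per zone and flatten the results into a single list of rows."""
    with ThreadPoolExecutor(max_workers=max(1, len(zones))) as pool:
        # map() preserves input order, so results line up with `zones`.
        per_zone = list(pool.map(transfer_zone, zones))
    return [row for rows in per_zone for row in rows]


rows = read_zones(["example.com.", "example.org."])
```

Adding more zones adds more concurrent tasks, but a single large zone still takes as long as its one transfer.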

Rationale

  1. Learning Spark internals
  2. Integrating Spark with 3rd-party data sources
  3. Just for fun

Features and limitations

Limitations

  1. Providing multiple DNS servers in options for the same dataset/table is currently not supported
  2. Continuous Structured Streaming is not supported yet
  3. On Spark 2.4 (incl. CDH 6.3.x), only batch reading is supported.

Currently implemented features

  1. Spark batch read
  2. Retrieving DNS A records from multiple DNS zones (though from a single DNS server)
  3. The new SOA serial of each DNS zone is available in an Accumulator via the Spark UI (refer to the relevant stage)
  4. Spark Structured Streaming read support (only the Once and ProcessingTime triggers are supported)
  5. Zone transfer timeout
  6. Specifying an explicit zone transfer type (AXFR/IXFR) to use when retrieving data from the DNS server.
    • When using xfr=ixfr, only DNS zone updates made since the initial serial are returned.
      • With Structured Streaming, this may produce empty DataFrames when there are no updates
    • When using xfr=axfr, all A records of the DNS zone are returned
  7. Handling temporary failures during zone transfer (similar to failOnDataLoss in Spark+Kafka)
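
The difference between the two `xfr` modes in item 6 can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the data source's code: the `Change` type, the `HISTORY` data, and the `transfer` function are hypothetical, the zone history contains only additions (a real IXFR also carries deletions), and serials are compared as plain integers without wrap-around handling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    serial: int   # SOA serial at which the record was added (hypothetical model)
    name: str
    address: str


# Hypothetical zone history, oldest change first.
HISTORY = [
    Change(100, "www.example.com.", "192.0.2.10"),
    Change(101, "api.example.com.", "192.0.2.11"),
    Change(102, "db.example.com.", "192.0.2.12"),
]


def transfer(history, xfr, initial_serial=0):
    """Return the records a reader would see for the given transfer type."""
    if xfr == "axfr":
        # Full transfer: every A record in the zone.
        return list(history)
    if xfr == "ixfr":
        # Incremental transfer: only changes newer than the serial already seen.
        return [c for c in history if c.serial > initial_serial]
    raise ValueError(f"unsupported transfer type: {xfr!r}")


full = transfer(HISTORY, "axfr")
updates = transfer(HISTORY, "ixfr", initial_serial=101)
nothing_new = transfer(HISTORY, "ixfr", initial_serial=102)
```

In this model `full` holds all three records, `updates` holds only the change at serial 102, and `nothing_new` is empty, which corresponds to the empty DataFrames a streaming query may produce when the zone has not changed.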