SATO Partitioner

SATO, an extension of Hadoop-GIS, is a spatial data partitioning framework that quickly analyzes spatial data and partitions it with an optimal spatial partitioning strategy for scalable query processing.

Spatial Data Partitioning:

In Hadoop-GIS SATO, the raw data can be partitioned with different partitioning methods. Partitioning consists of four major substeps:

  • Sample, which samples a small fraction of input data for analysis,
  • Analyze, which quickly analyzes sampled data to find an optimal partition strategy,
  • Tear, which provides skew-aware data partitioning and supports scalable MapReduce-based partitioning, and
  • Optimize, which collects succinct partition statistics for potential query optimization.

Hadoop-GIS also provides multilevel partitioning, which can significantly improve window-based queries in cloud-based spatial query processing systems.
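
The four substeps above can be illustrated with a minimal single-machine sketch. It is not the Hadoop-GIS or SATO implementation: the function names, the point-only input, and the quantile-based strip strategy are all assumptions chosen for illustration, and the real framework runs the Tear step as MapReduce jobs and can choose among several partitioning strategies.

```python
import bisect
import random
from collections import defaultdict


def sample(points, fraction, seed=0):
    """Sample: draw a small random fraction of the input (hypothetical helper)."""
    rng = random.Random(seed)
    k = max(1, int(len(points) * fraction))
    return rng.sample(points, k)


def analyze(sampled, num_strips):
    """Analyze: choose x-axis cut points at sample quantiles.

    Dense regions end up with narrower strips, which is one simple
    skew-aware strategy (an assumption, not SATO's actual method).
    """
    xs = sorted(p[0] for p in sampled)
    return [xs[len(xs) * i // num_strips] for i in range(1, num_strips)]


def tear(points, cuts):
    """Tear: assign every point to the strip its x coordinate falls in."""
    partitions = defaultdict(list)
    for p in points:
        partitions[bisect.bisect_right(cuts, p[0])].append(p)
    return partitions


def optimize(partitions):
    """Optimize: record succinct statistics (count and bounding box) per partition."""
    stats = {}
    for pid, pts in partitions.items():
        xs = [p[0] for p in pts]
        ys = [p[1] for p in pts]
        stats[pid] = {
            "count": len(pts),
            "mbr": (min(xs), min(ys), max(xs), max(ys)),
        }
    return stats
```

Calling sample, analyze, tear, and optimize in that order on a list of (x, y) tuples yields a partition map plus per-partition statistics that a query planner could later use to skip irrelevant partitions.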

Spatial Queries:

The input to this step can be either the partitioned data produced by the partitioning step or the raw data, depending on the type and properties of the input. The data set or sets are then processed according to the user-given spatial predicate.

Hadoop-GIS with SATO supports:

  • Range query (window containment query)
  • Spatial join (with different types of predicates)
  • k-NN (k-nearest-neighbor) query
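
To make two of these query types concrete, the following sketch runs a window containment query and a brute-force k-NN query over partitioned point data. It is an illustrative assumption rather than the Hadoop-GIS query engine: the partition layout (a dict of bounding box and point list), the function names, and the point-only geometry are hypothetical. The range query shows how per-partition bounding boxes allow whole partitions to be skipped.

```python
import heapq


def intersects(a, b):
    """True if boxes a and b, given as (xmin, ymin, xmax, ymax), overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(window, p):
    """True if point p = (x, y) lies inside the query window."""
    return window[0] <= p[0] <= window[2] and window[1] <= p[1] <= window[3]


def range_query(partitions, window):
    """Window containment query with partition pruning.

    partitions maps a partition id to (mbr, points); this layout is a
    hypothetical stand-in for the partitioned output described above.
    """
    hits = []
    for mbr, pts in partitions.values():
        if not intersects(mbr, window):
            continue  # the whole partition lies outside the window
        hits.extend(p for p in pts if contains(window, p))
    return hits


def knn(partitions, q, k):
    """k-nearest-neighbor query by brute force over all partitions."""
    all_pts = (p for _, pts in partitions.values() for p in pts)
    return heapq.nsmallest(
        k, all_pts, key=lambda p: (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    )
```

A spatial join can follow the same pattern by pairing only those partitions whose bounding boxes intersect before evaluating the join predicate on their contents.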