iSPEED Features

3D Spatial Query Processing

Query operations in iSPEED are implemented as a combination of a framework query processor and MapReduce jobs.

3D Spatial Query Types

Currently there are three main query types that iSPEED supports:

  • Spatial join / cross-matching
  • Nearest neighbor search
  • Spatial proximity estimation

Input 3D Data Format

Data Format.

  • iSPEED currently supports the OFF data format.

Data Preparation. Input data that iSPEED can accept or re-process must have the following properties:

  • Each record is located on a separate line; a record representing a spatial object must be contained in a single line.
  • Each line starts with a unique object ID followed by the object geometry.
  • The geometry of objects must be in OFF format (other data formats will be supported later).
  • The lines of the OFF geometry are joined with a delimiter (|) so that the whole record fits on a single line.
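Standard OFF files span multiple lines, so they must be flattened into single-line records before staging. The sketch below shows one way to do this; the helper name off_to_record and the tab separator between the object ID and the geometry are assumptions (the exact separator is not specified above), while the | delimiter follows the rules listed.

```shell
#!/bin/sh
# Hypothetical helper: flatten a multi-line OFF file into a single-line
# iSPEED record. The tab between the ID and the geometry is an assumption;
# the '|' delimiter between OFF lines follows the format rules above.
off_to_record() {
  id="$1"; file="$2"
  # Join the OFF lines with '|' and drop the trailing delimiter.
  geom=$(tr '\n' '|' < "$file" | sed 's/|$//')
  printf '%s\t%s\n' "$id" "$geom"
}
```

A converted record can then be appended to a staging file, e.g. off_to_record 1 testdata.off >> testdata_records.dat, before uploading to HDFS.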

If you have existing data satisfying the above requirements, stage your data on HDFS, e.g.:

hdfs dfs -mkdir /user/testuser

hdfs dfs -mkdir /user/testuser/rawdata1

hdfs dfs -put testdata.off /user/testuser/rawdata1/

If you do not have data, you can download sample data from the Examples page.

Query Parameters

Parameters are passed to iSPEED as command-line arguments. The full list of arguments for the framework manager can be printed by executing ../build/bin/queryprocessor_3d --help.

The following output will be displayed.

Options:
  --help                    This help message
  -n [ --numreducers ] arg  The number of reducers
  -p [ --bucket ] arg       Bucket size for partitioning
  -a [ --input1 ] arg       HDFS file path to data set 1
  -b [ --input2 ] arg       HDFS file path to data set 2
  -i [ --geom1 ] arg        Field number of data set 1 containing the geometry
  -j [ --geom2 ] arg        Field number of data set 2 containing the geometry
  -d [ --distance ] arg     Distance (used for certain predicates)
  -f [ --outputfields ] arg Fields to be output. See the full documentation.
  -h [ --outputpath ] arg   Output path
  -t [ --predicate ] arg    Predicate for spatial join and nn queries
  -q [ --querytype ] arg    Query type [spjoin]
  -s [ --samplingrate ] arg Sampling rate (0, 1]
  -u [ --partitioner ] arg  Partitioning method [fg | bsp | hc | str | bos]
  -o [ --overwrite ]        Overwrite existing hdfs directory.

Parameter Explanation

Most of the parameters listed above are self-explanatory. A detailed explanation of the query parameters can be found on the Features page.

Here we present a 3D spatial join example. The following code sections show the essential commands to run the spatial join query; the complete script can be downloaded at run_spatial_join.

The first step is data compression. Data compression is performed by Mapper-only jobs.

#data compression first
../build/bin/queryprocessor_3d -t st_intersects -a your_hdfs_path/3Ddata/spjoin/testdata/d1 -b your_hdfs_path/3Ddata/spjoin/testdata/d2 -h your_hdfs_path/3Ddata/spjoin/testdata/output/ -q spjoin -s 1.0 -n 240 -u fg_3d --bucket 4000 -f tileid,1:1,2:1,intersect_volume -j 1 -i 1 --compression --overwrite
  • -t st_intersects is the default predicate for data compression; -a and -b specify the paths of the input datasets.
  • The core parameter in this command is --compression to specify the data compression operations.
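Because the compression and join steps repeat the same paths and flags, a multi-step script can assemble the command line from its variable parts. The wrapper below is a convenience sketch, not part of iSPEED; it only builds the command string using the same flags as the example above.

```shell
#!/bin/sh
# Sketch: build the compression command line from the two input paths and the
# output path; the fixed flags mirror the example above. This wrapper is an
# assumption for convenience, not an iSPEED tool.
build_compress_cmd() {
  d1="$1"; d2="$2"; out="$3"; reducers="${4:-240}"
  printf '%s' "../build/bin/queryprocessor_3d -t st_intersects -a $d1 -b $d2 -h $out -q spjoin -s 1.0 -n $reducers -u fg_3d --bucket 4000 -f tileid,1:1,2:1,intersect_volume -j 1 -i 1 --compression --overwrite"
}
# Usage: sh -c "$(build_compress_cmd <hdfs d1> <hdfs d2> <hdfs output>)"
```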

The second step is to combine the compressed data, store it into memory and share it across all cluster nodes. The object MBBs are also combined for data partitioning.

#combine all binary data and mbb outputs
../build/bin/runcombiner.sh your_hdfs_path/3Ddata/spjoin/testdata/output

hdfs dfs -rm -r your_hdfs_path/3Ddata/spjoin/testdata/output/output_partidx
hdfs dfs -rm -r your_hdfs_path/3Ddata/spjoin/testdata/output/output_joinout
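On a re-run these intermediate directories may not exist yet, and hdfs dfs -rm -r would then report an error. The guarded cleanup below is a sketch; the wrapper function and the HDFS_CMD override are assumptions, not iSPEED features.

```shell
#!/bin/sh
# Sketch: remove the intermediate partition-index and join-output directories
# only if they exist. HDFS_CMD defaults to the standard HDFS client but can
# be overridden (e.g. for local testing).
HDFS_CMD=${HDFS_CMD:-"hdfs dfs"}
cleanup_intermediate() {
  base="$1"
  for d in output_partidx output_joinout; do
    if $HDFS_CMD -test -d "$base/$d"; then
      $HDFS_CMD -rm -r "$base/$d"
    fi
  done
}
# Usage: cleanup_intermediate your_hdfs_path/3Ddata/spjoin/testdata/output
```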

The third step is to run the actual spatial join.

#run spatial join
../build/bin/queryprocessor_3d -t st_intersects -a your_hdfs_path/3Ddata/spjoin/testdata/d1 -b your_hdfs_path/3Ddata/spjoin/testdata/d2 -h your_hdfs_path/3Ddata/spjoin/testdata/output/ -q spjoin -s 1.0 -n 240 -u fg_3d --bucket 4000 -f tileid,1:1,2:1 -j 1 -i 1 --spatialproc --decomplod 100 --overwrite
  • -t st_intersects is the predicate for the spatial join query; -a and -b specify the paths of the input datasets, and -h specifies the output path.
  • -q spjoin specifies the spatial query type.
  • -s 1.0 is the data sampling rate for spatial partitioning; -u fg_3d specifies the fixed-grid partitioning method; and --bucket 4000 sets the bucket size for partitioning, so after partitioning each cuboid contains about 4000 objects.
  • -n 240 is the number of reducers for this MapReduce job. This number is adjustable based on the cluster configuration.
  • --spatialproc indicates this is the spatial query process, and --decomplod 100 specifies the level of detail (LOD) used in the spatial join query. Users can choose different LODs (resolutions) to balance the run time and accuracy in practice.
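With -f tileid,1:1,2:1, each output line carries the tile ID followed by the IDs of the two matched objects. A small post-processing sketch that counts matched pairs per tile is shown below; the tab field separator and the helper name count_pairs_per_tile are assumptions about the on-disk output layout, not documented iSPEED behavior.

```shell
#!/bin/sh
# Sketch: count matched object pairs per tile from join output lines of the
# form "tileid<TAB>id1<TAB>id2". The tab separator is an assumption about
# the output layout produced by -f tileid,1:1,2:1.
count_pairs_per_tile() {
  awk -F'\t' '{ n[$1]++ } END { for (t in n) printf "%s %d\n", t, n[t] }'
}
# Usage: hdfs dfs -cat <join output files> | count_pairs_per_tile | sort
```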