iSPEED Features
3D Spatial Query Processing
Query operations in iSPEED are implemented as a combination of a framework query processor and MapReduce jobs.
3D Spatial Query Types
Currently there are three main query types that iSPEED supports:
- Spatial join / cross-matching
- Nearest neighbor search
- Spatial proximity estimation
Input 3D Data Format
Data Format.
- HadoopGIS currently supports the OFF data format.
Data Preparation. Data that iSPEED can accept or re-process must have the following properties:
- Each record is located on a separate line; a record representing a spatial object must be contained in a single line.
- Each line starts with a unique object ID followed by the object geometry.
- The geometry of objects must be in OFF format (other data formats will be supported later).
- There is a delimiter (|) between each line in the OFF format.
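The properties above can be illustrated with a minimal sketch that flattens a standard multi-line OFF file into a single record. The function name off_to_record, the choice to separate the object ID from the geometry with the same | delimiter, and the skipping of blank and comment lines are assumptions made for illustration, not taken from the iSPEED sources; compare the result with the sample data on the Examples page before relying on it.

```shell
#!/bin/sh
# off_to_record ID FILE
# Hypothetical helper (not part of iSPEED): prints one line of the form
#   ID|<OFF line 1>|<OFF line 2>|...
# Assumption: the ID and the geometry are separated by the same '|' delimiter.
off_to_record() {
    id=$1
    file=$2
    awk -v id="$id" '
        { sub(/\r$/, "") }               # drop Windows line endings
        /^[[:space:]]*$/ { next }        # skip blank lines
        /^[[:space:]]*#/ { next }        # skip OFF comment lines
        { record = record "|" $0 }       # append this OFF line to the record
        END { print id record }          # emit the whole object on one line
    ' "$file"
}

# Usage (file names are placeholders):
#   off_to_record 1 cube.off >> testdata.off
```

Calling the function once per object, each with a distinct ID, and appending the output to one file would yield a data set with one object per line.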
If you have existing data satisfying the above requirements, stage your data on HDFS. e.g.:
hdfs dfs -mkdir /user/testuser
hdfs dfs -mkdir /user/testuser/rawdata1
hdfs dfs -put testdata.off /user/testuser/rawdata1/
If you do not have data, you can download sample data from the Examples page.
Query Parameters
Arguments are passed to iSPEED on the command line. The full list of arguments for the framework manager can be found by executing ../build/bin/queryproc3d --help.
The following output will be displayed.
Options:
  --help                       This help message
  -n [ --numreducers ] arg     The number of reducers
  -p [ --bucket ] arg          Bucket size for partitioning
  -a [ --input1 ] arg          HDFS file path to data set 1
  -b [ --input2 ] arg          HDFS file path to data set 2
  -i [ --geom1 ] arg           Field number of data set 1 containing the geometry
  -j [ --geom2 ] arg           Field number of data set 2 containing the geometry
  -d [ --distance ] arg        Distance (used for certain predicates)
  -f [ --outputfields ] arg    Fields to be output. See the full documentation.
  -h [ --outputpath ] arg      Output path
  -t [ --predicate ] arg       Predicate for spatial join and nn queries
  -q [ --querytype ] arg       Query type [spjoin]
  -s [ --samplingrate ] arg    Sampling rate (0, 1]
  -u [ --partitioner ] arg     Partitioning method ([fg | bsp | hc | str | bos ]
  -o [ --overwrite ]           Overwrite existing hdfs directory.
Parameter Explanation
Most of the parameters listed above are self-explanatory. A detailed explanation of the query parameters can be found on the Features page.
Here we present a 3D spatial join example. The following code sections display the critical commands to run the spatial join query, and the complete script can be downloaded at run_spatial_join.
The first step is data compression. Data compression is performed by Mapper-only jobs.
#data compression first
../build/bin/queryprocessor_3d -t st_intersects \
  -a your_hdfs_path/3Ddata/spjoin/testdata/d1 \
  -b your_hdfs_path/3Ddata/spjoin/testdata/d2 \
  -h your_hdfs_path/3Ddata/spjoin/testdata/output/ \
  -q spjoin -s 1.0 -n 240 -u fg_3d --bucket 4000 \
  -f tileid,1:1,2:1,intersect_volume -j 1 -i 1 \
  --compression --overwrite
- -t st_intersects is the default predicate for data compression; -a and -b specify the paths of the input datasets.
- The core parameter in this command is --compression, which selects the data compression operation.
The second step is to combine the compressed data, store it in memory, and share it across all cluster nodes. The object MBBs are also combined for data partitioning.
#combine all binary data and mbb outputs
../build/bin/runcombiner.sh your_hdfs_path/3Ddata/spjoin/testdata/output
hdfs dfs -rm -r your_hdfs_path/3Ddata/spjoin/testdata/output/output_partidx
hdfs dfs -rm -r your_hdfs_path/3Ddata/spjoin/testdata/output/output_joinout
The third step is to run the actual spatial join.
#run spatial join
../build/bin/queryprocessor_3d -t st_intersects \
  -a your_hdfs_path/3Ddata/spjoin/testdata/d1 \
  -b your_hdfs_path/3Ddata/spjoin/testdata/d2 \
  -h your_hdfs_path/3Ddata/spjoin/testdata/output/ \
  -q spjoin -s 1.0 -n 240 -u fg_3d --bucket 4000 \
  -f tileid,1:1,2:1 -j 1 -i 1 \
  --spatialproc --decomplod 100 --overwrite
- -t st_intersects is the predicate for the spatial join query; -a and -b specify the paths of the input datasets, and -h specifies the output path.
- -q spjoin specifies the spatial query type.
- -s 1.0 is the data sampling rate for spatial partitioning, -u fg_3d specifies the fixed-grid partitioning method, and --bucket 4000 is the bucket size for partitioning, so after data partitioning each cuboid contains about 4000 objects.
- -n 240 is the number of reducers for this MapReduce job. This number can be adjusted to suit the cluster configuration.
- --spatialproc indicates that this is the spatial query processing step, and --decomplod 100 specifies the level of detail (LOD) used in the spatial join query. Users can choose different LODs (resolutions) to balance run time against accuracy.
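As a rough worked example of the bucket-size arithmetic: if each cuboid holds about as many objects as the bucket size, the number of cuboids is approximately the object count divided by the bucket size. The helper name estimate_cuboids and the object count of 1,000,000 are hypothetical and chosen only for illustration; the actual number of partitions depends on the partitioner and on how the data is distributed in space.

```shell
#!/bin/sh
# estimate_cuboids NUM_OBJECTS BUCKET_SIZE
# Hypothetical helper (not part of iSPEED): integer ceiling of
# NUM_OBJECTS / BUCKET_SIZE, i.e. a rough count of partitioned cuboids.
estimate_cuboids() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Assumed data set of 1,000,000 objects with --bucket 4000.
estimate_cuboids 1000000 4000   # prints 250
```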