Hadoop-GIS Installation

Install Hadoop distribution

Please contact your system admin or see the Hadoop installation guide.

Add or update the following environment variables so that each one points to the correct path (preferably by setting them in your ~/.bashrc file):

HADOOP_INSTALL

HADOOP_HDFS_HOME

YARN_HOME

HADOOP_PREFIX
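A quick way to confirm these variables is to check that each one is set and names an existing directory. The sketch below is POSIX sh. check_hadoop_env is a hypothetical helper, and the commented-out exports assume a single-directory Hadoop 2.x tarball install under the hypothetical path /opt/hadoop.

```shell
# Example ~/.bashrc entries; /opt/hadoop is a hypothetical install path,
# replace it with the location of your own Hadoop distribution.
# export HADOOP_PREFIX=/opt/hadoop
# export HADOOP_INSTALL=$HADOOP_PREFIX
# export HADOOP_HDFS_HOME=$HADOOP_PREFIX
# export YARN_HOME=$HADOOP_PREFIX

# check_hadoop_env (hypothetical helper): report each required variable that
# is unset or does not point to an existing directory; fail if any is wrong.
check_hadoop_env() {
    status=0
    for name in HADOOP_INSTALL HADOOP_HDFS_HOME YARN_HOME HADOOP_PREFIX; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set" >&2
            status=1
        elif [ ! -d "$value" ]; then
            echo "$name=$value is not a directory" >&2
            status=1
        fi
    done
    return $status
}
```

Running check_hadoop_env after sourcing ~/.bashrc prints nothing and succeeds when all four variables are set correctly.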

The following variable is not available by default, so please add it manually:

HADOOP_STREAMING_PATH

(HADOOP_STREAMING_PATH is not a standard Hadoop environment variable. It should name the directory that contains a symbolic link, named hadoop-streaming.jar, to the Hadoop streaming jar file.)

It is needed because the Hadoop-GIS scripts invoke the Hadoop streaming jar as follows: hadoop jar ${HADOOP_STREAMING_PATH}/hadoop-streaming.jar <arguments>
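The symbolic link can be created with a helper like the one below. This is a sketch that assumes the Hadoop 2.x layout, where the versioned jar (e.g. hadoop-streaming-2.x.y.jar) lives under $HADOOP_PREFIX/share/hadoop/tools/lib. link_streaming_jar is a hypothetical name; if your distribution keeps the jar elsewhere, pass that directory as the first argument.

```shell
# link_streaming_jar [SEARCH_DIR] (hypothetical helper): create
# $HADOOP_STREAMING_PATH/hadoop-streaming.jar as a symlink to the versioned
# streaming jar. The default SEARCH_DIR assumes the Hadoop 2.x layout.
link_streaming_jar() {
    search_dir=${1:-$HADOOP_PREFIX/share/hadoop/tools/lib}
    if [ -z "${HADOOP_STREAMING_PATH:-}" ]; then
        echo "HADOOP_STREAMING_PATH is not set" >&2
        return 1
    fi
    for jar in "$search_dir"/hadoop-streaming-*.jar; do
        # Skip auxiliary jars; the glob stays literal when nothing matches,
        # which the -f test below rejects.
        case $jar in *-sources.jar|*-tests.jar) continue ;; esac
        if [ -f "$jar" ]; then
            mkdir -p "$HADOOP_STREAMING_PATH"
            ln -sf "$jar" "$HADOOP_STREAMING_PATH/hadoop-streaming.jar"
            return 0
        fi
    done
    echo "no hadoop-streaming-*.jar found in $search_dir" >&2
    return 1
}
```

For example, after export HADOOP_STREAMING_PATH=$HOME/hadoop-streaming (a hypothetical location), running link_streaming_jar makes ${HADOOP_STREAMING_PATH}/hadoop-streaming.jar resolve to the installed jar.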

Install library dependencies

Below is the list of dependencies that Hadoop-GIS requires. They must be available on all compute (task) nodes. You can install them either as the root user or as a non-admin user.

Install the libraries in the following order:

  1. Boost C++ libraries 1.x (x > 47)
  2. g++ 4.x (x >= 4)
  3. Python 3.x (x >= 0)
  4. GEOS 3.x (x >= 3)
  5. libspatialindex 1.8.x (x >= 0)
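The minimum versions above can be checked with a small version comparison. This is a sketch, assuming `sort -V` (version sort from GNU coreutils) is available; version_ge and check_dependency_versions are hypothetical helpers. The probes it uses (g++ -dumpversion, geos-config --version, and BOOST_LIB_VERSION in boost/version.hpp) are standard, but only lower bounds are checked and libspatialindex is not covered.

```shell
# version_ge A B (hypothetical helper): succeed when version A >= version B.
# Relies on `sort -V` (version sort, provided by GNU coreutils).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_dependency_versions [PREFIX] (hypothetical helper): print a warning for
# each dependency whose version is below the minimum listed above.
# PREFIX is where the Boost headers were installed (default: /usr).
check_dependency_versions() {
    prefix=${1:-/usr}
    gxx=$(g++ -dumpversion 2>/dev/null)
    version_ge "${gxx:-0}" 4.4 || echo "g++ >= 4.4 not found (got: ${gxx:-none})"
    py=$(python3 -c 'import platform; print(platform.python_version())' 2>/dev/null)
    version_ge "${py:-0}" 3.0 || echo "Python >= 3.0 not found (got: ${py:-none})"
    geos=$(geos-config --version 2>/dev/null)
    version_ge "${geos:-0}" 3.3 || echo "GEOS >= 3.3 not found (got: ${geos:-none})"
    # Boost records its version as e.g. "1_58" in boost/version.hpp.
    boost=$(sed -n 's/^#define BOOST_LIB_VERSION "\(.*\)"/\1/p' \
        "$prefix/include/boost/version.hpp" 2>/dev/null | tr _ .)
    version_ge "${boost:-0}" 1.48 || echo "Boost >= 1.48 not found (got: ${boost:-none})"
}
```

For example, check_dependency_versions "$HOME/local" looks for Boost headers under $HOME/local/include and prints a line for each dependency that is missing or too old.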

The install/dependencies directory contains several bootstrap scripts for installing these dependencies on different systems and with different user privileges.

Installing as an admin

TBU - After the installation of the above packages on all cluster nodes

Installing as a non-admin user

It is preferable to install the above library dependencies into your home directory or a shared directory.

For most packages, run autogen.sh, bootstrap.sh or ./configure --prefix=your_preferred_path (whichever the package provides) before executing make and make install.
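For autotools-based packages, the configure and make steps can be wrapped as below. This is a sketch, assuming the package ships either ./configure or an autogen.sh that generates it; build_into_prefix is a hypothetical helper. Boost is built differently (./bootstrap.sh --prefix=... followed by ./b2 install), so it is not covered here.

```shell
# build_into_prefix SRC_DIR PREFIX (hypothetical helper): configure, build and
# install an autotools package from SRC_DIR into PREFIX, without root access.
build_into_prefix() {
    src=$1
    prefix=$2
    # Run in a subshell so the cd does not change the caller's directory.
    (
        cd "$src" || exit 1
        # Generate ./configure first when only autogen.sh is shipped.
        if [ ! -x ./configure ] && [ -x ./autogen.sh ]; then
            ./autogen.sh || exit 1
        fi
        if [ ! -x ./configure ]; then
            echo "no ./configure in $src" >&2
            exit 1
        fi
        ./configure --prefix="$prefix" && make && make install
    )
}
```

For example, build_into_prefix ~/src/geos-3.4.2 "$HOME/local" (hypothetical paths) installs GEOS under $HOME/local. Installing every dependency into the same prefix keeps the later LD_LIBRARY_PATH setup to a single directory.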

Add the library directories of these dependencies (e.g. your_preferred_path/lib) to your LD_LIBRARY_PATH environment variable.
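A helper like the following avoids adding the same directory twice when ~/.bashrc is sourced repeatedly. It is a minimal sketch; prepend_ld_path is a hypothetical name.

```shell
# prepend_ld_path DIR (hypothetical helper): put DIR at the front of
# LD_LIBRARY_PATH unless it is already listed, then export the variable.
prepend_ld_path() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*) ;;  # already present, leave unchanged
        *) LD_LIBRARY_PATH=$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH} ;;
    esac
    export LD_LIBRARY_PATH
}
```

For example, prepend_ld_path "$HOME/local/lib" in ~/.bashrc (hypothetical prefix). Prepending rather than appending lets the dynamic loader find your locally installed versions before any older copies listed later in the path.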

To test whether dependencies were properly installed,

It is recommended to install them into a common directory.
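One simple check, sketched below, is whether the shared libraries can be found in the directories listed in LD_LIBRARY_PATH. find_shared_lib is a hypothetical helper; it looks only at LD_LIBRARY_PATH, not at system directories known to ldconfig, and the library names geos_c (GEOS C API) and spatialindex are assumptions based on the upstream library file names.

```shell
# find_shared_lib NAME (hypothetical helper): print the first lib<NAME>.so*
# found in a directory listed in LD_LIBRARY_PATH; fail if none is found.
find_shared_lib() {
    old_ifs=$IFS
    IFS=:
    # LD_LIBRARY_PATH is split on ':' once, when the loop starts.
    for dir in ${LD_LIBRARY_PATH:-}; do
        IFS=$old_ifs
        for f in "$dir"/lib"$1".so*; do
            if [ -e "$f" ]; then
                echo "$f"
                return 0
            fi
        done
    done
    IFS=$old_ifs
    return 1
}
```

For example, find_shared_lib geos_c && find_shared_lib spatialindex prints the path of each library it finds and fails if either is missing.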