Getting Started

Quick Installation

The system is portable and OS-independent, so installation amounts to downloading the files and running them.

Steps:
1. Download the required files from our download page.
2. Start Solr.
3. Prepare the input file in the specified format.
4. Run the program with the jar file. See the Example section.

That is it! Everything is bundled to keep setup as simple as possible.

Start Solr

First, start Solr. Go to the Solr folder on your computer. On Linux and Mac, use the command:

bin/solr start -e cloud -noprompt

On Windows, use:

bin\solr.cmd start -e cloud -noprompt

When your job is finished, you can stop Solr on Linux and Mac with the command:

bin/solr stop -all

On Windows, use:

bin\solr.cmd stop -all
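
The start and stop commands differ between operating systems only in the script name, so a small helper can pick the right one. The Python sketch below is illustrative and not part of the distribution: the function name solr_command is hypothetical, and it only builds the argument list from the commands shown above without running anything.

```python
import platform


def solr_command(action, system=None):
    """Build the Solr command line for "start" or "stop".

    `system` defaults to the current OS name as reported by platform.system().
    """
    system = system or platform.system()
    # Windows uses the solr.cmd wrapper; Linux and Mac use the solr script.
    script = "bin\\solr.cmd" if system == "Windows" else "bin/solr"
    if action == "start":
        return [script, "start", "-e", "cloud", "-noprompt"]
    if action == "stop":
        return [script, "stop", "-all"]
    raise ValueError(f"unknown action: {action!r}")


if __name__ == "__main__":
    # Print the command for this machine; run it from the Solr folder.
    print(" ".join(solr_command("start")))
```

The returned list can be passed to subprocess.run with the Solr folder as the working directory.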

Input File Format

Currently only the CSV format is supported, with the following headers: Latitude, Longitude, Address 1, Address 2, City, County, State, Zip code, Country. Each address in the input file must meet the four requirements listed below.

1. Must have a meaningful building number.
2. Must have a zip code.
3. Must be in New York state.
4. Must have a street name.

When an address has two parts, put the place name under Address1 and the part with the street name under Address2. Examples of correct input format:

Latitude Longitude Address1 Address2 city county state zipcode country
42.651895 -73.764145 339 Hamilton St Albany NY 12210 USA
Strong Memorial Hospital 601 Elmwood Ave Rochester NY 14642 USA

Sample input files are available on our download page.
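
Before running the geocoder, it can help to check an input file against the four requirements. The Python sketch below is a rough pre-check under stated assumptions, not the program's own validation: the column names (Address1, Address2, state, zipcode) are taken from the example header row and may need adjusting, a "building number" is approximated as any token that starts with a digit, and both "NY" and "New York" are accepted as the state. The function names are hypothetical.

```python
import csv
import io

# Assumed column names, copied from the example header row above.
ADDRESS_COLUMNS = ("Address1", "Address2")
STATE_COLUMN = "state"
ZIP_COLUMN = "zipcode"


def check_row(row):
    """Return a list of problems found in one input row (a dict)."""
    problems = []
    # Join both address parts so it does not matter which column holds the street.
    address = " ".join((row.get(col) or "").strip() for col in ADDRESS_COLUMNS)
    tokens = address.split()
    # Heuristic: the building number is the first token that starts with a digit.
    number_at = next((i for i, t in enumerate(tokens) if t[0].isdigit()), None)
    if number_at is None:
        problems.append("no building number")
    # Heuristic: a street name is whatever text follows the building number.
    if not tokens or number_at == len(tokens) - 1:
        problems.append("no street name")
    if not (row.get(ZIP_COLUMN) or "").strip():
        problems.append("no zip code")
    if (row.get(STATE_COLUMN) or "").strip().upper() not in ("NY", "NEW YORK"):
        problems.append("not in New York state")
    return problems


def check_csv(text):
    """Map file line numbers (header is line 1) to the problems on that line."""
    report = {}
    for line_no, row in enumerate(csv.DictReader(io.StringIO(text)), start=2):
        problems = check_row(row)
        if problems:
            report[line_no] = problems
    return report
```

To check a file on disk, read it as text and pass the contents to check_csv; an empty result means no row was flagged by these heuristics.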

Additional Parameters


1. Mode

This program works in two modes.

1. Comparison Mode. Measures the accuracy of the system against a list of addresses whose known geolocations serve as the ground truth.

2. Searching Mode (default). Extracts geolocations for a list of addresses. This is the most common mode.
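
This guide does not specify how Comparison Mode scores accuracy. As a general illustration only, one common way to compare a geocoded point with its ground-truth location is the great-circle (haversine) distance between them. The Python sketch below shows that calculation; it is an assumption about how such a comparison could be done, not a description of the program's internals, and the function name is hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))
```

A geocoded result could then be counted as a match when its distance to the ground truth falls below a threshold of your choosing.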


2. NumberOfThreads

Specifies the number of additional threads the program may use; more threads can improve performance. The minimum is one, the default is 8, and the recommended value is two times the number of virtual CPU cores, minus one.
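
The recommended value can be computed from the core count of the machine. The Python sketch below reads the recommendation as 2 x (virtual cores) - 1, which is an interpretation of the sentence above; the function name is hypothetical.

```python
import os


def recommended_threads(virtual_cores=None):
    """Return 2 * virtual cores - 1, never less than the minimum of one."""
    # os.cpu_count() reports logical (virtual) cores and may return None.
    cores = virtual_cores or os.cpu_count() or 1
    return max(1, 2 * cores - 1)


if __name__ == "__main__":
    print(recommended_threads())
```

On a machine with 8 virtual cores, for example, this gives 15.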


3. Max

The number of rows, counted from the start of the input file, to geocode.

Example

The first parameter must always be the path to the input file.

java -jar BigGeocodingJava.jar [inputfile]

java -jar BigGeocodingJava.jar [inputfile] [outputfile]

The program geocodes the addresses in [inputfile] and writes the results to [outputfile]. If [outputfile] is omitted, the output file is named “output_[inputfile]_[date]” and placed in the “output” folder next to the jar file. If that folder does not exist, the program creates it automatically.

The jar file accepts the following optional parameters, shown with their default values.

Name | Description | Default Value | Parameter Syntax
Max | Number of rows in input file for geocoding | Until the end of the file | -m
Mode | Specifying the output format | 2 | -o
Threads | Number of additional active threads | 15 | -t

An example of running the program with all additional parameters:

java -jar BigGeocodingJava.jar [inputfile] [outputfile] -m 2000 -t 10 -o 2

This geocodes the first 2000 addresses in [inputfile] using 10 additional threads and generates output in mode 2, which only extracts geolocations.
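
When the program is launched from a script, the command line can be assembled from the parameters described in this section. The Python sketch below is a convenience wrapper under stated assumptions: it uses the flags from the parameter table (-m for Max, -o for Mode, -t for Threads), the function name build_command and the file names are hypothetical, and it only builds the argument list without running Java.

```python
def build_command(input_file, output_file=None, max_rows=None,
                  mode=None, threads=None, jar="BigGeocodingJava.jar"):
    """Build the java command line; options left as None are omitted."""
    # The input file must always be the first parameter after the jar.
    command = ["java", "-jar", jar, input_file]
    if output_file is not None:
        command.append(output_file)
    if max_rows is not None:
        command += ["-m", str(max_rows)]
    if threads is not None:
        command += ["-t", str(threads)]
    if mode is not None:
        command += ["-o", str(mode)]
    return command


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_command("addresses.csv", "result.csv",
                                 max_rows=2000, threads=10, mode=2)))
```

The returned list can be passed to subprocess.run from the folder that contains the jar file.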

Troubleshooting

This section will be expanded as common errors are reported.

1. Cannot start Solr

  • Solr is not OS-independent; make sure you use the right command (solr or solr.cmd).
  • If you get an exception saying that Solr did not start within 30 seconds the first time you start Solr on your computer, stop Solr and start it again. The exception should clear.
  • If the exception persists, download the Apache Solr release for your system, then replace the “example/cloud/” folder of our Solr with the one from the version you downloaded.

If you run into any other problem, please feel free to contact us; we would be more than happy to help.