G2S: The GeoStatistical Server

A free and flexible multiple point (geo)statistics framework including state-of-the-art algorithms:
QuickSampling and Narrow Distribution Selection


G2S is maintained by the GAIAlab from University of Lausanne.

Final version of QuickSampling available here.

Brief overview

The GeoStatistical Server (G2S) is a framework that allows you to use state-of-the-art Multiple Point Statistics (MPS) algorithms to run stochastic simulations.

G2S is designed to run simulations in a generic way, independently of the code used or the language it is written in. For example, it lets you run a C/C++ simulation code from Python, or Python code from MATLAB (or any other combination).

Currently, the framework is provided with:

  • QuickSampling (QS), also known as Quantile Sampling, is a general-purpose pixel-based MPS algorithm designed to be robust, efficient and to run in constant time. QS adapts to your problem: it can be used for (un)conditional simulation, gap filling or even downscaling, using continuous or categorical variables, or a combination of both. The code was developed without restrictions regarding the dimensionality of the data (e.g. 1D, 2D, 3D, nD).
  • A journal article published in Geoscientific Model Development describes the method here.
  • Narrow Distribution Selection (NDS) is an algorithm specifically targeted at simulating spectrally enhanced remotely sensed imagery. It requires an external variable (for example, a grayscale image) to control the simulation (of colors).
  • ➔ A paper describing it is available here: Analogue-based colorization of remote sensing images using textural information.

The framework can be easily extended to handle most codes that use gridded data. Currently, any compiled code or Python code can be handled.

For a hands-on introduction to MPS (PPT slides, Colab Notebook & recorded tutorial), follow this link.

Installation

Because life is short, we designed tools to make it simple. 😉
Unless you plan to run simulations on a remote computer, you need to install both the server and the interfaces on the same machine. Please choose the options below that match your needs.
If you ever need to update G2S, simply pull the changes from the GitHub repository and recompile (last step of the manual installation).

Installation of the server

Installation of the server on Ubuntu

Automatic with GCC on Ubuntu

  1. First clone the code from this GitHub:
    git clone https://github.com/GAIA-UNIL/G2S.git
  2. Then run build/c++-build/install_needs_W_VM.sh

Manual (including Intel C++ Compiler)

  1. First clone the code from this GitHub:
    git clone https://github.com/GAIA-UNIL/G2S.git
  2. Basics for compiling are needed (e.g. on Ubuntu: build-essential).
  3. The following packages are required: ZMQ, JsonCpp and zlib for G2S; fftw3 for QS and NDS.
  4. To install them on Ubuntu: sudo apt install build-essential libzmq3-dev libjsoncpp-dev zlib1g-dev libfftw3-dev libcurl4-openssl-dev -y (libcurl4-openssl-dev is optional)
  5. The C++ extension for ZMQ is required too, and can be installed via: wget "https://raw.githubusercontent.com/zeromq/cppzmq/master/zmq.hpp" -O ./include/zmq.hpp
  6. Go to the build subfolder.
  7. Run make -j. If the Intel C++ compiler is installed, the adapted version will be compiled too. The Intel compiler can be downloaded freely in many cases: here.
    To manually select between GCC or Intel compiler use make c++ -j or make intel -j, respectively.

Installation of the server on macOS

using Homebrew (now recommended)

  1. First clone the code from this GitHub:
    git clone https://github.com/GAIA-UNIL/G2S.git
  2. Install the package manager Homebrew (if not already done)
  3. The following packages are required: ZMQ, JsonCpp and zlib for G2S; fftw3 for QS and NDS.
  4. To install them with Homebrew: brew install fftw zmq jsoncpp cppzmq curl (curl is optional)
  5. Go to the build subfolder.
  6. Run make -j. If the Intel C++ compiler is installed, the adapted version will be compiled too. The Intel compiler can be downloaded freely in many cases: here. Obviously, the Intel compiler is only for Intel CPUs.
    To manually select between GCC or Intel compiler use make c++ -j or make intel -j, respectively.

using MacPorts (now deprecated)

  1. First clone the code from this GitHub:
    git clone https://github.com/GAIA-UNIL/G2S.git
  2. Install the package manager MacPorts (if not already done)
  3. The following packages are required: ZMQ, JsonCpp and zlib for G2S; fftw3 for QS and NDS.
  4. To install them with MacPorts: sudo port install zmq-devel jsoncpp-devel zlib cppzmq-devel fftw-3 fftw-3-single curl (curl is optional)
  5. Go to the build subfolder.
  6. Run make -j. If the Intel C++ compiler is installed, the adapted version will be compiled too. The Intel compiler can be downloaded freely in many cases: here.
    To manually select between GCC or Intel compiler use make c++ -j or make intel -j, respectively.

Installation of the server on Windows 10

  1. Check that the latest updates of Windows are installed
  2. Install WSL following these instructions, and select a Linux distribution (we recommend choosing Ubuntu for beginners).
  3. Clone the code from this GitHub repository: https://github.com/GAIA-UNIL/G2S.git (or download and unzip the zip-file on the top right of this page).
  4. In Windows, go to the directory build/c++-build
  5. Run (double-click) install.bat.

Installation of interfaces

Installation of the Python interface on Ubuntu

Automatic installation

Simply use pip install G2S

Manual compilation

  1. If needed, install Python and NumPy: sudo apt install python3-distutils python3-dev python3-numpy -y
  2. (A C++ compiler with c++17 is required)
  3. Go to build/python-build
  4. Run python3 setup.py install --user

Check proper interface installation

Simply run
from g2s import g2s; g2s('--version')
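
Before running the check above from a script, it can help to confirm that the interface is importable at all. The following is a minimal sketch using only the Python standard library; the helper name g2s_interface_available is hypothetical and not part of G2S.

```python
import importlib.util


def g2s_interface_available() -> bool:
    """Return True if a module named 'g2s' can be found by the import system."""
    return importlib.util.find_spec("g2s") is not None


if __name__ == "__main__":
    if g2s_interface_available():
        from g2s import g2s
        g2s('--version')  # same check as the one-liner above
    else:
        print("g2s interface not found; try: pip install G2S")
```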

Installation of the MATLAB interface on Ubuntu

  1. (A C++ compiler with c++17 is required)
  2. Open MATLAB
  3. Go to build/matlab-build
  4. Run CompileG2S
  5. Add the compiled file to the MATLAB path

Check proper interface installation

Simply run g2s('--version')

Installation of the R interface on Ubuntu

Installation of the Python interface on macOS

Automatic installation

Simply use pip install G2S

Manual compilation

  1. If needed, install Python and NumPy: sudo port install python37 py37-numpy
  2. Go to build/python-build
  3. Run python3 setup.py install --user

Check proper interface installation

Simply run
from g2s import g2s; g2s('--version')

Installation of the MATLAB interface on macOS

  1. Open MATLAB
  2. Go to build/build-matlab
  3. Run CompileG2S
  4. Add the compiled file to the MATLAB path

Check proper interface installation

Simply run g2s('--version')

Installation of the R interface on macOS

Installation of the Python interface on Windows 10

Automatic installation

Simply use pip install G2S

Manual compilation

  1. (A C++ compiler with c++17 is required)
  2. If needed, install python with the option to add it to the Path
  3. Go to build/python-build
  4. Run python3 setup.py install

Check proper interface installation

Simply run
from g2s import g2s; g2s('--version')

Installation of the MATLAB interface on Windows 10

Download precompiled interfaces

  1. Download here.
  2. Unzip and add the folder to MATLAB path.

Manual compilation

  1. If needed, install python with the option to add it to the Path
  2. Open MATLAB
  3. Install a compiler with c++17, available here (2017 or later)
  4. Go to build/build-matlab
  5. Run CompileG2S
  6. Add the compiled file to the MATLAB path

Check proper interface installation

Simply run g2s('--version')

Installation of the R interface on Windows 10

Run the server

โš ๏ธThe server generates logs- and data-files in subfolders build/build-*/logs and build/build-*/data that are originally saved for debug purpose, and are currently automatically removed only at the launch of the server or after one day. This parameterization can be changed with -kod and -age.

Run the server on Ubuntu

Run ./server in build/c++-build or build/intel-build, respectively for the standard or Intel version. Running the server from a different directory (e.g. ./c++-build/server) is currently not supported.

Run the server on macOS

Run ./server in build/c++-build or build/intel-build, respectively for the standard or Intel version. Running the server from a different directory (e.g. ./c++-build/server) is currently not supported.

Flag Description
-d Run as daemon
-To n Shut down the server if there is no activity after n seconds, 0 for ∞ (default: 0)
-p Select a specific port for the server (default: 8128)
-kod Keep old data, if the flag is not present all files from previous runs are erased at launch
-age Set the age in seconds after which files are removed (default: 86400 s = 1 d)
Advanced parameters
-kcwd Keep the current working directory, overwrite default behavior that moves working directory to the server directory.
-mT Single job at a time (excludes the use of -after, only use if you're sure to be the only one accessing the server)
-fM Run as a function, without fork
-maxCJ n Limit the maximum number of jobs running in parallel.
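
As an illustration of how the flags in the table above combine, here is a minimal sketch that assembles a server command line in Python. The helper server_command and its parameter names are hypothetical; only the flags -p, -To, -age, -d and -kod come from the table, and the defaults mirror the documented ones.

```python
import shlex


def server_command(port=8128, idle_timeout_s=0, daemon=False,
                   keep_old_data=False, max_age_s=86400):
    """Build the argument list for ./server from the documented flags."""
    cmd = ["./server", "-p", str(port), "-To", str(idle_timeout_s),
           "-age", str(max_age_s)]
    if daemon:
        cmd.append("-d")    # run as daemon
    if keep_old_data:
        cmd.append("-kod")  # keep files from previous runs
    return cmd


cmd = server_command(port=8130, idle_timeout_s=3600, daemon=True)
print(shlex.join(cmd))  # ./server -p 8130 -To 3600 -age 86400 -d

# To actually launch it, start the process from the server's own directory,
# since running the server from a different directory is not supported:
# import subprocess
# subprocess.run(cmd, cwd="build/c++-build", check=True)
```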

Run the server on Windows 10

The server can be run with runServer.bat, or as a daemon with runDaemon.bat; both are available in build/c++-build.

Using the interfaces

A call to g2s is needed to launch any computation. Each call to g2s is composed of parameters of G2S and of the name of the algorithm used.
Flags do NOT have a specific order. You can either write '-flag',value or flag=value.

Use g2s in Python

from g2s import g2s
data=g2s(...) # returns a tuple that contains all the output maps and the computing duration
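
The two notations mentioned above ('-flag',value and flag=value) can be related with a small helper. This is only a sketch: to_flag_list is a hypothetical function, not part of G2S, and it assumes that flag=value corresponds to a Python keyword argument.

```python
def to_flag_list(**kwargs):
    """Expand flag=value pairs into the equivalent ['-flag', value, ...] list."""
    args = []
    for name, value in kwargs.items():
        args.extend(["-" + name, value])
    return args


args = to_flag_list(a="qs", k=1.2, n=50)
print(args)  # ['-a', 'qs', '-k', 1.2, '-n', 50]

# With the interface installed and a server running, the two calls below are
# expected to be equivalent (a real call also needs image arguments such as
# -ti and -di, omitted here):
# g2s(*args)
# g2s(a="qs", k=1.2, n=50)
```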

Use g2s in MATLAB

data=g2s(...)		% principal output, the simulation
[data, t]=g2s(...) % the simulation and the computation time
[data, ...,t]=g2s(...) % the simulation, other output maps and the computation time

Use g2s in R

Flag Description
--version Return the version and compilation date of the interface
-a The simulation algorithm to be used, it can be 'qs', 'nds', 'ds-l' (DS-like, not maintained)
-sa Server address (default: localhost (the server is local), otherwise provide IP address)
Useful when a powerful machine is dedicated to computation
-p Port where to look for the server (default: 8128). Should be passed as an integer.
-silent Don't display the progression, useful for scripts
-serverStatus Report whether the server is working properly
<0 → error (e.g. communication failure, server shut down, ...)
=0 → undefined
>0 → server is operational
1: standard server working normally
-noTO Deactivate TimeOut on ZMQ communication, useful for slow network (e.g., through internet)
-TO Specify a custom TimeOut on ZMQ communication (uint32, in milliseconds)
-shutdown Shutdown the server, useful at the end of scripts

The following options represent the Asynchronous mode, which allows you to submit multiple jobs simultaneously and retrieve the results of each of them later on (as opposed to synchronous communication with the server, where you need to wait until a job is finished before you can submit a new one). You launch the async mode by simply adding the -submitOnly flag to your g2s call. This will give only the job ID as an output, so the g2s call becomes jobid = g2s(flag1,value1, flag2,value2, ...). Don't forget to always include the server address if it's not local! See the example section for a demonstration in MATLAB and Python.

Flag Description
-submitOnly Submit a job
-statusOnly Check progression
Usage: status = g2s('-statusOnly',jobid)
-waitAndDownload Download the result
Usage: sim,_ = g2s('-waitAndDownload',jobid)
-kill Kill a given task
Usage: g2s('-kill',jobid)
-after Execute the job after another one is finished (e.g. '-after',previousJobId )
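
The asynchronous flags in the table above can be chained as follows. This is a minimal sketch, not the reference workflow: run_async and FakeServer are hypothetical names, FakeServer is a stand-in that lets the sketch run without a server, and it is assumed that -statusOnly returns a progression value that reaches 100 when the job is finished.

```python
import time


def run_async(g2s_call, *args, poll_s=1.0, max_polls=10):
    """Submit a job, report its progression, then download the result."""
    job_id = g2s_call(*args, "-submitOnly")
    for _ in range(max_polls):
        progress = g2s_call("-statusOnly", job_id)
        print("job", job_id, "progress:", progress)
        if progress >= 100:  # assumption: 100 means finished
            break
        time.sleep(poll_s)
    # -waitAndDownload waits for the job, so polling above is optional
    sim, _ = g2s_call("-waitAndDownload", job_id)
    return job_id, sim


class FakeServer:
    """Hypothetical stand-in for g2s, used only to exercise the sketch."""

    def __init__(self):
        self.progress = 0

    def __call__(self, *args):
        if "-submitOnly" in args:
            return 42  # made-up job ID
        if args[0] == "-statusOnly":
            self.progress = min(100, self.progress + 50)
            return self.progress
        if args[0] == "-waitAndDownload":
            return [[0.0]], 0.01  # made-up simulation and duration
        raise ValueError("unsupported call in this stand-in")


job_id, sim = run_async(FakeServer(), "-a", "qs", poll_s=0)
```

With the real interface, pass the imported g2s function instead of FakeServer(), and include the server address if it is not local.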

Simulation algorithms

G2S is provided with standard state-of-the-art algorithms:

  • QS (aka. QuickSampling)
  • NDS (aka. Narrow Distribution Selection)

Parameters for QS

Usage: [sim,index,time,finalprogress,jobid] = g2s(flag1,value1, flag2,value2, ...)
Outputs: sim=simulation, index=index of the simulated values in the flattened TI, time=simulation time, finalprogress=final progression of the simulation (normally 100), jobid=job ID.

Flag Description Mandatory
-ti Training images (one or more images). If multivariate, the last dimension should be the same size as the number of variables, and should also match the size of the array given for the parameter dt.
NaN values in the training image will be ignored.
Unlike other MPS-algorithms, if there are multiple variables they will not be automatically normalized to be in the same range.
-di Destination image (one image, aka simulation grid). The size of di will be the size of the simulation grid. di can be identical to ti for gap-filling.
NaN values will be simulated. Non-NaN values will be considered as conditioning data.
-dt Data type. 0 → continuous and 1 → categorical

This is where the number of variables is defined.

Provide a list containing zeros or ones representing the data type of each variable.
-k Number of best candidates to consider ∈[1 ∞]
-n N closest neighbors to consider. If multiple variables:
- use a single N value if identical for all variables
- Use an array of N values if each variable has a different N
-ki Image of the weighting kernel. Can be used to normalize the variables. If multiple variables:
- use a single kernel value if identical for all variables
- If each variable has a different kernel, stack all kernels along the last dimension. The number of kernels should then match the size of the array given for the parameter dt
-f f=1/k, equivalent to the parameter f of DS with a threshold of 0
-j To run in parallel (default is single core). To use as follows '-j', N1, N2, N3 (all three are optional but N3 needs N2, which in turn needs N1).
Use integer values to specify a number of threads (or logical cores). Use decimal values ∈]0,1[ to indicate fraction of the maximum number of available logical cores (e.g., 0.5=50% of all available logical cores).
- N1 threads used to parallelize the path (path-level) Default: the maximum number of threads available.
- N2 threads used to parallelize over training images (if many TIs are available, each is scanned on a different core). Default: 1
- N3 threads used to parallelize FFTs (node-level). Default: 1
- N1 and N2 are recommended over N3. N1 is usually more efficient than N2, but requires more memory.
-sp Simulation path, array of the size of the simulation grid containing values that specify the simulation path (from low to high). Default is a random path. Equal values are accepted but will always be ordered in the same way (i.e. not random). -∞ values are not simulated. In case of multiple variables, a vector simulation is default (same path for all variables) and the simulation path should be one dimension less than the number of variables. If you prefer a full simulation, provide an array containing the path for each variable and use the "-fs" flag below.
-s Random seed value
-W_GPU Use integrated GPU if available
-W_CUDA Use Nvidia Cuda compatible GPU: specify the device id. (e.g., '-W_CUDA',0,1 to use the first two GPUs)
Advanced parameters
-ii Array that specifies for each pixel which training image to sample from. Default: all training images are searched for the best match.
-far Fast and risky 😄, like -ii but with a random input (experimental)
-cti With this flag QS will treat the training image(s) as periodic (aka circular or cyclic) over each dimension.
-csim With this flag QS will make sure to create a periodic (aka circular or cyclic) simulation over each dimension.
-adsim Augmented dimensionality simulation: allows for 3D simulation using 2D training image, only for categories (Coming soon maybe some day!)
-fs Full simulation: follows a different simulation path for each variable (as opposed to vector simulation, where the same simulation path is used for all variables).
-nV No Verbatim, i.e. prohibits neighbors in the training image to be neighbors in the simulation. (experimental)
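
To make the -j convention above concrete (integers are thread counts, decimals in ]0,1[ are fractions of the available logical cores), here is a minimal sketch. The function resolve_threads is hypothetical, and the rounding rule (round down, but never below one thread) is an assumption; the source does not state how G2S rounds.

```python
import os


def resolve_threads(value, available=None):
    """Translate one -j entry into a thread count.

    Values strictly between 0 and 1 are read as a fraction of the available
    logical cores; anything else is read as an explicit thread count.
    """
    if available is None:
        available = os.cpu_count() or 1
    if 0 < value < 1:
        return max(1, int(value * available))  # assumed rounding rule
    return int(value)


print(resolve_threads(0.5, available=8))  # 4
print(resolve_threads(3, available=8))    # 3
```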

Parameters for NDS

Flag Description Mandatory
-ti Training images (one or more images). If multivariate, the last dimension should be the same size as the number of variables, and should also match the size of the array given for the parameter dt.
NaN values in the training image will be ignored.
Unlike other MPS-algorithms, if there are multiple variables they will not be automatically normalized to be in the same range.
-di Destination image (one image, aka simulation grid). The size of di will be the size of the simulation grid. di can be identical to ti for gap-filling.
NaN values will be simulated. Non-NaN values will be considered as conditioning data.
-dt Data type. 0 → continuous and 1 → categorical

This is where the number of variables is defined.

-k Number of best candidates to consider to compute the narrowness ∈[5 ∞]
-n N closest neighbors to consider. If multiple variables:
- use a single N value if identical for all variables
- Use an array of N values if each variable has a different N
-ki Image of the weighting kernel. Can be used to normalize the variables. If multiple variables:
- use a single kernel value if identical for all variables
- If each variable has a different kernel, stack all kernels along the last dimension. The number of kernels should then match the size of the array given for the parameter dt
-nw Narrowness range: 0 → max-min, 1 → median; default: 0.5 → IQR
-nwv Number of variables to consider in the narrowness, (start from the end), default: 1
-cs Chunk size, the number of pixels to simulate at the same time, at each iteration, default: 1
-uds Area to update around each simulated pixel: the M closest pixels, default: 10
-mp Partial simulation, 0 → empty, 1 → 100%
-s Seed value
-j To run in parallel (default is single core). To use as follows '-j', N1, N2, N3 (all three are optional but N3 needs N2, which in turn needs N1).
Use integer values to specify a number of threads (or logical cores). Use decimal values ∈]0,1[ to indicate fraction of the maximum number of available logical cores (e.g., 0.5=50% of all available logical cores).
- N1 threads used to parallelize the path (path-level) Default: the maximum number of threads available.
- N2 threads used to parallelize over training images (if many TIs are available, each is scanned on a different core). Default: 1
- N3 threads used to parallelize FFTs (node-level). Default: 1
- N1 and N2 are recommended over N3. N1 is usually more efficient than N2, but requires more memory.
-W_GPU Use integrated GPU if available
-nV No Verbatim (experimental)
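
The table does not give a formula for -nw. One reading that is consistent with the three listed values (0 → max-min, 0.5 → IQR, 1 → median) is the width between the quantiles nw/2 and 1-nw/2 of the candidate values. The sketch below illustrates that reading only; it is an assumption, not the NDS implementation, and the names quantile and narrowness are hypothetical.

```python
def quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted list, q in [0, 1]."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def narrowness(candidates, nw=0.5):
    """Width between the quantiles nw/2 and 1 - nw/2 (assumed reading of -nw)."""
    values = sorted(candidates)
    return quantile(values, 1 - nw / 2) - quantile(values, nw / 2)


candidates = [1.0, 2.0, 3.0, 4.0, 5.0]
print(narrowness(candidates, nw=0.0))  # 4.0, the max-min range
print(narrowness(candidates, nw=0.5))  # 2.0, the interquartile range
print(narrowness(candidates, nw=1.0))  # 0.0, both bounds collapse on the median
```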

Examples

Some copy-paste examples are provided here for demonstration purposes. Feel free to play around with the parameters and with different training images (we collected some here), but keep in mind that these are just quick examples. For proper implementations please check the literature.
Additional examples are available in the files contained in the GitHub repository. Even more examples, with detailed explanations, can be found in the Colab notebook accompanying our hands-on introduction to MPS.

Example of using QS

Unconditional simulation using Python

Unconditional simulation using MATLAB

Conditional simulation using Python

Conditional simulation using MATLAB

Simulation with multiple Training Images using Python

Simulation with multiple Training Images using MATLAB

Multivariate simulation using Python

Multivariate simulation using MATLAB

Gap filling using Python

Gap filling using MATLAB

Downscaling using Python

Downscaling using MATLAB

3D simulation using Python

3D simulation using MATLAB


Example of using NDS

Spectral enhancement: RGB⇒IR using Python

Spectral enhancement: RGB⇒IR using MATLAB

Spectral enhancement: Pan⇒RGB using Python

Spectral enhancement: Pan⇒RGB using MATLAB

Spectral enhancement: Pan⇒IR using Python

Spectral enhancement: Pan⇒IR using MATLAB


Example of using asynchronous request to G2S

Asynchronous call using Python

Asynchronous call using MATLAB

Benchmarking

This code CAN be used for benchmarking (and I invite you to do so 😉). For benchmarking, the code needs to run natively on macOS or on Linux, using the Intel Compiler with the MKL library. The version needs to be reported, and the reported time must be the one returned by the algorithm (that is, the true execution time, excluding interface overhead).

When benchmarking, the code should NOT be used inside a Virtual Machine or through WSL on Windows 10+.

How to cite?

For QuickSampling

Gravey, M., & Mariethoz, G. (2020). QuickSampling v1.0: a robust and simplified pixel-based multiple-point simulation approach. Geoscientific Model Development, 13(6), 2611–2630. https://doi.org/10.5194/gmd-13-2611-2020

For Narrow Distribution Selection

Gravey, M., Rasera, L. G., & Mariethoz, G. (2019). Analogue-based colorization of remote sensing images using textural information. ISPRS Journal of Photogrammetry and Remote Sensing, 147, 242–254. https://doi.org/10.1016/j.isprsjprs.2018.11.003