
NCL: Task parallelism

NCL V6.5.0 introduces new functions that enable NCL scripts to execute multiple independent tasks that run in parallel with each other and concurrently with the NCL script itself. The functions are described below and illustrated in Example 3. Note, however, that task parallelism has long been achievable by other means available to users of older versions of NCL: the first two examples show how to accomplish this using Python's subprocess module to invoke multiple NCL scripts.

For discussion here, a task is any operation that can be invoked from the command line.

The new capabilities are:

    function subprocess(command[1]:string) id[1]:integer

where command is a string containing a shell command to execute. subprocess invokes the shell command and returns control to NCL immediately, without waiting for the command to complete. It returns a numeric id associated with the task.

Oftentimes in task-parallel operations, one needs to know whether the subtasks have completed, for example to make use of their results in further processing:

    function subprocess_wait(id[1]:integer, is_blocking[1]:logical) id[1]:integer

This function either waits for a subtask to finish, or determines, without waiting, whether a subtask has finished. The status of a particular subtask can be determined by providing the associated id that was returned by subprocess. Alternatively, providing a negative value for id tests the status of any subtask.

If the is_blocking argument is True, the function waits for the/any subtask to finish. If False, the function returns immediately with either the id of a subtask that has finished, or a value of -1 if the/any subtask has not completed.
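The same launch-then-wait pattern can be sketched in Python, the language used by the first two examples. This is an analogy, not NCL's implementation: Popen plays the role of subprocess (start and return a handle immediately), while wait() and poll() correspond to the blocking and non-blocking modes of subprocess_wait.

```python
import subprocess

def launch(command):
    # analogous to NCL's subprocess(): start the shell command and return
    # immediately with a handle for the running task
    return subprocess.Popen(command, shell=True)

def wait_on(proc, is_blocking):
    # analogous to NCL's subprocess_wait(): either block until the task
    # finishes, or poll and return None if it is still running
    if is_blocking:
        return proc.wait()   # blocks; returns the task's exit status
    return proc.poll()       # non-blocking; None while the task still runs

# Start a task, continue with other work, then wait for it to finish.
task = launch("echo hello > /dev/null")
status = wait_on(task, True)
```

One difference worth noting: where NCL's subprocess_wait signals a still-running task with -1, Python's poll() signals it with None.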

task_parallelism_driver_1.ncl / task_parallelism_1.py: In this example, an NCL driver script passes a list of NCL scripts to a Python script, which runs and manages them concurrently. The Python script loads the commonly available subprocess, sys, time, and os modules. Two options are set at the top of the Python script: the MAX_CONCURRENT variable sets the number of scripts that may run at once, and the POLL_INTERVAL variable sets the amount of time between checks on which scripts are actively running.
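The management loop in the Python script can be sketched roughly as follows; the actual task_parallelism_1.py may be structured differently, and the script names in the usage comment are placeholders.

```python
import subprocess
import time

MAX_CONCURRENT = 4    # maximum number of scripts running at once
POLL_INTERVAL = 0.5   # seconds between checks on the running set

def run_all(commands):
    """Run the given shell commands, at most MAX_CONCURRENT at a time."""
    pending = list(commands)
    running, finished = [], []
    while pending or running:
        # move completed tasks out of the running set
        still_running = []
        for proc in running:
            (finished if proc.poll() is not None else still_running).append(proc)
        running = still_running
        # top up the running set while under the concurrency cap
        while pending and len(running) < MAX_CONCURRENT:
            running.append(subprocess.Popen(pending.pop(0), shell=True))
        if running:
            time.sleep(POLL_INTERVAL)
    return [proc.returncode for proc in finished]

# e.g. run_all(["ncl plot_day1.ncl", "ncl plot_day2.ncl", "ncl plot_day3.ncl"])
```

The loop never holds more than MAX_CONCURRENT live processes, and each POLL_INTERVAL it reaps finished tasks and launches replacements from the pending list.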
task_parallelism_driver_2.ncl / task_parallelism_2.py: This example builds upon Example 1, adding options that are passed from the NCL driver script to the external scripts via environment variables. The environment variables passed to the Python script include the number of tasks to run at once and the amount of time to wait between checks on the number of running scripts. Other environment variables are set for use within the called NCL scripts. Note that setting an environment variable from within an NCL script requires calling export; getenv can be used to retrieve a shell environment variable from within an NCL script.
task_parallelism_driver_3.ncl / task_parallelism_3.ncl: This example illustrates the use of NCL's subprocess functions to generate a series of plots as frames for an animation. Like the examples above, it consists of a worker script that generates a plot for a single timestep, and a driver script that launches and coordinates multiple instances of the worker script, one per timestep.

In this particular example, there were several hundred timesteps of the variable. Executing several hundred instances of the worker script concurrently would overwhelm the resources of most computers, so the driver script employs a gating mechanism: it launches some manageable number of simultaneous instances of the worker script and, as those instances finish, launches further tasks. It waits for all instances to complete and then uses ImageMagick to compile the collection of plot frames into an animation.

The number of simultaneous subtasks a machine can support varies widely based upon a number of considerations. Obviously the machine's available resources will be a factor, as well as whether the tasks are compute-bound versus I/O-bound, their memory requirements, etc. For this particular example, and as a rule-of-thumb starting point, the number of simultaneous subtasks was set to be equal to the number of processor cores on the machine.

In informal timing trials, the task-parallel scripts completed the job in less than half the time of a sequential version.