Parameter Sweep User Manual

1. Introduction

Parameter sweep applications are an important class of applications. A parameter sweep is typically defined as a set of computational experiments over a set of input parameters, where each experiment is executed with its own combination of parameter values. Such applications are increasingly important in science and engineering. For example, one can explore the behavior of an airfoil by running its model many times with different values of its properties, such as speed, angle of attack and shape. Parameter sweep applications address exactly this kind of computation.

Parameter sweep applications involve a set of input parameters and files. Each parameter has its own range of values, such as the different angles of attack in the example above. Multiple computations, or tasks, are then run for different combinations of parameter values. A file may serve as input to more than one task. Each task is expected to produce some output, typically the model's output parameters, which describe the computed characteristics of the model for the given input parameters. The resulting set of all task outputs represents the result of the whole parameter sweep experiment.

The presented parameter sweep service was influenced by the Nimrod/G system. Nimrod/G uses a so-called plan file that describes the whole experiment, including the parameters, the input and output files, and the command to be executed for each task. A task is generated for each combination of parameter values by taking the Cartesian product of the parameter ranges. Our service also uses the Cartesian product, but lets users restrict the allowed parameter combinations with the constraint directive (see Plan File Syntax for details). It also lets users filter the output results and specify a criterion for finding the ‘best’ (in terms of that criterion) values of the output parameters.

To start a parametric computation, the user submits two files: the plan file and an archive with the experiment's input files (tar.gz and zip formats are currently supported). If the computation succeeds, the user can download from the server an archive with all of the results that satisfy both the filter and the criterion, if these are specified. The results are organized as folders, one per task, each containing the task's output files as well as a parameter file with the corresponding parameter values.
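As an illustration of preparing the input archive, here is a minimal Python sketch that packs a local directory into a tar.gz file with the standard tarfile module. The function name pack_inputs and the directory names are hypothetical, and the sketch assumes that the service resolves input_files paths relative to the root of the archive; the manual only states that paths refer to locations inside the archive.

```python
import tarfile
from pathlib import Path


def pack_inputs(source_dir: str, archive_path: str) -> list[str]:
    """Pack every file under source_dir into a .tar.gz archive and return the member names.

    Hypothetical helper; assumes the service expects paths relative to the archive root.
    """
    root = Path(source_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                # Store each file relative to source_dir, e.g. "myDir/file1".
                tar.add(path, arcname=path.relative_to(root).as_posix())
    # Re-open the archive to report what was stored.
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()


# Hypothetical usage: pack_inputs("experiment_inputs", "inputs.tar.gz")
```

For a zip archive, the standard zipfile module can be used in the same way.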

2. Plan File Syntax

The plan file is a plain text file that describes the computational experiment. It supports the following directives: parameter, constraint, input_files, substitute, command, output_files, filter and criterion. The constraint, substitute, filter and criterion directives are optional. Every directive except command may span multiple lines, either by continuing it after a line break or by repeating the same directive on a new line. Note that the order of the directives is significant and must match the order in which they are described below.

The plan file uses the standard $var or ${var} substitution syntax. In our service, the curly braces { and } are optional, with one exception: if the name of one variable is a prefix of another, the longer name must be enclosed in curly braces. For example, given two variables var and var1, writing $var1 without braces substitutes only the $var part (leaving a literal 1 after the value of var), which is obviously not what we want.
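The prefix pitfall can be reproduced with a small Python model. This is a hypothetical sketch, not the service's code: it assumes that bare $name references are replaced by plain text matching, in the order in which the variables are declared, which is one way to produce the behavior described above.

```python
def substitute(text: str, values: dict[str, str]) -> str:
    """Hypothetical model of the plan-file substitution rule (not the service's code)."""
    # ${name} is unambiguous, so the braced form is replaced first.
    for name, value in values.items():
        text = text.replace("${" + name + "}", value)
    # $name is replaced by plain text matching in declaration order, so "$var"
    # also matches the beginning of "$var1" when var is declared first.
    for name, value in values.items():
        text = text.replace("$" + name, value)
    return text


values = {"var": "10", "var1": "20"}
print(substitute("$var and $var1", values))    # 10 and 101
print(substitute("$var and ${var1}", values))  # 10 and 20
```

Enclosing the longer name in braces, as in the second call, is what makes the result unambiguous.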

2.1. Parameter Directive

This directive describes our experiment’s parameters. It has the following syntax:

parameter {name} {range}

Here {name} is the parameter's name and {range} is the range of its values. The {range} part has the following syntax:

from {value} to {value} step {value}

or

{list of values, separated by at least one whitespace}

The first form applies to integers and floating-point numbers only; its from, to and step values are integers or floating-point numbers, respectively. For any other values, all of the values must be listed explicitly, separated by whitespace. A value that itself contains whitespace must be enclosed in quotation marks.

Here are a few examples of this directive:

parameter i from 1 to 10 step 3

parameter d 1.23 5 -123.32 0.9

parameter f file1 file2 "my file 3"
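To show how these ranges turn into tasks, here is a minimal Python sketch that expands each {range} and takes the Cartesian product of the results. The names expand_range, params and tasks are hypothetical. The sketch assumes a positive step and an inclusive upper bound (consistent with the Example section, where from 1 to 10 step 1 yields 10 tasks), uses shlex to honor quoted values, and uses Decimal so that steps such as 0.1 do not accumulate floating-point drift.

```python
import itertools
import shlex
from decimal import Decimal


def expand_range(spec: str) -> list[str]:
    """Expand a {range} specification into its list of values (as strings)."""
    tokens = shlex.split(spec)  # splits on whitespace, honoring "quoted values"
    if len(tokens) == 6 and tokens[0] == "from" and tokens[2] == "to" and tokens[4] == "step":
        start, stop, step = (Decimal(tokens[k]) for k in (1, 3, 5))
        values = []
        current = start
        # Assumption: positive step, upper bound included.
        while current <= stop:
            values.append(str(current))
            current += step
        return values
    return tokens  # an explicit list of values


params = {
    "i": expand_range("from 1 to 10 step 3"),
    "d": expand_range("1.23 5 -123.32 0.9"),
    "f": expand_range('file1 file2 "my file 3"'),
}
# One task per combination of parameter values (Cartesian product).
tasks = [dict(zip(params, combo)) for combo in itertools.product(*params.values())]
print(params["i"])  # ['1', '4', '7', '10']
print(len(tasks))   # 48
```

Without a constraint directive, all 4 × 4 × 3 = 48 combinations become tasks.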

2.2. Constraint Directive

This is an optional directive. It imposes restrictions on the combinations of the experiment's parameters. Its syntax is as follows:

constraint {type} {constraint expressions, separated by a comma}

The constraint expressions are math expressions. They support the following operators:

+ addition
- subtraction
* multiplication
/ division
^ exponentiation
% remainder operator

< less than
<= less than or equal to
> greater than
>= greater than or equal to
= equal to
!= not equal to

and - conjunction
or - inclusive or
not or ! - negation

These expressions can also contain parentheses, standard functions such as sin, cos, log, abs and sqrt, and the names of parameters to be substituted. The {type} part specifies whether the parameters' values or the indices of these values are substituted into the expressions; it must be either value or index, respectively.

Here are a few examples of this directive:

constraint value sin($i) <= 0.5, $i - sqrt($d) > 0.01, ${i} + ${d} <= 2.34

constraint index (${f} = ${t}) and ($i <= 2)
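The following Python sketch shows one way such constraints could be checked for a single task, in both value and index mode. It is hypothetical, not the service's parser: it translates the operators listed above into Python syntax and evaluates the result with eval and empty builtins, which is not a sandbox and is only acceptable for trusted plan files. It also assumes zero-based indices (the manual does not say) and splits the list naively on commas, so functions with several arguments are not supported.

```python
import math
import re

# Functions allowed inside expressions (an assumed subset of "standard functions").
FUNCS = {name: getattr(math, name) for name in ("sin", "cos", "tan", "exp", "log", "sqrt")}
FUNCS["abs"] = abs


def substitute(expr: str, values: dict) -> str:
    """Replace ${name} and $name; each value is parenthesized so signs stay correct."""
    for name, value in values.items():
        text = "(" + str(value) + ")"
        expr = expr.replace("${" + name + "}", text).replace("$" + name, text)
    return expr


def to_python(expr: str) -> str:
    """Translate the plan-file operators into Python syntax."""
    expr = expr.replace("^", "**")                   # exponentiation
    expr = re.sub(r"!(?!=)", " not ", expr)          # '!' is negation, but keep '!='
    expr = re.sub(r"(?<![<>!=])=(?!=)", "==", expr)  # a single '=' means equality
    return expr.strip()


def satisfies(constraints: str, task: dict, ranges: dict, kind: str) -> bool:
    """Check one task (parameter name -> value) against a comma-separated constraint list."""
    if kind == "index":
        # Assumption: indices are zero-based positions within each parameter's range.
        values = {name: ranges[name].index(value) for name, value in task.items()}
    elif kind == "value":
        values = task
    else:
        raise ValueError("constraint type must be 'value' or 'index'")
    for part in constraints.split(","):  # naive split: no commas inside function calls
        code = to_python(substitute(part, values))
        # eval with empty builtins is NOT a sandbox; use only with trusted plan files.
        if not eval(code, {"__builtins__": {}}, FUNCS):
            return False
    return True


ranges = {"i": [1, 4, 7, 10], "d": [1.23, 5, -123.32, 0.9]}
print(satisfies("sin($i) <= 0.5, $i - sqrt($d) > 0.01", {"i": 4, "d": 0.9}, ranges, "value"))  # True
print(satisfies("$i <= 1", {"i": 7, "d": 5}, ranges, "index"))  # False: 7 is at index 2
```

A task generated from the Cartesian product would be kept only if this check returns True.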

2.3. Input Files Directive

This directive lists each task’s input files. Here is its syntax:

input_files {file names or paths, separated by at least one whitespace}

As with the parameter directive, file names or paths that contain whitespace must be enclosed in quotation marks. They can also be parameterized. Paths refer to locations inside the parameter sweep's input archive and fully support the glob syntax.

A file name may be prefixed with the @ sign. Such files are called substitution files; they contain parameter names (as declared in the parameter directive) written with the $var or ${var} substitution syntax. This is especially useful when the model of the experiment is a script written in some programming language: instead of dealing with args[n] notation, the script can simply use the $var or ${var} syntax, and the service performs the substitution. Here is a simple example. Assume the substitution file is a Scala script; part of it may look like this:

val v1 = $i
val v2 = $d

val result = someFunction(v1, v2)

Here i and d are the parameters from the parameter directive examples. The service scans this file for the $var or ${var} syntax and, on finding the fragment above, replaces $i and $d with the current task's values of the parameters i and d. For example, for a task with i = 7 and d = -123.32, this part of the script becomes

val v1 = 7
val v2 = -123.32

val result = someFunction(v1, v2)

The parser does not know how a parameter will be used in the file. A numeric or Boolean value can be substituted as is, but a String value must appear inside quotation marks. Since only the user knows how the script works, it is up to the user to take types into account. For example, the Scala code above works because i and d are numbers, but for a String parameter p the corresponding line must be written as val v = "$p".

Here is an example of this directive.

input_files @someFile @/myDir/$f /myOtherDir/*

Note the parameterized $f in the second entry: it is replaced by the current task's value of the parameter f from the parameter directive examples. The last path illustrates the use of the glob syntax.
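To make the resolution of input_files entries concrete, here is a hypothetical Python sketch for one task: it detects the @ prefix, substitutes parameter values, and matches the result against the archive's member names. The archive contents and the function names are made up for illustration. The sketch assumes that paths are relative to the archive root, that * never crosses a / (as in shell globs), and that ** is not supported; it uses the standard fnmatch module component by component.

```python
import fnmatch
import shlex


def glob_match(path: str, pattern: str) -> bool:
    """Match path against pattern component by component, so '*' never crosses a '/'."""
    path_parts = path.strip("/").split("/")
    pattern_parts = pattern.strip("/").split("/")
    return len(path_parts) == len(pattern_parts) and all(
        fnmatch.fnmatchcase(part, pat) for part, pat in zip(path_parts, pattern_parts)
    )


def resolve_inputs(entries: list[str], values: dict[str, str], members: list[str]) -> list[tuple[str, bool]]:
    """Return (archive member, is_substitution_file) pairs for one task."""
    resolved = []
    for entry in entries:
        is_substitution = entry.startswith("@")
        pattern = entry[1:] if is_substitution else entry
        # Parameterize the entry with the current task's values.
        for name, value in values.items():
            pattern = pattern.replace("${" + name + "}", value).replace("$" + name, value)
        matches = [m for m in members if glob_match(m, pattern)]
        if not matches:
            raise FileNotFoundError(f"no archive member matches {entry!r}")
        resolved.extend((m, is_substitution) for m in matches)
    return resolved


# Hypothetical archive contents and one task's parameter values.
members = ["someFile", "myDir/file1", "myDir/my file 3", "myOtherDir/a.txt", "myOtherDir/sub/b.txt"]
entries = shlex.split("@someFile @/myDir/$f /myOtherDir/*")
task = {"i": "7", "d": "-123.32", "f": "my file 3"}
for member, is_substitution in resolve_inputs(entries, task, members):
    print(member, is_substitution)
```

For this task, the sketch selects someFile and myDir/my file 3 as substitution files and myOtherDir/a.txt as a plain input; myOtherDir/sub/b.txt is skipped because * does not descend into subdirectories.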

2.4. Command Directive

This directive specifies the command to be executed on the computational nodes. Its syntax is as follows:

command {command}

The command directive must fit on a single line and contain exactly one command.

This directive can also be parameterized. Here is an example:

command /bin/sh MyShellScript $i $d $f

2.5. Output Files Directive

These are the task's output files. As with the previous directive, they can be parameterized. The syntax of this directive is as follows:

output_files {file names, separated by at least one whitespace}

The listed files (for example, outputFile1, outputFile2 and "output file 3" in the example of this directive below) must be produced by the command from the previous directive, which is usually a script. In the command directive example, these files must be generated by MyShellScript inside the task's directory.

Some of the output files may contain the task's outputs, i.e. its named result values. Files that contain outputs are marked with the @ prefix in this directive, and each output must be written on its own line in the following form:

outputParameter1 = outputParameter1Value
outputParameter2 = outputParameter2Value
outputParameter3 = outputParameter3Value

Such files may contain other text as well. Notes may follow the value on the same line or appear on lines of their own, as long as every output line starts with the syntax shown above. The parser reads the file line by line, collects an output from each line that starts with this syntax, and ignores all other lines.

Here’s an example of the ‘outputFile1’ file:

x = 1 // some comment
y = 3.45
another comment
z = 10e12 // comment as well

In this example, parameters x, y and z are the task’s outputs.
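The line-by-line rule can be sketched in Python with a regular expression. This is an illustration, not the service's parser: it assumes that an output name is an identifier at the very start of the line, that the value is the first whitespace-delimited token after =, and that values which are not numbers are skipped. Per the uniqueness rule, a repeated name is treated as an error.

```python
import re

# Assumption: "name = value" at the start of a line; anything after the value is a note.
OUTPUT_LINE = re.compile(r"([A-Za-z_]\w*)\s*=\s*(\S+)")


def parse_outputs(text: str) -> dict[str, float]:
    """Collect numeric outputs from an output file's text."""
    outputs: dict[str, float] = {}
    for line in text.splitlines():
        match = OUTPUT_LINE.match(line)  # match() anchors at the start of the line
        if match is None:
            continue  # a note or comment line: ignored
        name, raw_value = match.groups()
        try:
            value = float(raw_value)
        except ValueError:
            continue  # assumption: non-numeric values are not treated as outputs
        if name in outputs:
            raise ValueError(f"output {name!r} is listed more than once")
        outputs[name] = value
    return outputs


sample = """x = 1 // some comment
y = 3.45
another comment
z = 10e12 // comment as well
"""
print(parse_outputs(sample))  # {'x': 1.0, 'y': 3.45, 'z': 10000000000000.0}
```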

Here’s an example of this directive:

output_files @outputFile1 outputFile2 @"output file 3" @$f

Output names must be unique: different output files must not list the same output.

2.6. Filter Directive

This directive is optional. It operates on the task outputs described in the previous section. Only the tasks that satisfy all of the filter expressions are kept and, if a criterion directive is present, passed on to it. Its syntax is as follows:

filter {filter expressions, separated by a comma}

A filter expression has the same syntax as a constraint expression, with one difference: it refers to task outputs instead of parameter names. Because output names are unique, the service simply searches all of the task's @ output files until it finds the referenced output.

Here is the example of this directive:

filter $x - sin($y) >= 10.56, $z <= 10000, sqrt($x) >= 10

Here x, y and z are the outputs from outputFile1, which is listed in the output files directive example.
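The filtering step can be sketched in Python as follows. To keep the sketch short, the example filter above is hand-translated into Python predicates rather than parsed from text, and the task names and output values are made up. The merge_outputs helper models the lookup across a task's @ output files and rejects duplicate names, as the uniqueness rule requires.

```python
import math

Outputs = dict[str, float]


def merge_outputs(per_file: dict[str, Outputs]) -> Outputs:
    """Combine the outputs parsed from a task's @ output files (names must be unique)."""
    merged: Outputs = {}
    for file_name, outputs in per_file.items():
        for name, value in outputs.items():
            if name in merged:
                raise ValueError(f"output {name!r} from {file_name!r} is listed more than once")
            merged[name] = value
    return merged


# The filter example above, hand-translated into Python predicates (no parsing).
FILTERS = [
    lambda o: o["x"] - math.sin(o["y"]) >= 10.56,
    lambda o: o["z"] <= 10000,
    lambda o: math.sqrt(o["x"]) >= 10,
]


def apply_filter(tasks: dict[str, Outputs]) -> dict[str, Outputs]:
    """Keep only the tasks whose outputs satisfy every filter expression."""
    return {task: o for task, o in tasks.items() if all(check(o) for check in FILTERS)}


# Made-up outputs for two tasks.
tasks = {
    "task1": merge_outputs({"outputFile1": {"x": 144.0, "y": 0.0, "z": 500.0}}),
    "task2": merge_outputs({"outputFile1": {"x": 1.0, "y": 3.45, "z": 1e13}}),
}
print(list(apply_filter(tasks)))  # ['task1']
```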

2.7. Criterion Directive

This directive is optional as well. It is applied only to the tasks that satisfy the filter directive, if one is present. Its result is the set of all tasks that maximize or minimize (depending on the {type} part) the criterion function. Its syntax is as follows:

criterion {type} {criterion function}

The {type} part must be either max or min (case-sensitive). Criterion function is a math expression over the task’s outputs.

Here’s an example of this directive:

criterion max $x^2 - sqrt($y) + sin(2*$x)%5
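Because the result is every task that attains the best value, ties are all kept. Here is a minimal Python sketch of that selection. The criterion is passed as a Python function instead of being parsed from the plan file, and the task names and affinity values are made up, loosely following the Example section. Ties are detected by exact equality of the computed values.

```python
from typing import Callable

Outputs = dict[str, float]


def select_best(tasks: dict[str, Outputs], kind: str, criterion: Callable[[Outputs], float]) -> list[str]:
    """Return every task whose criterion value is the maximum (or minimum); ties are all kept."""
    if kind not in ("max", "min"):  # case-sensitive, as in the plan file
        raise ValueError("criterion type must be 'max' or 'min'")
    if not tasks:
        return []
    scores = {task: criterion(outputs) for task, outputs in tasks.items()}
    best = max(scores.values()) if kind == "max" else min(scores.values())
    # Exact comparison: only identical values count as ties.
    return [task for task, score in scores.items() if score == best]


# Made-up outputs of tasks that passed the filter.
tasks = {"ligand1": {"affinity": -7.2}, "ligand2": {"affinity": -8.1}, "ligand3": {"affinity": -8.1}}
print(select_best(tasks, "min", lambda o: o["affinity"]))  # ['ligand2', 'ligand3']
```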

3. Example

To put things together, here is an example plan file for an AutoDock Vina computation. AutoDock Vina is a well-known molecular docking program. This parameter sweep runs 10 docking tasks with different ligands and finds the tasks with the minimum affinity (energy). Here is the plan file:

parameter n from 1 to 10 step 1

input_files @run.sh  vina  write_score.py  protein.pdbqt  ligand${n}.pdbqt  config.txt

command ./run.sh

output_files ligand${n}_out.pdbqt  log.txt  @score

criterion min $affinity