Everest Tutorial

1. Introduction

The ability to effortlessly reuse and combine existing computational tools and computing resources is an important factor influencing research productivity in many scientific domains. Everest is a new distributed computing platform that addresses this problem by supporting publication, execution and composition of applications running across distributed computing resources.

In contrast to traditional software, Everest follows the Platform as a Service cloud delivery model by providing all its functionality via remote web and programming interfaces. A single instance of the platform can be accessed by many users in order to create, run and share applications with each other without the need to install additional software on users’ computers.

Any application ported to Everest can be accessed via web user interface or unified REST API. The latter enables integration with external systems and composition of applications. A Python API is implemented on top of REST API in order to support writing programs that access applications and combine them in arbitrary workflows.

Another distinct feature of Everest is the ability to run applications on arbitrary sets of external computing resources. A computing resource can be attached to the platform by any user. An application developer can bind one or multiple resources to an application. It is also possible to manually specify resources for a single application run.

The aim of this tutorial is to provide first-hand practical information on using Everest, including:

Adding applications to Everest
Running applications via web user interface
Attaching computing resources to Everest
Binding resources to applications
Using Python API to run and combine applications from Python programs
Running parameter sweep applications with Everest

2. Preparation

2.1. Software Requirements

Check that your computer meets the following requirements:

Recent version of Chrome or Firefox browser
SSH Client (in Windows use PuTTY)
Python 2.7

2.2. Everest account

Open https://everest.distcomp.org/ in a browser. Ignore SSL certificate warnings.

Log in with your username and password. If you don’t have an Everest user account, register using the Sign Up link.

This web interface is referred to below as Web UI.

2.3. Tutorial files

Download tutorial.zip and unpack it to some directory.

This directory is referred to below as TUTORIAL_HOME.

3. Adding applications

3.1. Introduction

Applications are the main entities in Everest - any computation is performed in the context of some application. Clients interact with applications by sending requests and receiving back results.

All Everest applications follow the same abstract model:

An application has a number of input parameters that constitute a valid request
An application has a number of output parameters that constitute a result of computation corresponding to some request

You can think of an application as a "black box" with some input and output ports or as a "function" with some arguments and return values. Just like pure functions, applications usually process each request independently from other requests in a stateless fashion.

In order to add an application to Everest you should provide an application description that consists of two parts:

Public information that is used by clients in order to discover application and interact with it, e.g., specification of input and output parameters.
Internal configuration that is used by Everest in order to process requests to the application and generate results, e.g. command template, file mappings, resource bindings, etc.

In this part of the tutorial we will add AutoDock Vina, a well-known molecular docking program, to Everest.

3.2. Adding application

To add an application to Everest navigate to Applications, click New application button on the top right corner and select Create application. You will be presented with a form divided into several tabbed sections:

Metadata - general information about an application
Inputs - description of application input parameters
Outputs - description of application output parameters
Configuration - settings that define how to process a request to the application
Files - files that are required in order to run the application
Resources - settings that control on what resources the application can run
Access - access control settings

Normally when adding a new application to Everest you should fill out this form manually. But to save time we will take a shortcut and import our application description from a prepared JSON file. Click Cancel to close the form.

Click the New application button again, but this time select the Import from JSON item. Choose the file TUTORIAL_HOME/vina/vina.json. After the import is completed you should see the following message:

Import completed: created application AutoDock Vina.

Select My apps checkbox on top of the applications list and check that AutoDock Vina is present in the list.

3.3. Application description

Let’s examine the application description. Click on AutoDock Vina and then on the Edit button in the top right corner. You will see the same form as before, but this time it is filled out. Let’s walk through the form sections.

3.3.1. Metadata

Name

A name of the application.

Annotation

A brief description of the application (limited to 200 characters).

Both name and annotation are required metadata fields.

Keywords

An optional list of keywords that can help users to find the application. Keywords are separated by commas.

Description

This field can be used to provide detailed information about the application.

You can use Markdown markup in this field.

3.3.2. Inputs

This section contains a list of application’s input parameters or simply "inputs". Each parameter can be expanded to examine its description by clicking on it.

Everest relies on JSON Schema attributes for parameter description.

Parameter description includes the following attributes.

Title

User-friendly parameter name displayed to users.

Description

Textual description of the parameter.

Type

Type of parameter values.

The following types are supported:

string
integer
number (floating point number)
boolean
array

There is no special "file" type. Instead you can use URI strings to support passing files as parameter values (see below).

Item Type

Type of array items.

The following types are supported:

string
integer
number

Everest doesn’t support multidimensional arrays as parameters. However, you can use strings or files to pass such arrays in any format you like.

Format

Format of string values.

Everest supports the following formats:

uri - a valid URI, according to RFC3986

If you want to be able to pass a file as an input value, use a URI string. In this case Everest will generate a form with a file upload control and will automatically insert the uploaded file’s URI as the parameter value. During request processing Everest will also automatically pass the file referenced by the URI to the application. For example, see the receptor and ligand inputs of the AutoDock Vina application.

Pattern

A regular expression for validation of string values. A string instance is considered valid if the regular expression matches the instance successfully.

The regular expression syntax used is from JavaScript (ECMA 262, specifically). For example, to check that a string contains a 5-digit number use expression \d{5}.

Do not use literal syntax like /\d{5}/ or escape backslashes like \\d{5}.

Regular expressions are not anchored by default, i.e. they match anywhere in the string unless explicitly anchored with the ^ and $ symbols. For example, to check that a string contains a 5-digit number and ONLY a 5-digit number, use expression ^\d{5}$.
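The difference between unanchored and anchored patterns can be checked quickly in Python. This is only an illustration: Everest evaluates patterns with JavaScript (ECMA 262) semantics, while the sketch below uses Python's re module, which behaves the same way for these simple expressions. The helper name is_valid is hypothetical.

```python
import re

def is_valid(pattern, value):
    """Return True if the pattern matches anywhere in value (unanchored search)."""
    return re.search(pattern, value) is not None

# Unanchored: any string containing five consecutive digits is accepted.
print(is_valid(r"\d{5}", "zip 12345 ok"))    # True
print(is_valid(r"\d{5}", "1234"))            # False

# Anchored: the whole string must consist of exactly five digits.
print(is_valid(r"^\d{5}$", "zip 12345 ok"))  # False
print(is_valid(r"^\d{5}$", "12345"))         # True
```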

Multiline

Checkbox controlling generation of the web form input for string parameter:

Checked - use textarea (multiline) input
Unchecked - use standard (single line) input

Min

Minimum valid value of numeric (integer or number) parameter.

Max

Maximum valid value of numeric parameter.

Values (enum)

List of valid parameter values for string or numeric inputs.

Each value is accompanied by a title which is used to generate a dropdown list in the web form.

Default Value

Default parameter value that should be presented to a user.

Required

Checkbox controlling whether the input is required to be present (checked) or can be omitted (unchecked) in a request.
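
As a rough illustration of how these attributes constrain a value, the sketch below checks a value against a description written with the standard JSON Schema keywords (type, pattern, minimum, maximum, enum). The check_value helper and the example description are hypothetical; the exact JSON layout Everest uses for application descriptions is not shown in this tutorial, and Everest's own validation may differ in detail.

```python
import re

def _has_type(value, type_name):
    """Check a Python value against a JSON Schema type name."""
    if type_name == "boolean":
        return isinstance(value, bool)
    if isinstance(value, bool):
        # In Python, True/False are also ints; do not accept them as numbers.
        return False
    if type_name == "string":
        return isinstance(value, str)
    if type_name == "integer":
        return isinstance(value, int)
    if type_name == "number":
        return isinstance(value, (int, float))
    if type_name == "array":
        return isinstance(value, list)
    return False

def check_value(value, schema):
    """Return a list of problems; an empty list means the value is valid."""
    if "type" in schema and not _has_type(value, schema["type"]):
        return ["expected type %s" % schema["type"]]
    problems = []
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if "pattern" in schema and isinstance(value, str):
        if re.search(schema["pattern"], value) is None:
            problems.append("does not match pattern %s" % schema["pattern"])
    if "minimum" in schema and is_number and value < schema["minimum"]:
        problems.append("below minimum %s" % schema["minimum"])
    if "maximum" in schema and is_number and value > schema["maximum"]:
        problems.append("above maximum %s" % schema["maximum"])
    if "enum" in schema and value not in schema["enum"]:
        problems.append("not an allowed value")
    return problems

# Hypothetical description of an integer input with a lower bound.
exhaustiveness = {"title": "Exhaustiveness", "type": "integer", "minimum": 1}

print(check_value(1, exhaustiveness))
print(check_value(0, exhaustiveness))
```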

3.3.3. Outputs

This section contains a list of application’s output parameters or simply "outputs". Each parameter can be expanded to examine its description by clicking on it.

Output description uses the following attributes:

Title
Description
Type
Format
Required

These attributes have the same meaning as the corresponding attributes in the Inputs section.

If you want to return a file as an output value, use a URI string. In this case Everest will automatically generate the file URI and insert it as the output value. For example, see the output and log outputs of the AutoDock Vina application.

3.3.4. Configuration

This section contains configuration settings that tell Everest how to process a request to the application, namely:

How to map input values from the request to an executable task (command and input files) to be run on a resource
How to generate output values from output files produced by a task

Command

A string template for mapping input values to a task command.

Input values can be referred to in a command template as ${param}. That means that each occurrence of ${param} in the template is substituted by the value of input parameter param.

This field is required.

For URI inputs the parameter is substituted by the path to the downloaded file instead of the original URI value.

The command template implements special semantics for substitution of optional inputs, based on common practice for specifying optional command-line arguments. For example, consider the following template:

run.sh -o ${param1} --option2 ${param2} --${param3} -O${param4} -x=${param5}

This template illustrates different ways of passing optional arguments. If any of the inputs used is not set, it is removed from the resulting command along with the corresponding prefix. So if none of the five parameters is set, the resulting command will be simply run.sh.

Note that if an input is not placed in the template in one of the ways described above and it is not set, it will not be substituted in the template. For example, the resulting command for template run.sh ${param1} 100 with unspecified param1 will be the same run.sh ${param1} 100. The rationale is that in this case we have a positional argument, which can be dangerous to remove. So make sure that you pass all optional inputs as optional command-line arguments.
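
The substitution rules described above can be sketched in Python as follows. This is an illustration of the described behaviour only, not Everest's actual implementation: the render_command function and its token-based approach are assumptions, and the template uses the ${param} placeholder form throughout.

```python
import re

PLACEHOLDER = re.compile(r"\$\{(\w+)\}")

def render_command(template, inputs):
    """Substitute ${name} placeholders and drop unset optional arguments."""
    out = []  # (text, is_literal_flag) pairs
    for token in template.split():
        names = PLACEHOLDER.findall(token)
        if not names:
            # Plain text; remember whether it looks like an option flag.
            out.append((token, token.startswith("-")))
        elif all(name in inputs for name in names):
            text = PLACEHOLDER.sub(lambda m: str(inputs[m.group(1)]), token)
            out.append((text, False))
        elif token.startswith("-"):
            # Forms like --${p}, -O${p}, -x=${p}: drop prefix and placeholder.
            continue
        elif out and out[-1][1]:
            # Form "-o ${p}": drop the preceding flag together with the value.
            out.pop()
        else:
            # Positional argument: left untouched, as described above.
            out.append((token, False))
    return " ".join(text for text, _ in out)

TEMPLATE = ("run.sh -o ${param1} --option2 ${param2} "
            "--${param3} -O${param4} -x=${param5}")

print(render_command(TEMPLATE, {}))
print(render_command(TEMPLATE, {"param1": "out.txt", "param4": 3}))
print(render_command("run.sh ${param1} 100", {}))
```

With no inputs set the first call yields just run.sh, while the last call returns the template unchanged because param1 is positional.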

Input Mappings

Input mappings define how input values map to task input files. Each mapping consists of a pair:

Input - name of input parameter
File - name of input file to store input value

Pattern attribute is reserved for future purposes.

For each task Everest creates a directory where all input files are placed and command is run.

By default all files referenced in URI inputs are downloaded to a task working directory under their original file names. However, if URI input is present in input mappings, the specified file name is used instead. For all other parameter types the input value is stored in a file only if the input is specified in input mappings.

It is possible to map a single input to multiple files if needed. However, mapping multiple inputs to a single file is not supported.

Output Mappings

Output mappings define how task output files map to output values. Each mapping consists of a pair:

Output - name of the output parameter
File - path to the corresponding output file relative to the task working directory

You can use Unix shell-style wildcards in file paths, e.g. results/out*.pdbqt. In case there are multiple output files matching the pattern, the first matched file will be used.

For URI outputs Everest automatically generates URI referring to the corresponding file and returns it as the output value. For all other parameter types an output file content is used as the output value.

Everest automatically stores stdout and stderr outputs of a task in files called stdout and stderr. These files can be used in output mappings.

It is possible to map a single file to multiple outputs if needed. However, mapping multiple files to a single output is not supported.
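
To see how a wildcard in an output mapping selects a file, here is a small Python sketch based on the standard glob module. Which file counts as "first" when several match is not specified in this tutorial; the sketch sorts the matches alphabetically, which is an assumption, and the first_match helper is hypothetical.

```python
import glob
import os
import tempfile

def first_match(task_dir, pattern):
    """Return the first file in task_dir matching pattern, or None."""
    matches = sorted(glob.glob(os.path.join(task_dir, pattern)))
    return matches[0] if matches else None

# Build a throwaway "task working directory" with two candidate outputs.
task_dir = tempfile.mkdtemp()
os.mkdir(os.path.join(task_dir, "results"))
for name in ("out2.pdbqt", "out1.pdbqt"):
    open(os.path.join(task_dir, "results", name), "w").close()

chosen = first_match(task_dir, "results/out*.pdbqt")
print(os.path.basename(chosen))
```

If your application can produce several files matching a pattern, it is safer to make the pattern specific enough to match exactly one of them.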

3.3.5. Files

The Files section contains additional files that should be placed in a task working directory before running the application. These can be executables, scripts or common data files.

By default application files can be accessed only by the application owner. If you want to share any file with other users, set the Public checkbox for this file. In this case you can place a link to the file in the application description or elsewhere and all Everest users will be able to download the file.

3.3.6. Resources

This section contains settings that control what resources can be used to run an application:

Resources field is used to specify a predefined list of resources that should be used for running application tasks. Here you can specify any resources that you have access to.
Override Resources setting enables users to manually select arbitrary resources before running an application. For example, a user can use their own resource for running the application.

In this tutorial we will keep resources list of our application empty until Chapter 6 where we will return to these settings.

3.3.7. Access

This section contains settings that define access control policy for the application.

Allow List field is used to specify a list of users and groups that are allowed to run the application. You can refer to users by their usernames and to groups by their names prefixed by @ (e.g., @myGroup). List elements should be separated by commas.
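
The allow list syntax can be illustrated with a few lines of Python. The parse_allow_list helper and the usernames are hypothetical and only demonstrate the format described above (comma-separated entries, groups prefixed by @).

```python
def parse_allow_list(text):
    """Split a comma-separated allow list into (users, groups)."""
    users, groups = [], []
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue  # ignore empty entries, e.g. from a trailing comma
        if item.startswith("@"):
            groups.append(item[1:])
        else:
            users.append(item)
    return users, groups

users, groups = parse_allow_list("alice, bob, @myGroup")
print(users)
print(groups)
```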

The application owner always has access to the application, so there is no need to include the owner in the allow list. By default the allow list is empty, which means that the application can be run only by its owner.

You can create arbitrary user groups in Everest in order to simplify sharing of applications and resources with multiple users. See Groups section in the top navigation.

Public setting controls the visibility of the application in Applications list:

Checked - everyone can see this application (but not necessarily run it)
Unchecked - only users and groups from allow list can see this application

If you plan to use the application for personal purposes or provide it to some private group of users, leave Public setting unchecked. However, if you want to increase awareness and promote your application among users, make the application public.

Jobs Auto-Share setting enables automatic sharing of all application jobs with specified users or groups. As described in Chapter 4, by default jobs are accessible only by users who submitted them. This setting can be used to avoid manual job sharing when you want all runs to be accessible by specific users.

3.3.8. Modifying and saving application

Our Autodock Vina application is missing one important piece of configuration - an executable file. It is not imported from the JSON file, so we need to add it manually.

Navigate to Files section. Click + button to add a file. Select file TUTORIAL_HOME/vina/vina. Check that the file is uploaded (file size should be 1.79 MB).

Click Save to save the application description.

Attaching executable files and scripts to an application has pros and cons that are worth discussing.

On the one hand, this saves one from manually deploying these files on each resource since Everest automatically ships the files to resources. This means that such application can be easily used with arbitrary resources.

On the other hand, resources are heterogeneous and making portable binaries is a non-trivial task even if we consider only Linux machines. In our example we rely on AutoDock Vina being distributed as a single Linux binary. However, if the application you add to Everest is complex or has many dependencies, the required software should be installed on a resource before running the application on it.

Attaching executable files to an application also raises security issues if users are allowed to select resources to run the application. Currently Everest doesn’t implement any sandboxing of applications while running them on resources. So in this case a user has to trust the application owner that these executable files will not do any harm when running on the user’s resource.

3.4. Application page

You are returned to the main page of our application. You can always navigate to this page by clicking on the application name in Applications list. The application page has four tabbed sections.

3.4.1. About

Here you will find all the basic information about the application (except its parameters) including annotation, keywords, usage statistics, as well as the full text description.

The application state can be READY or OFFLINE. READY means that the application can be run (provided you have access to it), while OFFLINE means that all application resources are offline and you cannot run it at the moment. READY on a green background means that the application has attached resources that are online, while READY on a yellow background means that the application can be run only if you provide your own resources.

3.4.2. Parameters

This section provides detailed information about input and output parameters of the application.

3.4.3. Submit Job

This section contains a form for running the application by specifying its input parameters (see the next part).

This section is inactive if the user is not allowed to run the application.

3.4.4. Discussion

This section enables users to write comments and discuss the application.

4. Running applications via Web UI

4.1. Introduction

From the user’s viewpoint running an application basically means sending it a request containing input values and waiting for a result containing corresponding output values. For each request Everest creates a new job which can be used to track the status of the request and to collect the result. So "running an application" and "submitting a job" have the same meaning.

Internally each job is composed of one or more tasks that represent units of computation sent for execution to resources. Tasks are generated from job inputs in accordance with the application configuration. Currently Everest users can create only applications with jobs consisting of a single task. However there is a special Parameter Sweep application described in Chapter 8 that runs jobs containing multiple tasks.

4.2. Running AutoDock Vina

Open AutoDock Vina application and switch to Submit Job section.

Specify your job name, e.g. "First Vina Run". By default jobs are named after the application, so custom job names help to distinguish different runs.

Specify the following values for input parameters:

Receptor: select file TUTORIAL_HOME/vina/protein.pdbqt
Ligand: select file TUTORIAL_HOME/vina/ligand.pdbqt
Center X: 11
Center Y: 90.5
Center Z: 57.5
Size X: 22
Size Y: 24
Size Z: 28
Exhaustiveness: 1

Since we didn’t specify default resources in the application configuration, we should manually select at least one resource to run our job. In Resources input select test.

Resource test is a special resource for demo purposes available for all Everest users. It runs Ubuntu 12.04 with Python. You can use it to run any applications as long as they don’t require installation of additional libraries or packages. However, since this resource has a limited capacity, it is recommended to run applications on external resources (see Chapter 5).

You can also ask Everest to send an email when the job is completed by checking Email Notification box.

Click Submit button.

If Submit button is inactive then one or more of the following happened:

You didn’t specify values for some of required inputs (see inputs highlighted in red)
You specified incorrect values for some inputs (see inputs highlighted in red)
You didn’t specify resources and application doesn’t have default resources

After the job submission you will be redirected to the job page which has the following sections.

4.2.1. Job Info

This section contains basic information about the job including used application, job state, submission and completion times.

During its lifetime the job can be in the following states:

SUBMITTED - the job is accepted and stored for further processing
DEFERRED - the job is accepted but will be processed after the planned server restart
READY - the job is processed, job tasks are generated and ready for scheduling
SCHEDULED - the job tasks are scheduled to resources
RUNNING - the job tasks are running on resources
DONE - the job has completed successfully and its outputs are available
FAILED - the job has failed
CANCELLED - the job has been cancelled

Info field is used to provide additional information about the job, e.g., error message for a failed job.
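
When tracking jobs from a script it helps to distinguish states in which the job is still in progress from those in which it has finished. The sketch below encodes the states listed above; treating DONE, FAILED and CANCELLED as final is an inference from their descriptions, and the helper names are hypothetical.

```python
# Job states in the order listed above.
JOB_STATES = ("SUBMITTED", "DEFERRED", "READY", "SCHEDULED",
              "RUNNING", "DONE", "FAILED", "CANCELLED")

# Assumption: once a job reaches one of these states it no longer changes.
FINAL_STATES = frozenset(["DONE", "FAILED", "CANCELLED"])

def is_finished(state):
    """Return True if the job is in a final state."""
    if state not in JOB_STATES:
        raise ValueError("unknown job state: %r" % (state,))
    return state in FINAL_STATES

def has_outputs(state):
    """Only a successfully completed job has its outputs available."""
    return state == "DONE"

for state in JOB_STATES:
    print("%-10s finished=%s outputs=%s"
          % (state, is_finished(state), has_outputs(state)))
```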

4.2.2. Inputs

This section contains input values that were specified during the job submission. It serves as a reminder, enables provenance and facilitates sharing the job with other users.

4.2.3. Outputs

This section contains output values comprising the job result. Normally it is active only after the job is completed.

4.2.4. Share

By default a job and its files can be accessed only by the user who submitted it. This section enables you to share the job with other users. Specify the desired users or groups in Allow List and click Save. After that you can send a link to the job to these users and they can open it, examine inputs/outputs, download job files, or even resubmit the job while possibly changing some inputs.

You can see the list of all your jobs by clicking Jobs in the top navigation. You can use Filters button to filter jobs by their names, states or submission time. After the job is completed and you don’t need its results, it is recommended to delete the job using the Delete button. Deleting jobs helps both to free disk space and to clean up your Jobs list.

5. Attaching computing resources

5.1. Introduction

Everest doesn’t provide its own computing infrastructure to run application tasks, nor does it provide access to some fixed external infrastructure like grid. Instead Everest enables users to attach to it any external computing resources and to run applications on arbitrary sets of these resources. From this point of view Everest can be seen as a multitenant metascheduling service.

Currently the preferred method for attaching a resource to Everest is based on using a special program called agent. The agent runs on the resource and acts as a mediator between it and Everest. This method has one drawback - it requires deployment of the agent on the resource. However, it also brings a number of advantages in comparison to plain SSH access, such as support for resources behind a firewall and stricter security policies. Also, as we will see soon, the agent has minimal requirements and is easy to install.

The agent supports integration with various types of resources via adapter mechanism. At the moment the following adapters are implemented:

local - Running tasks on a local server
torque - Running tasks on a TORQUE cluster (agent is running on a submission host)
slurm - Running tasks on a SLURM cluster (agent is running on a submission host)
docker - Running tasks on a local server inside docker containers

Agent repository contains agent’s source code, while detailed instructions on installing and using agent can be found in Agent User Manual.

In order to complete this part of the tutorial you will need access to a Linux server and/or cluster. The cluster should use the TORQUE or SLURM job manager. Root access is not required. If you don’t have access to such resources or the skills needed to complete this part, you can skip it and keep using the test resource in the subsequent parts. Another alternative is to ask a colleague to attach a resource to Everest and share it with you.

5.2. Creating new resource

Before attaching resource to Everest you should register it and obtain a resource token which is used for agent authentication.

In Web UI navigate to Resources section and click on New resource button. In Name field enter the name of your new resource. Click Save button.

You are redirected to the resource page. Note that the Token field contains an alphanumeric string. This string should be passed to the agent running on this resource.

5.3. Attaching server

Make sure the server satisfies the following requirements:

Linux
Python 2.6 or 2.7
Outbound connectivity with Everest server (HTTPS, port 443)

While the agent supports attaching Windows machines, in this tutorial you will need a Linux server since we are going to use it for running Vina application with Linux binary.

Open SSH connection to the server.

All commands in SSH session below are prefixed with $.

5.3.1. Install agent

Follow instructions in Section 2 of Agent User Manual to install the agent and its dependencies on the server.

The directory with installed agent will be referred below as AGENT_HOME.

5.3.2. Configure agent

Copy default configuration template to AGENT_HOME/conf/agent.conf:

$ cd AGENT_HOME
$ mkdir conf
$ cp everest_agent/agent.conf.default conf/agent.conf

Open AGENT_HOME/conf/agent.conf in your favorite editor. This file contains configuration parameters of the agent.

Open in Web UI the resource you created in section 5.2 and copy Token value to clipboard. Paste the token inside the agent configuration file in the place marked as AGENTTOKEN (protocol.activeMode.agentToken parameter), e.g.

"agentToken": "exmpl52z3i0ap60508jyn66tkof23njds",

You can also adjust the value of resource.maxTasks parameter which controls the maximum number of tasks processed simultaneously by the agent.

Note also the value of security.allowedCommands parameter. It contains a list of commands allowed for execution by the agent specified as regular expressions. When the agent receives a new task from Everest it checks the command specified in the task against expressions from this list. The command should match at least one expression, otherwise the task is rejected. For the purpose of this tutorial we will allow agent to run any command by using the default setting:

"allowedCommands": [".*"]

Allowing the agent to run any command has important security implications if you plan to run applications from other users. You should trust applications that you run on your resource, especially applications that include executables, such as our Vina application. If you use the resource to run a limited set of applications that don’t include executables, you can restrict the agent by listing in allowedCommands only the commands these applications use.
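
The check described above ("the command should match at least one expression") can be sketched in Python as follows. How exactly the agent applies the expressions (for example, whether a match must start at the beginning of the command or cover it entirely) is not specified here; the sketch assumes matching from the start of the command with re.match, and the restricted pattern is a hypothetical example.

```python
import re

def is_command_allowed(command, allowed_patterns):
    """Return True if the command matches at least one allowed expression."""
    return any(re.match(p, command) for p in allowed_patterns)

# Default setting from this tutorial: every command is accepted.
permissive = [".*"]

# Hypothetical restricted list: only the bundled vina binary, with arguments.
restricted = [r"\./vina( .*)?$"]

print(is_command_allowed("rm -rf /", permissive))
print(is_command_allowed("./vina --receptor protein.pdbqt", restricted))
print(is_command_allowed("rm -rf /", restricted))
```

Note that inside the JSON configuration file a backslash in a regular expression has to be written as \\.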

Keep the default values for all other agent settings and save the agent configuration file.

You can find the detailed description of all agent settings in Section 3 of Agent User Manual.

5.3.3. Start agent

Start the agent by running the following script in AGENT_HOME:

$ ./bin/start.sh

Tail agent log and check that the agent is connected to Everest:

$ tail -f log/agent.txt

You should see a message like this:

[everest_agent.agent] [INFO] [2016-07-01 20:19:51,596] Connected to server

In Web UI refresh resource page and check that your resource has ONLINE state.

5.3.4. Test resource

To test our new resource we will resubmit the previous AutoDock Vina job on it.

Open Jobs list, find your first AutoDock Vina job, open it and click Resubmit. This time instead of test resource select your new resource in Resources input. Click Submit button.

Make sure the job is completed successfully.

5.3.5. Stopping agent

Stop the agent by running the following script in AGENT_HOME:

$ ./bin/stop.sh

In Web UI open Resources list and check that your resource has OFFLINE state (red circle).

5.4. Attaching cluster

Repeat the same steps as for the server with the following differences.

When editing the agent configuration file change the resource.taskHandler.type parameter value to torque or slurm. Also add the resource.taskHandler.queue parameter with the name of the cluster queue you want the agent to use. For example:

"taskHandler": {
  "type": "torque",
  "queue": "batch"
},

You can also adjust the value of resource.maxTasks parameter to set the maximum number of tasks submitted by the agent to the queue.
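
Putting the settings from this chapter together, the relevant part of the agent configuration might look like the structure built below. The nesting is inferred from the dotted parameter names used in this tutorial (for example, resource.taskHandler.type); the token, queue name and maxTasks value are placeholders, and the get_setting helper is hypothetical. Check the template shipped with the agent and the Agent User Manual for the authoritative layout.

```python
import json

# Assumed layout: each dot in a parameter name corresponds to one level
# of nesting in the JSON configuration file.
config = {
    "protocol": {"activeMode": {"agentToken": "PASTE_YOUR_TOKEN_HERE"}},
    "resource": {
        "maxTasks": 4,
        "taskHandler": {"type": "torque", "queue": "batch"},
    },
    "security": {"allowedCommands": [".*"]},
}

def get_setting(conf, dotted_name):
    """Look up a value by dotted name, e.g. 'resource.taskHandler.type'."""
    value = conf
    for key in dotted_name.split("."):
        value = value[key]
    return value

print(json.dumps(config, indent=2, sort_keys=True))
print(get_setting(config, "resource.taskHandler.type"))
```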

6. Binding resources to applications

6.1. Introduction

Everest supports flexible binding of resources to applications. It is possible to configure a static set of resources that should be used by Everest to run application tasks. The application owner can also enable dynamic resource binding, where the user manually selects resources for running a job.

In both scenarios it is also possible to specify multiple resources and let Everest schedule application tasks across these resources.

6.2. Configuring AutoDock Vina

In this section we will try out different options for resource binding supported by Everest.

6.2.1. Bind single resource with resource override

Let’s bind one of attached resources to Autodock Vina application.

Open Autodock Vina application, click Edit and open Resources section. In Resources input select one of resources you attached in Section 5. Click Save.

Open Submit Job tab and check that Resources input has the following comment:

The application has 1 default online resource(s).
You can also select another resource(s) below to run your job.

If the attached resource is offline (agent is stopped) you will see another comment:

The application doesn't have default online resources.
Please select at least one resource below to run your job.

If you have enabled resource override and let other users run your application, make sure users know what resources are suitable to run the application. For example, you could specify resource requirements (architecture, software, etc.) in the application description.

6.2.2. Disable resource override

Now let’s prevent application users from selecting other resources to run jobs.

Open Autodock Vina application, click Edit and open Resources section. Deselect checkbox Override Resources. Click Save.

Open Submit Job tab and check that resource selection input has disappeared.

If the attached resource is offline (agent is stopped) you will see the following warning:

Unfortunately, all application's resources are unavailable.
Since the application doesn't allow overriding resources, you won't be able to submit jobs until the resources have become available again.

6.2.3. Bind multiple resources

Now let’s bind multiple resources to the application.

Open Autodock Vina application, click Edit and open Resources section. In Resources input add another resource you attached or test resource. Click Save.

When you bind multiple resources to an application Everest will try to schedule application tasks to resources in such a way as to optimize completion times and balance resource usage among users.

7. Using Python API

7.1. Introduction

Running Everest applications via Web UI is easy and convenient, but it has some limitations. For example, if you want to run an application many times with different inputs, it is inconvenient to submit many jobs manually via the web form. In another case, if you want to produce some result by using multiple applications, you will have to manually copy data between several jobs. Finally, Web UI is not suitable if you want to run an Everest application from your program or some other external application.

For all these cases, from automation of repetitive tasks to application integration, Everest provides a REST API. It is the platform’s application programming interface implemented as a RESTful web service. REST API includes operations for accessing and managing applications, jobs, resources and other entities. This interface serves as a single entry point for all Everest clients, including Web UI which uses REST API under the hood.

REST API can be used to access Everest from any program that can speak the HTTP protocol and parse the JSON format. However, it is too low-level for most users, so Everest offers ready-to-use client libraries built on top of REST API. At the moment, there is one such client library, called Python API, for the Python programming language. In this part of the tutorial we will examine and run some programs written with Python API.

7.2. Installing Python API

Python API supports all operating systems. Just check that you have Python 2.7 installed (Python 3 is not supported).

Python API depends on Requests library. You can install it with pip:

pip install requests

Installing pip in Windows is simple. Just download get-pip.py, run python get-pip.py and add C:\PythonXX\Scripts to your PATH. Now you should be able to run pip from the command line.

Python API consists of a single Python file everest.py. Download this file and place it in TUTORIAL_HOME/api/ so that our example programs can find it.

7.3. Obtaining access token

Before we start running programs that access Everest, we need to obtain a client token. Each client accessing Everest must present such a token with its requests in order to authenticate itself. A token can be obtained from Everest by any user. A client using the token will be authenticated as the user who requested the token.

Don’t confuse client tokens with resource tokens used for resource authentication. They serve different purposes and are managed differently.

In order to obtain a client token open terminal in TUTORIAL_HOME/api/ and run the following command (replace USERNAME with your Everest username):

python everest.py get-token -u USERNAME -l tutorial > .token

This command will prompt you for your Everest password and, if all goes well, will create a file .token with the new token inside.

A client token is valid for one week (7 days) from the time it is issued. After the token has expired, Everest rejects it.
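Because a token stops working after seven days, a script that runs unattended may want to check how old its .token file is before starting. The following is a minimal sketch, assuming the file was written when the token was issued so that its modification time approximates the issue time. The helper name `token_file_is_fresh` is hypothetical and is not part of Python API.

```python
import os
import time

SECONDS_PER_DAY = 24 * 60 * 60

def token_file_is_fresh(path, max_age_days=7, now=None):
    """Return True if the file at `path` was modified less than max_age_days ago.

    Assumption: the file was written when the token was issued, so its
    modification time approximates the issue time.
    """
    if now is None:
        now = time.time()
    age_seconds = now - os.path.getmtime(path)
    return age_seconds < max_age_days * SECONDS_PER_DAY
```

A program could call `token_file_is_fresh('.token')` at startup and ask the user to request a new token when it returns False.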

You can find the list of all your tokens in Web UI. Click on your username in the top right corner and select Access Tokens. Valid tokens are marked with green circles. Each token has a label, which helps to distinguish tokens used by different clients. For example, tokens used by Web UI have label everest-webui. When you create a token with the everest.py get-token command you can specify a token label via option -l. Check that the newly created token with label tutorial is present in the list.

Keep your tokens secret and protect files that contain tokens from being read by other users and untrusted applications. If a token is lost or compromised you can force its expiration by deleting it in Web UI.
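One way to protect a token file on Linux or macOS is to remove all access for other users. The sketch below uses only the Python standard library; the function names are hypothetical. On Windows `os.chmod` can only toggle the read-only flag, so this approach does not apply there.

```python
import os
import stat

def restrict_to_owner(path):
    """Make `path` readable and writable by its owner only (mode 0600)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)

def is_private(path):
    """Return True if neither group nor others have any access to `path`."""
    mode = os.stat(path).st_mode
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

For example, `restrict_to_owner('.token')` could be run once, right after the token file is created.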

7.4. Running an application

Let’s start with a simple example of submitting a single job to Autodock Vina.

In Web UI open Autodock Vina application and copy application ID (long alphanumeric string).

Open TUTORIAL_HOME/api/vina_run.py in text editor.

In Windows use an editor that supports Unix-style line endings (e.g., WordPad).

vina_run.py

import everest (1)

# create session by using client token
with open('.token') as f:
    token = f.read().strip()
session = everest.Session('Vina Run', 'https://everest.distcomp.org', token=token) (2)

# define Vina application
vina = everest.App('INSERT VINA APP ID', session) (3)

try:
    # prepare job inputs
    inputs = { (4)
        'receptor': open('inputs/run/protein.pdbqt', 'rb'), (5)
        'ligand': open('inputs/run/ligand.pdbqt', 'rb'),
        'center_x': 11,
        'center_y': 90.5,
        'center_z': 57.5,
        'size_x': 22,
        'size_y': 24,
        'size_z': 28,
        'exhaustiveness': 1
    }

    # submit job
    job = vina.run(inputs) (6)

    # wait until the job is completed
    result = job.result() (7)

    # store output files
    session.getFile(result['output'], 'results/run/ligand_out.pdbqt') (8)
    session.getFile(result['log'], 'results/run/log.txt')

finally:
    # always close the session on exit
    session.close() (9)

1	Our program imports `everest` module which implements Python API and is located in file `everest.py`.
2	We create a new Session object by passing a session name, Everest URI and a token from file `.token`.
3	In order to access application we create a new App object by passing application ID and the session. You should replace `INSERT VINA APP ID` in this line with application ID you copied from Web UI.
4	We prepare inputs for our job as a Python dictionary with keys and values corresponding to parameter names and values respectively.
5	Similarly to Web UI we can pass file objects as values for URI inputs. Python API will automatically upload these files to Everest during job submission.
6	We submit job by invoking `run()` method of the application object with prepared inputs. This method returns a Job object that can be used to check the job state and obtain the result. Note that `run()` method doesn’t block the program until the job is done. Instead Python API performs all job related activities in the background thus allowing the program to continue its execution.
7	In this example we don’t have any other work to do while the job is running so we just call `result()` method. This method blocks the program until the job is completed and returns the job result. The result is returned as a Python dictionary with keys and values corresponding to output parameter names and values respectively.
8	Similarly to Web UI we can download files referenced by URI outputs to local machine. For this we use `getFile()` method of the session object by passing the file URI and a local path.
9	It is good practice to close the session at the end of the program by invoking `close()` method. This will terminate all background activities and ensure that the program exits normally.
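The submit-then-wait pattern behind `run()` and `result()` resembles futures in the Python standard library. The sketch below is only an analogy and does not call Everest. It uses `concurrent.futures`, which ships with Python 3 (Python API itself requires Python 2.7), and `fake_job` is a hypothetical stand-in for a remote job.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_job(inputs):
    """Hypothetical stand-in for a remote job: sleeps, then returns outputs."""
    time.sleep(0.1)
    return {'sum': inputs['a'] + inputs['b']}

executor = ThreadPoolExecutor(max_workers=1)
try:
    # like App.run(): returns immediately with a handle to the running job
    future = executor.submit(fake_job, {'a': 2, 'b': 3})

    # the program is free to do other work here while the job runs

    # like Job.result(): blocks until the job is done and returns its outputs
    outputs = future.result()
finally:
    # like Session.close(): stop background activity before exiting
    executor.shutdown()
```

As in vina_run.py, the cleanup step sits in a `finally` clause so that it runs even if the job raises an exception.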

After you have replaced INSERT VINA APP ID with the application ID, save the file.

Now it’s time to run our program as follows:

python vina_run.py

After the program is completed you can examine job outputs in TUTORIAL_HOME/api/results/run/.

In Web UI open Jobs list and check the corresponding job called "Vina Run - Job 0". The job name is produced by adding the job number to the session name.

7.5. Submitting multiple jobs

Let’s consider a more complex case when we want to submit many jobs to an application. For example, we want to use Autodock Vina for virtual screening of many ligands against some protein.

Open TUTORIAL_HOME/api/vina_screening.py in text editor.

vina_screening.py

import everest

# create session by using client token
with open('.token') as f:
    token = f.read().strip()
session = everest.Session('Vina Screening', 'https://everest.distcomp.org', token=token)

# define Vina application
vina = everest.App('INSERT VINA APP ID', session) (1)

try:
    # prepare tasks
    tasks = []
    for n in range(1,11): (2)
        tasks.append({
            'receptor': open('inputs/screening/protein.pdbqt', 'rb'),
            'ligand': open('inputs/screening/ligand%d.pdbqt' % n, 'rb'), (3)
            'center_x': 11,
            'center_y': 90.5,
            'center_z': 57.5,
            'size_x': 22,
            'size_y': 24,
            'size_z': 28,
            'exhaustiveness': 1
        })

    # submit all tasks
    jobs = vina.runAll(tasks) (4)

    # process results
    results = []
    n = 1
    for job in jobs: (5)
        try:
            result = job.result()
            session.getFile(result['output'], 'results/screening/ligand_out%d.pdbqt' % n)
            session.getFile(result['log'], 'results/screening/log%d.txt' % n)
            job.delete() (6)
            with open('results/screening/ligand_out%d.pdbqt' % n) as out: (7)
                lines = out.readlines()
                line = lines[1]
                result = float(line.split(':')[1].split()[0])
                results.append([result, n])
        except everest.JobException as e: (8)
            print "Task %d error: %s" % (n, e)
        n += 1
    results.sort(key=lambda r: r[0]) (9)
    for r in results:
        print "%f (task %d)" % (r[0], r[1])

finally:
    # always close the session on exit
    session.close()

1	Replace `INSERT VINA APP ID` with application ID you copied from Web UI.
2	We prepare inputs for our future jobs and store them in `tasks` array.
3	In this example all our jobs have the same inputs except for the ligand file.
4	We submit all tasks in one shot by using the `runAll()` method. This method is similar to `run()` method but accepts an array of inputs and returns an array of submitted jobs.
5	For each job we wait for its completion and then process the job result.
6	After downloading the job results we delete the job. This is good practice if you run lots of one-off jobs and don’t need to keep them in the jobs list.
7	We parse score value from file `ligand_out(1-10).pdbqt` produced by each job and store this value in `results` array.
8	`job.result()` may throw an exception if the job has failed or has been cancelled.
9	After we collected all results we sort and print them.
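The parsing and ranking steps can be tried without submitting any jobs. The sketch below repeats the same split logic on made-up output fragments. It assumes that the second line of a Vina output file has the form `REMARK VINA RESULT: <affinity> <rmsd l.b.> <rmsd u.b.>`; the function name and the sample values are illustrative only.

```python
def parse_best_affinity(pdbqt_text):
    """Extract the first score from Vina output, as vina_screening.py does.

    Assumption: the second line has the form
    'REMARK VINA RESULT: <affinity> <rmsd l.b.> <rmsd u.b.>'.
    """
    second_line = pdbqt_text.splitlines()[1]
    return float(second_line.split(':')[1].split()[0])

# Illustrative (made-up) output fragments for three tasks
samples = {
    1: "MODEL 1\nREMARK VINA RESULT:      -7.2      0.000      0.000\n",
    2: "MODEL 1\nREMARK VINA RESULT:      -9.1      0.000      0.000\n",
    3: "MODEL 1\nREMARK VINA RESULT:      -6.4      0.000      0.000\n",
}

# Rank tasks from best (most negative) to worst affinity
ranked = sorted((parse_best_affinity(text), n) for n, text in samples.items())
for affinity, n in ranked:
    print("%f (task %d)" % (affinity, n))
```

Sorting tuples of (affinity, task number) puts the most negative affinity first, which matches the order in which vina_screening.py prints its results.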

After you have replaced INSERT VINA APP ID with the application ID, save the file.

Now it’s time to run our program as follows:

python vina_screening.py

In Web UI open Jobs list and check the corresponding jobs called "Vina Screening - Job X". You can enable automatic update of job states by selecting Auto Update checkbox on the top of the list. Since we delete the jobs in our program, the jobs will disappear from the list after they are completed.

After the program is completed you can examine job outputs in TUTORIAL_HOME/api/results/screening/.

7.6. Combining multiple applications

Let’s consider another typical use case when one needs to combine multiple applications to produce a desired result. For example, suppose we want to produce a nice animation showing the results of molecular docking. We can do this in the following steps:

Perform docking with Autodock Vina as usual
Visualize docking results with PyMOL and produce a series of PNG images showing the binding site from different positions
Convert produced PNG images to a single GIF animation with ImageMagick

In this example we will perform these steps by using three different Everest applications:

Autodock Vina application we added to Everest
PyMOL application which runs PyMOL script and outputs archive with produced images
PNG to GIF application which converts PNG images to GIF animation using ImageMagick

Our program will run each of these applications one after another in a chain by passing results of one job to the next one.

Open TUTORIAL_HOME/api/vina_visualize.py in text editor.

vina_visualize.py

import everest

# create session by using client token
with open('.token') as f:
    token = f.read().strip()
session = everest.Session('Vina Visualize', 'https://everest.distcomp.org', token=token)

# define applications
vina = everest.App('INSERT VINA APP ID', session) (1)
pymol = everest.App('53ab27d0330000110d12a353', session) (2)
png2gif = everest.App('53ab3efa32000041004b11c6', session)

# define resources
test = '53ad28ca35000042009832de' (3)

try:
    # Step 1: perform docking with Vina (4)
    job_vina = vina.run({
        'receptor': open('inputs/visualize/protein.pdbqt', 'rb'),
        'ligand': open('inputs/visualize/ligand.pdbqt', 'rb'),
        'center_x': 11,
        'center_y': 90.5,
        'center_z': 57.5,
        'size_x': 22,
        'size_y': 24,
        'size_z': 28,
        'exhaustiveness': 1
    })

    # Step 2: generate PNG images of docking results with PyMOL (5)
    job_pymol = pymol.run({
        'script': open('inputs/visualize/script.pml', 'rb'),
        'files': [
            open('inputs/visualize/protein.pdb', 'rb'),
            job_vina.output('output') (6)
        ]
    }, [test]) (7)

    # Step 3: convert PNG images to a single GIF file with ImageMagick (8)
    job_png2gif = png2gif.run({
        'images': job_pymol.output('results'),
        'delay': 5
    }, [test])

    result = job_png2gif.result() (9)
    session.getFile(result['gif'], 'results/visualize/movie.gif')

finally:
    # always close the session on exit
    session.close()

1	Replace `INSERT VINA APP ID` with application ID you copied from Web UI.
2	We create a separate App object for each application we use in the program.
3	In this example we will explicitly pass resources to some applications. Resources are referred to by ID, which can be found on the resource page in Web UI. Here we define resource test.
4	We start by running Vina application. This time we define our inputs right inside `run()` call.
5	Next we run PyMOL application by passing it a script that produces images, the protein file and the Vina output file.
6	Note how we refer to the output value of the Vina job by using `output()` method of the job and specifying the output name. This method doesn’t block the program until the output value is available. Instead Python API will wait in the background until the Vina job is completed, read the output value and then submit the PyMOL job with this value.
7	`App.run()` method supports passing resources for running the job as an additional list argument. Here we explicitly specify that we want to use test resource for running the PyMOL job, because this application doesn’t
8	Next we run PNG-to-GIF application by passing it the archive with images produced by PyMOL and the delay value which controls delay between frames in the animation. We also explicitly specify the resource for running the job.
9	Once we have passed all three jobs to Python API, it schedules their submission to Everest in accordance with the data dependencies we specified with `Job.output()` method. Now we wait for the final result by calling `result()` method on the last job in the chain.

After you have replaced INSERT VINA APP ID with the application ID, save the file.

Now it’s time to run our program as follows:

python vina_visualize.py

In Web UI open Jobs list and check the corresponding jobs called "Vina Visualize - Job X".

After the program is completed you can examine the produced animation in TUTORIAL_HOME/api/results/visualize/movie.gif.

This example shows how to use Python API to build a simple chain of jobs. However you can use the same technique to compose Everest applications into arbitrary directed acyclic graphs (DAGs).
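To see how data dependencies determine the order in which jobs of a DAG can be submitted, consider the following sketch. It is not how Python API works internally, where dependencies are expressed implicitly through `Job.output()`. It uses `graphlib` from the standard library of Python 3.9 and later, and the job names describe a hypothetical workflow with two independent PyMOL jobs.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical workflow: each job maps to the set of jobs whose outputs it consumes
dependencies = {
    'vina': set(),
    'pymol_front': {'vina'},
    'pymol_back': {'vina'},
    'png2gif': {'pymol_front', 'pymol_back'},
}

sorter = TopologicalSorter(dependencies)
sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic

waves = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # jobs whose inputs are all available
    waves.append(ready)
    sorter.done(*ready)  # mark them complete, unlocking their dependents

for i, wave in enumerate(waves, 1):
    print("wave %d: %s" % (i, ", ".join(wave)))
```

Jobs in the same wave do not depend on each other, so they could run at the same time on different resources.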

8. Running parameter sweep applications

8.1. Introduction

Parameter sweep experiments represent an important class of scientific applications that require a large amount of computing resources in order to run a large number of similar computations across different combinations of parameter values. A parameter sweep application consists of many independent tasks (such applications are also called bag-of-tasks applications). The number of tasks is determined by the number of parameter combinations in the experiment. As a rule, each task runs the same executable but with different arguments and input files that depend on parameter values.
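The relation between parameters and tasks can be illustrated with a few lines of Python. This is a sketch with hypothetical parameter names and values, not a description of how Everest generates tasks: every combination of parameter values becomes one task.

```python
from itertools import product

# Hypothetical experiment with two parameters
parameters = {
    'n': range(1, 11),             # 10 ligand numbers
    'exhaustiveness': [1, 4, 8],   # 3 search settings
}

names = sorted(parameters)
tasks = [dict(zip(names, values))
         for values in product(*(parameters[name] for name in names))]

print(len(tasks))  # 10 * 3 = 30 tasks
```

Each element of `tasks` is a dictionary such as `{'exhaustiveness': 1, 'n': 1}` that fixes one value per parameter.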

Everest provides native support for such applications via the generic Parameter Sweep application. This is an Everest application that can be used to run arbitrary parameter sweep experiments described by means of a so-called plan file. This is a plain text file that contains parameter definitions and other directives that together define rules for generation of parameter sweep tasks and processing of their results. Besides a plan file, a user can also provide an archive with executables, scripts and input files referred to in the plan file.

After a user submits a plan file and files archive to Parameter Sweep, it creates a new job that can be used to monitor the status of the experiment. Parameter Sweep parses the plan file, generates the required tasks, submits them for execution, processes task results and generates the job output. The output is an archive that contains the results of the individual tasks, taking into account filtering rules specified in the plan file.

While it is possible to run parameter sweep experiments by submitting multiple jobs to a single application via Python API (see the previous part), the described Parameter Sweep application has a number of advantages:

It doesn’t require adding the application that runs a single task to Everest
It doesn’t require writing a program to generate and submit tasks - a simple plan file is enough
It runs parameter sweep experiment as a single job with many tasks, instead of many single-task jobs, which introduces less clutter in the job list

8.2. Virtual screening with Parameter Sweep

In this section we will revisit the example from the Python API part of this tutorial, where we used Autodock Vina for simplified virtual screening of many ligands against some protein. We used Python API to submit multiple jobs to Vina application and then collected and processed job results. Now let’s take a different approach and use Parameter Sweep application.

8.2.1. Plan File

The first step is to prepare plan file describing our parameter sweep experiment.

Open TUTORIAL_HOME/parametric/vina.plan in text editor.

vina.plan

parameter n from 1 to 10 step 1 (1)

input_files @run.sh vina write_score.py protein.pdbqt ligand${n}.pdbqt config.txt (2)

command ./run.sh (3)

output_files ligand${n}_out.pdbqt log.txt @score (4)

criterion min $affinity (5)

1	We use `parameter` directive to define parameter `n` that refers to ligand number and takes integer values from 1 to 10.
2	`input_files` directive is used to define input files per task. Note how `${n}` is used in the name of ligand file to refer to the value of parameter `n`. That means that task for `n=1` will use input file `ligand1.pdbqt`, the task for `n=2` will use file `ligand2.pdbqt`, and so on. In addition to parametrization of file names it is also possible to insert parameter values inside input files by prefixing file name with `@`. In this example we specify that we want to substitute all strings `${n}` inside file `run.sh` with the value of parameter `n`. This is done on a per task level, so each task will use different run script.
3	`command` directive is used to specify command for each task. The command can also be parametrized but in our case we use the same command for all tasks and parametrize the contents of the run script instead. You can examine run script in file `TUTORIAL_HOME/parametric/files/run.sh`. It runs Vina first and then runs a simple Python script to parse the best affinity value from Vina output and write this value to file `score`.
4	`output_files` directive is used to define output files that are collected per task. Note how we parametrize the name of Vina output file. Output file names need not be unique among tasks since these files are collected in separate per-task directories. Output files prefixed with `@`, like `score` in this example, have a special meaning. Such files should contain outputs that can be used to filter or rank task results. Each output should be placed on a separate line as `OUTPUT_NAME = OUTPUT_VALUE`, e.g., `affinity = -12.900000`.
5	We use `criterion` directive to specify that we are interested in a result with a minimal value of output `affinity`. Parameter Sweep will search for output values in output files prefixed with `@`, i.e. file `score` in this example.
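The effect of these directives can be imitated in a few lines of Python. The sketch below is not the Parameter Sweep implementation. It relies on `string.Template`, which happens to use the same `${name}` placeholder syntax, and on made-up contents of the `score` file; the function names are hypothetical.

```python
from string import Template

def expand(template_text, params):
    """Substitute ${name} placeholders, as described for file names and @-files."""
    return Template(template_text).substitute(params)

def parse_outputs(text):
    """Parse 'OUTPUT_NAME = OUTPUT_VALUE' lines into a dict of floats."""
    outputs = {}
    for line in text.splitlines():
        if '=' in line:
            name, value = line.split('=', 1)
            outputs[name.strip()] = float(value)
    return outputs

# File name used by the task with n=3
ligand_file = expand('ligand${n}.pdbqt', {'n': 3})

# Made-up contents of the `score` file for three tasks
scores = {
    1: 'affinity = -7.200000\n',
    2: 'affinity = -12.900000\n',
    3: 'affinity = -6.400000\n',
}

# 'criterion min $affinity': keep the task with the smallest affinity
best_n = min(scores, key=lambda n: parse_outputs(scores[n])['affinity'])
```

Here `ligand_file` is `ligand3.pdbqt` and `best_n` is 2, the task whose `score` file reports the lowest affinity.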

This example doesn’t cover all directives and features of plan files. For example, instead of criterion we could use filter directive to filter results by specifying threshold on affinity values. It is also possible to define multiple parameters and specify constraints on their combinations. Please refer to Parameter Sweep documentation for more details.

8.2.2. Application files

Besides the plan file we should prepare an archive containing all files referred to in input_files directive (see TUTORIAL_HOME/parametric/files/):

protein.pdbqt - file with protein structure
ligand[1-10].pdbqt - files with ligand structures
run.sh - main run script
vina - Vina executable
config.txt - Vina configuration file
write_score.py - Python script used to parse affinity score from Vina output

A ready to use zip archive with these files can be found in TUTORIAL_HOME/parametric/files.zip.
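If you change any of these files you will need to rebuild the archive. A minimal sketch using the standard `zipfile` module is shown below. It stores paths relative to the files directory so that the files sit at the top level of the archive; this layout is an assumption, so compare the result with the provided files.zip. The function name `make_archive` is hypothetical.

```python
import os
import zipfile

def make_archive(files_dir, archive_path):
    """Pack every file under files_dir into a zip archive.

    Paths inside the archive are stored relative to files_dir, so files sit
    at the top level of the archive. This layout is an assumption.
    """
    with zipfile.ZipFile(archive_path, 'w', zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, names in os.walk(files_dir):
            for name in names:
                full_path = os.path.join(root, name)
                archive.write(full_path, os.path.relpath(full_path, files_dir))
```

Run from TUTORIAL_HOME/parametric/, a call such as `make_archive('files', 'files.zip')` would replace the existing archive. Keep the archive outside the directory being packed.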

8.2.3. Running application

Now it’s time to run our parameter sweep experiment. Open Parameter Sweep application in Web UI. Switch to Submit Job tab.

Specify the following values for input parameters:

Plan File: select file TUTORIAL_HOME/parametric/vina.plan
Application Files: select file TUTORIAL_HOME/parametric/files.zip

In Resources input select resources you attached to Everest and/or test resource.

Click Submit button.

You will be redirected to the job page where you can monitor the status of the job as usual. In addition Parameter Sweep publishes statistics about current task states in the Info field.

After the job is completed open Outputs tab and download file results.zip. Unpack the file and examine its contents. It should contain one or more directories corresponding to tasks with minimal affinity value. Each task directory contains output files and a special file Parameters containing parameter values used in this task.