Introduction

VirtualFlow is a versatile, parallel workflow platform for carrying out virtual-screening-related tasks on Linux-based computer clusters of any type and size that are managed by a batch system (such as SLURM).

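Currently, there are two versions of VirtualFlow, each tailored to a different type of task:

  • VFLP (VirtualFlow for Ligand Preparation): prepares ligand databases in a ready-to-dock format
  • VFVS (VirtualFlow for Virtual Screening): carries out virtual screening procedures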

Both versions share the same core technology for workflow management and parallelization, and they can be used individually or together. Additional versions are expected in the future.

Pre-built ready-to-dock ligand libraries for VFVS are available for free (in the download section). 

How to cite: If you are using VirtualFlow, please cite the following paper in relevant publications: 

  • Christoph Gorgulla, Andras Boeszoermenyi, Zi-Fu Wang, Patrick D. Fischer, Paul W. Coote, Krishna M. Padmanabha Das, Yehor S. Malets, Dmytro S. Radchenko, Yurii S. Moroz, David A. Scott, Konstantin Fackeldey, Moritz Hoffmann, Iryna Iavniuk, Gerhard Wagner, Haribabu Arthanari. An open-source drug discovery platform enables ultra-large virtual screens. Nature 580, 663–668 (2020). https://doi.org/10.1038/s41586-020-2117-z

Commercial service: A commercial service for ultra-large virtual screenings based on VirtualFlow is available via the company Virtual Discovery, Inc.

Application to COVID-19: https://vf4covid19.hms.harvard.edu/

General Features

VirtualFlow can be extremely fast because it scales nearly perfectly, even when very large numbers of CPUs are used. There is practically no upper limit on the number of processors VirtualFlow can utilize. In addition, it supports some of the fastest docking programs available, such as QuickVina 2.
VirtualFlow is relatively robust against the unexpected errors and interruptions that frequently occur on computer clusters. It can respond to signals sent by the batch system or the operating system when cluster problems arise, and even after termination without warning, the workflow can simply be resumed.
Cloud ready: Due to its architecture, VirtualFlow is able to run on cloud computing platforms such as the Google Cloud Platform (GCP), Amazon Web Services (AWS), or Microsoft Azure. It works with both on-demand and the much cheaper preemptible virtual machines.
VirtualFlow runs out of the box on almost any GNU/Linux cluster that is managed by a batch system, regardless of the Linux distribution. The workflow tool can use any hardware configuration in terms of the number of cores/CPUs, sockets, nodes, etc.
VirtualFlow currently supports the following batch systems out of the box: SLURM, Moab/TORQUE, OpenPBS, LSF, and SGE. The provided job templates can be adjusted if required, and users can easily extend VirtualFlow to other job schedulers by creating additional job templates.
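As a rough illustration of what a job template is, the following Python sketch fills placeholders in a SLURM batch-script header using the standard library's string.Template. The placeholder names and the render_job_script function are hypothetical; VirtualFlow's shipped templates use their own markers and contain much more than this header.

```python
from string import Template

# Hypothetical SLURM job template; placeholder names are illustrative only
# and are not the keys used by VirtualFlow's shipped templates.
SLURM_TEMPLATE = Template("""#!/bin/bash
#SBATCH --job-name=$job_name
#SBATCH --nodes=$nodes
#SBATCH --ntasks-per-node=$cpus_per_node
#SBATCH --time=$walltime
#SBATCH --partition=$partition
""")


def render_job_script(**settings):
    """Fill the template; substitute() raises KeyError if a placeholder is missing."""
    return SLURM_TEMPLATE.substitute(**settings)


if __name__ == "__main__":
    print(render_job_script(job_name="vf-1", nodes=2, cpus_per_node=24,
                            walltime="1-00:00:00", partition="main"))
```

Supporting another scheduler in this scheme means writing a second template with that scheduler's directives (for example `#BSUB` lines for LSF) while the filling logic stays the same.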
The workflow can be monitored in real time in several ways during execution. This lets users track its progress and examine possible problems during or after the run.
VirtualFlow can run fully automatically for any length of time until it has processed all ligands specified in the input files. It achieves this by autonomously ending jobs and submitting new ones to the batch system as needed.
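The self-resubmitting pattern described above can be sketched in Python as follows, under simplifying assumptions: the job works through a queue of work units, stops before another unit would likely exceed its walltime, and submits a successor job if work remains. All names are hypothetical, and submit_successor stands in for a real call to the batch system.

```python
import time


def run_job(queue, process, submit_successor, time_budget_s, avg_task_s):
    """Process work units until the time budget is nearly exhausted.

    queue: list of pending work units (e.g. ligand collection names).
    process: callable that handles one unit.
    submit_successor: hypothetical callable that submits a new batch job.
    time_budget_s: walltime available to this job, in seconds.
    avg_task_s: estimated duration of one work unit, in seconds.
    """
    start = time.monotonic()
    while queue:
        elapsed = time.monotonic() - start
        # Stop early if one more unit would likely overrun the walltime.
        if elapsed + avg_task_s > time_budget_s:
            break
        process(queue.pop(0))
    if queue:
        submit_successor()  # hand the remaining work to a fresh job
        return "resubmitted"
    return "finished"
```

On a SLURM cluster, submit_successor would typically invoke `sbatch` with the next job script; ending voluntarily before the walltime avoids being killed mid-ligand.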
The workflow can be controlled and modified at runtime. For instance, it can be paused and resumed, and the hardware resources and job configurations it uses can be changed. The workflow can also be transferred to another cluster and resumed there if desired.
The input and output ligand databases consist of compressed, hierarchical multilevel tar archives. This compact format makes it possible to handle vast numbers of ligands efficiently and scalably.
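To show what "hierarchical multilevel tar archives" means in practice, here is a minimal Python sketch using the standard tarfile module: ligand files are packed into a gzip-compressed collection archive, collections are packed into an outer archive, and a single ligand is read back without unpacking anything to disk. The two-level layout and all names (tranche_A, collection_1, lig2.pdbqt) are assumptions for illustration, not VirtualFlow's exact directory scheme.

```python
import io
import tarfile


def make_tar(members, compress):
    """Pack {name: bytes} into an in-memory tar (gzip-compressed if requested)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz" if compress else "w") as tar:
        for name, data in members.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_ligand(outer_bytes, collection, ligand):
    """Open the outer archive, then the collection archive, then one ligand file."""
    with tarfile.open(fileobj=io.BytesIO(outer_bytes), mode="r") as outer:
        inner_bytes = outer.extractfile(collection).read()
    with tarfile.open(fileobj=io.BytesIO(inner_bytes), mode="r:gz") as inner:
        return inner.extractfile(ligand).read()


if __name__ == "__main__":
    # Hypothetical layout: outer tranche archive -> collection .tar.gz -> ligand files.
    collection = make_tar({"lig1.pdbqt": b"REMARK lig1\n",
                           "lig2.pdbqt": b"REMARK lig2\n"}, compress=True)
    outer = make_tar({"tranche_A/collection_1.tar.gz": collection}, compress=False)
    print(read_ligand(outer, "tranche_A/collection_1.tar.gz", "lig2.pdbqt"))
```

Grouping many small ligand files into a few large archives keeps the file count on the shared cluster file system low, which is usually the bottleneck when millions of ligands are involved.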
VirtualFlow is free/libre software licensed under the GNU GPL v3, which guarantees users certain freedoms. Moreover, VirtualFlow is available at no cost. (Free/libre software and open source are two distinct concepts.)
VirtualFlow is developed in an open collaborative development model, warmly inviting anyone to join. This allows VirtualFlow to grow more quickly and healthily in the way the community desires.

VFVS - VirtualFlow for Virtual Screening

VFVS is dedicated to carrying out virtual screening procedures on computer clusters.

It can be applied in a number of different settings:

  • Hit identification by screening ligand libraries of any size
  • Hit optimization by screening custom hit-based analog libraries
  • Thorough and extensive dockings of one or more molecules

VFVS can be used seamlessly with VFLP when the ligand library needs to be prepared.

VFVS Features

VFVS supports a variety of different docking programs, in particular the members belonging to the large AutoDock family: AutoDock Vina, QuickVina 2, Smina, QuickVina-W, ADFR, Vina Carb, and VinaXB.
Each docking scenario can be carried out multiple times. This can be very useful when one wants to increase the chance that the docking program finds the docking pose with the global minimum relative to the scoring function which it employs.
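The idea of repeating a docking scenario and keeping the best result can be sketched in Python as below. dock_once is a hypothetical stand-in for a real docking run; the sketch assumes AutoDock-Vina-style scores in kcal/mol, where lower means better, and a different random seed per replica.

```python
def best_of_replicas(dock_once, ligand, replicas):
    """Dock one ligand `replicas` times and keep the lowest (best) score.

    dock_once(ligand, seed) is a hypothetical callable returning a single
    Vina-style score in kcal/mol, where lower is better. Giving each replica
    its own seed lets the stochastic search explore different paths.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    scores = [dock_once(ligand, seed) for seed in range(replicas)]
    return min(scores), scores


if __name__ == "__main__":
    # Fake docking results standing in for real docking runs.
    fake_scores = {0: -7.2, 1: -8.4, 2: -7.9}
    best, all_scores = best_of_replicas(lambda lig, seed: fake_scores[seed], "ligand_1", 3)
    print(best, all_scores)  # -8.4 [-7.2, -8.4, -7.9]
```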
Even while the virtual screening procedure is still running, VFVS can show results in real time for each docking scenario defined for the screening. It provides statistical information on the docking scores of all docked ligands and can list the highest-scoring compounds along with their best scores.
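The kind of summary described above (score statistics plus a list of top compounds) can be computed with a short Python sketch. The input format, a dictionary mapping ligand names to their best score, is an assumption for illustration; scores are treated as Vina-style values where lower is better, so the "top" compounds are those with the smallest scores.

```python
import heapq
import statistics


def summarize(scores, top_n=3):
    """Return score statistics and the top_n ligands.

    scores: {ligand_name: best docking score}; lower (more negative) is better.
    """
    values = list(scores.values())
    stats = {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }
    # nsmallest avoids sorting the whole result set when only the top few are needed.
    top = heapq.nsmallest(top_n, scores.items(), key=lambda item: item[1])
    return stats, top
```

For example, `summarize({"a": -7.5, "b": -9.1, "c": -6.0, "d": -8.2}, top_n=2)` returns `[("b", -9.1), ("d", -8.2)]` as its top list.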
A separate tools package for VirtualFlow (VFTools) was created. It contains tools that help create ligand collections in the required layout (provided the ligands are already in the correct format) and that automatically postprocess and curate the output files.
VFVS is suitable for carrying out virtual screenings in a multistage manner. In each stage, the accuracy of the dockings can be increased to rescore the highest-scoring compounds from the previous stage. This improves the quality of the virtual screening at reduced computational cost.
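The multistage strategy can be sketched in Python as a chain of filters: each stage scores only the survivors of the previous one and keeps a fraction of them. The (score_fn, keep_fraction) stage format and the function name are hypothetical; in a real screen, later stages would use more accurate, more expensive docking settings (for example, higher exhaustiveness).

```python
def multistage_screen(ligands, stages):
    """Run successive stages; each stage rescores only the previous stage's survivors.

    stages: list of (score_fn, keep_fraction) pairs, ordered from cheap,
    low-accuracy scoring to expensive, high-accuracy scoring. Lower scores
    are better. At least one ligand is always kept per stage.
    """
    survivors = list(ligands)
    for score_fn, keep_fraction in stages:
        scored = sorted(survivors, key=score_fn)
        keep = max(1, int(len(scored) * keep_fraction))
        survivors = scored[:keep]
    return survivors
```

With 100 ligands and a first stage that keeps 10%, the expensive second stage runs on only 10 ligands, which is where the cost saving comes from.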
VFVS can carry out multiple docking scenarios per ligand. A docking scenario in VFVS is defined by the receptor structure, the docking parameters (such as exhaustiveness), whether the receptor is treated as rigid or flexible, the choice of flexible receptor side chains, and the docking program. This also enables ensemble docking.
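As an illustration of the ingredients that define a docking scenario, and of how ensemble docking uses several of them, here is a Python sketch. The DockingScenario fields and the ensemble_score function are hypothetical and do not reflect VFVS's configuration keys; dock is a stand-in for an actual docking run returning a Vina-style score (lower is better).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DockingScenario:
    """Illustrative scenario definition (field names are hypothetical)."""
    name: str
    receptor: str                            # e.g. path to a receptor structure file
    program: str                             # e.g. "QuickVina 2", "Smina"
    exhaustiveness: int = 8                  # AutoDock Vina's default value
    flexible_residues: Tuple[str, ...] = ()  # empty means rigid receptor docking


def ensemble_score(ligand, scenarios, dock):
    """Dock one ligand against every scenario.

    dock is a hypothetical callable (ligand, scenario) -> score, lower is better.
    Returns the per-scenario scores and the name of the best-scoring scenario.
    """
    per_scenario = {s.name: dock(ligand, s) for s in scenarios}
    best = min(per_scenario, key=per_scenario.get)
    return per_scenario, best


if __name__ == "__main__":
    rigid = DockingScenario("rigid_A", "receptor_A.pdbqt", "QuickVina 2")
    flex = DockingScenario("flex_B", "receptor_B.pdbqt", "Smina",
                           exhaustiveness=16, flexible_residues=("TYR123",))
    fake_dock = lambda lig, s: -9.0 if s.flexible_residues else -7.5
    print(ensemble_score("ligand_1", [rigid, flex], fake_dock))
```

Using different receptor conformations in otherwise identical scenarios is one common way to set up an ensemble docking.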

VFLP - VirtualFlow for Ligand Preparation

VFLP is dedicated to the preparation of ligand databases in a ready-to-dock format. Moreover, it provides them in a format which is directly usable by VFVS. But VFLP can also be used independently of VFVS (and virtual screenings in general).

VFLP can use ChemAxon's JChem Suite for several of the molecule preparation steps, although Open Babel can currently be used instead. To use JChem Suite, a suitable academic or commercial license must be obtained directly from ChemAxon. More information can be found in the documentation.

VFLP Features

Any number of distinct output formats can be generated during a single workflow. For each of these formats, a separate output ligand library is created.
Almost any chemical file format (i.e., all formats supported by Open Babel) can be chosen for the final ligand output files. These include well-known formats such as SDF and PDB, as well as specialized ones such as PDBQT, which is used by most AutoDock-based docking programs.
During ligand preparation with VFLP, the ligands can be protonated at any chosen pH value. Protonation is an optional step in the workflow and can be carried out either by ChemAxon's cxcalc or by Open Babel.
Docking programs normally require three-dimensional conformations of the molecules. VFLP can generate them when required, using either Open Babel or ChemAxon's molconvert.
Molecular conversion programs such as Open Babel or the ChemAxon tools sometimes fail for certain molecules during one of the processing steps, such as protonation. VFLP therefore lets one specify a backup program that is used if the primary conversion program fails for some ligands.
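The primary/backup mechanism can be sketched in Python as an ordered list of converters that are tried in turn. The tools here are hypothetical callables; in practice they would wrap programs such as Open Babel or ChemAxon's molconvert, for example via subprocess. Errors from every attempt are collected so that a ligand that fails with all tools can be reported.

```python
class ConversionError(Exception):
    """Raised when every configured conversion tool fails for a molecule."""


def convert_with_fallback(molecule, tools):
    """Try each (name, tool) in order and return (name, result) for the first success.

    tools: ordered list of (name, callable) pairs, primary first, backups after.
    Each callable is hypothetical: it returns the converted molecule or raises.
    """
    errors = []
    for name, tool in tools:
        try:
            return name, tool(molecule)
        except Exception as exc:  # any tool failure triggers the next backup
            errors.append(f"{name}: {exc}")
    raise ConversionError("; ".join(errors))
```

Recording the failure messages rather than silently skipping a ligand makes it possible to log which molecules were lost and why.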
Molecules provided in salt form can be automatically desalted and neutralized via ChemAxon's JChem Suite.
Multiple tautomeric states of the same compound often exist at a given pH. VFLP can automatically generate these tautomers using the cxcalc tool of ChemAxon's JChem Suite, with access to all options that the cxcalc tautomer plugin provides. This can substantially increase the size of the library.
VFLP achieves a substantial additional speedup by using Nailgun, which keeps a persistent JVM running for the duration of the workflow, so that the Java-based ChemAxon tools do not have to start a new JVM for every call.