In 2016, the Zika virus spread rapidly from South America to North America, while also affecting some islands in the Pacific and Southeast Asia, so the WHO declared Zika a pandemic issue. The Zika virus is a flavivirus transmitted by mosquitoes, mainly Aedes aegypti and also Aedes albopictus. The virus causes mild symptoms such as fever, rash, conjunctivitis, muscle and joint pain, malaise or headache, but Zika virus infection during pregnancy can spread from the mother to the baby and cause severe defects in the fetus. Affected infants are often born with microcephaly and other congenital malformations and developmental problems. Additionally, Zika virus infection in adults and children is associated with an increased risk of neurologic complications, including Guillain-Barré syndrome, neuropathy and myelitis.
Computer-aided drug design (CADD) techniques are used to speed up the early-stage drug development process by enabling the selection of new active compounds. Through virtual screening, large virtual chemical libraries can be screened to find molecules that are active against the targets of interest. Virtual screening techniques can be subdivided into ligand-based and structure-based ones, depending on whether they use the structure of a known active molecule to find similar compounds or the structure of the target to identify putative ligands, respectively. In recent years, the growing availability of protein structures, resolved by structural biologists, has progressively widened the opportunities to deploy structure-based drug design. The key to the success of structure-based drug design is the evaluation of a chemical space big enough to enable the identification of chemical structures that have the best complementary pattern of interactions with the biological target under investigation, along with suitable physicochemical characteristics, novelty and synthetic feasibility.
The ANTAREX research project was funded under the H2020 Future and Emerging Technologies programme on High Performance Computing. The project involved CINECA, the Italian Tier-0 Supercomputing Centre, and IT4Innovations, the Czech Tier-1 Supercomputing Center. The ANTAREX project was coordinated by Politecnico di Milano and concluded successfully on November 30, 2018. Final results were presented at the Final Review Meeting held at the European Commission in Luxembourg on January 31, 2019. The main goal of the ANTAREX project was to provide a breakthrough approach to the mapping, runtime management and autotuning of applications for green and heterogeneous HPC (High Performance Computing) systems up to the exascale level. The project was driven by two use cases, chosen for their major scientific and social impact, that address the self-adaptivity and scalability characteristics of two important HPC scenarios. Here we present the application scenario linked to a biopharmaceutical HPC application for accelerating drug discovery, deployed on the 10 PetaFLOPS MARCONI supercomputer at CINECA (#19 in the TOP500 list of the most powerful supercomputers).
Since 2010, Dompé SpA, through its dedicated Drug Discovery Platform, has invested in proprietary software for computer-aided drug design (CADD). The most relevant tool is the de novo structure-based virtual screening software LiGen™ (Ligand Generator), co-designed in collaboration with the Italian supercomputing center, CINECA. The distinguishing feature of LiGen™ is that it has been designed and developed to run on High Performance Computing (HPC) architectures. To maintain its performance lead beyond 2020, Dompé decided to embed in LiGen™ the emerging, most innovative technologies and new programming paradigms that are leading the transition to the exascale computing era. In the ANTAREX framework, Dompé has fully exploited the combination of excellence in HPC software design, development and tuning by the Politecnico di Milano and the world-class competences of CINECA in the deployment and optimization of extreme-scale computer simulations. As a result, at the end of the project Dompé had obtained a 100x performance increase from the redesign of part of the code, and a 100x increase in the scale of the simulation. This exascale-ready version of LiGen™ was used to perform the ANTAREX4ZIKA virtual screening simulation.
To demonstrate the benefits of code optimization and scalability, Dompé selected the Zika pandemic crisis as a test case, to support and promote the identification of novel drugs able to address the unmet medical need for effective therapies. Dompé, in collaboration with Università degli Studi di Milano, selected 26 binding sites identified from the already resolved crystal structures of 5 Zika proteins: NS5, NS1, NS2B/NS3, NS3 and the envelope protein. For this experiment, Dompé created a virtual chemical space of 1.2 billion small-molecular-weight molecules. The evaluation of such a huge chemical space was possible thanks to the outcome of the ANTAREX project and the almost 1 million computational threads available at CINECA. The focus was on molecular docking, an increasingly important application for HPC-accelerated drug discovery. Dompé started by analyzing the most computationally intensive kernels of the LiGen™ molecular docking application. In ANTAREX, the research team at Politecnico di Milano developed a runtime-tunable version of the molecular docking application for use in virtual screening experiments. This application was deployed and scaled out to the full size of the 10 PetaFLOPS MARCONI supercomputer at CINECA to screen the 1.2 billion ligand database (including all investigational and marketed drugs) against the Zika virus. This represents the largest virtual screening experiment ever launched in terms of computational threads (up to one million) and size of the compound database (more than one billion molecules).
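The scale of this experiment can be illustrated with a minimal Python sketch. It is not LiGen™ code: `toy_score` is a hypothetical stand-in for a docking score, the workers are simulated by a plain loop rather than real HPC threads, and the round figure of one million threads is taken from the text above. The sketch only shows the general pattern of splitting a ligand library into shards, scoring each (ligand, binding site) pair independently, and merging the best hits.

```python
import heapq
import zlib

# Figures quoted in the text above.
N_LIGANDS = 1_200_000_000   # virtual chemical space
N_SITES = 26                # binding sites on 5 Zika proteins
N_THREADS = 1_000_000       # "almost 1 million" threads, rounded

pair_evaluations = N_LIGANDS * N_SITES           # 31_200_000_000 pairs
pairs_per_thread = pair_evaluations // N_THREADS  # 31_200 pairs each


def toy_score(ligand_id: int, site_id: int) -> float:
    """Hypothetical placeholder for a docking score (lower is better)."""
    return zlib.crc32(f"{ligand_id}:{site_id}".encode()) / 2**32


def shard(n_ligands: int, n_workers: int, worker: int) -> range:
    """Contiguous block of ligand indices assigned to one worker."""
    base, extra = divmod(n_ligands, n_workers)
    start = worker * base + min(worker, extra)
    return range(start, start + base + (1 if worker < extra else 0))


def screen_shard(ligands: range, n_sites: int, k: int) -> list:
    """Score every (ligand, site) pair in one shard and keep the k best."""
    scored = ((toy_score(l, s), l, s) for l in ligands for s in range(n_sites))
    return heapq.nsmallest(k, scored)


def screen(n_ligands: int, n_sites: int, n_workers: int, k: int) -> list:
    """Merge the per-shard top-k lists into a global top-k list."""
    # Each shard is independent, so on a real machine these calls
    # could run concurrently; here they run one after another.
    partial = [
        screen_shard(shard(n_ligands, n_workers, w), n_sites, k)
        for w in range(n_workers)
    ]
    return heapq.nsmallest(k, (hit for hits in partial for hit in hits))
```

Because every pair is scored independently and only the k best hits per shard are retained, the merged result does not depend on how many workers the library is split across, which is what makes this kind of workload straightforward to scale out.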
The ANTAREX partners intend to make the outcome of the simulation available to the research community, to support and speed up the discovery of a novel treatment to fight the Zika pandemic. Participation in ANTAREX has allowed Dompé to take advantage of HPC-accelerated and tunable solutions, opening up new development paths that would not be viable using conventional computing. By exploiting ANTAREX’s HPC technologies supporting autotuning, scalability and energy efficiency, Dompé is now able to optimize molecular docking so as to shorten the virtual screening process for the identification of new active compounds by two orders of magnitude. What the drug industry (and scientific research) has learned is that this opens up the possibility of shortening the time between the discovery of a health threat, such as a sudden viral pandemic, and the availability of candidate drugs.
Milan, 17 Feb. 2020 - Italy is leading Exscalate4CoV (E4C), the public-private consortium bidding for the European Commission’s Horizon 2020 tender for projects to counter the Coronavirus pandemic and improve the management and care of patients.
E4C aims to leverage the EU’s supercomputing resources, coupling them with some of the continent’s best life-science research labs, to counter international pandemics faster and more efficiently.
At the core of the project is Exscalate (EXaSCale smArt pLatform Against paThogEns), at present the most powerful¹ (and cost-efficient) intelligent supercomputing platform in the world, developed by Dompé. Exscalate (exscalate.eu) leverages a "chemical library" of 500 billion molecules, thanks to a processing capacity of more than 3 million molecules per second.
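As a rough back-of-the-envelope check, the quoted figures can be turned into wall-clock time with a few lines of Python. This is only a sketch: it assumes a single pass over the library at a constant rate, and because the press release gives bounds ("more than 3 million", and "less than 2 thousand" in its footnote on the Nature study), the results are bounds too. The function and variable names are illustrative.

```python
def screening_hours(n_molecules: float, molecules_per_second: float) -> float:
    """Hours needed for one pass over a library at a constant rate."""
    return n_molecules / molecules_per_second / 3600.0


# Exscalate figures from the text: 500 billion molecules, > 3 million per second.
exscalate_hours = screening_hours(500e9, 3e6)   # about 46.3 h (upper bound)

# Comparison figures from the footnote: 138 million molecules, < 2 thousand per second.
reference_hours = screening_hours(138e6, 2e3)   # about 19.2 h (lower bound)

# Ratio of the two quoted processing rates.
throughput_ratio = 3e6 / 2e3                    # 1500x
```

On these numbers, a full pass over the 500 billion molecule library would take under two days, at a processing rate at least 1,500 times that of the comparison study.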
The E4C consortium, coordinated by Dompé farmaceutici, is composed of 18 institutions from seven European countries: Politecnico di Milano (Dept. of Electronics, Information and Bioengineering), Consorzio Interuniversitario CINECA (Supercomputing Innovation and Applications), Università degli Studi di Milano (Pharmaceutical Science Department), Katholieke Universiteit Leuven, International Institute of Molecular and Cell Biology in Warsaw (IIMCB), Elettra Sincrotrone Trieste, Fraunhofer Institute for Molecular Biology and Applied Ecology, BSC Supercomputing Centre, Forschungszentrum Jülich, Università Federico II di Napoli, Università degli Studi di Cagliari, SIB Swiss Institute of Bioinformatics, KTH Royal Institute of Technology (Department of Applied Physics), Associazione BigData, Istituto Nazionale di Fisica Nucleare (INFN), Istituto nazionale per le malattie infettive Lazzaro Spallanzani and Chelonia Applied Science.
The aim of E4C is twofold: to identify molecules capable of targeting the coronavirus (2019-nCoV), and to develop an effective tool for countering future pandemics, to be consolidated over time. More specifically, E4C aims to:
E4C will have two highly interconnected workstreams. The first will be primarily computational and will rely on bioinformatic and chemoinformatic technologies and algorithms, while the second will focus on a genomic, biochemical and biological approach. Starting from known bioinformatic information on protein targets, the 3D molecular structures needed to exploit the potential of the Exscalate platform will be prepared. This activity has already started.
The supercomputing component of the project will then be used to model future mutations of the virus. These activities will make it possible to identify candidate molecules (either from repurposing libraries or from proprietary or commercial compound libraries), which will then be supplied and/or synthesized for testing. In parallel, E4C partners will start protein production for some of the identified sequences.
The consortium will also open the Exscalate platform to external partners willing to cooperate in the drug-hunting exercise, as was done previously for the Zika virus. This considerable effort to test more than 25,000 compounds will ensure that E4C does not miss any active candidate molecules and will allow the evaluation of other possible mechanisms that were previously underestimated. The project will also use inverse genomic screening to identify host factors associated with virus infection, and connectivity mapping analysis to predict relevant host-specific compounds for testing.
Furthermore, the development of a specific Coronavirus animal model to test the outcome of E4C will help researchers bridge the gap between the selection of active compounds and a possible fast first dose in humans, following EMEA directives and managed by the partner Spallanzani Hospital.
Finally, researchers will work to make the Exscalate platform a sustainable resource: an emergency engine for compound identification that can be deployed in all future pandemic emergencies.
¹ This performance far exceeds the current state-of-the-art technology, which is also the subject of a recent article in Nature (6 February 2019, www.nature.com/articles/s41586-019-0917-9), in which a "chemical library" of 138 million molecules for a single target was reached, with a processing capacity of less than 2 thousand molecules per second.