Structure-Based De Novo Drug Design (SBDNDD) is a promising class of Computer-Aided Drug Design (CADD) methods leveraging deep generative Artificial Intelligence (AI) models to create novel, valid, and synthesizable ligands based on design constraints and the target protein’s 3D structure. The key objective is designing molecules that bind with high affinity and selectivity to a given target while ensuring optimal pharmacokinetic, toxicity, and drug-like physico-chemical properties. This approach can significantly accelerate early preclinical drug discovery, particularly hit/lead identification and optimization. Despite recent progress driven by increased data availability, enhanced hardware performance, and advancements in Deep Learning (DL), state-of-the-art SBDNDD models still face limitations hindering their industrial application. Common issues include low explainability, limited chemical diversity, and the lack of integrated post-processing and ranking systems. Furthermore, defining multi-objective scoring functions to effectively guide the generative process remains complex, requiring a careful balance between exploring chemical space, adhering to constraints, and maintaining creativity. In this study, we present ITForge, an end-to-end, AI-driven pipeline developed to support hit-to-lead workflows through generative fragment growing. ITForge integrates and optimizes multiple open-source frameworks to address significant SBDNDD limitations. The pipeline combines a generative workflow based on the scaffold decoration model LibINVENT, pre-trained on the ChEMBL database and optimized via Reinforcement Learning (RL), with a comprehensive post-processing module designed to progressively filter and rank compounds using increasingly accurate scoring stages. The generative chemistry process starts from a promising SMILES input fragment with defined growing points. 
RL guides molecule generation toward high-scoring regions of chemical space using a custom multi-objective function that incorporates docking scores, an empirical synthetic accessibility score, and physico-chemical and structural properties related to drug-likeness and flexibility. Generated molecules are then filtered and ranked through progressively more accurate stages: docking pose-template RMSD filtering, DL-based synthetic feasibility assessment, commercial building block search in an in-house database, ADME-Tox property prediction, refined docking scores, and binding free energy estimation. Our main innovation, presented in this poster, is the GRIP Score (Geometric Residue Interaction Profiler), a custom non-covalent interaction (NCI) profiler that analyzes the specific non-covalent interactions of a binding pose and assigns a weighted score based on the strength and type of each interaction. The score can be calculated in two ways: an Absolute Mode, which sums the weights of all identified interactions, or a Reference-Based Mode, which normalizes the score against a known reference ligand. The initial docking scores are then re-weighted by multiplying them by the GRIP Score, allowing us to better rank candidates that show a more realistic and favorable binding pattern. We validated the pipeline performance in silico across several key metrics, using a fragment of the selective COX-2 inhibitor Celecoxib as the starting scaffold and a custom scoring function that combines docking with the GRIP Score. The pipeline produced a set of valid and unique molecules, with very few compounds triggering structural alerts such as PAINS. In terms of chemical novelty, that is, the ability to generate new and diverse compounds, the model produced molecules that are distinct from the training set while still maintaining favorable drug-like properties.
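As an illustration of the two GRIP Score modes and the docking re-weighting described above, the following is a minimal Python sketch, not the ITForge implementation. The interaction types, their weights, and the function names are hypothetical, and we assume here that the Reference-Based Mode divides a pose's absolute score by the absolute score of the reference ligand; the abstract does not specify the actual weights or the normalization formula.

```python
# Hypothetical per-interaction weights (illustrative values only;
# the actual GRIP Score weights are not given in the abstract).
ILLUSTRATIVE_WEIGHTS = {
    "hydrogen_bond": 1.0,
    "salt_bridge": 1.5,
    "pi_stacking": 0.75,
    "hydrophobic": 0.5,
}


def grip_absolute(interactions, weights=ILLUSTRATIVE_WEIGHTS):
    """Absolute Mode: sum the weights of all identified interactions.

    `interactions` is a list of interaction-type labels detected in a
    binding pose. Unknown types contribute 0.0 in this sketch.
    """
    return sum(weights.get(kind, 0.0) for kind in interactions)


def grip_reference(interactions, reference_interactions,
                   weights=ILLUSTRATIVE_WEIGHTS):
    """Reference-Based Mode (assumed form): the pose's absolute score
    divided by the absolute score of a known reference ligand."""
    reference_score = grip_absolute(reference_interactions, weights)
    if reference_score == 0:
        raise ValueError("reference ligand has no weighted interactions")
    return grip_absolute(interactions, weights) / reference_score


def reweight_docking(docking_score, grip_score):
    """Re-weight a docking score by multiplying it by the GRIP Score."""
    return docking_score * grip_score


# Toy poses: lists of detected interaction types (made-up data).
candidate = ["hydrogen_bond", "hydrogen_bond", "hydrophobic"]
reference = ["hydrogen_bond", "hydrogen_bond", "salt_bridge",
             "hydrophobic", "hydrophobic", "hydrophobic"]

absolute = grip_absolute(candidate)
relative = grip_reference(candidate, reference)
rescored = reweight_docking(-8.0, relative)
```

With docking scores where more negative means better, multiplying by a GRIP Score below 1 moves a pose toward zero, so poses that reproduce fewer of the reference interactions are ranked lower under this assumed normalization.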
Finally, we estimated the binding free energy for the top 1000 candidates and found that 31 compounds obtained values comparable to that of our reference ligand, Celecoxib, which supports the effectiveness of our scoring approach. ITForge provides a flexible, scalable, and open-source solution for AI-driven drug design, integrating generative modeling, multi-objective optimization, and rigorous post-processing to prioritize high-quality candidate molecules for experimental validation.
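The staged post-processing described in the abstract, in which cheaper scores prune the candidate set before more expensive ones are computed, can be sketched as a simple funnel. This is only a schematic in Python under stated assumptions: the `Stage` class, the `run_funnel` function, the stage names, and the top-k cut-off rule are hypothetical, scores are treated as "higher is better", and the toy scores are made up; the actual ITForge stages may use thresholds or other criteria.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Stage:
    """One hypothetical post-processing stage."""
    name: str
    score: Callable[[str], float]  # higher is better in this sketch
    keep: int                      # how many top-ranked molecules survive


def run_funnel(molecules: Iterable[str], stages: List[Stage]) -> List[str]:
    """Apply stages in order; each stage ranks the survivors of the
    previous one and passes on only its top `keep` molecules."""
    survivors = list(molecules)
    for stage in stages:
        ranked = sorted(survivors, key=stage.score, reverse=True)
        survivors = ranked[:stage.keep]
    return survivors


# Toy scores for four placeholder molecule identifiers (made-up data).
fast_scores = {"A": 0.9, "B": 0.8, "C": 0.1, "D": 0.7}
slow_scores = {"A": 0.2, "B": 0.9, "D": 0.5}

stages = [
    Stage("fast_score", fast_scores.__getitem__, keep=3),
    Stage("slow_score", slow_scores.__getitem__, keep=2),
]
shortlist = run_funnel(["A", "B", "C", "D"], stages)
```

In this toy run the expensive score is only ever evaluated for the three molecules that pass the cheap stage, which is the point of ordering stages by increasing cost and accuracy.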
Carbone, G., Dante Manenti, A., Cosentino, U., Sandrone, G., Greco, C., Rovelli, G. (2025). An Open-Source AI-Driven Pipeline for Structure-Based De Novo Drug Design and Ligand Prioritization: ITForge. Paper presented at: 29th National Meeting on Medicinal Chemistry (NMMC 2025), 14-17 September 2025, Parma, Italy.
An Open-Source AI-Driven Pipeline for Structure-Based De Novo Drug Design and Ligand Prioritization: ITForge
Giorgio Carbone; Ugo Renato Cosentino; Claudio Greco; Grazia Rovelli
2025


