Predictive Discovery

Compugen discovers novel drug targets through a unique, predictive, computational process. We call this process “Predictive Discovery” because our in silico findings predict the biological function and therapeutic relevance of novel proteins that were not previously considered drug target candidates. For over a decade, we have been developing predictive platforms for a variety of biological processes and phenomena. These platforms are continuously improved and diversified to address the need for novel targets in areas of interest to the industry.

Biological knowledge: For each biological phenomenon or process, we first screen the available biological literature on the topic. Our scientists study and critically evaluate the publicly available information to discern the key components from a computational perspective.

Genome and proteome analysis: The genome is the most complex encryption system known to mankind, and Compugen’s discovery team has made exceptional progress in understanding and deciphering its code. Our proprietary genome and protein analysis platforms generate accurate, robust and comprehensive data sources which have proven successful in a variety of internal and collaborative programs. These tools are applied according to the biological phenomenon or process of interest and form one of the pillars of our discovery process.

Experimental and disease data: The worldwide explosion of molecular data and the exponential growth in personal, clinical and lifestyle data present a monumental challenge alongside an extraordinary discovery opportunity. Our discovery team is therefore focused on collecting these data, analyzing them, evaluating their quality and utility, and integrating relevant studies in a format that is appropriate for predictive discovery. MED and LINKS are examples of internal platforms that were created to integrate gene expression data and that now support multiple internal programs.


Compugen’s predictive discovery process. The process integrates three core assets: extensive biological knowledge, genome and proteome analysis, and experimental and disease data. The weight of each of these components in the overall discovery process varies according to the class of targets of interest.

Data Integration and Computational Modeling leading to the Prediction of Drug Targets: The process illustrated in the diagram above integrates three core assets: biological knowledge, genome and proteome analysis, and experimental and disease data. The weight of each of these components in the predictive in silico model varies according to the class of targets of interest. This model is tested and continuously refined to identify, with high accuracy, the key differentiating attributes from the three domains: known biology, genome analysis, and experimental and disease data. Predicted protein candidates are then separated into two categories, novel and non-novel. The non-novel group serves the important function of providing an in silico validation of the given computational discovery process. This group includes proteins that were not part of the database used as a foundation for our algorithms but were identified in the scientific or patent literature. The fact that these proteins were re-identified through our unique computational approach serves as an important control for the accuracy and breadth of the platform. Novel results, meaning proteins that were identified by our predictive process as novel drug targets, are then prioritized for experimental validation.
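
The weighting, novel/non-novel split and re-identification control described above can be illustrated with a minimal Python sketch. It is not Compugen's actual model: the `Candidate` fields, the linear weighted score, the threshold, the protein names and the `reidentification_rate` metric are all hypothetical assumptions chosen only to make the flow concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate with one score (0..1) per evidence domain."""
    name: str
    knowledge: float      # biological knowledge
    genome: float         # genome and proteome analysis
    experimental: float   # experimental and disease data


def combined_score(c, weights):
    """Weighted average of the three domain scores (assumed linear model)."""
    total = sum(weights.values())
    return sum(w * getattr(c, domain) for domain, w in weights.items()) / total


def predict(candidates, weights, threshold):
    """Keep candidates whose combined score reaches the threshold."""
    return [c for c in candidates if combined_score(c, weights) >= threshold]


def split_by_novelty(predicted, held_out_known):
    """Separate predictions into novel and non-novel groups.

    held_out_known: names reported in the literature but deliberately
    excluded from the data the model was built on.
    """
    novel = [c for c in predicted if c.name not in held_out_known]
    non_novel = [c for c in predicted if c.name in held_out_known]
    return novel, non_novel


def reidentification_rate(non_novel, held_out_known):
    """Fraction of held-out known targets that the model re-identified."""
    if not held_out_known:
        return 0.0
    hits = {c.name for c in non_novel} & set(held_out_known)
    return len(hits) / len(held_out_known)


if __name__ == "__main__":
    # Placeholder data; names and scores are invented for illustration.
    candidates = [
        Candidate("PROT_A", 0.9, 0.8, 0.7),
        Candidate("PROT_B", 0.2, 0.3, 0.1),
        Candidate("PROT_C", 0.6, 0.9, 0.8),
        Candidate("PROT_D", 0.4, 0.5, 0.9),
    ]
    # Weights would vary with the class of targets of interest.
    weights = {"knowledge": 1.0, "genome": 2.0, "experimental": 1.0}
    held_out = {"PROT_A", "PROT_D"}

    predicted = predict(candidates, weights, threshold=0.7)
    novel, non_novel = split_by_novelty(predicted, held_out)
    print("novel:", [c.name for c in novel])
    print("non-novel:", [c.name for c in non_novel])
    print("re-identification rate:", reidentification_rate(non_novel, held_out))
```

In this sketch the non-novel group plays the role of the in silico control, while the novel group is the list that would be prioritized for experimental validation.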

Our Main Technology Platforms

Genome & Proteome Analysis

LEADS – A comprehensive genome annotation platform

LEADS, our proprietary transcript assembly and human genome annotation platform, is a core component of the discovery infrastructure and provides a comprehensive view of genes, transcripts (mRNA sequences coding for proteins) and proteins. At the gene level, LEADS provides extensive annotation and details including antisense genes, SNPs, splice variants and RNA editing events. At the protein level, LEADS provides detailed annotations, including homologies, domain information, sub-cellular localization, peptides and motif predictions.

Functional Protein Segments: A collection of discovery platforms

The functional discovery platforms identify key protein segments that play an essential functional or structural role in proteins of interest. Through the use of sequence and structure-based proprietary algorithms integrated with additional computational biology tools, these platforms identify potential protein binding segments on a protein of interest or interacting segments within a protein. One application of such platforms is the discovery of segments blocking protein-protein interactions or intra-protein interactions, having the potential to serve as therapeutic peptides.

Experimental and Disease Data Analysis

LINKS – A comprehensive drug target characterization infrastructure

The LINKS infrastructure is designed to allow comprehensive characterization and differentiation of drug target candidates. LINKS was designed to integrate and analyze very large volumes of patient disease and clinical data in order to associate novel drug targets with specific disease conditions, clinical attributes and disease-associated mechanisms of action. LINKS was applied to analyze our pipeline of immune checkpoint target candidates, to compare them to one another and to differentiate them from known immune checkpoints. This analysis covers immune subpopulations, regulatory mechanisms and cancer-specific immune signatures, and enabled Compugen to compare and differentiate its large portfolio of novel immune checkpoint programs.

In 2016, LINKS was enhanced to include the computational discovery of new immuno-oncology drug targets, with a specific focus on the discovery of myeloid targets within the tumor microenvironment (TME). This further development of LINKS includes the integration of additional public and proprietary data in order to allow the identification and analysis of specific immune cell types derived from the TME, and, in particular, myeloid cells. Through the integration of multiple data types across a range of conditions, diseases, stimuli and specific myeloid sub-populations, the enhanced LINKS platform has demonstrated the capability to predict new myeloid target candidates that have potential utility in cancer immunotherapy.

Mining Expression Data (MED) Platform

The MED Platform is an integrated gene expression database composed of more than 70,000 public and proprietary microarray experiments. MED data sets have been curated, normalized and organized into more than 1,400 therapeutically relevant conditions (e.g., normal tissues, malignant tissues, tissues from drug-treated patients). Through a sophisticated query interface, the proprietary MED platform allows the simultaneous examination of gene expression profiles and pathways across all of these conditions, tissues and individuals.
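
As a rough illustration of what examining one gene's expression profile across many conditions can look like, the following Python sketch uses a toy in-memory table. It does not reflect the actual MED interface or data model: the `EXPRESSION` layout, the gene and condition names, and the fold-change criterion are all hypothetical assumptions.

```python
from statistics import mean

# Hypothetical toy data: condition -> gene -> normalized expression values,
# one value per sample. Real data would come from curated experiments.
EXPRESSION = {
    "normal lung": {"GENE_X": [1.0, 1.5, 0.5], "GENE_Y": [5.0, 5.0]},
    "lung tumor": {"GENE_X": [4.0, 5.0, 3.0], "GENE_Y": [5.0, 4.0]},
    "treated lung tumor": {"GENE_X": [2.0, 2.0]},
}


def profile(gene, data):
    """Mean expression of a gene in every condition where it was measured."""
    return {
        condition: mean(genes[gene])
        for condition, genes in data.items()
        if gene in genes
    }


def enriched_conditions(gene, data, baseline, min_fold):
    """Conditions where the gene's mean expression is at least min_fold
    times its mean in the baseline condition (an assumed, simple criterion)."""
    means = profile(gene, data)
    base = means[baseline]
    if base <= 0:
        raise ValueError("baseline expression must be positive")
    return sorted(
        condition
        for condition, value in means.items()
        if condition != baseline and value / base >= min_fold
    )


if __name__ == "__main__":
    print(profile("GENE_X", EXPRESSION))
    print(enriched_conditions("GENE_X", EXPRESSION, "normal lung", min_fold=3.0))
```

A production system would add normalization across experiments and statistical testing; the sketch only shows the shape of a cross-condition query.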

