Structuring Cancer Research Knowledge Through Knowledge Graphs

Cancer is not a single disease but a complex set of disorders driven by interactions among genetics, environment, and lifestyle factors. Making sense of the heterogeneous, large-scale data that cancer research produces goes beyond the capabilities of traditional data management and analysis approaches. It requires holistic, integrative methods that can combine information across modalities, preserve context, and capture complex relationships. Here, ontologies and knowledge graphs offer powerful tools to organize, integrate, and analyze complex biomedical information.

Ontologies are structured frameworks that provide clear definitions of important concepts in a domain, such as genes, proteins, diseases, or drugs, and explain what kinds of connections can exist between them. They offer a shared vocabulary that helps researchers describe information in a consistent and understandable way across studies and institutions. Knowledge graphs take this a step further by showing actual examples of these connections. They are networks where specific entities, like a particular gene, protein, or tumor, are linked by real-world relationships, such as “inhibits,” “expressed in,” or “associated with.” If the ontology is the dictionary, a knowledge graph is the storybook that uses those words to describe real events and facts. This structure not only helps researchers explore complex connections and patterns but also organizes the information in a way that computers can process efficiently for analysis. 
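
The distinction between the two can be made concrete with a small sketch. In the Python example below, the ontology is reduced to a set of allowed (subject type, relation, object type) patterns, and the knowledge graph is a set of concrete facts that must match one of those patterns. All entity names (DRUG_X, PROTEIN_Y, TISSUE_Z), the relation set, and the class design are hypothetical placeholders chosen for illustration; they are not taken from any specific oncology ontology or from the CancerScan project.

```python
from dataclasses import dataclass

# "Ontology": which relation may connect which entity types.
# Illustrative assumption only, not a real biomedical ontology.
ALLOWED_RELATIONS = {
    ("Drug", "inhibits", "Protein"),
    ("Gene", "expressed_in", "Tissue"),
    ("Gene", "associated_with", "Disease"),
}


@dataclass(frozen=True)
class Entity:
    """A concrete thing in the graph, tagged with its ontology type."""
    name: str
    type: str


class KnowledgeGraph:
    """Stores facts as (subject, relation, object) triples."""

    def __init__(self, allowed):
        self.allowed = allowed
        self.triples = set()

    def add(self, subject, relation, obj):
        # Reject any fact whose shape the ontology does not permit.
        pattern = (subject.type, relation, obj.type)
        if pattern not in self.allowed:
            raise ValueError(f"relation not allowed by ontology: {pattern}")
        self.triples.add((subject, relation, obj))

    def objects(self, subject, relation):
        """Return every entity linked from `subject` via `relation`."""
        return {o for s, r, o in self.triples if s == subject and r == relation}


# Hypothetical placeholder entities.
drug = Entity("DRUG_X", "Drug")
protein = Entity("PROTEIN_Y", "Protein")
tissue = Entity("TISSUE_Z", "Tissue")

kg = KnowledgeGraph(ALLOWED_RELATIONS)
kg.add(drug, "inhibits", protein)  # accepted: matches an allowed pattern
```

Attempting kg.add(drug, "expressed_in", tissue) would raise a ValueError, because this toy ontology only allows genes, not drugs, to be "expressed in" a tissue. That check is a simplified stand-in for the consistency that a shared vocabulary provides across studies.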

The potential of ontologies and knowledge graphs extends beyond simple data organization. They provide a structured, interconnected view of information, which allows researchers to uncover relationships that might otherwise remain hidden in fragmented datasets. For example, linking genomic alterations to clinical outcomes through a knowledge graph can reveal patterns of treatment response or disease progression that would be difficult to detect using traditional methods. Similarly, the consistent labeling and description of molecular and cellular data facilitates cross-study comparisons, enabling the discovery of common mechanisms across different cancer types or patient cohorts.
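
As a rough illustration of linking genomic alterations to clinical outcomes, the Python sketch below stores patient facts as triples and then follows the links from each patient to both an alteration and a treatment response. The patient identifiers, alteration labels, relation names, and response categories are invented for the example and carry no clinical meaning; a real analysis would use curated data and appropriate statistics.

```python
from collections import Counter, defaultdict

# Toy facts as (subject, relation, object) triples. All values are made up.
triples = [
    ("patient_1", "has_alteration", "ALTERATION_A"),
    ("patient_1", "had_response", "responder"),
    ("patient_2", "has_alteration", "ALTERATION_A"),
    ("patient_2", "had_response", "non_responder"),
    ("patient_3", "has_alteration", "ALTERATION_B"),
    ("patient_3", "had_response", "responder"),
    ("patient_4", "has_alteration", "ALTERATION_A"),
    ("patient_4", "had_response", "responder"),
]


def build_index(triples):
    """Index triples as subject -> relation -> set of objects."""
    index = defaultdict(lambda: defaultdict(set))
    for subject, relation, obj in triples:
        index[subject][relation].add(obj)
    return index


def response_by_alteration(triples):
    """Count treatment responses for each alteration by joining on the patient."""
    index = build_index(triples)
    counts = defaultdict(Counter)
    for relations in index.values():
        for alteration in relations.get("has_alteration", ()):
            for response in relations.get("had_response", ()):
                counts[alteration][response] += 1
    return {alteration: dict(c) for alteration, c in counts.items()}


summary = response_by_alteration(triples)
print(summary)
```

Because the alteration and the response are both attached to the same patient node, the join falls out of the graph structure itself. Here the two kinds of fact sit in one list, but the same traversal would work if they had been loaded from separate genomic and clinical sources.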

These capabilities are particularly valuable in the era of multi-omics and large-scale clinical studies, where integrating heterogeneous data is essential to derive actionable insights. In addition to the integrative capabilities, knowledge graphs provide a foundation for computational approaches, including machine learning and artificial intelligence, by transforming raw and heterogeneous data into a structured, interpretable format. This makes it possible to perform sophisticated analyses such as predicting treatment response, identifying potential drug targets, or stratifying patients based on complex molecular and clinical profiles. Notably, the semantic framework created by the combination of ontologies and knowledge graphs preserves the meaning and context of the data, ensuring that computational models remain interpretable and explainable and that any findings can be traced back to well-defined concepts.

The CancerScan project analyzes tissue slide images to extract detailed information about the tumor microenvironment, such as the spatial organization of cells, the composition of immune and stromal populations, and molecular markers. The goal is to predict the emergence of treatment resistance and to offer insights that could guide more effective and personalized cancer therapies. Ontologies and knowledge graphs are particularly powerful in this context because they provide a structured framework to represent these complex image-derived features and integrate them with clinical and molecular data. They enable consistent annotation, support advanced analyses, and facilitate predictive modeling, transforming heterogeneous and fragmented data into coherent, actionable knowledge that advances the understanding, diagnosis, and treatment of cancer.

For more information, see Silva et al., “Ontologies and Knowledge Graphs in Oncology Research,” Cancers, 2022.