Use ontologies in the model

STATUS: READY FOR REVIEW

“Ontologies provide a flexible approach to integrating data and sharing meaning and may be better able to assist in inferring meaning in complex situations.” (Liyanage H, Krause P, de Lusignan S. Using ontologies to improve semantic interoperability in health data. BMJ Health & Care Informatics.)

This process involves linking your data model to ontologies, which are essentially formal, shared dictionaries that provide precise, computer-readable definitions for concepts. By assigning these standard definitions to the elements in your data model, you make their meanings explicit and clear. This allows different computer systems to correctly understand, combine, and compare data from various sources without confusion, making the data more interoperable and easier to reuse across different projects.

Short description

This step starts from the existing semantic (meta)data model that organises your domain’s entities, attributes and value sets with clear project-level meaning. Building on that model, this step links its elements to ontology terms to make their meanings explicit, computable and interoperable. Ontologies are formal, logic-based representations of domain knowledge that define types of entities, relations and constraints, enabling explicit semantics and consistent computational interpretation. By adding this semantic layer, models become not only well-structured but also interoperable across systems.

Why is this step important

A (meta)data model can express semantics for its project context, but those meanings are often implicit or local. The next challenge is to make them interpretable across systems. Ontologies provide the shared semantic layer that enables this. They make model elements formally defined and logically connected beyond their original context, turning isolated data structures into interoperable, machine-understandable representations. The following aspects highlight the main contributions of ontology use to semantic precision, data integration and FAIR alignment:

Shared meaning. Ontologies use formal semantics and axioms (formal rules that software can check) to make each model element’s meaning explicit across systems.
Data integration. By providing precise, shared semantics, ontologies reduce false matches where labels align but meanings differ, enabling reliable combination and comparison across systems.
Automated reasoning and validation. Because ontologies encode semantics formally, they can be processed by logic reasoners (e.g. HermiT or Pellet, both freely available): software that can infer missing connections and detect inconsistencies, such as conflicting definitions or contradictory classifications.
FAIR alignment. Using ontologies provides a formal, shared representation that makes meaning computable, directly supporting the FAIR principle I1. Adopting recognised community ontologies also advances conformance with domain standards, aligning with R1.3.

🧪 Example

As an illustrative example of these advantages, suppose System A uses the label “myocardial infarction” and System B uses “heart attack” to denote the same clinical concept. Both labels are bound to the same ontology IRI (Internationalized Resource Identifier), so A = B. A third system, C, uses “stroke”, which is bound to a different IRI for a distinct concept, so B ≠ C. By substitution, it follows that A ≠ C as well. In practice, this prevents false merges (heart attack ≠ stroke) while still integrating data where the label differs but the meaning is identical (myocardial infarction = heart attack). If a broader or narrower relation exists (e.g. “ischemic heart disease” is broader than “myocardial infarction”), queries can still aggregate correctly without conflating distinct concepts.
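
The same reasoning can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the `https://example.org/onto/...` IRIs, the `BINDINGS` table and the `group_by_concept` helper are hypothetical placeholders standing in for real ontology IRIs and your own binding store.

```python
from collections import defaultdict

# Hypothetical placeholder IRIs; real bindings would use ontology term IRIs.
MI = "https://example.org/onto/MyocardialInfarction"
STROKE = "https://example.org/onto/Stroke"

# (system, local label) -> bound ontology IRI
BINDINGS = {
    ("A", "myocardial infarction"): MI,
    ("B", "heart attack"): MI,
    ("C", "stroke"): STROKE,
}


def group_by_concept(records):
    """Sum (system, label, count) records by bound IRI rather than by label."""
    totals = defaultdict(int)
    for system, label, count in records:
        iri = BINDINGS[(system, label)]
        totals[iri] += count
    return dict(totals)


records = [
    ("A", "myocardial infarction", 12),
    ("B", "heart attack", 8),
    ("C", "stroke", 5),
]
totals = group_by_concept(records)
print(totals)
```

Records from Systems A and B end up under one key because they share an IRI, while the record from System C stays separate even though all three are cardiovascular-sounding labels.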

How to

Before starting, choose an ontology development methodology to guide your process. This applies whether you are creating a new model, extending or reusing an existing one, and whether you are annotating newly collected data or aligning previously collected data. Well-known approaches include Ontology Development 101, Methontology, SABiO (Systematic Approach for Building Ontologies) and the NeOn Methodology. These provide structured workflows for requirement gathering, conceptualisation, formalisation and evaluation, helping ensure your ontology or mappings are coherent, reusable and aligned with best practices. The following steps describe how to connect model elements to ontology terms and manage these bindings over time.

Step 1 – Identify what needs annotation

List the model elements that represent concepts or enumerated values, such as classes, attributes, value sets and common data elements.
Record current labels, definitions and intended use to guide later term selection.
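
One possible way to keep such an inventory is a small structured table. The sketch below is only an illustration: the `ModelElement` fields, the element identifiers and the `missing_definitions` helper are hypothetical and should be adapted to your own model.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class ModelElement:
    """One annotatable element of the (meta)data model (hypothetical layout)."""
    element_id: str    # local identifier in your model
    kind: str          # e.g. "class", "attribute", "value"
    label: str
    definition: str
    intended_use: str


def to_csv(elements):
    """Serialise the inventory as CSV text for review and later term selection."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(ModelElement)])
    writer.writeheader()
    for element in elements:
        writer.writerow(asdict(element))
    return buffer.getvalue()


def missing_definitions(elements):
    """Return the ids of elements that still lack a definition."""
    return [e.element_id for e in elements if not e.definition.strip()]


inventory = [
    ModelElement("patient.sex", "attribute", "Sex",
                 "Sex recorded at registration", "Demographics"),
    ModelElement("sex.male", "value", "Male", "", "Permitted value of patient.sex"),
]
print(to_csv(inventory))
print(missing_definitions(inventory))
```

Listing elements without a definition early makes it easier to resolve ambiguity before any ontology term is chosen.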

Step 2 – Select appropriate ontologies

Search trusted registries that cover your domain and application scope, such as BioPortal, Ontology Lookup Service (OLS), Open Biological and Biomedical Ontologies (OBO) Foundry, Linked Open Vocabularies (LOV) and BARTOC (Basic Register of Thesauri, Ontologies & Classifications).
Evaluate each ontology’s coverage, granularity, community adoption, maintenance status and licence conditions. For further guidance, see e.g. “Ten Simple Rules for Selecting a Bio-ontology”, the OBO Foundry Principles and the NCBO Ontology Recommender 2.0.
Prefer ontologies with persistent, dereferenceable IRIs, clear versioning and open licences (e.g. Creative Commons Attribution (CC BY), Open Data Commons Attribution License (ODC-By), Creative Commons Zero (CC0)).
Before creating new mappings, check whether equivalent mappings already exist (for example, in FAIRsharing, or BioPortal mappings). Reusing existing mappings promotes consistency and reduces effort.
Document chosen sources and versions before mapping.

Step 3 – Bind model elements to ontology terms

Choose the most specific term that fits the intended meaning and avoid overly broad terms.
Record both the label and the IRI for each binding (e.g. the label “Male” with the IRI http://purl.obolibrary.org/obo/NCIT_C20197).
Capture mapping intent where useful, using mapping predicates from the Simple Knowledge Organization System (SKOS), such as skos:exactMatch and skos:closeMatch, or logical relations from OWL and RDFS, such as owl:equivalentClass and rdfs:subClassOf. See the SKOS documentation for details.
Keep bindings separate from the source ontology. Do not change the original ontology’s IRIs.
Store bindings in a structured, machine-readable form (e.g. Resource Description Framework (RDF), Web Ontology Language (OWL), Simple Knowledge Organization System (SKOS) or a dedicated mapping file such as Simple Standard for Sharing Ontological Mappings (SSSOM)) with provenance fields such as who mapped, when, source version and rationale. Always include the ontology version IRI or release date for reproducibility.
When no exact term exists, use the closest appropriate term and record the gap (i.e., note that no exact ontology term was found and why). Record this both in your mappings and in the documentation. For example, you can add an annotation in OWL (use rdfs:comment or skos:note) and add provenance in the mapping (e.g. using SSSOM). If the gap is significant, consider proposing a new term to the ontology maintainers.
Use as few ontologies as possible to keep interoperability and maintenance manageable. With many different ontologies, it becomes harder to check that terms do not contradict one another.

🧪 Example

Local value “Neck cancer” has no exact term, so you temporarily link it to “Head and neck cancer” (http://purl.obolibrary.org/obo/DOID_5520) with skos:closeMatch and add a comment such as “No exact term for ‘Neck cancer’; new-term request submitted at https://example.org/issue/12345.”
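
A binding like this one could be stored as a row in an SSSOM-style TSV file. The sketch below is illustrative only: the column names follow SSSOM slot names, but the `mymodel:` prefix, the ORCID, the date and the version string are placeholders, and a complete SSSOM file would also carry a metadata header (for example a CURIE map) that is omitted here.

```python
import csv
import io

# Column names follow SSSOM slot names; the file-level metadata header
# required by a full SSSOM file is intentionally left out of this sketch.
COLUMNS = [
    "subject_id", "subject_label", "predicate_id",
    "object_id", "object_label", "mapping_justification",
    "author_id", "mapping_date", "object_source_version", "comment",
]

mapping = {
    "subject_id": "mymodel:NeckCancer",              # hypothetical local identifier
    "subject_label": "Neck cancer",
    "predicate_id": "skos:closeMatch",
    "object_id": "DOID:5520",
    "object_label": "Head and neck cancer",
    "mapping_justification": "semapv:ManualMappingCuration",
    "author_id": "orcid:0000-0000-0000-0000",        # placeholder ORCID
    "mapping_date": "2025-01-01",                    # placeholder date
    "object_source_version": "RELEASE-PLACEHOLDER",  # ontology release actually used
    "comment": "No exact term for 'Neck cancer'; new-term request "
               "submitted at https://example.org/issue/12345.",
}


def to_tsv(rows):
    """Write mapping rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


tsv = to_tsv([mapping])
print(tsv)
```

Keeping the gap note in the `comment` column, next to the provenance fields, means the reason for the close match travels with the mapping itself.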

Step 4 – Represent, validate and publish the annotated model

Represent the model and its annotations using RDF or OWL and express constraints using Shapes Constraint Language (SHACL).
Use dereferenceable IRIs for both your model elements and referenced ontology terms.
- Where you control the identifiers, create persistent IRIs that dereference to human- and machine-readable descriptions. For external ontology terms, reuse their IRIs as-is. If a required term’s IRI does not dereference, still use it as-is and add a reference link (e.g. using rdfs:seeAlso) to a stable catalogue record.
Include ontology citations and versions in the model metadata and changelog.
Validate mappings and constraints automatically as part of your quality checks. Resolve any unsatisfiable classes or constraint violations before publication and keep validation reports as part of your documentation.
Recommended tools for implementation and validation include:
- Protégé. Ontology editing and reasoning (supports HermiT, Pellet)
- ROBOT (Ontology Build Tool). Command-line ontology manipulation
- OpenRefine. Cleaning and reconciling tabular data with ontologies
- TopBraid SHACL API or pySHACL (Python implementation of SHACL). SHACL validation
- SSSOM Toolkit. Managing mappings in SSSOM format
Publish the annotated model and bindings in a stable, publicly accessible location (e.g. GitHub, FAIRsharing or institutional repositories) with a clear licence.
Mappings and annotations themselves can be shared under permissive licences (e.g. CC BY 4.0 or CC0), unless source ontology licences impose restrictions.
Even when data cannot be openly shared (e.g. due to privacy or legal restrictions), publish the model and bindings so that others can understand, align and interoperate with your work.
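
The automated quality checks mentioned in this step would normally be run with a SHACL engine or a reasoner. As a much lighter stand-in, the sketch below shows a pre-publication sanity check written with the Python standard library only. It does not implement SHACL; the field names (`subject_id`, `object_iri`, `source_version` and so on) and the sample bindings are assumptions made for illustration.

```python
from urllib.parse import urlparse

ALLOWED_PREDICATES = {
    "skos:exactMatch", "skos:closeMatch",
    "owl:equivalentClass", "rdfs:subClassOf",
}
REQUIRED_FIELDS = ("subject_id", "predicate_id", "object_iri",
                   "object_label", "source_version")


def check_binding(binding):
    """Return a list of problems; an empty list means the binding passes."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not binding.get(field):
            problems.append(f"missing {field}")
    predicate = binding.get("predicate_id")
    if predicate and predicate not in ALLOWED_PREDICATES:
        problems.append(f"unexpected predicate {predicate}")
    iri = binding.get("object_iri", "")
    parsed = urlparse(iri)
    if iri and (parsed.scheme not in ("http", "https") or not parsed.netloc):
        problems.append(f"object_iri is not an absolute HTTP(S) IRI: {iri}")
    return problems


def validation_report(bindings):
    """Map each failing subject_id to its list of problems."""
    report = {}
    for binding in bindings:
        problems = check_binding(binding)
        if problems:
            report[binding.get("subject_id", "<unknown>")] = problems
    return report


bindings = [
    {"subject_id": "mymodel:NeckCancer", "predicate_id": "skos:closeMatch",
     "object_iri": "http://purl.obolibrary.org/obo/DOID_5520",
     "object_label": "Head and neck cancer",
     "source_version": "RELEASE-PLACEHOLDER"},
    # Deliberately broken entry to show what the report looks like.
    {"subject_id": "mymodel:Broken", "predicate_id": "skos:sameAs",
     "object_iri": "DOID_5520", "object_label": "", "source_version": ""},
]
report = validation_report(bindings)
print(report)
```

A report like this can be stored alongside the SHACL and reasoner output, so that every published release has a record of what was checked.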

Step 5 – Maintain and govern bindings

Ontologies evolve over time, so mappings and bindings must be reviewed and updated to remain valid and interpretable. Regular maintenance ensures your model stays aligned with current standards and avoids broken or outdated references.

Plan regular reviews. Check for new ontology releases at fixed intervals (e.g. every 6 or 12 months).
Watch for deprecations. Replace any deprecated or merged terms with their recommended alternatives and keep notes of these changes.
Update and record. Use version control (e.g. Git) to track all mapping changes and maintain a changelog including the ontology version used and the reason for each update.
Migrate if needed. When key ontology terms change, adjust your annotated data or mappings using tools such as ROBOT or simple scripts.
Re-validate. After each update, re-run SHACL validation or reasoning checks to confirm that the model remains coherent.
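
The review cycle above can be partly scripted. The following sketch assumes you have already extracted, from a new ontology release, a table of deprecated term IRIs and their recommended replacements (in OBO-style ontologies these are often stated with owl:deprecated and a “term replaced by” annotation). The `DEPRECATED` table, the `example.org` IRIs, the field names and the `migrate` helper are all hypothetical.

```python
from datetime import date

# Hypothetical snapshot from a new release:
# deprecated IRI -> recommended replacement, or None if none is given.
DEPRECATED = {
    "https://example.org/onto/OLD_0001": "https://example.org/onto/NEW_0042",
    "https://example.org/onto/OLD_0002": None,
}


def migrate(bindings, deprecated, new_version, today=None):
    """Return (updated bindings, changelog entries, subjects needing manual review)."""
    today = today or date.today().isoformat()
    updated, changelog, needs_review = [], [], []
    for binding in bindings:
        iri = binding["object_iri"]
        if iri not in deprecated:
            updated.append(dict(binding))        # still current: keep as-is
            continue
        replacement = deprecated[iri]
        if replacement is None:
            needs_review.append(binding["subject_id"])  # no automatic fix available
            updated.append(dict(binding))
            continue
        updated.append({**binding, "object_iri": replacement,
                        "source_version": new_version})
        changelog.append(
            f"{today} {binding['subject_id']}: {iri} -> {replacement} "
            f"(deprecated in {new_version})"
        )
    return updated, changelog, needs_review


bindings = [
    {"subject_id": "mymodel:A", "object_iri": "https://example.org/onto/OLD_0001",
     "source_version": "old-release"},
    {"subject_id": "mymodel:B", "object_iri": "https://example.org/onto/OLD_0002",
     "source_version": "old-release"},
    {"subject_id": "mymodel:C", "object_iri": "https://example.org/onto/KEEP_0007",
     "source_version": "old-release"},
]
updated, changelog, needs_review = migrate(
    bindings, DEPRECATED, "new-release", today="2025-01-01")
print(changelog)
print(needs_review)
```

Terms deprecated without a replacement are flagged for manual review rather than changed automatically, and the changelog lines can be committed together with the updated mapping file.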

As the volume of (meta)data and repositories grows, continuous ontology maintenance can become a significant burden. Establish governance rules that clarify which bindings will be maintained, how often they are reviewed and by whom, and make these rules part of your project and repository documentation.

Expertise requirements for this step

The experts who may need to be involved, as described in Metroline Step: Build the Team, are listed below.

Ontology experts. Select appropriate ontologies, apply alignment patterns and evaluate ontology quality and scope.
Metadata experts. Ensure coverage of required metadata elements and provide clear documentation for users.
Semantic web technology experts. Represent bindings in RDF, OWL and SKOS; maintain internal coherence of the model; design persistent IRIs, implement SHACL validation and automate mapping publication workflows.

Practical examples from the community

CARE-SM (Clinical and Registry Entries Semantic Model). Annotates clinical registry concepts using OBO Foundry-aligned ontologies for precise semantics and cross-resource mapping.
ELIXIR (European Life-science Infrastructure for Biological Information) Bioschemas. Extends Schema.org and uses ontology terms to improve dataset discovery on the web.
ISA Model (Investigation–Study–Assay). Links experimental metadata fields to Ontology for Biomedical Investigations (OBI) and Chemical Entities of Biological Interest (ChEBI) terms to support consistent interpretation and validation.

Training

One useful educational resource is the recipe “Ontology mapping with Ontology Xref Service (OxO)” in the FAIR Cookbook. It shows how to use the EMBL-EBI Ontology Xref Service (OxO) to map ontology terms between source and target vocabularies.

Another useful resource is the OBO Semantic Engineering Training (Open Biological and Biomedical Ontologies Organized Knowledge). Although developed for a particular project, it offers a practical demonstration of how to address a real use case.
