“Research is formalized curiosity. It is poking and prying with a purpose.” (Zora Neale Hurston)
Querying FAIR resources is the process of asking structured questions to retrieve data that is easy to find, access, and reuse. When resources follow FAIR principles, queries become more efficient because the data is well-organized and clearly described. In this step, we will explore the tools and platforms that enable effective querying of FAIR-compliant resources.
Short description
When machine-readable (meta)data is exposed (see Metroline step: Transform and Expose FAIR metadata), it becomes an accessible FAIR resource: a dataset or metadata collection that can be found, queried, and reused. Such resources are often hosted or described in catalogues and/or via FAIR Data Points, which expose (meta)data in a standardised way. This ability to discover and reuse data through its metadata is what makes FAIR so powerful: it turns isolated data into actionable knowledge for science.
These catalogues offer different levels of interaction:
- Browsing. You can navigate through a FAIR Data Point (FDP) or catalogue to discover available datasets, for example in the National Health Data Catalogue or EBI BioStudies.
- Filtering and faceted search. Similar to filtering products in a webshop, results can be narrowed by disease, data type, species, or other metadata attributes, as supported by the European Health Research Data and Sample Catalogue.
- Visual query builders. Some platforms provide user-friendly query forms that automatically generate queries behind the scenes, as seen in Wikidata or the European Nucleotide Archive.
- Direct querying. For more advanced use, researchers and developers can write and run their own queries using external clients or scripts, for example through database queries with R.
Query results can be displayed in formats like HTML, JSON, XML or CSV, depending on the tool or user preference.
This Metroline page focuses on SPARQL as it is the standard query language for RDF-based resources, which are the foundation of the semantic web and linked data. These concepts aim to make data interoperable and machine-readable across domains, enabling powerful integration and reuse. SPARQL’s standardised syntax and ability to retrieve both metadata and data from diverse sources make it uniquely suited for querying structured web resources. While SPARQL is prominent in the semantic web domain, there are many other query languages tailored to different data models, research fields, and application needs (see table below).
| Query Language | Purpose | Used In | Example Repositories |
|---|---|---|---|
| Structured Query Language (SQL) | Querying relational databases | Tabular data, metadata | Dryad, Dataverse, OpenAIRE |
| SPARQL (for RDF/Linked Data) | Querying semantic web data | Ontologies, linked datasets | UniProt, OpenPHACTS, ELIXIR, Bio2RDF |
| GraphQL | Flexible API queries | Nested data structures | EMBL-EBI |
| Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) | Metadata harvesting | Repository interoperability | Zenodo, Figshare, institutional repositories |
| JSONPath / XPath | Extracting data from JSON/XML | API responses, metadata | Ensembl, NIH |
| Cypher | Querying graph databases | Networked biological data | Neo4j-based bioinformatics platforms |
Why is this step important
Querying FAIR data is important because it is how you actually use the data. FAIR data is only valuable if it can be discovered, filtered, combined, and analysed, and querying is what makes this possible.
- Find exactly what you need. General search and filtering allow you to locate datasets or specific information quickly, without manually checking every record.
- Explore and understand data. Browsing and faceted search help you see what datasets exist, what they describe and how they are structured.
- Combine and reuse information efficiently. Advanced queries (e.g. SPARQL) let you combine and analyse data from multiple sources without moving large datasets.
How to
This how-to gives information about querying FAIR resources, starting with simple browsing and filtering, moving to visual query tools, and advancing to federated multi-source querying with SPARQL.
Step 1 - Start with browsing and filtering
The easiest way to explore FAIR data is through a catalogue or FAIR Data Point interface, such as the National Health Data Portal, FAIRsharing.org, or local FAIR Data Points (see Metroline Step: Transform and expose FAIR (meta)data to learn more about FAIR Data Points).
Here you can:
- Browse datasets and read metadata (description, owner, access conditions).
- Search by keywords, e.g. “muscular dystrophy” or “metabolomics”.
- Filter results by categories such as data type, disease, measurement, or year.
This helps you discover what exists before performing any (complex) queries.
To begin, we search for “Inflammatory bowel disease” in the search bar on Wikidata.org. This leads us to the item Q917447, which represents IBD in Wikidata. The item confirms that IBD is a recognised disease entity with structured metadata (such as classifications, related conditions, and identifiers), providing a solid starting point for further data exploration. Browsing it gives insight into what the catalogue contains, what metadata is available, and how we might formulate more specific queries to retrieve related information.
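The same keyword search can also be scripted. The sketch below assumes the public Wikidata API module `wbsearchentities`; the helper names (`build_search_url`, `extract_hits`, `search_entities`) and the User-Agent string are illustrative and not part of the original walkthrough.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_search_url(term: str, language: str = "en", limit: int = 5) -> str:
    """Build a wbsearchentities request URL for a free-text search term."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "limit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urllib.parse.urlencode(params)}"


def extract_hits(response: dict) -> list[tuple[str, str]]:
    """Return (id, label) pairs from a wbsearchentities JSON response."""
    return [(hit["id"], hit.get("label", "")) for hit in response.get("search", [])]


def search_entities(term: str) -> list[tuple[str, str]]:
    """Run the search over the network (only when explicitly called)."""
    request = urllib.request.Request(
        build_search_url(term),
        # Hypothetical User-Agent; replace with one describing your own script.
        headers={"User-Agent": "fair-query-example/0.1 (illustrative script)"},
    )
    with urllib.request.urlopen(request, timeout=30) as reply:
        return extract_hits(json.load(reply))
```

Calling `search_entities("Inflammatory bowel disease")` should return candidate items together with their identifiers, which can then be inspected in the browser as described above.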
Step 2 - Use visual or guided tools to construct queries
Some linked-data portals offer visual query builders that help users construct SPARQL queries without needing to learn the syntax. These tools automatically translate your selections (such as ticking checkboxes or choosing from dropdown menus) into SPARQL and run the query in the background. Examples include the SPARQL Query builder and the Wikidata Query Builder.
The results are typically displayed in a table or graph, making it easy to explore data without writing any code. This approach is ideal for users who want to go beyond simple browsing but aren’t yet ready to write SPARQL manually.
We want to continue our exploration of inflammatory bowel disease. On the Wikidata item page we explored in Step 1, we saw that IBD has a genetic association with the gene IL23R. We now want to find which other items are genetically associated with this gene. In the Wikidata Query Builder, we enter “genetic association” as the property and “IL23R” as the value, and run the query. This returns six results, shown in the table below.
| item | itemLabel |
|---|---|
| wd:Q179945 | psoriasis |
| wd:Q917447 | inflammatory bowel diseases |
| wd:Q32144272 | inflammatory bowel disease 17 |
| wd:Q52849 | ankylosing spondylitis |
| wd:Q1472 | Crohn’s disease |
| wd:Q1477 | ulcerative colitis |
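For readers curious about what a builder produces behind the scenes, the selection above corresponds roughly to the query built below. This is a minimal sketch, not the query the Wikidata Query Builder actually emits: it assumes that P2293 is the Wikidata property for genetic association and that Q7187 is the class gene, and it matches the gene by its English label rather than by identifier. `build_association_query` is a hypothetical helper name.

```python
def build_association_query(gene_label: str, limit: int = 50) -> str:
    """Return a SPARQL query listing items genetically associated with a gene.

    Assumptions (verify on Wikidata before relying on them):
    P2293 = 'genetic association', P31 = 'instance of', Q7187 = 'gene'.
    """
    # Escape backslashes and double quotes so the label is a valid SPARQL string.
    safe_label = gene_label.replace("\\", "\\\\").replace('"', '\\"')
    return f"""
SELECT ?item ?itemLabel WHERE {{
  ?gene rdfs:label "{safe_label}"@en ;
        wdt:P31 wd:Q7187 .
  ?item wdt:P2293 ?gene .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {int(limit)}
""".strip()


# The prefixes wd:, wdt:, rdfs:, wikibase: and bd: are predefined by the
# Wikidata query service, so the query can be pasted there without PREFIX lines.
query = build_association_query("IL23R")
```

The number of rows returned depends on the current state of Wikidata and may differ from the table above.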
Step 3 - Access the SPARQL endpoint to write and refine SPARQL queries
Note: The following steps are meant specifically for querying catalogues and repositories with a SPARQL endpoint. If you are trying to query a catalogue based on another querying approach (e.g. SQL), these may not be directly applicable.
When you need more flexibility, connect directly to the SPARQL endpoint. Depending on the catalogue, you can use:
- A web-based interface (e.g. YASGUI or Virtuoso Query Editor).
- External clients and libraries in Python, JavaScript, C#, or Java.
Try simple queries first, such as listing datasets or retrieving specific metadata fields. As you become more comfortable, you can write more complex queries that join related information, apply filters, or aggregate data using SPARQL syntax.
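As an illustration of a simple first query from a script, the sketch below sends a SELECT query over HTTP and flattens the JSON results. It uses only the Python standard library and assumes that the endpoint accepts GET requests with a `query` parameter and returns the standard SPARQL JSON results format. The endpoint URL is a placeholder, and the example query assumes the catalogue describes datasets with DCAT (`dcat:Dataset`, `dct:title`), which may not hold for every resource.

```python
import json
import urllib.parse
import urllib.request

# Placeholder endpoint; substitute the SPARQL endpoint of your catalogue.
ENDPOINT = "https://example.org/sparql"

# A simple first query: list ten datasets with their titles (assumes DCAT metadata).
LIST_DATASETS = """
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct:  <http://purl.org/dc/terms/>
SELECT ?dataset ?title WHERE {
  ?dataset a dcat:Dataset ;
           dct:title ?title .
}
LIMIT 10
"""


def run_select(endpoint: str, query: str) -> dict:
    """Send a SELECT query via HTTP GET and return the parsed JSON results."""
    url = f"{endpoint}?{urllib.parse.urlencode({'query': query})}"
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            # Hypothetical User-Agent; some endpoints reject anonymous clients.
            "User-Agent": "fair-query-example/0.1 (illustrative script)",
        },
    )
    with urllib.request.urlopen(request, timeout=60) as reply:
        return json.load(reply)


def rows(results: dict) -> list[dict]:
    """Flatten SPARQL JSON results into one plain dict per result row."""
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]
```

With a real endpoint in place, `rows(run_select(ENDPOINT, LIST_DATASETS))` yields a list of dictionaries that can be filtered or joined further in Python.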
We saw that the gene IL23R is associated with the disease psoriasis. Now, let’s take it a step further and run a more complex SPARQL query using the Wikidata query service to find which genes are associated with both Inflammatory Bowel Disease (IBD) and psoriasis.
See and run the query yourself at this link: https://w.wiki/FsqH
The results show all genes linked to both psoriasis and an IBD condition. For each gene, you can also see the specific IBD disease it is associated with (such as Crohn’s disease or ulcerative colitis) providing a richer context for analysis.
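A query of this kind could look like the sketch below. It is not necessarily identical to the query behind the link above. The item identifiers for IBD and psoriasis come from the table in Step 2; the property identifiers are assumptions (P2293 for genetic association, P279 for subclass of), as is the idea that specific IBD conditions are modelled as subclasses of the IBD item. `build_shared_gene_query` is a hypothetical helper name.

```python
IBD = "Q917447"        # inflammatory bowel diseases (see the table in Step 2)
PSORIASIS = "Q179945"  # psoriasis (see the table in Step 2)


def build_shared_gene_query(disease: str, disease_family: str) -> str:
    """Return a query for genes linked to `disease` and to any condition
    that is `disease_family` itself or one of its subclasses."""
    return f"""
SELECT DISTINCT ?gene ?geneLabel ?condition ?conditionLabel WHERE {{
  wd:{disease} wdt:P2293 ?gene .               # gene associated with the first disease
  ?condition wdt:P279* wd:{disease_family} ;   # the family item or any subclass of it
             wdt:P2293 ?gene .                 # ...associated with the same gene
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
ORDER BY ?geneLabel
""".strip()


query = build_shared_gene_query(PSORIASIS, IBD)
```

The shared variable `?gene` is what performs the join: a gene only appears in the results when both association patterns match it.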
Step 4 - Combine multiple FAIR sources (federated queries)
When your question spans several data sources, use federated querying. This allows you to connect endpoints across registries, institutions, or countries, combining data without moving it.
In SPARQL, federated queries are implemented using the SERVICE keyword, which lets you call another SPARQL endpoint within your query. This enables seamless integration of data across different FAIR sources. See the documentation on SPARQL federated querying for details.
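The shape of a federated query can be sketched as follows. The helper `build_federated_query`, the endpoint URL, and the triple patterns are all illustrative assumptions; whether a given endpoint is allowed to call another one depends on the configuration of the service that runs the query.

```python
def build_federated_query(
    select_vars: list[str],
    local_pattern: str,
    remote_endpoint: str,
    remote_pattern: str,
) -> str:
    """Combine a local graph pattern with one evaluated at a remote endpoint."""
    head = " ".join(f"?{name}" for name in select_vars)
    return (
        f"SELECT {head} WHERE {{\n"
        f"  {local_pattern}\n"
        f"  SERVICE <{remote_endpoint}> {{\n"
        f"    {remote_pattern}\n"
        f"  }}\n"
        f"}}"
    )


# Hypothetical example: datasets in the local catalogue are tagged with a
# disease IRI, and a remote registry holds the human-readable disease labels.
query = build_federated_query(
    select_vars=["dataset", "label"],
    local_pattern="?dataset <http://purl.org/dc/terms/subject> ?disease .",
    remote_endpoint="https://example.org/registry/sparql",
    remote_pattern="?disease <http://www.w3.org/2000/01/rdf-schema#label> ?label .",
)
print(query)
```

The variable `?disease` appears both outside and inside the SERVICE block, so the endpoint running the query joins its local matches with the rows returned by the remote endpoint.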
Step 5 - Export and reuse query results
Query results can be downloaded in multiple formats (e.g. CSV, JSON, XML) for reuse in data analysis tools like Python, R, or Excel. Depending on the query language and platform, it may also be possible to integrate queries directly into your workflow (for example, by calling SPARQL endpoints from Python or R scripts) so that results flow into subsequent analysis steps without the need to download files manually.
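As a small illustration of reusing results without a manual download, the sketch below converts results in the standard SPARQL JSON format into CSV text using the Python standard library. The function name `results_to_csv` and the sample values are hypothetical; many endpoints can also return CSV directly when asked for it, which is not shown here.

```python
import csv
import io


def results_to_csv(results: dict) -> str:
    """Convert SPARQL JSON results into CSV text with one column per variable."""
    columns = results["head"]["vars"]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    for binding in results["results"]["bindings"]:
        # Variables left unbound in a row become empty cells.
        writer.writerow([binding.get(col, {}).get("value", "") for col in columns])
    return buffer.getvalue()


# Made-up sample results, used only to show the expected input shape.
sample = {
    "head": {"vars": ["gene", "geneLabel"]},
    "results": {"bindings": [
        {"gene": {"type": "uri", "value": "https://example.org/gene/1"},
         "geneLabel": {"type": "literal", "value": "GENE1"}},
        {"gene": {"type": "uri", "value": "https://example.org/gene/2"}},
    ]},
}
csv_text = results_to_csv(sample)
```

The resulting text can be written to a file or passed to an analysis library, so the query and the analysis live in the same script.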
For human users, many catalogue interfaces also provide built-in visualisation options, allowing results to be displayed as tables, graphs, or maps directly in the browser without additional tools.
In Wikidata, you can visualise query results in different ways by switching between result views. Try running the example query from above (https://w.wiki/FsqH) and experiment with the various visualisation and export options.
Expertise requirements for this step
To successfully perform this step, you may need help from the following experts:
- Researcher/domain expert. Uses domain knowledge to formulate queries and interpret results.
- Data scientist. Executes queries, processes results and handles federated queries.
- Semantic expert. Ensures correct use of metadata, vocabularies and ontologies for queries.
See Metroline Step: Build the Team for more information.
Practical examples from the community
SPHN Data Exploration and Analysis System (DEAS).
DEAS is a cross-hospital federated query tool developed by the Swiss Personalized Health Network (SPHN) to replace the previous Federated Query System. It enables researchers to securely query aggregated clinical data from multiple Swiss university hospitals without moving patient-level data.
Training
- FAIR Cookbook – Exploring data with SPARQL (Python). Explains how to use Python and SPARQL to query a FAIR-aligned RDF representation.
- Apache Jena SPARQL Tutorial. This SPARQL tutorial provides a concise introduction to the language, illustrating its major features through examples without aiming for full coverage.
Suggestions
This page is under construction. Learn more about the contributors here and explore the development process here. If you have any suggestions, visit our How to contribute page to get in touch.