
Open research data

As open as possible, as closed as necessary

Open research data include materials – whether in digital or analog form – that are observed, collected, processed, or generated in the course of scientific activity. The academic community recognizes these data as essential not only for verifying research outcomes but also for enabling future studies.

Open research data enable: 

  • verification of research results – allowing for checks on the reliability and accuracy of presented findings;
  • reuse of data – data can be utilized in new research, accelerating scientific progress; 
  • increased citation counts – publications based on open data tend to be cited more frequently;
  • reduced research costs – access to existing data decreases the need to collect data again;
  • interdisciplinarity – data can be used across various fields.

Openness of research data brings many benefits: 

  • accelerates scientific development – facilitates and speeds up research, stimulating further discoveries and innovations; 
  • increases transparency – allows for assessment of the credibility of conducted research;
  • supports collaboration – eases information exchange among scientists;
  • enhances accessibility – data are available to all interested parties. 

Open data foster trust and strengthen the credibility of science. They allow for independent replication of results and encourage the reuse of data beyond their original purpose – turning even seemingly minor datasets into valuable resources for other disciplines, regions, or future projects. This shift supports a culture of scientific cooperation rather than competition.

Types of research data

  • raw data – unprocessed data, collected but not analyzed;
  • observational data – captured in real time (e.g., sensor readings, telemetry data, results of anonymous surveys, focus group studies), often unique because they cannot be “recovered” once the moment has passed;
  • experimental data – obtained from laboratory equipment under controlled conditions, reproducible but often very costly (e.g., gene sequences, spectroscopy, magnetic field readings);
  • simulation data – generated by models that test real or theoretical systems (e.g., climate models, economic models, engineering systems);
  • derived / compiled data – results of data analyses or data aggregated from various sources; reproducible, but often very expensive to reacquire (e.g., databases, texts, 3D models, bibliometric data);
  • reference data – curated or organic datasets, usually peer-reviewed, published, and selected (e.g., Statistics Poland data, chemical structures, gene sequence databases).

5-star Open Data

The 5-star deployment scheme for Open Data was developed by Tim Berners-Lee, the inventor of the Web and a pioneer of Linked Data. Each level builds on the previous one:
  1. * make your stuff available on the Web (whatever format) under an open license.
  2. ** make it available as structured data (e.g., Excel instead of image scan of a table).
  3. *** make it available in a non-proprietary open format (e.g., .CSV instead of .XLS).
  4. **** use URIs to denote things, so that people can point at your stuff.
  5. ***** link your data to other data to provide context.
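
The five levels above are cumulative, so a dataset's rating can be read as the number of consecutive criteria it meets, starting from the first. The following is a minimal Python sketch of that reading. The `Dataset` class, its field names, and the `star_rating` function are hypothetical and introduced only for illustration; they are not part of the 5-star scheme or of any library.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset (illustrative fields)."""
    on_web_open_license: bool    # 1 star: on the Web under an open license
    structured: bool             # 2 stars: structured data (e.g., a spreadsheet)
    open_format: bool            # 3 stars: non-proprietary format (e.g., CSV)
    uses_uris: bool              # 4 stars: URIs denote things
    linked_to_other_data: bool   # 5 stars: linked to other data for context


def star_rating(ds: Dataset) -> int:
    """Count the consecutive criteria met, stopping at the first one missed."""
    criteria = [
        ds.on_web_open_license,
        ds.structured,
        ds.open_format,
        ds.uses_uris,
        ds.linked_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# Assumed examples that mirror the list above.
scanned_table = Dataset(True, False, False, False, False)
excel_sheet = Dataset(True, True, False, False, False)
csv_file = Dataset(True, True, True, False, False)

print(star_rating(scanned_table), star_rating(excel_sheet), star_rating(csv_file))
# prints: 1 2 3
```

Under this reading, a dataset that uses URIs but is not openly licensed would still score zero, because every level presupposes the ones before it.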

More information: 5-star Open Data.
