DeVoteD – Documentation

1. Introduction

On this page you can find all the documentation for the DeVoteD project. DeVoteD explores data on voter turnout and democracy indices in order to understand how much the exercise of the right to vote affects democracy indices. All the scripts used to process our data are available on the project's GitHub page.

2. Scenario

To carry out our research, we collected data from different sources and re-used them to create our own dataset. We aimed to re-use datasets that are free of cognitive bias, prejudice and discrimination, and that are fair, reliable, legally valid, relevant, consistent and accurate. [ Coming soon ]

3. Datasets

3.1. Original Datasets

The datasets used to investigate the relationship between voting and democracy indices include the following data from the respective sources:

High-level democracy indices. Source: Varieties of Democracy (V-Dem)
Mashed-up data about political parties. Source: Party Facts
International data about political parties. Source: The Manifesto Project
International data about voter turnouts. Source: The Voter Turnout Database

3.2. Mashed-up Dataset

In order to manage the mash-up of different datasets with different licenses, we followed the Guidelines for Open Data provided by the EU. In accordance with these guidelines, we pursued the objective of making our research data findable, accessible, interoperable and re-usable (FAIR).

Findable: the first step in (re)using data is to find them. Metadata and data should be easy to find for both humans and computers. Machine-readable metadata are essential for automatic discovery of datasets and services, so this is an essential component of the FAIRification process.

F1. Metadata are assigned a globally unique and persistent identifier: the data we retrieved from the original datasets, the mashed-up data and the metadata we created according to DCAT-AP all comply with this point, as each is identified by a URI.
F2. Data are described with rich metadata (defined by R1 below): we associated a rich amount of metadata compliant with the DCAT-AP specification, including not only all the mandatory classes with their respective mandatory properties but also some recommended and optional properties that were useful for our data.
F3. Metadata clearly and explicitly include the identifier of the data they describe: for each dataset that is part of a catalogue and for our own dataset, we associated with the metadata a unique identifier of the data described, by means of the DCAT-AP optional property for datasets dct:identifier.
F4. Metadata are registered or indexed in a searchable resource: all the data we used are identified by a URL that gives access to the source where they are registered. For the creation of the metadata associated with our data we used the DCAT-AP specification, whose aim is to enable cross-data-portal search for datasets and make public sector data better searchable across borders and sectors. Therefore, we can state that our metadata are registered in a searchable infrastructure.

Accessible: once users find the required data, they need to know how the data can be accessed, possibly including authentication and authorisation.

A1. Metadata are retrievable by their identifier using a standardised communications protocol: all the data we collected and mashed up, and the related metadata, are retrievable through HTTP or its extension HTTPS. Moreover, we also provided an explicit and clear contact protocol in the metadata by means of the names and emails of the data and metadata providers.
A1.1. The protocol is open, free, and universally implementable: HTTP and HTTPS are compliant with these characteristics.
A1.2. The protocol allows for an authentication and authorisation procedure, where necessary: HTTP and HTTPS provide for authentication of the accessed website.
A2. Metadata are accessible, even when the data are no longer available: Metadata will remain accessible from the homepage of the website we created about the project.

Interoperable: the data usually need to be integrated with other data. In addition, the data need to interoperate with applications or workflows for analysis, storage, and processing.

I1. Metadata use a formal, accessible, shared, and broadly applicable language for knowledge representation: we used JSON for the representation of the mashed-up data and RDF with the XML syntax to describe and structure the metadata.
I2. Metadata use vocabularies that follow FAIR principles: the annotation format we used allows the use of machine-readable terms from any controlled vocabulary. We used the ISO standard vocabulary to represent nations and the Linked Open Data vocabulary specification called DCAT-AP. These vocabularies are documented and resolvable using globally unique and persistent identifiers.
I3. Metadata include qualified references to other Metadata: JSON and the RDF schema account for the data exchange and cross reference among metadata respectively.
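
As an illustration of the RDF/XML representation mentioned above, the following is a minimal sketch of how a single dataset description carrying dct:identifier could be serialised with the Python standard library. The URI, identifier and texts are hypothetical placeholders and the function name is ours; the project's actual metadata file may be structured differently.

```python
import xml.etree.ElementTree as ET

# Namespaces used by DCAT-based metadata records.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"

for prefix, uri in (("rdf", RDF), ("dcat", DCAT), ("dct", DCT)):
    ET.register_namespace(prefix, uri)


def dataset_record(uri, identifier, title, description):
    """Return an RDF/XML string describing one dcat:Dataset."""
    root = ET.Element(f"{{{RDF}}}RDF")
    dataset = ET.SubElement(root, f"{{{DCAT}}}Dataset", {f"{{{RDF}}}about": uri})
    ET.SubElement(dataset, f"{{{DCT}}}identifier").text = identifier
    ET.SubElement(dataset, f"{{{DCT}}}title").text = title
    ET.SubElement(dataset, f"{{{DCT}}}description").text = description
    return ET.tostring(root, encoding="unicode")


# Hypothetical values, for illustration only.
record = dataset_record(
    uri="https://example.org/devoted/dataset",
    identifier="devoted-mashup",
    title="DeVoteD mashed-up dataset",
    description="Voter turnout joined with democracy indices.",
)
print(record)
```

Because the identifier is an explicit element of the record, a harvester can read it back without knowing anything else about the file.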

Reusable: the ultimate goal of FAIR is to optimise the reuse of data. To achieve this, metadata and data should be well-described so that they can be replicated and/or combined in different settings.

R1. Metadata are richly described with a plurality of accurate and relevant attributes: our data and metadata are described through a rich and varied series of labels, including the date of collection and modification of the data, the license, the publisher and their content.
R1.1. Metadata are released with a clear and accessible data usage license: all the data we used were released without a specified usage license, except for Party Facts. A license is specified for the dataset and respective metadata we created (Creative Commons license CC BY 4.0).
R1.2. Metadata are associated with detailed provenance: our project includes information about the provenance of data in a machine-readable format in the metadata codification. The website also presents a description of the workflow that led to our data.
R1.3. Metadata meet domain-relevant community standards: we used the ISO standard for geographic information.

The principles mentioned above include three types of entities: data, metadata and infrastructure. Given the analysis, we can state that our research data are almost 100% compliant with the FAIR principles, with the few exceptions due to the lack of license specification.

4. Preliminary Analysis

4.1. Quality Analysis

Manifesto Project
Pros:

Transparent methodology with detailed coding manuals and inter-coder reliability measures
Long-established academic project with peer review oversight
Open access to raw data and replication materials
Covers extensive time periods enabling longitudinal analysis

Cons:

Western-centric bias in coverage and analytical frameworks
Limited coverage of smaller or newer parties
Potential selection bias toward parties with accessible manifestos

PartyFacts
Pros:

Integrative approach linking multiple party datasets reduces fragmentation
Transparent linking methodology with version control
Collaborative, open-source development model
Addresses key interoperability challenges in comparative politics
Regular maintenance and community contributions

Cons:

Quality of the core data varies widely between countries
Linking accuracy may vary across different political systems
Limited validation of historical party continuity assumptions
Potential propagation of errors from constituent databases
Coverage gaps for non-European political systems

IDEA Voter Turnout Database
Pros:

Comprehensive global coverage with standardized definitions
Rigorous data validation processes and source documentation
Regular updates with transparent revision procedures
Free public access supporting democratic transparency
Collaboration with national election authorities

Cons:

Reliance on official sources may miss informal electoral practices
Definitional challenges across different electoral systems
Limited contextual information about barriers to participation
Potential underrepresentation of marginalized voter experiences
Time lags in data availability for recent elections

V-DEM Core Dataset
Pros:

Extensive expert networks providing local knowledge
Sophisticated uncertainty measures and confidence intervals
Comprehensive coverage of democratic dimensions beyond elections
Transparent aggregation methodology with multiple validation checks
Strong institutional backing and sustained funding

Cons:

Expert-based ratings introduce potential subjective bias
Western democratic norms may not capture non-liberal democratic forms
Complex methodology may obscure individual indicator limitations
Potential elite bias in expert recruitment
High-frequency updates may create artificial precision in gradual changes

4.2. Legal Analysis

The data we collected for the purposes of our research derive from different sources and are therefore subject to different types of license, when specified. When the license was available and specified by the publisher, we found the Creative Commons license CC0 1.0. This license allows the user to freely use, share, modify, and distribute the material for any purpose without permission or attribution. It places the work in the public domain and does not allow the user to apply legal restrictions to others.

4.2.1. License, legal issues, version, maintenance

The original datasets used:

V-Dem Core: we couldn't find information about the license. Maintenance is regular, on an annual basis: since 2014, at least one new version has been released each year.

Party Facts: the license used is CC0 1.0. The last version was published in 2023. However, as can be read in the news section of the documentation, the project updates the datasets regularly; the last update was made on 5 May 2025.

IDEA Voter Turnout Database: we couldn't find information about the license. Maintenance is regular, on an annual basis: since 2014, at least one new version has been released each year.

Manifesto Project: we couldn't find information about the license or about the version of the data. Without an explicit license, the conditions for reuse are legally uncertain. However, the project has published explicit terms of use stating that redistribution of the data is forbidden unless explicitly authorized in writing by the project. If sharing is approved, the user must include all accompanying files, including the Terms of Use document. The user must also clearly identify the data's provenance, properly cite the dataset in any publications, notify the Manifesto Project of any published research using their data and provide them with a copy. In conclusion, the Manifesto Project's Terms of Use are much more restrictive than an open license. For researchers the data are usable and valuable, but not "open" in the legal sense used by the open science and open data communities. Maintenance is regular, as declared by the organization itself and as shown by the annual release of new, updated and corrected versions of the dataset.

While the IDEA Voter Turnout Database data are provided exclusively in XLSX, all the other data are provided in a variety of formats, including open ones. V-Dem and the Manifesto Project provide data in CSV, STATA, R and SPSS formats, giving users the possibility to choose different tools to analyse the data. The Manifesto Project provides data in XLSX, too. Party Facts provides data in the open TAB format, i.e. as tab-separated values. The legal situation of the Manifesto Project dataset affected our research to some extent: the data were only available behind a login.
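
Tab-separated files of the kind described above can be read with the Python standard library alone. The snippet below is a minimal sketch on an in-memory sample; the column names and rows are hypothetical and do not reproduce the actual Party Facts schema.

```python
import csv
import io

# Hypothetical sample in tab-separated format; column names are assumptions.
SAMPLE_TAB = (
    "country\tparty_id\tname\n"
    "AAA\t1\tExample Party\n"
    "BBB\t2\tAnother Party\n"
)


def read_tab(handle):
    """Parse tab-separated text into a list of dictionaries, one per row."""
    return list(csv.DictReader(handle, delimiter="\t"))


rows = read_tab(io.StringIO(SAMPLE_TAB))
print(rows[0]["country"], len(rows))
```

For a file on disk, the same function can be given a handle opened with open(path, newline="", encoding="utf-8").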

4.2.2. GDPR (EU General Data Protection Regulation)

Given that all the datasets used, except for V-Dem, are exclusively publicly shared aggregate datasets in which the information collected is never about individuals but about institutions, the GDPR does not apply.

The V-Dem project ensures the anonymity of its expert coders, both to address current and potential future security concerns and for legal compliance with regulations such as the GDPR, by applying data encryption, access control and separation of the data from the personal data of the individual coders (see the Methodology section of the FAQ).

The other projects did not take such measures because their data were not collected by individual coders but taken from publicly available official results.

4.3. Ethical Analysis

How sustainable and bias-free are our data providers?

V-Dem: continually reviews its methodology, and occasionally adjusts it, with the goal of improving quality. V-Dem recruits experts rigorously at a global level. Experts are usually academics or professionals with specialist and evidenced knowledge in one or more domains. Approximately two-thirds are nationals or residents of the country they provide information on. The quality and impartiality of the data are highly dependent on the Country Experts. Consequently, V-Dem pays a great deal of attention to their recruitment and uses the following selection criteria: validated expertise, local, in-depth knowledge, seriousness of purpose, impartiality, diversity in professional background among the Experts. V-Dem does not reveal the identity of the Experts. V-Dem uses Bayesian Item-Response Theory (IRT) to convert the ordinal responses experts provide into continuous estimates of the concepts being measured. This makes it possible to estimate the latent traits underlying the concepts. It also allows for the possibility that experts have different thresholds for their ratings, and for their reliability to vary idiosyncratically, accounting for the concern that not all experts are equally expert on all concepts and cases.
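
The idea that experts apply different thresholds and differ in reliability can be made concrete with a small simulation. The sketch below is only a toy illustration of the data-generating side of an ordinal IRT model, not V-Dem's actual measurement model or its Bayesian estimation; the function name, thresholds and noise model are assumptions made for the example.

```python
import bisect
import random


def expert_rating(latent, thresholds, reliability, rng):
    """Simulate the ordinal rating one expert gives to one case.

    latent      -- the unobserved continuous score of the case
    thresholds  -- this expert's ascending cut-points between categories
    reliability -- larger values mean less noise in the expert's perception
    """
    # The expert perceives the latent score with some noise.
    perceived = latent + rng.gauss(0.0, 1.0 / reliability)
    # The rating is the number of cut-points at or below the perceived score.
    return bisect.bisect_right(thresholds, perceived)


rng = random.Random(0)
strict = [0.0, 1.0, 2.0]    # needs a higher score to award a given category
lenient = [-1.0, 0.0, 1.0]  # awards the same category at a lower score

# Even with near-perfect reliability, the same case gets different ratings.
print(expert_rating(0.5, strict, 1e9, rng), expert_rating(0.5, lenient, 1e9, rng))
```

An IRT model works in the opposite direction: given many such ratings, it estimates the latent score together with each expert's thresholds and reliability.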

Party Facts: aggregates no individual-level data and does not have a coder network. It also lacks documented ethics or bias-control protocols. Data reliability depends on contributor reports and mapping validation processes.

IDEA Voter Turnout Database: uses official electoral data. There is no mention of data collector protection or coder ethics. There is a potential bias issue: risks stem from reliance on potentially falsified official data, but no mitigation strategy is described.

Manifesto Project: is transparent about its coding methodology, though without explicit mention of binding ethics rules for coders or diversity protocols.

4.4. Technical Analysis

V-Dem
Party Facts

Format: TAB
Metadata:

Persistent Identifier
Publication Date
Title
Author
Point of Contact
Description
Subject
Related Publication
Depositor
Deposit Date
Related Dataset

URI: there is no direct URI; you can find the download page here
Provenance: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/TJINLQ

IDEA Voter Turnout Database
Manifesto Project

5. Sustainability

The DeVoteD catalogue and dataset were created within a course at the University of Bologna and are not actively maintained, while the datasets used for this catalogue are maintained by their respective institutions. However, our scripts remain available and can be rerun at any time on new files. If somebody notices that one of our input files is available in a newer version, we would be glad to be informed so that we can update our file with the automated script. Our scripts are licensed under CC 4.0. We invite the community to update our files and contribute the updated files to our GitHub project. We will review the files and then add them if correct.

6. Visualisations

On the Visualisation page the user can find some graphical representations intended to help them better understand the data we collected. There is a map visualisation which allows the user to select one of the economic and political aspects we analysed, and shows it both on the map and in the corresponding bar chart. By selecting a country on the map, the user can see the total value on the map, and the value year by year from 2008 to 2018 in the bar chart. In the Graphs section we made available a bubble chart representing the total number of displaced people per country between 2008 and 2018, and a graph that lets the user visualise two different economic and political aspects at the same time, as a bar or line chart.

7. RDF Metadata

We used the DCAT Application Profile for data portals in Europe, version 3.0.1, to encode the metadata about all our data, including the original datasets and our own dataset Flucht. Click here to see the code on our GitHub project.

8. Final Conclusions

8.1 Executive Summary

The DeVoteD project successfully investigated the relationship between voter turnout and democracy indices through the integration of four major political datasets spanning 2008-2018. While the project achieved its technical objectives of creating a FAIR-compliant dataset, the empirical findings challenge conventional assumptions about the relationship between electoral participation and democratic quality.

8.2 Dataset Integration and Quality Assessment

8.2.1 Methodological Strengths

The project demonstrates exemplary data integration practices by successfully combining:
V-Dem Core Dataset: High-level democracy indices with sophisticated uncertainty measures
Party Facts: Comprehensive party system mappings across countries
Manifesto Project: Political party positioning data
IDEA Voter Turnout Database: Global electoral participation statistics

The integration process followed FAIR principles rigorously, achieving near-complete compliance with findability, accessibility, interoperability, and reusability standards. The project's transparent documentation and version control represent best practices in reproducible research.
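
The kind of integration described in this section can be sketched as a join on country and year, with the result serialised as JSON. This is a minimal illustration, not the project's actual script: the field names, country codes and values below are made up, and the real pipeline may match records differently.

```python
import json

# Made-up records for illustration; codes, fields and values are assumptions.
turnout = [
    {"country": "AAA", "year": 2013, "turnout": 70.0},
    {"country": "BBB", "year": 2013, "turnout": 60.0},
]
democracy = [
    {"country": "AAA", "year": 2013, "liberal_democracy": 0.5},
    {"country": "CCC", "year": 2013, "liberal_democracy": 0.4},
]


def merge_on_country_year(left, right):
    """Inner-join two lists of records on the (country, year) key."""
    index = {(r["country"], r["year"]): r for r in right}
    merged = []
    for row in left:
        match = index.get((row["country"], row["year"]))
        if match is not None:
            merged.append({**row, **match})
    return merged


mashup = merge_on_country_year(turnout, democracy)
print(json.dumps(mashup, indent=2))
```

An inner join keeps only the country-years present in both sources, which is one reason coverage gaps in any single dataset shrink the final mash-up.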

8.2.2 Data Quality Challenges

Despite methodological rigor, several quality issues emerged:

Licensing Inconsistencies: The most significant limitation involves incompatible licensing frameworks. While Party Facts uses CC0 1.0, the Manifesto Project employs restrictive terms prohibiting redistribution without explicit authorization. V-Dem and IDEA databases lack clear licensing information, creating legal uncertainty for data reuse.
Coverage Biases: All source datasets exhibit Western-centric biases in both geographical coverage and analytical frameworks. This systematically underrepresents non-liberal democratic experiences and may not capture diverse forms of democratic governance prevalent in non-Western contexts.
Temporal Limitations: The 2008-2018 timeframe, while substantial, misses recent democratic developments including the rise of populist movements and democratic backsliding phenomena that became prominent after 2018.

8.3 Empirical Findings: Democracy and Electoral Participation

8.3.1 Core Discovery: Weak Democracy-Turnout Relationship

The correlation analysis reveals a counterintuitive finding that challenges fundamental assumptions about democratic participation:

Key Correlation Values
Liberal Democracy ↔ Voter Turnout: r = 0.25
Electoral Democracy ↔ Voter Turnout: r = 0.22
Participatory Democracy ↔ Voter Turnout: r = 0.19
Deliberative Democracy ↔ Voter Turnout: r = 0.22
Egalitarian Democracy ↔ Voter Turnout: r = 0.26

Voter turnout shows only weak-to-moderate positive correlations with all democracy indices (r = 0.19-0.26), with the strongest relationship occurring with egalitarian democracy (r = 0.26). These correlations, while statistically significant, explain less than 7% of variance in democratic quality.
Implications: This suggests that higher voter turnout does not necessarily indicate stronger democracy. Democratic quality appears to depend more heavily on institutional factors, governance structures, and systemic features than on the mere quantity of electoral participation.
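
The "less than 7% of variance" figure follows from squaring each correlation coefficient, since the share of variance explained by a linear relationship is r squared. The short check below uses only the values reported above; the function name is ours.

```python
# Correlations with voter turnout as reported in this section.
reported_r = {
    "Liberal Democracy": 0.25,
    "Electoral Democracy": 0.22,
    "Participatory Democracy": 0.19,
    "Deliberative Democracy": 0.22,
    "Egalitarian Democracy": 0.26,
}


def variance_explained(r):
    """Share of variance explained by a linear relationship with correlation r."""
    return r * r


for name, r in reported_r.items():
    print(f"{name}: r = {r:.2f}, r^2 = {variance_explained(r):.2%}")

# The largest share comes from the strongest correlation (r = 0.26).
largest = max(variance_explained(r) for r in reported_r.values())
```

Even the strongest correlation, with egalitarian democracy, stays below the 7% mark.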

8.3.2 Party System Competition Effects

All democracy indices show consistent moderate negative correlations with the Herfindahl-Hirschman Index (r ≈ -0.54 to -0.55), confirming that democratic quality increases with political competition and decreases with party system concentration.
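
For reference, the Herfindahl-Hirschman Index is the sum of the squared shares of all parties, so it approaches 1 when one party dominates and falls as votes spread across more parties. The sketch below is a minimal implementation under one assumption of ours: shares are rescaled to sum to 1 before squaring, which may differ from how the project's scripts handle incomplete vote-share data.

```python
def hhi(vote_shares):
    """Herfindahl-Hirschman Index of a party system.

    vote_shares -- per-party vote shares, as fractions or percentages
    """
    total = sum(vote_shares)
    if total <= 0:
        raise ValueError("vote shares must sum to a positive number")
    # Assumption: rescale shares to sum to 1 so percentages also work.
    return sum((share / total) ** 2 for share in vote_shares)


# One dominant party versus four equally sized parties (made-up shares).
print(hhi([100]), hhi([25, 25, 25, 25]))
```

Note that rescaling hides missing parties rather than recovering them: if some parties are absent from the input, the index is computed as if the listed parties were the whole system.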

8.4 Theoretical and Policy Implications

8.4.1 Rethinking Democratic Participation

The weak turnout-democracy correlation suggests that participation quality matters more than participation quantity. High turnout in systems with limited competition, restricted media freedom, or weak institutional constraints may not enhance democratic governance. This finding has several implications:

Policy Focus: Electoral reforms should prioritize institutional quality over turnout maximization
Measurement: Democratic assessments should weight institutional factors more heavily than participation rates
Intervention Design: Democracy promotion efforts should address systemic constraints before focusing on voter mobilization

8.4.2 Democratic System Interdependence

The strong correlations between democratic dimensions suggest that piecemeal democratic reforms may be insufficient. Countries seeking democratic improvement may need comprehensive approaches addressing multiple institutional domains simultaneously rather than focusing on single aspects like electoral procedures or civil liberties in isolation.

8.5 Research Limitations and Future Directions

8.5.1 Methodological Constraints

Aggregation Effects: Country-level analysis may obscure within-country variation in democratic experience across regions, social groups, or time periods.
Causal Inference: Correlation analysis cannot establish causal relationships between turnout and democracy. Future research should employ longitudinal designs or natural experiments to identify causal mechanisms.
Measurement Validity: Expert-based democracy ratings, while sophisticated, may not capture citizen experiences of democratic governance or alternative democratic traditions.
Further Data Gathering: We ended up working with insufficient per-party vote share data, which may have caused miscalculations of the concentration indices. Future data gathering should fill in the missing data.

8.5.2 Recommended Extensions

Disaggregated Analysis: Examine subnational variation in turnout-democracy relationships
Temporal Dynamics: Investigate how turnout-democracy relationships evolve during democratic transitions
Qualitative Integration: Combine quantitative measures with qualitative assessments of democratic experience
Contemporary Update: Extend analysis to post-2018 period to capture recent democratic developments

8.6 Sustainability and Open Science Contributions

8.6.1 Technical Legacy

The project's GitHub repository, automated scripts, and DCAT-AP compliant metadata provide a sustainable foundation for future research. The modular approach allows researchers to update individual datasets while maintaining overall coherence.

8.6.2 Community Contribution

By identifying licensing obstacles and documenting integration challenges, the project contributes to broader conversations about open science infrastructure in political research. The detailed legal and ethical analysis provides a template for similar projects.

8.6.3 Recommendations for Data Providers

Based on integration experience, the project recommends that major political datasets:
1. Adopt standardized open licenses (preferably CC0 or CC BY)
2. Provide machine-readable metadata following DCAT standards
3. Implement regular update cycles with clear versioning
4. Document methodology and bias mitigation strategies transparently

8.7 Conclusion

The DeVoteD project successfully demonstrates that rigorous data integration can yield important insights into fundamental questions about democracy and participation. The counterintuitive finding that voter turnout correlates only weakly with democratic quality challenges conventional wisdom and suggests new directions for both research and policy.
While technical execution was exemplary, the project highlights persistent challenges in political data infrastructure, particularly regarding licensing compatibility and bias mitigation. These findings emphasize the need for coordinated efforts to improve data sharing standards in political science research.
The results suggest that democracy is not simply about getting more people to vote, but about creating institutional conditions where citizen participation can meaningfully influence governance outcomes. This insight has profound implications for how we measure, understand, and promote democratic governance in an era of global democratic challenges.