The automation of these tasks or their parts would greatly benefit journalism and perhaps help the public to verify the credibility of various media.
It is evident that fact-checking needs external knowledge or detailed context.
However, to build a robust automatic fact-checking system, we must first find a way to evaluate such a system.
For English, there are publicly available datasets researchers can use to evaluate their systems.
However, no systematic research has yet been conducted for West Slavic languages. We therefore establish a common ground for further research by providing large fact-checking datasets in Czech, Polish, and Slovak, together with initial experiments that reveal the complexity of the task, set a baseline using a standard machine learning approach, and set an upper bound using manually created external knowledge.
We provide three datasets for fact-checking, one for each language, downloaded from the following fact-checking websites.
Each dataset contains claims of politicians annotated with one of four classes: FALSE, TRUE, UNVERIFIABLE, and MISLEADING.
The labels have the following meaning:
- FALSE: These statements are not in line with publicly available numbers or information. This may also be a situation where the calculation method of the indicator differs, but none of the available sources confirms the number or claim in question.
- TRUE: The statement uses the right information in the right context.
- UNVERIFIABLE: It is not possible to find the source of the claim, or to confirm or refute it based on the available information.
- MISLEADING: These statements use correct facts, but in a wrong or incomplete context, or the facts are taken out of or otherwise distorted from their original context. This class also includes inappropriate or disproportionate comparisons.
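The four-class scheme above can be represented in code as a small enumeration. The following is a minimal sketch, not part of the released resources: the names `Label`, `parse_label`, and `label_distribution` are hypothetical, and it assumes the labels are available as the English strings listed above. The files in the archive may encode them differently, in which case the mapping would need to be adapted.

```python
from collections import Counter
from enum import Enum


class Label(Enum):
    """The four annotation classes described above (hypothetical helper type)."""
    FALSE = "FALSE"
    TRUE = "TRUE"
    UNVERIFIABLE = "UNVERIFIABLE"
    MISLEADING = "MISLEADING"


def parse_label(raw: str) -> Label:
    """Map a raw label string to a Label, ignoring case and surrounding whitespace.

    Raises ValueError for any string that is not one of the four classes.
    """
    return Label(raw.strip().upper())


def label_distribution(raw_labels):
    """Count how often each class occurs in an iterable of raw label strings."""
    counts = Counter(parse_label(raw) for raw in raw_labels)
    # Report every class, including those with zero occurrences.
    return {label: counts.get(label, 0) for label in Label}


if __name__ == "__main__":
    # Illustrative label strings only; these are not taken from the datasets.
    sample = ["true", " FALSE ", "Misleading", "TRUE"]
    for label, count in label_distribution(sample).items():
        print(f"{label.value}: {count}")
```

Checking the class distribution first is a reasonable step before any experiment, since a skewed distribution affects how a baseline should be interpreted.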
Demagog — 9.1k Czech, 2.8k Polish and 12.6k Slovak labeled claims with reasoning: demagog.zip (~16.5 MB)
The corpora are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Please cite the appropriate article if you use any of the available resources.