CRAVE Dataset
CRAVE provides resources for evaluating multimodal misinformation detection models, including pre-computed evaluation features, trained model checkpoints, and Python evaluation scripts.
Dataset hosted by the Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey.
Overview
The release contains two experiment packages:
- outcontext-misinfo-progress
- relevant-evidence-detection
Each package contains evaluation-ready feature files, a trained checkpoint, and
scripts for reproducing evaluation results. The files under data/eval/
are pre-extracted multimodal features, provided to support reproducible benchmarking
without recomputing image/text features from raw media.
Evaluation data are provided for the following benchmarks where applicable: DP, FIVEPILS, MMFakeBench, and VERITE.
Downloads
| Archive | Description | Approx. size | SHA256 |
|---|---|---|---|
| outcontext-misinfo-progress.tar.gz | Out-of-context misinformation evaluation package. | 1 GB | a08380a1fcc033c32bef266730d7f0b229dca37479d626214b60f11154de8a5b |
| relevant-evidence-detection.tar.gz | Relevant evidence detection evaluation package. | 1.6 GB | a08380a1fcc033c32bef266730d7f0b229dca37479d626214b60f11154de8a5b |
A checksum file is also available: checksums.txt.
Download and Installation
Download the archives:
wget https://cvssp.org/data/crave/outcontext-misinfo-progress.tar.gz
wget https://cvssp.org/data/crave/relevant-evidence-detection.tar.gz
Verify the checksums:
sha256sum -c checksums.txt
Extract the archives:
tar -xzf outcontext-misinfo-progress.tar.gz
tar -xzf relevant-evidence-detection.tar.gz
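Checksums can also be verified programmatically before extraction. A minimal Python sketch (the expected digest should be read from checksums.txt; the archive name below is taken from the table above):

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Example usage: compare against the digest listed in checksums.txt.
# expected = "..."  # copy from checksums.txt
# assert sha256_of("outcontext-misinfo-progress.tar.gz") == expected
```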
Expected Directory Structure
outcontext-misinfo-progress/
├── checkpoints/
├── data/
│ └── eval/
│ ├── DP/
│ ├── FIVEPILS/
│ ├── MMFAKEBENCH/
│ └── VERITE/
├── checkpoint-evaluation.py
├── model.py
└── utils.py
relevant-evidence-detection/
├── checkpoints/
├── data/
│ └── eval/
│ ├── DP/
│ ├── FIVEPILS/
│ ├── MMFAKEBENCH/
│ └── VERITE/
├── checkpoint-evaluation.py
├── model.py
└── utils.py
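A quick way to confirm that an extracted archive matches this layout is to check for the expected subdirectories. A small sketch (directory names taken from the tree above):

```python
from pathlib import Path

# Subdirectories expected under each extracted experiment package.
EXPECTED = [
    "checkpoints",
    "data/eval/DP",
    "data/eval/FIVEPILS",
    "data/eval/MMFAKEBENCH",
    "data/eval/VERITE",
]

def missing_paths(root: str) -> list:
    """Return the expected subdirectories that are absent under root."""
    base = Path(root)
    return [rel for rel in EXPECTED if not (base / rel).is_dir()]

# Example usage after extraction:
# assert missing_paths("outcontext-misinfo-progress") == []
```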
Running the Evaluation
After extracting an archive, change into the relevant experiment directory and run:
cd outcontext-misinfo-progress
python checkpoint-evaluation.py
or:
cd relevant-evidence-detection
python checkpoint-evaluation.py
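The checkpoint-evaluation.py scripts are self-contained. Conceptually, evaluation iterates over the pre-computed feature files in fixed-size batches; the sketch below illustrates that batching pattern only (the names and loop are hypothetical, not the script's actual code):

```python
def batched(items, batch_size):
    """Yield successive fixed-size batches from a sequence of feature rows."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Hypothetical evaluation loop over pre-computed features:
# for batch in batched(eval_features, batch_size=32):
#     scores = model(batch)      # forward pass on the checkpointed model
#     accumulate_metrics(scores)
```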
Python requirements
This release contains Python evaluation scripts. Please install the dependencies
required by the accompanying code. If a requirements.txt file is included with
the release, install the dependencies with:
pip install -r requirements.txt
At minimum, the evaluation code is expected to require Python 3, PyTorch, NumPy,
and supporting libraries imported by checkpoint-evaluation.py,
model.py, and utils.py.
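Before running the scripts, it can help to verify that the core dependencies resolve in the current environment. A minimal sketch (the module list is an assumption based on the libraries named above; adjust it to match the actual imports in checkpoint-evaluation.py, model.py, and utils.py):

```python
import importlib.util

def missing_modules(names):
    """Return the module names that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Assumed core dependencies for the evaluation scripts:
# missing = missing_modules(["torch", "numpy"])
# if missing:
#     print("Please install:", ", ".join(missing))
```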
Hardware requirements
The release uses pre-computed features and model checkpoints, so evaluation does not require re-extracting raw multimodal features. GPU acceleration is recommended for faster evaluation, but exact GPU memory requirements depend on batch size and the model configuration used by the evaluation script. If GPU memory is limited, reduce the evaluation batch size in the script configuration.
Data and Licensing Terms
The dataset is made available for evaluation and benchmarking of multimodal misinformation detection methods. Users should not attempt to identify private individuals, redistribute raw third-party content outside the terms of the original platforms, or use the data for purposes beyond the stated research/evaluation scope.
Where source material originates from social-media or web platforms, users are responsible for complying with the applicable platform terms and policies. The dataset maintainers should list all source platforms represented in the release here.
Acknowledgements
This work was supported by the Centre for the Decentralised Digital Economy (DECaDE), a UK Research and Innovation (UKRI) Next Stage Digital Economy Centre.
DECaDE is funded through a £29 million investment by UK Research and Innovation and brings together expertise in artificial intelligence, distributed ledger technologies, cybersecurity, design, business, and law across multiple institutions.
The centre is led by the University of Surrey and includes key partners such as the University of Edinburgh and Digital Catapult, alongside a wide network of academic, industry, and public-sector collaborators.
This dataset aligns with DECaDE’s research on misinformation, media provenance, and content authenticity, which aims to support trust, transparency, and integrity in digital media ecosystems.
Contact
For questions about the dataset, please contact:
Junaid Awan, University of Surrey
m.awan@surrey.ac.uk
Version
Version: 1.0
Release date: 24 April 2026