CRAVE Dataset

CRAVE provides resources for evaluating multimodal misinformation detection models, including pre-computed evaluation features, trained model checkpoints, and Python evaluation scripts.

The dataset is hosted by the Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey.

Overview

The release contains two experiment packages:

- outcontext-misinfo-progress: out-of-context misinformation evaluation.
- relevant-evidence-detection: relevant evidence detection evaluation.

Each package contains evaluation-ready feature files, a trained checkpoint, and scripts for reproducing evaluation results. The files under data/eval/ are pre-extracted multimodal features, provided to support reproducible benchmarking without recomputing image/text features from raw media.

Evaluation data are provided for the following benchmarks where applicable: DP, FIVEPILS, MMFakeBench, and VERITE.

Downloads

outcontext-misinfo-progress.tar.gz (approx. 1 GB)
    Out-of-context misinformation evaluation package.
    SHA256: a08380a1fcc033c32bef266730d7f0b229dca37479d626214b60f11154de8a5b

relevant-evidence-detection.tar.gz (approx. 1.6 GB)
    Relevant evidence detection evaluation package.
    SHA256: a08380a1fcc033c32bef266730d7f0b229dca37479d626214b60f11154de8a5b

A checksum file is also available: checksums.txt.

Download and Installation

Download the archives:

wget https://cvssp.org/data/crave/outcontext-misinfo-progress.tar.gz
wget https://cvssp.org/data/crave/relevant-evidence-detection.tar.gz

Verify the checksums:

sha256sum -c checksums.txt
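
If sha256sum is not available on your system, the same check can be done in Python using only the standard hashlib module. This is a minimal sketch; the commented-out digest is the published value for outcontext-misinfo-progress.tar.gz from the table above.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading in 1 MiB chunks
    so large archives do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Example: compare against the published digest from checksums.txt.
# expected = "a08380a1fcc033c32bef266730d7f0b229dca37479d626214b60f11154de8a5b"
# assert sha256_of("outcontext-misinfo-progress.tar.gz") == expected
```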

Extract the archives:

tar -xzf outcontext-misinfo-progress.tar.gz
tar -xzf relevant-evidence-detection.tar.gz

Expected Directory Structure

outcontext-misinfo-progress/
├── checkpoints/
├── data/
│   └── eval/
│       ├── DP/
│       ├── FIVEPILS/
│       ├── MMFAKEBENCH/
│       └── VERITE/
├── checkpoint-evaluation.py
├── model.py
└── utils.py

relevant-evidence-detection/
├── checkpoints/
├── data/
│   └── eval/
│       ├── DP/
│       ├── FIVEPILS/
│       ├── MMFAKEBENCH/
│       └── VERITE/
├── checkpoint-evaluation.py
├── model.py
└── utils.py
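
After extraction, the layout can be sanity-checked with a small Python helper. The expected paths below mirror the trees shown above; this is a convenience sketch, not part of the release itself.

```python
from pathlib import Path

# Sub-directories and files every extracted package should contain,
# taken from the directory trees above.
EXPECTED_DIRS = {
    "checkpoints",
    "data/eval/DP",
    "data/eval/FIVEPILS",
    "data/eval/MMFAKEBENCH",
    "data/eval/VERITE",
}
EXPECTED_FILES = {"checkpoint-evaluation.py", "model.py", "utils.py"}

def missing_paths(root):
    """Return the expected sub-paths that are absent under root."""
    root = Path(root)
    missing = [d for d in sorted(EXPECTED_DIRS) if not (root / d).is_dir()]
    missing += [f for f in sorted(EXPECTED_FILES) if not (root / f).is_file()]
    return missing

# Example:
# problems = missing_paths("outcontext-misinfo-progress")
# if problems:
#     print("Incomplete extraction, missing:", problems)
```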

Running the Evaluation

After extracting an archive, change into the relevant experiment directory and run:

cd outcontext-misinfo-progress
python checkpoint-evaluation.py

or:

cd relevant-evidence-detection
python checkpoint-evaluation.py
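
The two runs above can be wrapped in a short Python helper. This sketch assumes checkpoint-evaluation.py takes no required command-line arguments; check each script's usage before relying on it.

```python
import subprocess
import sys

def run_evaluation(package_dir):
    """Run checkpoint-evaluation.py inside the given package directory,
    using the same Python interpreter as the caller."""
    return subprocess.run(
        [sys.executable, "checkpoint-evaluation.py"],
        cwd=package_dir,
        check=True,  # raise CalledProcessError if the script fails
    )

# Evaluate both packages in turn:
# for pkg in ("outcontext-misinfo-progress", "relevant-evidence-detection"):
#     run_evaluation(pkg)
```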

Python requirements

This release contains Python evaluation scripts; install the dependencies required by the accompanying code. If the release includes a requirements.txt file, install dependencies with:

pip install -r requirements.txt

At minimum, the evaluation code is expected to require Python 3, PyTorch, NumPy, and supporting libraries imported by checkpoint-evaluation.py, model.py, and utils.py.

Hardware requirements

The release uses pre-computed features and model checkpoints, so evaluation does not require re-extracting raw multimodal features. GPU acceleration is recommended for faster evaluation, but exact GPU memory requirements depend on batch size and the model configuration used by the evaluation script. If GPU memory is limited, reduce the evaluation batch size in the script configuration.
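
The device selection and batch-size reduction described above can be sketched as follows. The torch import is hedged so the snippet degrades to CPU when PyTorch is absent, and the starting batch size of 64 is illustrative, not the release's default.

```python
try:
    import torch
    # Prefer the GPU when one is visible to PyTorch.
    DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:  # evaluation can still run on CPU-only setups
    DEVICE = "cpu"

def fallback_batch_sizes(start=64, floor=1):
    """Yield successively halved batch sizes, e.g. to retry evaluation
    with a smaller batch after a GPU out-of-memory error."""
    b = start
    while b >= floor:
        yield b
        b //= 2
```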

Data and Licensing Terms

The dataset is made available for evaluation and benchmarking of multimodal misinformation detection methods. Users should not attempt to identify private individuals, redistribute raw third-party content outside the terms of the original platforms, or use the data for purposes beyond the stated research/evaluation scope.

Where source material originates from social-media or web platforms, users are responsible for complying with the applicable platform terms and policies.

Acknowledgements

This work was supported by the Centre for the Decentralised Digital Economy (DECaDE), a UK Research and Innovation (UKRI) Next Stage Digital Economy Centre.

DECaDE is funded through a £29 million investment by UK Research and Innovation and brings together expertise in artificial intelligence, distributed ledger technologies, cybersecurity, design, business, and law across multiple institutions.

The centre is led by the University of Surrey and includes key partners such as the University of Edinburgh and Digital Catapult, alongside a wide network of academic, industry, and public-sector collaborators.

This dataset aligns with DECaDE’s research on misinformation, media provenance, and content authenticity, which aims to support trust, transparency, and integrity in digital media ecosystems.

Contact

For questions about the dataset, please contact:
Junaid Awan, University of Surrey
m.awan@surrey.ac.uk

Version

Version: 1.0
Release date: 24 April 2026