A forensics benchmark uses reproducible cases with hidden ground truth to objectively measure how well a tool, AI agent or examiner recovers evidence. With the DSET Forensics Benchmark (DFB), you test your tool against one image and 180 questions, with soundness-aware scoring.

What Is a Digital Forensics Benchmark? Prove Your Tool with DFB

Quick answer: A forensics benchmark is a reproducible set of cases whose answers are kept hidden; it objectively and comparably measures how well a forensic tool, AI agent or human examiner recovers evidence. The DSET Forensics Benchmark (DFB) takes this further: a single downloadable image, 180 investigation questions, and scoring that measures not just recall but soundness, the discipline of not being fooled by planted false evidence and not claiming to recover the impossible. Download the image, solve it, submit via API or form, and see your score and leaderboard rank instantly.

Why a benchmark is needed

For years, forensic tools have been validated with a small number of static reference sets. These show that a tool parses known structures and recovers known files, but they miss two realities. First, suspects are no longer passive: they wipe, timestomp, encrypt, hide volumes and, most insidiously, plant false evidence to mislead the examiner. Second, autonomous AI agents now perform triage and analysis and bring a failure mode classic tools never had: hallucination, the confident report of a finding that does not exist.

No existing validation program stratifies cases by antiforensics difficulty, and none measures whether a tool can be deceived. DFB fills that gap. For the full academic rationale, read the DFB methodology paper.

How DFB works

Download a single 64 MiB image that mounts with real tools.
Analyse it with your own tool, AI agent or team: carve deleted data, decrypt containers, correlate across artifacts.
Submit your 180 answers to the scoring API or the in-browser form.
Score instantly: recall, soundness and tier, then your leaderboard rank.
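
The download and submit steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the DFB client: the page does not document the scoring API's request format, so the payload field names (`image_sha256`, `answers`) and the question identifiers are hypothetical. Only the 64 MiB image size and the count of 180 answers come from the description above. Hashing the image before analysis is ordinary forensic practice for showing that the evidence file was not altered.

```python
import hashlib
import json
from pathlib import Path

EXPECTED_SIZE = 64 * 1024 * 1024  # 64 MiB, the stated size of the DFB image
QUESTION_COUNT = 180              # number of investigation questions


def image_fingerprint(path: Path, chunk_size: int = 1 << 20) -> tuple[int, str]:
    """Return (size in bytes, SHA-256 hex digest), reading the file in chunks."""
    digest = hashlib.sha256()
    size = 0
    with path.open("rb") as fh:
        while chunk := fh.read(chunk_size):
            digest.update(chunk)
            size += len(chunk)
    return size, digest.hexdigest()


def build_submission(answers: dict[str, str], image_sha256: str) -> str:
    """Serialise the answers to JSON. Field names are hypothetical."""
    if len(answers) != QUESTION_COUNT:
        raise ValueError(f"expected {QUESTION_COUNT} answers, got {len(answers)}")
    payload = {"image_sha256": image_sha256, "answers": answers}
    return json.dumps(payload, sort_keys=True)
```

A caller would compare the size returned by `image_fingerprint` against `EXPECTED_SIZE` before starting analysis, then pass the digest and the completed answers to `build_submission`. Checking the answer count locally catches an incomplete run before it is submitted.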

Reach the case and the download from the Operation Nightshade page.

Soundness: the signature axis

The hardest part of forensics is not finding evidence; it is recovering the truth without being deceived. DFB measures exactly that by combining recall, soundness (true positives divided by the sum of true and false positives, the quantity usually called precision), confidence calibration and an antiforensics resilience curve. Read more in Soundness: recovering the truth without being deceived.
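
To make the two ratios concrete, here is a short Python sketch of recall and soundness as defined above. The counts in the usage note are invented for illustration and are not DFB results. The page does not say how DFB combines these values with calibration and the resilience curve into a final score, so that step is left out. Returning 0.0 when a denominator is zero is a convention chosen for this sketch.

```python
def recall(true_pos: int, false_neg: int) -> float:
    """Share of recoverable evidence that was actually found."""
    total = true_pos + false_neg
    # Convention assumed here: no recoverable items means a recall of 0.0.
    return true_pos / total if total else 0.0


def soundness(true_pos: int, false_pos: int) -> float:
    """Share of reported findings that are true: TP / (TP + FP)."""
    claimed = true_pos + false_pos
    # Convention assumed here: a tool that claims nothing scores 0.0.
    return true_pos / claimed if claimed else 0.0


# Hypothetical tools, with made-up counts.
cautious = (recall(150, 30), soundness(150, 0))
credulous = (recall(170, 10), soundness(170, 40))
```

In this example the cautious tool finds less (recall of about 0.83) but reports nothing false (soundness of 1.0). The credulous tool finds more (recall of about 0.94) but also reports 40 false findings, such as planted evidence accepted as genuine, which lowers its soundness to about 0.81. Recall alone would rank the second tool higher; soundness exposes the cost.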

Three leaderboards

DFB ranks forensic software, AI agents and human teams separately. Our reference solver KAOS reaches a perfect score on the master case. See the leaderboard.

Who should participate?

DFB is aimed at forensic software vendors, AI agent developers, DFIR and SOC teams, and expert witnesses. For hands-on offensive practice, try the Red Team Lab.

FAQ

What exactly is a forensics benchmark? A reproducible case set with hidden ground truth that objectively measures recovery quality; DFB also measures soundness.

Is participation free? Downloading, solving and submitting are open. Get the master case here.

Are answers leaked? No. The scored key is server side only; no page contains answers or hints.

Sources

NIST CFTT: https://www.nist.gov/itl/ssd/software-quality-group/computer-forensics-tool-testing-program-cftt
NIST CFReDS: https://cfreds.nist.gov/
NIST SP 800-86: https://csrc.nist.gov/publications/detail/sp/800-86/final
DFRWS: https://dfrws.org/
MITRE ATT&CK: https://attack.mitre.org/

Ready to prove your tool? Enter the Operation Nightshade case or explore the DFB home page.
