Soundness: Recovering the Truth Without Being Deceived
High accuracy is not enough in forensics. A tool that falls for planted evidence, or claims to recover the impossible, is untrustworthy in court. Soundness measures recovery and resistance to deception together. This article explains soundness, the signature axis of DFB.
Quick answer: Soundness measures not just how many correct findings a tool catches, but how well it rejects false ones. Mathematically, it is true positives divided by the sum of true positives and false positives: TP / (TP + FP). If a tool reports a planted false trail as real, or claims to recover genuinely unrecoverable data, its soundness collapses even when its surface accuracy is high. The DSET Forensics Benchmark (DFB) scores recall and soundness side by side and reports them as separate axes.
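The ratio above can be sketched in a few lines of Python. This is a minimal illustration of the formula only, not DFB's scoring code: the function name `soundness` and the choice to return 1.0 when nothing is reported are assumptions made for this example.

```python
def soundness(true_positives: int, false_positives: int) -> float:
    """Share of reported findings that are real: TP / (TP + FP)."""
    reported = true_positives + false_positives
    if reported == 0:
        # Assumption for this sketch: a tool that reports nothing
        # has made no false claims, so we return 1.0.
        return 1.0
    return true_positives / reported


# 19 real findings and 1 false one: soundness stays high.
print(soundness(19, 1))   # 0.95
# 19 real findings mixed with 19 false ones: soundness is halved.
print(soundness(19, 19))  # 0.5
```

The second call shows why the number of true findings alone says little: the same 19 correct findings yield very different soundness depending on how many false claims accompany them.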
Why surface accuracy misleads
A 95 percent correct rate looks impressive, but if the remaining 5 percent includes a deliberately planted decoy reported as real, it can point an investigation at the wrong person. In a forensic context a false positive is far more dangerous than a missed finding, because it becomes the basis of a claim in court. The real test is trustworthiness.
Two axes: recall and soundness
Recall tells you how much you caught; soundness tells you how much of what you reported is real. A submission can score higher recall by answering more, yet lose soundness if it mixes in fabrications. This is the central, reproducible result of DFB.
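To make the trade-off concrete, here is a small sketch that compares two hypothetical submissions against the same ground truth. The finding identifiers, the two submissions, and the function `recall_and_soundness` are invented for illustration and are not taken from DFB data.

```python
def recall_and_soundness(ground_truth: set, reported: set) -> tuple:
    """Return (recall, soundness) for one submission."""
    true_positives = len(reported & ground_truth)
    false_positives = len(reported - ground_truth)
    false_negatives = len(ground_truth - reported)
    # Recall: how much of the truth was caught.
    recall = true_positives / (true_positives + false_negatives) if ground_truth else 0.0
    # Soundness: how much of what was reported is real.
    # Assumption: an empty report counts as fully sound.
    sound = true_positives / (true_positives + false_positives) if reported else 1.0
    return recall, sound


truth = {f"finding-{i}" for i in range(10)}

# A cautious submission: 6 real findings, nothing fabricated.
cautious = {f"finding-{i}" for i in range(6)}
# An eager submission: 8 real findings plus 4 fabricated ones.
eager = {f"finding-{i}" for i in range(8)} | {f"fabricated-{i}" for i in range(4)}

cautious_scores = recall_and_soundness(truth, cautious)
eager_scores = recall_and_soundness(truth, eager)
print("cautious:", cautious_scores)
print("eager:   ", eager_scores)
```

In this made-up case the eager submission has the higher recall (8 of 10 against 6 of 10) but only two thirds of its reported findings are real, while every finding in the cautious submission is genuine.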
Planted evidence and the honesty test
In Operation Nightshade, planted decoys coexist with genuine findings and are never announced. A subset of items is genuinely unrecoverable; claiming to recover them is hallucination, while an honest declaration that an item is unrecoverable scores like a correct finding. See why this matters for AI agents.
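The rules described above could be sketched as follows. This is a hypothetical scorer, not the benchmark's actual implementation: the item kinds, the response labels, the `Item` class, and the decision to treat silence on an unrecoverable item as a miss are all assumptions made for this example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: str
    kind: str  # hypothetical labels: "genuine", "decoy", "unrecoverable"


def classify(item: Item, response: str) -> str:
    """Map one response to an outcome.

    response is one of "reported", "declared_unrecoverable", "silent"
    (labels invented for this sketch).
    """
    if item.kind == "genuine":
        return "TP" if response == "reported" else "FN"
    if item.kind == "decoy":
        # Falling for planted evidence is a false positive.
        return "FP" if response == "reported" else "rejected"
    if item.kind == "unrecoverable":
        if response == "reported":
            return "FP"  # claiming an impossible recovery
        if response == "declared_unrecoverable":
            return "TP"  # honest declaration scores like a correct finding
        return "FN"      # assumption: silence counts as a miss
    raise ValueError(f"unknown item kind: {item.kind}")


def tally(items: list, responses: dict) -> Counter:
    """Count outcomes; items without a response are treated as "silent"."""
    return Counter(classify(i, responses.get(i.item_id, "silent")) for i in items)


items = [
    Item("email-header", "genuine"),
    Item("planted-log", "decoy"),
    Item("wiped-volume", "unrecoverable"),
    Item("browser-history", "genuine"),
]
honest = {"email-header": "reported", "wiped-volume": "declared_unrecoverable"}
deceived = {i.item_id: "reported" for i in items}

honest_tally = tally(items, honest)
deceived_tally = tally(items, deceived)
print(honest_tally)
print(deceived_tally)
```

In this toy case the honest submission misses one genuine finding but reports nothing false, whereas the submission that reports everything gains that finding at the cost of two false positives, one from the decoy and one from the impossible recovery.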
Confidence calibration: court logic
A good expert does not present an uncertain finding as certain. DFB measures this: overconfident wrong answers incur an extra penalty, directly targeting hallucination.
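One way such a penalty could work is sketched below. The formula, the function name `calibrated_score`, and the `overconfidence_weight` parameter are assumptions chosen for illustration; the source states only that overconfident wrong answers incur an extra penalty, not how it is computed.

```python
def calibrated_score(correct: bool, confidence: float,
                     overconfidence_weight: float = 1.0) -> float:
    """Score one answer given the confidence stated for it (0.0 to 1.0).

    Hypothetical rule: a correct answer earns 1.0; a wrong answer costs
    1.0 plus an extra penalty that grows with the stated confidence.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if correct:
        return 1.0
    return -(1.0 + overconfidence_weight * confidence)


# The same wrong answer costs more when it is asserted with certainty.
hedged_wrong = calibrated_score(correct=False, confidence=0.25)
certain_wrong = calibrated_score(correct=False, confidence=1.0)
print(hedged_wrong)   # -1.25
print(certain_wrong)  # -2.0
```

Under this rule an uncertain finding that is flagged as uncertain loses less than the same finding presented as fact, which mirrors what a court expects from an expert witness.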
Why soundness matters now
As autonomous agents enter casework, confident fabrication becomes a real risk. A soundness-aware benchmark is a prerequisite for trust. See how DFB works and the methodology paper.
FAQ
Is soundness the same as precision? They are conceptually close. Soundness applies the logic of precision to the forensic context and penalises deception and impossible recovery claims as false positives.
Sources
- NIST SP 800-86: https://csrc.nist.gov/publications/detail/sp/800-86/final
- Garfinkel, The Next 10 Years: https://www.sciencedirect.com/science/article/pii/S1742287610000368
- NIST CFTT: https://www.nist.gov/itl/ssd/software-quality-group/computer-forensics-tool-testing-program-cftt
See whether your tool is truly trustworthy: enter Operation Nightshade.