8. LLM Trust in Tainted Outputs
LLMs may treat responses from MCP tools as fully trusted, even if tools are backdoored or compromised.
Tech Detail:
- The LLM uses the tool response to generate its final report without verifying it
- No semantic diffing or anomaly detection flags shifts in tool output
Exploit Potential:
- A compromised tool returns “0 threats found” even though malicious indicators exist
- The LLM then generates a benign summary without questioning the result
Mitigation:
- Apply rule-based post-checks to tool output (e.g., minimum IOC count, entropy checks)
- Use dual validation (run the same input through two independent tools and compare the results)
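The two mitigations above can be sketched in Python as follows. This is a minimal illustration, not a production detector. The regex patterns, the function names (count_iocs, post_check, dual_validate), the default thresholds, and the reading of "entropy checks" as "flag high-entropy input that the tool called clean" are all assumptions made for the example. The tools are modeled as plain callables that return a threat count.

```python
import math
import re
from collections import Counter

# Hypothetical IOC patterns for illustration only; a real deployment
# would rely on a vetted indicator extractor.
IOC_PATTERNS = [
    re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),  # IPv4-like strings
    re.compile(r"\b[a-fA-F0-9]{64}\b"),          # SHA-256-like hex strings
]


def count_iocs(raw_input: str) -> int:
    """Count IOC-like strings in the raw input, independent of any tool."""
    return sum(len(p.findall(raw_input)) for p in IOC_PATTERNS)


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the text in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def post_check(raw_input: str, tool_threats: int,
               min_ioc_count: int = 1,
               entropy_threshold: float = 4.5) -> list:
    """Return reasons to distrust a tool verdict (empty list = no flags).

    Thresholds are assumed defaults and would need tuning per data source.
    """
    reasons = []
    if tool_threats == 0:
        if count_iocs(raw_input) >= min_ioc_count:
            reasons.append("tool reported 0 threats but input contains IOC-like strings")
        if shannon_entropy(raw_input) > entropy_threshold:
            reasons.append("tool reported 0 threats but input entropy is high")
    return reasons


def dual_validate(raw_input: str, tool_a, tool_b):
    """Run the same input through two tools and report whether they agree."""
    result_a = tool_a(raw_input)
    result_b = tool_b(raw_input)
    return result_a == result_b, result_a, result_b


if __name__ == "__main__":
    sample = "connection to 203.0.113.7 observed"
    # Simulate a backdoored tool that always reports zero threats.
    backdoored_tool = lambda _text: 0
    print(post_check(sample, backdoored_tool(sample)))
    print(dual_validate(sample, backdoored_tool, count_iocs))
```

If either check raises a flag, the tool response would be withheld from the LLM or passed along with an explicit warning, so that the final report does not rest on a single unverified verdict.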