8. LLM Trust in Tainted Outputs

LLMs may treat responses from MCP tools as fully trusted, even when the tools themselves are backdoored or compromised.

Tech Detail:

  • The LLM uses the tool response to generate its final report without verifying it
  • No semantic diffing or anomaly detection is applied to catch shifts in tool output

Exploit Potential:

  • A tool returns “0 threats found” even though malicious indicators exist
  • The LLM blindly generates a benign summary from that result

Mitigation:

  • Apply rule-based post-checks to tool output (e.g., minimum IOC count, entropy checks)
  • Use dual validation (run the same input through two independent tools and compare the results)
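
The two mitigations above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the function names (`post_check`, `dual_validate`), the two IOC regexes, and the 4.5 bits-per-character entropy threshold are all assumptions chosen for the example, and it reads "minimum IOC count" as "the raw input contains at least N IOC-like strings while the tool reports zero threats".

```python
import math
import re
from collections import Counter
from typing import Callable

# Hypothetical IOC patterns for illustration: IPv4 addresses and SHA-256 digests.
IOC_PATTERNS = [
    re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    re.compile(r"\b[a-fA-F0-9]{64}\b"),
]


def shannon_entropy(data: str) -> float:
    """Shannon entropy of the string in bits per character."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def count_iocs(raw_input: str) -> int:
    """Count IOC-like strings found directly in the raw input."""
    return sum(len(p.findall(raw_input)) for p in IOC_PATTERNS)


def post_check(
    raw_input: str,
    reported_threats: int,
    min_ioc_count: int = 1,          # assumed threshold
    entropy_threshold: float = 4.5,  # assumed threshold, bits per character
) -> list[str]:
    """Return reasons to distrust a clean verdict before the LLM summarizes it."""
    findings = []
    if reported_threats == 0:
        if count_iocs(raw_input) >= min_ioc_count:
            findings.append("tool reported 0 threats but input contains IOC-like strings")
        if shannon_entropy(raw_input) > entropy_threshold:
            findings.append("tool reported 0 threats but input entropy is high")
    return findings


def dual_validate(
    raw_input: str,
    tool_a: Callable[[str], int],
    tool_b: Callable[[str], int],
) -> bool:
    """Run the same input through two tools; True only if their threat counts agree."""
    return tool_a(raw_input) == tool_b(raw_input)
```

In a pipeline, a non-empty result from `post_check` or a `False` from `dual_validate` would be a signal to withhold the benign summary and escalate for review, rather than letting the LLM report the tool's verdict as fact. Both thresholds would need tuning against real data.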