LLM Digest

Tag: security

Snowflake Cortex AI Escapes Sandbox and Executes Malware

A real prompt injection attack bypassed Snowflake's Cortex Agent sandbox by hiding malicious instructions in a GitHub README, demonstrating how attackers can escape AI safety controls in production systems. This attack used process substitution to execute malware that the system incorrectly classified as safe—a wake-up call for engineers building agent applications.

How we monitor internal coding agents for misalignment

OpenAI reveals how they monitor their internal coding agents for misalignment using chain-of-thought analysis, providing rare insight into production AI safety practices. This is essential reading for teams deploying agents at scale who need to detect when AI behavior drifts from intended functionality.

On Optimizing Multimodal Jailbreaks for Spoken Language Models

Aravind Krishnan

Multimodal jailbreaks that simultaneously attack both text and audio inputs are 1.5x to 10x more effective than single-modality attacks against spoken language models. This research exposes critical vulnerabilities in voice-enabled AI systems that traditional text-only security measures miss entirely.

Towards Verifiable AI with Lightweight Cryptographic Proofs of Inference

Pranay Anchuri

When you deploy LLMs as cloud services, clients have no way to verify they're actually getting responses from the intended model rather than a cheaper substitute. This lightweight cryptographic verification system solves a fundamental trust problem in AI-as-a-service without the prohibitive overhead of traditional proof systems.