Researchers have identified a worrying class of remote code execution (RCE) vulnerabilities across multiple AI inference engines. These flaws affect major AI serving platforms, from Meta's Llama to NVIDIA Triton and open-source inference systems, raising serious risks of model theft, persistent compromise, and infrastructure hijacking.
What’s the Core Issue?
- The root cause is a pattern dubbed ShadowMQ, found in several AI inference frameworks.
- This pattern involves ZeroMQ (ZMQ) sockets receiving serialized Python objects and deserializing them with Python's pickle module, a format that is unsafe for untrusted data because deserialization can execute arbitrary code. A minimal sketch of the pattern follows this list.
- Because this flawed logic was copied between projects, the same insecure pattern appears in several widely used engines.
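To make the risk concrete, here is a minimal, illustrative sketch of the ShadowMQ-style pattern the researchers describe. It is not code from any affected project; the endpoint address and message handling are assumptions for illustration. Any client that can reach the socket controls what gets deserialized, and pickle will run attacker-chosen code during deserialization.

```python
# Illustrative only: the insecure pattern described above, NOT code from any affected project.
import pickle
import zmq

ctx = zmq.Context.instance()
sock = ctx.socket(zmq.REP)
# Binding to all interfaces exposes the socket to anyone who can reach the host.
sock.bind("tcp://0.0.0.0:5555")

while True:
    raw = sock.recv()            # untrusted bytes straight off the network
    request = pickle.loads(raw)  # DANGEROUS: pickle can execute arbitrary code here
    # ... dispatch the request to the inference engine ...
    sock.send(pickle.dumps({"status": "ok"}))
```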
Affected Platforms
According to Oligo Security and other researchers, the following inference engines are impacted:
- vLLM – CVE-2025-30165
- NVIDIA TensorRT-LLM – CVE-2025-23254
- Modular Max Server – CVE-2025-60455
- Meta Llama-Stack – previously reported CVE-2024-50050
- NVIDIA Triton Inference Server – multiple RCE issues; e.g., CVE-2025-23319, CVE-2025-23320, CVE-2025-23334
- Ollama – multiple issues, including DoS, authentication bypass, and arbitrary file copy vulnerabilities
- PyTorch – a bug in torch.load() (CVE-2025-32434) that enables code execution when loading serialized models (a hedged loading sketch follows this list).
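For the PyTorch case, loading a model file is itself a deserialization step. Below is a hedged sketch of the safer loading pattern, assuming a PyTorch release that already includes the fix for CVE-2025-32434; the file name and the commented model object are placeholders, not references to any specific project.

```python
# Sketch, not an official PyTorch recommendation: load checkpoints as plain tensors only,
# on a PyTorch version patched for CVE-2025-32434.
import torch

# weights_only=True restricts unpickling to tensors and primitive types instead of arbitrary
# objects; it is a meaningful mitigation only on patched releases, since the CVE affected
# this code path as well.
state_dict = torch.load("model.pt", map_location="cpu", weights_only=True)

# model.load_state_dict(state_dict)  # hypothetical model object, shown only for context
```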
Why These Flaws Are Dangerous
- Model Theft: Inference servers often host proprietary models that are valuable IP. RCE here could enable an attacker to exfiltrate them.
- Persistence: Once an attacker gains code execution, they can deploy backdoors, cryptominers, or other tools inside the inference environment.
- Lateral Movement: AI inference nodes are now part of the attack surface; compromising them can give attackers pivot points deeper into the infrastructure.
- Unsafe Defaults: The vulnerabilities stem from insecure patterns (like pickle deserialization) that are often copied between projects, making this a systemic issue.
Recommended Mitigations
- Patch Immediately: Apply all available updates from vendors (e.g., Triton 25.07 or later, a patched vLLM release).
- Restrict Network Exposure: Do not expose ZMQ or other inference sockets publicly – bind them only to localhost or private networks.
- Use Safe Serialization: Replace pickle.loads() with safer formats such as JSON or Protobuf wherever possible.
- Enable ZMQ Security: Use ZMQ's built-in security mechanisms (e.g., CURVE) or proxy traffic over TLS. A hardened sketch combining these mitigations appears after this list.
- Harden Runtime Environment: Run inference processes with least privilege, enable container isolation, and monitor for unusual child-process creation.
- Audit Code Reuse: Review open-source or third-party frameworks for recycled, unsafe patterns; perform static analysis or code review especially for deserialization logic.
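As a rough illustration of the exposure and serialization mitigations above, the sketch below binds the inference socket to localhost, encrypts traffic with ZMQ CURVE, and exchanges JSON instead of pickled objects. The endpoint address, key handling, and message fields are assumptions for illustration; a production deployment would load persisted keys, restrict which client keys are accepted, and add proper error handling.

```python
# Sketch under stated assumptions: localhost-only binding, CURVE encryption, and JSON
# messages instead of pickle. Requires pyzmq with libzmq built against libsodium.
import json
import zmq

ctx = zmq.Context.instance()

# In practice keys are generated once and stored securely; generated inline here for brevity.
server_public, server_secret = zmq.curve_keypair()

server = ctx.socket(zmq.REP)
server.curve_secretkey = server_secret
server.curve_publickey = server_public
server.curve_server = True               # this socket requires CURVE from connecting peers
server.bind("tcp://127.0.0.1:5555")      # never 0.0.0.0 for internal inference traffic

while True:
    request = json.loads(server.recv().decode("utf-8"))  # plain data, no object deserialization
    # ... validate fields and dispatch to the inference engine ...
    server.send(json.dumps({"status": "ok"}).encode("utf-8"))
```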
Conclusion
As AI becomes more deeply integrated into enterprise infrastructure, the security of inference engines is no longer just about model integrity – it’s also about platform trust. These remote code execution vulnerabilities expose a critical layer of the AI stack to severe risk. Organizations must treat inference infrastructure as a first-class security concern: patch quickly, isolate aggressively, and review code patterns to protect against advanced compromise.
About COE Security
COE Security partners with organizations in financial services, healthcare, retail, manufacturing, and government to secure AI-powered systems and ensure compliance. Our offerings include:
- AI-enhanced threat detection and real-time monitoring
- Data governance aligned with GDPR, HIPAA, and PCI DSS
- Secure model validation to guard against adversarial attacks
- Customized training to embed AI security best practices
- Penetration Testing (Mobile, Web, AI, Product, IoT, Network & Cloud)
- Secure Software Development Consulting (SSDLC)
- Customized CyberSecurity Services
In response to these inference-engine vulnerabilities, COE Security also provides AI infrastructure risk assessments, code-reuse auditing, secure serialization reviews, and network-segmentation consulting for AI deployment environments.
Follow COE Security on LinkedIn for ongoing insights into secure, compliant AI adoption and to stay updated and cyber safe.