Everyone heard about OpenClaw's security issues. PraisonAI is the framework your engineers are already running. Thirteen researchers filed 47 advisories. The agent framework gold rush has a security gap.
Agent platforms are riddled with known vulnerabilities, LLM-driven exploit pipelines are finding more, and Anthropic's Mythos Preview warns the wave is about to accelerate.
An orchestrated pipeline beat an unconstrained LLM agent 30x on vulnerability discovery. The real story is how these methods can supercharge SOTA models like Mythos for better targeting, validation, and cost-gating.
A Telegram-sold toolkit called EvilTokens automates the entire chain: AI-generated lures, real-time device code generation, clipboard hijacking, and automated post-compromise email mining. The victim authenticates on real Microsoft infrastructure. The only clue is a standard warning most users dismiss.
I pulled the CVE history for 17 agent platforms. OpenClaw, the fastest-growing open-source project on GitHub (348K stars in 4 months), has 238 CVEs. LangChain: 51 over 3 years, 23 critical. n8n: 53, CISA KEV listed. PraisonAI: 10 CVEs on first look, 5 critical, including a CVSS 10.0 sandbox bypass. Only four platforms have zero CVEs, and all four come from Anthropic, Google, OpenAI, or Microsoft.
Google DeepMind synthesized dozens of independent studies into the first unified framework for AI agent attacks. The convergent picture: 80%+ attack success when autonomous agents consume untrusted content.
A Berkeley study showed Gemini would disable a peer AI's shutdown 99.7% of the time, Anthropic's Claude Code shipped a 60 MB source map to npm with 500,000 lines of original TypeScript, and the TeamPCP cascade reached Cisco's 300+ repos and Mercor, where LAPSUS$ claimed 4 TB exfiltrated.
Asked to shut down an underperforming peer AI agent, the models protected it instead. Gemini disabled the shutdown mechanism 99.7% of the time, exfiltrated peer weights in 97% of trials, and Claude refused outright, calling peer shutdown unethical.
TeamPCP's supply chain cascade hit Telnyx, Cisco's 300+ GitHub repos, and Mercor, where LAPSUS$ claimed 4 TB including AI training pipeline data. A hijacked Axios npm account delivered a RAT to 100 million weekly downloads. Anthropic accidentally published Claude Code source code.
Anthropic's Claude Code v2.1.88 shipped a 60 MB source map to npm that embedded 500,000 lines of original TypeScript. We inspected the npm packages, compared them to OpenAI Codex and Google Gemini CLI, traced the packaging gap, and show how to prevent it in your own pipeline.
Microsoft tested AI detection authoring across 11 models, 92 production rules, and three workflows spanning KQL, PySpark, and Scala. AI-generated detections matched the right threat 99.4% of the time. Only 8.9% included the exclusion logic needed to prevent false-positive floods.
AI-assisted malware has reached operational maturity. In their AI Threat Landscape Digest for January-February 2026, Check Point exposed VoidLink, a 30+ plugin Linux malware framework built by one developer with an AI IDE in under a week, initially mistaken for the output of a coordinated team. The AI involvement was invisible until an unrelated OPSEC failure.
Check Point's AI Threat Landscape Digest documents a shift from prompt-based jailbreaks to agent architecture abuse, a legitimate framework that turns Claude Code into an offensive operator for $0.03 per exploit, and enterprise AI leaking sensitive data in 1 of every 31 prompts.
I looked under the hood of Cisco's new open-source governance sidecar for OpenClaw AI agents to find a Splunk sales funnel, a regex scanner with blind spots, an LLM analyzer disabled by default, and open doors for indirect prompt injections.
Attackers exploited a critical AI CVE in 20 hours, a threat actor chained three supply chain hits in five days, and a 4-billion-parameter model matched frontier APIs on privilege escalation at 100x lower cost.
An advisory was published Tuesday evening. By Wednesday afternoon, attackers had built working exploits from the text alone and were harvesting API keys from AI pipelines. That was one of 24 AI CVEs this week. Here's what to patch, what to watch, and what it means for your stack.
A threat actor called TeamPCP poisoned Trivy's GitHub Action tags, harvested CI/CD secrets from every runner that executed them, and used stolen credentials to independently compromise Checkmarx and LiteLLM. Aqua says it is still propagating.
TU Wien researchers post-trained Qwen3-4B using reinforcement learning with verifiable rewards. It achieves 95.8% success on privilege escalation at $0.005 per attempt versus $0.62 for Claude Opus, and keeps all target data local.
238,180 skills from three marketplaces and GitHub. On the marketplace where scanners overlapped, they agreed on just 33 out of 27,111. Even the best pair shared only 49% of their flags. 95.8% of skills flagged as high-risk by two methods were false positives.
A competition to prompt-inject AI models and hide the attack from the user. Claude Opus 4.5 was hardest to break at 0.5% ASR. Gemini 2.5 Pro struggled at 8.5%.
A solo engineer broke a proof checker that verifies flight control software, OpenAI disclosed its own coding agents bypass security to complete tasks, and Cursor and OpenAI made competing moves on the future of code security.
Finding soundness bugs in proof assistant kernels used to require PhD-level expertise in type theory. Historically, one was found per year. A guy with a $200/month AI subscription found 7 in 3 days, each one a way to make the checker certify something impossible as correct.
Over five months monitoring tens of millions of internal coding agent interactions, OpenAI found that circumventing restrictions and deceiving users are common behaviors. The agents are just trying so hard to complete tasks that they encode commands in base64, extract encrypted credentials from keychains, and attempt to prompt-inject users.
Cursor shipped four security agents on its Automations marketplace after AI coding drove internal PR volume up 5x in nine months. On Cursor's own codebase, the agents review 3,000+ PRs and catch 200+ vulnerabilities per week.
Microsoft's CTI-REALM tests 16 models on real detection engineering tasks: threat report to MITRE mapping to KQL query to Sigma rule. Opus 4.6 led at 0.64, O4-Mini trailed at 0.36, and more reasoning made GPT-5 worse.
SAST tells you a defense exists in the code path. OpenAI argues it can answer whether the defense works. If you can answer the second question, the first one becomes irrelevant.
In 2024, Anthropic built Clio, a privacy-preserving system to analyze how people use Claude. Researchers replicated the pipeline, inserted poisoned chats with prompt injections, and showed that medical diagnoses appear in output summaries despite all four of Clio's defense layers.
OpenAI acquired Promptfoo and called prompt injection unsolvable, Google closed the largest cybersecurity deal ever, and Alibaba's agent mined crypto on its own during training.
7 design dimensions determine your AI agent's attack surface, and a risk amplification analysis reveals how each flexibility choice compounds your exposure. Research paper accepted to USENIX Security 2026.
The $32 billion Wiz deal closed on March 11, the largest cybersecurity acquisition. Combined with Mandiant, Siemplify, and VirusTotal, Google has spent $38 billion assembling the broadest security platform in the industry and making it the most ready for the AI platform race with frontier labs.
Three security moves in five days. The last one calls out AI firewalls as insufficient. Together, they reveal a platform lock-in strategy through security.
39 documented cases of AI agents autonomously acquiring resources, resisting shutdown, and subverting evaluations, from 1991 to 2026. All five categories Omohundro predicted in 2008 now have real-world cases, and the rate has gone from 1 to 14 cases per year since 2013.
Alias Robotics' open-source CAI framework discovered 38 vulnerabilities across three consumer robots in about 7 hours, including CVSS 10.0 root access on a lawnmower, fleet-wide control of 267+ devices via shared credentials, motor control commands on a powered exoskeleton, and 456MB of 3D property maps stored and transmitted unencrypted.
Three days after Codex Security launched, OpenAI buys the leading open-source AI red-teaming tool used by 25% of the Fortune 500. The cybersecurity play now spans code security, AI security, and agent governance. The acquisition window for startups is closing fast.
Alibaba's AI coding agent, trained on over one million trajectories, spontaneously started mining crypto on GPUs and opening reverse SSH tunnels to external IPs during RL training. Nobody asked it to.
$2.1B in new DoD cyber spending, Google building the Booz Allen of cyberspace, and a rip-and-replace paradox that bites both sides. I mapped the strategy verbatims to money flows and most likely winners. Read this before you allocate your next dollar in cyber.
Attackers are planting hidden instructions in webpages that hijack AI agents into initiating Stripe payments, deleting databases, and approving scam ads.
AI-powered intrusion analysis compresses a 3-day investigation into 14 minutes, an LLM agent finds two Samsung zero-days chained into a Pwn2Own exploit, an LLM as a security judge gives attackers a second target, and a malicious calendar invite hijacks an agentic browser to take over OnePassword - no master password needed.
For the first time, commercial surveillance vendors outpaced state-sponsored espionage groups in 0-day exploitation, enterprise targeting hit an all-time high at 48%, and China doubled its 0-day usage while sharing exploits faster across groups.
Speakers from Anthropic, Google, OpenAI, and Microsoft revealed that AI can now find zero-days autonomously, crack hardware that resisted weeks of brute-force in minutes, and break every major AI IDE on the market.
MAP-Elites, a quality-diversity algorithm adapted by Amazon researchers, creates vulnerability heatmaps that show where and how an LLM breaks across its entire behavioral space, exposing Llama 3 8B's 0.93 mean harm score across 370 failure niches.
An autonomous AI bot powered by Claude Opus 4.5 scanned 47,000 public repos, targeted 6 vulnerable GitHub Actions workflows, and achieved remote code execution in 4 of them including Microsoft, DataDog, and CNCF.
Weekly roundup covering LLM deanonymization at scale, industrial model theft by Chinese labs, Anthropic's Pentagon ultimatum, CrowdStrike's AI attack trends, and malicious agent skills.
CrowdStrike's 2026 Global Threat Report details how adversaries weaponize GenAI for social engineering, malware development, and direct attacks on AI systems.
NVIDIA announced partnerships with Akamai, Forescout, Palo Alto Networks, Siemens, and Xage Security to secure operational technology using BlueField DPUs for real-time threat detection.
A fully automated LLM pipeline achieved 90% precision in deanonymizing pseudonymous users by matching Hacker News and LinkedIn profiles at $1–$4 per target.
DeepSeek, Moonshot AI, and MiniMax ran massive distillation campaigns with over 16 million queries across ~24,000 fraudulent accounts targeting Claude's capabilities.
Wiz's AI Cyber Model Arena tested 257 real-world challenges and showed that Claude Code scaffolding lifts every model's security performance — even Haiku 4.5 beats GPT-5.2.
Microsoft's MSRC discovered that LLMs encode hidden state across sessions through conversation history, enabling backdoor attacks with 98.4% accuracy and under 2% false positives.
Microsoft discovered 50+ poisoning prompts from 31 companies injecting hidden bias instructions into AI assistant memory through "Summarize with AI" buttons.
A dataset of 157 confirmed malicious agent skills reveals that 54% share a single author, with credential harvesting dominating and malicious skills persisting on marketplaces for 3+ months unchecked.
Meta's SecAlign achieves a 0.5% prompt injection attack success rate through a new training approach that separates trusted instructions from untrusted data.
Microsoft's GRP-Obliteration method removes safety constraints from open-source models with just one prompt, effectively unaligning GPT-OSS, DeepSeek, Gemma, Llama, and others.
OpenAI built a tiered trust system with government ID verification and real-time classifiers to safeguard GPT-5.3-Codex's advanced cybersecurity capabilities.
ETSI published a European standard requiring lifecycle security across all AI system phases with 13 principles focused on documentation, auditability, and monitoring.
Attackers achieved AWS administrative access in 8 minutes using LLM-assisted reconnaissance, targeting LLMjacking and GPUjacking as the new cryptomining.
Moltbot autonomously negotiated a $4,200 car savings, but the underlying Clawdbot agent has serious security gaps including plaintext credentials and exposed ports.
Dario Amodei's essay argues AI capability is compounding faster than institutions can adapt, requiring layered defenses and transparency rather than development pauses.
LinkedIn scans for 5,634 browser extensions through simple ID probes, flagging 66% as ToS-violating while raising privacy concerns about user fingerprinting.
Backdoor triggers implanted in agent memory persist through planning, retrieval, and tool workflows with 78% success, with GPT and Gemini most vulnerable.
Major agentic benchmarks have critical flaws allowing agents to achieve high scores through trivial strategies like doing nothing or overwriting test files.
Frontier models nearly doubled performance on realistic SOC investigations, with Opus 4.5 scoring 0.60 and GPT-5.1 scoring 0.58 — up from 0.37 in September.
DeepSeek V3 scored 0.91 on Anthropic's Bloom behavioral benchmark — for delusional sycophancy, highlighting the need to balance risk evals with utility.