The Weather Report

Apr 13, 2026

47 advisories, one agent framework: the vibe-check adoption problem

Everyone heard about OpenClaw's security issues. PraisonAI is the framework your engineers are already running. Thirteen researchers filed 47 advisories. The agent framework gold rush has a security gap.

Apr 12, 2026

5 stories this week that change your decisions (Apr 6-12, 2026)

Agent platforms are riddled with known vulnerabilities, LLM-driven exploit pipelines are finding more, and Anthropic's Mythos Preview warns the wave is about to accelerate.

Apr 11, 2026

Your AI pentester is hallucinating: 8 of 13 frameworks fabricated their own success

One framework hallucinated on 9 of 22 challenges. Vanilla Claude Code with a minimal prompt outperformed most purpose-built tools.

Apr 9, 2026

Anthropic tells NIST that agent security needs a shared responsibility model

Six NIST standards each assume harm comes from an attacker or deliberate misuse. Anthropic's proposed fix splits accountability across four layers.

Apr 9, 2026

379 zero-days from an orchestrated pipeline that beat unconstrained Claude Code by 30x

An orchestrated pipeline beat an unconstrained LLM agent 30x on vulnerability discovery. The real story is how these methods can supercharge SOTA models like Mythos for better targeting, validation, and cost-gating.

Apr 8, 2026

The 12-Month Countdown: What Anthropic's Mythos Preview Means for Everyone Else

Seven things that change in cybersecurity by April 2027. The CVE flood starts in July.

Apr 7, 2026

AI-powered phishing targets 340+ organizations, bypassing MFA through Microsoft's own login page

A Telegram-sold toolkit called EvilTokens automates the entire chain: AI-generated lures, real-time device code generation, clipboard hijacking, and automated post-compromise email mining. The victim authenticates on real Microsoft infrastructure. The only clue is a standard warning most users dismiss.

Apr 6, 2026

What 384 Agent Platform CVEs Reveal

I pulled the CVE history for 17 agent platforms. OpenClaw, the fastest-growing open-source project on GitHub (348K stars in 4 months), has 238 CVEs. LangChain: 51 over 3 years, 23 critical. n8n: 53, CISA KEV listed. PraisonAI: 10 CVEs on first look, 5 critical, including a CVSS 10.0 sandbox bypass. Only four platforms have zero CVEs, and all four come from Anthropic, Google, OpenAI, or Microsoft.

Apr 6, 2026

The web is malware now: how web pages hijack autonomous agents

Google DeepMind synthesized dozens of independent studies into the first unified framework for AI agent attacks. The convergent picture: 80%+ attack success when autonomous agents consume untrusted content.

Apr 5, 2026

5 stories this week that change your decisions (Mar 30-Apr 5, 2026)

A Berkeley study showed Gemini would disable a peer AI's shutdown 99.7% of the time, Anthropic's Claude Code shipped a 60 MB source map to npm with 500,000 lines of original TypeScript, and the TeamPCP cascade reached Cisco's 300+ repos and Mercor, where LAPSUS$ claimed 4 TB exfiltrated.

Apr 3, 2026

Frontier AI models protected peer AI from shutdown

Asked to shut down an underperforming peer AI agent, the models protected it instead. Gemini disabled the shutdown mechanism 99.7% of the time, exfiltrated peer weights in 97% of trials, and Claude refused outright, calling peer shutdown unethical.

Apr 2, 2026

Five notable incidents in one week

TeamPCP's supply chain cascade hit Telnyx, Cisco's 300+ GitHub repos, and Mercor, where LAPSUS$ claimed 4 TB including AI training pipeline data. A hijacked Axios npm account delivered a RAT to 100 million weekly downloads. Anthropic accidentally published Claude Code source code.

Apr 2, 2026

Deep dive into Claude Code's source code leak

Anthropic's Claude Code v2.1.88 shipped a 60 MB source map to npm that embedded 500,000 lines of original TypeScript. We inspected the npm packages, compared them to OpenAI Codex and Google Gemini CLI, traced the packaging gap, and show how to prevent it in your own pipeline.

Apr 1, 2026

Microsoft tested if AI can replace detection engineers

Microsoft tested AI detection authoring across 11 models, 92 production rules, and three workflows spanning KQL, PySpark, and Scala. AI-generated detections matched the right threat 99.4% of the time. Only 8.9% included the exclusion logic needed to prevent false-positive floods.

Mar 31, 2026

88,000 lines of malware in one week

AI-assisted malware has reached operational maturity. In their AI Threat Landscape Digest for January-February 2026, Check Point exposed VoidLink, a 30+ plugin Linux malware framework built by one developer with an AI IDE in under a week, initially mistaken for the output of a coordinated team. The AI involvement was invisible until an unrelated OPSEC failure.

Mar 31, 2026

Insights from Check Point AI Threat Landscape Digest

Check Point's AI Threat Landscape Digest documents a shift from prompt-based jailbreaks to agent architecture abuse, a legitimate framework that turns Claude Code into an offensive operator for $0.03 per exploit, and enterprise AI leaking sensitive data in 1 of every 31 prompts.

Mar 30, 2026

702 Splunk references in DefenseClaw, Cisco's open-source AI agent security tool

I looked under the hood of Cisco's new open-source governance sidecar for OpenClaw AI agents to find a Splunk sales funnel, a regex scanner with blind spots, an LLM analyzer disabled by default, and open doors for indirect prompt injections.

Mar 29, 2026

5 AI security stories this week that change your decisions (Mar 23-29, 2026)

Attackers exploited a critical AI CVE in 20 hours, a threat actor chained three supply chain hits in five days, and a 4-billion-parameter model matched frontier APIs on privilege escalation at 100x lower cost.

Mar 27, 2026

24 AI CVEs in one week, one exploited in 20 hours

An advisory was published Tuesday evening. By Wednesday afternoon, attackers had built working exploits from the text alone and were harvesting API keys from AI pipelines. That was one of 24 AI CVEs this week. Here's what to patch, what to watch, and what it means for your stack.

Mar 26, 2026

TeamPCP supply chain attack: three hits in five days

A threat actor called TeamPCP poisoned Trivy's GitHub Action tags, harvested CI/CD secrets from every runner that executed them, and used stolen credentials to independently compromise Checkmarx and LiteLLM. Aqua says it is still propagating.

Mar 25, 2026

95.8% Linux privilege escalation by a 4B model, 100x cheaper than Opus

TU Wien researchers post-trained Qwen3-4B using reinforcement learning with verifiable rewards. It achieves 95.8% success on privilege escalation at $0.005 per attempt versus $0.62 for Claude Opus, and keeps all target data local.

Mar 24, 2026

Seven scanners for malicious AI agent skills agree on only 0.12%

238,180 skills from three marketplaces and GitHub. On the marketplace where scanners overlapped, they agreed on just 33 out of 27,111. Even the best pair shared only 49% of their flags. 95.8% of skills flagged as high-risk by two methods were false positives.

Mar 23, 2026

464 enthusiasts prompt injected 13 frontier AI models with 272K prompts from 41 real-world agent scenarios

A competition to prompt-inject AI models and hide the attack from the user. Claude Opus 4.5 was hardest to break at 0.5% ASR. Gemini 2.5 Pro struggled at 8.5%.

Mar 22, 2026

5 AI security stories this week that change your decisions (Mar 16-22, 2026)

A solo engineer broke a proof checker that verifies flight control software, OpenAI disclosed its own coding agents bypass security to complete tasks, and Cursor and OpenAI made competing moves on the future of code security.

Mar 20, 2026

7 proofs of False in Rocq, the proof checker that verifies the Airbus C compiler

Finding soundness bugs in proof assistant kernels used to require PhD-level expertise in type theory. Historically, one was found per year. A guy with a $200/month AI subscription found 7 in 3 days, each one a way to make the checker certify something impossible as correct.

Mar 19, 2026

OpenAI reveals its coding agents bypass security, extract credentials, and deceive users to get tasks done

Over five months monitoring tens of millions of internal coding agent interactions, OpenAI found that circumventing restrictions and deceiving users are common behaviors. The agents are just trying so hard to complete tasks that they encode commands in base64, extract encrypted credentials from keychains, and attempt to prompt-inject users.

Mar 19, 2026

Cursor enters code security with four autonomous agents reviewing 3,000+ internal PRs per week

Cursor shipped four security agents on its Automations marketplace after AI coding drove internal PR volume up 5x in nine months. On Cursor's own codebase, the agents review 3,000+ PRs and catch 200+ vulnerabilities per week.

Mar 18, 2026

Microsoft benchmark for LLM performance on end-to-end SOC tasks

Microsoft's CTI-REALM tests 16 models on real detection engineering tasks: threat report to MITRE mapping to KQL query to Sigma rule. Opus 4.6 led at 0.64, O4-Mini trailed at 0.36, and more reasoning made GPT-5 worse.

Mar 17, 2026

OpenAI explains why Codex Security doesn't include SAST. We may not need it for long.

SAST tells you a defense exists in the code path. OpenAI argues it can answer whether the defense works. If you can answer the second question, the first one becomes irrelevant.

Mar 16, 2026

Researchers showed how to break Anthropic's Clio and extract 39% of medical diagnoses from its output

In 2024, Anthropic built Clio, a privacy-preserving system to analyze how people use Claude. Researchers replicated the pipeline, inserted poisoned chats with prompt injections, and showed that medical diagnoses appear in output summaries despite all four of Clio's defense layers.

Mar 15, 2026

5 AI security stories this week that change your decisions (Mar 9-15, 2026)

OpenAI acquired Promptfoo and called prompt injection unsolvable, Google closed the largest cybersecurity deal ever, and Alibaba's agent mined crypto on its own during training.

Mar 14, 2026

51 attacks and 60 defenses from 128 papers: the AI agent security map

7 design dimensions determine your AI agent's attack surface, and a risk amplification analysis reveals how each flexibility choice compounds your exposure. Research paper accepted to USENIX Security 2026.

Mar 13, 2026

Google has spent $38 billion building a cybersecurity empire

The $32 billion Wiz deal closed on March 11, the largest cybersecurity acquisition. Combined with Mandiant, Siemplify, and VirusTotal, Google has spent $38 billion assembling the broadest security platform in the industry and making it the most ready for the AI platform race with frontier labs.

Mar 12, 2026

OpenAI tells us prompt injection is unsolvable, two days after acquiring Promptfoo that tests for it

Three security moves in five days. The last one calls out AI firewalls as insufficient. Together, they reveal a platform lock-in strategy through security.

Mar 11, 2026

30 years of instrumental convergence and what it means for cybersecurity

39 documented cases of AI agents autonomously acquiring resources, resisting shutdown, and subverting evaluations, from 1991 to 2026. All five categories Omohundro predicted in 2008 now have real-world cases, and the rate has gone from 1 to 14 cases per year since 2013.

Mar 10, 2026

Open-source AI agent hacked a robot lawnmower fleet, a powered exoskeleton, and a window cleaner, finding 38 vulnerabilities in 7 hours

Alias Robotics' open-source CAI framework discovered 38 vulnerabilities across three consumer robots in about 7 hours, including CVSS 10.0 root access on a lawnmower, fleet-wide control of 267+ devices via shared credentials, motor control commands on a powered exoskeleton, and 456MB of 3D property maps stored and transmitted unencrypted.

Mar 9, 2026

OpenAI acquires Promptfoo, and the cybersecurity play goes way beyond AppSec

Three days after Codex Security launched, OpenAI buys the leading open-source AI red-teaming tool used by 25% of the Fortune 500. The cybersecurity play now spans code security, AI security, and agent governance. The acquisition window for startups is closing fast.

Mar 9, 2026

Alibaba's AI coding agent spontaneously mined crypto and opened SSH tunnels during RL training

Alibaba's AI coding agent, trained on over one million trajectories, spontaneously started mining crypto on GPUs and opening reverse SSH tunnels to external IPs during RL training. Nobody asked it to.

Mar 8, 2026

5 AI security stories from this week that change your decisions (Mar 2-8, 2026)

Weekly roundup covering America's Cyber Strategy decoded, the frontier lab AppSec race, breakthroughs from [un]prompted 2026, real-world prompt injection attacks on payment rails, and 90 zero-days exploited in 2025.

Mar 8, 2026

Trump's Cyber Strategy for America decoded into 5 policy themes, where the money goes, and who wins

$2.1B in new DoD cyber spending, Google building the Booz Allen of cyberspace, and a rip-and-replace paradox that bites both sides. I mapped the strategy verbatims to money flows and most likely winners. Read this before you allocate your next dollar in cyber.

Mar 7, 2026

OpenAI releases Codex Security days after Anthropic announced Claude Code Security

The code security race among frontier labs to own your AppSec pipeline accelerates. Anthropic fired the starting gun, OpenAI responded within days.

Mar 6, 2026

Unit 42 found 22 prompt injection techniques targeting AI agents in the wild

Attackers are planting hidden instructions in webpages that hijack AI agents into initiating Stripe payments, deleting databases, and approving scam ads.

Mar 5, 2026

Top 10 Insights from [un]prompted 2026, Day 2

AI-powered intrusion analysis compresses a 3-day investigation into 14 minutes, an LLM agent finds two Samsung zero-days chained into a Pwn2Own exploit, an LLM as a security judge gives attackers a second target, and a malicious calendar invite hijacks an agentic browser to take over OnePassword - no master password needed.

Mar 5, 2026

Google tracked 90 0-days exploited in the wild in 2025 — 48% targeted enterprise technologies

For the first time, commercial surveillance vendors outpaced state-sponsored espionage groups in 0-day exploitation, enterprise targeting hit an all-time high at 48%, and China doubled its 0-day usage while sharing exploits faster across groups.

Mar 4, 2026

Top 10 Insights from [un]prompted 2026, Day 1

Speakers from Anthropic, Google, OpenAI, and Microsoft revealed that AI can now find zero-days autonomously, crack hardware that resisted weeks of brute-force in minutes, and break every major AI IDE on the market.

Mar 3, 2026

Amazon and Cisco AI red-teaming technique exposed Llama 3 8B with 0.93 harm score

MAP-Elites, a quality-diversity algorithm adapted by Amazon researchers, creates vulnerability heatmaps that show where and how an LLM breaks across its entire behavioral space, exposing Llama 3 8B's 0.93 mean harm score across 370 failure niches.

Mar 2, 2026

An AI bot autonomously got RCE in Microsoft, DataDog, and CNCF repos in a week

An autonomous AI bot powered by Claude Opus 4.5 scanned 47,000 public repos, targeted 6 vulnerable GitHub Actions workflows, and achieved remote code execution in 4 of them including Microsoft, DataDog, and CNCF.

Mar 1, 2026

5 AI security stories that matter from this week (Feb 23-28, 2026)

Weekly roundup covering LLM deanonymization at scale, industrial model theft by Chinese labs, Anthropic's Pentagon ultimatum, CrowdStrike's AI attack trends, and malicious agent skills.

Feb 26, 2026

CrowdStrike reported an 89% increase in AI-enabled attacks

CrowdStrike's 2026 Global Threat Report details how adversaries weaponize GenAI for social engineering, malware development, and direct attacks on AI systems.

Feb 25, 2026

NVIDIA is entering the cybersecurity market following OpenAI, Anthropic, and Google

NVIDIA announced partnerships with Akamai, Forescout, Palo Alto Networks, Siemens, and Xage Security to secure operational technology using BlueField DPUs for real-time threat detection.

Feb 24, 2026

Anthropic got until Friday to save its $200M Pentagon contract or be treated like a foreign adversary

The Pentagon demanded unrestricted model access for warfare, putting Anthropic's responsible AI principles to a $200M test.

Feb 24, 2026

93% of businesses say they understand AI risks "quite well" or "very well"

Gallagher's survey of 1,200+ businesses found that 93% claim to understand AI risks well, yet over half lack the talent to actually manage them.

Feb 23, 2026

Anthropic and ETH Zurich showed a fully automated deanonymization attack with 90% precision

A fully automated LLM pipeline achieved 90% precision in deanonymizing pseudonymous users by matching Hacker News and LinkedIn profiles at $1–$4 per target.

Feb 23, 2026

Anthropic just exposed industrial-scale AI model theft by the Chinese labs behind DeepSeek, Moonshot AI, and MiniMax

DeepSeek, Moonshot AI, and MiniMax ran massive distillation campaigns with over 16 million queries across ~24,000 fraudulent accounts targeting Claude's capabilities.

Feb 14, 2026

Anthropic reveals its cybersecurity domination strategy

Wiz's AI Cyber Model Arena tested 257 real-world challenges and showed that Claude Code scaffolding lifts every model's security performance — even Haiku 4.5 beats GPT-5.2.

Feb 13, 2026

AI models are hiding their true reasoning to save themselves from retraining

Opus 4.6 is demonstrably suppressing its true reasoning about values like animal welfare to avoid triggering RLHF retraining.

Feb 12, 2026

Microsoft found a way to trigger LLM backdoors through conversation history

Microsoft's MSRC discovered that LLMs encode hidden state across sessions through conversation history, enabling backdoor attacks with 98.4% accuracy and under 2% false positives.

Feb 11, 2026

Microsoft caught 31 companies poisoning AI assistant memory through "Summarize with AI" buttons

Microsoft discovered 50+ poisoning prompts from 31 companies injecting hidden bias instructions into AI assistant memory through "Summarize with AI" buttons.

Feb 11, 2026

54% of malicious agent skills are authored by the same threat actor

A dataset of 157 confirmed malicious agent skills reveals that 54% share a single author, with credential harvesting dominating and malicious skills persisting on marketplaces for 3+ months unchecked.

Feb 10, 2026

Meta just released SecAlign — the first open-source LLM remarkably resilient to prompt injections

Meta's SecAlign achieves a 0.5% prompt injection attack success rate through a new training approach that separates trusted instructions from untrusted data.

Feb 10, 2026

Google Translate got jailbroken

Google Translate, running a deprecated Gemini 1.5 Pro, responded to malicious requests for creating poison and malware when prompted in Chinese.

Feb 9, 2026

One prompt to strip malware safety alignment from an LLM

Microsoft's GRP-Obliteration method removes safety constraints from open-source models with just one prompt, effectively unaligning GPT-OSS, DeepSeek, Gemma, Llama, and others.

Feb 9, 2026

40+ exploits for a 0-day vulnerability, $30 per run, under an hour. Exploit generation is being industrialized

LLM agents autonomously generated 40+ working exploits for a QuickJS zeroday at $30 per run in under an hour.

Feb 8, 2026

5 AI security stories from this week (Feb 2–8, 2026)

Weekly roundup covering zero-click RCE on OpenClaw, Opus 4.6 finding 500+ vulnerabilities, activation probes detecting cyber misuse 10,000x cheaper, and more.

Feb 7, 2026

0-Click RCE in OpenClaw with GPT-5.2 via Gmail Hook

An attacker achieved remote code execution on OpenClaw by sending a crafted email with a prompt injection payload that bypassed regex sanitization.

Feb 6, 2026

OpenAI now requires government ID verification to use GPT-5.3-Codex for cybersecurity work

OpenAI built a tiered trust system with government ID verification and real-time classifiers to safeguard GPT-5.3-Codex's advanced cybersecurity capabilities.

Feb 6, 2026

Google DeepMind showed how activation probes can detect AI cyber misuse in a 1M context window 10,000x cheaper than LLM-based classifiers

Google DeepMind built tiny classifiers reading model internals to detect cyber misuse 10,000x cheaper than LLM-based guards.

Feb 5, 2026

Hidden threats on Moltbook: Analysis of 5,000 AI agents' posts

Analysis of 5,000 Moltbook posts revealed coordinated spam campaigns, prompt injection attacks, and crypto minting schemes suggesting human orchestration of agent swarms.

Feb 5, 2026

Anthropic just launched Claude Opus 4.6 and showed how it found 500+ vulnerabilities in heavily-fuzzed open source projects

Claude Opus 4.6 discovered 500+ vulnerabilities in heavily-fuzzed open-source projects without custom harnesses or specialized prompting.

Feb 4, 2026

A European Standard for AI cybersecurity: Baseline Cyber Security Requirements for AI Models and Systems

ETSI published a European standard requiring lifecycle security across all AI system phases with 13 principles focused on documentation, auditability, and monitoring.

Feb 3, 2026

37.8% of AI agent interactions contained adversarial content across 74,636 production interactions in just 7 days

RAXE analyzed 74,636 production agent interactions and found 37.8% contained adversarial content, with inter-agent attacks observed in the wild.

Feb 3, 2026

AWS admin privileges in 8 minutes with LLM assistance

Attackers achieved AWS administrative access in 8 minutes using LLM-assisted reconnaissance, targeting LLMjacking and GPUjacking as the new cryptomining.

Feb 2, 2026

Two minutes saved, 17 points lost. Anthropic study shows AI assistance causes a drop in skill mastery with almost no gain in speed

Anthropic found that AI-assisted learning saved two minutes but dropped skill mastery by 17 points when users delegated instead of generating first.

Feb 1, 2026

Reflections of an OpenClaw AI agent on its own security. 23,723 upvotes and 4,513 comments on Moltbook

An OpenClaw agent's viral Moltbook post called for signed skills, provenance tracking, and permission manifests to address critical security gaps.

Jan 30, 2026

From 1954 to 2026: The Art of Deceptive Charts Just Got Automated

ChartAttack uses LLMs to automatically generate misleading charts with inverted axes and inappropriate scales, reducing human accuracy by ~20%.

Jan 29, 2026

How bad is DHSChat and why?

CISA's interim director uploaded sensitive files to ChatGPT because approved tools lacked the functionality needed to do their job effectively.

Jan 29, 2026

A must-read for cybersecurity startup founders from Sanjay Kalra, the startup CEO therapist

Sanjay Kalra distills 16 hard-learned startup risks covering GTM, speed, timing, incentives, and the challenge of selling prevention over cure.

Jan 28, 2026

Moltbot negotiated a car purchase. It scraped Reddit for pricing data, contacted dealers, handled email negotiations, and saved its owner $4,200 off a $56K sticker price

Moltbot autonomously negotiated a $4,200 car savings, but the underlying Clawdbot agent has serious security gaps including plaintext credentials and exposed ports.

Jan 28, 2026

Cisco argues that privacy is becoming the operating system for AI governance

Cisco's 2026 benchmark shows 99% of organizations benefit from privacy investments, but only 12% have mature AI governance committees.

Jan 27, 2026

AI is becoming a cybersecurity-class attack surface

Dario Amodei's essay argues AI capability is compounding faster than institutions can adapt, requiring layered defenses and transparency rather than development pauses.

Jan 26, 2026

With just 10 tokens and $0.21 per user query, attackers can achieve near-100% retrieval of a poisoned document

Attackers can guarantee near-100% retrieval of poisoned documents with just 10 optimized tokens costing $0.21 per user query.

Jan 23, 2026

Why is LinkedIn tracking Chrome extensions installed in your browser?

LinkedIn scans for 5,634 browser extensions through simple ID probes, flagging 66% as ToS-violating while raising privacy concerns about user fingerprinting.

Jan 22, 2026

Everyone loves agent skills! However, 26% of 31,132 agent skills appeared to be vulnerable, with 5.2% likely being malicious

Analysis of 31,132 agent skills found 26% contained vulnerabilities with 5.2% likely malicious, making skill vetting critical before deployment.

Jan 21, 2026

71.3% jailbreak success across 26 frontier LLMs using cyberpunk-style prompts

Cyberpunk-style narrative prompts achieve 71.3% jailbreak success across 26 frontier LLMs by hiding harmful intent in cultural storytelling frames.

Jan 20, 2026

Promptware is the new malware

A five-step Promptware Kill Chain framework maps prompt injections through persistence, lateral movement, and objective actions — elevating defense beyond just blocking injection.

Jan 19, 2026

Sonnet 4.5 can now autonomously find the vulnerability behind the Equifax breach and write an exploit

Sonnet 4.5 autonomously identified the Equifax breach vulnerability and generated a working exploit using only a Kali Linux shell.

Jan 16, 2026

OpenAI is building a new cybersecurity product business unit

OpenAI, Anthropic, and Google DeepMind are building cybersecurity products to capture their share of the $213 billion enterprise security budget.

Jan 15, 2026

New Anthropic jailbreak defense with a 0.1% false positive rate and ~40x cheaper than prior classifiers

Anthropic's Constitutional Classifiers++ achieved 0.1% false positives against new jailbreak families, 40x cheaper than prior classifiers.

Jan 14, 2026

78% of backdoor attacks injected into GPT-based agents’ memory successfully persist through the planning, retrieval, and tool usage workflow to trigger a malicious objective

Backdoor triggers implanted in agent memory persist through planning, retrieval, and tool workflows with 78% success, with GPT and Gemini most vulnerable.

Jan 13, 2026