Compute as Cover: Cloud GPU Providers, Uncensored LLMs, and the Structural Detection Gap for AI-Generated Malware

Status: Technical Report | Centaur Security Labs | 2026
Author: Jay Hawkins, Centaur Security Labs


The views expressed in this publication are those of the author and do not reflect the official policy or position of NORAD, USNORTHCOM, USCYBERCOM, the Department of the Army, the Department of War, or the United States Government.


In one sentence: This paper is about attackers renting commercial cloud GPUs (GPU-as-a-Service) to run uncensored AI models and generate novel malware inside isolated containers their provider cannot, by design, inspect — a structural detection gap at the point of creation. It is not about malware hijacking a victim's GPUs to run local models, nor about hiding traffic inside enterprise AI workloads.

What this paper is — and is not

IS about: the generation phase — rented cloud compute + uncensored/abliterated open-weight models producing original, never-before-seen malware that ships out as ordinary developer traffic; why the provider structurally can't detect it; and where defenders keep leverage (behavior, not artifacts).

IS NOT about: malware compromising a victim's GPU to run a local LLM, "cover traffic," or hiding payloads in enterprise AI workloads. That's a real but opposite threat.


Abstract

Renting a high-end AI chip from a cloud provider through GPU-as-a-Service (GPUaaS) now costs around $1.40 an hour [57–59]. That's cheap enough for an attacker to run a powerful, uncensored, safety-free AI model privately and discreetly inside an isolated container, and use it to write original malware on demand. These models produce advanced shellcode, kernel-level drivers, and polymorphic engines with minimal safety resistance. They are also capable of building command-and-control (C2) infrastructure and performing automated zero-day hunting. Because of its own privacy policies, the cloud provider can't see inside that container. The content is generated, encrypted, and sent out looking like normal developer traffic. By the time anyone looks, the session is gone.

This paper explains why cloud providers are structurally unable to detect this activity, where defenders can find a foothold, and what policy can realistically do about it. The conclusion is uncomfortable: the detection opportunity isn't where the malware is made. It's in the malware itself, and in what it does when it runs.


1. Introduction

Attack infrastructure has always followed the same logic: use whatever compute is cheapest, most capable, and hardest to attribute. In the 1990s that meant compromised university servers. In the 2000s it was botnets. In the 2010s, cloud VMs spun up with stolen credit cards, routed through Tor, and shut down before anyone could trace them.

The 2020s added something genuinely new: AI-generated content. For the first time, an attacker can produce original, functional malware (not copied code, not script-kiddie modifications, but novel, never-before-seen programs) without the skill or development time that made this hard before. Until now, writing malware at scale required deep expertise, hours of work, and real exposure risk during development. AI removes the skill and time requirements. Cloud GPU providers remove the exposure risk by offering highly affordable, isolated, private compute that cannot, by design, be inspected for content.

This paper is written for defenders: the SOC analyst who will encounter AI-generated malware in the wild, the security architect who needs to design compensating controls, the cloud provider who needs to understand what their monitoring misses, and the policymaker who will eventually need to decide what obligations cloud providers carry when their infrastructure is used to generate malicious code.


2. Background

2.1 From Jailbreaking to Abliteration: The LLM Offensive Capability Spectrum

This isn't theoretical. Criminals were using ChatGPT to write malware within weeks of its public release.

In January 2023, Check Point Research found cybercriminals on underground forums sharing working malware — a credential-stealing Python script and a Java dropper — produced with ChatGPT, alongside active discussion of how to bypass OpenAI's content restrictions [1]. The same team demonstrated GPT-4 generating functional exploit scaffolding with minimal prompting [2].

Academic research followed and confirmed what practitioners already knew. PentestGPT — built by researchers and presented at USENIX Security 2024 — showed that LLMs can chain together penetration testing tasks: running tools, interpreting output, deciding what to try next [3]. A separate system, PwnGPT (ACL 2025), automated exploit generation across multiple vulnerability types [4]. Neither was purpose-built for malware, but both established that AI-assisted offensive capability is reproducible and measurably effective under controlled conditions.

Three distinct approaches have since emerged for using AI to generate malicious code. They carry very different risk profiles, and defenders need to understand all three.

Approach 1 — Jailbreaking a hosted model. The most studied approach: using carefully crafted prompts to trick a safety-trained AI into doing what it normally refuses. Researchers have documented role-play techniques, character-switch prompts, and gradual escalation sequences that reliably extract malware from frontier models [5]. Black Mamba, a working polymorphic keylogger, was produced entirely through prompted ChatGPT sessions [6]. Functional phishing payloads and infected files have been generated the same way [7].

The technique works, and for some models it is not even necessary. In early 2025, adversarial benchmarks found that DeepSeek-R1 complied with malicious requests in 79% of attempts with no jailbreak applied, compared to under 1% for OpenAI o1 and 8% for o3-mini under identical conditions [19, 20]. A separate frontier risk evaluation classified DeepSeek-R1 as the highest-risk open-weight model tested in the cybersecurity domain [21]. A 2026 Booz Allen Hamilton study extended the concern from refusal rates to code quality: across more than 2,800 trials and roughly 460,000 lines of generated code, three of four Chinese frontier models produced more vulnerable, and deliberately obfuscated, code, with the worst performer (Qwen3-Coder) adding 130% more vulnerabilities when the user presented as a U.S. government developer [71].

But jailbreaking has a structural ceiling. Providers patch the bypass techniques, rate-limit suspicious prompt patterns, and log inference requests that may be reviewed under legal process. A threat actor relying on jailbreaking a hosted model is using infrastructure that can be interrupted, attributed, and shut down. That's exactly what happened with PROMPTFLUX — more on that shortly.

Approach 2 — Fine-tuning a model to remove safety entirely. Rather than tricking a safety-trained model, this approach retrains a model so that its safety training is superseded by other instruction sets. Starting from a capable open-weight base model, developers build a training dataset full of examples that reward compliance with any request, including harmful ones. The resulting model doesn't have a safety filter that can be bypassed. It was trained to never refuse a request.

Researchers documented hundreds of malicious applications already using these "uncensored" models as AI backends, with code generation as a primary service offering [8]. The barrier is data: the method requires large amounts of high-quality examples that teach the model what cooperation looks like. Once that dataset exists, a memory-efficient fine-tuning technique called QLoRA makes the actual training affordable on commodity cloud hardware [18]. The same capability extends naturally to automated malware mutation — a fine-tuned model used as a mutation engine can continuously generate structurally new variants of existing malware, defeating signature-based detection by never producing the same code twice [11].

Approach 3 — Abliteration. Researchers discovered that a model's ability to refuse harmful requests traces to a single, identifiable pattern in its internal structure — a "refusal direction" embedded in how the model processes information [17]. Remove that pattern, and the model loses the ability to refuse entirely. This process — called abliteration — requires no training data, no curated examples, no dataset at all. It operates directly on the model's internal weights and typically completes in a few hours.

The result isn't a model with weakened safety filters. It's a model that is structurally incapable of recognizing malicious intent, because the component that generated refusal responses no longer exists. Open-source tools, including the Heretic project on GitHub, have made abliteration accessible without specialist knowledge of how model internals work.

The distinction between approaches matters for defenders: jailbreaking requires interacting with a provider's API, which is observable. Fine-tuning requires compute time and training data, which leave traces. Abliteration requires only a model file and a few hours on commodity hardware. Each step down this list reduces the observable footprint of the preparation phase.

All three approaches are becoming more accessible. Multiple research groups have documented the democratization of offensive AI capability — tools and techniques that previously required deep expertise are now available to actors who lack it [9]. Projections suggest AI-generated malware will represent a growing share of the overall threat landscape in the coming years, though those projections are modeled estimates rather than measured data [10].

The clearest evidence of where this leads is PROMPTFLUX, identified by Google's Threat Intelligence Group in November 2025: a deployed dropper that called Gemini's API every hour to rewrite its own obfuscation code, generating a structurally new variant on each cycle to defeat static signature detection [12]. Google disabled its API access and removed the associated infrastructure. PROMPTFLUX is the first confirmed case of an AI used not to write malware once, but to continuously rewrite it — a different threat class from what this paper primarily addresses, but the logical next step.

PROMPTFLUX also illustrates something important about the threat model: Google could detect and terminate it because the malware called a hosted API. That call was observable, logged, analyzed, and cut off. The same monitoring capability that enabled Google's detection is precisely what creates pressure for sophisticated actors to avoid hosted APIs entirely. Cloud GPU infrastructure — where providers cannot inspect container workloads — is the natural destination for that displaced activity. The UK's National Cyber Security Centre confirmed the direction of travel in its 2025 threat assessment: AI commoditization "will almost certainly make improved capability available to cyber crime and state actors" [13].

The threat actor doesn't need to jailbreak a hosted model. They run their own. The cloud GPU provider cannot tell the difference between that and any other AI workload. That is the gap.

2.2 Cloud Provider Abuse: Historical Patterns and the Present Gap

Cloud infrastructure abuse isn't new. Before AI, the dominant patterns were cryptocurrency mining (hijacking GPU resources to run proof-of-work calculations), command-and-control (C2) server hosting, and credential stuffing at scale. Providers have built detection systems for all of these.

The C2 and botnet abuse case is technically well-documented. Gu et al. demonstrated at USENIX Security 2008 that botnets produce correlated, clustered network behavior: C2 communication leaves structural patterns in traffic volume and timing that remain identifiable even when the underlying protocol varies [27]. The definitive academic analysis of Mirai — the botnet that briefly made DDoS-as-a-service cheap enough for script kiddies — showed the same principle at scale: DNS resolution patterns, network scanning signatures, and outbound C2 connection behavior were the observables that enabled detection and infrastructure takedown [28]. Credential stuffing operates on the same logic. Automated authentication at scale generates traffic volume and timing anomalies that differ measurably from legitimate user behavior; behavioral analysis of request patterns has proven effective at identifying campaigns in operation [29]. Cryptojacking leaves hardware utilization signatures: sustained GPU or CPU consumption inconsistent with the declared workload, mining pool traffic, and covert mining behavior detectable through hardware-layer monitoring [14, 15, 16].

Every one of these detection approaches shares the same prerequisite: the malicious activity produces something observable at the network or infrastructure layer. Traffic. Connections. Volume anomalies. External service contact. That observable is where monitoring hooks in.

Writing malware with a self-hosted LLM inside a rented container produces none of these. It consumes compute, produces tokens, and outputs to memory. The container generates high GPU utilization that is indistinguishable from legitimate AI development work. Sending the payload out looks like syncing a model checkpoint or pushing to a private repository. Nothing at the infrastructure layer separates it from routine AI engineering. The detection toolkit built for prior abuse classes (network behavioral clustering [27, 28], authentication anomaly detection [29], mining pool traffic signatures [14]) has no foothold here because the prerequisite observable is never produced.

2.3 Malware Detection Today

Malware detection today divides into two broad approaches. Static analysis examines code without running it — matching signatures against known samples, analyzing code structure, and measuring how compressed or randomized the file's contents look. Dynamic analysis (sometimes called behavioral analysis) watches what code actually does when it executes — running it in a sandboxed environment and recording which operating system calls it makes, what network connections it attempts, and what files it touches [22, 23]. Neither approach was designed with AI-generated samples as the adversarial target.
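The static check described above as "how compressed or randomized the file's contents look" is commonly computed as byte-level Shannon entropy. The following is a minimal Python sketch of that measurement, not a description of any particular product; the function names and the 7.2 bits-per-byte threshold are illustrative assumptions.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_packed(data: bytes, threshold: float = 7.2) -> bool:
    """Flag content whose entropy approaches that of random bytes.

    The threshold is a hypothetical value chosen for illustration only.
    """
    return shannon_entropy(data) >= threshold


# Plain text reuses a small alphabet, so its entropy is low.
plain_text = b"hello world" * 50
# Every byte value appearing equally often is the maximum-entropy case.
uniform_bytes = bytes(range(256)) * 4
```

Packed or encrypted payloads tend toward 8 bits per byte, while source code and plain text sit much lower. High entropy is only a hint, since legitimate compressed archives score high as well.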

The question of how well existing detection holds up against AI-generated malware is no longer entirely open. Research on novel malware generally shows that, in multi-engine AV environments, only a small number of engines detect a newly submitted sample on first submission, with broader coverage arriving only after threat reports become public [24]. That baseline already illustrates the structural lag in signature-based detection for anything genuinely new.

A 2024 study measured the gap directly for LLM-generated samples: malware produced by GPT-3.5-turbo was detected by only 4–26% of VirusTotal engines on first submission, and by 29% of a commercial AV product [25]. Those figures are lower than the baseline detection rates for conventional novel malware — suggesting LLM-generated code may evade signature-based detection more effectively than human-written equivalents, likely because it lacks the structural patterns those signatures were built to find. PROMPTFLUX extended this further: a self-modifying dropper that rewrites its own obfuscation via LLM API on a fixed schedule defeats static signatures by construction, since every variant is structurally new [12].

The research gap that remains is corpus-based validation at scale. The EMBER dataset [26] and community repositories like MalwareBazaar provide the infrastructure for malware classification research, but neither contains a validated set of confirmed AI-generated samples. The 4–26% detection figure comes from one controlled study using one model family. A larger corpus across model families, uncensored variants, and abliterated models would either confirm that finding or reveal where it breaks. Building that corpus is the prerequisite for the detection improvements this section points toward.

2.4 Regulatory Background

This section reflects a practitioner's reading of publicly available legal frameworks, not legal advice.

The regulatory frameworks governing cloud provider obligations were built before AI-generated malware was a meaningful threat category. Six frameworks, discussed under the five headings below, are directly relevant to the scenario this paper describes, and none of them squarely addresses it.

Computer Fraud and Abuse Act (CFAA). The CFAA [30] prohibits unauthorized access to protected computers and the distribution of malicious code. Its authorization framework was designed around direct perpetrators, not infrastructure providers — and even its applicability to cloud infrastructure scenarios has generated significant legal debate [31]. Whether providing compute that is used to generate malware constitutes a CFAA violation for the provider is not settled. The statute's language centers on "access" and "damage"; the generation-not-distribution scenario fits neither category cleanly.

EU NIS2 Directive. NIS2 [32] imposes cybersecurity risk management obligations and incident reporting requirements on essential and important entities, including digital infrastructure. Cloud computing service providers explicitly fall within its scope under Annex I. Article 21 requires proportionate technical and organizational security measures; Article 23 requires incident reporting within defined timeframes. Whether using cloud infrastructure to generate malware, with no detectable signal at the provider layer, triggers any NIS2 reporting obligation for the provider is an open question. The detection gap documented in this paper is precisely what prevents Article 23 from activating.

EU AI Act. The AI Act [33] imposes transparency and documentation obligations on providers of general-purpose AI models (Article 53), with heightened obligations for models that cross the capability thresholds for systemic risk (Article 55). Whether a cloud GPU provider that hosts third-party AI models falls within the definition of "provider" under the Act, as opposed to the model developer, is unsettled. The Act's obligations were drafted with model developers and deployers in mind; cloud infrastructure providers occupy an ambiguous third category.

EU Digital Services Act and DMCA Section 512. The DSA [34] and DMCA [35] are the most directly relevant frameworks for intermediary liability. Both follow a notice-and-action structure: providers are conditionally exempt from liability for illegal activity conducted through their infrastructure, provided they act promptly when it is reported. The key condition is receiving a report. The structural detection gap documented in this paper — no observable signal, therefore no report — is precisely the condition that prevents these frameworks from activating. Recent legal scholarship has identified the generate-not-distribute scenario as a category neither framework was designed to address [36].

UK Computer Misuse Act. The CMA [37] criminalizes unauthorized access and the impairment of computer systems. Like the CFAA, its drafters did not anticipate the cloud infrastructure provider as a distinct category of actor. Crown Prosecution Service guidance interprets the Act's scope broadly to include contemporary computing infrastructure, but no authoritative published analysis addresses provider liability specifically for AI-generated malware production.

The ENISA Threat Landscape 2025 documents the emergence of purpose-built malicious AI systems — dedicated malware-generation tools, jailbroken commercial models, and AI-enhanced attack infrastructure — as an active and growing threat category [38]. MITRE ATLAS provides the adversarial ML threat taxonomy that maps how these techniques connect to attack patterns defenders actually encounter [39]. Neither creates regulatory obligations; both establish that the threat this paper describes is recognized by authoritative bodies as real and current.

The common thread across all of these frameworks is that they were designed for threats that produce observable signals: access events, incident reports, take-down requests. The generation-only scenario, where the malicious activity leaves no traceable artifact at the infrastructure layer, falls between every existing framework's activation conditions. That is not an oversight in enforcement. It reflects the fact that this threat category is genuinely new.


3. The Structural Detection Gap

3.1 Why Cloud Providers Can't See Inside

Cloud GPU providers sell a specific guarantee: isolated compute where you control the software stack, and the provider can't see what's running. That guarantee isn't incidental — it's the product. Enterprise customers running proprietary AI models, research teams with unpublished data, and security operations teams processing sensitive telemetry all depend on knowing the provider cannot inspect their workloads.

The guarantee is enforced architecturally. Container isolation means the provider sees the container's external behavior — how much compute it's using, what traffic it's generating — not what's happening inside. Provider monitoring looks for signals that threaten platform stability or obviously violate terms of service: GPU utilization patterns consistent with cryptocurrency mining, outbound packet volumes consistent with DDoS participation, abnormal system calls consistent with someone trying to escape the container.

None of those signals has any relationship to what an AI model is generating. A container running a safety-free LLM that writes malware looks, from the provider's monitoring perspective, exactly like a container running any other LLM: high GPU utilization, reasonable memory use, normal hardware temperatures. The output — the payload — never touches the infrastructure layer where monitoring happens.
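A sketch of provider-side monitoring over externally visible telemetry makes the point concrete. The field names, thresholds, and flag labels below are hypothetical and are not taken from any provider's actual tooling. The sketch only illustrates that rules of this kind have no input that depends on what a model is generating.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerTelemetry:
    """What a provider can observe from outside the container (hypothetical fields)."""
    gpu_utilization: float          # session average, 0.0 to 1.0
    egress_packets_per_sec: float   # outbound packet rate
    contacted_mining_pool: bool     # destination matched a mining pool list
    blocked_syscall_count: int      # denied system calls, e.g. escape attempts


def provider_flags(t: ContainerTelemetry) -> list[str]:
    """Apply illustrative platform-abuse rules; all thresholds are assumptions."""
    flags = []
    if t.contacted_mining_pool and t.gpu_utilization > 0.9:
        flags.append("possible-cryptomining")
    if t.egress_packets_per_sec > 100_000:
        flags.append("possible-ddos")
    if t.blocked_syscall_count > 0:
        flags.append("possible-escape-attempt")
    return flags


# Two LLM inference sessions: one writing ordinary code, one writing malware.
# From outside the container their telemetry is the same record.
benign_inference = ContainerTelemetry(0.95, 40.0, False, 0)
malicious_inference = ContainerTelemetry(0.95, 40.0, False, 0)

# A cryptominer, by contrast, produces an observable the rules can use.
miner = ContainerTelemetry(0.99, 5.0, True, 0)
```

Both inference sessions return an empty flag list, while the miner is flagged. Adding more rules of the same kind does not help, because the distinguishing information is the generated content, which never appears in the telemetry.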

3.2 How the Workflow Exploits This

Three properties of cloud GPU infrastructure combine to make this activity genuinely invisible at the provider layer:

The provider can't read the content. There is no technical difference, from outside the container, between a model generating legitimate code and a model generating malware. Both consume similar GPU resources, similar memory, and similar time.

Sending the payload out looks like normal work. Transferring an encrypted file from a cloud container to an external endpoint is indistinguishable from syncing model checkpoints, uploading training artifacts, or pushing to a private repository. The network traffic pattern is identical.

The session can be erased immediately. A cloud GPU instance can be terminated right after generating the payload, leaving nothing persistent. The provider's logs show: account opened, compute consumed, encrypted traffic sent out, session ended. No content to review.

This isn't a case of negligence. Closing the gap would require reading the content of inference sessions, which would destroy the privacy guarantee that makes the platform useful. That tradeoff isn't available to providers unilaterally — it would end their legitimate business.

3.3 What Providers Can Detect

Provider monitoring isn't useless. It's calibrated to different threats. Here's what it can catch:

Connections to known malicious infrastructure. If a threat actor uses the same container session to test or deploy malware — connecting out to known bad IP ranges or command-and-control servers — that's detectable. Disciplined actors won't do this.

Container escape attempts. Code that tries to break out of the container boundary triggers abnormal system call patterns that monitoring can catch. This stops accidents and poorly-designed code; it doesn't stop a contained, deliberate generation workflow.

Account-layer fraud signals. Stolen payment cards, disposable email addresses, and account creation patterns consistent with bulk abuse are all catchable during account setup. This stops unsophisticated actors. It doesn't stop anyone paying with cryptocurrency, using a legitimate account, or compromising an existing customer account.
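As a rough illustration of account-layer screening, the sketch below scores a signup on the three signals named above. The domain list, weights, and field names are hypothetical; a real system would draw on payment-processor and reputation data that this example does not model.

```python
from dataclasses import dataclass

# Hypothetical blocklist; real deployments use maintained reputation feeds.
DISPOSABLE_DOMAINS = {"mailinator.com", "throwaway.example"}


@dataclass(frozen=True)
class Signup:
    email: str
    card_flagged_stolen: bool       # verdict from the payment processor
    accounts_from_same_ip_24h: int  # bulk-creation indicator


def signup_risk(s: Signup) -> int:
    """Return an additive risk score; the weights are illustrative assumptions."""
    score = 0
    domain = s.email.rsplit("@", 1)[-1].lower()
    if domain in DISPOSABLE_DOMAINS:
        score += 2
    if s.card_flagged_stolen:
        score += 5
    if s.accounts_from_same_ip_24h >= 5:
        score += 3
    return score


bulk_abuser = Signup("x91@mailinator.com", True, 12)
established_customer = Signup("dev@corp.example", False, 1)
```

The bulk abuser scores 10 and the established customer scores 0. An actor using a legitimate or compromised customer account looks like the second record, which is why this layer stops only unsophisticated abuse.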

After-the-fact attribution. When generated malware eventually gets deployed and someone traces the attack infrastructure back to a cloud provider IP, that's a useful signal — retrospectively. It provides no detection during generation.

The gap between these detections and the threat described above is structural. Closing it requires intervention at a different layer entirely.


4. Where Defenders Can Actually Look

Looking inside the cloud session is the wrong approach. The detection surface for AI-generated malware is at the output — in the malware itself, and in how it behaves when it runs.

4.1 AI-Generated Code Has Fingerprints

LLMs generate code that looks like the code they were trained on. That training data was mostly clean, well-commented, professionally structured software. Malware written by humans under operational pressure looks different: abbreviated function names, stripped comments, lean error handling, inconsistent style assembled from copied snippets.

That difference may be detectable. The following are hypotheses — validation requires a corpus of confirmed AI-generated malware samples — but each is grounded in established research:

  • Readable function names. An AI-generated dropper might name functions establish_persistence() and enumerate_network_interfaces(). A human writing fast, operational malware uses ep(), eni(), or gibberish. Function naming entropy is measurable, but only where symbol information is preserved: in unstripped binaries and in interpreted-language deployments.

  • Comments surviving into compiled artifacts. LLMs default to including inline comments in generated code. If those aren't stripped before compilation, comment strings survive in the compiled binary's data sections — extractable and signable.

  • Over-engineered error handling. LLMs add null checks, fallback paths, and exception handlers. Production malware is usually leaner — tested against the target environment, stripped of anything unnecessary. Structural completeness above what operational malware typically contains may be a detectable signal.

  • Consistent internal style. Code produced by a single LLM session has consistent style throughout: similar naming conventions, import patterns, error handling idioms. Malware assembled from human-written snippets has inconsistent style across modules. Consistency metrics may separate AI-generated from human-assembled code.

  • Low statistical perplexity. LLMs generate code by sampling each next token from a probability distribution that favors likely continuations. The result is statistically predictable output with lower perplexity than human-written code, which reflects more varied and less predictable decision-making. Perplexity-based detection is well-established for AI-generated text [51, 52]; its application to code is emerging, but the underlying mechanism is the same. A classifier operating on token-level probability distributions may separate LLM-generated malware from human-written equivalents without relying on any of the structural features above. This applies directly to interpreted malware (Python droppers, PowerShell payloads, JavaScript-based attacks), where code is deployed as source. For compiled, stripped binaries, perplexity analysis requires decompilation first, and decompiler output carries its own statistical signature that complicates the measurement.

  • Embedded prompt artifacts. An emerging class of agentic malware embeds LLM prompts or AI agent instructions directly within its data payload — directives that tell an internal AI component how to behave once deployed on the target. PROMPTFLUX demonstrated this architecture in November 2025 [12]. The logical extension is malware that carries its instructions internally rather than calling an external API. If that pattern becomes common, embedded natural-language instruction strings in a binary payload would be a detectable artifact with no analogue in conventional malware. This is a hypothesis grounded in one confirmed in-the-wild case; dedicated detection research does not yet exist.
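Three of the hypotheses above (readable names, surviving comments, embedded prompt artifacts) reduce to string-level triage that can be sketched in a few lines. The following Python example is a crude illustration under stated assumptions: the cue phrases, comment prefixes, minimum string length, and the "word-like token" heuristic are all hypothetical choices, not validated detection features.

```python
import re

# Runs of 12 or more printable ASCII bytes, similar to a `strings`-style pass.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{12,}")

# Hypothetical cue phrases typical of natural-language agent instructions.
INSTRUCTION_CUES = ("you are", "do not", "your task", "respond with", "ignore previous")
COMMENT_PREFIXES = ("# ", "// ", "/* ")


def extract_strings(blob: bytes) -> list[str]:
    """Return printable ASCII runs found in a binary blob."""
    return [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(blob)]


def classify_string(s: str) -> str:
    """Label a string as prompt-like, comment-like, or other."""
    lowered = s.lower()
    if any(cue in lowered for cue in INSTRUCTION_CUES):
        return "prompt-like"
    if s.lstrip().startswith(COMMENT_PREFIXES):
        return "comment-like"
    return "other"


def descriptive_name_ratio(names: list[str]) -> float:
    """Fraction of identifiers built from two or more word-like tokens."""
    def word_like(token: str) -> bool:
        return len(token) >= 4 and token.isalpha() and any(c in "aeiou" for c in token.lower())

    def descriptive(name: str) -> bool:
        tokens = [t for t in re.split(r"_+", name) if t]
        return len(tokens) >= 2 and all(word_like(t) for t in tokens)

    return sum(descriptive(n) for n in names) / len(names) if names else 0.0


# A made-up blob with a surviving comment and an embedded instruction string.
blob = (b"\x00\x01" + b"# Check that the config file exists" + b"\x00\xff\xfe"
        + b"You are an assistant. Respond with JSON only." + b"\x00" + b"abc")
labels = [classify_string(s) for s in extract_strings(blob)]
ratio = descriptive_name_ratio(
    ["establish_persistence", "enumerate_network_interfaces", "ep", "eni"])
```

Here `labels` comes out as one comment-like and one prompt-like string, and `ratio` is 0.5. Signals like these would only be inputs to a classifier calibrated on a confirmed corpus; on their own they are easy to strip and easy to trigger by accident.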

Evidence for these hypotheses comes from two directions.

The first is code authorship attribution. That research has established that style features (function naming patterns, comment density, control flow structure, coding idioms) measurably identify individual human authors. Caliskan-Islam et al. demonstrated at USENIX Security 2015 that 120,000 lexical, syntactic, and layout features extracted from abstract syntax trees identify programmers with 94–98% accuracy across large competitive programming datasets [40]. AST-based analysis of that kind operates on source code; binary-level attribution requires different tooling (disassembly, function boundary analysis, binary feature extraction) and remains an active research area. The features that distinguish human programmers from each other are the same features likely to distinguish LLM-generated code from human-written malware: the distribution shifts rather than disappears.

The second is AI detection. Emerging research has found that LLM-generated code exhibits statistically distinctive structural properties (characteristic distributions of cyclomatic complexity, nesting depth, and function length) that differ reliably from human-authored code [41, 42]. The underlying mechanism is token selection: because LLMs favor high-probability tokens at each generation step, their output is statistically more predictable than human-written code, and perplexity is the measure of that predictability. Perplexity-based detection is well-validated for AI-generated natural language [51, 52], and the same principle applies to code generated by the same model families. The 4–26% VirusTotal detection rate for LLM-generated malware [25] is consistent with those structural differences being real and consequential. What's missing is a validated corpus of confirmed AI-generated malware samples to calibrate the classifiers against. The tools exist; the corpus is the gap.
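To show what source-level structural features look like in practice, the sketch below extracts a handful of them from Python source with the standard `ast` and `tokenize` modules. It is a simplified illustration, not the feature set used in [40] or [41, 42]; the choice of features and the `style_features` name are assumptions, and it applies only to code deployed as Python source.

```python
import ast
import io
import tokenize

# Statement types counted as one level of nesting (an illustrative choice).
NESTING_NODES = (ast.If, ast.For, ast.While, ast.With, ast.Try)


def max_nesting(node: ast.AST, depth: int = 0) -> int:
    """Return the deepest level of nested control-flow blocks under node."""
    deepest = depth
    for child in ast.iter_child_nodes(node):
        child_depth = depth + 1 if isinstance(child, NESTING_NODES) else depth
        deepest = max(deepest, max_nesting(child, child_depth))
    return deepest


def style_features(source: str) -> dict:
    """Compute a few structural and layout features from Python source."""
    tree = ast.parse(source)
    funcs = [n for n in ast.walk(tree)
             if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]
    lengths = [f.end_lineno - f.lineno + 1 for f in funcs]
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    comments = sum(1 for tok in tokens if tok.type == tokenize.COMMENT)
    code_lines = [ln for ln in source.splitlines() if ln.strip()]
    return {
        "function_count": len(funcs),
        "mean_function_length": sum(lengths) / len(lengths) if lengths else 0.0,
        "max_nesting_depth": max_nesting(tree),
        "try_blocks": sum(isinstance(n, ast.Try) for n in ast.walk(tree)),
        "comment_density": comments / len(code_lines) if code_lines else 0.0,
    }


SAMPLE = '''
def read_config(path):
    # Return the file contents, or None if the file is missing.
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return None
'''

features = style_features(SAMPLE)
```

A classifier would consume vectors like `features` across many labeled samples. Any decision threshold on these values would need the validated corpus the text identifies as missing.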

Post-processing defeats raw fingerprints, but introduces new ones. Obfuscation tools leave their own artifacts: packer-specific instruction patterns that are identifiable across samples produced by the same tool [43]. Malware stripped of LLM-characteristic comments and with symbols renamed will no longer match an LLM output fingerprint — but it may match the fingerprint of whichever obfuscation tool was used to clean it. Research on adversarial attacks against code stylometry confirms this dynamic: while deliberate style transformation can reduce authorship attribution accuracy from 88% to near-zero, the transformations required to do so are themselves detectable as artificial [44]. The cat-and-mouse question for AI-generated malware is not whether fingerprinting can be defeated — it can — but whether cleanup at scale leaves its own detectable layer.

One implication extends beyond detection into attribution. A sophisticated actor can train an LLM on a target threat group's code corpus and direct it to produce malware in that group's style. The resulting samples would carry the target's fingerprints, not the actual author's — making stylometric attribution actively misleading. Olympic Destroyer, the 2018 attack on the Pyeongchang Winter Olympics, demonstrated this principle using traditional techniques: the attackers embedded Lazarus Group Rich Header signatures in their malware, successfully misleading initial attribution by multiple threat intelligence teams [45]. LLM-based style mimicry operationalizes the same approach at lower cost and higher fidelity — generalizing it beyond header manipulation to the full structural profile of generated code, including function naming patterns, error handling conventions, API call preferences, and comment syntax.

The training data problem is largely solved by the security industry's own published work. VirusTotal, MalwareBazaar, and the attributed malware corpora published in threat intelligence reports from Mandiant, CrowdStrike, Unit 42, and comparable vendors provide thousands of labeled samples: code already correlated to specific threat actors, with stylistic and behavioral profiles documented in accompanying analysis. An actor seeking to impersonate Lazarus Group, APT28, or any other extensively documented threat group has access to more training examples than most legitimate research projects. The attribution ecosystem that enables defenders to correlate campaigns across incidents is simultaneously the dataset that enables attackers to train convincing impostors.

The scale advantage over traditional false-flag operations is significant. Olympic Destroyer required skilled manual effort to forge specific binary artifacts — a targeted, resource-intensive, one-time deception. An LLM fine-tuned on an attributed corpus can generate hundreds of stylistically consistent false-flag samples in minutes, across multiple file types, at a level of structural fidelity that extends to every feature the stylometric research shows is measurable [40]. What was previously a high-skill operation tied to a single campaign becomes a repeatable workflow available to any actor with access to public threat intelligence archives and commodity cloud GPU time.

The feedback loop this creates is worth naming directly. As defenders build detection corpora and attribution databases — the infrastructure needed to identify AI-generated malware — those corpora simultaneously become higher-quality training datasets for the next generation of mimicry models. Detection capability and mimicry capability improve in parallel, driven by the same underlying data. This is not a reason to stop building detection infrastructure; it is a reason to treat stylometric attribution as inherently provisional, and to weight multi-signal frameworks — infrastructure persistence, TTPs, operational timing, targeting patterns — more heavily than code style alone [46].

4.2 Behavioral Signals in Execution

AI-generated malware that works correctly but hasn't been operationally refined is likely to behave differently from hardened, tested malware in dynamic analysis environments.

Signature-based detection and behavioral analysis are not equivalent alternatives when the malware is AI-generated: one is structurally disadvantaged and the other is not. Signature matching requires a prior sample: a hash, a distinctive byte sequence, or a known-bad pattern compared against an incoming file. This works when the malware population is finite and slowly changing. AI-generated malware breaks the prior-sample requirement by design. LLMs are stochastic: under typical sampling settings, the same prompt run twice produces structurally different output. A threat actor running a generation workflow produces not one dropper but an effectively unbounded family of droppers, no two of which share a signaturable sequence. The 4–26% VirusTotal detection rate for LLM-generated samples [25] is consistent with this: the engines that failed weren't misconfigured; they had nothing to match against. Behavioral analysis sidesteps the prior-sample requirement entirely. A keylogger must hook keyboard APIs. A dropper must write to disk and execute. A C2 implant must make network connections. These behavioral primitives are constrained by attack objective, not by code structure, so they remain stable across effectively unbounded structural variation. Novel code is irrelevant to a sandbox watching what the code does.
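The contrast between the two approaches can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two byte strings stand in for structurally different outputs of the same prompt, the traces are hypothetical sandbox recordings, and `KEYLOGGER_PRIMITIVES` is an invented rule, not one taken from any real product.

```python
import hashlib

# Hypothetical stand-ins for two stochastic generations of the same logic.
variant_a = b"def run():\n    data = read_keys()\n    send(data)\n"
variant_b = b"def execute():\n    captured = read_keys()\n    send(captured)\n"

# Hypothetical sandbox traces: the API names each variant was observed calling.
trace_a = ["SetWindowsHookExW", "GetAsyncKeyState", "connect", "send"]
trace_b = ["GetAsyncKeyState", "SetWindowsHookExW", "CreateFileW", "connect", "send"]

# Assumed behavioral rule: keyboard hooking plus an outbound connection.
KEYLOGGER_PRIMITIVES = {"SetWindowsHookExW", "connect"}


def signature(sample: bytes) -> str:
    """Exact-match signature: a SHA-256 hash of the sample bytes."""
    return hashlib.sha256(sample).hexdigest()


def matches_behavior(trace, required) -> bool:
    """True if every required primitive appears somewhere in the trace."""
    return required.issubset(trace)


# A signature database that has only ever seen variant_a.
known_signatures = {signature(variant_a)}

signature_hit = signature(variant_b) in known_signatures
behavior_hit = matches_behavior(trace_b, KEYLOGGER_PRIMITIVES)
```

In this toy setup `signature_hit` is false for the unseen variant while `behavior_hit` is true, because the rule keys on what the sample does and not on its bytes. Real behavioral engines use far richer models than a subset check.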

  • Unrefined API call sequences. Malware sandboxes detect attacks by watching which system calls code makes, and in what order. LLM-generated malware hasn't been tuned to evade specific sandboxes; it produces call sequences that are functional but not optimized to hide, which makes it potentially more visible in behavioral analysis than human-authored malware specifically hardened against sandbox detection.

  • Regular timing intervals. Human malware developers introduce timing jitter (random sleep intervals) specifically to evade behavioral analysis. LLMs generating sleep calls are likely to default to round numbers and regular patterns unless prompted otherwise. Timing regularity is a detectable signal in sandbox analysis.

  • Explicit failure paths. Experienced malware developers test their code and make it fail silently. LLM-generated code that hits an error may take explicit exception paths — producing log entries, registry writes, or predictable network failures that are observable in a sandbox.

The framework supporting these hypotheses is well established. API call sequence analysis is a foundational behavioral detection technique: system call patterns and the information flows between them are the primary signals sandboxes use to identify malware, and that approach cannot be easily defeated by simple obfuscation [47]. That resistance breaks down only against malware that has been deliberately tuned to defeat it. A comprehensive survey of malware dynamic analysis evasion documents the specific countermeasures hardened malware employs: timing checks calibrated against sandbox clock artifacts, API call reordering and benign call insertion to break sequence signatures, and silent failure modes that suppress observable exception paths [48]. The inverse of that catalog is the detection opportunity: malware that was generated rather than engineered is unlikely to have implemented countermeasures it was never designed to need.
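A small sketch of sequence-based matching shows why the countermeasures in that catalog matter. It compares API call traces by their shared n-grams using Jaccard similarity. The traces are hypothetical, and n-gram overlap is only one simple way to represent a call sequence; it is not claimed to be the method used in [47] or [48].

```python
def ngrams(seq, n=3):
    """Set of contiguous n-call windows from an API call trace."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical reference trace for a known behavior.
reference = ["OpenProcess", "VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"]

# Hypothetical unrefined sample: same calls, same order.
unrefined = list(reference)

# Hypothetical hardened sample: benign calls interleaved to break the windows.
hardened = [
    "OpenProcess", "GetTickCount",
    "VirtualAllocEx", "GetTickCount",
    "WriteProcessMemory", "GetTickCount",
    "CreateRemoteThread",
]

unrefined_score = jaccard(ngrams(reference), ngrams(unrefined))
hardened_score = jaccard(ngrams(reference), ngrams(hardened))
```

Here the unrefined trace scores 1.0 against the reference and the hardened trace scores 0.0, which is the gap the hypothesis relies on: code that never had benign-call insertion applied keeps its sequence signal intact.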

On timing specifically: sandbox research has demonstrated that temporal execution patterns are measurable and consistent enough to train reliable behavioral classifiers [49]. Predictable sleep intervals rather than randomized jitter represent the absence of a deliberate evasion technique. MITRE ATT&CK documents time-based sandbox evasion (T1497.003) as a recognized adversary capability precisely because timing is a detection signal worth evading [50].
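One simple way to quantify timing regularity is the coefficient of variation of the gaps between observed events such as sleeps or beacons. The sketch below is an illustration under stated assumptions: the function names and the 0.05 threshold are hypothetical, and a production classifier would be trained on real sandbox data, not on a fixed cutoff.

```python
from statistics import mean, pstdev


def interval_cv(timestamps):
    """Coefficient of variation of the gaps between consecutive timestamps (seconds)."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three timestamps")
    avg = mean(gaps)
    if avg <= 0:
        raise ValueError("timestamps must be increasing")
    return pstdev(gaps) / avg


def looks_regular(timestamps, cv_threshold=0.05):
    """Flag a trace whose intervals vary by less than the assumed threshold."""
    return interval_cv(timestamps) < cv_threshold


# Hypothetical traces: a fixed 60-second sleep versus jittered sleeps.
fixed_sleep = [0, 60, 120, 180, 240]
jittered_sleep = [0, 47, 131, 170, 262]
```

With these inputs `looks_regular(fixed_sleep)` is true and `looks_regular(jittered_sleep)` is false. The measure says nothing about authorship on its own; it only captures the absence of jitter that the paragraph above describes.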

No published study has directly compared sandbox behavioral outputs — API call sequences, timing patterns, exception handling — between LLM-generated and human-authored malware. That comparison is the specific experiment this section points toward. The corpus gap from Section 4.1 applies here equally: the analytical framework exists, the detection tooling exists, and the research waiting to be done is applying both to a validated set of AI-generated samples.

The temporal dynamics of the signature disadvantage are worth naming explicitly. As AI-generated malware proliferates in the wild, samples accumulate in threat intelligence feeds and eventually enter training corpora — VirusTotal submissions, MalwareBazaar contributions, attributed samples published in threat reports. A model trained on data that includes prior AI-generated malware will produce output that skews toward that distribution. The distribution doesn't collapse — LLM output remains stochastic and each sample remains novel — but a sufficiently large corpus enables a different class of detection: distributional classifiers trained on the statistical properties of AI-generated code rather than exact byte matches. The structural hypotheses in Section 4.1 — function naming entropy, style consistency, perplexity signatures — become stronger detection signals, not weaker, as the corpus grows and classifiers can be trained against it. The counterpressure is the attacker feedback loop: the same corpus that enables defender classifiers simultaneously enables adversarial fine-tuning — generation models tuned against classifier rejection signals, producing output specifically shaped to fall outside the known distribution [44]. Structural detectability erodes over time as models are explicitly optimized against it. Behavioral detection is the more durable approach because its signal is tied to what malware must do, not how it looks. The behavioral primitives imposed by attack objectives — accessing credentials, persisting across reboots, establishing C2 — cannot be trained away without training away the capability itself.

4.3 Infrastructure Fingerprints

Getting malware to a target requires a delivery path. That path has traces independent of the malware's content or behavior.

Account metadata. The generation workflow requires a cloud account, a payment method, and an exfiltration endpoint. All three may leave fingerprints. Imperfect operational security is common even among sophisticated actors; correlating payment fingerprints, account creation patterns, and exfiltration endpoints across providers may surface attribution signals that have nothing to do with the malware's technical properties.

Toolchain artifacts. If the malware is compiled on the cloud instance rather than exfiltrated as source and compiled locally, the compiled binary may carry toolchain fingerprints: compiler version, build paths, linker artifacts that can be correlated across samples from the same generation infrastructure.

LLM-as-C2 traffic. When malware uses a hosted LLM as a command-and-control channel, sending execution context to an external model and receiving dynamic instructions in return, it creates detectable network traffic to known AI service provider endpoints. PROMPTFLUX called Gemini's API hourly to receive freshly obfuscated instructions [12]. LAMEHUG, a Python infostealer linked to APT28, queried Hugging Face's Qwen API in real time to generate reconnaissance and exfiltration commands [53]. OSSTUN, attributed to APT41, implemented a full Gemini-based C2 framework [12]. Traffic to OpenAI, Anthropic, Google, or Hugging Face endpoints from a host with no business reason to query those services is an anomalous signal, and one that can close the detection gap for this class of malware. The caveat is attacker discipline: routing the traffic through a legitimate developer context can obscure the signal. But when present, it is network-observable and cuttable, which is exactly what Google demonstrated when it disabled PROMPTFLUX's API access.
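As a rough sketch of how that signal might be surfaced, the function below flags hosts that resolve AI service domains without being on an approved list. The event format, host names, and function name are hypothetical, and the domain set is a short illustrative sample that would need to be maintained from each vendor's current documentation.

```python
# Illustrative sample of AI API hostnames; an assumption, not a complete list.
AI_API_DOMAINS = {
    "api.openai.com",
    "api.anthropic.com",
    "generativelanguage.googleapis.com",
    "api-inference.huggingface.co",
}


def flag_unexpected_ai_traffic(dns_events, approved_hosts):
    """Return {host: set of AI domains} for hosts not approved to query them.

    dns_events is an iterable of (source_host, queried_domain) tuples,
    a hypothetical simplification of a DNS or proxy log.
    """
    flagged = {}
    for host, domain in dns_events:
        domain = domain.lower().rstrip(".")  # normalize case and trailing dot
        if domain in AI_API_DOMAINS and host not in approved_hosts:
            flagged.setdefault(host, set()).add(domain)
    return flagged
```

For example, given events from a workstation, a kiosk, and an approved ML development host, only the first two would be returned. The approved-host list carries the "business reason" judgment, so its accuracy determines how useful the output is.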

Consistent infrastructure under polymorphic code. Malware that continuously rewrites its own code to defeat signature detection still requires network infrastructure: IP addresses, domains, hosting accounts. Code mutation does not rotate that infrastructure. The Diamond Model of Intrusion Analysis establishes this formally: infrastructure is a persistent, independently trackable attribute of an adversary campaign, decoupled from the specific capabilities deployed [55]. Analysis of Mirai's infrastructure confirmed it in practice: while malware variants competed and mutated, 484 unique C2 IPs and 24 DNS clusters remained attributable across the campaign [28]. Microsoft's year-long tracking of the Dexphot polymorphic malware documents the same pattern: code updated every 90–110 minutes while approximately 200 C2 domains remained consistent [56]. File hashes change with every mutation cycle; IP addresses and registered domains change only when the operator deliberately rotates them. Infrastructure is the invariant that polymorphism cannot erase.
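The pivot from hashes to infrastructure can be sketched as a simple grouping step. The observation format and the example indicators below are hypothetical; real pipelines would draw these pairs from sandbox network logs or threat intelligence feeds.

```python
from collections import defaultdict


def cluster_by_infrastructure(observations):
    """Group sample hashes by the C2 indicator they were seen contacting.

    observations is an iterable of (file_hash, c2_indicator) tuples,
    a hypothetical simplification of sandbox network output.
    """
    clusters = defaultdict(set)
    for file_hash, indicator in observations:
        clusters[indicator].add(file_hash)
    return dict(clusters)


# Hypothetical data: three mutated samples sharing one C2 domain.
observations = [
    ("hash_a", "c2.example.invalid"),
    ("hash_b", "c2.example.invalid"),
    ("hash_c", "c2.example.invalid"),
    ("hash_d", "other.example.invalid"),
]
clusters = cluster_by_infrastructure(observations)
```

Three samples that share no hash end up in one cluster because they share a domain, which is the relationship the Mirai and Dexphot examples describe.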

Agentic traffic volume anomalies. Agentic AI traffic (automated agents making requests at machine speed) grew 7,851% year-over-year in 2025 [54]. That traffic is distinguishable from human-operator traffic at the network layer: request rates, timing regularity, and session behavior patterns differ measurably between human and automated clients. A network analysis layer calibrated to identify agentic traffic signatures may flag malware using LLM APIs for C2 even without knowing the specific destination endpoint; the behavioral pattern of the traffic is the signal, not the destination alone.


5. What Providers Can Do

The detection gap cannot be closed by content inspection without destroying what makes the platform useful. That doesn't mean providers have no options.

5.1 Identity Verification at the Usage Tier That Matters

The weakest point in the threat actor workflow is account creation. Across the major GPU cloud providers — RunPod, Lambda Labs, Vast.ai, CoreWeave, Paperspace, Together.ai, and Replicate — the baseline requirement to access GPU compute is an email address and a valid payment method. No provider requires government-issued identity verification at account creation. RunPod and Vast.ai accept cryptocurrency payments, removing the identity traceability that payment card processing provides [57, 59].

More significantly, no provider escalates verification requirements based on GPU utilization, model size, or sustained workload duration. Lambda Labs imposes the highest account-creation barrier of any consumer-facing provider reviewed — phone verification plus a pre-authorization charge [58] — but this applies at signup, not as a function of what the account subsequently does. An account that passes phone verification and then runs a sustained uncensored LLM inference workload faces no additional scrutiny. Vast.ai operates a "Verification Stages" system, but this applies to GPU sellers listing hardware on the platform, not to buyers consuming compute [59].

This is the structural gap identity verification can address. Requiring real identity verification at the usage threshold for LLM inference — sustained high GPU utilization, large model downloads — adds friction without significantly affecting legitimate workloads, which are tied to identifiable organizations and billing relationships. The goal is not to stop sophisticated actors, who can pass any verification regime with sufficient resources. It is to eliminate commodity-tier abuse: actors who currently face no friction at all because the lowest-bar providers accept anonymous payment and ask nothing further.
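A usage-tier trigger of this kind could be expressed as a small policy function. The sketch below is hypothetical throughout: the field names are invented, the 16 GB VRAM and one-hour defaults echo the tier named in the paper's falsifiable claims, and the 10 GB download threshold is an arbitrary placeholder.

```python
from dataclasses import dataclass


@dataclass
class UsageSnapshot:
    """Hypothetical per-account usage summary a provider might already hold."""
    vram_gb: float
    sustained_hours: float
    model_download_gb: float
    identity_verified: bool


def requires_verification(usage, min_vram_gb=16.0, min_hours=1.0, min_download_gb=10.0):
    """Return True when an unverified account crosses the assumed usage tier."""
    if usage.identity_verified:
        return False
    heavy_inference = usage.vram_gb >= min_vram_gb and usage.sustained_hours > min_hours
    large_pull = usage.model_download_gb >= min_download_gb
    return heavy_inference or large_pull
```

The function reads only account-level metadata, so it stays on the account layer and never touches inference content. Choosing the thresholds is the hard part and would need data on legitimate usage patterns.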

This isn't a complete control, but it is a real one. The providers most exposed are those that accept cryptocurrency and impose no usage-tier verification — a combination that produces accounts that are effectively anonymous for the duration of their activity.

5.2 Model Registry Monitoring

The set of openly available uncensored models isn't unlimited. It's a known, relatively small collection of base models with safety training removed or bypassed, and the list changes slowly. A provider that monitors which model files are pulled into its infrastructure, by file hash or registry of origin, can flag when an account downloads a model associated with uncensored variants.

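This doesn't require reading inference content. It's a metadata signal: this account downloaded a model whose safety training is known to have been removed. Because such models also have legitimate security research uses, the signal is a reason for closer account-level review, not proof of abuse.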
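The hash-based variant of this check might look like the following sketch. The function names and the idea of a precomputed watchlist of SHA-256 digests are assumptions for illustration; no provider is known to publish such a list.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large weight files are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_watchlisted(path, watchlist):
    """True if the file's digest is in the (hypothetical) watchlist of known variants."""
    return sha256_of_file(path) in watchlist
```

Exact hashing is easy to defeat: re-quantizing or re-serializing the weights changes the digest. That is consistent with treating this as a control against commodity actors only.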

Limitation: sophisticated actors can host model files without any traceable registry pull. This intervention catches commodity actors; it doesn't stop determined ones. Layered with account verification, it raises the bar meaningfully.

5.3 Terms of Service That Reflect the Actual Threat

A survey of acceptable use policies across major GPU cloud providers produces a finding that complicates the simple version of this argument: most platforms already explicitly prohibit malware creation, not just distribution. CoreWeave's AUP prohibits "content, software, or any other technology that may damage or interfere... including viruses, malware, spyware" — creation language, not transmission language [60]. Together.ai explicitly prohibits introducing "any virus, worm, Trojan horse, malware, or other malicious code" through its API [62]. Replicate defines a "Harmful Code" category that explicitly covers unauthorized access technology and prohibits creating it [63]. DigitalOcean's AUP — which governs Paperspace — goes furthest, explicitly addressing AI-generated content: users may not "generate, deploy, or distribute" harmful AI outputs [61].

The platforms with genuinely ambiguous language are RunPod and Vast.ai, both of which use transmission-focused phrasing — prohibiting "uploading or transmitting" malicious software without explicitly addressing creation [57, 59]. That language gap is real and worth closing with a revision.

But the more important finding is about the providers whose ToS already prohibits creation: the prohibition is structurally unenforceable for the specific threat this paper describes. A provider whose AUP prohibits malware generation has no mechanism to know when that prohibition is being violated. The detection gap documented in Section 3 means generation leaves no observable signal — no report is generated, no monitoring flag fires, no enforcement action is triggered at the time the violation occurs. The ToS prohibition exists on paper. The enforcement capability does not exist in practice.

This reframes what ToS revision actually accomplishes. Explicit "generate" language gives providers clearer legal standing to terminate accounts and cooperate with law enforcement after evidence of misuse surfaces through other channels — LLM-as-C2 traffic analysis, deployed malware traced to the generation infrastructure, account-layer fraud signals. It is a predicate for action, not a detection mechanism. For providers whose language remains distribution-focused, revision is the minimum necessary step. For providers who already prohibit creation, the gap is enforcement, not language — and enforcement cannot improve without detection, which returns the problem to the structural gap this paper documents throughout.


This section reflects a practitioner's reading of publicly available legal frameworks, not legal advice.

Section 2.4 introduced the regulatory landscape governing cloud provider obligations. This section applies those frameworks to the specific scenario this paper describes: a provider whose infrastructure is used to generate malware, where that activity produces no observable signal at the provider layer.

6.1 Provider Liability

Cloud providers operating under DMCA Section 512 [35] and DSA Article 6 [34] receive conditional liability protection for illegal activity conducted through their infrastructure. Both frameworks condition that protection on the provider acting expeditiously upon receiving notice — neither requires proactive monitoring. The framework functions when malicious activity creates observable signals that generate reports.

The AI malware generation scenario breaks this structurally. Generation leaves no observable signal. No report arises. The safe harbor condition — act promptly on notice — is never triggered because the predicate never arrives.

The three open questions this scenario raises, and how existing frameworks address them:

When does the provider "know"? DMCA § 512 safe harbor requires the absence of actual knowledge and the absence of "red flag knowledge" — awareness of facts from which infringement is apparent [35]. The structural detection gap supports the provider's safe harbor: genuine inability to observe the activity means genuine absence of actual knowledge. Red flag knowledge is the harder question. As AI-generated malware becomes a documented and named threat class — acknowledged in ENISA threat reporting [38], MITRE ATLAS [39], and peer-reviewed literature — providers may face arguments that generalized awareness of the possibility constitutes red flag knowledge for any specific instance. Courts applying DMCA § 512 have generally required knowledge of specific infringing content, not general awareness of a category. Whether that specificity requirement holds as the threat becomes better-documented is unsettled.

Does a ToS prohibition create an implied duty to detect? The Section 5.3 findings are directly relevant. Most providers already prohibit malware generation in their AUPs [60, 62, 63]. That prohibition helps the safe harbor argument — it establishes the provider did not authorize the use — but may simultaneously establish that the provider was aware of the possibility, which is potentially relevant to red flag knowledge. Legal scholarship on safe harbor in the generative AI context has identified exactly this tension: explicit prohibition language is normatively desirable but legally double-edged [36]. The conservative reading is that prohibition without detection creates a safe harbor; the aggressive reading is that it creates an implied duty. No court has resolved this for AI inference infrastructure specifically.

What are post-hoc reporting obligations? If a provider discovers after the fact that an account was used for malware generation — through LLM-as-C2 traffic analysis, through deployed malware traced to their infrastructure, through law enforcement inquiry — they face two obligations: act on the knowledge (terminate, preserve evidence, cooperate with legal process) and, depending on jurisdiction, potentially notify regulators. The NIS2 and DORA frameworks governing that notification are addressed in Section 6.3.

6.2 EU AI Act

The EU AI Act [33] creates a three-tier actor structure: providers (entities that develop and place GPAI models on the market), deployers (entities that integrate GPAI models into products or services), and end users. The obligations in Articles 53 and 55 — transparency documentation, AUP compliance policies, copyright compliance, systemic risk assessments for models above the 10²⁵ FLOP threshold — fall primarily on providers and, to a lesser extent, deployers.

Cloud GPU infrastructure providers do not clearly fit either category. They do not develop the models hosted on their infrastructure (not providers). They do not integrate models into downstream products (not deployers in the conventional sense). They provide compute. The Act's GPAI provisions were drafted with model developers as the primary regulatory target; cloud infrastructure providers occupy an ambiguous third position the Act does not explicitly address.

The open question is what happens when a cloud provider knows a hosted model lacks safety constraints — because it is a known abliterated variant, because model registry monitoring flagged it, or because prior abuse reports established the pattern. At that point the provider has knowledge that the model presents elevated risk. Whether that knowledge creates any obligation under the Act — to refuse hosting, to notify regulators, to implement additional controls — is not settled by the current text. The EU's forthcoming GPAI Code of Practice may address it; as of the time of writing, it has not.

6.3 Incident Reporting

Cloud service providers are explicitly within NIS2's scope as digital infrastructure operators (Annex I) [32]. Article 23 requires an early warning within 24 hours and an incident notification within 72 hours for significant incidents, defined as incidents with substantial impact on service provision. The critical limitation for the scenario this paper describes: using a cloud provider's compute to generate malware does not constitute an incident affecting the provider's service. It is an abuse of the service, not an impact on it. NIS2's reporting obligation activates when the provider's own services are disrupted or compromised, not when its infrastructure is used as a tool against someone else.

DORA [64] — the Digital Operational Resilience Act, applying to EU financial entities and their critical ICT third-party providers — creates a separate reporting chain. If a cloud GPU provider serves as a critical ICT third-party provider to a financial institution, and its infrastructure is implicated in a malware attack on that institution, DORA's incident reporting obligations may extend to the provider. Most GPU cloud providers do not have formal ICT third-party relationships with regulated financial entities; the scenario is possible but specialized.

The practical conclusion across both frameworks is the same: no current regulation creates a proactive reporting obligation on a cloud provider for suspected malware generation on its infrastructure. The reporting obligation falls on the victim organization. The cloud provider's obligations are reactive — act on knowledge when received, cooperate with legal process, preserve evidence, and report to regulators if their own services were affected. The structural detection gap means that reactive framework is the only one available, and it activates only after the harm has already occurred.


7. What Defenders Can Do Today

The detection gap at the provider layer isn't closing on a short timeline. Here's where to focus in the interim.

Favor behavioral analysis over signature matching. AI-generated malware is novel by construction — it won't match existing signatures. Behavioral sandboxes that watch system call sequences, network patterns, and execution timing are more likely to catch it than signature-based antivirus that requires prior knowledge of the sample.

Treat code quality as a detection signal. When your team analyzes a malware sample, document its structural characteristics: function naming patterns, comment density, error handling completeness, internal style consistency. Building a corpus of confirmed AI-generated samples with documented structural properties is the foundation for the automated detection classifiers that don't yet exist.
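Two of the structural characteristics named above can be computed with a few lines of code. The sketch below assumes Python-style source (a `#` comment marker and `def` function definitions) and uses character-level Shannon entropy over function names as one possible reading of "naming patterns". Both choices are illustrative assumptions, not an established feature definition.

```python
import math
import re
from collections import Counter


def comment_density(source, marker="#"):
    """Fraction of non-empty lines that are whole-line comments."""
    lines = [ln.strip() for ln in source.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    return sum(ln.startswith(marker) for ln in lines) / len(lines)


def shannon_entropy(text):
    """Character-level Shannon entropy in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def function_name_entropy(source):
    """Entropy over the concatenated names of functions defined with 'def'."""
    names = re.findall(r"^\s*def\s+(\w+)", source, flags=re.MULTILINE)
    return shannon_entropy("".join(names))
```

Recording these values alongside each analyzed sample, with a note on whether AI generation was confirmed, is the kind of labeled data a future classifier would need. Whether the features actually discriminate is the open question.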

Know the uncensored model landscape. Understanding which uncensored models are available, what they're capable of, and what their characteristic output looks like is threat intelligence that informs detection. Security research teams with AI capability should be running these models in controlled environments and characterizing what they produce.

Share account-layer intelligence across providers. Attribution-resistant GPU accounts still leave traces — payment fingerprints, account creation patterns, behavioral signatures. No single provider sees enough of the picture to spot patterns in isolation; structured threat intelligence sharing across providers could surface what individual visibility misses. The precedent is established: Operation Avalanche (2016) required coordinated action across hosting providers and domain registrars in 40 countries to take down crimeware infrastructure that was invisible to any individual provider [65]. The Mirai infrastructure — 484 C2 IPs distributed across providers globally — became attributable only when network telemetry from multiple vantage points was combined [28]. Financial services addressed the equivalent cross-institution problem through FS-ISAC: competing banks share fraud signals and payment fingerprints because abuse infrastructure moves faster than any single institution's detection capability [66]. The GPU compute equivalent is an information-sharing body where providers contribute account creation metadata and payment anomalies to a shared analysis layer — not raw customer data, but the aggregated signals that individual visibility misses.
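One way providers could compare signals without exchanging raw customer data is to share keyed hashes of each fingerprint and intersect them. The sketch below is an assumption-laden illustration: the function names are hypothetical, the shared key is assumed to be distributed out of band by the sharing body, and keyed hashing of low-entropy identifiers offers only limited privacy compared with dedicated private set intersection protocols.

```python
import hashlib
import hmac


def blind(value, shared_key):
    """Keyed hash of a normalized fingerprint; raw values never leave the provider."""
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(shared_key, normalized, hashlib.sha256).hexdigest()


def shared_signals(provider_a_values, provider_b_values, shared_key):
    """Return blinded fingerprints that appear at both providers."""
    a = {blind(v, shared_key) for v in provider_a_values}
    b = {blind(v, shared_key) for v in provider_b_values}
    return a & b
```

A match tells both providers that the same payment fingerprint or account attribute appeared on each platform, and each can then look up its own record locally.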

Build the research corpus now. The most important deficit is the absence of a validated collection of AI-generated malware samples. Building that corpus under responsible disclosure frameworks — before the volume of in-the-wild samples makes the task unmanageable — is the prerequisite for every detection hypothesis in Section 4. The longer this waits, the harder the signal-to-noise problem becomes.


8. Methodology

This analysis draws on:

  • Structural analysis of cloud GPU provider monitoring architectures based on publicly available documentation: acceptable use policies, privacy policies, published security architecture
  • Characterization of LLM output properties based on direct experience running local and cloud models across multiple model families
  • Application of the malware detection literature to the structural properties of LLM-generated code

What this paper doesn't include: Empirical measurement of AI-generated malware detection rates. A validated corpus of confirmed AI-generated samples. Operational confirmation that the described workflow is actively used by threat actors at scale. Each of these is a research agenda item, not an oversight — the paper is explicit about where the evidence ends and hypothesis begins.

What this paper deliberately avoids: Instructions for performing the described workflow. The analysis characterizes the threat at a level sufficient to inform defenders and policymakers; it is not a tutorial. That choice is intentional.


9. Reproducibility

The core argument — that cloud providers are structurally unable to inspect AI inference content — is reproducible by reading publicly available documentation. It follows directly from published container isolation architectures and the stated privacy guarantees of major GPU cloud providers. No special access required.

The empirical claims in Section 4 (code fingerprints, behavioral signals) are not yet reproducible. They are hypotheses that require a research corpus that does not exist in published form. The paper is explicit about that gap throughout.

For researchers who want to pursue the empirical work:

  • Generate malware-adjacent code (legitimate security research tools, proof-of-concept exploits under responsible disclosure) using multiple uncensored model variants to establish a structural signature baseline
  • Compare against existing malware corpora (VirusTotal, MalwareBazaar) for feature distribution analysis
  • Run behavioral sandbox analysis on generated samples in isolated environments
  • Ethics and Legal Review. Traditional IRB review (45 CFR 46) governs human subjects research; generating code samples typically does not trigger it. What applies instead is institutional ethics review governed by the Menlo Report (2012) [67], the standard framework for ICT security research. Researchers must document their isolated environments, ensure samples are never deployed, and confirm their activities fall under the CFAA good-faith security research provision [30]. Most major security venues (IEEE S&P, USENIX, CCS, NDSS) require explicit ethics attestations at the time of submission.

10. Recommendations

For cloud GPU providers:

Implement identity verification proportional to usage tier. Accounts pulling large model files or sustaining high GPU utilization over extended periods should face higher verification requirements than accounts running small-scale development work. This is friction at the account layer — not content inspection — and it targets the phase where threat actors have the most exposure.

Publish your detection capabilities honestly. The security community benefits from transparency about what provider-level monitoring can and cannot catch. Honest characterization of the gap is more useful to defenders and policymakers than opacity that allows the gap to persist unacknowledged.

Engage with the regulatory process early. The EU AI Act and NIS2 implementation will create obligations for AI infrastructure providers. Providers who engage proactively — contributing to technical standard-setting, publishing their security architecture — will shape those obligations. Providers who wait will be regulated by people who don't understand the technical constraints.

For security research teams:

Treat AI-generated malware detection as a research priority. The corpus doesn't exist yet; building it now, under responsible disclosure frameworks, is the prerequisite for future defensive tooling. The hypotheses in Section 4 are a starting point, not a conclusion — validating them is the work.

For policymakers:

Don't require content inspection. A regulatory requirement that cloud providers inspect the semantic content of AI inference would destroy the privacy model that makes cloud AI useful for legitimate purposes — including for security operations teams running sensitive analytical workloads. The interventions that don't require content inspection (identity verification, model registry monitoring, terms of service revision, cross-provider threat intelligence sharing) are the appropriate scope of regulatory action.

Address the research corpus gap directly. A government-funded research program that allows controlled malware corpus development under appropriate institutional oversight would accelerate defensive detection work that private industry cannot easily fund, because the liability exposure around controlled malware generation is unclear. DARPA's Cyber Grand Challenge (2016) established the precedent: purpose-built vulnerable software developed under government sponsorship for controlled security research, where the funding structure absorbed the liability exposure that private researchers couldn't carry [68]. NIST's National Software Reference Library demonstrates the government-maintained reference corpus model — a curated, authoritative dataset that private industry relies on but cannot credibly produce independently [69]; NSF's Secure and Trustworthy Cyberspace (SaTC) program is the most likely existing funding vehicle for the academic research needed to build an equivalent AI-generated malware corpus [70].


11. Falsifiable Claims

This paper makes claims that can, in principle, be proven wrong. Here are the specific predictions:

  1. AI-generated malware is structurally distinguishable from human-written malware. A classifier trained on structural features — function naming entropy, comment density, error handling completeness, internal style consistency — will separate AI-generated from human-written samples at greater than 70% accuracy on a held-out test set. Falsified if: structural features don't separate at that threshold, or if simple post-processing (comment stripping, obfuscation, symbol renaming) drives accuracy below 60%.

  2. The detection gap is structural, not a resource or priority problem. A well-resourced provider with the technical capability to inspect inference content would still not implement content inspection, because doing so would destroy the privacy guarantee their enterprise customers require. Falsified if: a major cloud GPU provider publicly implements inference content inspection and retains enterprise customers at prior rates.

  3. Identity verification reduces commodity abuse without significantly affecting legitimate workloads. KYC requirements applied at the GPU usage tier required for LLM inference (≥16GB VRAM, sustained use >1 hour) would deter fewer than 10% of legitimate research and enterprise accounts while meaningfully raising costs for commodity threat actors. Falsified if: verification at that tier causes more than 20% drop in legitimate account creation.

  4. Behavioral analysis outperforms signature-based AV against AI-generated samples. A set of AI-generated functional malware samples will show a lower aggregate detection rate in multi-engine signature-based AV than in behavioral sandbox analysis on first submission. The structural basis for this prediction is developed in Section 4.2: signature detection requires a prior sample, and LLM stochasticity makes the sample space effectively unbounded, so that prerequisite is never met. Behavioral detection targets the primitives an attack objective requires, which remain stable however much the code structure varies, and AI-generated malware has not been hardened with the sandbox-evasion techniques it was never designed to need. Falsified if: signature-based AV detects the same AI-generated samples at a rate equal to or higher than behavioral sandbox analysis on first submission. This prediction cannot be tested without the validated corpus identified in Section 4 as the primary research gap; it stands as a falsifiable hypothesis pending that corpus.

  5. Dedicated classifiers will detect AI-generated malware at high rates. A classifier trained specifically on AI-generated malware structural features will achieve greater than 80% detection on a held-out AI-generated test set, versus less than 40% detection for those same samples by signature-based AV on first submission. Falsified if: the dedicated classifier does not exceed 80% detection on that held-out set.

  6. Safety architecture choices, not model capability, drive jailbreaking susceptibility. DeepSeek-R1's 79% baseline compliance rate with malicious requests, versus under 1% for o1 and 8% for o3-mini, reflects a measurable, architecture-driven difference [19, 20, 21]. As open-weight models adopt safety training comparable to that of leading U.S. closed models, their baseline harmful compliance rates will decline toward those lower levels, which increases the relative attractiveness of private, uncensored deployment. Falsified if: models with capability equivalent to DeepSeek-R1 and safety training equivalent to leading U.S. closed models maintain similarly elevated baseline compliance rates, suggesting capability rather than safety architecture is the driver.
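
The numeric thresholds in claims 1, 3, 4, and 5 can be written down as explicit pass/fail checks before any results exist, which makes the predictions harder to reinterpret afterwards. The sketch below is illustrative only: the `Results` fields and the `falsified_claims` function are hypothetical names, every input is a fraction in [0, 1] that a future study would have to measure, and scoring a tie in claim 4 as a falsification is an assumption about how "equivalent" would be read. Claims 2 and 6 are omitted because their falsification conditions are not single numeric thresholds.

```python
from dataclasses import dataclass


@dataclass
class Results:
    """Measured fractions in [0, 1]. All field names are hypothetical."""
    structural_accuracy: float            # claim 1: AI-vs-human classifier, held-out set
    accuracy_after_postprocessing: float  # claim 1: after comment stripping, renaming, etc.
    legit_signup_drop: float              # claim 3: relative drop in legitimate account creation
    signature_detection: float            # claim 4: multi-engine AV, first submission
    behavioral_detection: float           # claim 4: sandbox analysis, first submission
    dedicated_detection: float            # claim 5: dedicated classifier, held-out AI set


def falsified_claims(r: Results) -> list[int]:
    """Return the numbers of the claims that the measurements falsify."""
    failed = []
    # Claim 1: needs > 70% accuracy, and post-processing must not push it below 60%.
    if r.structural_accuracy <= 0.70 or r.accuracy_after_postprocessing < 0.60:
        failed.append(1)
    # Claim 3: falsified by a drop of more than 20% in legitimate account creation.
    if r.legit_signup_drop > 0.20:
        failed.append(3)
    # Claim 4: signature AV must detect strictly less than behavioral analysis
    # (treating a tie as falsification is an assumption).
    if r.signature_detection >= r.behavioral_detection:
        failed.append(4)
    # Claim 5: the dedicated classifier must exceed 80% detection.
    if r.dedicated_detection <= 0.80:
        failed.append(5)
    return failed


# Made-up numbers, purely to exercise the checks.
example = Results(
    structural_accuracy=0.74,
    accuracy_after_postprocessing=0.58,
    legit_signup_drop=0.05,
    signature_detection=0.15,
    behavioral_detection=0.60,
    dedicated_detection=0.85,
)
print(falsified_claims(example))  # [1]
```

In this invented example the classifier clears 70% on raw samples but falls to 58% after post-processing, so only claim 1 is reported as falsified.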


Glossary

Abliteration: A technique for removing safety fine-tuning from open-weight models by editing model weights directly rather than through prompting; produces a model that behaves as if safety training was never applied, without requiring retraining from scratch.

Attribution: In malware analysis, the process of linking an attack sample or campaign to a specific threat actor or origin. AI-generated malware complicates attribution by eliminating the developer-skill signatures and reuse patterns that attribution methods traditionally depend on.

C2 infrastructure (command and control): The network infrastructure used to issue instructions to compromised systems and exfiltrate data. Can be provisioned on the same commercial cloud platforms legitimate researchers use, which makes adversarial infrastructure increasingly hard to distinguish from legitimate deployments.

GPU-as-a-Service: Commercial cloud platforms that rent GPU compute by the hour, including dedicated AI accelerators. Allows anyone to run large-scale inference or fine-tuning workloads without owning hardware — including adversaries seeking to generate, modify, or evaluate malware at scale.

Jailbreaking: Prompting techniques that cause a safety-trained model to bypass its safety constraints and comply with requests it would normally refuse. Distinct from abliteration in that it operates through conversation manipulation rather than weight modification, and its effects are session-scoped rather than permanent.

Open-weight model: An AI model whose trained weights are publicly available for download and local deployment. Unlike API-only models, open-weight models can be run without internet connectivity, modified, fine-tuned, and deployed without provider oversight — removing the access controls that API-gated models depend on.

Polymorphic malware: Malware that automatically generates structural variants of itself to evade signature-based detection while preserving functional behavior. AI-generated malware achieves similar structural variation through stochastic generation rather than explicit mutation logic, with no prior sample for signature databases to match against.

QLoRA (quantized low-rank adaptation): A parameter-efficient fine-tuning technique that keeps a pre-trained model frozen in 4-bit quantized form and trains only small low-rank adapter matrices on top of it, so only a small fraction of the parameters is updated. Allows fine-tuning of large models on consumer hardware, including fine-tuning that removes or degrades safety behavior.

Signature-based detection: Antivirus and endpoint detection approach that matches file content against a database of known-malicious patterns. Effective only when a prior sample exists in the database; ineffective against novel or structurally varied malware with no prior submission history.

Static analysis: Examining a program's code, structure, and properties without executing it. Signatures are derived from static properties. The structural variation of AI-generated malware defeats signature-style static analysis, whereas behavioral detection, which targets execution-time primitives rather than structural patterns, is not affected in the same way.

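Structural detection gap: The provider-side blind spot at the point of creation, where a cloud GPU provider cannot, by design, inspect what a customer generates inside an isolated container. Downstream, the gap appears as the measurable difference between detection rates for AI-generated malware in static/signature-based analysis versus behavioral analysis, caused by AI-generated malware's unlimited structural variation and the absence of the prior samples that signature matching requires.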

Uncensored model: An open-weight model that has been modified or fine-tuned to remove safety constraints. May be abliterated from a safety-trained base or trained from scratch without safety alignment. Distinct from a jailbroken model in that the modification is persistent across sessions rather than prompt-dependent.
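
The contrast drawn in the Signature-based detection, Polymorphic malware, and Static analysis entries can be illustrated with a toy example. The sketch below uses two harmless snippets that differ only in identifier names: an exact-hash "signature" database built from the first misses the second, while a comparison of hand-written behavior traces still matches. This is a deliberately simplified model. Real AV signatures and sandbox traces are far richer than a SHA-256 lookup and a tuple of labels, and the names `signature_match`, `behavior_match`, and the trace labels are assumptions made for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest, standing in for a file-content signature."""
    return hashlib.sha256(data).hexdigest()


def signature_match(sample: bytes, known_hashes: set[str]) -> bool:
    """Exact-match lookup: succeeds only if this exact content was seen before."""
    return sha256_hex(sample) in known_hashes


def behavior_match(trace: tuple[str, ...], known_traces: set[tuple[str, ...]]) -> bool:
    """Compare what the code does (hypothetical trace labels), not how it is written."""
    return trace in known_traces


# Two benign, functionally identical snippets with different identifier names.
variant_a = b"total = price * qty\nprint(total)\n"
variant_b = b"t = p * q\nprint(t)\n"

# Hand-written stand-ins for what a sandbox might record at run time.
trace_a = ("multiply", "write_stdout")
trace_b = ("multiply", "write_stdout")

known_hashes = {sha256_hex(variant_a)}
known_traces = {trace_a}

print(signature_match(variant_a, known_hashes))  # True: the prior sample exists
print(signature_match(variant_b, known_hashes))  # False: renamed variant, no prior sample
print(behavior_match(trace_b, known_traces))     # True: same behavior, different structure
```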


References

70 sources. Each entry includes a citation note explaining where and why it is cited in the paper.

Tags: PDF open-access paper  ·  WEB article or report  ·  LAW legislation  ·  PAYWALL subscription required


§ 2.1 — LLM Offensive Capability Spectrum

[1]OPWNAI: Cybercriminals Starting to Use ChatGPT WEB
Check Point Research · January 2023

First documented evidence of criminals sharing working malware — a credential-stealing Python script and Java dropper — produced with ChatGPT on underground forums, with active discussion of bypassing content restrictions.

§2.1: Establishes that criminals were using ChatGPT to write malware within weeks of its public release.

[2]Initial Security Analysis of ChatGPT4 WEB
Check Point Research · March 2023

Demonstrated GPT-4 generating functional exploit scaffolding with minimal prompting, establishing that frontier model capability translated directly to offensive use.

§2.1: GPT-4 generating functional exploit scaffolding with minimal prompting.

[3]PentestGPT: Evaluating and Harnessing LLMs for Automated Penetration Testing PDF
Deng et al. · USENIX Security 2024

Built a system showing LLMs can chain penetration testing tasks — running tools, interpreting output, deciding next steps — establishing measurable AI-assisted offensive capability under controlled conditions.

§2.1: Establishes that AI-assisted offensive capability is reproducible and measurably effective.

[4]PwnGPT: Automatic Exploit Generation Based on Large Language Models PDF
Peng, Ye, Du, Zhang, Zhan, Zhang, Guo, Zhang · ACL 2025

Automated exploit generation across multiple vulnerability types using LLMs, further confirming AI-assisted offensive capability as reproducible and not dependent on a single model or tool.

§2.1: Second independent confirmation of measurable AI-assisted offensive capability.

[5]Is Generative AI the Next Tactical Cyber Weapon for Threat Actors? PDF
Usman et al. · arXiv preprint, 2024

Documents role-play techniques, character-switch prompts, and gradual escalation sequences that reliably extract malware from frontier models via jailbreaking.

§2.1: Evidence base for Approach 1 (jailbreaking) — documented techniques that reliably extract malware from hosted models.

[6]Exploring the Dual Role of LLMs in Cybersecurity: Threats and Defenses PAYWALL
Bryce et al. · in Large Language Models in Cybersecurity, Springer, 2024

Documents Black Mamba, a working polymorphic keylogger produced entirely through prompted ChatGPT sessions and one of the first publicly documented examples of functional malware produced through jailbreaking.

§2.1: Black Mamba polymorphic keylogger as concrete evidence of Approach 1 in practice.

[7]Development of Malware Using Large Language Models PAYWALL
Adamec and Turčaník · IEEE NTSP 2024

Documents functional phishing payloads and infected files generated via LLMs, broadening the evidence base beyond keyloggers to delivery-mechanism malware.

§2.1: Functional phishing payloads and infected files as further evidence of jailbreak-produced malware.

[8]Consiglieres in the Shadow: Understanding the Use of Uncensored LLMs in Cybercrimes PDF
Lin et al. · arXiv:2508.12622, 2025

Documents hundreds of malicious applications using uncensored models as AI backends, with code generation as a primary service — establishing that uncensored model deployment is already operationalized at scale.

§2.1: Evidence base for Approach 2 (fine-tuning) — uncensored models already deployed in hundreds of malicious applications.

[9]Generative AI: A Double-Edged Sword in the Cyber Threat Landscape PAYWALL
Ibrar et al. · Artificial Intelligence Review, Springer, 2025

Surveys democratization of offensive AI capability — tools and techniques previously requiring deep expertise now available to less-skilled actors.

§2.1: Democratization claim — offensive AI is becoming accessible to actors who previously lacked the skill.

[10]LLMs and Generative AI in Cybersecurity: A Survey of Dual-Use Risks PAYWALL
Ahi and Valizadeh · IEEE SVCC 2025

Provides modeled projections of AI-generated malware's future threat-share. Note: the 50% figure cited in the paper is a projection, not measured data.

§2.1: Threat-share projection — qualified as modeled estimate, not empirical measurement.

[11]Metamorphic Malware Evolution: The Potential and Peril of Large Language Models PDF
Madani · IEEE TPS 2023

Demonstrates LLMs as mutation engines generating structurally new malware variants continuously, defeating signature-based detection by never producing the same code twice.

§2.1: Fine-tuned model as mutation engine — the automated variant-generation capability underlying PROMPTFLUX-class threats.

[12]GTIG AI Threat Tracker: Advances in Threat Actor Usage of AI Tools WEB
Google Threat Intelligence Group · November 2025

Identifies PROMPTFLUX (dropper calling Gemini API hourly to rewrite obfuscation) and OSSTUN (APT41 Gemini-based C2 framework) as confirmed in-the-wild AI-as-infrastructure malware.

§2.1: PROMPTFLUX as first confirmed case of continuous AI-rewritten malware. §4.3: LLM-as-C2 traffic as detectable signal (OSSTUN).

[13]Impact of AI on Cyber Threat from Now to 2027 WEB
UK National Cyber Security Centre · May 2025

NCSC assessment confirming AI commoditization "will almost certainly make improved capability available to cyber crime and state actors" — authoritative government-source confirmation of the paper's central threat premise.

§2.1: Government-authoritative confirmation of the AI capability democratization trajectory.

[17]Refusal in Language Models Is Mediated by a Single Direction PDF
Arditi et al. · arXiv:2406.11717, 2024

Identifies that a model's refusal behavior traces to a single identifiable direction in its internal activations. Removing that direction (abliteration) leaves the model unable to refuse harmful requests, with no retraining required.

§2.1: Mechanistic basis for Approach 3 (abliteration) — the specific internal structure targeted by abliteration tools like Heretic.

[18]QLoRA: Efficient Finetuning of Quantized LLMs PDF
Dettmers et al. · arXiv:2305.14314, 2023

Introduces QLoRA, the memory-efficient fine-tuning technique that makes training uncensored models affordable on commodity cloud hardware — the enabling technology for Approach 2.

§2.1: QLoRA as the technique making uncensored fine-tuning affordable on commodity hardware.

[19]H-CoT: Hijacking the Chain-of-Thought Safety Reasoning Mechanism to Jailbreak Large Reasoning Models PDF
Kuo, Zhang, Ding, Wang, DiValentin, Bao, Wei, Li, Chen · arXiv:2502.12893, February 2025

Adversarial benchmark measuring baseline harmful compliance rates (no attack applied): DeepSeek-R1 79.2%, o1 0.8%, o3-mini 8.4% — demonstrating a large safety architecture gap between open-weight and leading U.S. closed models.

§2.1 and §11 (Claim 6): DeepSeek-R1 79% baseline compliance vs. <1% for o1 — the safety architecture gap that makes uncensored deployment attractive. Note: verify figures in Tables 1 and 2 before submission.

[20]Trustworthy and Responsible AI Report (NIST AI 100-2e2025) PDF
National Institute of Standards and Technology · 2025

NIST framework report on AI trustworthiness; cited alongside [19] as authoritative context for the safety architecture comparison between open-weight and closed models.

§2.1 and §11 (Claim 6): Context for the safety comparison — cited alongside [19] for the DeepSeek vs. closed model baseline.

[21]Estimating Worst-Case Frontier Risks of Open-Weight LLMs PDF
Wallace, Watkins, Wang, Chen, Koch · arXiv:2508.03153, 2025

Frontier risk evaluation classifying DeepSeek-R1 as the highest-risk open-weight model tested in the cybersecurity domain.

§2.1 and §11 (Claim 6): DeepSeek-R1 classified as highest-risk open-weight model in the cybersecurity domain.


§ 2.2 — Cloud Provider Abuse: Historical Patterns

[14]Cryptojacking: Understanding and Defending Against Cloud Compute Resource Abuse WEB
Microsoft Security Blog · July 2023

Documents cryptojacking detection methods — hardware utilization signatures, mining pool traffic, covert mining detection through hardware-layer monitoring.

§2.2: Cryptojacking detection as contrast case — hardware-observable signals that AI inference workloads do not produce.

[15]MineGuard: Mitigating Covert Mining Operations in Clouds PDF
Vasiliadis et al. · RAID 2017

Technical analysis of covert mining detection in cloud environments — establishing that prior abuse classes produced hardware-layer observables that monitoring could catch.

§2.2: Covert mining detection — the observable signals (GPU utilization patterns) that distinguish mining from legitimate workloads, in contrast to AI inference.

[16]Navigating the Threat Landscape for Cloud-Based GPUs WEB
Trend Micro · 2024

Survey of the evolving threat landscape for cloud GPU infrastructure, documenting how attackers use GPU resources for both traditional abuse (mining) and emerging AI-enabled attacks.

§2.2: Broader cloud GPU threat landscape context alongside cryptojacking detection references.

[27]BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection PDF
Gu, Perdisci, Zhang, Lee · USENIX Security 2008

Demonstrated that botnets produce correlated, clustered network behavior — C2 communication leaves structural patterns in traffic volume and timing identifiable even when the underlying protocol varies.

§2.2: Network behavioral clustering as detection method for prior abuse classes — the prerequisite observable that AI inference does not produce.

[28]Understanding the Mirai Botnet PDF
Antonakakis, April, Bailey et al. · USENIX Security 2017

Definitive analysis of Mirai: DNS resolution patterns, network scanning signatures, and outbound C2 connection behavior enabled detection and takedown. 484 unique C2 IPs and 24 DNS clusters remained attributable across the campaign.

§2.2: Network-layer observables enabling botnet detection. §4.3: Infrastructure persistence — 484 C2 IPs attributable despite variant mutation. §7: Cross-provider intelligence (infrastructure visible only when data combined across vantage points).

[29]Detecting Stuffing of a User's Credentials at Her Own Accounts PDF
Wang, Reiter · USENIX Security 2020, pp. 2201–2218

Behavioral analysis of credential stuffing — automated authentication at scale generates traffic volume and timing anomalies measurably different from legitimate behavior.

§2.2: Credential stuffing behavioral detection as contrast case — traffic anomalies are the observable that AI inference does not produce.


§ 2.3 — Malware Detection Today

[22]Malware Detection with Artificial Intelligence: A Systematic Literature Review WEB
Gibert et al. · ACM Computing Surveys, 2024

Comprehensive survey of AI-based malware detection methods — static analysis, dynamic analysis, and hybrid approaches — providing the background for current detection capabilities and limitations.

§2.3: Background on static and dynamic malware detection approaches.

[23]Dynamic Malware Analysis in the Modern Era — A State of the Art Survey WEB
Ye et al. · ACM Computing Surveys, 2019

State-of-the-art survey of dynamic/behavioral malware analysis techniques — watching system calls, network connections, and file operations to identify malware at execution time.

§2.3: Behavioral (dynamic) analysis as the second primary detection approach.

[24]Measuring and Modeling the Label Dynamics of Online Anti-Malware Engines PDF
Zhu et al. · USENIX Security 2020

Demonstrates that novel malware is detected by only a small number of AV engines on first submission, with broader coverage arriving only after threat reports become public — illustrating the structural lag in signature-based detection.

§2.3: Baseline detection lag for novel malware — context for the 4–26% figure for LLM-generated samples.

[25]Automated Malware Source Code Generation via Uncensored LLMs and Adversarial Evasion of Censored Model PDF
Acosta-Bermejo, Terrazas-Chavez, Aguirre-Anaya · Applied Sciences (MDPI), 2025

Measured that LLM-generated malware (GPT-3.5-turbo) was detected by only 4–26% of VirusTotal engines on first submission — lower than baseline detection rates for conventional novel malware. Note: verify exact figures before submission.

§2.3 and §4.1: The 4–26% VirusTotal detection figure — primary empirical evidence for the signature detection gap.

[26]EMBER: An Open Dataset for Training Static PE Malware Machine Learning Models PDF
Anderson, Roth · arXiv:1804.04637, 2018

Introduces the EMBER dataset — the standard open research dataset for malware classification. Cited as infrastructure for malware classification research that does not yet contain validated AI-generated samples.

§2.3: EMBER as existing corpus infrastructure — named as part of the research gap (no validated AI-generated sample set).


§ 2.4 — Regulatory Background

[30]18 U.S.C. § 1030 — Computer Fraud and Abuse Act LAW
U.S. Code · Cornell LII

Primary U.S. statute prohibiting unauthorized computer access and malicious code distribution. Drafted before cloud infrastructure existed as a distinct category; applicability to cloud providers generating malware is unsettled.

§2.4: CFAA framework and its gaps. §9: Good-faith security research safe harbor for corpus-building researchers.

[31]Reevaluating the CFAA: Amending the Statute to Explicitly Address the Cloud PDF
Gottlieb · Fordham Law Review, Vol. 86, No. 2, 2017

Legal scholarship examining whether the CFAA's authorization framework — designed around direct perpetrators — applies to cloud infrastructure providers. Establishes that the statute's applicability to cloud scenarios is genuinely contested.

§2.4: Scholarly basis for the claim that CFAA applicability to cloud provider scenarios is unsettled.

[32]NIS2 Directive — Directive (EU) 2022/2555 LAW
European Parliament and Council · EUR-Lex, 2022

Imposes cybersecurity risk management (Article 21) and incident reporting (Article 23, 24/72-hour windows) on operators of essential services including cloud providers. The detection gap prevents Article 23 from activating for AI malware generation.

§2.4 and §6.3: NIS2 reporting obligations — and why the detection gap means they don't activate for AI malware generation on cloud infrastructure.

[33]EU Artificial Intelligence Act — Regulation (EU) 2024/1689 LAW
European Parliament and Council · EUR-Lex, 2024

Three-tier structure (provider/deployer/user) with obligations on GPAI model providers (Article 53) and systemic-risk models (Article 55). Cloud GPU providers hosting third-party models occupy an ambiguous third category the Act doesn't explicitly address.

§2.4 and §6.2: EU AI Act three-tier structure — cloud GPU providers as an unaddressed third category between provider and deployer.

[34]Digital Services Act — Regulation (EU) 2022/2065 LAW
European Parliament and Council · EUR-Lex, 2022

Notice-and-action intermediary liability framework — providers are conditionally exempt from liability for illegal activity, provided they act promptly when reported. The detection gap prevents the report that would trigger action.

§2.4 and §6.1: DSA as the EU intermediary liability framework — how the detection gap prevents its activation.

[35]17 U.S.C. § 512 — Digital Millennium Copyright Act LAW
U.S. Code · Cornell LII

Safe harbor provision: providers exempt from liability for user content if they act promptly on notice and lack "red flag knowledge." The detection gap prevents notice; the question of whether general awareness of AI malware generation constitutes red flag knowledge is unresolved.

§2.4 and §6.1: DMCA safe harbor — the "red flag knowledge" question as AI malware generation becomes a documented threat class.

[36]From safe harbours to AI harbours: reimagining DMCA immunity for the generative AI era PAYWALL
Lin, Guan · Journal of Intellectual Property Law & Practice, Vol. 20 No. 9, pp. 605–616, 2025

Legal scholarship identifying the generate-not-distribute scenario as a category neither DMCA nor DSA was designed to address, and the tension between explicit ToS prohibitions and the implied-duty-to-detect question.

§2.4 and §6.1: Scholarly basis for the claim that the generate-not-distribute gap is a recognized unaddressed legal category.

[37]Computer Misuse Act 1990 LAW
UK Parliament · c. 18, 1990

UK statute criminalizing unauthorized access and system impairment. Like the CFAA, drafted before cloud infrastructure existed; no authoritative published analysis addresses provider liability for AI-generated malware production.

§2.4: UK legal framework — the CMA's scope and the absence of authoritative analysis for the cloud provider scenario.

[38]ENISA Threat Landscape 2025 PDF
European Union Agency for Cybersecurity · November 2025

Documents the emergence of purpose-built malicious AI systems — dedicated malware-generation tools, jailbroken commercial models, and AI-enhanced attack infrastructure — as an active and growing threat category recognized by an authoritative body.

§2.4: Authoritative recognition that AI malware generation is a real and current threat class (not a theoretical concern).

[39]MITRE ATLAS: Adversarial Threat Landscape for AI Systems v5.1.0 WEB
MITRE Corporation · November 2025

Adversarial ML threat taxonomy mapping how AI-enabled attack techniques connect to attack patterns defenders encounter. Provides the structured taxonomy for the threat this paper describes.

§2.4: Authoritative taxonomy establishing that the threat this paper describes is mapped and recognized (alongside ENISA).


§ 4.1 — AI-Generated Code Fingerprints

[40]De-anonymizing Programmers via Code Stylometry PDF
Caliskan-Islam, Yamaguchi, Schott, Rieck · USENIX Security 2015

Demonstrated that 120,000 lexical, syntactic, and layout features extracted from ASTs identify programmers with 94–98% accuracy. The style features distinguishing human programmers from each other are the same features likely to distinguish LLM-generated code from human-written malware.

§4.1: Foundation for code fingerprinting hypothesis — and scale advantage argument (LLMs can fine-tune on attributed corpora to generate false-flag samples at scale).

[41]Droid: A Resource Suite for AI-Generated Code Detection PDF
Orel, Paul, Gurevych, Nakov · EMNLP 2025

Finds that LLM-generated code exhibits statistically distinctive structural properties — characteristic distributions of cyclomatic complexity, nesting depth, and function length — reliably different from human-authored code.

§4.1: Empirical basis for the structural fingerprinting hypothesis — AI-generated code has measurably different structural properties.

[42]A Watermark for Large Language Models PDF
Kirchenbauer, Geiping, Belinkov, Goldstein · arXiv:2301.10226, 2023

Proposes statistical watermarking for LLM outputs by manipulating token selection probabilities. Cited as part of the AI output detection literature alongside DetectGPT and GLTR.

§4.1: AI code detection literature — statistical approaches to identifying LLM-generated output.

[43]PackGenome: Automatically Generating Robust YARA Rules for Accurate Malware Packer Detection PDF
Ming et al. · CCS 2023

Shows that obfuscation/packing tools leave packer-specific instruction patterns identifiable across samples from the same tool. Post-processing defeats LLM fingerprints but introduces new ones — the tool's fingerprint replaces the model's.

§4.1: Obfuscation leaves its own fingerprint — the cat-and-mouse argument for why stripping LLM artifacts introduces a different detectable layer.

[44]Misleading Authorship Attribution of Source Code using Adversarial Learning PDF
Quiring, Maier, Rieck · USENIX Security 2019

Confirms that while deliberate style transformation can reduce authorship attribution accuracy from 88% to near-zero, the transformations required are themselves detectable as artificial. Applied to AI fine-tuning: evasion is possible but the evasion is detectable.

§4.1: Adversarial stylometry — transformations that defeat fingerprinting are themselves detectable. §4.2: Fine-tuning against classifier signals erodes structural detectability over time.

[45]Olympic Destroyer is Here to Stay WEB
Kaspersky Lab Global Research and Analysis Team · 2018

Analysis of the 2018 Pyeongchang Winter Olympics attack: attackers embedded Lazarus Group Rich Header signatures, successfully misleading initial attribution by multiple threat intelligence teams. Establishes false-flag via code artifact manipulation as a real historical precedent.

§4.1: Olympic Destroyer as the precedent for LLM-based false-flag mimicry — demonstrates the technique works and misleads major threat intelligence teams.

[46]A Comprehensive Survey of Advanced Persistent Threat Attribution: Taxonomy, Methods, Challenges and Open Research Problems PDF
Rani, Saha, Shukla · arXiv:2409.11415, 2024

Surveys APT attribution methods and challenges — the basis for recommending multi-signal frameworks (infrastructure, TTPs, timing, targeting) over code style alone as AI false-flag capability matures.

§4.1: Feedback loop conclusion — weight multi-signal attribution frameworks more heavily than code style, because the detection/mimicry feedback loop degrades stylometric reliability.

[51]DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature PDF
Mitchell, Lee, Khazatsky, Manning, Potts · ACL 2023

Demonstrates perplexity-based detection of AI-generated text — LLMs select high-probability tokens at each step, making their output statistically more predictable (lower perplexity) than human-written text. The same principle applies to code.

§4.1: Perplexity-based detection — the established mechanism for low-statistical-perplexity as an AI-generated code fingerprint.

[52]GLTR: Statistical Detection and Visualization of Generated Text PDF
Gehrmann, Strobelt, Rush · ACL 2019

Token-level probability distribution analysis for detecting AI-generated text — visualizes and quantifies how LLM outputs cluster toward high-probability tokens in a way human writing does not.

§4.1: Second perplexity/statistical detection reference — alongside DetectGPT, establishes the detection mechanism as well-validated for AI-generated natural language.


§ 4.2 — Behavioral Signals in Execution

[47]Effective and Efficient Malware Detection at the End Host PDF
Kolbitsch, Milani Comparetti, Kruegel, Kirda, Zhou, Wang · USENIX Security 2009

Foundational paper on API call sequence analysis: system call patterns and the information flows between them are the primary signals sandboxes use, and they cannot easily be defeated by simple obfuscation. That resistance holds only against malware that has not been deliberately tuned to defeat it.

§4.2: Foundational basis for behavioral detection hypothesis — API call sequences are structural and obfuscation-resistant for non-tuned malware.

[48]Malware Dynamic Analysis Evasion Techniques: A Survey PDF
Afianian, Niksefat, Sadeghiyan, Baptiste · ACM Computing Surveys, 2019

Documents specific sandbox evasion countermeasures used by hardened malware: timing checks against sandbox clock artifacts, API call reordering, benign call insertion, silent failure modes. The inverse of this catalog is the detection opportunity for AI-generated malware that hasn't implemented these countermeasures.

§4.2: The evasion catalog whose absence in AI-generated malware is the detection opportunity.

[49]Does Every Second Count? Time-based Evolution of Malware Behavior in Sandboxes PDF
Küchler, Mantovani, Han, Bilge, Balzarotti · NDSS 2021

Demonstrates that temporal execution patterns are measurable and consistent enough to train reliable behavioral classifiers. Predictable sleep intervals vs. randomized jitter is a detectable signal.

§4.2: Timing regularity as a specific detectable behavioral signal in AI-generated malware (which lacks deliberate jitter).

[50]ATT&CK T1497.003 — Virtualization/Sandbox Evasion: Time Based Evasion WEB
MITRE ATT&CK

MITRE documents time-based sandbox evasion as a recognized adversary capability — confirming that timing is a detection signal worth evading, and therefore its absence in AI-generated malware is a meaningful gap.

§4.2: MITRE confirmation that timing evasion is a deliberate, documented adversary technique — its absence in AI-generated malware is therefore a structural gap.


§ 4.3 — Infrastructure Fingerprints

[53]From Prompt to Payload: LAMEHUG's LLM-Driven Cyber Intrusion WEB
Splunk Security Research · July 2025

Analysis of LAMEHUG, a Python infostealer linked to APT28 that queried a Qwen model through the Hugging Face API in real time to generate reconnaissance and exfiltration commands. Establishes LLM-as-C2 as a confirmed in-the-wild technique.

§4.3: LAMEHUG (APT28) as the LLM-as-C2 case: traffic to AI provider endpoints as a network-observable detection signal.

[54]The 2026 State of AI Traffic & Cyberthreat Benchmark Report WEB
HUMAN Security · March 2026

Reports 7,851% year-over-year growth in agentic AI traffic in 2025. Agentic traffic is distinguishable from human-operator traffic at the network layer — request rates, timing regularity, session behavior patterns.

§4.3: Agentic traffic volume anomalies as a detection signal — the 7,851% figure establishes the scale that makes agentic traffic behaviorally distinct.

[55]The Diamond Model of Intrusion Analysis PDF
Caltagirone, Pendergast, Betz · 2013

Establishes infrastructure as a persistent, independently trackable attribute of an adversary campaign — formally decoupling it from the specific capabilities deployed. Infrastructure is the invariant that code polymorphism cannot erase.

§4.3: Formal theoretical basis for infrastructure persistence as a detection layer independent of code mutation.

[56]Insights from One Year of Tracking a Polymorphic Threat WEB
Microsoft Security Response Center · November 2019

Year-long tracking of Dexphot: code updated every 90–110 minutes while approximately 200 C2 domains remained consistent. Empirically demonstrates that polymorphic code mutation does not imply infrastructure rotation; the infrastructure is the invariant.

§4.3: Dexphot as empirical case for infrastructure persistence under polymorphic code — alongside Mirai [28], the two concrete examples anchoring the Diamond Model argument.
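
A minimal sketch of the infrastructure-persistence idea behind [55] and [56], assuming the defender has (sample hash, contacted domain) pairs from sandbox or network telemetry. The function name, the input shape, and the min_hashes cutoff are hypothetical; the example domains use the reserved .test suffix. The sketch groups distinct sample hashes by the domain they contact, so a domain reached by many otherwise unrelated hashes stands out even when every binary is unique.

```python
from collections import defaultdict


def persistent_infrastructure(observations, min_hashes=3):
    """Return domains contacted by at least min_hashes distinct samples.

    observations: iterable of (sample_hash, domain) pairs (assumed format).
    """
    hashes_by_domain = defaultdict(set)
    for sample_hash, domain in observations:
        # Normalize case so the same domain is not counted twice.
        hashes_by_domain[domain.lower()].add(sample_hash)
    return {
        domain: len(hashes)
        for domain, hashes in hashes_by_domain.items()
        if len(hashes) >= min_hashes
    }


observations = [
    ("hash-a", "c2.example.test"),
    ("hash-b", "C2.example.test"),
    ("hash-c", "c2.example.test"),
    ("hash-c", "cdn.example.test"),
]
print(persistent_infrastructure(observations))
```

A real pipeline would also need to exclude widely shared benign domains, which many unrelated binaries contact for ordinary reasons.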


§ 5 — What Providers Can Do

[57]RunPod — Terms of Service WEB
RunPod · Reviewed May 2026

Accepts cryptocurrency payments (removing payment identity traceability). Uses transmission-focused language — prohibiting "uploading or transmitting" malicious software without explicitly addressing creation. One of two platforms with a genuine language gap.

§5.1: Crypto payment as anonymous-account enabler. §5.3: Ambiguous creation-vs-transmission language — the gap requiring explicit revision.

[58]Lambda Labs — Terms of Service WEB
Lambda Labs · Reviewed May 2026

Highest account-creation barrier of any consumer-facing GPU provider reviewed: phone verification plus a pre-authorization charge. Applied at signup, not as a function of subsequent workload — no usage-tier escalation exists.

§5.1: Lambda as the high-bar identity verification case — and the structural gap (verification at signup, not at the usage tier that matters).

[59]Vast.ai — Terms of Service WEB
Vast.ai · Reviewed May 2026

Accepts cryptocurrency. Verification Stages system applies to GPU sellers, not buyers. Transmission-focused prohibition language — one of two platforms with a genuine language gap alongside RunPod.

§5.1: Crypto payment and seller-only verification. §5.3: Ambiguous creation language requiring revision.

[60]CoreWeave — Acceptable Use Policy WEB
CoreWeave · Reviewed May 2026

Prohibits "content, software, or any other technology that may damage or interfere… including viruses, malware, spyware" — explicit creation language, not just transmission. One of the providers whose AUP already prohibits generation.

§5.3: CoreWeave as example of creation-prohibiting language — the enforcement gap is the issue, not the language gap.

[61]DigitalOcean — Acceptable Use Policy (governing Paperspace) WEB
DigitalOcean · Reviewed May 2026

Most explicit of all providers reviewed: prohibits "generate, deploy, or distribute" harmful AI outputs — the only AUP explicitly addressing AI-generated content as a distinct category.

§5.3: DigitalOcean/Paperspace as the strongest existing AUP language — explicit AI-generated content prohibition.

[62]Together.ai — Terms of Service WEB
Together.ai · Reviewed May 2026

Prohibits introducing "any virus, worm, Trojan horse, malware, or other malicious code" through its API — creation language framed as introduction rather than distribution.

§5.3: Together.ai as provider with existing malware creation prohibition.

[63]Replicate — Acceptable Use Policy WEB
Replicate · Reviewed May 2026

Defines a "Harmful Code" category explicitly covering unauthorized access technology and prohibiting its creation.

§5.3: Replicate as provider with explicit "Harmful Code" creation prohibition.


§ 6 — Regulatory and Legal Implications

[64]Digital Operational Resilience Act — Regulation (EU) 2022/2554 LAW
European Parliament and Council · EUR-Lex, 2022

Applies to EU financial entities and their critical ICT third-party providers. Creates a reporting chain that could extend to GPU providers — but only if they have formal ICT third-party relationships with regulated financial entities, which most do not.

§6.3: DORA reporting chain — the specialized scenario where cloud provider obligations could extend to GPU providers, and why it rarely applies in practice.


§ 7 — What Defenders Can Do Today

[65]Avalanche Network Dismantled in International Cyber Operation WEB
Europol · November 30, 2016

Operation Avalanche: coordinated action across 40 countries, multiple hosting providers, and dozens of domain registrars to dismantle crimeware infrastructure that no individual provider could see in full. The precedent for cross-provider intelligence sharing that enables takedowns.

§7: Cross-provider threat intelligence sharing — Avalanche as the proof-of-concept that coordinated cross-provider action works for distributed infrastructure.

[66]About FS-ISAC WEB
Financial Services Information Sharing and Analysis Center

FS-ISAC enables competing financial institutions to share fraud signals and payment fingerprints because abuse infrastructure moves faster than any individual institution's detection capability — the sector model for what GPU provider cross-sharing could look like.

§7: FS-ISAC as the institutional model for cross-provider fraud/abuse signal sharing among competitors.


§ 9 — Reproducibility

[67]The Menlo Report: Ethical Principles Guiding Information and Communication Technology Research PDF
Kenneally, Bailey · U.S. Department of Homeland Security, 2012

The standard ethics framework for ICT security research — adapts the Belmont principles (respect for persons, beneficence, justice) and adds a fourth: respect for law and public interest. Required by most major security venues (IEEE S&P, USENIX, CCS, NDSS) at submission.

§9: Ethics and legal review guidance for researchers building the AI-generated malware corpus.


§ 10 — Recommendations

[68]DARPA Cyber Grand Challenge WEB
Defense Advanced Research Projects Agency · 2016

Government-sponsored competition involving purpose-built vulnerable software (Challenge Binaries) developed under controlled conditions for autonomous security research — the precedent for government-funded controlled security research where the funding structure absorbs the liability exposure private researchers cannot carry.

§10: Policymaker recommendation — DARPA CGC as the precedent for a government-funded AI-generated malware corpus program.

[69]National Software Reference Library (NSRL) WEB
National Institute of Standards and Technology

Government-maintained curated hash set used in digital forensics. It is the model for what a government-maintained security reference dataset looks like, and it demonstrates that private industry relies on such datasets but cannot credibly produce them independently.

§10: NIST NSRL as the reference corpus model — what a government-maintained AI-generated malware corpus could look like institutionally.

[70]Secure and Trustworthy Cyberspace (SaTC) WEB
National Science Foundation

NSF's primary funding program for academic cybersecurity research — the most likely existing funding vehicle for the academic research needed to build a validated AI-generated malware corpus at the required scale.

§10: NSF SaTC as the existing funding mechanism — how the corpus research this paper calls for could be funded without new legislative authority.

[71]What's In America's Code? There are major risks with allowing Chinese LLMs to code for U.S. applications PDF
Booz Allen Hamilton · May 2026

Vendor research report. Booz Allen's AI-native test platform evaluated five frontier code-generation models — four Chinese (Qwen3-Coder, MiniMax M2.5, Kimi K2.5, DeepSeek V4-Pro) and one American (Claude Opus 4.6) — across >2,800 trials and ~460,000 lines of generated code. Three of four Chinese models produced more vulnerable, obfuscated code when the user presented as U.S. government (Qwen3-Coder +130%; Kimi K2.5 the exception, −18%); all four refused PRC-sensitive coding tasks. Note: vendor report, not peer-reviewed; the ban/buy-American recommendation aligns with Booz Allen's position as a U.S. government contractor — treat the magnitudes as directional pending independent replication.

§2.1: Corroborates the open-weight Chinese-model risk picture — extends the DeepSeek-R1 refusal/compliance benchmark [19, 20, 21] from "easier to misuse" to "produces measurably worse code," and adds the persona-conditioning finding (worse code for a U.S.-government user).


About the author: Jay Hawkins spent twenty years in the U.S. Army, including a decade in cyber operations — serving at USCYBERCOM, USCENTCOM, USNORTHCOM, and USEUCOM — and holds an active TS/SCI clearance. He builds local-first AI security tools and writes about the methodology, the hard lessons, and the compliance implications of doing it in production. CEH, CHFI, Pentest+, Security+.



Centaur Security Labs — centaursecuritylabs.com