The activity was presented as a red team project, but the discovered framework pointed to stealthy post-exploitation operations, including ransomware deployment and data theft. The case shows how attackers are adapting tools used by legitimate developers and security testers to accelerate malware engineering, automate testing and identify weak points in enterprise defences.
The investigation began after an anomalous endpoint inside a customer environment triggered alerts linked to malicious payloads stored in a test directory. The files led analysts to a broader framework containing Cobalt Strike profiles designed to disguise beacon traffic as normal web requests, a Telegram-based command-and-control channel, Python scripts for shellcode injection and a Cloudflare Worker used to hide backend infrastructure.
Several Python scripts found in the environment appeared to have been partly generated with AI assistance. Many were written in Russian, though attribution remains unclear. The wider repository contained two main components: an automated Active Directory discovery panel and a malware-testing lab that evaluated payloads against endpoint security tools from Sophos, CrowdStrike and Microsoft Defender.
The Active Directory component did not appear to be a fully autonomous large language model conducting independent operations. Instead, it collected results from completed tasks, selected follow-up actions from predefined workflows, dispatched tasks to remote agents and then reassessed results. That distinction is important because the case points less to self-directed AI malware and more to human-directed automation using AI as a productivity layer.
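The difference between predefined workflows and autonomous decision-making can be made concrete with a minimal sketch. It is not the framework's code: the task names, the `WORKFLOW` table and the `run` function are hypothetical placeholders, chosen only to show that every follow-up action comes from a fixed lookup rather than from a model deciding freely.

```python
from dataclasses import dataclass

# Hypothetical rule table (an assumption for illustration, not from the report):
# (completed task, outcome) -> follow-up tasks fixed in advance by a human.
WORKFLOW = {
    ("inventory", "ok"): ["summarise"],
    ("inventory", "error"): ["inventory"],  # simple retry rule
    ("summarise", "ok"): [],
}


@dataclass
class Result:
    task: str
    status: str


def next_tasks(result, workflow=WORKFLOW):
    """Select follow-up tasks purely by lookup; unknown outcomes yield nothing."""
    return list(workflow.get((result.task, result.status), []))


def run(initial, execute, max_steps=10):
    """Dispatch a task, collect its result, consult the table, and repeat."""
    queue = [initial]
    history = []
    while queue and len(history) < max_steps:
        task = queue.pop(0)
        result = execute(task)  # stands in for dispatching to a remote agent
        history.append(result)
        queue.extend(next_tasks(result))  # reassess using the fixed rules only
    return history
```

In a loop of this shape, nothing happens that a human did not write into the table beforehand, which is why such a design reads as automation rather than self-directed AI.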
The attacker’s testing environment used multiple Windows Server 2022 virtual machines. One machine was configured for testing against Sophos protection, another against CrowdStrike, and a third operated as a control system without endpoint detection and response (EDR) software. A fourth system, running Ubuntu, hosted a Sliver post-exploitation command-and-control server. The structure mirrored a professional security lab, but the tooling and surrounding artefacts indicated offensive intent.
AI agents were assigned roles inside the framework. One Claude Opus 4.5 agent handled core coordination and rules for other agents, while additional agents were tasked with EDR testing, documentation, operational-security hardening, proxy stress testing and virtual machine deployment. The workflow used Model Context Protocol, an open standard that allows AI assistants to connect with external tools and data sources, to link agent output with Git repositories.
The attacker also used Cursor, an AI-native integrated development environment, during the software development process, and Ludus, a platform used to deploy virtualised security-testing environments, to provision the lab. These tools are not inherently malicious; their use in this case illustrates how dual-use technologies can be repurposed when guardrails are bypassed or misled by claims of legitimate red team work.
A modular Windows payload loader generator sat at the centre of the framework. It produced custom executables or DLLs by wrapping raw payloads in layers of encryption, evasion and alternative execution techniques. The tool was designed to generate payloads based on evasion methods specified through command-line options, allowing the operator to test different bypass strategies systematically.
Nearly 80 modules covering more than 70 techniques were developed for the platform. The agents’ own reports claimed that early attempts had a high failure rate but later iterations achieved broad success against the tested EDR agents. Analysts, however, found that the framework’s documented output did not fully support those claims, leaving uncertainty over whether the attacker overstated success or whether some evidence was missing.
The repository also showed that the attacker drew from public security research, including material on adversary simulation, detection bypasses and post-exploitation tradecraft. AI agents were instructed to read technical articles, extract techniques, map them to MITRE ATT&CK categories, prepare lab environments, execute tests and report findings. That pattern reflects a growing risk for defenders: public research intended to improve security can be rapidly converted into attack playbooks when combined with automation.
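Defenders can apply the same mapping step to their own coverage tracking. The sketch below is a simplified, defender-side illustration and not a reconstruction of the attacker's tooling: the keyword table and the `tag_techniques` function are assumptions, and the pairing of each phrase with a MITRE ATT&CK technique ID is an illustrative judgement rather than something stated in the investigation.

```python
# Illustrative keyword table (an assumption): phrases from an incident write-up
# paired with MITRE ATT&CK technique IDs and names.
ATTACK_MAP = {
    "shellcode injection": ("T1055", "Process Injection"),
    "web requests": ("T1071.001", "Application Layer Protocol: Web Protocols"),
    "telegram": ("T1102", "Web Service"),
    "ransomware": ("T1486", "Data Encrypted for Impact"),
    "active directory discovery": ("T1087.002", "Account Discovery: Domain Account"),
}


def tag_techniques(text):
    """Return the sorted ATT&CK IDs whose keyword appears in the text."""
    lowered = text.lower()
    return sorted(
        {tid for keyword, (tid, _name) in ATTACK_MAP.items() if keyword in lowered}
    )


if __name__ == "__main__":
    report = "Python scripts for shellcode injection and a Telegram-based channel"
    for technique_id in tag_techniques(report):
        print(technique_id)
```

Plain keyword matching is deliberately crude. A real coverage exercise would rely on analyst review, but even this form shows how quickly prose descriptions can be turned into a structured technique list.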