Online anonymity is facing a sharper threat as new research suggests large language models can now identify pseudonymous internet users at scale by combining fragments of public writing with broader online data. The warning comes from a February 2026 preprint by Simon Lermen and co-authors at ETH Zurich, Anthropic and the Machine Learning Alignment and Theory Scholars programme, which argues that the “practical obscurity” that once protected many users online is eroding fast.
The study tested whether AI systems could do what determined human investigators have long attempted: connect a pseudonymous profile to a real person by piecing together clues buried in posts, comments and biographical traces. According to the paper, the researchers built an attack pipeline that extracts identity-relevant details from free text, searches for candidates through embeddings, and then uses model reasoning to verify likely matches. Across three datasets, the LLM-based approach achieved as much as 68% recall at 90% precision, meaning it correctly identified up to 68% of the profiles that had a true match while at least nine in ten of the identifications it made were right. Older non-LLM methods had recall close to zero under the same conditions.
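For readers unfamiliar with the headline metric, the sketch below shows one common way to compute "recall at 90% precision" from a list of scored guesses. It is an illustration only, not the authors' evaluation code: the function name `recall_at_precision`, the toy scores and the figure of 20 true matches are all assumptions made up for this example.

```python
from typing import List, Tuple


def recall_at_precision(
    guesses: List[Tuple[float, bool]],
    n_true_matches: int,
    min_precision: float = 0.90,
) -> float:
    """Best recall achievable while keeping precision >= min_precision.

    guesses: one (confidence, is_correct) pair per identification the
        system is willing to make (hypothetical input format).
    n_true_matches: how many targets actually have a true match in the
        candidate pool, whether or not the system guessed them.
    """
    if n_true_matches <= 0:
        raise ValueError("n_true_matches must be positive")

    # Most confident guesses first; each prefix of this ranking is what
    # the system would output at some confidence cut-off.
    # Tied scores are not grouped in this simplified sketch.
    ranked = sorted(guesses, key=lambda g: g[0], reverse=True)

    best_recall = 0.0
    correct = 0
    for made, (_, is_correct) in enumerate(ranked, start=1):
        correct += int(is_correct)
        precision = correct / made  # share of the guesses made that are right
        if precision >= min_precision:
            best_recall = max(best_recall, correct / n_true_matches)
    return best_recall


# Toy data (invented): 12 guesses, 20 targets with a true match.
toy_guesses = [
    (0.99, True), (0.97, True), (0.95, True), (0.90, True),
    (0.85, True), (0.80, True), (0.75, True), (0.70, True),
    (0.65, True), (0.60, False), (0.55, True), (0.40, False),
]
print(recall_at_precision(toy_guesses, n_true_matches=20))  # prints 0.5
```

In this toy case the system can keep its 11 most confident guesses, 10 of them correct, before precision drops below 90%, so it recovers 10 of the 20 true matches. The metric rewards a system that knows when to abstain: a wrong identification counts against precision, while declining to guess only costs recall.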
What makes the findings more unsettling is not only accuracy but cost. Reporting on the study, The Verge said the team spent less than $2,000 overall, working out to about $1 to $4 per profile. Simon Lermen told the publication that “the economics are totally different now”, underscoring how cheap automated re-identification could become as model costs fall further. The paper itself argues that the capability may scale beyond laboratory-sized pools: in a coarse extrapolation, the authors estimated that an LLM-based attack could still produce about 35% recall at 90% precision against a pool of one million candidates, while also showing some success even when only one in 10,000 targets had a true match.
That shift matters because pseudonymity has long depended less on perfect secrecy than on friction. Older deanonymisation attacks, including the landmark Netflix Prize work by Arvind Narayanan and Vitaly Shmatikov, showed years ago that supposedly anonymous data could be linked back to individuals when matched with auxiliary information. What changes in the AI era is that the matching can be performed on raw, messy language rather than neatly structured records, and it can be automated across far more users with far less labour. The Lermen paper explicitly places itself in that lineage, saying earlier attacks depended on structured micro-data while the newer method works directly on ordinary user content across platforms.
The potential fallout stretches well beyond embarrassing burner accounts. In the paper’s discussion section, the authors warn that governments could link pseudonymous accounts to real identities for surveillance of dissidents, journalists and activists; corporations could connect anonymous forum posts to customer files for hyper-targeted advertising; and attackers could build profiles for personalised fraud or social engineering. The Verge report echoed that concern, noting the risk to journalists, dissidents and whistleblowers who rely on pseudonyms as a practical shield rather than a perfect one.
Data brokers could become the force multiplier. The architecture described in the study is based on public information and inferred traits, but commercial broker ecosystems already hold enormous volumes of offline and online personal data. The US Federal Trade Commission has warned for years that brokers assemble detailed consumer profiles from many sources, and it reiterated in February 2026 that sensitive categories of data sold or transferred by brokers can include health, financial, biometric, geolocation and other highly personal information. California has also opened its DROP system, a state platform through which, officials say, a single verified request can reach more than 500 registered data brokers. That figure is a sign both of the sector's scale and of mounting official concern over its reach.
Even so, the findings should not be overstated. The paper is a preprint, not a peer-reviewed journal study, and outside experts quoted by The Verge cautioned that curated test environments do not map neatly onto the open internet. Luc Rocher of the Oxford Internet Institute said privacy is not dead, noting that important anonymous figures remain unidentified and that secure tools still protect many communications. The researchers themselves said they did not test the system on actual unsuspecting pseudonymous users and withheld full technical details for ethical reasons. That restraint shows how seriously the authors take the risk, and it is also a reminder to policymakers not to confuse a strong demonstration with proof that every anonymous account can now be exposed on demand.