The rollout covers three models under the GPT-5.6 family: Sol, the most advanced tier; Terra, a lower-cost model designed for everyday work; and Luna, a faster and cheaper option for high-volume use. Access during the preview is limited to selected organisations through the API and Codex, with ChatGPT users excluded until a broader launch expected in the coming weeks.
The restricted release marks one of the clearest signs yet that advanced AI models are moving into a more sensitive regulatory category, particularly when they show stronger ability to assist with vulnerability research, exploitation analysis and long-horizon software tasks. OpenAI has said the initial group of partners was chosen after consultation with the US government, and that participation details were shared with authorities ahead of wider access.
The company has framed the step as temporary, arguing that prolonged government-controlled access should not become the standard route for deploying powerful AI systems. Its position reflects a wider tension in Washington and Silicon Valley: security agencies want time to assess dual-use capabilities, while developers and enterprises argue that delaying access can also slow defensive work.
GPT-5.6 Sol is being promoted as OpenAI’s strongest model to date. It introduces a “max” reasoning setting intended to give the model more time to work through difficult problems, and an “ultra” mode that uses sub-agents to divide complex tasks. These features are aimed at software engineering, scientific research and security operations where multi-step planning and tool use are increasingly central.
OpenAI says Sol has set a new high mark on Terminal-Bench 2.1, a test of command-line workflows involving planning, iteration and tool coordination. It has also shown gains in biology-related workflows on GeneBench v1, which measures long-horizon genomics and quantitative biology tasks. Those claims will draw close scrutiny because frontier models are now being judged not only on accuracy and speed, but also on how well they handle sensitive domains without enabling misuse.
Cybersecurity is the area drawing the most attention. Sol is described as OpenAI’s most capable model yet for cyber work, with improvements in vulnerability research and exploitation-related tasks. On ExploitBench, the model is competitive with Anthropic’s Mythos Preview while using about a third of the output tokens. On ExploitGym, a benchmark developed with academic and industry participation, the GPT-5.6 family shows stronger performance as reasoning effort rises.
The core safety argument from OpenAI is that Sol is better at helping users find and fix software flaws than at reliably carrying out end-to-end attacks. The company says the model does not cross the “Cyber Critical” threshold under its preparedness framework. In tests involving Chromium and Firefox, it found bugs and exploitation primitives but did not autonomously produce a verified full-chain exploit under the tested conditions.
That distinction is important but not conclusive. Security researchers have long warned that the same tools used to audit code, reproduce bugs and generate patches can also reduce the cost of offensive operations. The risk increases when models are linked to browsing, coding environments, scanners, exploit databases or other automated tools. Even if a model cannot independently complete an attack in a laboratory test, it may still improve parts of an adversary’s workflow.
OpenAI has added layered safeguards across the GPT-5.6 family. These include model-level refusal behaviour, real-time checks during generation, account-level monitoring, differentiated access and enforcement measures for misuse. The company has also said some biological and cybersecurity requests may be blocked or delayed while additional checks run, an approach that could affect legitimate work by security teams during the preview.
The system card for the release says safeguard testing has involved expert review, external testing and large-scale automated red-teaming. OpenAI has devoted more than 700,000 A100e GPU hours to automated efforts to find universal jailbreaks and plans to continue red-team testing during deployment. The scale of that testing underlines how frontier AI releases are becoming security operations in their own right, not just product launches.
The pricing structure points to a broader commercial strategy once restrictions ease. GPT-5.6 Sol is priced at $5 per million input tokens and $30 per million output tokens, Terra at $2.50 and $15 respectively, and Luna at $1 and $6. Prompt caching carries a 30-minute minimum cache life, with cache writes billed above the uncached input rate and cache reads receiving a discount.
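To make the per-token rates concrete, the following is a minimal sketch of how a request's cost could be estimated from the quoted prices. The tier labels "sol", "terra" and "luna" and the function name estimate_cost are illustrative assumptions, not API identifiers. Prompt caching is left out because the article does not give the cache write or read rates.

```python
# Quoted prices in USD per million tokens: (input, output).
# The dictionary keys are illustrative labels, not official model identifiers.
PRICES_PER_MILLION = {
    "sol": (5.00, 30.00),
    "terra": (2.50, 15.00),
    "luna": (1.00, 6.00),
}


def estimate_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated uncached cost in USD for one request."""
    input_rate, output_rate = PRICES_PER_MILLION[tier]
    total = input_tokens * input_rate + output_tokens * output_rate
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 200,000 input tokens and 50,000 output tokens.
    for tier in PRICES_PER_MILLION:
        print(f"{tier}: ${estimate_cost(tier, 200_000, 50_000):.2f}")
```

At these rates, output tokens cost six times as much as input tokens in every tier, so workloads that generate long responses are dominated by the output charge.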
OpenAI also plans to launch GPT-5.6 Sol on Cerebras in July, offering speeds of up to 750 tokens per second for selected customers as capacity expands. That partnership highlights another emerging trend: model capability is increasingly being paired with specialised inference hardware to make advanced reasoning fast enough for enterprise workflows.
The timing places OpenAI in the middle of a broader debate over how the US should review advanced AI systems before public deployment. Other leading labs have faced similar pressure around models with stronger cyber performance. The emerging policy question is whether governments can build repeatable evaluation systems quickly enough to assess serious risks without freezing access to tools that defenders, researchers and companies may need.