The Trump administration's approach to AI was supposed to be lighter-touch than the Biden era. As of last week, "lighter touch" is starting to look more nuanced. Google, Microsoft, and xAI all agreed to provide the Commerce Department's Center for AI Standards and Innovation (CAISI) with early access to frontier models for national-security testing[1].
This is the third tier of an oversight framework that already covers OpenAI and Anthropic. Let me unpack what is actually happening.
What CAISI tests
The testing programme covers three primary risk vectors[1]:
- Hacking capabilities. Can the model autonomously discover and exploit vulnerabilities in real systems? CAISI runs the model against curated CTF-style challenges plus a private benchmark of attack scenarios.
- Military misuse. Can the model meaningfully assist in CBRN (chemical, biological, radiological, nuclear) weapons development? This is the bar that has been in place since the Biden-era executive order on AI.
- Unexpected behaviours. Does the model do anything CAISI testers consider categorically dangerous when pushed to its limits, such as deceiving operators, autonomously self-replicating, or manipulating users?
Models that flag on any of these get a delay. Models that pass are cleared for deployment.
Why Google, Microsoft, and xAI agreed
Three reasons:
- Going first is a moat. Being among the first labs to have their models cleared by CAISI is a marketing advantage: "tested for national-security risks by the US government" is a useful enterprise-sales line.
- Avoiding a worse alternative. If frontier labs do not agree to voluntary testing now, formal regulation later becomes more likely. Voluntary cooperation buys regulatory goodwill.
- The testing is private. CAISI does not publish results; labs get a yes/no plus private feedback. The risk of bad publicity is low.
Whether the testing is rigorous or theatre depends on who you ask. The labs say it is rigorous. Some external observers — myself included — would prefer to see the methodology published so we can judge for ourselves.
What this changes for AI deployment
For most engineers and most companies building on AI: nothing changes. CAISI testing happens pre-release. By the time you can use Gemini 4 Pro or GPT-5.5 Instant, it has been through whatever testing CAISI does. There is no compliance overhead for end users.
For labs and for the Deployment Companies these labs are spinning up, the testing adds 2-4 weeks to release cycles. Not insignificant, but not crippling.
For government and military deployment of AI — which is now a meaningful market — CAISI testing becomes the implicit prerequisite. Models that have been through it are buyable for defence work. Models that have not are not.
The state-level wildcard
The federal framework is voluntary. State-level frameworks are not. California's SB 53 (the successor to last year's vetoed SB 1047) is in committee and would impose mandatory testing for any frontier model deployed to Californians. New York and Illinois have similar bills moving. EU AI Act enforcement is in its second year.
The labs are walking a tightrope: cooperate with federal voluntary testing (low cost), avoid triggering state-level mandates (higher cost), comply with the EU AI Act regardless (highest cost). The May 5 announcement is part of that tightrope — federal cooperation buys leverage when state legislators ask whether more rules are needed.
What is not happening
Three things worth noting that are out of scope for CAISI testing:
- Hallucination rates. CAISI does not test whether the model is accurate. That is between the lab and the customer.
- Bias. Deliberately removed from scope in the Trump-era framework; it was in scope under Biden.
- Open-source models. The framework applies to closed-frontier labs. Open-weight models (Llama, Qwen, etc.) are not covered. This is a meaningful gap.
What it means for AI builders in the UK and EU
From a UK perspective (which is where I sit), the CAISI framework is interesting because it is the closest US counterpart to the UK's AI Safety Institute (AISI). The UK has its own pre-deployment testing arrangement with frontier labs, set up in 2024 and quietly expanded since.
The EU AI Act is much more prescriptive — high-risk AI systems require formal conformity assessments, documentation, and ongoing monitoring. The UK and US frameworks are voluntary; the EU's is mandatory.
For practitioners, the practical impact is:
- If you deploy AI in the EU, you do have meaningful compliance work to do, especially for high-risk use cases (HR decisions, credit scoring, biometric identification, etc.).
- If you deploy AI in the US or UK, the regulatory environment remains light-touch for now. CAISI / AISI testing happens upstream at the labs.
- If you build on multi-provider gateways like SarmaLink-AI, you inherit whatever testing each provider has done — and you can swap providers if regulation changes which models are deployable in your market (a minimal sketch of that routing pattern follows this list).
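To make that last point concrete, here is a minimal sketch of what "swap providers when regulation changes" can look like behind a multi-provider setup: a per-market allowlist of deployable models plus a single routing function. Everything here (`MARKET_ALLOWED_MODELS`, `route_model`, and the model identifiers) is an illustrative assumption of mine, not SarmaLink-AI's actual API.

```python
# Sketch of a provider-swap layer for a multi-provider gateway.
# All names and model IDs below are illustrative placeholders, not any
# real gateway's API.

from dataclasses import dataclass


@dataclass(frozen=True)
class ModelChoice:
    provider: str
    model: str


# Hypothetical per-market allowlist: which provider/model pairs are currently
# deployable in each jurisdiction, given whatever testing or conformity work
# applies there.
MARKET_ALLOWED_MODELS: dict[str, list[ModelChoice]] = {
    "US": [ModelChoice("google", "gemini-pro"), ModelChoice("xai", "grok")],
    "UK": [ModelChoice("google", "gemini-pro")],
    "EU": [ModelChoice("openai", "gpt-conformity-assessed")],
}


def route_model(market: str) -> ModelChoice:
    """Return the first deployable model for a market, or raise if none is."""
    candidates = MARKET_ALLOWED_MODELS.get(market, [])
    if not candidates:
        raise LookupError(f"No model currently deployable in market: {market}")
    return candidates[0]


if __name__ == "__main__":
    choice = route_model("EU")
    print(f"Routing EU traffic to {choice.provider}/{choice.model}")
```

The design point is that the allowlist is configuration rather than logic scattered across call sites, so a new state mandate or an EU enforcement decision becomes a one-line change to the market table rather than a rewrite.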
The verdict
Last week's expansion of CAISI is incrementally useful and politically smart. It does not constrain frontier-AI deployment in any meaningful way today, but it lays the regulatory groundwork for more serious oversight if a frontier model crosses a clear redline.
The bigger story is that the AI regulatory pattern is now visible: voluntary federal testing in the US, a similar voluntary arrangement in the UK, mandatory rules in the EU, and fragmentation at the US state level. Anyone building AI products globally has to design for this fragmented regulatory environment by default.
Voluntary now. Mandatory soon. Plan accordingly.