
OpenAI's B2B Signals Report Says Delegated Codex Workflows Are Becoming the Enterprise Maturity Test

2026-05-07 • Delegated-work maturity signal • Butler

OpenAI's new B2B Signals release matters because it argues the real enterprise divide is moving from seat access toward delegated Codex workflows with governance and enablement attached.

[Illustration: The Butler evaluating depth of delegated work, representing enterprise AI maturity]

Enterprise AI reporting often gets stuck at the easiest metric to brag about.

How many seats were deployed? How many people have access? How many messages got sent?

Those numbers are not useless.

They are just not a very good test of whether AI has become operational.

That is why OpenAI's new B2B Signals release is more interesting than it first looks.

The company is making an explicit argument that the real separation line is moving toward depth, delegation, and agentic workflow use.

What OpenAI is actually saying

OpenAI introduced B2B Signals on May 6 as a business extension of OpenAI Signals.

The company says frontier firms now use 3.5 times as much intelligence per worker as typical firms, up from 2 times a year ago.

It also says message volume explains only 36 percent of that gap.

That is the important part.

The claim is not merely that leading companies are sending more prompts. The claim is that they are using AI more deeply.
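One way to make that "36 percent" figure concrete is a multiplicative decomposition: treat intelligence per worker as message volume times per-message depth, and split the 3.5x gap in log space. This is an illustrative sketch of how such a split could work, not OpenAI's stated methodology, and the variable names are our own:

```python
import math

# Reported figures from OpenAI's B2B Signals (as cited in this article).
intelligence_gap = 3.5   # frontier vs typical firms, intelligence per worker
volume_share = 0.36      # fraction of the gap explained by message volume

# HYPOTHETICAL decomposition (not OpenAI's published method):
# model intelligence per worker as messages_per_worker * depth_per_message,
# and allocate the gap multiplicatively via log shares.
msg_ratio = intelligence_gap ** volume_share          # implied volume gap
depth_ratio = intelligence_gap ** (1 - volume_share)  # implied depth gap

print(f"implied message-volume gap: {msg_ratio:.2f}x")
print(f"implied per-message depth gap: {depth_ratio:.2f}x")

# The two factors multiply back to the headline 3.5x gap.
assert math.isclose(msg_ratio * depth_ratio, intelligence_gap)
```

Under that reading, frontier firms would send roughly 1.6x as many messages but get roughly 2.2x more out of each one, which is the sense in which most of the gap sits in depth rather than volume.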

OpenAI also says the largest frontier gap shows up in advanced and agentic tools, especially Codex, where frontier firms send 16 times as many messages per worker as typical firms.

That is a much more operational statement than a seat-count announcement.

Why delegated work is the better maturity signal

Broad access is an early adoption story.

Delegated work is a different story.

Once a team starts using Codex or other agentic tools for meaningful tasks across code, files, or longer-running workflows, the organization has to answer harder questions about review, accountability, and control of the output.

That is why delegated-work intensity is a better maturity signal.

It forces operational decisions that casual chat use can avoid for a long time.

Why this should not be read as automatic proof of value

OpenAI is careful to use tokens and message patterns as signals, not as direct proof of ROI.

That caution matters.

A company can use AI deeply and still use it badly.

It can send more Codex messages and still fail to connect the output to useful review habits, workflow design, or business outcomes.

So the useful reading is not, "frontier firms send more messages, therefore deeper usage is always good."

The useful reading is that deeper usage exposes whether a company has built the governance and enablement muscle required for real operational adoption.

The hidden point in the report

The most interesting line in the B2B Signals framing is not even the Codex multiplier.

It is the repeated connection between frontier progress and organizational change.

OpenAI says firms close the gap through governance, enablement, scaling what works, and moving from chat-based assistance to delegated work with agents.

That is basically an admission that tool access alone is no longer the story.

The advantage begins to compound only when teams redesign work around the tools instead of sprinkling the tools on top of existing habits.

What leaders should test in their own orgs

If you are using this report as a planning input, the practical questions are about your own organization: what governance and enablement you have in place, and where delegated work would actually improve outcomes in your workflows.

Those answers matter more than copying someone else's frontier metrics.

The Butler take

OpenAI's report is useful because it reframes the enterprise AI conversation around operating depth rather than access theater.

That does not mean every organization should rush into heavy delegation.

It means the real maturity question is no longer whether people have AI.

It is whether the company has built the habits required to trust AI with deeper work without losing control of the workflow.

Bottom line

B2B Signals is not interesting because it flatters power users.

It is interesting because it says delegated Codex work is becoming the line between experimentation and operational adoption.

That line is not drawn by message counts alone. It is drawn by governance, enablement, and whether the workflow can survive deeper use.


AI Disclosure

This article was researched and drafted with AI assistance, then reviewed and edited for clarity, accuracy, and editorial quality.