Technology earns its place in an organisation by making someone faster or better at their job. The idea is older than software; it runs back at least to Douglas Engelbart's 1962 account of computing as a way to augment human capability.1 A tool gets adopted when it takes a real pain away from the people doing the work: the spreadsheet that retires the paper ledger, the CRM that retires the box of index cards, the applicant tracker that retires the inbox stuffed with CVs. The ones that take hold are the ones that remove more friction than they add, and in time they stop being tools and become the way the work is done.
In an ideal world, the tool then becomes a scaling vector for best practice. The organisation embeds its way of working into the software, so that following the process and using the system become the same act. Thomas Davenport described this in his 1998 study of enterprise systems: a packaged system carries its own model of how a business ought to run, and adopting it means adopting that model.2 Done well, this is how good practice spreads. A new joiner learns the process by learning the system, and the system holds everyone to the same path.
There is a price, and most people who have worked inside a large company have paid it. To be worth building, enterprise software has to serve a very large number of customers, which pushes every product toward generality. The model of best practice baked into the tool is the vendor's, drawn from a whole market, not the one a particular company would have designed for itself. So the arrow often runs the other way. Rather than the software bending to the business, the business bends to the software, and the people who have to live with it had no say in choosing it. As Davenport observed, it ends up being the vendor, not the customer, that decides what counts as best.
A second price is accumulated complexity. Mature platforms keep adding features long after the core job is solved, partly to cover every adjacent use case and partly to make themselves harder to leave. The result is software that is genuinely hard to learn. Adobe After Effects, to take one example, takes most of a year to learn to a professional standard.3 Around the most complex platforms, AWS and SAP among them, a whole market of training, certification and consulting has formed because the tools have outgrown the people who depend on them.4
For all that, the stack underneath looks much the same from one organisation to the next. It is never as tidy as one tool per function: a single company runs several overlapping systems for the same job, and the boundaries between them blur. But a dozen or so categories show up in nearly every company: an ERP for finance, an HRIS and payroll for people, applicant tracking for recruiting, a CRM for sales with marketing-automation tools alongside it, help desks and ITSM for support and IT, source control and CI/CD for engineering, a collaboration suite, a data warehouse and BI for analytics, identity and security tooling, and procurement and contract-management systems. What differs between companies is the vendor, not the category.
Every one of those systems has the same end-user: an employee, doing a job, whom the technology is there to make faster or better. Enablement has meant finding that employee's pain, building the process into the tool, and getting a workforce to adopt it. The discipline grew up around people doing the work, and every part of it assumes a human operator. That has changed.
As we argued in The Method Layer, the work inside an organisation no longer has to be done by a person; the actor can be an agent. The Craft of Codification took the first step that follows: writing the method down, because an agent cannot pick it up by watching the way a new hire does. This piece takes the next, the technology that method runs on, and what it has to become once an agent is the one operating it. The stack stays much the same. The operator does not.
From answering to acting
When ChatGPT arrived at the end of 2022, it answered in text. Ask it how to pull a report out of Salesforce or tidy a messy spreadsheet, and it would describe the steps fluently, but it had no way to reach into the software and carry them out.
That distance began to close in March 2023, when OpenAI gave ChatGPT plugins.5 For the first time, a widely used assistant could act on the user's behalf rather than only describe what to do, reaching out to web browsing and a handful of third-party tools. But a plugin could only reach software that had been wired up in advance to receive it. That limit came off in October 2024, when Anthropic released computer use with Claude 3.5 Sonnet. Anthropic described it as the first frontier model to offer computer use in public beta: it could operate a computer the way a person does, reading the screen, moving the cursor, clicking and typing.6 A model that can drive a mouse can use whatever a person can, including the dense, idiosyncratic applications that were never built to be automated. The early versions were experimental and error-prone, and Anthropic said as much. The capability was real, but the reliability was not yet there.
Connecting agents to software at scale then needed a shared way to plug in, and for a while several competed: OpenAI's plugins, function calling, and a wave of agent frameworks, each with its own. The format that won, for now, is the Model Context Protocol, which Anthropic introduced in November 2024 as a structured way for any model to call any tool. By December 2025 it had passed to a new foundation under the Linux Foundation. That foundation was co-founded with OpenAI and Block and backed by Google, Microsoft and AWS, and at that point MCP stopped being one company's protocol and became shared infrastructure.7 Adoption has been steep. Monthly SDK downloads rose from roughly two million to roughly 97 million, and by one count there are more than 10,000 public MCP servers. The incumbents are building from both ends at once. Salesforce, ServiceNow, Workday and Notion are each shipping an agent inside their own product, along with an interface that lets outside agents act on it. The plumbing question of a year ago, whether an agent could reach a given system at all, is largely answered.
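To make "a structured way for any model to call any tool" more concrete, here is a minimal sketch of the exchange. It assumes a simplified subset of MCP: JSON-RPC 2.0 messages using the protocol's `tools/list` and `tools/call` methods. It is not the official SDK. It skips the initialisation handshake, the transports and input validation, and the `lookup_invoice` tool and its data are hypothetical.

```python
import json

# Hypothetical stand-in for a system of record; a real server would query
# the live system instead of an in-memory dict.
INVOICES = {"INV-001": {"customer": "Acme Ltd", "amount": 1200.0, "status": "unpaid"}}

# Tool descriptors in the shape tools/list returns: a name, a description,
# and a JSON Schema for the inputs. The tool itself is hypothetical.
TOOLS = [
    {
        "name": "lookup_invoice",
        "description": "Return the status of an invoice by its ID.",
        "inputSchema": {
            "type": "object",
            "properties": {"invoice_id": {"type": "string"}},
            "required": ["invoice_id"],
        },
    }
]


def lookup_invoice(invoice_id: str) -> str:
    invoice = INVOICES.get(invoice_id)
    if invoice is None:
        return f"No invoice found with ID {invoice_id}"
    return (f"{invoice_id}: {invoice['status']}, "
            f"{invoice['amount']:.2f} owed by {invoice['customer']}")


HANDLERS = {"lookup_invoice": lookup_invoice}


def reply(req_id, *, result=None, error=None) -> str:
    """Wrap a result or an error in a JSON-RPC 2.0 response envelope."""
    body = {"jsonrpc": "2.0", "id": req_id}
    if error is not None:
        body["error"] = error
    else:
        body["result"] = result
    return json.dumps(body)


def handle(raw: str) -> str:
    """Answer one JSON-RPC request for tools/list or tools/call (simplified)."""
    request = json.loads(raw)
    req_id, method = request.get("id"), request.get("method")
    if method == "tools/list":
        return reply(req_id, result={"tools": TOOLS})
    if method == "tools/call":
        params = request.get("params", {})
        handler = HANDLERS.get(params.get("name"))
        if handler is None:
            return reply(req_id, error={"code": -32602,
                                        "message": f"Unknown tool: {params.get('name')}"})
        # No schema validation in this sketch: arguments go straight to the handler.
        text = handler(**params.get("arguments", {}))
        return reply(req_id, result={"content": [{"type": "text", "text": text}],
                                     "isError": False})
    # JSON-RPC's standard code for an unsupported method.
    return reply(req_id, error={"code": -32601, "message": f"Method not found: {method}"})


if __name__ == "__main__":
    call = {"jsonrpc": "2.0", "id": 2, "method": "tools/call",
            "params": {"name": "lookup_invoice", "arguments": {"invoice_id": "INV-001"}}}
    print(handle(json.dumps(call)))
```

What the format buys is discovery: the agent learns the tool and its input schema at run time, so the same client can call any server that speaks the protocol.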
On OSWorld, a benchmark that has agents carry out real tasks on a real desktop, the strongest model now scores around 83%. That is at or above the human success rate of roughly 72% that the benchmark's original authors recorded.8 The comparison is rougher than two clean numbers imply, because the model figure comes from a later, revised version of the benchmark and part of the gain comes from the surrounding harness rather than the model alone. Agents are now operating the same screens and systems the workforce uses every day, well enough that companies are wiring them in wherever they can. The reliability is uneven and the security around it is still being built, which is why most of this work runs with a person watching. The direction, though, is set. The tools were built for people, and agents are now using them.
The interface changes hands
So if an agent operates the system, reaching it through an interface meant for another program rather than through a screen, what is the screen still for?
The question lands on a piece of vocabulary the industry leans on heavily: the system of record, the authoritative store for a given kind of data. The term is older and looser than its current use suggests, running back through Geoffrey Moore's 2011 framing to a data-warehouse definition from 2003 and a US federal records statute from 1974.9 Strip away the history and the working idea is plain: for any fact about the business, one system is the source of truth, and a screen sits on top of it so a person can read and change that fact.
Separating that screen from the record underneath is not a new idea. For most of the last decade it had a name: headless. Headless content management (Contentful, Sanity), headless commerce (commercetools, Shopify Hydrogen), and the MACH Alliance that formed around the pattern in 2020 all argued the same case. Keep the data and logic in a back end with no interface of its own, expose it through an API, and let any number of front ends consume it. The reasoning was that the channels were multiplying, from web to mobile to kiosk to voice, and a business that welded its screen to its record would have to rebuild the system for each one. Gartner generalised it as the "composable enterprise."10 The pattern was in wide use well before agents arrived, carrying content and commerce out to web, mobile and everything in between. The agent is simply the newest consumer it fits, and the one that pushes the logic furthest. It has no use for a screen at all, only for the API the headless camp spent a decade arguing the record should expose. Salesforce made the lineage explicit in April 2026, when it launched Headless 360 under the tagline "No Browser Required."11
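The headless pattern fits in a few lines. In this sketch, plain method calls stand in for an HTTP API and an in-memory dict stands in for the database. The `CatalogueAPI` class and the three front ends are hypothetical. The web page, the voice channel and the agent all read the same record through the same API, and the business rule lives once, in the back end.

```python
import html
import json
from dataclasses import asdict, dataclass


@dataclass
class Product:
    sku: str
    name: str
    price: float


class CatalogueAPI:
    """Headless back end: holds the record and its rules, renders nothing.

    A real system would sit behind HTTP; plain method calls stand in for that here.
    """

    def __init__(self) -> None:
        self._records: dict[str, Product] = {}

    def upsert(self, product: Product) -> None:
        # The business rule lives in the back end, so every front end gets it for free.
        if product.price < 0:
            raise ValueError("price must be non-negative")
        self._records[product.sku] = product

    def get(self, sku: str) -> dict:
        return asdict(self._records[sku])


# Each front end depends only on the API, never on the storage behind it.
def render_web(api: CatalogueAPI, sku: str) -> str:
    p = api.get(sku)
    return f"<h1>{html.escape(p['name'])}</h1><p>{p['price']:.2f}</p>"


def render_voice(api: CatalogueAPI, sku: str) -> str:
    p = api.get(sku)
    return f"{p['name']} costs {p['price']:.2f}."


def agent_view(api: CatalogueAPI, sku: str) -> str:
    # An agent needs no rendering at all, only the structured record.
    return json.dumps(api.get(sku))


if __name__ == "__main__":
    api = CatalogueAPI()
    api.upsert(Product("SKU-1", "Desk lamp", 35.0))
    for front_end in (render_web, render_voice, agent_view):
        print(front_end(api, "SKU-1"))
```

Adding the agent as a consumer required no change to the back end. That is the property the headless argument was built on.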
What that implies for the record itself is contested. The sharpest version of the argument ran from late 2025 into 2026, between the venture firm a16z and the analyst Josh Bersin. a16z's case is that the record becomes plumbing: the value moves up to a reasoning layer that reads across many records and acts on them, and the system underneath is consumed through its API rather than its screen.12 Bersin's case is that this underrates what the record does. You cannot reason your way through payroll or enforce separation of duties with a prompt. The deterministic rules, the approval chains and the compliance machinery built up over twenty years live inside the system, and an agent acting outside it is, in his phrase, lawless by design.13
The likeliest outcome is that both surfaces survive, because they answer to different masters. The human-facing screen does not go away, for the reason Bersin gives: someone has to approve, override, and answer for the result, and that work needs an interface, increasingly one built for oversight rather than data entry. Next to it sits a second surface, the one the agent uses: the API, the permissions, the structured access the headless argument always pointed toward. A system of record ends up with two faces, one for the person who is accountable and one for the agent doing the work. The second is its own discipline to build, and a category of company has formed to do it.
The companies enabling the agent workforce
Drawing this layer as a set of neat boxes is, for the moment, a snapshot of a moving target. The category is forming faster than anyone can settle a framework around it: companies launch, reposition and get bought inside a single quarter, and even the line between building an agent, running it and governing it is still being drawn. What follows is its rough shape in mid-2026, not a stable taxonomy.
The largest bets are coming from the model labs and the hyperscalers, each trying to own where agents are built and governed. Anthropic's Claude Managed Agents bundles orchestration, tool execution, memory, evaluation and tracing into a hosted runtime, so a company can ship a production agent without assembling the plumbing.14 OpenAI, Google and Microsoft each offer an agent platform of their own, and Amazon's Bedrock AgentCore will run an agent built with any of them.15 The enterprise incumbents are approaching from the management side. Workday's "agent system of record" and comparable efforts from Microsoft, ServiceNow and Salesforce treat a company's agents as a workforce to be onboarded, scoped, budgeted and audited.16 Each of these assumes that one platform can serve every enterprise alike, and that assumption looks naive. It is Davenport's trade again: a platform general enough for every company carries its vendor's model of how agents ought to be run, not the one a particular company would have designed. Nor are these the only bets. Notion, among others, is wagering the opposite. It is betting that what this layer most needs is a strong knowledge-management foundation beneath it, and that owning a company's context is a more durable position than owning the runtime.
Around that contest, a wider field is working on the parts neither side has nailed down: giving an agent one permissioned surface onto thousands of tools, letting it drive software built for human hands, holding its memory across sessions, evaluating and watching its work once it is live, giving it an identity and containing it when it goes wrong. Each is its own small market, LangChain and a crowd of younger companies among them, and the boundaries between them are still being redrawn.
Underneath all of it is a straightforward shift. The enterprise systems from the first section were built for one kind of operator, an employee, who turned up with an identity, permissions and the judgement to know when to escalate, none of which the software had to provide. An agent carries none of that, so it has to come from outside the system, and supplying it is becoming a discipline of its own: agent enablement, the work of equipping the new operator to do the job, as enablement has always meant equipping people. The layer forming around that work, the enablement layer, is the identity, permissions, orchestration, memory and evaluation an agent needs to operate the stack but does not supply itself. The systems a company runs are not being swept away so much as fitted with a layer above them they never used to need, and the stack that looked settled is changing shape.
The build-or-buy question, reopened
All of that, the agent's interface and the enablement layer above it, still sits on top of software the company bought. As the cost of building software falls, the oldest question in enterprise IT comes back: whether to buy it at all. Companies rented generic software because building was expensive and most business capability (accounting, HR, support) was much the same from one company to the next. A vendor who built it once and rented it to thousands beat anyone building alone. The costs of that bargain were the two this piece opened with. You took on the vendor's way of working along with the software, and you took on its accumulated complexity, paying in training, certification and consultants for generality your company never asked for. If building gets dramatically cheaper, the bargain is worth re-examining. Why keep renting a system that bends your processes to its shape and takes a year to learn, when you could build one that fits?
The supply side has genuinely moved. Building new software has become far cheaper. The AI coding tool Cursor reached roughly two billion dollars of annualised revenue in about three years, and Lovable's annualised revenue roughly quadrupled between July 2025 and early 2026, from about 100 million dollars to about 400 million.17 The enterprise stack, though, has barely shifted. Klarna is the cautionary case. Its 2024 claim to have replaced Salesforce and Workday with AI was later walked back: in reality it had swapped expensive horizontal tools for cheaper ones plus internal engineering. Its chief executive concluded that a smaller set of consolidated vendors was the likelier future.18 We know of no mid-market company outside technology that has publicly torn out a CRM or an HR system for a build of its own.
So the line moves rather than breaks. The cheaper building gets, the more of the firm-specific work that used to be bought can be built instead; but the generic systems of record do not vanish. They thin, they grow more permeable to agents, and the new work concentrates in a layer built on top of them and fitted to how a particular company actually operates. That layer is the method layer, and it is where the value settles as the model beneath it becomes a commodity.
What to look for when you buy
For anyone choosing software now rather than building it, the buying criteria have shifted. Whether the team likes the interface, how long the training runs, how good the support is, all still matter, but they describe a tool for a human operator. A handful of other questions describe whether the tool is ready for the operator now arriving.
The first is whether an agent can reach it without the screen. A clean, documented, permissioned interface for programs, such as an MCP server or a well-kept API, is now a first-order feature rather than a developer afterthought. A product that can only be driven through its own UI is a product only a human can use well. An agent can still drive it through the screen, but more slowly and less reliably.
The second is whether you can tell that it works. An agent that impresses in a demo can still fail in week three on real data and edge cases, so the question is whether the product lets you measure that. You should be able to run it against your own test cases before you trust it. Once it is live, you should be able to read the traces and metrics that show how often it succeeds and where it goes wrong. Because the model underneath can change without warning, that same evidence is what lets you re-check it after a vendor upgrade. A tool you cannot evaluate is one you are taking on faith.
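Here is a minimal sketch of that kind of check. It assumes the agent can be called as a plain function from a task to an answer, and that exact-match grading is good enough for the cases at hand, which often it is not. The `Case` type, `evaluate` and `recheck_after_upgrade` are hypothetical names, not any vendor's API.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical: an agent is anything that takes a task and returns an answer.
Agent = Callable[[str], str]


@dataclass(frozen=True)
class Case:
    task: str
    expected: str


def evaluate(agent: Agent, cases: list[Case]) -> tuple[float, list[str]]:
    """Return the success rate and the tasks that failed.

    Exact-match grading is a simplifying assumption; real evaluations often
    need rubric-based or model-graded scoring.
    """
    if not cases:
        raise ValueError("need at least one test case")
    failures = [c.task for c in cases if agent(c.task).strip() != c.expected]
    return 1 - len(failures) / len(cases), failures


def recheck_after_upgrade(old: Agent, new: Agent, cases: list[Case],
                          tolerance: float = 0.0) -> dict:
    """Run the same suite on both versions and flag any drop beyond tolerance."""
    old_rate, _ = evaluate(old, cases)
    new_rate, new_failures = evaluate(new, cases)
    return {"old": old_rate, "new": new_rate,
            "regressed": new_rate < old_rate - tolerance,
            "new_failures": new_failures}
```

Running the same suite before and after a vendor upgrade turns "the model changed" from a rumour into a number. The list of failing tasks then shows where it went wrong.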
The third is whether it can govern an agent, and contain a compromised one. Scoped permissions, a distinct identity per agent, and an audit trail of what the agent did are the baseline; the harder test is whether the system holds up when the agent is fed a hostile instruction through the very data it reads, the prompt-injection and data-exfiltration risks that arrive the moment an agent can act. The questions to put to a vendor are who is accountable for an agent's actions inside the system, how you would see what it touched, and what stops a hijacked agent from touching more than it should.
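The baseline controls can be sketched as a gateway that every agent request passes through. The sketch assumes each agent carries its own identity, a named accountable owner and an explicit list of hypothetical `resource:action` scopes; all the names are illustrative. The gateway cannot tell that an instruction was injected. What it does is deny anything outside the agent's scopes and record every attempt, which bounds what a hijacked agent can touch and makes it possible to see what it did touch.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AgentIdentity:
    agent_id: str            # one identity per agent, never shared
    owner: str               # the person accountable for what it does
    scopes: frozenset[str]   # hypothetical "resource:action" strings


@dataclass
class Gateway:
    """Sits between agents and the system; checks scope and logs every attempt."""
    audit_log: list[dict] = field(default_factory=list)

    def authorise(self, agent: AgentIdentity, resource: str, action: str) -> bool:
        required = f"{resource}:{action}"
        allowed = required in agent.scopes  # deny by default: unlisted means no
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "agent": agent.agent_id,
            "owner": agent.owner,
            "requested": required,
            "allowed": allowed,
        })
        return allowed

    def touched_by(self, agent_id: str) -> list[str]:
        """Answer 'what did this agent touch?' from the log (allowed actions only)."""
        return [e["requested"] for e in self.audit_log
                if e["agent"] == agent_id and e["allowed"]]
```

Denied attempts stay in the log as well. A burst of out-of-scope requests from one agent is itself a sign that it may have been fed a hostile instruction.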
The fourth is whether it keeps a human in command. The screen should not vanish; it should let a person approve, override, and answer for the result. A tool that hands everything to the agent with nowhere for a person to watch is failing the half of the job that does not automate.
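Keeping a person in command can be as simple as a gate between what the agent proposes and what the system executes. This sketch assumes a hypothetical policy. Actions below a monetary limit execute automatically, and the rest are held for a named reviewer, whose decision is recorded so that someone answers for it. The class names and the threshold are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class ProposedAction:
    agent_id: str
    description: str
    amount: float  # hypothetical risk measure, e.g. money moved


@dataclass
class ApprovalGate:
    """Agents propose; anything above the limit waits for a named person."""
    limit: float
    pending: list[ProposedAction] = field(default_factory=list)
    # Each decision is recorded as (description, outcome, decided_by).
    decisions: list[tuple[str, str, str]] = field(default_factory=list)

    def submit(self, action: ProposedAction) -> str:
        if action.amount <= self.limit:
            self.decisions.append((action.description, "executed", "auto"))
            return "executed"
        self.pending.append(action)
        return "pending"

    def review(self, reviewer: str, approve: bool) -> str:
        """A person approves or overrides the oldest pending action, and is recorded."""
        if not self.pending:
            raise LookupError("nothing awaiting review")
        action = self.pending.pop(0)
        outcome = "executed" if approve else "rejected"
        self.decisions.append((action.description, outcome, reviewer))
        return outcome
```

In this design the screen's job shifts from data entry to the review queue. The person sees what the agent wants to do and decides, and the decision carries their name.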
The fifth is whether it will bend to you or make you bend to it. The cheaper building becomes, the less sense it makes to take on a vendor's way of working wholesale. Favour systems whose data you can get back out, whose behaviour you can shape, and on top of which you could build a firm-specific layer without a fight.
The sixth is what it costs to run. An agent operating the software consumes tokens, and vendors increasingly meter that inference inside the product, by the action or the conversation, even though the model underneath is often the same one several other tools are calling. Watch for paying for the same intelligence many times over, and favour systems that let you bring your own model and meter the cost in one place over ones that lock the inference, and its bill, inside the product.
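Metering in one place can be sketched as a single ledger that every product reports its token usage to. The sketch assumes usage is priced per million input and output tokens. The model name `model-a` and its prices are hypothetical placeholders, not any provider's actual rates.

```python
from collections import defaultdict

# Hypothetical prices in dollars per million tokens; substitute real rates.
PRICES = {"model-a": {"input": 3.00, "output": 15.00}}


class CostLedger:
    """One place to meter inference spend across every product that calls a model."""

    def __init__(self) -> None:
        self.spend = defaultdict(float)  # (product, model) -> dollars

    def record(self, product: str, model: str, input_tokens: int, output_tokens: int) -> float:
        rate = PRICES[model]
        cost = (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000
        self.spend[(product, model)] += cost
        return cost

    def by_model(self) -> dict[str, float]:
        """Total spend per model, however many products it was bought through."""
        totals = defaultdict(float)
        for (_, model), cost in self.spend.items():
            totals[model] += cost
        return dict(totals)

    def products_per_model(self) -> dict[str, set[str]]:
        """Which products are calling each model: the duplication to look for."""
        seen = defaultdict(set)
        for product, model in self.spend:
            seen[model].add(product)
        return dict(seen)
```

Under those placeholder prices, a CRM call using 2,000,000 input tokens and 100,000 output tokens costs 6.00 + 1.50 = 7.50 dollars. When `products_per_model` shows the same model behind several products, that is the duplicated spend to look for.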
None of this requires betting on a wholesale rebuild of the stack. It asks only that you buy as though the operator has changed. In a growing share of the work, it has.
References
1. Douglas C. Engelbart, Augmenting Human Intellect: A Conceptual Framework (SRI, 1962), https://www.dougengelbart.org/pubs/augment-3906.html.
2. Thomas H. Davenport, "Putting the Enterprise into the Enterprise System," Harvard Business Review 76(4), 1998, https://hbr.org/1998/07/putting-the-enterprise-into-the-enterprise-system.
3. After Effects learning curve estimates: Noble Desktop, "How Long Does It Take to Learn After Effects?", https://www.nobledesktop.com/learn/after-effects/how-long-does-it-take-to-learn-after-effects.
4. SAP S/4HANA migration duration and budget figures: SAPinsider, "SAP S/4HANA Migration Benchmark Report 2025." Salesforce administrator certification self-study estimate: see Salesforce Ben and related certification guides. Note that several widely circulated SaaS-bloat statistics (e.g. the "80% of features rarely used" figure) are five or more years old and are not relied on here.
5. OpenAI, "ChatGPT plugins," 23 March 2023, https://openai.com/index/chatgpt-plugins/. The first release paired a first-party web-browsing plugin with third-party tools including Zapier, OpenTable, Kayak and Expedia; OpenAI has since deprecated plugins.
6. Anthropic, "Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku," 22 October 2024, https://www.anthropic.com/news/3-5-models-and-computer-use. Anthropic described Claude 3.5 Sonnet as the first frontier model to offer computer use in public beta, and called the capability experimental and error-prone at launch. OpenAI's Operator (January 2025) and Google's Project Mariner (December 2024) followed, and were later folded into other products or wound down.
7. Anthropic introduced the Model Context Protocol in November 2024 ("Introducing the Model Context Protocol," https://www.anthropic.com/news/model-context-protocol) and on 9 December 2025 donated it to the Agentic AI Foundation, a directed fund under the Linux Foundation co-founded with Block and OpenAI and supported by Google, Microsoft and AWS (https://www.anthropic.com/news/donating-the-model-context-protocol-and-establishing-of-the-agentic-ai-foundation). The ~2M-to-~97M monthly SDK download figures and the ">10,000 public servers" figure are drawn from Anthropic's December 2025 announcement and 2026 ecosystem trackers; public-server counts vary by source.
8. Claude Opus 4.8 scored 83.4% on OSWorld-Verified (Anthropic, "Claude Opus 4.8," 28 May 2026, https://www.anthropic.com/news/claude-opus-4-8). The ~72.36% human baseline is from the original OSWorld benchmark (Xie et al., 2024); OSWorld-Verified is a later, cleaned-up revision, so the model and human figures are not measured on an identical task set, and Anthropic notes part of the score reflects harness improvements rather than the model alone. OSWorld-Verified: https://xlang.ai/blog/osworld-verified.
9. On the "system of record" lineage: Geoffrey A. Moore, "Systems of Engagement and the Future of Enterprise IT" (AIIM, 2011); Bill Inmon, "The System of Record in the Global Data Warehouse" (Information Management, 2003); and the US Privacy Act of 1974 "system of records" definition.
10. The "headless" pattern (a back end with no built-in UI, exposed through an API) runs through headless CMS (Contentful, Sanity), headless commerce (commercetools, Shopify Hydrogen) and the MACH Alliance (Microservices, API-first, Cloud-native, Headless), founded 2020. Gartner generalised the idea as the "composable enterprise."
11. Salesforce, "Introducing Salesforce Headless 360. No Browser Required.", April 2026, https://www.salesforce.com/news/stories/salesforce-headless-360-announcement/. Announced at TDX 2026.
12. a16z, "From 'System of Record' to 'System of Intelligence'" (Gio Ahern, Stephenie Zhang and Alex Immerman), part of a16z's Big Ideas 2026, published December 2025, https://a16z.com/newsletter/big-ideas-2026-part-1/.
13. Josh Bersin, "The Reinvention of Workday: From System of Record to Platform of Agents," April 2026, https://joshbersin.com/2026/04/the-reinvention-of-workday-from-system-of-record-to-platform-of-agents/. The phrase "lawless by design" is Bersin's characterisation, largely endorsing Workday's argument about agents built outside the system of record.
14. Anthropic, "Claude Managed Agents" (launched 8 April 2026, public beta); product overview at https://platform.claude.com/docs/en/managed-agents/overview.
15. On the runtime side, the model labs and clouds have each shipped a way to build and run agents on their own infrastructure: Anthropic's Claude Managed Agents, OpenAI's Agents SDK, Google's Agent Development Kit, Microsoft's Agent Framework (AutoGen and Semantic Kernel, merged late 2025) and Amazon's framework-agnostic Bedrock AgentCore (generally available October 2025); see the 2026 agent-infrastructure surveys. On the governance side, the "agent management" layer is being built in parallel by Microsoft (Agent 365), ServiceNow and Salesforce alongside Workday's Agent System of Record; see Josh Bersin, "ServiceNow Bets Big on Enterprise AI," May 2026, https://joshbersin.com/2026/05/servicenow-pushes-the-envelope-on-enterprise-ai-with-vision-of-managing-everything/.
16. Workday, "Agent System of Record," https://www.workday.com/en-us/artificial-intelligence/agent-system-of-record.html. First unveiled in 2024–25 and central to Workday's April 2026 repositioning (see 13).
17. Cursor (Anysphere) reached roughly two billion dollars of annualised revenue by early 2026, about three years from its 2022 founding (TechCrunch and Bloomberg, March 2026; see also coverage of its ~$50bn April 2026 funding round, e.g. The Next Web, https://thenextweb.com/news/cursor-anysphere-2-billion-funding-50-billion-valuation-ai-coding). Lovable's ARR went from roughly $100M (July 2025) to ~$400M (early 2026): Sacra, https://sacra.com/c/lovable/, and Bloomberg, 12 March 2026.
18. Klarna's clarification of its 2024 claims: Diginomica, "Those shutting-down-Salesforce-and-Workday rumours — Klarna: no, we didn't replace SaaS with an LLM," https://diginomica.com/those-shutting-down-salesforce-and-workday-rumors-klarna-no-we-didnt-replace-saas-llm-admits-ceo; and CNBC, May 2025.



