
External signal·arXiv (MIT FutureTech)·Matthias Mertens, Neil Thompson, et al.

Crashing Waves vs. Rising Tides: Preliminary Findings on AI Automation from Thousands of Worker Evaluations of Labor Market Tasks

Neutral·Mid-Term (3-5 yrs)

Read the original at arXiv (MIT FutureTech)

“The main insight of this paper is that, across a large set of realistic and representative labor-market tasks addressable by LLMs, the downward slope between task success and task duration is, on average, surprisingly flat — i.e., more consistent with a rising tide rather than a crashing wave.”

Summary

This MIT FutureTech working paper tests whether AI automation arrives as abrupt 'crashing waves' that wipe out clusters of tasks at once or as a gradual, broad-based 'rising tide.' The authors evaluated frontier LLMs on over 3,000 text-addressable labor-market tasks drawn from O*NET, using more than 17,000 evaluations by workers who actually do those jobs. They find that the curve relating task success to task duration is relatively flat, which points to a rising tide: gains spread across many tasks and durations at once rather than blindsiding specific workers. Capability is already substantial and improving fast, with a reported model 'doubling time' of about 3.8 months, and the authors project that most studied tasks could reach 80%-95% AI success rates by 2029.
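To make the "flat versus steep" distinction concrete, here is a minimal sketch that assumes success follows a logistic curve in the log of task duration. The functional form, the 4-hour horizon, and both slope values are illustrative assumptions, not the paper's estimates.

```python
import math

def success_prob(duration_hours, horizon_hours, slope):
    """Logistic success curve in log2(task duration).

    horizon_hours: duration at which success is 50% (hypothetical value).
    slope: how sharply success falls per doubling of duration (hypothetical).
    """
    x = math.log2(duration_hours / horizon_hours)
    return 1.0 / (1.0 + math.exp(slope * x))

durations = [0.25, 1, 4, 16, 64]  # hours, illustrative only

# Steep slope: success collapses just past the horizon ("crashing wave").
steep = [success_prob(d, 4, slope=3.0) for d in durations]
# Flat slope: success declines slowly across durations ("rising tide").
flat = [success_prob(d, 4, slope=0.3) for d in durations]

for d, s, f in zip(durations, steep, flat):
    print(f"{d:>6} h  steep={s:.2f}  flat={f:.2f}")
```

Under the steep curve, pushing the horizon outward flips a narrow band of tasks from near-certain failure to near-certain success. Under the flat curve, the same shift raises success a little on tasks of every length, which is the pattern the summary describes.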

Predictions for the future of work

The paper predicts that AI automation will be broad and gradual rather than sudden, even as capabilities improve rapidly: between 2024-Q2 and 2025-Q3, the task length at which frontier models reach 50% success grew from 3-4 hours to about one week, with failure rates halving roughly every 2.4-3.2 years. Extrapolation suggests most studied text-based tasks could hit 80%-95% success by 2029, implying substantial labor-market impact across many occupations at once. The 'rising tide' pattern means individual workers are less likely to be abruptly displaced, and the slow climb to near-perfect performance leaves a window for adjustment.
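The figures above can be sanity-checked with a little arithmetic. The sketch below assumes exponential growth in task length and exponential decay in failure rate. Treating "one week" as 40 working hours and starting from a 50% failure rate are assumptions made here for illustration, not values taken from the paper.

```python
import math

def implied_doubling_months(start_hours, end_hours, elapsed_months):
    """Doubling time implied by a task length growing from start to end."""
    doublings = math.log2(end_hours / start_hours)
    return elapsed_months / doublings

def years_to_reach(failure_now, failure_target, halving_years):
    """Years until an exponentially decaying failure rate hits a target."""
    return halving_years * math.log2(failure_now / failure_target)

ELAPSED_MONTHS = 15  # 2024-Q2 to 2025-Q3 is five quarters
WEEK_HOURS = 40      # assumption: one working week; the summary does not say

for start_hours in (3, 4):
    months = implied_doubling_months(start_hours, WEEK_HOURS, ELAPSED_MONTHS)
    print(f"{start_hours} h -> {WEEK_HOURS} h: doubling every {months:.1f} months")

FAILURE_NOW = 0.5  # hypothetical starting failure rate
for halving_years in (2.4, 3.2):
    for target in (0.20, 0.05):
        years = years_to_reach(FAILURE_NOW, target, halving_years)
        print(f"halving {halving_years} y: {FAILURE_NOW:.0%} -> {target:.0%} failure in {years:.1f} years")
```

With a 40-hour week the implied doubling time lands at roughly four to four and a half months, in the neighborhood of the 3.8-month figure; reading "one week" as 168 calendar hours would give under three months, so the result is sensitive to that choice. The second loop shows why near-perfect performance arrives slowly: under these assumptions, going from 50% to 20% failure takes about three to four years, while reaching 5% takes roughly eight to eleven.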

arxiv·mit futuretech·neil thompson·o*net·llm benchmarks·task automation·working paper·worker evaluations

Originally published by arXiv (MIT FutureTech)
