Working Paper
As artificial intelligence (AI) is increasingly used to produce knowledge work, a widespread worry is that human competence at evaluating the results could decline. For instance, if students never write code, the concern is that they will be unable to judge the code an AI writes for them. Yet evaluation without production is also routine: film critics judge cinematography without operating cameras, and wine critics rate vintages they could not make. The authors resolve this tension with a framework for analysing the Decoupling of Expertise in Evaluation and Production (DEEP), which specifies when production experience is necessary for competent evaluation and when it is not. The authors first separate two regimes of evaluation: in Regime 1, quality is fully determined by observable artefact features; in Regime 2, quality also depends on process attributes the finished artefact does not reveal.
Next, the authors ask whether the information evaluators require in either regime can be acquired vicariously or only by producing. This framework predicts that the decline in human evaluative capabilities arising from cessation of production will be concentrated rather than universal. The authors also specify possible remedies by regime: training through simulation in Regime 1, process documentation in Regime 2, and modest quantities of production-for-calibration when these measures are infeasible.
Faculty
Professor of Strategy