What is AI Slop? Why Is It Costing Companies Millions?

Emilie Schario, COO at Kilo Code, used an AI tool to revise a blog post she had written about work-life balance. The AI handed back a polished draft — and added a detail that she had recently blocked her calendar to attend her daughter’s school play. Schario is a mother of three boys. She has no daughter. There was no school play.

She caught it. But that’s the point — she had to. She spent nearly half as much time reviewing the AI’s revision as she had writing the original post. A tool built to save time created more work than it eliminated.

A Workday-backed survey of 3,200 leaders and employees from companies with $100M+ revenue found that nearly 40% of AI’s value is lost to rework and misalignment. Only 14% of employees consistently get clear, positive outcomes from AI. The rest are spending their days catching what the AI got wrong.

There’s a name for this: AI slop. And if you’re building or running a product today, it is quietly eating your team’s time, your users’ trust, and your business ROI — whether you’re tracking it or not.

I’ve spent 17+ years building AI and SaaS products, working with over 100 companies from MVP to enterprise scale. What I see most often is not companies failing to adopt AI — it’s companies adopting AI without the systems to control its output quality. This is what’s actually happening, what companies have already learned the hard way, and what the fix looks like.

This article covers why AI slop is spreading despite heavy investment, what it actually costs in money and trust, what YouTube is already doing about it, and a framework for getting output quality under control.

I write every week on AI implementation, SaaS strategy, and product frameworks for founders and technical leaders. Join 210,000+ subscribers.

Subscribe to the Newsletter →


What AI Slop Actually Means — and Why It’s Different From a Hallucination

Most people hear “AI slop” and think hallucinations — the model inventing a statistic, fabricating a quote, getting a name wrong. Those are real problems. But slop is broader. It refers to any AI-generated output that reaches an audience without adequate quality control — whether the output is factually wrong, hollow and generic, off-brand, or simply not useful enough to justify its existence.

Slop can be grammatically perfect and still be slop. The article that opens with “In today’s fast-paced digital landscape” and says nothing for 800 words. The customer support response that restates the complaint back to the user instead of answering it. The AI-written product description that hits every keyword but says nothing a buyer actually needs to know. Technically clean, substantively empty. That’s slop.

Macquarie Dictionary named "AI slop" its word of the year for 2025, a signal of how mainstream the frustration has become. The internet is flooded with AI-generated content across text, image, video, and audio. Platforms are drowning in it. Teams are drowning in reviewing it.

In December 2025, YouTube CEO Neal Mohan published his annual letter to creators. He named reducing AI slop and detecting deepfakes as two of the platform’s top priorities for 2026. His exact words: “It’s becoming harder to detect what’s real and what’s AI-generated. To reduce the spread of low quality AI content, we’re actively building on our established systems that have been very successful in combating spam and clickbait, and reducing the spread of low quality, repetitive content.”

This is the world’s largest video platform — one that’s paid out over $100 billion to creators since 2021 — saying AI slop is a big enough threat to make it a company-level priority. If YouTube is dedicating engineering and policy resources to this, the question every product leader should ask is: what am I doing about it inside my own platform?


The Numbers That Show Just How Much This Is Already Costing Businesses

The damage is not hypothetical. These are the numbers.

Nearly 40% of AI’s value is lost to rework and misalignment (Workday/Hanover Research, 3,200 respondents, Nov 2025)

95% of enterprise AI pilots fail to deliver measurable P&L impact (MIT study, Jan–Jun 2025)

Only 46% of global users trust AI systems, even though 66% use them regularly (KPMG, 48,000+ respondents)

Only 25% of AI projects delivered the expected returns (IBM Institute for Business Value, 2,000 CEOs globally)

That 40% rework figure from the Workday study deserves more attention than it’s getting. For every dollar of value AI is supposed to generate, companies are spending 40 cents cleaning up what it got wrong. That’s not a productivity gain. That’s a subsidy for AI’s mistakes, paid in your team’s time.

The MIT finding is equally damning. 95% of enterprise AI pilots — not 50%, not 70%, ninety-five percent — failed to show measurable profit-and-loss impact. And the MIT researchers were specific about why: the tools weren’t poorly built. The problem was how they were deployed — without proper integration, without quality controls, without the feedback loops needed to improve over time.

“Workslop” — When AI Creates Work Instead of Eliminating It

BetterUp Labs and Stanford researchers coined a term for what happens when AI output shifts the burden of thinking onto the people reviewing it: workslop. Their research found that 41% of workers have encountered workslop — AI output that looks polished on the surface but is generic and hollow, forcing colleagues to re-do the real thinking. Each incident wastes close to two hours.

A concrete example of workslop in action appeared in a LinkedIn thread from a management consultant: a team used AI for all of their data analysis for a client presentation. The client’s feedback was that it “felt like the team was just reading the slides” — not helping them understand what the data meant. The AI produced correct charts. It produced zero insight. That gap — between technically correct output and actually useful output — is the slop problem at its core.

The Trust Gap Is the Metric That Should Worry You Most

The KPMG numbers (66% using AI, only 46% trusting it) represent a trust deficit that is getting worse, not better. Among developers, the situation is even more striking: 84% use or plan to use AI coding tools, but only about a third trust the outputs, and that trust level has declined from previous years. The issue isn’t the obvious errors developers can catch immediately. It’s the “almost right” outputs: code that looks correct, passes a quick review, then breaks in production. Those are the errors that erode trust fastest, because they’re the ones that go live.

Users on content platforms are experiencing the same thing. Reddit communities are vocal about AI slop killing engagement: users can sense manufactured content even when they can’t explicitly identify it, and they disengage. When engagement drops, ad revenue follows. The chain reaction from trust erosion to revenue impact is real and measurable; it just often happens slowly enough that companies don’t attribute it correctly.


Three Reasons This Is Happening — And Why Most Companies Miss All Three

AI slop is not random. Across the companies I’ve worked with, it tends to come from the same three places:

Speed was prioritized over standards when AI was first deployed

The pressure to “do AI” came from leadership, from investors, from competitive fear. Implementation timelines were short. And in the rush, almost nobody stopped to write down: what does acceptable AI output look like for us? What are the hard failures — the outputs that would damage our brand? What are the soft failures — the outputs that are technically fine but not good enough to justify existing? Without that document, quality becomes subjective, inconsistent, and impossible to improve systematically. The Workday research confirms this: 66% of leaders say skills training is a priority, yet only 37% of the employees dealing with the most AI rework are actually receiving it. The gap between intention and execution is wide.

Companies have no clear measurement for AI output quality

You cannot improve what you don’t measure. Most teams deploying AI have no rubrics, no scoring systems, no thresholds for what requires human review versus what’s safe to publish. The question “is this good enough?” gets answered differently by different people on different days. That inconsistency is how slop accumulates. And critically, fewer than half of employee job descriptions have been updated to reflect AI capabilities — which means people are balancing faster AI output against the same accuracy and judgment expectations they had before AI was part of their workflow.

Human review was designed as a gate, not as a learning system

Many companies added human review to their AI workflows — but as a final filter, not as a feedback mechanism. Every time a human corrects an AI output, that correction is data. It tells you where the model is failing, in what specific ways, at what frequency. Companies that capture and act on that data can tighten their prompts, adjust their models, and reduce the error rate over time. Companies that treat review as just a quality gate generate no signal from the corrections — and the same errors keep coming through, week after week.


What the Companies Handling AI Quality Well Are Actually Doing

The MIT research that identified 95% failure rates also isolated what the successful 5% had in common: they focused tightly on specific problems and executed with proper infrastructure. Not “AI everywhere.” Specific use cases, with quality systems built around them from the start.

They start with use cases where “good” and “bad” output are easy to tell apart

The MIT researchers found that back-office automation, finance, and procurement deliver better AI ROI than sales and marketing functions. The reason is straightforward: it’s easier to objectively verify whether an invoice was processed correctly than to verify whether an AI-written marketing email captures the right tone. If you start AI deployment in domains where quality is hard to assess, you’ll generate slop you can’t even accurately identify. Start where the quality signal is clear. Build the quality infrastructure. Then expand to more ambiguous use cases once your systems are proven.

YouTube’s response: treat AI slop like spam

YouTube CEO Neal Mohan’s letter to creators described the platform’s approach explicitly: building on systems that were already successful at combating spam and clickbait, and applying them to low-quality AI content. This is the right instinct. AI slop is a new variant of an old problem: content that prioritizes volume over value. The detection and demotion methods that worked for spam are being adapted for slop. For companies building content platforms, this is the direction to follow. Slop detection needs to be a first-class system, not an afterthought.

The “workslop” research points toward a specific solution

BetterUp Labs and Stanford’s research on workslop identified the core issue clearly: top-down “just use AI” mandates without guidance create hollow, low-quality output. Their recommendation mirrors what I’ve seen work in practice: leaders need to set clear quality standards, model thoughtful AI usage, and create accountability for output quality — not just output volume. The shift from measuring what AI produces to measuring whether what it produces is actually useful is the organizational change that matters most.

Using AI to evaluate AI output at scale

Research on multi-agent evaluation frameworks has shown that AI agents can be used to evaluate AI output — essentially building automated quality assurance into the generation pipeline itself. This is where the most sophisticated AI product teams are heading. Not more human reviewers for everything, but smarter triage systems that route low-confidence outputs to human review and approve high-confidence outputs automatically. The goal is to concentrate human judgment where it has the highest leverage, not eliminate it.
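
To make the triage pattern concrete, here is a minimal sketch. Everything in it is an assumption rather than any particular product’s API: evaluate stands in for a second model scoring output against your rubric, and the threshold and heuristics are placeholders you would tune against your own review data.

```python
from dataclasses import dataclass

@dataclass
class EvalResult:
    score: float        # 0.0-1.0 composite quality score
    reasons: list[str]  # rubric criteria that failed

def evaluate(output: str) -> EvalResult:
    """Stand-in evaluator. In a real pipeline this would call a second
    model (an LLM judge) that scores the output against your rubric;
    the heuristics below are toy placeholders."""
    reasons = []
    if len(output.split()) < 30:
        reasons.append("too short to be substantive")
    if "fast-paced digital landscape" in output.lower():
        reasons.append("generic filler phrasing")
    return EvalResult(score=max(1.0 - 0.4 * len(reasons), 0.0), reasons=reasons)

AUTO_APPROVE_THRESHOLD = 0.8  # tune against your own review data

def route(output: str) -> str:
    """Triage: auto-approve high-confidence output, queue the rest
    for human review with the failure reasons attached."""
    result = evaluate(output)
    if result.score >= AUTO_APPROVE_THRESHOLD:
        return "publish"
    return "human_review: " + "; ".join(result.reasons)
```

The part that matters here is the routing, not the scoring: human judgment gets spent only where the evaluator’s confidence is low.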


A Framework for Bringing AI Output Quality Under Control in Your Product

This is the framework I use. It’s not a complete playbook — every use case has its own specifics — but this starting structure applies across the board.

1. Write down what acceptable output looks like before you deploy anything

This is the step most companies skip entirely, and it’s why their quality problems persist. Sit down with your team and document: what are the hard failures — outputs that would damage your brand or mislead a user? What are the soft failures — outputs that are technically fine but not valuable enough to publish? What does excellent look like? This becomes your evaluation rubric. Without it, every quality decision is a judgment call that will be made differently by different people on different days. With it, you have a standard you can actually train against and measure improvement on.
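
If it helps to see the shape of such a rubric, here is a minimal sketch in code. The criteria and severities below are invented examples, not a standard; the point is that each criterion is written down, named, and mapped to a failure class your tooling can act on.

```python
from enum import Enum

class Severity(Enum):
    HARD_FAIL = "hard_fail"  # would damage the brand or mislead a user
    SOFT_FAIL = "soft_fail"  # technically fine, not valuable enough to publish

# Illustrative rubric entries -- replace with your team's own criteria.
RUBRIC = [
    {"id": "claims_sourced",
     "question": "Is every factual claim traceable to a source we trust?",
     "on_fail": Severity.HARD_FAIL},
    {"id": "answers_the_question",
     "question": "Does this tell the reader something they came here to learn?",
     "on_fail": Severity.SOFT_FAIL},
    {"id": "on_brand_voice",
     "question": "Does this read like us rather than generic filler?",
     "on_fail": Severity.SOFT_FAIL},
]
```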

2. Add a review stage before AI output reaches your users — even a small one

Even spot-checking 10% of outputs before they go live catches the worst failures before they reach your audience. As you accumulate data from those reviews, you can build automated classifiers that handle the obvious cases — freeing human review for the edge cases that actually require judgment. The patterns that emerge from early manual reviews will tell you exactly what to automate. You cannot build good automated quality systems without first understanding where the failures are coming from.
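
As a sketch of what that spot-check can look like, assuming each output carries a stable identifier: hashing the ID makes the sample deterministic, so the same output always gets the same review decision across pipeline runs.

```python
import hashlib

REVIEW_RATE = 0.10  # start by spot-checking roughly 10% of outputs

def needs_spot_check(output_id: str) -> bool:
    """Deterministically sample ~10% of outputs for human review.
    Hash-based sampling avoids re-rolling decisions on retries."""
    first_byte = hashlib.sha256(output_id.encode()).digest()[0]
    return first_byte / 256 < REVIEW_RATE
```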

3. Measure AI output quality from day one — not as an afterthought

Track correction rates, categories of errors, time spent on review, and user feedback signals. This data tells you where the model is failing, how often, and in what specific ways. It also gives you a baseline so you can tell whether changes to your prompts, models, or review processes are improving things — or just moving the problem around. Without measurement, “we’re improving quality” is a belief. With it, it’s a fact you can prove.
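
The tracking layer does not need to be elaborate. A hypothetical ledger as small as this one (the field names are assumptions, not a standard schema) is enough to establish a correction-rate baseline and show which error categories dominate:

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class QualityLog:
    """Minimal quality ledger: enough to compute a correction-rate
    baseline and see where review time is actually going."""
    reviewed: int = 0
    corrected: int = 0
    review_minutes: float = 0.0
    error_categories: Counter = field(default_factory=Counter)

    def record(self, corrected: bool, category: str = "", minutes: float = 0.0):
        self.reviewed += 1
        self.review_minutes += minutes
        if corrected:
            self.corrected += 1
            self.error_categories[category or "uncategorized"] += 1

    def correction_rate(self) -> float:
        return self.corrected / self.reviewed if self.reviewed else 0.0
```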

4. Treat every human correction as system data, not just a one-off fix

Every time someone edits, rejects, or rewrites AI output, that correction carries information. Build processes that capture it and feed it back into your quality improvement cycle. This doesn’t require model fine-tuning — it can be as simple as updating prompt templates when a specific error category keeps appearing. The companies that are pulling ahead on AI ROI are the ones treating quality improvement as a continuous loop, not a launch task you check off once.
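
A hedged sketch of that loop, assuming corrections are already being tagged with an error category (the category names and the 20% threshold are invented for illustration): aggregate the tags, surface the categories that keep recurring, and those become the next prompt-template fix.

```python
from collections import Counter

def recurring_failures(corrections: Counter, min_share: float = 0.2) -> list[str]:
    """Return error categories that account for at least `min_share`
    of all corrections -- the candidates for a prompt or template fix."""
    total = sum(corrections.values()) or 1
    return [cat for cat, n in corrections.items() if n / total >= min_share]

# Example: one hypothetical week of tagged review data
week = Counter({"invented_detail": 14, "generic_filler": 9, "wrong_tone": 3})
print(recurring_failures(week))  # -> ['invented_detail', 'generic_filler']
```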

5. Train your team to evaluate AI output, not just produce it faster

The Workday research is clear: the training gap is real, and it’s directly causing the rework problem. The most important training is not “how to use AI tools” — it’s “how to critically evaluate AI output.” Give your team explicit criteria for what makes output acceptable versus what needs editing versus what should be discarded entirely. Run through real examples together. Build shared vocabulary and shared judgment. This is the skill that converts AI from a liability into an asset.


Why Getting This Right Now Creates a Durable Advantage — Not Just a Fixed Problem

Right now it feels like everyone is dealing with AI slop equally — the same trust deficits, the same rework costs, the same inconsistent outputs. So it doesn’t feel like a differentiator. It feels like a universal condition.

That’s going to change. And it’s going to change faster than most companies expect.

The companies building proper AI output quality infrastructure today are building something that compounds. When your AI-generated content is consistently accurate, useful, and on-brand — while a competitor’s is generic, unreliable, and trust-eroding — that difference becomes visible to your users, to search engines, and to advertisers. The gap between companies with mature quality systems and companies without them is going to widen significantly over the next two to three years.

YouTube’s response to AI slop is instructive here. They’re treating it as an existential platform quality issue — not a public relations problem, not a policy footnote. The platforms and products that invest in quality systems now will earn user trust that is very hard for late movers to recover. Users who’ve had their feeds ruined by slop don’t come back easily once they’ve found alternatives.

The OECD’s research on generative AI productivity is explicit on this point: the gains from AI are real, but they are heavily dependent on deployment quality. The hallucination and reliability risks are not going to be solved by better models alone. They require organizational systems and processes. The companies that build those systems now will be the ones with strong ROI numbers to show. Everyone else will still be investing and wondering why it’s not working.


AI Slop Is a Systems Problem. It Has a Systems Solution.

40% of AI value lost to rework. 95% of pilots not delivering measurable ROI. Only 46% of users trusting AI output. These are from large, rigorous studies across thousands of companies and tens of thousands of users. This is what AI deployment looks like at scale right now, without quality infrastructure.

The technology is not the bottleneck. The systems around the technology are. And “move fast and hope it’s good enough” has a measurable cost — in time, in user trust, in business outcomes that don’t materialize despite significant investment.

The fix exists. It’s not glamorous — it doesn’t make for a great press release. It’s rubrics, measurement, feedback loops, and training. It’s treating AI output quality with the same rigor you’d apply to any other core product feature. But it’s the work that separates companies building durable AI products from companies generating expensive noise.

Start by asking your team three questions right now: Do we have a written definition of acceptable AI output quality? Do we have a system to measure it? Do we have a feedback loop that makes it better over time? If any of those answers is no — that’s where you begin.

I publish frameworks like this every week on AI implementation, product strategy, and what’s working inside real SaaS companies. 210,000+ readers already subscribed. Join the Newsletter →

References

  1. Nearly 40% of AI’s value is lost to rework — Workday/Hanover Research survey of 3,200 leaders and employees at $100M+ revenue companies (Business Insider, 2026)
  2. YouTube CEO Neal Mohan names reducing AI slop a top 2026 priority in annual creator letter (CNBC, January 2026)
  3. 95% of enterprise AI pilots fail to show P&L impact; KPMG 48,000-person trust survey — hallucinations and AI credibility (Forbes, 2025)
  4. Only 25% of AI efforts delivered expected returns — IBM Institute for Business Value & Oxford Economics, 2,000 CEOs surveyed
  5. BetterUp Labs & Stanford research on “workslop” — 41% of workers affected, nearly 2 hours wasted per incident (LinkedIn/HBR, 2025)
  6. AgentEval: Multi-agent frameworks for evaluating AI output quality (arXiv, 2023)
  7. The effects of generative AI on productivity, innovation, and deployment quality — OECD AI Papers, 2025
