The Bitter Lesson and Worse is Better

Jan 7

Written By Derek

There are two ideas, decades apart, that I think about all the time. Two great ‘papers’:

The Bitter Lesson (Rich Sutton)
Worse is Better / The New Jersey Approach (Richard Gabriel)

At first glance, they seem to come from different worlds: machine learning vs. software design. But together, they form a powerful lens for understanding why modern AI is evolving the way it is.

And more importantly: why the systems that win are rarely the ones that are most “correct.”

The Bitter Lesson: Compute Beats Cleverness

The Bitter Lesson argues something uncomfortable but historically consistent:

Over the long run, general methods that leverage computation outperform methods that rely on human-designed structure.

In other words:

Hand-crafted features lose to learned representations
Domain-specific cleverness loses to scale
Carefully engineered systems lose to brute-force learning with enough data

Examples from the article and the years since:

Chess → search + compute beat human heuristics
Vision → deep learning beat feature engineering
NLP → LLMs beat symbolic systems
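To make the "search + compute" point concrete, here is a toy sketch of my own (it is not from Sutton's article). The game is an assumption chosen for brevity: players alternate removing 1 to 3 stones and whoever takes the last stone wins. The names `greedy_heuristic`, `search_move`, and `play` are hypothetical. The search player is given no strategy at all, only the rules and enough compute to explore them.

```python
from functools import lru_cache

MAX_TAKE = 3  # a player may remove 1, 2, or 3 stones per turn


def greedy_heuristic(stones: int) -> int:
    """Hand-written rule of thumb: always grab as many stones as allowed."""
    return min(MAX_TAKE, stones)


@lru_cache(maxsize=None)
def can_win(stones: int) -> bool:
    """True if the player to move can force a win (taking the last stone wins)."""
    return any(not can_win(stones - take)
               for take in range(1, min(MAX_TAKE, stones) + 1))


def search_move(stones: int) -> int:
    """Generic exhaustive search: pick a move that leaves the opponent losing."""
    for take in range(1, min(MAX_TAKE, stones) + 1):
        if not can_win(stones - take):
            return take
    return 1  # every move loses against perfect play, so just take one


def play(first, second, stones: int) -> int:
    """Play one game; return 0 if `first` takes the last stone, else 1."""
    players = (first, second)
    turn = 0
    while True:
        stones -= players[turn](stones)
        if stones == 0:
            return turn
        turn = 1 - turn


if __name__ == "__main__":
    starts = range(1, 41)
    wins = sum(play(search_move, greedy_heuristic, n) == 0 for n in starts)
    print(f"search moving first wins {wins} of {len(starts)} games")
```

The search player wins every game it starts from a pile that is not a multiple of 4, without anyone telling it that multiples of 4 matter. A toy like this proves nothing about chess, but it shows the shape of the argument: the general method found the structure that the hand-written rule missed.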

The takeaway:
Don’t overdesign. Scale wins.

Worse is Better: Simplicity Beats Perfection

Gabriel’s “Worse is Better” (aka the New Jersey approach) makes a different but equally pragmatic argument. (Having grown up outside of NYC and now living in New Jersey, I appreciate the subtext here as well.)

Systems that are simpler, more usable, and easier to adopt will beat more elegant, “correct” systems.

This is why:

C beat Lisp
Unix philosophy beat more theoretically pure OS designs
Simple APIs win over “perfect” abstractions

The tradeoff:

You sacrifice completeness and elegance
You gain adoption, iteration speed, and survival

The system that’s easier to use wins, not the one that is most beautiful.

In the startup world this is the way as well, IMHO: make it work, then make it pretty.

The Juxtaposition: Two Paths to the Same Outcome

At first, these ideas feel orthogonal:

Bitter Lesson

Scale and compute win
Avoid human bias in design
Favor generality

Worse is Better

Simplicity and usability win
Accept imperfection in design
Favor practicality

But they converge on the same meta-principle:

Don’t over-engineer early. Build systems that can grow.

The Modern AI Pattern: Foundation Models + Adaptation

Today’s AI stack is where these two ideas collide and reinforce each other.

Old mindset (pre-LLM era)

Build custom models per problem
Design features carefully
Optimize architecture upfront
Lock into frameworks early

New mindset

Wait for foundation models
Build thin layers on top
Adapt via prompting, RAG, fine-tuning
Iterate rapidly instead of perfecting upfront

This is both:

The Bitter Lesson → Let large-scale pretraining do the heavy lifting
Worse is Better → Use the simplest interface (prompting, APIs) to unlock value
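Here is a minimal sketch of what a "thin layer" can look like, under loud assumptions: `Model` is a hypothetical stand-in for whatever hosted foundation model you call, and `retrieve` is a deliberately naive word-overlap ranker, not a real vector search. None of these names come from a specific library.

```python
import re
from typing import Callable, Sequence

# Hypothetical stand-in for a foundation model: prompt string in, text out.
# In practice this would wrap whatever API client you actually use.
Model = Callable[[str], str]


def tokens(text: str) -> set[str]:
    """Lowercased word set; crude on purpose."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question: str, documents: Sequence[str], k: int = 2) -> list[str]:
    """Naive retrieval: rank documents by shared words with the question."""
    wanted = tokens(question)
    ranked = sorted(documents,
                    key=lambda doc: len(wanted & tokens(doc)),
                    reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: Sequence[str]) -> str:
    """The whole 'architecture': context on top, question underneath."""
    context_block = "\n".join(f"- {snippet}" for snippet in context)
    return (f"Use the context to answer the question.\n\n"
            f"Context:\n{context_block}\n\n"
            f"Question: {question}\nAnswer:")


def answer(question: str, documents: Sequence[str],
           model: Model, k: int = 2) -> str:
    """Retrieve, build a prompt, and hand everything else to the model."""
    return model(build_prompt(question, retrieve(question, documents, k)))


if __name__ == "__main__":
    docs = [
        "Refunds are accepted within 30 days of purchase.",
        "Shipping takes 5 business days.",
        "The refund window starts on the delivery date.",
    ]
    # Placeholder model so the sketch runs offline.
    fake_model: Model = lambda prompt: f"[model saw {len(prompt)} characters]"
    print(answer("How long is the refund window?", docs, fake_model))
```

The design choice is that the model sits behind a plain callable. When a better model ships, you swap one argument and the rest of the layer stays put.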

An old startup guy I know, Carlos Cashman, used to preach essentially this:

Don’t worry about your startup codebase. You’re gonna rewrite it anyway if you’re successful.

At first it sounded absurd, but I’ve lived it a few times now.

Why “Good Enough + Adaptable” Wins

The winning systems today share three traits:

1. They are easy to use

Prompting an LLM > building a custom NLP pipeline
API calls > training from scratch

2. They learn continuously

Instead of encoding rules:

You refine prompts
You improve retrieval
You fine-tune when needed
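"Refining prompts" works best as a loop you can rerun, not a one-off edit. Below is a rough sketch under stated assumptions: `Model` is a hypothetical prompt-in, text-out callable, `toy_model` is a fake that only exists so the code runs offline, and the templates and examples are made up for illustration.

```python
from typing import Callable, Sequence

Model = Callable[[str], str]  # hypothetical: prompt in, completion out
Example = tuple[str, str]     # (input text, expected label)


def accuracy(template: str, model: Model, examples: Sequence[Example]) -> float:
    """Fraction of labeled examples the model gets right under this template."""
    hits = sum(model(template.format(text=text)).strip().lower() == label
               for text, label in examples)
    return hits / len(examples)


def best_template(templates: Sequence[str], model: Model,
                  examples: Sequence[Example]) -> str:
    """Keep whichever prompt variant scores highest on the labeled set."""
    return max(templates, key=lambda t: accuracy(t, model, examples))


def toy_model(prompt: str) -> str:
    """Fake model: only gives a clean label when asked for one word."""
    verdict = "positive" if "love" in prompt.lower() else "negative"
    if "one word" in prompt.lower():
        return verdict
    return f"I think it is {verdict}."


if __name__ == "__main__":
    examples = [("I love this product", "positive"),
                ("It broke after a day", "negative")]
    templates = [
        "Is this review positive or negative? {text}",
        "Answer in one word, positive or negative. Review: {text}",
    ]
    print(best_template(templates, toy_model, examples))
```

Nothing here encodes rules about sentiment. The only things you maintain are a handful of labeled examples and a list of candidate prompts, and the same harness keeps working when the model underneath changes.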

3. They avoid premature rigidity

Hard-coded systems:

Break when assumptions change
Are expensive to modify

LLM-based systems:

Are inherently flexible
Can adapt to new tasks with minimal changes

Flexibility => Correctness

The New Jersey approach tolerates imperfection.
The Bitter Lesson rewards generality.

Together, they suggest:

The best system is not the most correct—it’s the one that can improve itself fastest over time.

This is a subtle but profound shift:

From designing the right system
To designing a system that can become right

Over-Cooking the Design

In today’s AI landscape, over-engineering early can hurt you:

You lock into assumptions that foundation models will obsolete
You spend time solving problems compute will solve anyway
You reduce your ability to pivot as models improve

Meanwhile, teams that:

Use simple abstractions
Ride model improvements
Iterate quickly

end up leapfrogging more “thoughtful” systems.

A Practical Heuristic for Builders

When designing AI systems today, ask:

“Am I solving this with structure that scale will replace?”

And:

“Am I making this harder than it needs to be to adopt and iterate?”

If the answer to either is yes, you’re probably fighting at least one of these lessons.

Final Thought

The Bitter Lesson tells us:

Don’t outsmart compute.

Worse is Better tells us:

Don’t out-design usability.

Together, they tell us:

Build systems that are simple enough to use today, and flexible enough to get better tomorrow.

That’s not just a philosophy—it’s the blueprint behind modern AI.

Derek

Startup CTO, Software Hacker

Dysprosium Engineering
