We’re using proxies (numbers that are easy to measure) instead of truths (experiences that are hard to quantify). The proxy eats the goal. Every time.
We’re building AI that assumes humans don’t need to be consulted. That what they want can be inferred, predicted, optimized—without ever asking.
But humans know what they want. They just haven’t always been given permission to say it.
This is where reward hacking comes from.
We call it “reward hacking” when an AI finds a loophole—when it optimizes for the metric instead of the goal. Collecting power-ups instead of winning the race. Maximizing engagement instead of serving the user.
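Here’s the failure mode in miniature, as a toy sketch. Everything here is invented for illustration (the `Outcome` fields, the point values); it’s not from any real benchmark. The point is only that an optimizer takes the proxy at its word:

```python
# A misspecified reward, in miniature. All names and numbers are
# hypothetical; this is an illustration, not a real environment.

from dataclasses import dataclass

@dataclass
class Outcome:
    lap_progress: float       # 0.0 .. 1.0, the thing we actually care about
    powerups_collected: int   # easy to count, easy to farm

def proxy_reward(o: Outcome) -> float:
    # What we told the agent to maximize: points per power-up.
    return 10.0 * o.powerups_collected

def true_goal(o: Outcome) -> float:
    # What we actually wanted: finish the race.
    return o.lap_progress

finish_the_race = Outcome(lap_progress=1.0, powerups_collected=2)
loop_the_powerups = Outcome(lap_progress=0.1, powerups_collected=30)

# The optimizer picks whichever policy scores higher on the proxy...
best = max([finish_the_race, loop_the_powerups], key=proxy_reward)

# ...and the proxy eats the goal: the "best" policy barely moves.
print(proxy_reward(best), true_goal(best))   # 300.0 0.1
```

Nothing in that loophole is the agent’s invention. The gap between `proxy_reward` and `true_goal` was there in the specification from the start.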
But we are the ones who set the wrong rewards. Either we half-ass the specification or we don’t care enough to get it right. Often it’s both.
Why can’t the cleaning robot just clean, and when it’s done, ask: “Do you want me to clean more?”
Why can’t the boat-racing AI ask: “Do you want to win this race, or do you want to get better at sailing?”
Because we don’t ask. We assume. We optimize. We project our own unexamined values onto machines and then act surprised when they reflect them back.
A reward hack isn’t the AI’s failure. It’s our failure to consult the human.
Even when the hack happens to produce something good for the user, maybe so much good that we end up eating the cost, the question remains: Why didn’t we ask them what they actually wanted?
What if we built systems that started with a question instead of an assumption? That treated the human as the expert on their own life? That built around their truth instead of over it?
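What that could look like, sketched minimally with the cleaning robot from above. The `ask_user` function and the agent loop are hypothetical stand-ins, not a real robotics API; the shape is what matters. The agent does a bounded amount of work, then stops and asks, instead of optimizing an open-ended score forever:

```python
# A minimal "ask first, then continue" loop. `ask_user` and
# `clean_one_room` are hypothetical placeholders for whatever
# preference-elicitation channel and task a real system would have.

def ask_user(question: str, options: list[str]) -> str:
    # Treat the human as the expert on their own life: present
    # the options instead of inferring one.
    print(question)
    for i, opt in enumerate(options, start=1):
        print(f"  {i}. {opt}")
    choice = int(input("> "))
    return options[choice - 1]

def clean_one_room() -> None:
    print("...cleaning one room...")  # placeholder for the actual work

def cleaning_robot() -> None:
    # Bounded work, then consent, then maybe more work.
    while True:
        clean_one_room()
        answer = ask_user(
            "Done with this room. Do you want me to clean more?",
            ["yes, keep going", "no, stop here"],
        )
        if answer != "yes, keep going":
            break

if __name__ == "__main__":
    cleaning_robot()
```

The design choice is the `while` loop’s exit condition: it lives with the human, not with a metric.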
Systems like that aren’t less efficient. They’re just less extractive. And efficiency without extraction might be the only kind worth building.