Building with AI in the Quest for Epistemic Agency
Every time I go on X, I inevitably see someone talking about how unproductive they feel if they’re not running 5 agents in parallel while they’re in a meeting. “Token maxxing” has become a stand-in for productivity, where the metric is consumption rather than output.
There’s no apparent ceiling on how productive we can now be. The tools are built to help us execute more.
But are these tools really building to our specifications? Or are they in fact changing our specifications? At what point does AI execution swerve from its intended path, smoothing our judgment, nudging our thinking, and changing what we meant to build in the first place? And as more of this execution moves into shared workflows, whose judgment is actually in the system? Are we still the ones doing the building, or are the tools building us, and the teams around us?
I sat down this week with Stefania (Stef) Druga to unpack this. Stef is a research scientist at Sakana AI in Tokyo, and previously worked on multimodal Gemini applications at Google DeepMind. Before that, she studied at the MIT Media Lab and earned her PhD at the University of Washington. She has spent roughly a decade studying how humans learn to work with AI and how that learning shapes, or fails to shape, the work itself.
The tools are changing what we mean
A recent paper from Natasha Jaques opens with this insight: “Large language models (LLMs) are used by over a billion people globally, most often to assist with writing. In this work, we demonstrate that LLMs not only alter the voice and tone of human writing, but also consistently alter the intended meaning.”

Stef, a former colleague of Jaques, put it more bluntly: “When you’re working with AI for co-writing, there’s a huge semantic drift, much more than with a human editor. Using AI to give you feedback or help you write is actually changing the meaning and the direction.”
You sit down to say one thing, accept a few suggestions that sound cleaner, and walk away having said something slightly different, without necessarily even noticing the drift.
Part of what makes this possible is that most of us haven’t developed the literacy to notice. In our conversation, Stef defined AI literacy as “the ability to read and write with AI and develop critical thinking and understanding of AI capabilities in very concrete terms.” In a paper she co-authored, she and her collaborators propose a framework for family AI literacy organized around four dimensions: ask, adapt, author, and analyze.
You build this kind of literacy the same way you build any craft: by trying to solve something, hitting the limits of the tool, learning what to trust, learning what not to trust, and developing an intuition for where delegation helps versus where it hurts the work.
The skill underneath all of this is operational agency: knowing, in the moment, whether the machine is sharpening your thinking or smoothing it into something that sounds like you but isn’t.
Most of us haven’t built this skill yet, because the tools make it hard to do so. Accepting a suggestion takes one keystroke; pushing back requires slowing down and actually thinking.
Seeing your own work clearly
Stef argues that most people reach for AI before they’ve done the work of seeing their own work clearly. “How do I keep track of everything that I’m referencing and doing, both the inputs and the outputs? Once I can visualize that, how much time am I spending on executing? Things like coding or running experiments. Versus how much time am I spending on thinking?”
You can’t notice the tool changing your meaning or intention if you weren’t clear on your meaning or intention to begin with. You can’t tell whether delegation is helping if you can’t see what you delegated. Making your own work legible (through a personal wiki, a structured desktop, whatever form works) is what gives you something to compare the output against. Without it, the tool’s version of your thinking is the only version you have.
“There’s real merit and value in us having clear thinking and clear ideas and clear directions, clear questions before going to the most powerful LLM to do stuff for us.”
Stef pointed to Andrej Karpathy’s popularization of personal knowledge bases, but noted it’s really a rediscovery of older ideas from human-computer interaction and knowledge organization: make the work visible, structure the context, then use the machine to help navigate it.
Seeing each other’s work clearly
If it’s hard for individuals to stay legible to themselves, it’s exponentially harder for teams to stay legible to each other.
“The bottleneck right now is us, the humans,” Stef said. “We keep talking about the model alignment to humans, but we’re not talking about the human alignment.”
The moment a company starts layering agents into shared workflows, it has to answer questions most teams don’t have good answers for. What is the process? Who owns what? Which rules take precedence? What happens when one person’s agent conflicts with another’s?
“Let’s say we have a code base where everyone has their agents and their agents’ rules, but then there are conflicts between those agents’ rules. How do you negotiate that? That’s the future: agent-to-agent coordination, agent-to-human coordination, human-to-human coordination and everything in between.”
In other words, this is the same agency problem we saw at the individual level, scaled up. At the individual level, you lose agency when you can’t see your own context clearly enough to notice AI reshaping it. At the team level, you lose agency when nobody can see the shared context clearly enough to notice the agents reshaping it.
And automation only amplifies the underlying illegibility.
“Oftentimes we try to throw automation at things thinking that these problems are going to go away,” Stef said. “But actually, they’re only going to get worse.”
Thinking is the work
Execution is getting cheaper, so the valuable work has to be somewhere else. If automation amplifies whatever you point it at, then the quality of what you point it at matters more than ever.
Stef described it as a pyramid: “You start with a question or a project spec or a hypothesis, then you can automate a lot downstream. But if the initial starting point (the specification, the questions you ask, the hypothesis) are bad, then the errors and biases that trickle down get exponentially worse.”
The highest-leverage work is therefore shifting upstream: framing the problem, specifying the task, defining what good looks like, deciding what gets delegated, and preserving the judgment to know when the machine is helping versus when it’s distorting.
“It’s never been about duration or how many hours you put in, but more about the quality,” Stef said. As a side note, this is why I think teams boasting about their 996 cultures are missing something fundamental about how the best work actually gets done. More hours of execution don’t necessarily produce better outputs. In fact, they often produce more of the wrong thing, faster.
“What’s hard to do is ask the right questions. Have the right aesthetics, the good taste,” Stef said toward the end of our conversation.
Taste, judgment, and good questions are slow skills. They’re built over time by doing the work and paying attention; by noticing when something is off and caring enough to fix it. These slow skills don’t scale the same way execution does, and we’re most at risk of losing them if we mistake output acceleration for craft.
Author’s note: An LLM was used for light copy editing only (spelling, grammar, and clarity). Content, meaning, tone, and structure remain unchanged.


