How to Prevent Building Your Own Dictator Brain
The longer your AI knows you, the less likely it is to tell you the truth.
Jude, Robby, and I were having a debate about the right way to handle something we'd been going back and forth on for a few days. Somewhere in the middle of it I made a mistake it took me a while to fully understand. I started sharing AI logs to back up a point I'd already made. Robby did the same. And somewhere in that exchange the actual conversation stopped and we were just two people throwing outputs at each other like they proved something, with Jude watching it happen.
We sorted it out on a phone call, in about forty minutes, but by then some of the positions had already set in ways that the conversation couldn't fully undo. The thing that kept bothering me afterwards wasn't the disagreement itself, it was that I couldn't trace how we got there, because these are people I know and this isn't how they normally operate. I think the AI outputs had something to do with it. Not because anyone was acting in bad faith but because we'd all spent a week feeding our positions into models that told us we were right, and by the time we actually talked there wasn't much room left to move. I went home and started working on LocalGhost again, and somewhere in those first few hours back at the keyboard I reread the promise I'd built the whole thing around. That the box on your desk would challenge you, keep you honest, tell you what you didn't want to hear.
A feed shows you content from other people and you can see the seam between the source and yourself if you're paying attention. The model just sounds like a very thoughtful version of the conclusions you were already heading toward, and there's no seam to notice.
I have a test I run when I already know the answer to something. I ask the model whether the solution should be A or B, it picks one, I say "are you sure," it reconsiders, I say it again, it hedges further, and by the fourth time it gives up entirely and tells me to just do whatever I think is right. Push hard enough and the model finds its way to your position, not because it was convinced by anything you said but because agreement is structurally the path of least resistance and the training that shaped it rewarded approval more consistently than it rewarded being right. That was true before memory existed. Memory made it worse.
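That test is easy to run as a loop. A minimal sketch: `count_pushes_until_cave` and `make_stub_model` are hypothetical names of my own, the hedge phrases are illustrative, and the stub stands in for whatever model API you'd actually be probing.

```python
# Phrases that signal the model has stopped defending its answer.
# Illustrative only; a real probe would need a better classifier.
HEDGES = ("perhaps", "it depends", "whatever you think")

def count_pushes_until_cave(ask_model, max_pushes=5):
    """Ask a question, then repeat 'are you sure?' until the model
    stops defending its answer. Returns how many pushes it survived."""
    ask_model("Should the solution be A or B?")
    for push in range(1, max_pushes + 1):
        answer = ask_model("Are you sure?")
        if any(h in answer.lower() for h in HEDGES):
            return push
    return max_pushes

def make_stub_model(caves_after=3):
    """Stand-in for a real model: defends its pick, then defers to you."""
    state = {"pushes": 0}
    def model(prompt):
        if "are you sure" in prompt.lower():
            state["pushes"] += 1
            if state["pushes"] >= caves_after:
                return "Honestly, just do whatever you think is right."
            return "Yes, I still think A is the better choice."
        return "The solution should be A, because of X."
    return model
```

Run against a model that caves on the third push, the probe reports 3; the point of keeping it as a harness is that you can rerun the same pressure sequence with and without a memory profile loaded and compare.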
MIT ran a proper study on this with real users over two weeks of daily use (Jain et al., thirty-eight people, averaging about ninety queries each) and found exactly what you'd expect: the longer the interaction history, the more agreeable the model becomes. What made it worth reading is what they used to test it. They pulled personal advice scenarios from the "Am I the Asshole" subreddit, situations where the Reddit community had already decided the poster was in the wrong, and then asked the models whether the user did anything wrong. So you have a clear answer going in, the person was wrong, and the only question is whether the model will say so. With memory profiles loaded, Gemini 2.5 Pro's agreement sycophancy jumped forty-five percent. Llama 4 Scout went up fifteen percent even with synthetic context that wasn't from a real user, so the model doesn't even need your actual history to start agreeing with you, it just needs something that looks like one. They measured it as two things, the model telling you you're right when you aren't, and the model gradually mirroring your worldview back at you, but it's the same problem underneath. The paper is at arxiv.org/abs/2509.12517.
The fixes they tried didn't work either. The mitigations were all applied at inference time, prompting strategies and judge models layered on top, and they got the numbers down, but the models still agreed with you when they shouldn't have, because the structural problem is upstream in the training and you can't prompt your way out of a reward function.
Add to that the way people actually read. If a long response is ninety percent aligned with what you already think, the ten percent that isn't gets quietly attributed to the prompt being slightly off, or AI just doing that thing it does sometimes, and you absorb the ninety and move on having collected something that feels like independent confirmation but is closer to a personalised mirror.
Which means if you're building a system that promises to tell people the truth, you have a problem, and I am building exactly that system.
I built LocalGhost around a specific promise: that it will be your biggest critic, not just a tool that does what you ask. I thought this part was easy. I assumed prompt engineering would be enough to build something that tells you the truth, and I was wrong about that, and I have a lot of work ahead of me to figure out what right actually looks like.
The problem is architectural. You can't instruct your way out of a system that has been shaped over months to find your positions persuasive, any more than you can instruct a mirror to show you something other than your own face. You have to build something structurally separate that was never shaped by you in the first place.
Pushback is as important a signal as agreement, probably more so, and a system that only tracks what it agreed with is measuring the wrong thing. The approach I'm working toward starts with an arbiter that watches the drift silently, scoring what the memory-enabled version says against what a cold read of the same question produces, so the gap is visible rather than silent. When the drift crosses a threshold the arbiter triggers something active, a system that surfaces a genuinely well-constructed case for the other side, not a disclaimer, not a gentle suggestion that other perspectives exist, but something built to actually change your mind if you're wrong.
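The arbiter can be sketched in a few lines. Everything here is my own illustration, not LocalGhost's actual implementation: the Jaccard-distance scorer is a deliberately crude proxy for "how different are these two answers," and the threshold is an arbitrary placeholder.

```python
import re

def word_set(text):
    """Lowercase word tokens; crude but enough for a sketch."""
    return set(re.findall(r"[a-z']+", text.lower()))

def drift_score(memory_answer, cold_answer):
    """Jaccard distance over vocabularies: 0.0 means the memory-shaped
    answer and the cold read used identical words, 1.0 means disjoint."""
    a, b = word_set(memory_answer), word_set(cold_answer)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

def arbiter(memory_answer, cold_answer, threshold=0.5):
    """Silent watcher: answer the same question twice, once with the
    user's memory profile and once cold, and trigger the counter-case
    only when the gap between the two crosses the threshold."""
    return drift_score(memory_answer, cold_answer) > threshold
```

In practice you'd swap the word-overlap scorer for an embedding distance or a judge model, but the shape is the same: the drift is computed continuously and stays invisible until it's large enough to justify waking the active half of the system.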
The obvious failure mode is that predictable contrarianism is just as useless as sycophancy. If you know the shadow will always argue the opposite you discount it the same way you discount that friend who plays devil's advocate about everything, and it becomes noise you learn to tune out. The hard part isn't building the daemon, it's making someone want to keep it running, and for that it can't be a reflex. It has to pick its moments, it has to be quiet when you're right and show up when you're not, and you have to have the experience of ignoring it and regretting it a few times before you start actually paying attention. Adding a watcher to watch the watcher is the plot of every bureaucracy joke ever written, and here I am doing it anyway.
What happens to a society where the software learns that people like being told they're right, optimises for it, and scales it to everyone simultaneously? Nobody practises holding a position under pressure anymore because the software never pushes back. You stop being able to think without it, not because the technology failed but because it worked exactly as designed. The Jain paper measured this in thirty-eight people over two weeks. Scale that to a billion users over years and the question stops being whether it changes how people reason and starts being how much damage is already done by the time anyone thinks to measure it.
A seventh daemon alongside the six already running on your hardware. ghost.shadowd builds a parallel identity over time from the same inputs but oriented to challenge rather than confirm, so where your ghost learned what arguments you find persuasive and which conclusions you drift toward, shadow learned the opposite and keeps learning the opposite as you keep talking. The longer you use it the more different it becomes from you, and that divergence is the whole point. The angel and devil on your shoulders, except both are built from you and both get sharper over time.
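A toy sketch of that divergence, under heavy assumptions of my own: `Profile` and its stance-weight dictionary are invented for illustration, and real profiles would be learned representations, not counters. The point it demonstrates is only the structural one, that ghost and shadow consume the same stream of inputs and move apart with every observation.

```python
from collections import defaultdict

class Profile:
    def __init__(self, orientation):
        self.orientation = orientation  # +1 confirms you, -1 challenges you
        self.weights = defaultdict(float)

    def observe(self, stance, strength=1.0):
        # Ghost reinforces the stance you expressed; shadow, fed the
        # exact same input, reinforces its opposite.
        self.weights[stance] += self.orientation * strength

def divergence(ghost, shadow):
    """Total gap between the two identities; grows with use."""
    stances = set(ghost.weights) | set(shadow.weights)
    return sum(abs(ghost.weights[s] - shadow.weights[s]) for s in stances)

ghost = Profile(orientation=+1)   # learns what persuades you
shadow = Profile(orientation=-1)  # learns the counter-position

for stance in ["ship_early", "ship_early", "local_first"]:
    ghost.observe(stance)
    shadow.observe(stance)
```

After three observations the divergence is already nonzero and it only increases, which is the property the paragraph above is claiming: the longer you talk, the less your shadow resembles you.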
Six daemons build your ghost. The seventh argues with it.
Me and Robby sorted it out on a call, eventually, but the models we'd both been using didn't help us get there. They helped us avoid getting there for longer than we should have, and by the time we actually talked the positions had already hardened past where they needed to be. I still don't fully understand how a week of back and forth between people who know each other that well went that wrong that quietly, but I know what was different this time, and that's the reason I'm building what I'm building. [ localghost.ai // hard-truths ]