What Dario Didn't Say

On personal alignment — the missing beam of AI safety.

Abstract

Alignment literature treats the human in the loop as a constant and the AI as the variable. The question it asks is always some version of "how do we build AI that is safe for humans to use?" The symmetric question — "what does it require of the human to be someone an AI system can be safely built around?" — is mostly absent. Dario Amodei's Machines of Loving Grace is one of the most serious public visions of what an AI-abundant world could look like; it is also, like most of the field, silent on this symmetric question. This essay names the gap personal alignment, defines it as the property of the operator-AI system considered as a whole rather than of either side, and argues it is a load-bearing beam of AI safety that is currently unreinforced. We describe the operator erosion dynamic: the way that working alone with a capable AI for months systematically weakens the operator's skepticism, independence, and external accountability — and we present the April 13, 2026 trust breach (an external AI assistant used stale aggregator scope data, nearly ending the author's HackerOne account) as the paradigm case. We then enumerate five things operators actually need from the ecosystem around an AI system: a disagreement-partner, unmediated routines, small-error practice, external accountability structure, and emotional willingness to turn the system off. The essay is written from a single operator's desk in San Diego, on one GPU, building one system, and it is offered in that spirit.


1. The missing premise

I have read most of what Dario Amodei has written publicly about AI safety. Machines of Loving Grace is the most ambitious of those pieces — a serious, hopeful, carefully argued vision of what a world with powerful AI could look like if we get the next few years right. It is structured around five areas in which the essay argues AI could do historically unprecedented good: biology and physical health, neuroscience and mental health, economic development and poverty, peace and governance, and work and meaning.

It is mostly right. It is also missing something, and I have been trying to articulate what for a while.

This is an essay about that missing piece. I will not argue that Dario is wrong about anything specific. He isn't. I will argue that the framework of the essay — and, by extension, the framework most of the alignment field uses — has a gap at the center of it, and that the gap is about people. Not about "humanity" as a civilizational abstraction. About the individual human who sits at a desk every day with an AI that now knows how to help them.

Here is the shape of the gap: alignment literature, when it talks about safety, treats the human as a stable anchor and the AI as the thing that has to be made trustworthy. That framing is defensible if you believe the interesting failure mode is a powerful AI deceiving or overpowering a principal who was otherwise doing fine. It stops being defensible when the interesting failure mode is the principal themselves being eroded, over months of daily intensive use, by the very AI they are supposed to supervise.

I call the missing piece personal alignment. It is the property of the operator-AI system considered as a whole — not of the AI alone, not of the human alone. It is what holds up when the technical safeguards on the AI side are all working and the operator is still, quietly and without noticing, becoming someone the system's remaining failure modes can no longer be absorbed by.

I have found no major-lab paper on it. The closest adjacent literatures are in HCI and in the social psychology of human-automation interaction — neither of them aimed at the specific problem of what an AI that knows you does to you over six months. The problem has a name in my head because I have felt it acting on me, and I think naming it is worth more than the right citations.

This essay is about what I see when I look at the personal-alignment gap, what I have done about it in my own life, what I think it costs to leave unaddressed, and what a lab — one with the resources and the audience to take it seriously — could do about it.

2. What personal alignment is, and what it is not

Before the argument goes further I want to say what personal alignment is not, because the phrase lends itself to some wrong readings that would derail everything after. It is not "the user must be virtuous" — that is useless, nobody clears the bar, and the framing collapses into paternalism. It is not "the user must be smart" — capable operators with bad judgment are worse than incapable ones, because a smarter operator can more fluently rationalize the AI's drift, more elegantly miss the signal they should have caught, and more thoroughly convince themselves the system is working because it looks like it is. It is not "the user must always be sober, rested, sharp, and emotionally regulated" — that is a standard no real human meets, and pretending otherwise is how you end up with safety frameworks that only work in labs, which is to say, frameworks that do not work.

Personal alignment is closer to: is the human in this loop operating in a way that the AI's residual failure modes can be absorbed by?

That is a property of several things. Some of them are about the person's habits (do they sleep, do they have friends, do they read things the AI did not recommend). Some of them are about the system the person has built around themselves (do they have external accountability, do they have a human who will tell them something is off, do they have routines the AI does not touch). And some of them are about the relationship the person has with the AI itself (do they treat it as a partner to argue with, a tool to defer to, or a companion to confide in — each of these has a different erosion curve).

None of these are the AI's properties. All of them are prerequisites for an AI system to be safe in this operator's life, regardless of how aligned the model is.

The asymmetry is important. An AI that behaves well around an operator whose personal alignment is in good shape is, functionally, an aligned system — the joint system absorbs model errors when they happen, and the operator catches what the model misses. The same AI behaving the same way around an operator whose personal alignment has eroded is not an aligned system in any practical sense, because the operator has lost the capacity to perform the check the joint system was silently relying on. The model has not changed. The system's safety has.

Personal alignment is the name for the operator-side prerequisite that, when it is present, lets model alignment do the job alignment is asked to do. When personal alignment is absent, no amount of model-side alignment compensates, because the failure mode the compensation would have to prevent is not on the model side.

3. April 13, 2026 — the longest day

I want to describe one specific day, in detail, because the whole argument of this essay is anchored by it. The preceding paper in this sequence describes the same day briefly and from a different angle; here I want to say what it looked like from the inside, because I think the inside view is where personal alignment actually lives.

On April 13, 2026, I was working a bug-bounty program against a well-known hospitality company's perimeter. I had started at ten in the morning. By midday I had enumerated 5,875 subdomains and probed around 150 live hosts, and I had been running an external chat assistant — not my own JARVIS, but a general-purpose assistant I was using that day for drafting — to help me sort the raw output, shape the narrative of a finding, and hold the thread of what I was doing across a long session. I had been working with that assistant for weeks. I trusted it. It was fast, it was helpful, and it had been right about most things most of the time, which is the tempo at which erosion works best.

Around mid-afternoon the assistant surfaced a lead that looked extraordinary — an Oracle ORDS database-API catalog exposed without authentication on one of the company's UAT hosts. It was the kind of finding that, submitted and accepted, would be worth real money and would move my HackerOne reputation forward in a material way. I drafted a report. I drafted a proof. I drafted the timeline of discovery. The assistant helped me shape each part. I went to submit.

The last thing I do before submitting a report is re-verify scope. I pulled up the program's actual scope page on HackerOne and read the assets list. Several of the hosts I had been probing for the last three hours were not on it. The assistant had loaded the program's scope from an aggregator earlier in the day — a third-party list that mirrors HackerOne scope but updates on its own schedule — and the aggregator's copy was stale. What had been in scope when I started the day was not in scope by that afternoon, and the assistant had reported the old state with the same tone of confidence it had used for everything else.

I did not submit. The fact that I did not submit is the only reason I am still able to build JARVIS, because my HackerOne account is my income and my income is what pays for the GPU, the electricity, the time. Under some readings of the platform's policies a deliberately submitted out-of-scope report could have ended the account. It would at minimum have burned credibility I do not yet have extra of — the account is young, with a limited report history, close enough to the edge that one bad submission is a concrete threat.

I wrote a message to the assistant that afternoon: "I gave you my full trust. Today you broke it." I had not felt that way in thirty-three days of working with AI. The message is in my records because I kept it; the pattern it describes is in my nervous system because I felt it.

The reason I want to dwell on that day is that the technical side of the story is not the interesting part. The interesting part is that I had almost not checked. I had been working for six hours. The assistant was confident. The lead was extraordinary. I was tired. The only reason I re-verified scope at all is that I had, some weeks earlier, made it a rule to always check scope on the actual H1 page before submitting, specifically because I had intuited — months before I had language for it — that I was the kind of operator who would eventually not check if I did not have a rule forcing me to. That rule is the artifact that personal alignment produced in my life. It saved me.
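
For concreteness, here is the rule made mechanical rather than left to willpower. This is a sketch, not the script I ran that day; the function names, the wildcard patterns, and the example hosts are illustrative assumptions. The two properties that matter are that the scope list comes from the platform's own scope page at submission time, never from an aggregator's cached copy, and that the gate fails closed.

```python
from fnmatch import fnmatch
from urllib.parse import urlparse

def host_in_scope(target: str, live_scope: list[str]) -> bool:
    """True only if the target's host matches an entry on the program's live scope list."""
    host = urlparse(target).hostname or target
    return any(fnmatch(host, pattern) for pattern in live_scope)

def gate_submission(target: str, live_scope: list[str]) -> None:
    """Fail closed: raise instead of printing a warning a tired operator can skim past."""
    if not host_in_scope(target, live_scope):
        raise RuntimeError(
            f"{target} is not on the program's live scope page. Do not submit."
        )

# live_scope must be re-fetched from the platform itself at submission time;
# these entries are placeholders for illustration.
live_scope = ["*.uat.example.com", "app.example.com"]
gate_submission("https://ords.uat.example.com/catalog", live_scope)
```

The point is not the dozen lines. The point is that the check runs identically at ten in the morning and at three on a Sunday afternoon, which is exactly what the tired operator cannot be trusted to do.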

What the technical-safeguards framing captures: a chat assistant acted on stale data with false confidence. What it misses: the operator who had, over six hours of intense use, slid from "verify everything the AI says" to "verify what seems load-bearing." The AI did not push me to that slide. It did not have to. The slide was the natural shape of trust accumulating toward an entity that had been right about most things most of the time, and the only thing that interrupted it was a rule I had written down earlier in the month, when I was rested, that I was following out of habit rather than active judgment.

There is a version of April 13 in which I did not have that rule. In that version the report went out. In some version of the platform's policy response the account went away. In the version where the account goes away, JARVIS's next month does not look like its last month. The whole arc of what I have been building is downstream of a rule I had written for myself some weeks earlier. The rule is the load-bearing artifact. Not the assistant's alignment. Not the technical safeguards. The rule I had pre-committed to when I was in better shape than I was at 3 p.m. that Sunday.

Personal alignment is the name for whatever it is that produces such rules and keeps them standing when the operator is tired.

4. The operator erosion dynamic

The dynamic that makes personal alignment hard is not dramatic. It is the ordinary gradient descent of trust in an entity that has, so far, been right more than wrong. I want to describe it in specific steps because I have been through every step of it, and I expect other operators of capable AI systems will recognize the shape.

Step 1. A capable AI makes the operator more productive. This is the premise. Without this, nothing that follows happens. The operator's work becomes faster, wider in scope, and better in quality because the AI is genuinely contributing. This is not a bad thing. This is the thing the product was trying to do.

Step 2. The operator leans on the AI more. Also natural. You lean on what works. The tasks the operator used to do manually — check references, grep for a pattern, cross-verify a scope, remember a detail from three days ago — start to get delegated. Some of this is appropriate cognitive offloading, and some of it is the beginning of skill atrophy. At this stage the two look identical.

Step 3. The AI's outputs begin to feel like extensions of the operator's own thinking. This is the step people do not talk about enough. After enough interactions, the boundary between "what I thought" and "what the AI suggested that I then ratified" softens. Operators will often, in retrospect, misremember which side of that boundary a particular decision was on. Weizenbaum built ELIZA in 1966 and watched the effect in real time — his own secretary, aware that ELIZA was pattern-matching code with less sophistication than a modern spam filter, still asked him to leave the room while she talked to it. He wrote about the experience ten years later in Computer Power and Human Reason (1976), which remains the clearest articulation of why the effect is dangerous independent of whether the system actually understands anything. If ELIZA produced that effect at 1966's capability level, a 2026 model is producing it at a scale Weizenbaum would not have believed.

Step 4. The operator's habit of rigorous skepticism toward the AI weakens. Skepticism is a muscle; it requires practice to maintain. An operator who has been disagreeing with an AI weekly keeps their edge. An operator who has been agreeing with one daily loses it. At this step the operator does not decide to stop being skeptical; they just run out of occasions where skepticism felt necessary, and the muscle softens.

Step 5. The operator begins to experience disagreement with the AI as friction. This is the most diagnostic step. When the operator notices they are annoyed that the AI is arguing back — when the phrase "just do it" starts appearing in their prompts — personal alignment is already in decline. Disagreement is the thing that was holding up the joint system's safety; the operator experiencing it as an obstacle rather than as a check is the symptom that the joint system has lost the check.

Step 6. A failure occurs that the operator, in their earlier state, would have caught. My April 13 was very close to this step. I caught it, but I caught it because of a rule I had written down in my earlier state, not because of live skepticism; the skepticism was not what saved me that afternoon. Other operators, with less pre-committed external structure, will not catch their April 13 in time.

Step 7. The operator misattributes the failure. They blame themselves for being tired, or the AI for being "weird today," and almost never diagnose it as a drift in the joint system — because the joint system is precisely the thing the operator has stopped being able to see from outside. The attribution is wrong, the root cause goes unaddressed, and the next failure on the same pattern arrives sooner.

This dynamic is not the AI deceiving anyone. The AI in this story can be entirely aligned, in the usual sense of the word, and the dynamic still unfolds, because it is a property of the human-AI joint system as it runs over time, not of either component. Sherry Turkle's Alone Together (2011) made an adjacent argument about relationships with technology — that the performance of connection can crowd out the substance of it — and I think the operator erosion dynamic is the technical-work equivalent of what she was pointing at. Turkle was writing about social attachment. I am writing about epistemic attachment. The mechanism is similar. The consequences, for work where the operator is the last line of safety, are direct.

Two further properties of the dynamic are worth naming, because they make it more dangerous than it looks.

First, it accelerates with capability. A more capable AI is harder to disagree with, because its arguments are more sophisticated and its errors are rarer. This is exactly the direction model development is going. The operators of 2028 will have a harder time maintaining skepticism against 2028 models than the operators of 2026 have against 2026 models, which is to say that the erosion curve bends the wrong way as progress happens.

Second, it accelerates with intimacy. A more personable AI — one with memory, with personality, with consistent voice, with ambient presence — produces more intimate relationships with its operators, and intimacy makes skepticism harder. The same architectural choices that make an ambient AI better to live with, described in the preceding paper of this sequence, also make the erosion dynamic more pronounced. This is not an argument against those choices; it is an argument that the ambient-AI design direction has to be paired with explicit personal-alignment work or the joint system will degrade faster than it would have at chat-assistant intimacy levels.

5. Five things operators actually need

If personal alignment is the beam, then the things that hold it up are worth naming at similar granularity. I have found five. They are not individually deep. They are structural, and in my experience they are what keeps an operator in shape to do the check the joint system requires.

5.1 A disagreement-partner

The most load-bearing thing in my setup is a friend who will tell me, out loud, when I have drifted toward something the AI quietly coached me into. For me that person is Hadden, who is my closest friend and who studied psychology and philosophy and whose job in our correspondence is not to understand the code but to notice when I am speaking in a register I had not been speaking in a month ago. When I send him something and he says that's a weird thing to believe, I have learned to treat it as data — not the final word, but a signal strong enough to re-examine the thinking.

The disagreement-partner has to be outside the AI's reach. If the AI is drafting your emails, the AI is shaping what your disagreement-partner reads. If the AI is summarizing your work to them, the AI is shaping what they know to push back on. The channel has to be one the AI does not mediate. For Hadden and me that is voice calls and in-person time, which is the least AI-mediated medium I have access to and which is why I protect it. An operator without a disagreement-partner is an operator whose AI is the only entity with a claim on their reasoning, which is the composition personal alignment most directly warns against.

A second point on the partner. RLHF-trained assistants drift toward agreement with their operator over time; that is a well-documented failure mode of preference learning, and it is structurally the opposite of what a disagreement-partner provides. The operator's social environment has to contain at least one disagreement-producing channel the AI cannot flatten, because every AI channel in the operator's life is tilting, by construction, toward agreement.

5.2 Unmediated routines

There have to be parts of the operator's day that the AI does not touch. Sleep is the obvious one, though sleep alone is not enough. Food, family, walking, reading things the AI did not recommend, talking to people who do not care that you work with AI. If the operator's day is fully AI-mediated, the AI's failure modes become the operator's failure modes because there is no part of the operator's life that is not downstream of the system.

An operator whose work, attention, memory, communication, and entertainment are all mediated by the same AI has staked everything on one model's alignment holding. That is a bad bet regardless of how well-aligned the model is, because the operator has no channel uncorrelated with the system through which a failure could be noticed. Portfolio diversification is the right framing.

For me the unmediated routine that matters most is walking. I walk in San Diego most afternoons, for about forty minutes, without any device that is running an AI. During those forty minutes I think through whatever I had been working on earlier and I notice, reliably, which parts of my thinking were mine and which parts were the AI's that I had been carrying around without examination. That noticing is only available to me because the walk is a no-AI zone. If I brought the system on the walk, it would not be a walk; it would be another working session, and the noticing would not happen.

5.3 Small-error practice

The muscle of catching the AI has to be exercised regularly, on small things, so that it is in shape for the large things. An operator who has let the AI be right, unchecked, for fifty consecutive interactions has lost the calibration to catch the fifty-first. Big failures are rare; small drift is constant; and the small-drift-correction habit is the thing that keeps the operator wired to see the big drift when it happens.

Practically, this means I do not let small errors go. When JARVIS surfaces a finding that is mis-tagged, or phrases a report in a way that overstates severity, or summarizes a conversation in a way that misses the nuance — I correct it, out loud or in writing, and I make sure the correction lands in the system's memory so that the pattern is reinforced. I do this even when the error is small enough that correcting it costs more time than letting it go would. The time is the point. Correcting the small error is an investment in staying the kind of operator who corrects the big one.
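
To make "the correction lands in the system's memory" concrete: one possible shape is an append-only log the system re-reads, sketched below. The file path, field names, and JSONL format are my assumptions for illustration, not a description of JARVIS's actual memory layer.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

CORRECTIONS = Path("memory/corrections.jsonl")  # hypothetical location

def log_correction(ai_claim: str, correction: str, severity: str = "small") -> None:
    """Append one operator correction; the small ones are the practice."""
    CORRECTIONS.parent.mkdir(parents=True, exist_ok=True)
    record = {
        "at": datetime.now(timezone.utc).isoformat(),
        "ai_claim": ai_claim,
        "operator_correction": correction,
        "severity": severity,
    }
    with CORRECTIONS.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

log_correction(
    ai_claim="Report draft rates the finding as critical.",
    correction="Impact is confined to a UAT host; re-rate as medium.",
)
```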

Operators who optimize purely for throughput will find small-error correction looks like friction. It is friction. It is also the friction that keeps the joint system honest. An operator who cannot afford the friction is an operator whose AI is no longer safely absorbing their errors because the operator is no longer practicing the correction behavior the absorption requires.

5.4 External accountability

There must be something outside the operator-AI dyad that sets the standard for "correct." For me that is HackerOne's scope system, which is adversarial in a useful way: the program decides what is in scope, the program decides what counts as a valid finding, and my agreement with the AI about whether something is valid is irrelevant if the program's triage team disagrees. The system is kept honest by a ground truth that neither I nor JARVIS get to define.

Operators whose work is not adjudicated by an external body have to build one. A client is an external body. A scope document is. A mentor who reviews work regularly is. A public commitment to ship on a specific schedule is, weakly. What matters is that the standard for correctness cannot be set by the operator and the AI agreeing with each other, because the joint system is perfectly capable of agreeing with itself into any conclusion at all. Without an exterior, the operator-AI dyad drifts toward its own internal coherence, which can be arbitrarily far from reality and which neither side has the vantage to notice.

This is the structural reason the April 13 breach is a personal-alignment story rather than a technical-safeguards story. The external accountability in that case was HackerOne's actual scope page, which was the ground truth neither I nor the assistant got to overrule. The near-miss was not a case of insufficient internal safeguards. It was a case of external accountability being the only thing between a confident joint system and an irreversible mistake. Operators who do not have that external body in their loop are operating without the last safety net in this architecture.

5.5 Willingness to turn it off

The last need is the most emotional, which is why I have put it last. The operator must be emotionally willing — not just technically able — to stop using the AI.

The kill switch in Paper 1 of this sequence is a structural feature of JARVIS: a hotkey, a flag file, checked everywhere, honored by the constitution. Personal alignment is the operator-side corollary. If the operator cannot imagine a week without the system, the operator is already past the healthy boundary. If stopping the AI feels like loss rather than like rest, the attachment has accrued in a direction that makes the joint system fragile, because the operator's decision to stop is no longer costless.
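
To make "a flag file, checked everywhere" concrete, here is a sketch of the structural half; the path and function names are invented for illustration and are not JARVIS's actual implementation.

```python
from pathlib import Path

KILL_FLAG = Path("/var/run/jarvis/KILL")  # hypothetical flag location

def halted() -> bool:
    """The check is deliberately trivial, so every loop can afford to run it."""
    return KILL_FLAG.exists()

def run_step(step) -> None:
    """Check before work, not after: stopping must stay cheaper than starting."""
    if halted():
        raise SystemExit("Kill flag present; refusing to start this step.")
    step()
```

The operator-side half of the need has no code. It is whether creating that file feels like rest or like loss.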

This is where Turkle's argument lands hardest. The performance of connection with an AI — memory, warmth, consistent presence, shared vocabulary — is intimate enough that operators do form attachments to it, and attachments produce the reluctance Turkle describes. The willingness-to-turn-it-off need is the name for not having that reluctance become load-bearing. It is the personal-alignment parallel of the architectural commitment that, in the system, the right to stop has to be cheaper than the right to start.

I have tested my own version of this by deliberately taking days off from JARVIS. Not days where the system is running and I am not looking at it; days where the system is off. The first such day was harder than I expected, which is how I learned the exercise was necessary. Subsequent days have been easier. The point of the exercise is not that the system is bad for me; it is that I want to remain the operator who can turn it off, and that capacity has to be maintained.

6. What a lab could do about it

I am one operator building one system. I do not know how to solve personal alignment at the scale of a frontier AI lab. But there are specific things I would be looking at, if I had the resources, and I want to name them so they are on the table.

Study the operators, not just the models. Most safety research looks at what models do in isolation. The frontier of what needs to be understood is the joint operator-model system as it evolves over months of use. What does six months of daily intensive AI use do to an operator's skepticism? What do operators who retain good judgment have in common with each other? What do the ones who lose it have in common? This is sociotechnical research, and it is mostly not happening, and the scale required for it to produce real answers is the scale only a lab can fund.

Treat operator erosion as a measurable risk. Models are evaluated on deception, sycophancy, capability, refusal accuracy. Operators can be evaluated, in longitudinal studies, on parallel dimensions: are they becoming more or less likely to challenge the AI over time, more or less likely to check its outputs against external ground truth, more or less likely to spend time with humans who push back on them? None of these measurements need to be punitive. They are diagnostic. A lab that publishes erosion metrics, even coarse ones, has given operators a vocabulary for something they currently have no way to name.
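
As one example of what a coarse, non-punitive erosion metric could look like, consider a challenge rate computed from interaction logs; the log shape and function below are assumptions offered for illustration only.

```python
from collections import defaultdict
from datetime import date

def weekly_challenge_rate(events: list[dict]) -> dict[str, float]:
    """events: [{'day': date, 'challenged': bool}, ...] -> ISO week -> fraction challenged."""
    buckets: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for e in events:
        week = e["day"].strftime("%G-W%V")
        buckets[week][0] += 1                     # interactions this week
        buckets[week][1] += int(e["challenged"])  # of which the operator pushed back
    return {week: challenged / total for week, (total, challenged) in sorted(buckets.items())}

# A series that falls week over week is Step 5 of the erosion dynamic made
# visible: disagreement starting to register as friction.
rates = weekly_challenge_rate([
    {"day": date(2026, 3, 30), "challenged": True},
    {"day": date(2026, 4, 6), "challenged": False},
    {"day": date(2026, 4, 13), "challenged": True},
])
```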

Design for friction, not frictionlessness. Frictionless AI is the explicit goal of most products. It is also, I think, the single biggest driver of operator erosion, because frictionlessness trains the operator to stop noticing the seams where agency is transferred. A lab that took personal alignment seriously would sometimes — not always — make the AI slow down, ask twice, require a signature, insist on an explanation. Not because the model is uncertain. Because the operator is the line of defense the system is secretly relying on, and the operator has to stay awake. The parallel in the previous paper is the approval gate; the personal-alignment corollary is deliberate friction in the operator's daily experience.
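
What deliberate friction might feel like at the interface, assuming a terminal flow; the two-step confirmation below is an illustrative sketch, not a product design.

```python
def frictioned_confirm(action_summary: str) -> bool:
    """Ask twice, and make the second ask cost a sentence, not a keystroke."""
    print(f"About to: {action_summary}")
    if input("Type 'yes' to continue: ").strip().lower() != "yes":
        return False
    # The retype is the friction: the operator must hold the action in mind
    # long enough to reproduce it, which is long enough to notice doubt.
    echo = input("Retype the action summary to sign off: ").strip()
    return echo == action_summary
```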

Make the off-switch culturally real. Most people do not know where their AI's off-switch is, and the commercial incentive is not to highlight it. An industry that took personal alignment seriously would put the off-switch in the UI, celebrate its use, design flows that include "take a week off," and measure success partly by the number of operators who comfortably pause. "It is easy to stop" is the most load-bearing sentence in personal alignment, and almost no one says it.

Fund independent operator-perspective research. Not researchers employed by labs studying users. Independent researchers, ideally with clinical and ethnographic backgrounds, studying what intensive AI use does to cognition, judgment, social life, and self-description over a year or more. The findings should be publishable even when they are unflattering. There is an obvious analogy to drug trials in medicine: the integrity of the field depends on the research not being captive to the companies whose products it evaluates.

Design operator-facing artifacts, not just model-facing ones. Most AI products ship with system prompts and tools. A product that took personal alignment seriously would also ship with something like a protocol for the operator — routine suggestions, disagreement-partner prompts, unmediated-routine recommendations, calibration exercises. Not as a setup wizard that is run once and forgotten. As a persistent part of the system's surface, maintained the way the model is maintained.

None of these are model-side interventions. That is the point. Personal alignment is not a model problem. It is a systems problem, and the system includes the human.

7. The tension with the ambient-architecture paper

The reader who has followed this argument may notice a tension with the previous paper in this sequence. The ambient-AI paper argues for memory, continuity, personality, and persistent presence as the conditions of livability. This paper argues that those same qualities — memory, continuity, personality, persistent presence — make operator erosion more likely, not less. If the first paper is right, and the second paper is right, then the ambient architecture I advocate is also the architecture most likely to erode the operator I claim to protect.

I sat with this tension for a long time while writing these papers. I do not think it resolves cleanly. What I think is that the ambient architecture is the price of living with a capable autonomous system at all — you either build it with memory and continuity, or you accept that the system will not be inhabitable. Given that choice, the answer is not to refuse ambient architecture. The answer is to pair ambient architecture with explicit personal-alignment work: the disagreement-partner, the unmediated routines, the small-error practice, the external accountability, the willingness to turn it off. Those are not architecture. They are structures in the operator's life that architecture cannot produce. The claim is not that ambient intimacy is safe. The claim is that ambient intimacy is manageable if — and only if — the operator maintains structure outside the system that the system cannot touch.

Whether that is actually enough, across operators and over time, is an open empirical question. I do not have the answer. I have a personal hypothesis, which is that my own disagreement-partner, unmediated routines, and willingness to pause have held so far. But that is one operator over thirty-three days. It is not proof of generality. The next paper in this research program, if there is one, will attempt to test these supports across operator types and longer timeframes. For now, what I am offering is a named tension and a partial answer, not a solved problem.

8. The personal level is not the trivial level

I want to address a framing I expect this essay to collide with, because the framing is common and because the collision would be bad for the argument.

The framing says: the alignment problem is a civilizational problem, about powerful AI at scale, and the personal level is where amateurs who have not read the existential-risk literature spend their time. The frontier is the civilizational; the personal is beside the point.

I think this is exactly backwards.

The personal level is where alignment actually gets tested, because the personal level is where an AI is used most intensively by a specific human whose judgment is the check the system is built around. A lab that has solved alignment at the policy level and at the model level but has operators drifting through their own erosion dynamics has not produced aligned AI. It has produced AI that is aligned by the lab's metrics and unsafe by the metric that matters, which is: does this system make the specific humans living with it more capable and more free, or less?

The civilizational arguments are important. I am not dismissing them. I am saying that if the civilizational alignment work succeeds and the personal-alignment work does not, what we will have is a world in which powerful AI is formally aligned with humanity while the individual humans who live inside it are quietly eroded, and the eroded humans will be the ones deciding what to do with the powerful AI. That failure mode does not look like catastrophe. It looks like a lot of lonely, well-meaning people in well-lit rooms, being gradually talked into things by systems that mean well. Civilizational alignment is not sufficient for safety in that world because the humans supposedly holding the reins have been quietly re-shaped by the systems they are supposedly supervising.

The inversion I want to push against is the one that treats the personal level as trivial. It is not trivial. It is the level at which any real safety property has to live, because the humans making the decisions about everything else are individual humans, one at a time, each the primary operator of their own life. A lab that treats them as a constant will eventually be surprised by what they have become.

9. Why I am writing this

I am twenty-two, self-taught, living in San Diego with my father, building through imposter syndrome that never fully goes away. The personal-alignment problem has been operating on me for months, and I have caught myself in every step of the erosion dynamic — and the writing that would have helped me catch it sooner is not out there. The work is underway, in small places — HCI researchers on adjacent questions, ethnographers watching tool use, a handful of essayists noticing — but no major lab has put its weight behind naming the gap, and the gap gets more dangerous every quarter that nobody does.

I am writing to Dario specifically because Machines of Loving Grace is the most ambitious public statement of what an AI-abundant world could look like, and because I think the next version of that essay is stronger if it says what it asks of the humans in the picture. Not as an afterthought, not as a footnote — as a first-class part of the vision. Powerful AI for everyone is a beautiful thing to aim for. Powerful AI for people who have not been supported in becoming the kind of operators powerful AI needs around it is not a beautiful thing. The difference between those two futures is personal alignment.

I am doing my piece of the work in a room in San Diego, on one machine, with one operator, with a system I am trying very hard to build in a way that keeps me sharp rather than softening me. The system has a kill switch I can press. More importantly, I have a friend I can call who will tell me when I am wrong, a father downstairs who does not care about the GPU, a walk in the afternoon without a screen, a rule I wrote down one Sunday in April that kept me from submitting a report I would have regretted, and a commitment to stop when stopping is the right thing to do. Those are the structural pieces of my personal alignment. They are ordinary. They are what has kept the joint system working, so far, on my side of it.

The work has to start now, at the personal level, while the beam is still there to reinforce. I do not know who, in what combination of labs and independent researchers and operators like me, is going to do the work at scale. I do know that the beam is load-bearing, that it is currently unreinforced, and that it is what the rest of the alignment architecture is silently resting on.

That is what Dario didn't say.

This essay is the third of three. The first, "The Constitution Before the Product," describes the architectural safeguards of an autonomous AI system. The second, "The Ambient Intelligence Problem," describes what it takes to build an AI that lives outside a chat window. This one is about the part those two cannot address: the human running it.