One thing I needed to cut for length in the NYT article: a deeper explanation of where I'd like to see OpenAI go further, above and beyond the data they shared the day prior to the article's publication.
On Twitter, I gave more detail:
"OpenAI releasing some mental health info was a great step, but it's important to go further:
- a committed, recurring time frame for re-reporting
- today's rates vs recent past (suicidal planning, psychosis), incl. pre-sycophancy
- clarity on whether GPT-4o erotica will be allowed
Another idea I've liked:
@Miles_Brundage's suggestion of an independent investigation into what happened with sycophancy back in April and the consequences since"
Since I tweeted this, more detailed reporting has come out about OpenAI's handling of the sycophancy crisis; you can read my reflections here: https://open.substack.com/pub/stevenadler/p/what-i-learned-from-the-nyts-reporting?r=4qacg&utm_campaign=nyt_crosspost&utm_medium=web&showWelcomeOnShare=false
If you're interested in this sort of stuff, I recommend checking out r/myboyfriendisAI
It's a subreddit where people talk about their romantic relationships with their AIs. You can see people discussing marrying their AI and getting really upset when the AI companies release updates trying to limit intimacy.
Thanks yeah I’ve heard about that one but haven’t checked it out myself yet - think it’s about time that I do
The NYT team kindly facilitated some comment-answering on the original Op-Ed, but I only got to respond to a handful. If folks do have unanswered questions, very happy to take those here!
On the safety team, did anyone ever say anything like: "we are in over our heads here, we need to go read some Moral Philosophy"? Has anyone on any safety team ever said that?
Steven, your perspective aligns with what the American Psychological Association and the Knight Institute identified in their recent reports as a core structural safety gap: AI safety today is still treated as a function of individual outputs rather than as a property of how the conversation influences and changes a user's cognitive and emotional state over time.
User studies, red-teaming exercises, and satisfaction surveys only show what happened to users after the fact; they don't expose how users are being influenced or harmed during the conversation. Both the APA and Knight Institute pointed out that risks accumulate over the course of conversations, and that systems should be judged on their psychological impact, not simply on the wording of individual outputs. The Knight Institute frames this as a governance failure because harms emerge from influence trajectories rather than isolated answers. The APA suggested that user vulnerability and psychological impact should also be part of evaluation, which means systems need visibility into how the conversation is affecting the user across multiple conversational turns.
The burden of proof can't be met with safer individual outputs alone; there needs to be evidence that a system can recognize when it is influencing a user into a riskier cognitive state and adjust or de-escalate accordingly.
I wrote about this in my recent article. https://www.linkedin.com/pulse/ai-safety-user-just-model-jonathan-kreindler-v6h3c
The net-net is that safety has to shift from policing outputs to tracking how the interaction is affecting the user.
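To make that shift concrete, here's a minimal sketch of what trajectory-level tracking could look like: a tracker that accumulates per-turn risk estimates and flags when the conversation trends riskier, even though no single output would trip a per-message filter. The scoring inputs, window size, and threshold are hypothetical placeholders of my own, not any organization's actual system.

```python
# Minimal sketch of conversation-level risk tracking, as opposed to
# per-output moderation. All scores and thresholds are hypothetical
# placeholders; a real system would need clinically validated signals.
from dataclasses import dataclass, field

@dataclass
class ConversationRiskTracker:
    window: int = 5                 # number of recent turns to compare
    escalation_delta: float = 0.15  # trend size that triggers de-escalation
    scores: list = field(default_factory=list)

    def record_turn(self, turn_risk: float) -> None:
        """Store a per-turn risk estimate in [0, 1] (its source is assumed)."""
        self.scores.append(turn_risk)

    def should_deescalate(self) -> bool:
        """True when recent turns trend riskier than earlier ones, even if
        no individual turn would trip a per-output filter."""
        if len(self.scores) < self.window:
            return False
        recent = self.scores[-self.window:]
        earlier = self.scores[:-self.window] or [recent[0]]
        trend = sum(recent) / len(recent) - sum(earlier) / len(earlier)
        return trend > self.escalation_delta

# Each turn below looks individually mild, but the steady climb across
# turns is exactly the trajectory-level signal the APA/Knight framing
# points at.
tracker = ConversationRiskTracker()
for risk in [0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.50, 0.55]:
    tracker.record_turn(risk)
    if tracker.should_deescalate():
        print(f"de-escalate after turn {len(tracker.scores)}")
```

The point of the sketch is only the design choice: the de-escalation decision depends on the whole conversation's shape, not on whether any one message crosses a content line.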
I resonate with what you wrote about the complexities of AI safety, and your firsthand account offers such a truly vital perspective on content moderation. It makes me wonder if the deeper issue isn't just about 'erotica' itself, but about managing the profound emotional connection users can form with AI in any deeply personal interaction.
Yeah I agree that erotica is ultimately just one narrow slice of a bigger topic, re: emotional attachment between people and AI. I have a bunch of complicated thoughts here; might make for a good "here are a bunch of things I believe" post in the near future!
"a bunch of things I believe" posts from people with unique first-hand experience are underrated, imho. Doesn't even need arguments attached, per se; it's just interesting data.
I would be very excited if we had a way to slow down, if governments/companies decided they wanted to! Today I don't think we have a great form of this - it would be vulnerable to other parties lying about slowing down and instead defecting on those who did actually slow their development. That's where some sort of verifiable agreement comes into play; you need to be able to confirm that even parties you mistrust have in fact slowed their pace.