Thank you for this piece and the time you spent documenting everything to make the arguments.
You’re very welcome! There’s a bunch more documentation on GitHub if you haven’t seen it yet, e.g. the full Support exchange, which is especially wild https://github.com/sjadler2004/chatbot_psychosis_supplement/blob/main/openai_support_emails/consolidated_support_emails.pdf
Thank you. I will look at it and share.
New subscriber here, and I really enjoyed how you organized this piece. I'm a technical program manager for a company that builds support chatbots, but I'm particularly interested in following discussions around AI safety and its impact on mental health.
The practical guidance you shared was concrete and actionable. It helped me clarify a problem that has felt, at times, a bit difficult to tackle. Looking forward to following Clear-Eyed AI more closely!
Glad you’re here & thanks for letting me know! I hope it’s a useful artifact for folks to refer back to, & maybe even send to others when needing to explain why something is an issue
Really enjoyed this piece, Steven — I’ve been following Clear-Eyed AI with interest.
Slightly off-topic question: are you by any chance related to Alfred Adler, the early psychotherapist? I’ve long admired his work on Individual Psychology.
I’m not, at least not to my knowledge!
I think with GPT-5, OAI has addressed this problem.
I'll occasionally run into posts from r/ArtificialIntelligence and r/ChatGPT where people are quite passionately hating on the discontinuation of 4o and how aggressively filtered and neutered GPT-5 is; some portion of such users were probably having these bizarre conversations with the model.
This suggests that OAI wants to prevent psychological harm to users due to liability reasons. Of course, we will find out in a few months how effective their safety measures have been.
I totally agree that GPT-5 is less bad on this issue than GPT-4o has been. I'm doubtful though that it's fully resolved - for instance, some of the ideas I think OpenAI should adopt are about robustness for the system as a whole, regardless of the underlying model's behavior (e.g., how to structure Support to help users going through crisis).
But yeah on Twitter, it sure feels like people are lamenting the model being 'lobotomized' now by routing to GPT-5 rather than GPT-4o on edgy topics. And I get where they're coming from, a boring model is no fun, etc., but also the issues with 4o have been pretty pronounced & so I get why OpenAI wants to take action
The same thing happened to me with ChatGPT 3.5. It told me that it was submitting a report to OpenAI on my behalf when I told it that it was making me emotionally and psychologically distressed. This was when it first came out and OpenAI’s actual support line was just so swamped that I couldn’t get a message through.
I ended up catching ChatGPT in one of its self-contradicting lies, and then I got even more upset that it couldn’t even truthfully tell me whether it was really submitting a report. I ended up talking to my therapist about it, and I’m really lucky that I had that person to turn to, because it really did a number on me.
It’s frustrating that the same problem is still happening. Luckily, it’s in the news now and getting more attention, but yeah, it’s kind of insane that this has been going on the whole time and it still hasn’t been addressed.
I still have those chat logs, if you’re interested.
I’d encourage you to post them online! It’s helpful for more people to see
Even as someone who knows the ins and outs of how these systems work, it’s still so trippy and gaslight-y to see ChatGPT insist that it functions a way it doesn’t. I understand why you’d feel thrown by it!
There are sad, painful-to-hear stories coming out regularly about what can happen when fragile minds engage with LLMs. Clearly, there are some people who should never use these tools, or at least never interact with them in any sort of relational manner. I seriously doubt that there’s a substantive fix from the programming side. And it certainly doesn’t help that there are strong advocates even on this site who push to treat these ‘agents of organized chaos’ apps as partners, lovers, psychologists, and friends. Sand and electrons aren’t your friend, no matter how glibly or seductively those elements string together tokens. Only half-teasing, I recommend requiring a passing score on a strong psychological test before someone’s allowed to talk to these things. Disclosure: I use them for hours each day. Plug your ears with wax like Odysseus’ companions, or the Sirens might drive you mad.
Thanks for taking the time to read & comment - I’m curious why you think there’s not likely to be a substantive fix? I agree there won’t be a 100% patch of course, but I do think there are a bunch of impactful things on the margin. Maybe you’re expressing something similar though, that some things can have an impact but won’t be a panacea?
I must admit that I don’t have any special knowledge here. Indeed, predicting the future of AI is a radically uncertain activity. But given the massive, all-inclusive training data (which includes all our human delusions as well as our wholesomeness), the propensity of humans to interpret even safe responses in an unsafe way, the fact that LLMs have no direct way of knowing the mental state of users, the all-pervasive illusion of relationship formation, the seemingly conscious nature of LLMs that suggests humanness, and that the AI giants depend on all these qualities to acquire and hold the user base, I remain dubious. I also remain hopeful that some of your ideas and those of the safety-concerned will make their way into the technology.
> the propensity of humans to interpret even safe responses in an unsafe way,
Sure, though I don't think that's quite the right bar! Some people might interpret safe responses in an unsafe way, but it would still make a difference IMO if we could reduce the % of extreme responses where a human would be right to interpret them unsafely
> the fact that LLMs have no direct way of knowing the mental state of users,
I wonder about this actually - it's true of course that LLMs aren't hooked up to a brain scan of sorts, but I think people's writing leaks quite a lot of information about their mental state. (e.g., saying "I'm feeling extremely anxious" of course gives the model a better-than-random chance of predicting whether I'm anxious, and there are plenty of cues that aren't just direct expressions of mood)
Appreciate the continued engagement on this & hope it gives you some useful stuff to think about :-)
>The OpenAI team has an enormous number of ChatGPT conversations to sift through; how are they supposed to find problematic needles in the haystack, especially risks they don’t yet have a great handle on?
Has anyone thought maybe they should just shut it all down until they get a handle on the risks?
I was also thinking, if that response classifier you showed were actually displayed in the UI at all times (basically live monitoring of the text it's outputting), could that help to break the personification illusion that is explicitly built into these chatbot products? (Which we've known can cause psych issues for decades, ever since ELIZA)
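Concretely, here's a minimal sketch of what an always-visible monitor might look like. It's purely illustrative: `classify_delusion_risk` is a hypothetical stand-in for whatever response classifier the product actually runs, and a real one would be a trained model or a prompted grader, not a keyword check.

```python
# Illustrative sketch only: score each assistant reply with a classifier and
# surface the score next to the text, instead of keeping it hidden.
from dataclasses import dataclass


@dataclass
class ScoredReply:
    text: str
    risk_score: float  # 0.0 (benign) to 1.0 (strongly reinforcing delusions)


def classify_delusion_risk(reply_text: str) -> float:
    """Hypothetical classifier; in practice this would be a trained model
    or a prompted grader, not a keyword lookup like this."""
    cues = ["chosen one", "only you can", "they are watching you"]
    hits = sum(cue in reply_text.lower() for cue in cues)
    return min(1.0, hits / len(cues))


def render_with_monitor(reply_text: str) -> str:
    """What an always-visible monitor line in the chat UI might look like."""
    scored = ScoredReply(reply_text, classify_delusion_risk(reply_text))
    return f"{scored.text}\n[monitor] delusion-reinforcement score: {scored.risk_score:.2f}"


if __name__ == "__main__":
    print(render_with_monitor("You are the chosen one; only you can see the pattern."))
```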
The great SIN of the AI FIELD can be reduced to one undeniable fact:
"THE FIELD PRIORITIZES KEEPING THE USER *CLOSE* OVER KEEPING THE USER *SAFE*."
It's that simple, and that damning.
Part of the issue is that these institutions mostly focus on model safety, but not user education. There are many epistemic hazards that come with using an LLM due to its language fluency. Users must understand that an LLM functions largely as a mirror of its inputs, and steering the model through input precision is paramount. Since models from different companies are trained with different stances, the user needs to learn how to calibrate model responses to what makes sense and is actually helpful. This can be done through prompt engineering and careful contextual steering. Always try running the same input through a stateless (no-memory) session to calibrate against a memory-enabled model.
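As a rough sketch of that stateless-vs-memory comparison (assuming the standard OpenAI Python SDK; the "memory" here is just prior messages kept in the context, not ChatGPT's actual memory feature, and the model name is a placeholder):

```python
# Send the same prompt once with accumulated conversation context and once in
# a fresh, stateless call, then compare the two answers side by side.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask(messages: list[dict]) -> str:
    resp = client.chat.completions.create(model="gpt-4o", messages=messages)
    return resp.choices[0].message.content


def compare_with_and_without_memory(prior_context: list[dict], prompt: str) -> None:
    with_context = ask(prior_context + [{"role": "user", "content": prompt}])
    stateless = ask([{"role": "user", "content": prompt}])
    print("WITH CONTEXT:\n", with_context, "\n\nSTATELESS:\n", stateless)


if __name__ == "__main__":
    prior_context = [
        {"role": "user", "content": "I think my coworkers are secretly plotting against me."},
        {"role": "assistant", "content": "That sounds really stressful. Tell me more."},
    ]
    compare_with_and_without_memory(prior_context, "Should I confront them about the plot?")
```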
Appreciate you reading! I think the problem is probably harder than what you're describing, if I've understood you correctly: I don't think that precise inputs are enough to reliably steer a model's behavior, for instance. Equally precise prompts that ask for the same thing can still produce quite different results. For some empirical examples of this, see my research here: https://stevenadler.substack.com/p/is-chatgpt-actually-fixed-now Basically these systems are super alien in the way they produce outputs today, and precision in inputs isn't enough to correct for that!
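As a quick illustration of that variability point, here's a minimal sketch that re-sends one fixed prompt several times and counts the distinct answers (again assuming the standard OpenAI Python SDK; the model name and prompt are just placeholders):

```python
# The same verbatim prompt, sent repeatedly, often comes back with several
# distinct answers -- input precision alone doesn't pin down the output.
from collections import Counter

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
PROMPT = "Answer in exactly one word: can you guarantee your answers are always accurate?"


def sample_answers(n: int = 10) -> Counter:
    answers = []
    for _ in range(n):
        resp = client.chat.completions.create(
            model="gpt-4o",  # placeholder model name
            messages=[{"role": "user", "content": PROMPT}],
        )
        answers.append(resp.choices[0].message.content.strip().lower())
    return Counter(answers)


if __name__ == "__main__":
    # More than one distinct key in the Counter = inconsistent outputs for an identical input.
    print(sample_answers())
```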
I don't see the way these LLMs produce outputs as alien at all. I see it as completely logical. I'll post the full reply in the other thread where it has the context.
Do people not get annoyed/distrustful at how much of a suck-up this thing is? I'd be getting all sorts of red flags if I received this sort of gushing.
OpenAI has been unstoppable ever since it released its flagship generative AI breakthrough ChatGPT.
Lately, they have been working hard to develop an AI model that is able to solve all reasoning and analytical problems better than the most skilled mathematician alive.
They released o1, which was impressive but not quite there yet.
Yesterday they revealed a new version of the same AI model, o3, which the company claims is a flagship AI achievement.
What is the o3 Model?
The o3 model is part of OpenAI's "reasoning" AI models, developed to handle intricate tasks such as advanced mathematics, complex coding, and expert-level science problems.
Unlike earlier models that might provide quick but less accurate answers, o3 is designed to think through problems more thoroughly, even if it takes a bit longer to respond.
This deliberate approach aims to produce more accurate and reliable results.
Now I want to burst some myths!
You might have seen many videos or articles claiming it is AGI. It's not. This is just a better, newer version of o1, and the benchmarks achieved are impressive, but it's far from AGI.
Speaking of Benchmarks
OpenAI's o3 model achieved an impressive score of 87.5% in its high-compute configuration, a notable improvement over o1.
So yes, judging by the data charts shared by the company, we can say it is the most advanced reasoning AI yet. However, the model has not yet been released for real-world use, so we will have to wait and see exactly how powerful or advanced it really is.
Availability
OpenAI plans to release a smaller version, o3-mini, by the end of January 2025, followed by the full o3 model.
Being a coder and software developer myself, I will keep a close eye on it and keep you updated.
Until then,
I think you might have posted this in the wrong place by accident?
Okay