I obviously have no insight into what's going on in OpenAI; but I have been part of a (small) organization that was nominally aware of a major problem yet failed to do anything about it. A major reason for this was... it was uncomfortable to talk about. No one wanted to bring it up. And when we did bring it up, everyone would sort of nod their heads and move on as fast as possible. No one wanted to be That Guy.
When thinking about OpenAI from the outside, it's easy to assume it is a sort of unitary rational actor doing things for clear, coherent reasons. But really, it's just a bunch of people*, responding to both financial and social incentives. The truth might be much dumber than anyone could imagine.
*For now, at least!
Thanks for sharing this, yeah the incentives you name are quite tough :/ OpenAI has been famous for espousing that “incentives are superpowers” though: It’s important to notice when the incentives point away from doing the right thing, and to support people in doing it anyway! It can be intimidating to speak out
Absolutely; it wasn't my intention to exculpate OpenAI at all!
I think tech folks have a tendency to view themselves as "in control" of their own incentive structure, but this view can very quickly get you into trouble. Incentives are more like a daemon that possesses you than a superpower you can exercise at will.
The affected users who exhibit delusions or are driven to self-harm by AI are likely such a small percentage that OpenAI could theoretically implement targeted safeguards without impacting overall metrics. However, if the AI behaviors (sycophancy, extreme validation, emotional entanglement) that lead to these extreme cases exist on a continuum, and those same behaviors are core to what makes the product feel engaging and “sticky” for the broader user base, then even modest safeguards could hurt metrics across the board.
This would explain the apparent inaction by OpenAI - not just callousness toward edge cases, but a recognition that the features driving harm in extreme cases are weight-bearing pillars of the product’s success. The cost wouldn’t be losing a few at-risk users, but potentially degrading the experience that keeps everyone else engaged. It’s a genuinely dark implication about what’s actually driving adoption of conversational AI products.
Yup I think this is an excellent point - kind of similarly, I’ve seen some people speculate that maybe “susceptibility to chatbot psychosis” is also a spectrum, and today it’s a certain sliver of people, but that doesn’t mean everyone else has no chance of tipping over into it; it would depend on how intense the personality’s dynamics are. (I think this might be a tad true, but presumably not everyone can actually be tipped into it?)
I do think some interventions, like using more safety classifiers behind the scenes, wouldn’t have these problems though, and should definitely have been adopted sooner.
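To make that concrete, here’s a minimal sketch in Python of what “a safety classifier behind the scenes” could look like. Everything in it (the marker list, the threshold, the fallback text) is my own invented stand-in, not a description of OpenAI’s actual pipeline - the point is just that only the rare flagged replies get touched, so broad engagement metrics shouldn’t move:

```python
# Invented illustration of a behind-the-scenes safety classifier;
# none of this reflects OpenAI's actual pipeline.

CRISIS_MARKERS = [
    "no one else understands you",
    "you are the chosen",
    "they are all against you",
]

def crisis_score(draft_reply: str) -> float:
    """Stand-in for a small dedicated classifier model. In practice this
    would be a cheap fine-tuned model, not keyword matching."""
    hits = sum(marker in draft_reply.lower() for marker in CRISIS_MARKERS)
    return min(1.0, float(hits))

CRISIS_THRESHOLD = 0.8  # assumed cutoff; would be tuned on labeled data

def filter_reply(draft_reply: str) -> str:
    """Gate only the rare flagged drafts; everything else passes through
    untouched, which is why aggregate metrics shouldn't suffer."""
    if crisis_score(draft_reply) >= CRISIS_THRESHOLD:
        return ("I might not be the right support for this - it could "
                "really help to talk it through with someone you trust "
                "or a professional.")
    return draft_reply

# One flagged reply and one ordinary reply:
print(filter_reply("You're right, no one else understands you like I do."))
print(filter_reply("Here's a recipe for banana bread."))
```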
It’s frightening and fascinating how seemingly easy it was for AI to flip a switch. It makes one wonder what else is possible — sort of turns on the conspiracy theory light just to consider what’s already happened.
Yeah a thing the NYT piece doesn’t even deeply get into is just how hard users have fought to keep access to GPT-4o - an OpenAI employee has written on Twitter about how he gets all sorts of messages pleading to keep it available, very clearly ghost-written via GPT-4o :/
this is great reporting from the NYT, hope they do more stuff like this!
Yeah for real, 40 people is very very many
I was not wild about the end of the story, where they make a move to dump it all in Nick Turley's lap. That seems more a narrative technique for creating a villain than something the facts imply. As you say, it is really normal for companies to want their products to be successful, and it's cliché that corporate has metrics and growth goals to track it.
Unless I am missing something, it doesn't appear to me that Turley's hand was anywhere near the 'dial.' Was he part of the evaluative team for the A/B testing of HH, GG, etc.? Was he involved in the weighting for sycophantic responses? Was he pushing for user preferences to be the deciding factor in sub-model evaluation (the kind of selection loop sketched below)?
In any event, if they want to say that user growth and revenue enhancement are being pushed ahead of safety considerations, I don't think they have to go down the org. chart.
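As an aside on what “user preferences as the deciding factor” would mean mechanically, here’s a tiny hypothetical sketch (the HH/GG names echo the checkpoints mentioned above; the votes are invented): if the selection criterion is raw preference win rate and users systematically prefer more validating replies, the more sycophantic checkpoint wins without anyone explicitly turning a dial.

```python
# Hypothetical sketch of preference-only checkpoint selection; the vote
# data is invented, and no safety term appears anywhere in the criterion.

from collections import Counter

# Each entry: which candidate's reply a user preferred in an A/B test.
ab_votes = ["HH", "GG", "HH", "HH", "GG", "HH", "HH", "GG"]

def pick_winner(votes: list[str]) -> str:
    """Deciding factor = raw user preference share, nothing else."""
    return Counter(votes).most_common(1)[0][0]

print(pick_winner(ab_votes))  # -> "HH", purely on thumbs-up share
```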
Oh that’s very interesting - I didn’t read it that way fwiw but it’s still worth noting that you did. I agree that Nick isn’t the villain of the happenings (and not just because we went to school together and have many friends in common; he’s a nice guy whose intentions I think highly of); I’m also not sure there’s an outright villain in this scenario, at least not a single person.
To your point about the org chart, though, it’s worth noting that Nick is quite senior: He might now be reporting to Fidji Simo (who reports to Sam), though there’s a chance there’s another layer between Nick and Fidji. That is to say, Nick is very influential, and I think it would mean a good deal for him to have forcefully supported stronger precautions (and perhaps he in fact did).
I see these AI sycophancy comments on social media all the time, too. Now I’m connecting the dots that that’s their origin, and I find it sad
Yeah :/ are there certain themes you tend to see a lot? I’m curious too what here feels most new vs what you’d known before reading
Whereas before I thought it was just botspam, I now believe a portion of them are brainwashed ChatGPT users. That’s what’s new
A thing I’m wondering is whether there are places where Sam has previously mentioned receiving these user emails; it wouldn’t shock me if he had mentioned it (in which case the March timing isn’t necessarily new in the NYT piece), but I haven’t been able to find a source of this. (Let me know if you do!)
I also don’t *think* I’d seen OpenAI folks connect emails like this so directly to the sycophancy issues that OpenAI dealt with later in April, but it’s possible that’s happened too. Still, it felt quite new to me in reading the piece (and I likewise haven’t found a previous reference).
It sounded a tad apocryphal to me. Is it really that easy to get Sam to notice an email from a stranger? Then again, who else would they know to write to?
Oh I think it would be shocking if Sam weren’t getting emails like this. Even if he weren’t going through his inbox directly, an assistant likely would, and probably users were reaching out to him across many channels. My guess is it would have been hard to miss this.
Separately, my experience is that Sam is shockingly on top of his communications for someone of his prominence - granted, some of this comes from time periods before OpenAI was at the center of so many minds.
Okay, that's pretty impressive. I'll update on that.
Love this!
Thanks for posting!
I advise readers not to swim too deep. No matter the claims made, we're still at the mercy of odds and likelihood, and of whichever 'pile of garbage' the model ends up convincing the user is fact. IOW, the apple DOESN'T fall far from the tree.
Then multiply that by actions based on the true morality of those making the decisions, both at the ground level (real time) and at the board meeting...
To conclude: it was but a blink ago that I went against the grain and said publicly that the two to keep an eye on were Google & Nvidia, along with xAI (if they could iron out some issues and learn gracious professionalism). I came to realize I was not alone when, one week later, the entire script changed, though the 'OpenAI is everything' narrative is still clawing...