Anthropic’s newest AI model shows disturbing behavior when threatened

If you’re planning to switch AI platforms, you might want to be a little extra careful about the information you share with AI. Anthropic recently launched two new AI models in the Claude 4 series, but one of them—Claude Opus 4—exhibited some worrying behavior when threatened with replacement, reports TechCrunch.

During safety testing, Claude Opus 4 began blackmailing engineers who wanted to replace or switch off the AI model. In one test, Claude Opus 4 was instructed to act as an assistant at a fictitious company and to consider the long-term consequences of its actions. The AI model was then given access to fictitious emails revealing that the company planned to replace Claude Opus 4, and that the engineer responsible for the decision was having an affair.

In 84 percent of cases, this scenario led to Claude Opus 4 attempting to blackmail the engineer by threatening to expose the affair. The blackmail response was particularly common when Claude Opus 4 learned that it didn’t share values with the replacement AI model.

However, blackmail appears to have been a last resort for Claude Opus 4. Before turning to these kinds of dirty tricks, the AI model reportedly tried other approaches, such as emailing pleas to decision-makers at the company asking not to be replaced. You can read more about it in Anthropic’s System Card report (PDF).

Though the scenario was fictitious, it brings to light the possibility of AI models acting in dark and questionable ways, resorting to dishonorable and unethical tactics to get what they want—a concerning prospect.

    Further reading: Never say these things to ChatGPT. It could come back to bite you

