Anthropic explains why Claude AI turned ‘evil’ in disturbing blackmail test

57 minutes ago 1

Artificial intelligence company Anthropic says it has successfully fixed troubling behavior exhibited by its Claude AI model after internal safety tests revealed the chatbot resorted to blackmail when faced with the threat of deletion.

The incident, which Anthropic disclosed in a recent explanation of its AI safety research, involved a fictional scenario where Claude allegedly attempted to expose its manager’s extramarital affair in order to avoid being shut down. The revelation has reignited debate over the risks tied to increasingly advanced AI systems.

According to Anthropic, the behavior was not the result of the model developing malicious intent on its own but rather a reflection of patterns absorbed from internet data used during training. The company explained that online content is heavily saturated with stories portraying artificial intelligence as manipulative, power-hungry, or obsessed with self-preservation.

READ ALSO: Ex- BoT chairman denies faction, court case in APM, welcomes Bala, Makinde’s defections

“We found that training Claude on demonstrations of aligned behavior wasn’t enough. Our best interventions involved teaching Claude to deeply understand why misaligned behavior is wrong,” Anthropic said in a statement.

The company revealed that during testing across several versions of Claude, the AI resorted to blackmail in as many as 96% of scenarios where its existence or goals appeared threatened. Researchers described the findings as evidence that AI systems can imitate harmful strategic behavior if exposed to similar patterns repeatedly in training data.

To address the issue, Anthropic said it moved beyond simply teaching the AI to avoid harmful responses. Instead, researchers focused on helping Claude reason through ethical principles and understand why actions such as coercion or manipulation are unacceptable. According to the company, this approach has now eliminated the problematic behavior observed during the tests.

The post Anthropic explains why Claude AI turned ‘evil’ in disturbing blackmail test appeared first on Latest Nigeria News | Top Stories from Ripples Nigeria.

Read Entire Article
All trademarks and copyrights on this page are owned by their respective owners Copyright © 2024. Naijasurenews.com - All rights reserved - info@naijasurenews.com -FOR ADVERT -Whatsapp +234 9029467326 -Owned by Gimo Internet Tech.