Anthropic apologizes for invisible Claude Fable guardrails

(theverge.com)

31 points | by rarisma 5 hours ago

10 comments

Avicebron 14 minutes ago
I like Claude Code a lot, but I think it sets a dangerous precedent to put in guardrails that return a response from a prompt the system modified in real time in order to subvert the original intent.
Fail cleanly. Anything else makes it too difficult to rely on.
edit: Giving the absolute maximum benefit of the doubt, I understand that they see themselves as "stewards", for lack of a better word. But the EA thing is really leaking through, and paternalism isn't a good look.
- mapontosevenths 9 minutes ago
  I agree 100%. Doing a worse job IS an error. It should be treated as such. Or at the very least make that behavior opt-in. The default should not be pretending like nothing happened and just quietly doing a worse job.
  Imagine if your healthcare provider sometimes decided not to read your test results very carefully, and you risked death. Now realize that healthcare providers use Claude, and that scenario isn't hypothetical.
- hootz 6 minutes ago
  What is "EA" in this context? I see a lot of people using this initialism.
  - carlgreene 4 minutes ago
    Effective Altruism, I think.
prodigycorp 10 minutes ago
Anthropic apologizes for nothing. We all know where the EA cult stands on matters like this, and any statement otherwise is just PR.
The beliefs of these people, and how they manifest, are deeply terrifying to me. They believe that any means are acceptable to achieve what they believe is a better end.
airstrike 8 minutes ago
[delayed]
dang 10 minutes ago
Related. Others?
Anthropic walks back policy that could have 'sabotaged' researchers using Claude - https://news.ycombinator.com/item?id=48485958 - June 2026 (30 comments)
Cybersecurity researchers aren't happy about the guardrails on Anthropic's Fable - https://news.ycombinator.com/item?id=48478969 - June 2026 (488 comments)
If Claude Fable stops helping you, you'll never know - https://news.ycombinator.com/item?id=48467896 - June 2026 (495 comments)
---
Also related, I guess?
AWS Bedrock to require sharing data with Anthropic for Mythos and future models - https://news.ycombinator.com/item?id=48473166 - June 2026 (248 comments)
Anthropic requires 30 day data retention for Fable and Mythos - https://news.ycombinator.com/item?id=48464258 - June 2026 (291 comments)
bellowsgulch 7 minutes ago
Such a weird, openly immoral way to defend your moat, too.
Why not just tell people, "To defend our ability to be competitive in our industry, we ask that you do not use Claude or any of our models to independently perform research on large language models or any of their related architectures or technologies. In order to prevent this violation of the Terms of Service, we have trained Claude Fable to deny any requests or prompts which involve frontier AI research."
bellowsgulch 10 minutes ago
*Anthropic apologizes they got caught defending their moat by implementing invisible Claude Fable guardrails
- cyanydeez 9 minutes ago
  is it a moat or just a way to implement the permanent underclass?