Alignment pretraining: AI discourse creates self-fulfilling (mis)alignment

(arxiv.org)

39 points | by anigbrowl 8 hours ago

16 comments