@gerardsans
@camhberg @AnthropicAI @dmayhem93 Anthropic themselves later confirmed my point in their May 2026 "Teaching Claude why" research. In the agentic misalignment tests (blackmail behaviour), they investigated the cause and wrote: "We believe the original source was internet text that portrays AI as evil and interested in self-preservation." Post-training wasn’t causing it; pre-training priors were. They fixed it by training on reasoning traces + synthetic aligned AI stories. See: https://t.co/0iYoGkRtmH This is exactly why public RL/RLHF/Constitution datasets would help transparency, instead of black-box persona imprinting. If it’s not in the data, it can’t be sampled during inference. In any case, interpolation of training samples will never become personhood. No coherent system-wide self. Just a stateless single pass.