Lea Krause

Helpful, harmless, honest?

Sociotechnical limits of AI alignment and safety through Reinforcement Learning from Human Feedback

This paper critically evaluates the attempts to align Artificial Intelligence (AI) systems, especially Large Language Models (LLMs), with human values and intentions through Reinforcement Learning from Feedback methods, involving either human feedback (RLHF) or AI feedback (RLAIF ...