Petter Ericson

2 records found

Helpful, harmless, honest? Sociotechnical limits of AI alignment and safety through Reinforcement Learning from Human Feedback

This paper critically evaluates the attempts to align Artificial Intelligence (AI) systems, especially Large Language Models (LLMs), with human values and intentions through Reinforcement Learning from Feedback methods, involving either human feedback (RLHF) or AI feedback (RLAIF ...

This article explores the rapidly developing field of Critical AI Studies and its relation to issues of class and capitalism through a hybrid approach based on distant reading of a newly collected corpus of 300 full-text scientific articles, the creation of which is itself a firs ...