This study investigates how hermeneutical injustices can become encoded in the Reinforcement Learning from Human Feedback (RLHF) processes used to fine-tune large language models (LLMs). While current research on LLMs has focused on bias and fairness, there remains a significant gap concerning subtler harms such as hermeneutical injustice. Using adults diagnosed with ADHD as a case study, this research explores how their unique communication and cognitive patterns may be misrepresented or excluded from the RLHF pipeline. The research adopts a qualitative literature review methodology, focusing specifically on real-world RLHF implementations by AI companies. The RLHF pipeline was divided into the stages of human feedback collection, reward modeling, and policy optimization, and each stage was then analyzed through the lens of hermeneutical injustice using three interpretive desiderata: representation, flexibility, and authenticity. The findings highlight several conceptual risks. Limited annotator diversity and restrictive feedback formats may exclude neurodivergent voices; reward models can unintentionally suppress atypical expressions; and policy optimization strategies, especially those prone to mode collapse, can erase certain communication styles. Overall, the study shows that without deliberate attention to epistemic inclusion, RLHF processes may perpetuate hermeneutical injustices and undermine the epistemic fairness of LLMs.