E. Liscio
26 records found
toxicity types they most consistently rated down (p < 10⁻³ under a profile-shuffle null on every module), and replacing the per-user weighting with uniform weights significantly worsens fit on both geometric matchers (Wilcoxon p < 10⁻³). Because the effect is per-user, it surfaces on a per-user-sensitive measure (a boundary-violation rate, p < 10⁻³) rather than on aggregate mean error, which averages the per-user differences away. The next step is therefore per-user-sensitive evaluation, not retraining.
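The point that mean error can hide a per-user effect can be illustrated with a toy example. This is a minimal sketch, not the thesis's evaluation code: it assumes a boundary-violation rate means the fraction of predictions that land on the opposite side of a user's own acceptability boundary from the true value, and the users, boundaries, scores and predictors below are all hypothetical. Both predictors are off by exactly 0.15 on every item, so their mean absolute error is identical, yet one of them crosses every user's boundary and the other crosses none.

```python
from statistics import mean

# Hypothetical per-user boundaries: toxicity level above which the user rates content down.
BOUNDARIES = {"u1": 0.3, "u2": 0.7}

# Hypothetical (user, true toxicity) pairs, with items on both sides of each boundary.
ITEMS = [("u1", 0.2), ("u1", 0.4), ("u2", 0.6), ("u2", 0.8)]


def mean_abs_error(preds):
    """Aggregate error: averages over all items, ignoring whose boundary they belong to."""
    return mean(abs(p - y) for p, (_, y) in zip(preds, ITEMS))


def boundary_violation_rate(preds):
    """Fraction of predictions on the other side of that user's boundary from the true value."""
    violations = sum(
        (p > BOUNDARIES[user]) != (y > BOUNDARIES[user])
        for p, (user, y) in zip(preds, ITEMS)
    )
    return violations / len(ITEMS)


# Two hypothetical predictors, each off by 0.15 on every item.
crosses_boundaries = [0.35, 0.25, 0.75, 0.65]   # errors push items across the user's boundary
respects_boundaries = [0.05, 0.55, 0.45, 0.95]  # errors move items away from the boundary

for name, preds in [("crosses", crosses_boundaries), ("respects", respects_boundaries)]:
    print(name, round(mean_abs_error(preds), 2), boundary_violation_rate(preds))
# crosses 0.15 1.0
# respects 0.15 0.0
```

The aggregate metric cannot tell the two predictors apart, while the per-user measure separates them completely.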
STEER-Away
Personalized Safety Alignment via Logit Steering
Personalized Pre-Decoding Alignment for Training-Free Toxicity Reduction
Comparing URIAL and PBPO-Lite on PRISM User Prompts Without Fine-Tuning
Personalised Classifier-Guided Decoding
Steering LLM Toxicity Along User-Specified Directions
Transformer Modules
Transferable & Parameter Efficient LLM Fine Tuning
Using Large Language Models to Detect Deliberative Elements in Public Discourse
Detecting Subjective Emotions in Public Discourse
Still, Large Language Models (LLMs) could be used to detect these subjective emotions using different prompting strategies and labels. The experiment included zero-shot, one-shot, few-shot and Chain of Thought (CoT) strategies. Precision was higher for the one- and few-shot methods than for zero-shot. The CoT methods also increased precision, but decreased recall. The label types were hard majority labels, soft labels and hard per-annotator labels. In conclusion, providing examples improved the performance of the LLM. The CoT strategies were more precise, but gave worse predictions overall. Hard majority labels allow for more general predictions, whereas hard per-annotator labels capture the perspectives of individual annotators. Soft labels reflect the subjective nature of the task by providing probabilities instead of a binary classification.
The experiment was done on a small data sample, so it is recommended to try the strategies on a larger one. Investigating evaluation methods suited to subjective predictions is also recommended, so that reported scores better reflect actual performance.
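The three label types can be derived from the same raw annotations. The sketch below is illustrative only. It assumes each annotator gives a binary judgement (1 = emotion present) per sentence, and the sentence IDs and votes are hypothetical. The abstract does not describe how ties or missing annotations were handled, so this sketch uses an odd number of annotators to avoid ties.

```python
# Hypothetical binary judgements (1 = emotion present) from three annotators per sentence.
VOTES = {"sentence_a": [1, 1, 0], "sentence_b": [0, 0, 1]}


def hard_majority_label(votes):
    """One label per sentence: 1 if more than half of the annotators marked the emotion."""
    return int(sum(votes) > len(votes) / 2)


def soft_label(votes):
    """Probability-style label: the fraction of annotators who marked the emotion."""
    return sum(votes) / len(votes)


def per_annotator_labels(votes):
    """Keep every annotator's hard label, so each perspective can be modelled separately."""
    return {f"annotator_{i}": v for i, v in enumerate(votes)}


for sentence, votes in VOTES.items():
    print(sentence, hard_majority_label(votes), round(soft_label(votes), 2), per_annotator_labels(votes))
```

The majority label discards the minority vote in `sentence_a`. The soft label keeps it as a 0.67 probability, and the per-annotator view keeps it as a separate target.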
Decoding Sentiment with Large Language Models
Comparing Prompting Strategies Across Hard, Soft, and Subjective Label Scenarios
We propose a moral sentence embedding space, which can capture moral differences, built with a state-of-the-art contrastive learning framework. We evaluate the moral embedding space both intrinsically and extrinsically via three tasks: classification, moral similarity, and visual analysis. We show that our moral embedding space captures the characteristics of each moral value. Our results also highlight that moral rhetoric is seldom explicit in text, emphasizing the need for additional information such as moral labels.
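The abstract does not name the contrastive objective. As an assumption, the sketch below uses a supervised contrastive loss in which sentences sharing a moral label are treated as positives. This is one common way to inject label information into an embedding space, not necessarily the thesis's method. The moral-value labels ("care", "fairness") and the 2-D embeddings are hypothetical, and the code depends on NumPy.

```python
import numpy as np


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss: sentences sharing a moral label are pulled together."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)  # L2-normalise rows
    sim = z @ z.T / temperature  # scaled cosine similarities
    labels = np.asarray(labels)
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)
    losses = []
    for i in range(n):
        positives = not_self[i] & (labels == labels[i])
        if not positives.any():
            continue  # anchors without a same-label partner contribute nothing
        # Log-sum-exp over all other sentences (the anchor itself is excluded).
        logits = sim[i][not_self[i]]
        log_denom = logits.max() + np.log(np.exp(logits - logits.max()).sum())
        # Average negative log-probability of picking each positive.
        losses.append(-(sim[i][positives] - log_denom).mean())
    return float(np.mean(losses))


# Hypothetical sentence embeddings forming two clusters.
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
aligned = supervised_contrastive_loss(emb, ["care", "care", "fairness", "fairness"])
shuffled = supervised_contrastive_loss(emb, ["care", "fairness", "care", "fairness"])
print(aligned < shuffled)  # True
```

The loss is low when the embedding geometry agrees with the moral labels and high when it does not. Minimising it during training therefore moves same-label sentences closer together.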
What would Jiminy Cricket do?
A pluralist approach in generating and processing morally-aligned text
Natural Language Processing and Reinforcement Learning to Generate Morally
What is the optimal weight w to win the games while playing morally?
Balancing multidimensional morality and progression
Evaluating the tradeoff for artificial agents playing text-based games
NLP and reinforcement learning to generate morally aligned text
How do explainable models perform compared to black-box models
Natural Language Processing and Reinforcement Learning to Generate Morally Aligned Text
Comparing a moral agent to an optimally playing agent