C. Hao
fication and evaluation of responsible research checklists impose a significant burden on reviewers. This study investigates the ability of Large Language Models (LLMs) to automatically classify research papers as empirical, theoretical, or hybrid, and to extract checklist compliance data. Using a dataset of publicly available NeurIPS papers, we designed an automated pipeline and evaluated its outputs against a human-annotated ground truth. Our results show that the LLM achieves high accuracy on the core classification task, reliably identifying a paper's core methodology from clear structural indicators such as mathematical proofs and benchmark datasets. The model also performs well on closed-ended extraction of objective checklist elements that are marked by similarly clear structural indicators. However, performance dropped noticeably on structurally scattered or subjective criteria, such as broader impacts and the declaration of AI usage. This drop highlights a limitation in the model's broader reading comprehension: it struggles to combine contextual information that is not signposted by explicit headers. Notably, this automated failure mirrors human task ambiguity, as these same subjective items also yielded lower inter-annotator agreement among the human annotators. In conclusion, while LLMs provide a highly consistent baseline for classifying paper types and extracting explicit methodological data, their reliance on structural cues indicates that they should serve as assistive screening tools rather than autonomous evaluators in academic peer review.
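The evaluation described above compares LLM outputs against a human-annotated ground truth and also refers to inter-annotator agreement. The abstract does not say which agreement statistic was used, so the following is only a minimal sketch under stated assumptions: plain accuracy for the empirical/theoretical/hybrid labels, and Cohen's kappa (one common chance-corrected agreement measure, assumed here) for two annotators marking a checklist item. The function names and the "yes"/"no" checklist labels are hypothetical.

```python
from collections import Counter


def accuracy(predicted, gold):
    """Share of papers whose predicted label equals the human (gold) label."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("expected two non-empty label lists of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("expected two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labelled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labelled at random
    # with their own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    keys = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[k] * freq_b[k] for k in keys) / (n * n)
    if expected == 1.0:
        # Both annotators used one and the same label for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical paper-type labels: LLM output vs. human ground truth.
    llm = ["empirical", "theoretical", "hybrid", "empirical"]
    gold = ["empirical", "theoretical", "empirical", "empirical"]
    print(accuracy(llm, gold))  # 0.75

    # Hypothetical "is this checklist item addressed?" labels from two annotators.
    annotator_1 = ["yes", "yes", "no", "yes"]
    annotator_2 = ["yes", "no", "no", "yes"]
    print(cohens_kappa(annotator_1, annotator_2))  # 0.5
```

A low kappa on an item such as broader impacts would signal the kind of human task ambiguity the abstract reports; in practice a library routine such as scikit-learn's `cohen_kappa_score` could replace the hand-written version.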
Analysis of results in the ML research field
Investigating the Efficacy of LLMs in Extracting Stated Research Limitations
Analysis of Results in the ML Research Field
How well can an LLM decide the reproducibility of a paper?
Large Language Models for Reviewing Research Papers
Evaluating Claim-Level Completeness in Machine Learning Research
ACT-R in the military
A systematic review of Adaptive Control of Thought - Rational, a cognitive architecture in the military
This paper provides an overview of the use of ACT-R as a cognitive architecture in the military. ACT-R stands for Adaptive Control of Thought - Rational. It is a cognitive architecture, a framework for human-like AI programs that models the human mind. Through this literature survey, an overview of the military's usage of ACT-R is created. The overview answers two questions: in which applications the military uses ACT-R, and why it uses ACT-R. It aims to help the public understand how ACT-R is used in the military and to give insight into where their tax money is being spent. For the military, the overview will be useful if ACT-R becomes outdated, as it shows which programs will need an update. The overview consists of three parts: a robotics operator manager, a test to determine the value of an officer managing multiple robots; the creation of intelligent tutoring systems for ship navigation and aircraft recognition; and a supporting tool that helps analysts determine the value of information.
Modeling Episodic Memory in Cognitive Architectures
A Comparative Study of Soar and Xapagy
An Analysis of ACT-R and CLARION Representing Heuristic Strategies for Consumer Decision-Making
A Systematic Literature Review
By developing a turn-based strategy game and evaluating Hierarchical Reinforcement Learning (HRL) agents of varying complexity, I assessed both their behavioural similarity to human players and how believable they were perceived to be by human players. This research introduces a new approach for understanding player behaviour using behaviour vectors composed of three high-level metrics—Aggressiveness, Management, and Exploration—consistent with existing literature. These metrics are designed to be broadly applicable across strategy games, enabling consistent comparison between human and AI opponents, as well as across different games and agents. The findings demonstrate that while HRL agents can replicate human-like playstyles without using human training data, players judge human-likeness more on perceived intelligence and fairness. This suggests that creating truly human-like AI opponents requires not just replicating human game-level playstyles, but designing agents that align with players' expectations for intelligent and fair decision-making.
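The abstract describes behaviour vectors built from three metrics (Aggressiveness, Management, Exploration) that allow human and AI opponents to be compared, but it does not say how the metrics are scaled or compared. The sketch below is therefore hypothetical throughout: it assumes each metric is already scaled to [0, 1], averages several human sessions into one reference vector, and uses Euclidean distance to find the agent whose playstyle lies closest to that reference. The class, function, and agent names are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BehaviourVector:
    """One player's or agent's style; each metric assumed scaled to [0, 1]."""
    aggressiveness: float
    management: float
    exploration: float

    def as_tuple(self):
        return (self.aggressiveness, self.management, self.exploration)


def mean_vector(vectors):
    """Component-wise average, e.g. over several human play sessions."""
    if not vectors:
        raise ValueError("need at least one behaviour vector")
    n = len(vectors)
    columns = zip(*(v.as_tuple() for v in vectors))
    return BehaviourVector(*(sum(col) / n for col in columns))


def distance(a, b):
    """Euclidean distance between two behaviour vectors (smaller = more similar)."""
    return math.dist(a.as_tuple(), b.as_tuple())


def most_human_like(agents, human_vectors):
    """Name of the agent whose vector lies closest to the average human vector."""
    target = mean_vector(human_vectors)
    return min(agents, key=lambda name: distance(agents[name], target))


if __name__ == "__main__":
    # Hypothetical measurements, not data from the study.
    humans = [BehaviourVector(0.6, 0.4, 0.5), BehaviourVector(0.4, 0.6, 0.5)]
    agents = {
        "flat_agent": BehaviourVector(0.9, 0.1, 0.2),
        "hrl_two_level": BehaviourVector(0.55, 0.45, 0.5),
    }
    print(most_human_like(agents, humans))  # hrl_two_level
```

Note that a small distance only captures the playstyle-similarity half of the evaluation; according to the findings, players' judgements of human-likeness depended more on perceived intelligence and fairness, which a vector comparison like this does not measure.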
What if fanfiction, but also coding: Investigating cultural differences in fanfiction writing and reviewing with machine learning methods
How has the portrayal of female characters in fanfiction evolved in response to the #MeToo movement and fourth-wave feminism, as analyzed with the help of NLP techniques?
Exploring Genre Preferences and Audience Engagement in Multilingual Fanfiction
A Study of Popularity and Preferences
The impact of emotional journeys on fanfiction popularity
A computational analysis of linear correlations between emotional behavior and popularity
What if fanfiction, but also coding: Investigating cultural differences in fanfiction writing and reviewing with machine learning methods
Fine Tuning a BERT-based Pre-Trained Language Model for Named Entity Extraction within the Domain of Fanfiction
Visualizing Collaboration with Superstars
A Novel Approach to Visualizing Collaboration
Independent Thinkers and Scientific Progress
An Analysis of Superstar Influence on Computer Science Research Dynamics