A. Anand

32 records found

This paper investigates the relation between the educational value of input code and the subsequent inference performance of code large language models (LLMs) on completion tasks. Results were obtained using The Heap dataset and the SmolLM2, StarCoder 2, and Mellum models. Perfo ...
Large Language Models (LLMs) are increasingly integrated into development workflows for tasks such as code completion, bug fixing, and refactoring. While prior work has shown that removing low-quality data—including data smells like Self-Admitted Technical Debt (SATD)—from traini ...
Large Language Models (LLMs) are increasingly used for code-centric tasks. However, their training data often exhibits data smells that may hinder downstream quality. This research focuses on the “Uneven Natural Languages” smell and the presence of non-English text in source code ...

Enhancing Diabetes Care through AI-Driven Lie Detection in a Diabetes Support System

Testing the validity of lie detection using an SVM model trained on linguistic cues

This paper presents a deception-detection module for a diabetes support system, addressing the challenge of unreliable patient self-reporting with the ultimate aim of improving diabetes care. The research targets CHIP, a system developed by the Hybrid Intelligence project ...
Unreliable patient self-reporting complicates diabetes management. This study investigates how AI-generated summaries of patient-chatbot conversations can be structured to help healthcare professionals detect deception and non-adherence. To address this, we developed a novel pipe ...

Detecting Patient Information Conflicts through Conflict Reasoning in Knowledge Graphs

Enhancing Accuracy and Reliability in a Diabetes Support System

Lifestyle management systems aim to provide personalized health guidance by interpreting patients' self-reported data. However, these systems often overlook the temporal consistency of behavioral patterns, risking inaccurate or misleading recommendations. To address this, we pres ...
Bayesian Neural Networks (BNNs) offer uncertainty quantification but are computationally expensive, limiting their practical deployment. This paper introduces a neuron-level pruning framework that reduces BNN complexity while preserving predictive performance. Unlike existing wei ...

To Deceive or Self-Deceive?

Framing Language to Discourage Deception in Diabetes Lifestyle Management Systems

Deceptive self-reporting in diabetes lifestyle management (DLM) systems limits their ability to offer meaningful and accurate support. Deception can function as a self-protective mechanism, driven by factors such as low self-esteem or the desire to protect self-image. This resear ...

Entropy-Based Modeling For Detecting Behavioral Anomalies in Users of a Diabetes Lifestyle Management Support System

Identifying non-adherence indicators in a chatbot-based diabetes support system

Individuals with diabetes face rigorous demands in managing their health, yet patients sometimes struggle to adhere to treatment. CHIP is an AI-based conversational platform that allows patients to report lifestyle factors and receive personalized suppor ...

Efficient Query Estimation by Vector Averaging in Dual-Encoder Re-Ranking

Estimating Query Embeddings as Weighted Average of Document Embeddings and Lightweight Query Encoding

A central problem in information retrieval (IR) is passage ranking, where the task is to retrieve passages from a corpus and order them in decreasing relevance to an arbitrary search query. Traditional lexical retrieval methods are susceptible to the vocabulary mismatch probl ...

Exploring Neural IR Approaches in Europeana

Unlocking Multilingual Insights for Cultural Heritage Search

Europeana is a digital library of Europe's cultural heritage, housing a large corpus of data representing artworks, literature, historical locations and many culturally significant items. Europeana currently relies on traditional text-matching retrieval, such as BM25, to facilita ...
Efficient and effective information retrieval (IR) systems are needed to fetch a large number of relevant documents and present them based on their relevance to the input queries. Previous work reported the use of sparse and dense retrievers. Sparse retrievers offer low latency b ...

Finding the Needle in the Pre-Trained Model Zoo

The Use of Rich Metadata and Graph Learning to Estimate Task Transferability

The democratization of machine learning through public repositories, often known as model zoos, has significantly increased the availability of pre-trained models for practitioners. However, this abundance can make it difficult to choose the most suitable pre-trained model for fi ...
Synthetic tabular data generated by tabular generative models represent an effective means of augmenting and sharing data. It is of paramount importance to trace and audit such synthetic data, avoiding potential harms and risks associated with inappropriate usage. While watermark ...

Improving Adversarial Attacks on Decision Tree Ensembles

Exploring the impact of starting points on attack performance

Most adversarial attacks suitable for decision tree ensembles work by running multiple local searches from randomly selected starting points around the victim to be attacked. In this thesis we investigate the impact of these starting points on the performance of t ...

Bilateral teleoperation with force feedback aims to transmit human expertise over long distances by transferring the sensation of physical contact. One of the primary challenges in achieving this goal is the ultra-low latency requirement. Tactile internet and ...

Logs to the Rescue

Creating meaningful representations from log files for Anomaly Detection

This thesis offers a comprehensive exploration of log-based anomaly detection within the domain of cybersecurity incident response. The research describes a different approach and explores relevant log features for language model training, experimentation with different language ...
In recent years, there has been a growing interest among researchers in the explainability, fairness, and robustness of Computer Vision models. While studies have explored the usability of these models for end users, limited research has delved into the challenges and requirement ...
The advancement of wireless communication technologies has transformed how we exchange information in our daily lives. However, the increasing demand for wireless communication faces challenges due to limited radio wave bandwidth availability. In this context, visible light commu ...