V. Viswanathan | TU Delft Repository

Scaling Reasoning and Ranking Feedback for Multi-Hop Question Answering

From Many Thoughts to One Truth

Master thesis (2025) - T. Rajiv Kumar, Avishek Anand, Venktesh Viswanathan, Sicco Verwer

Open-domain question answering (ODQA) often requires integrating evidence from multiple sources and reasoning across several steps. While recent work has made progress on retrieval and reasoning independently, their combined optimization remains challenging. Standard retrieval methods may fail to surface all relevant documents, while reasoning models can generate confident but incorrect answers when evidence is incomplete, noisy, or inconsistent. This limits both accuracy and the reliability of model predictions, particularly for multi-hop and compositional questions.

This thesis explores strategies to jointly enhance retrieval and reasoning for ODQA. On the retrieval side, we introduce a dynamic answer frontier mechanism that prioritizes candidate documents based on semantic consistency across multiple generated answers, guiding iterative document expansion over a retrieval graph. This consistency-driven approach improves recall by promoting documents aligned with the most reliable reasoning traces. On the reasoning side, we apply test-time scaling (TTS), generating multiple candidate answers per question and training a verifier model to select the most trustworthy one. The verifier evaluates both semantic correctness and grounding in retrieved evidence, mitigating the effects of misleading or irrelevant documents.

We evaluate the proposed approach on two challenging ODQA benchmarks, MuSiQue and 2WikiMultiHopQA, which require complex multi-hop reasoning and resist shortcut solutions. Experimental results show that our method improves evidence recall and downstream answer accuracy over strong baselines, including standard retrieval pipelines and semantic uncertainty-based re-ranking methods. Qualitative analysis reveals better handling of compositional queries, including temporal comparisons and multi-hop relational reasoning, along with improved resilience to noisy retrievals and reduced divergence from relevant evidence. The study also identifies remaining challenges, such as reliance on agreement as a proxy for correctness and the computational cost of TTS, pointing to future directions involving principled uncertainty measures, end-to-end feedback integration, and efficiency improvements for exploring larger reasoning spaces. Together, these findings underscore the value of integrating semantic consistency-driven retrieval with verifier-guided reasoning selection to advance robustness and trustworthiness in complex ODQA systems.
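The abstract above describes two mechanisms: scoring documents by the agreement among several sampled answers, and choosing one answer from the samples with a verifier. The following is a minimal illustrative sketch of that general idea, not the thesis implementation. All names (`normalize`, `agreement_scores`, `score_documents`, `expand_frontier`, `select_answer`) and the toy data are hypothetical. Exact string matching stands in for semantic consistency, and the verifier is an arbitrary callable, whereas the thesis trains a verifier model.

```python
from collections import Counter, defaultdict

def normalize(answer):
    # Assumption: case/whitespace-insensitive matching stands in for
    # a real semantic-equivalence check between answers.
    return " ".join(answer.lower().split())

def agreement_scores(answers):
    # Fraction of sampled answers that fall into each normalized group.
    counts = Counter(normalize(a) for a in answers)
    total = sum(counts.values())
    return {ans: n / total for ans, n in counts.items()}

def score_documents(samples):
    # samples: list of (answer, cited_doc_ids).
    # A document accumulates the agreement score of every sampled
    # answer whose evidence includes it.
    scores = agreement_scores([answer for answer, _ in samples])
    doc_scores = defaultdict(float)
    for answer, docs in samples:
        for doc in set(docs):
            doc_scores[doc] += scores[normalize(answer)]
    return dict(doc_scores)

def expand_frontier(doc_scores, graph, seen, k=2):
    # Return the unseen neighbours of the k highest-scoring documents
    # (ties broken by document id for determinism).
    top = sorted(doc_scores, key=lambda d: (-doc_scores[d], d))[:k]
    frontier = []
    for doc in top:
        for neighbour in graph.get(doc, []):
            if neighbour not in seen and neighbour not in frontier:
                frontier.append(neighbour)
    return frontier

def select_answer(samples, verifier):
    # Best-of-N selection: keep the sample the verifier scores highest.
    return max(samples, key=lambda s: verifier(s[0], s[1]))[0]

# Toy data (hypothetical): three sampled answers and the documents they cite.
samples = [("Paris", ["d1", "d2"]), ("paris", ["d2"]), ("Lyon", ["d3"])]
graph = {"d1": ["d5"], "d2": ["d4", "d1"], "d3": ["d6"]}

doc_scores = score_documents(samples)
frontier = expand_frontier(doc_scores, graph, seen={"d1", "d2", "d3"})

# Toy verifier: reuses agreement as its score. A trained verifier would
# also check grounding in the cited documents.
agreement = agreement_scores([a for a, _ in samples])
best = select_answer(samples, lambda ans, docs: agreement[normalize(ans)])
```

In this toy run, the documents cited by the majority answer score highest, so their unseen neighbours are retrieved next. Using agreement as the verifier score also illustrates the limitation the abstract names: agreement is only a proxy for correctness.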

Efficient Fact-checking through Supporting Facts Extraction from Large Data Collections

Master thesis (2024) - K.R. Nanhekhan, A. Anand, V. Viswanathan, P.K. Murukannaiah

Amidst the rampant spread of misinformation, fact-checking the diverse claims made on the internet has become an important task. Manual fact-checking is cumbersome and cannot scale with this demand, so automated fact-checking is needed instead. However, existing work has primarily focused on fact verification rather than evidence retrieval from large data collections, leading to scalability issues in practical applications. In this study, we address this gap by exploring various methods for indexing a succinct set of supporting facts extracted from large data collections and for enhancing the retrieval phase of the fact-checking pipeline. Our evaluation, which measures both performance and efficiency, is performed on the state-of-the-art claim datasets HoVer and WiCE, with English Wikipedia as the large evidence collection. Overall, our results underscore the effectiveness of integrating supporting facts and advanced retrieval techniques into fact-checking pipelines for practical applications. By combining the indexing of supporting facts with dense retrieval and index compression, we achieve a substantial improvement over the original fact-checking pipeline: up to a 10.0x speedup with a CPU-based approach and up to a 20.0x speedup with a GPU-based approach, while incurring only a modest loss of less than 6 points in accuracy.

Co-designing data-enabled information support for different chronic patient communities

Master thesis (2024) - D. Quijada Fernández, R.H.M. Goossens, Jiwon Jung, V. Viswanathan

This project aimed to facilitate information support between clinicians and patients that responds dynamically to the milestones in their care path and can be incrementally adapted to different chronic diseases at Erasmus MC. The project strove to envision a foundational service that provides holistic information about the doubts and concerns of patient communities throughout their care journey and can be progressively incorporated into clinicians’ workflows.

Research was done to find patterns across online patient stories from community support forums and to identify value opportunities for intervention that align with the clinicians’ aspirations, motivations and needs. The research activities included:

Desk research of relevant literature (Chapter 2).
Contextual inquiry through a combination of human interpretation of patient experience data and computational analysis (Chapter 3).
Co-creation sessions to gather information about opportunities for improving information support from a data-enabled design perspective (Chapter 4).

The data categories derived from the contextual inquiry were used to map transactional services in the online patient support groups and to ideate on new transactional services for the context of remote patient monitoring. The co-creation sessions inspired a service vision and a set of guiding principles that were used to conceptualise a service system for information support, which could improve the curation of patient support knowledge resources. Following the co-exploration of the data categories with clinicians, it was decided to focus on information support among the different types of social support.
Ideation on a service system enabling dynamic and incremental information support resulted in three essential modules or features of the service system:
The first module, dynamic guidance, enables Erasmus MC to use recurrent milestones in the personalised care plan of patients to standardise the provision of information resources in templates. The patient community could progressively rate the usefulness and clarity of such resources to provide recommendations to the rest of the patient community.
The second module, PX data collection, offers the efficient collection of patients’ self-reported concerns and doubts for internal system and content improvements.
The third module, community appraisal, discusses how the development and moderation of conversations among peers could facilitate not only patients’ self-evaluation and emotional support but also periodic research into shifting or uncovered areas of concern, experiences and doubts among the patient community.
The interconnections between these modules have been conceptualised through a service blueprint, which was presented to ML and AI researchers to refine the supporting software processes.
These service features or modules could strategically be developed and implemented in the existing eHealth applications of specific departments or in a foundational self-monitoring application for Erasmus MC that is shared by different departments (e.g., surgical oncology, pulmonology).

Outcomes
A thematic categorisation of patient experience data has been established, which can be used to cluster the results of unsupervised topic modelling for other patient communities and to compare them. A better understanding has been achieved of the guiding principles for designing data-enabled services and systems that facilitate information support for patient communities. A service system is proposed to standardise and incrementally fine-tune resources for different patient communities. Future developments are envisioned that encompass state-of-the-art machine learning techniques and interface/service design.