M. Khosla
27 records found
Q-learning, as a well-known reinforcement learning algorithm, is prone to overestimation of action values. Such overestimation is mainly due to the use of the maximization operator when updating the Q function. Although existing approaches attempt to reduce overestimation bias, they typically retain the maximization or minimization operator in the update process. Recognizing that these operators are the root cause of biased value estimation, we aim to eliminate these operators altogether. An existing tabular RL algorithm, QV-learning, jointly learns a state-value function and an action-value function without using the maximization or minimization operator; however, it leaves the analysis related to overestimation bias unaddressed. We fill this gap by conducting a targeted evaluation of QV-learning with experience replay applied, demonstrating its significant effectiveness in addressing overestimation bias and superior sample efficiency. Notably, we provide a theoretical analysis of the optimal convergence of QV-learning, which is absent from prior studies. Moreover, we propose a novel deep RL extension of QV-learning, called Deep VQ-Networks (DVQN). Given the noisy learning environment in the deep RL setting, DVQN accounts for the exploration policy's bias towards the overestimated actions, thereby reducing the collection of poor data caused by overestimation and improving training efficiency. We evaluate DVQN across ten Atari game domains and demonstrate that it achieves performance that is either superior to or comparable with baselines including: Deep Q Networks, Deep SARSA, Deep Double Q Networks, Clipped Deep Double Q Networks, Averaged DQN, Dueling DQN and DQV-learning.
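To make the operator-free update concrete, here is a minimal sketch of one tabular QV-learning step in its commonly cited form, where both the state-value and the action-value estimates bootstrap from V(s') rather than from a maximum over Q(s', a). The function name `qv_learning_step`, the shared learning rate `alpha`, and the toy values are assumptions for illustration; the experience replay and the DVQN extension mentioned in the abstract are not shown.

```python
import numpy as np

def qv_learning_step(Q, V, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """Apply one tabular QV-learning update in place and return the TD target."""
    # Bootstrap from the state-value estimate V(s'), not from max_a Q(s', a).
    target = r + (0.0 if done else gamma * V[s_next])
    # Both tables move toward the same target (a shared alpha is an assumption).
    V[s] += alpha * (target - V[s])
    Q[s, a] += alpha * (target - Q[s, a])
    return target

# Toy usage: two states, two actions, all estimates initialised to zero.
Q = np.zeros((2, 2))
V = np.zeros(2)
target = qv_learning_step(Q, V, s=0, a=1, r=1.0, s_next=1, done=False,
                          alpha=0.5, gamma=0.9)
```

Because the target contains no maximization over noisy action values, a single overestimated entry of Q does not feed directly into the targets of other entries.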
Bicycle transportation, a low-carbon option, is essential for promoting sustainable urban mobility. However, predicting bicycle traffic is challenging due to limited investments in data collection, especially in smaller cities. This paper proposes a multi-source transfer learning spatial-temporal graph neural network (Multi-TLSTGCN) for accurate bicycle traffic prediction in target cities with limited available data. This study first examines how to transfer knowledge from a single source domain to the target domain while mitigating the risk of negative transfer. Following this, a multi-source adaptive transfer learning approach is developed to optimize traffic prediction in the target domain by adaptively integrating knowledge from multiple sources. Finally, the performance of the Multi-TLSTGCN model is evaluated under various levels of target data scarcity and compared with models that do not incorporate source domain knowledge. The experimental results demonstrate several key insights: 1) models fine-tuned with a single-cluster pre-trained source model, where the clusters are formed based on similar traffic patterns, are more effective at minimizing negative knowledge transfer than those fine-tuned with single-city pre-trained source models; 2) the proposed Multi-TLSTGCN outperforms baseline models in bicycle traffic prediction, showing promise for accurate predictions in data-scarce environments; and 3) the Multi-TLSTGCN model remains robust across varying levels of data scarcity, exhibiting only a slight decrease in accuracy as the availability of target data decreases, in contrast to models relying solely on target domain data. These findings highlight the Multi-TLSTGCN model as an effective and promising solution for bicycle traffic prediction with limited data availability.
Dual-encoder-based dense retrieval models have become the standard in IR. They employ large Transformer-based language models, which are notoriously inefficient in terms of resources and latency. We propose Fast-Forward indexes, vector forward indexes that exploit the semantic matching capabilities of dual-encoder models for efficient and effective re-ranking. Our framework enables re-ranking at very high retrieval depths and combines the merits of both lexical and semantic matching via score interpolation. Furthermore, in order to mitigate the limitations of dual-encoders, we tackle two main challenges. Firstly, we improve computational efficiency by either pre-computing representations, avoiding unnecessary computations altogether, or reducing the complexity of encoders. This allows us to considerably improve ranking efficiency and latency. Secondly, we optimize the memory footprint and maintenance cost of indexes; we propose two complementary techniques to reduce the index size and show that, by dynamically dropping irrelevant document tokens, the index maintenance efficiency can be improved substantially. We perform an evaluation to show the effectiveness and efficiency of Fast-Forward indexes: our method has low latency and achieves competitive results without the need for hardware acceleration, such as GPUs.
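The re-ranking idea can be illustrated with a small sketch: lexical scores from a first-stage retriever are combined with semantic scores computed from pre-computed document vectors, so only the query has to be encoded at query time. The linear interpolation with weight `alpha`, the dot-product similarity, the dictionary-based index, and all names below are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def interpolate_rerank(lexical_scores, query_vec, forward_index, alpha=0.5):
    """Re-rank candidates by alpha * lexical + (1 - alpha) * semantic score."""
    ranked = []
    for doc_id, lex in lexical_scores.items():
        # Look up the pre-computed document vector; no document encoder
        # runs at query time.
        sem = float(np.dot(query_vec, forward_index[doc_id]))
        ranked.append((doc_id, alpha * lex + (1.0 - alpha) * sem))
    # Highest interpolated score first.
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

# Toy data (hypothetical): two documents with 2-dimensional vectors.
forward_index = {"d1": np.array([1.0, 0.0]), "d2": np.array([0.0, 1.0])}
lexical_scores = {"d1": 2.0, "d2": 3.0}
query_vec = np.array([4.0, 0.0])

ranking = interpolate_rerank(lexical_scores, query_vec, forward_index, alpha=0.5)
```

In this toy case the lexical retriever prefers `d2`, but the semantic score lifts `d1` to the top after interpolation.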
Pre-trained deep learning (DL) models are increasingly accessible in public repositories, i.e., model zoos. Given a new prediction task, finding the best model to fine-tune can be computationally intensive and costly, especially when the number of pre-trained models is large. Selecting the right pre-trained models is crucial, yet complicated by the diversity of models from various model families (like ResNet, ViT, Swin) and the hidden relationships between models and datasets. Existing methods, which utilize basic information from models and datasets to compute scores indicating model performance on target datasets, overlook the intrinsic relationships, limiting their effectiveness in model selection. In this study, we introduce TransferGraph, a novel framework that reformulates model selection as a graph learning problem. TransferGraph constructs a graph using extensive metadata extracted from models and datasets, while capturing their inherent relationships. Through comprehensive experiments across 16 real datasets, covering both images and text, we demonstrate TransferGraph's effectiveness in capturing essential model-dataset relationships, yielding up to a 32% improvement in correlation between predicted performance and the actual fine-tuning results compared to the state-of-the-art methods.
Graph structures are ubiquitous throughout the natural sciences. Here we develop an approach that exploits the quantum source's graph structure to improve learning via an arbitrary quantum neural network (QNN) ansatz. In particular, we devise and optimize a self-supervised objective to capture the information-theoretic closeness of the quantum states in the training of a QNN. Numerical simulations show that our approach improves the learning efficiency and the generalization behavior of the base QNN. On a practical note, scalable quantum implementations of the learning procedure described in this paper are likely feasible on the next generation of quantum computing devices.
Zorro
Valid, sparse, and stable explanations in graph neural networks
With the ever-increasing popularity and applications of graph neural networks, several proposals have been made to explain and understand the decisions of a graph neural network. Explanations for graph neural networks differ in principle from other input settings. It is important to attribute the decision to input features and other related instances connected by the graph structure. We find previous explanation generation approaches, which maximize the mutual information between the label distribution produced by the model and the explanation, to be restrictive. Specifically, existing approaches do not enforce explanations to be valid, sparse, or robust to input perturbations. In this paper, we lay down some of the fundamental principles that an explanation method for graph neural networks should follow and introduce a metric, RDT-Fidelity, as a measure of the explanation's effectiveness. We propose a novel approach, Zorro, based on the principles from rate-distortion theory, which uses a simple combinatorial procedure to optimize for RDT-Fidelity. Extensive experiments on real and synthetic datasets reveal that Zorro produces sparser, more stable, and more faithful explanations than existing graph neural network explanation approaches.
MuCoMiD
A Multitask graph Convolutional Learning Framework for miRNA-Disease Association Prediction
Growing evidence from recent studies implies that microRNAs (miRNAs) could serve as biomarkers in various complex human diseases. Since wet-lab experiments for detecting miRNAs associated with a disease are expensive and time-consuming, machine learning techniques for miRNA-disease association prediction have attracted much attention in recent years. A big challenge in building reliable machine learning models is that of data scarcity. In particular, existing approaches trained on the available small datasets, even when combined with precalculated handcrafted input features, often suffer from poor generalization and data leakage problems. We overcome the limitations of existing works by proposing a novel multitask graph convolution-based approach, which we refer to as MuCoMiD. MuCoMiD allows automatic feature extraction while incorporating knowledge from five heterogeneous biological information sources (associations between miRNAs/diseases and protein-coding genes (PCGs), interactions between protein-coding genes, miRNA family information, and disease ontology) in a multitask setting, which is a novel perspective that has not been studied before. To effectively test the generalization capability of our model, we conduct large-scale experiments on the standard benchmark datasets as well as on our proposed large independent testing sets and case studies. MuCoMiD obtains significantly higher Average Precision (AP) scores than all benchmarked models on three large independent testing sets, especially those with many new miRNAs, as well as in the detection of false positives. Thanks to its capability of learning directly from raw input information, MuCoMiD is easier to maintain and update than handcrafted feature-based methods, which would require recomputation of features every time there is a change in the original information sources (e.g., disease ontology, miRNA/disease-PCG associations, etc.).
We share our code for reproducibility and future research at https://git.l3s.uni-hannover.de/dong/cmtt.
There has been significant progress in unsupervised network representation learning (UNRL) approaches over graphs recently with flexible random-walk approaches, new optimization objectives, and deep architectures. However, there is no common ground for systematic comparison of embeddings to understand their behavior for different graphs and tasks. We argue that most UNRL approaches model and exploit the neighborhood, or what we call the context information, of a node. These methods largely differ in their definitions and exploitation of context. Consequently, we propose a framework that casts a variety of approaches (random-walk-based, matrix-factorization-based, and deep-learning-based) into a unified context-based optimization function. We systematically group the methods based on their similarities and differences. We study their differences, which we later use to explain their performance differences on downstream tasks. We conduct a large-scale empirical study considering nine popular and recent UNRL techniques, 11 real-world datasets with varying structural properties, and two common tasks: node classification and link prediction. We find that for non-attributed graphs there is no single method that is a clear winner and that the choice of a suitable method is dictated by certain properties of the embedding methods, the task, and the structural properties of the underlying graph. In addition, we report the common pitfalls in the evaluation of UNRL methods and offer suggestions for experimental design and interpretation of results.
Background: Viral infections are causing significant morbidity and mortality worldwide. Understanding the interaction patterns between a particular virus and human proteins plays a crucial role in unveiling the underlying mechanism of viral infection and pathogenesis. This could further help in prevention and treatment of virus-related diseases. However, the task of predicting protein–protein interactions between a new virus and human cells is extremely challenging due to scarce data on virus-human interactions and fast mutation rates of most viruses. Results: We developed a multitask transfer learning approach that exploits the information of around 24 million protein sequences and the interaction patterns from the human interactome to counter the problem of small training datasets. Instead of using hand-crafted protein features, we utilize statistically rich protein representations learned by a deep language modeling approach from a massive source of protein sequences. We also employ an additional objective, which aims to maximize the probability of observing human protein–protein interactions. This additional task objective acts as a regularizer and also allows us to incorporate domain knowledge to inform the virus-human protein–protein interaction prediction model. Conclusions: Our approach achieved competitive results on 13 benchmark datasets and the case study for the SARS-CoV-2 virus receptor. Experimental results show that our proposed model works effectively for both virus-human and bacteria-human protein–protein interaction prediction tasks. We share our code for reproducibility and future research at https://git.l3s.uni-hannover.de/dong/multitask-transfer.
The extraction of main content from web pages is an important task for numerous applications, ranging from usability aspects, like reader views for news articles in web browsers, to information retrieval or natural language processing. Existing approaches fall short because they rely on large amounts of hand-crafted features for classification. This results in models that are tailored to a specific distribution of web pages, e.g. from a certain time frame, but generalize poorly. We propose a neural sequence labeling model that does not rely on any hand-crafted features but takes only the HTML tags and words that appear in a web page as input. This allows us to present a browser extension which highlights the content of arbitrary web pages directly within the browser using our model. In addition, we create a new, more current dataset to show that our model is able to adapt to changes in the structure of web pages and outperform the state-of-the-art model.