A. Katsifodimos
38 records found
In the digital age, the proliferation of personal data within databases has made them prime targets for cyberattacks. As the volume of data increases, so does the frequency and sophistication of these attacks. This thesis investigates database security threats by deploying open s
...
Security researchers and industry firms employ Internet-wide scanning for information collection, vulnerability detection and security evaluation, while cybercriminals make use of it to find and attack unsecured devices. Internet scanning plays a considerable role in threat
...
The advancement of artificial intelligence (AI) has led to an increased demand for both a greater volume and quality of data. In many companies, data is dispersed across multiple tables, yet AI models typically require data in a single table format. This necessitates the merging
...
This thesis embarks on the quest to efficiently compute similarities between data streams in real-time, a task burgeoning in importance with the advent of big data and real-time analytics. At the heart of this endeavor is the expansion of the Condor framework to accommodate new p
...
Similarity joins are operations which involve identifying similar pairs of records within one or multiple datasets. These operations are typically time-sensitive, as timely identification of relations can lead to increased profitability. Therefore, it is advantageous to analyze t
...
General-purpose GPUs, renowned for their exceptional parallel processing capabilities and throughput, hold great promise for enhancing the efficiency of data analytics tasks. At the same time, recent developments in query execution engines have integrated the support of OLAP oper
...
The use of data streams has increased considerably over the last two decades. With this increase comes the need for fast and consistent fault recovery. Rollback recovery mechanisms from traditional distributed systems have been adapted successfully for stream engines.
...
Serverless computing has allowed developers to write pieces of code consisting solely of the necessary functionality without having to think about the underlying infrastructure. One prominent model is Function-as-a-Service (FaaS), where the code is structured into functions th
...
Today's need for highly available systems leads to data partitioning and replication across multiple nodes. Providing strong transactional consistency in a distributed database requires extensive communication. For this, algorithms such as two-phase commit are used. These communi
...
The adoption of the serverless architecture and the Function-as-a-Service model has significantly increased in recent years, with more enterprises migrating their software and hardware to the cloud. However, most applications require state management, leading to the use of extern
...
Distributed databases often struggle to fulfill their transactional isolation guarantees due to sharding and replication. As a result, the problem of checking isolation levels is consistently receiving attention from academia and industry. Transactional dependency graphs form a
...
Enriching Machine Learning Model Metadata
Collecting performance metadata through automatic evaluation
As the sharing of machine learning (ML) models has increased in popularity, more so-called model zoos are being created. These repositories facilitate the sharing of models and their metadata, and allow other people to find and re-use an existing model. However, the metadata provided for mod
...
Serverless computing is an increasingly popular paradigm in cloud computing where many of the operational challenges of running cloud applications, like server provisioning and management, are left to the cloud provider. A popular form of serverless computing is Functions-as-
...
Matching schemas is a fundamental task in data integration and semantic web applications. However, generating labeled data for schema matching tasks is challenging, requiring an efficient and effective approach. This thesis addresses this challenge by investigating schema matchin
...
In real-world scenarios, users provide invaluable data; however, this data is inherently incoherent, incomplete, and duplicated, i.e., different data rows refer to the same real-world object. Merging duplicates into a single entry broadens the knowledge of a given real-world obje
...
Stream Processing Engines (SPEs) are called upon to help solve problems around big and volatile data, while satisfying the needs for near real-time processing. In order for such systems to be considered effective solutions to such problems at scale, efficient elasticity and non d
...
In this thesis we aim to research and design different neural models for session recommendation. We investigate the fundamental neural models for session recommendation, namely BERT4Rec, SASRec and GRU4Rec and subsequently use our findings to design a simpler but performant neura
...
Generating synthetic images has wide applications in several fields such as creating datasets for machine learning or using these images to investigate the behaviour of machine learning models. An essential requirement when generating images is to control aspects such as the enti
...
As serverless computing grows in popularity, developers are demanding more from existing serverless models. One example is the emergence of Stateful Function as a Service (SFaaS), in which state is added to operators in existing Function as a Service (FaaS) models, to support mic
...
The workflow of a data science practitioner includes gathering information from different sources and applying machine learning (ML) models. Such dispersed information can be combined through a process known as Data Integration (DI), which defines relations between entities and a
...